[KLUG Members] Standard Regular Expressions (REs)

Big M members@kalamazoolinux.org
Mon, 24 Mar 2003 15:50:24 -0800 (PST)


--- Vernon <vsingleton@cfl.rr.com> wrote:
> Adam Williams wrote:
> 
> >>Somewhere out in the OSS community, this exists ... we hope.
> >>    
> >>
> >
> >I'm not aware of one, but RFC822 should contain 99% of what you want.
> >I know several people have written wild procmail scripts for mail
> >filtering, they might be able to help (assuming they are reading).
> >
> 
> RFC822 is great, but it does not contain a list anything like this:
> social security number
> \d{3}-\d{2}-\d{4}
> phone number
> \(\d{3}\) (\d{3}-\d{4})
> e-mail
> [\w\.\-]+@[\w\.\-]+
> url
> ...
> 
> Even if we could find something close, and then fix it up as best
> we can.
> Just keep your eye out for something like it.
> 
> Vernon
> 
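
For anyone who wants to try the three patterns quoted above, here is a
minimal sketch in Python. The patterns are copied as posted; the
PATTERNS table, its labels and the classify() helper are names made up
for this example, and anchoring each pattern to the whole string is an
assumption about how they would be used.

```python
import re

# Patterns copied from the quoted message; the dictionary keys are
# labels chosen for this example only.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "phone": re.compile(r"\(\d{3}\) (\d{3}-\d{4})"),
    "email": re.compile(r"[\w\.\-]+@[\w\.\-]+"),
}


def classify(text):
    """Return the labels of all patterns that match the whole string."""
    return [name for name, rx in PATTERNS.items() if rx.fullmatch(text)]
```

Note that the e-mail pattern is quite loose: it accepts anything made of
word characters, dots and dashes on both sides of the `@', so it is a
rough filter rather than a validator.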

This is from http://email.about.com/library/weekly/aa062298.htm

Fortunately, you are not completely free (i.e. not completely on your
own). There are some characters that are not permitted in an email
address, and thus may not be used in the username part of the email
address either. Actually, these characters are many: only ASCII
characters decimal 0 to 127 are permitted in 'traditional' email (that
means no accents, no carons). We're striving for compatibility, of
course, so we buckle down and accept that restriction.

According to RFC 822 (yes, this is from 1982) the local part of an
email address consists of words, separated by dots [`.']. A word is an
"atom" or a quoted string. An "atom" is a sequence of ASCII characters
(from 33 to 126; 0 to 31 and 127 are control characters, 32 is
whitespace), excluding brackets [`(', `)', `[', `]', `<', `>'],
punctuation marks [`.', `,', `;', `:'], two other characters [`\',
`"'], spaces [` '] and our good old friend [`@']. In contrast to the
"atom" a quoted string begins and ends with a quote [`"']. In between
the quotes you can put any ASCII character (now from 0 to 127, or 0 to
177 in octal, though allowing control characters seems a bit dangerous)
excluding a quote
itself [`"'] and the carriage return character [`\r']. You can,
however, quote the quote with a backslash [`\']. The backslash will
indeed quote any character; "quoting" here means what is usually called
"escaping": the backslash "escapes" the following character and the
latter will not be treated with the special meaning it usually would be
given in that context, i.e. `\"' does not end the quoted string but
appears as a quote in it.

Forget about the quoted string and quoting.
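
Leaving quoted strings out, the atom rule described above can be
sketched as a regular expression. This is only an illustration in
Python, built from the character list in the text rather than checked
line by line against the RFC; SPECIALS, ATOM_CHARS, LOCAL_PART and
is_plain_local_part() are names invented for this example.

```python
import re

# The excluded characters listed in the text above. Space and control
# characters fall outside the 33..126 range, so they need no entry here.
SPECIALS = '()[]<>.,;:\\"@'

# Every printable ASCII character (33..126) that is not excluded.
ATOM_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in SPECIALS)

# An atom is one or more of those characters.
ATOM = "[" + re.escape(ATOM_CHARS) + "]+"

# A local part (without quoted strings) is atoms separated by single dots.
LOCAL_PART = re.compile(ATOM + r"(?:\." + ATOM + ")*")


def is_plain_local_part(s):
    """True if s consists only of dot-separated atoms."""
    return LOCAL_PART.fullmatch(s) is not None
```

With this sketch `john.doe' and `o'reilly+tag' pass, while `john..doe',
`.john' and anything containing a space or an `@' are rejected.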

Now what can/should I use?

It boils down to using any US-ASCII alphanumeric character plus some
fancy, but otherwise in a US-centric world (this does not necessarily
involve judgment; as if I were not EU-centric) `normal' characters [`!',
`#', `$', `%', `&', `*', `+', `-', `~', and whatever else you can find
between ASCII 33 and 47 that is not excluded above].

What you should use are letters, digits and the underscore. That
should suffice. Still plenty of space for cre@ivity, huh?
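
That conservative recommendation is easy to express as a pattern. A
minimal sketch in Python, assuming "letters" means the unaccented ASCII
letters only; SAFE_USERNAME and is_safe_username() are names made up for
this example. The character class is spelled out because \w in Python 3
also matches accented and other non-ASCII letters by default.

```python
import re

# ASCII letters, digits and the underscore, one or more of them.
SAFE_USERNAME = re.compile(r"[A-Za-z0-9_]+")


def is_safe_username(name):
    """True if name uses only the conservative character set."""
    return SAFE_USERNAME.fullmatch(name) is not None
```

So `vernon_s' is accepted, while `cre@ivity' is not.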


