[KLUG Members] Standard Regular Expressions (REs)

Adam Tauno Williams members@kalamazoolinux.org
Tue, 25 Mar 2003 10:05:09 -0500


>>To the right of the "@" can be essentially ANY BINARY VALUE. 
>>The days of ASCII-only DNS are gone. (See RFC 2181.)
>I don't know how to reconcile that RFC:
>    The DNS itself places only one restriction on the
>    particular labels that can be used to identify resource
>    records.  That one restriction relates to the length of
>    the label and the full name.  The length of any one label
>    is limited to between 1 and 63 octets.  A full domain name
>    is limited to 255 octets (including the separators).  The
>    zero length full name is defined as representing the root
>    of the DNS tree, and is typically written and displayed as
>    ".".  Those restrictions aside, any binary string whatever
>    can be used as the label of any resource record.
>With RFC 2396, which explains the syntax of DNS names that can be
>used in a URI:
>   The host is a domain name of a network host, or its IPv4
>   address as a set of four decimal digit groups separated by
>   ".".  Literal IPv6 addresses are not supported.
>
>      hostport      = host [ ":" port ]
>      host          = hostname | IPv4address
>      hostname      = *( domainlabel "." ) toplabel [ "." ]
>      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
>      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
>      alphanum = alpha | digit
>Maybe 2396 needs to be updated?

They cannot be reconciled; it is a real problem.  2396 is not the only RFC to
conflict with 2181.  I know that 2739, while not as "infrastructural" (word?) as
2396, is used heavily by us LDAP guys and lots of MUAs, but it requires that
URLs be entirely ASCII, or at least UTF-8.
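To make the conflict concrete, here is a small Python sketch that applies both rule sets to the same names. It is an illustration only, not code from either RFC. It assumes names arrive as raw bytes in dotted form, so it cannot represent a label that itself contains a "." octet. It also counts the 255-octet limit in wire format (a length octet before each label plus the terminating root octet), which is one common reading of "including the separators". The helper names rfc2181_ok and rfc2396_hostname_ok are hypothetical.

```python
import re

# Simplifying assumption: names are raw bytes in dotted form, so a label
# that itself contains a "." octet (legal per RFC 2181) cannot be expressed.

def rfc2181_ok(name: bytes) -> bool:
    """Length rules from RFC 2181 section 11 (hypothetical helper name)."""
    if name in (b"", b"."):
        return True  # the zero-length name, i.e. the root, written "."
    labels = (name[:-1] if name.endswith(b".") else name).split(b".")
    # Every label must be 1 to 63 octets; any byte values are allowed.
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # Assumed reading of the 255-octet limit: wire format, i.e. one length
    # octet per label plus the terminating zero-length root label.
    wire_length = sum(1 + len(label) for label in labels) + 1
    return wire_length <= 255

# The RFC 2396 hostname grammar quoted above, translated to a bytes regex.
_DOMAINLABEL = rb"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_TOPLABEL = rb"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_HOSTNAME = re.compile(rb"(?:" + _DOMAINLABEL + rb"\.)*" + _TOPLABEL + rb"\.?")

def rfc2396_hostname_ok(name: bytes) -> bool:
    """True if name matches the RFC 2396 hostname production (hypothetical helper name)."""
    return _HOSTNAME.fullmatch(name) is not None

if __name__ == "__main__":
    for name in (b"www.example.com",
                 "bücher.example".encode("utf-8"),
                 b"_ldap._tcp.example.com",
                 b"a" * 64 + b".com"):
        print(name, rfc2181_ok(name), rfc2396_hostname_ok(name))
```

Under these assumptions, a UTF-8 label such as "bücher" or an SRV-style "_ldap._tcp" name passes the RFC 2181 length check but fails the RFC 2396 grammar. An over-long 64-octet label goes the other way: the quoted grammar has no length bound, so it matches, while RFC 2181 rejects it.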

This simply reflects the fact that the Internet has its roots in countries
speaking Latin-derived languages.  I suspect this will take a long time to shake
out.  A lot of developers still need to get their heads around the fact that
this planet has a huge continent called "Asia".  Occasionally this gets pretty
hot-n-heavy on LDAP lists, since there is still a large "who cares" contingent.

>Here's the RFC822 regex from Email::Valid:
> [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-

Yikes!  Kudos and much respect to the guy who worked that one out!

My advice, if you want to validate e-mail addresses from various clients, is to
whip up a little XML-RPC server in Perl (real easy!) and export the
functionality of Email::Valid.
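The post recommends Perl so that Email::Valid can be called directly. As a minimal sketch of the same client/server shape, here is a version built on Python's standard-library xmlrpc modules instead. The method name email_valid and the localhost/free-port setup are hypothetical. The check inside email_valid is a deliberately naive placeholder and is not RFC 822 validation; in the setup described above, that function body is where the Perl server would call Email::Valid.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def email_valid(address: str) -> bool:
    """Hypothetical placeholder, NOT RFC 822 validation.

    In the setup the post describes, this is where a Perl server would
    call Email::Valid; here it only checks for exactly one "@" with
    something on each side.
    """
    local, _, domain = address.partition("@")
    return address.count("@") == 1 and bool(local) and bool(domain)


def start_server() -> SimpleXMLRPCServer:
    """Serve email_valid over XML-RPC on a free localhost port, in a thread."""
    # Port 0 lets the OS pick an unused port; logRequests=False keeps it quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(email_valid, "email_valid")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # Any XML-RPC client (Perl, Python, Java, ...) can make this same call.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.email_valid("user@example.com"))  # True
        print(proxy.email_valid("not-an-address"))    # False
    server.shutdown()
    server.server_close()
```

The design point is the one the post makes: the validation logic lives in one place, and each client only needs an XML-RPC library rather than its own copy of that regex.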


Adam Tauno Williams
Network & Systems Administrator
Morrison Industries
Grand Rapids, Mi. USA