
Beast of the Number: Parsing the Feral Phone

by mojotoad (Monsignor)
on Apr 16, 2002 at 22:05 UTC ( #159645=perlmeditation )

The humble phone number. Global, local, extensions, alternates, and sometimes pure garbage: Without data entry restraints there is no telling what you might find in a typical phone number data field. Until now.

The topic of phone number crunching has arisen at the Monastery before, multiple times, with answers and insightful speculations, but thus far all seem to have underestimated the complexity of the unrestrained Beast.

I come bearing loads of international phone number DATA found running rampant in the wilds of the data entry savannahs, plus my particular solution to the problem of making sense of it all. From a representative field of nearly 100,000 numbers I distilled a subset of über-patterns and their appearance frequencies.

What follows is my eventual solution for parsing these noisy numbers, a rather brute-force solution that evolved to fit the data at hand for a data set perhaps better suited for analysis by a neural net; my reflections on the nature of data entry, ambiguity, and alternate approaches; and most importantly, a scrambled but meaningful representative data set on which you are free to chew at your leisure -- I'm sure better approaches exist. I invite thoughts, code commentary, and better solutions.

warning: node size ~63k

I am here to chew bubble gum and parse some data...and I'm all out of bubble gum.
-- Me Nada


Overview

Below you will find my commentary, a basic script, and three modules used for my crunching. The example numbers follow the __DATA__ marker at the end of the script.

Parsing Thoughts

Phone numbers, like email addresses, are inherently impossible to prove "correct" merely by parsing. The only way to discover if a phone number is valid is to dial it and see what rings. More generally speaking, of course, there are general formats we expect to see, regardless of whether the number is indeed a valid number. Beyond the International Country Codes, however, numbers are subject to the vagaries and capacities of the host country's network. There are no guarantees, without knowing the rules for every country, what bits of a number apply to an area or province code, municipality, etc. The only parts of a number we can reasonably expect to identify in a globally generic way are:
  1. International Direct Dial (IDD) codes (what locals use to dial out of their country)
  2. International Country Codes (what the world dials to reach a particular country after the IDD)
  3. The local phone number, including area/province codes and possibly long distance codes
  4. Extensions (to be dialed after a connection is made)

On top of this we should also expect indications of alternates for numbers, suffixes, and extensions in unconstrained data entry fields.

All is not doom and gloom for the country networks, however. If, for example, you happen to know that a large proportion of your numbers are supposed to be in a ten-digit format then you can use that information to infer information and rules of thumb, especially for parsing alternate suffixes.

1-900-ILOVEYOU: I have made no attempt to parse vanity numbers. First of all, none are represented in this data set. Second, I toyed with the idea and eventually decided that there was too much ambiguity involved with extracting extensions usually indicated by some combination of letters from 'extension', periods, hashes, and whitespace. I can see how distinguishing the difference might be done, but did not implement it since this data has loads of extensions but no vanity numbers.

Finally, there is one unavoidable fact: There are plenty of garbage entries that are either incomplete or incomprehensible even to a human. In a well-controlled universe this garbage would have been caught at the data-entry stage -- even a rudimentary attempt at enforcing validity would clean up much of the garbage. Such is not the case here, however, though those tasked with parsing the result may fervently wish it otherwise. Though no longer in vogue, the GIGO principle ultimately still stands for these cases.

The Data

As mentioned above, the original data comes from 100,000 or so international and U.S. domestic numbers entered into unconstrained entry fields. From these numbers I derived meta patterns with which to play:

Pattern Count   Generality
1503            single digits (\d), single alphas ([a-z])
1269            single digits, single alphas, whitespace (\d, [a-zA-Z], and \s+)
328             digit clusters, single alphas, whitespace (\d+, [a-zA-Z], and \s+)
312             digit clusters, alpha clusters, whitespace (\d+, [a-zA-Z]+, and \s+)

If extraction were the only goal, then the most general pattern collection at the bottom of that list would be sufficient for this particular data set. However, in this case we just have a raw data field that is supposed to have a phone number in it. Our job is to parse that number -- a more complicated task that presents its own set of challenges. Therefore in the __DATA__ section in the phone.pl script below I have included 1269 example entries that represent patterns derived from letting single alphanumerics (not clusters) float and collapsing whitespace.
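For anyone wanting to derive their own patterns, here is a minimal sketch of the kind of generalization described above. It is not the script I used on the original data; the names entry_shape and shape_frequencies are made up for illustration, and it assumes the "single digits, single alphas, collapsed whitespace" level of generality.

```perl
use strict;
use warnings;

# Reduce an entry to its "shape": every digit becomes 'd', every letter
# becomes 'a', runs of whitespace collapse to one space, and punctuation
# is kept verbatim. (Hypothetical helper, not part of the modules below.)
sub entry_shape {
    my $shape = shift;
    # One pass, so the 'd' placeholders are never re-mapped as letters.
    $shape =~ s/([0-9])|[A-Za-z]/defined $1 ? 'd' : 'a'/ge;
    $shape =~ s/\s+/ /g;
    $shape =~ s/^ | $//g;
    return $shape;
}

# Return [shape, percentage] pairs, most frequent shape first.
sub shape_frequencies {
    my %count;
    $count{ entry_shape($_) }++ for @_;
    my $total = @_ or return;
    return map  { [ $_, 100 * $count{$_} / $total ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
}

print entry_shape('234-567-8901 x 2304'), "\n";   # ddd-ddd-dddd a dddd
```

Collapsing runs as well (for example s/d+/D/g on the shape) would give something like the "cluster" rows of the table.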

These are not real phone numbers, except perhaps some by random chance. International country codes, where identifiable, were replaced with a random country code of identical length. 1's and 0's were largely left alone (due mostly to the presence of IDD codes) and the rest of the digits were sequentially overwritten. Text strings have been replaced with nonsense unless they are somehow generically germane (e.g., "Extension", "PAGER", "email only", " - xxxx", etc.). The result is a set of fake but convincing numbers with valid country codes (when present) that each correspond to one of the patterns.

Each number is preceded by a percentage measuring its match frequency in the original data set. Though I did not use this information for parsing purposes, it is instructive to see that the majority of numbers fall into patterns that are reasonable to extract and that indeed, we need not fear for the collective skills of our world's typists. The percentages might also be useful to those seeking their own solutions -- either to form heuristics or to realize when "enough is enough" and be content with their 95%. Note that with a data set this size, percentages of less than 1% are common and can represent a significant number of entries.
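As a rough illustration of the "enough is enough" idea, the sketch below counts how many patterns are needed to reach a target coverage. It assumes lines in the same "percentage entry" layout used in the __DATA__ section, already sorted by descending frequency; patterns_to_cover is a hypothetical helper, not something from my modules.

```perl
use strict;
use warnings;

# Given a target percentage and lines such as "18.5765% +1 234 567 8923",
# return how many leading patterns are needed to reach the target, or an
# empty list if the lines never add up to it.
sub patterns_to_cover {
    my ($target, @lines) = @_;
    my ($sum, $n) = (0, 0);
    for my $line (@lines) {
        my ($pct) = $line =~ /^\s*([\d.]+)%/ or next;   # skip lines with no pct
        $sum += $pct;
        $n++;
        return $n if $sum >= $target;
    }
    return;
}

my @sample = (
    '18.5765% +1 234 567 8923',
    '14.3681% 234-567-8923',
    '9.0357% +21 30 450 6789',
);
print patterns_to_cover(30, @sample), "\n";   # 2
```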

My Solution

There are three tasks involved with data such as this: extraction, normalization, and parsing. Though logically distinct, in reality they are hopelessly entangled. Much of the noise that gets dropped during normalization, for instance, can briefly serve as clues to the meaning of various parts of a phone number. My end result, therefore, is a series of steps, sequentially cohesive, that are executed in a specific order with each step passing the remnants of its operation to the next. The steps involved are a direct reflection of the nature of this particular data set.

Loosely stated, my approach boils down to the following steps:

  1. Split entry into multiple numbers, if present.
  2. Extract phone extensions.
  3. Remove IDD prefixes (possibly using them to infer upcoming country codes)
  4. Interpolate alternate suffixes into separate list of complete numbers.
  5. Extract country codes where present and map to appropriate numbers.
  6. Remaining data is likely to be the core number for a locale.

Some clarifications are in order. In step #3 I mention removing IDD prefixes -- the sequence of numbers used to dial out of a particular country. These are of little use to someone wanting to dial a number if they do not happen to live in a country with that same IDD. Sometimes in the data there is no '+' to indicate an international number -- sometimes there is merely an IDD, usually some combination of 0 and 1 -- so these codes can be handy for inferring the imminent arrival of a country code. I store the IDDs where found, but other than for that inference they are of no particular use for my purposes.
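To make the IDD idea concrete, here is a deliberately simplified sketch. It is not the extract_idd routine from PhoneParse.pm: strip_idd is a hypothetical name, and unlike the real routine it makes no attempt to tell a US-style '001' country code apart from an IDD.

```perl
use strict;
use warnings;

# Strip a leading IDD made of zeros optionally followed by a mix of 0s
# and 1s (00, 011, 0011, 010, ...). Returns the number rewritten with a
# leading '+' plus the IDD, or the untouched entry and undef.
sub strip_idd {
    my $raw = shift;
    my $n   = $raw;
    $n =~ s/^[^\d+]+//;     # leading quotes, whitespace and similar noise
    $n =~ s/^\+\D*//;       # an explicit '+' and any cruft right after it
    # Only treat the prefix as an IDD if digits remain after it.
    if ($n =~ s/^(0+[01]*)\D*(?=\d)//) {
        return ("+$n", $1);
    }
    return ($raw, undef);
}

my ($num, $idd) = strip_idd('011 44 555 666 7777');
print "$num (IDD $idd)\n";   # +44 555 666 7777 (IDD 011)
```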

In step #4 I mention interpolating numbers. Sometimes a number might be listed as something like '555-555-6666, 7777, or 8888'. There are three numbers there, all beginning with '555-555'. There are cases such as '555 555 6666-7' that present ambiguity: is the '7' an extension or an alternate ending to another suffix? In my solution that particular example is interpreted as an alternate suffix '6667'. In the original data it was more obvious because these tended to appear as numerically sequential numbers, i.e., adjacent suffixes. The scrambling of the data has destroyed some of its intuitive "look" and might cause you to wonder about my decisions in these ambiguous areas. These decisions are not bulletproof -- at the time they just seemed more likely to be correct.
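The interpolation of alternate suffixes can be sketched in isolation as follows. This is a cut-down illustration, not the interpolate routine from the listing: expand_suffixes is a hypothetical name, and it only understands alternates separated by commas, semicolons, pipes, slashes, or the word "or".

```perl
use strict;
use warnings;

# Expand '555-555-6666, 7777, or 8888' into three complete numbers by
# grafting each alternate onto the prefix of the first number.
sub expand_suffixes {
    my $entry = shift;
    (my $norm = lc $entry) =~ s/\s+or\s+/\//g;   # "or" becomes a slash
    $norm =~ s/\s*[,;|]\s*/\//g;                 # so do , ; and |
    my ($base, @alts) = grep { length } split m{\s*/+\s*}, $norm;
    return ($entry) unless defined $base && @alts;

    # The digit count of the first alternate decides how much of the
    # base number is a suffix and how much is the shared prefix.
    my ($first) = $alts[0] =~ /(\d+)/;
    return ($entry) unless defined $first;
    my $len = length $first;
    my ($prefix) = $base =~ /^(.*)\d{$len}$/;
    return ($entry) unless defined $prefix && $prefix =~ /\d/;

    my %seen;
    return grep { !$seen{$_}++ } $base, map { "$prefix$_" } @alts;
}

print join(' | ', expand_suffixes('555-555-6666, 7777, or 8888')), "\n";
# 555-555-6666 | 555-555-7777 | 555-555-8888
```

Note that a one-digit alternate such as '555 555 6666/7' comes out as '555 555 6667', which is the same presumption made in the text above.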

Step #5 is perhaps the most interesting. I broke down and eventually came to rely on a list of actual, valid Country Codes. Mechanically detecting country codes can go only so far. Think "+44 555 666 7777" vs "+445556667777". With no knowledge of country codes, other than perhaps their typical maximum length of three digits and their prefix-free, Huffman-like structure (no valid code is the beginning of another), there is no bulletproof way of pulling the country code out of the second example. In addition, the mechanical approach cannot deal with invalid country codes following a '+'. So in the CountryCodes package I provide some routines for pulling valid country codes out of a string of digits; in addition, there is a small routine for grabbing an updated list of codes off of the Net. PhoneParse includes two routines that illustrate the difference: pull_cc_smart, which consults that list, and pull_cc_guess (not used), which works mechanically.
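Here is a small sketch of the digit-by-digit lookup, using a handful of real codes (1, 33, 44, 49, 353, 358) as a stand-in for the full list. The sub name split_country_code and its return order are my own choices for this example; the CountryCodes module below builds its table differently and returns the pieces in the opposite order.

```perl
use strict;
use warnings;

# A tiny stand-in list; the real module carries the full set of codes.
my @codes = qw( 1 33 44 49 353 358 );

# Build a nested hash keyed by digit, marking the end of each code.
my %trie;
for my $code (@codes) {
    my $node = \%trie;
    $node = $node->{$_} ||= {} for split //, $code;
    $node->{end} = 1;
}

# Walk the digits until a complete code is found. Returns (code, rest),
# or (undef, digits) when no known code starts the string.
sub split_country_code {
    my $digits = shift;
    my $node   = \%trie;
    for my $i (0 .. length($digits) - 1) {
        $node = $node->{ substr($digits, $i, 1) }
            or return (undef, $digits);
        return (substr($digits, 0, $i + 1), substr($digits, $i + 1))
            if $node->{end};
    }
    return (undef, $digits);
}

my ($cc, $rest) = split_country_code('445556667777');
print "+$cc $rest\n";   # +44 5556667777
```

Because no code is a prefix of another, the first complete match is the only possible one, so no backtracking is needed.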

The Code

As I mentioned, I expect that the dataset is the most valuable contribution here. My code is not optimized, tricky, or beautiful -- it is merely a straightforward evolution of a solution from data with pollution. (Am I a poet or what?)

The test script phone.pl is a simple harness around the data. For each line it will print the raw entry and extracted phone numbers, separated by a colon. In cases where multiple numbers were extracted, each additional number appears on a line of its own below the first number found.

All code and data licensed under the same terms as Perl itself.

Enjoy,
Matt
Listing 1. PhoneParse.pm
package PhoneParse;

# Attempts to parse international phone numbers found "in the
# wild" where operators entered the numbers with no attempt at
# prior format enforcement. Only handles International Direct
# Dial codes, Country Codes, extensions, and the numbers
# themselves. No attempt is made to identify "area codes" if
# present in the number.

use strict;
use vars qw( @EXPORT $DEBUG );

use base qw(Exporter);
@EXPORT = qw( parse_phone );

use Carp;

# Home grown
use PhoneNumber;
use CountryCodes qw( is_country_code pull_country_code );

sub parse_phone {
  # Attempt to normalize feral phone numbers. Lots of sequential
  # calls here, where the remnants of the last call are passed
  # along to the next routine.
  my $entry = shift;
  return () if $entry =~ /^\s*$/;

  # Normalize dividing cues that look like attempts to indicate
  # alternate phone listings
  $entry = normalize_alts($entry);

  # Using those cues, crack into multiple phone numbers
  my @raw_numbers = split_nums($entry);

  my @numbers; # storage bin.
  my @first;   # cache for 1st country code and number
  foreach my $raw_number (@raw_numbers) {

    # Normalize oddball extension indicators
    $raw_number = normalize_exts($raw_number);

    # It helps to strip parens, as opposed to other noise, at
    # this point. Sequential cohesion, anyone? Bleah.
    $raw_number = zap_parens($raw_number);

    # Grab the extensions.
    my @exts;
    ($raw_number, @exts) = extract_exts($raw_number);

    # Trim the fat from the ends
    $raw_number = zap_border_noise($raw_number);

    # Yank any International Direct Dial codes, that is, codes
    # used to dial *out* of various countries. These are of no
    # interest, although they can serve as an indicator of an
    # upcoming country code.
    my $idd;
    ($raw_number, $idd) = extract_idd($raw_number);

    # Extensions have been clipped and IDD's plucked. Now look
    # for stealth dash alternates -- that is to say, a single
    # dash (normally ubiquitous) that is actually indicating
    # alternate numbers. This will tank if the lone dash was
    # meant to indicate an extension. Oh well.
    $raw_number = normalize_dashslash($raw_number);

    # Now we can interpolate over alternate suffixes, if any, and
    # crack each number.
    my @raws = interpolate($raw_number);
    foreach my $raw (@raws) {
      my($num, $cc) = extract_country_code($raw, $first[1]);
      # Cache first found country code and number
      @first = ($num, $cc) if !@first && $cc;
      # Propagate first country code if appropriate. If the root
      # numbers don't have the same digit count then we do not
      # propagate the cc.
      my $native_cc = 1;
      if (!$cc && $first[1]) {
        if ($num =~ tr/0-9// == $first[0] =~ tr/0-9//) {
          $cc = $first[1];
          $native_cc = 0;
        };
      }
      $native_cc = 0 if @raws > 1 && $raw ne $raws[0];
      push(@numbers, PhoneNumber->new(
                                      num => $num,
                                      cc  => $cc,
                                      ext => \@exts,
                                      idd => $idd,
                                     ));
      $numbers[-1]->_native_cc($native_cc);
    }
  }
  # All done
  @numbers;
}

sub normalize_dashslash {
  # Normally dashes cannot tell us much, but if they are towards
  # the end and the *only* use of a dash on a relatively long
  # number, it's reasonable to infer that the dash is
  # indicating alternative suffixes for a number. In this case
  # just replace the dash with something more obvious: a slash.
  my $raw_number = shift;
  my $dash_count = $raw_number =~ tr/\-//;
  return $raw_number unless $dash_count == 1;
  my $total_dcount = $raw_number =~ tr/0-9//;
  # $head is everything left of the dash; $left is its final
  # digit group and $pre whatever precedes that group.
  my($head, $right) = split(/\s*\-\s*/, $raw_number);
  my($pre, $left) = $head =~ /(.*?)(\d+)$/;
  my $pre_dcount = $pre =~ tr/0-9//;
  return $raw_number unless $pre_dcount;
  my $left_dcount  = $left =~ tr/0-9//;
  my $right_dcount = $right =~ tr/0-9//;
  my $r_pct = $right_dcount/($pre_dcount + $left_dcount);
  # If there are lots of digits, proceed. Also proceed on smaller
  # digit streams if the righthand chunk is small relative to the
  # rest, guessed and gollied here at around 32% or less. This
  # avoids simple numbers such as +dd dddd-dddd and
  # +ddd ddd ddd-ddd
  if ($total_dcount > 12 || $r_pct <= 0.32) {
    # We have an "interesting" number
    if ($right_dcount == $left_dcount || $right_dcount == 1) {
      # Balanced lengths around a lone dash; probably an
      # alternate ending. Otherwise an rlength of 1, which is an
      # alt or an ext but we'll presume alt.
      return join('/', "$pre$left", $right);
    }
  }
  # No suspicious dashes
  $raw_number;
}

sub zap_border_noise {
  # Zap leading and trailing non-numerics (but not +)
  my $raw_number = shift;
  $raw_number =~ s/^[^\+\d]+//;
  $raw_number =~ s/\D+$//;
  $raw_number;
}

sub zap_parens {
  # Zap parentheses
  my $raw_number = shift;
  $raw_number =~ s/[()]//g;
  $raw_number;
}

sub extract_idd {
  # Clip International Direct Dial Codes if they were entered
  # instead of country codes. Specifically we go after
  # combinations of leading zeros followed by ones: 00, 011,
  # 0011, 010, etc. This does not cover all IDDs, but gets many
  # of them. Oftentimes the country code will remain next in
  # line. Note that we take special care *not* to clip a mere
  # leading '1' or '001', the CC for the USA.
  my $raw_number = shift;
  my $original_number = $raw_number;

  # Remove start noise, such as quotes and whitespace
  $raw_number =~ s/^[^\d\+]+//;

  # Isolate a '+' if present with dashes, additional pluses, or
  # any other non-numeric cruft following it.
  $raw_number =~ s/^\+\D*/\+/;

  my $idd;
  if ($raw_number =~ s/^\+(0+1{2,})/\+/) {
    # Look for a + followed by zeros and at least two 1's, and
    # replace with a '+'. By far the most common occurrence of
    # this is '+ 011'.
    $idd = $1;
  }
  elsif ($raw_number =~ s/^\+(0+)1/\+/) {
    # Check for a + followed by zeros and a single 1. Here we
    # also check the remaining digit count in order to guess
    # whether we're dealing with the Country Code of the USA (1)
    # or an IDD from within somewhere else.
    my $digit_count = $raw_number =~ tr/0-9//;
    if ($digit_count == 10) {
      # CC for U.S.A.
      $idd = $1;
      $raw_number =~ s/^\+/\+1/;
    }
    else {
      # IDD from within somewhere else?
      $idd = "${1}1";
    }
  }
  elsif ($raw_number =~ s/^\+(0+[01]*)/\+/) {
    # Wrap-up for mandatory '+' sightings: Replace a plus
    # followed by zero followed by any combination of 1's and 0's
    # with just a '+'.
    $idd = $1;
  }
  elsif ($raw_number =~ s/^(0+[01]*)/\+/) {
    # Infer '+' for remaining 01 combinations with no '+', in
    # particular '00'.
    $idd = $1;
  }
  else {
    # No idd found
    return $original_number;
  }
  # Booty
  $raw_number =~ /\d/ ? ($raw_number, $idd) : $original_number;
}

sub normalize_exts {
  # attempt to normalize odd ext indicators to a single 'x'
  my $raw_number = shift;
  $raw_number =~ s/[\#\*]+\s*(\d+.*)$/x$1/;
  $raw_number;
}

sub extract_exts {
  # Extract extensions. Multiple extensions are assumed to be
  # indicated with slashes of some sort (hence the earlier
  # normalizing attempt)
  my $raw_number = shift;
  return $raw_number unless $raw_number =~ /\D/;
  my @exts;
  if ($raw_number =~ s/[xX]+\D*(\d+.*)$//) {
    my $ext = $1;
    $ext =~ s/[^\d,\/\\\|]+//g;
    @exts = split(/\D+/, $ext);
  }
  # clean up non-numeric extension debris
  $raw_number =~ s/\D+$//;
  ($raw_number, @exts);
}

sub normalize_alts {
  # attempt to normalize delimiters for alternate numbers or
  # extensions
  my $entry = shift;
  return $entry unless $entry =~ /\D/;
  $entry = lc($entry);
  $entry =~ s/\s+or\s+/\//g;
  $entry =~ s/\s*[,;\|]\s*/\//g;
  $entry;
}

sub extract_country_code {
  # Yank or infer country codes if possible
  my($raw, $cc_known) = @_;
  my($num, $cc);
  if ($raw !~ /^\+/) {
    # No '+', see if the first number group looks like a country
    # code. If so make sure there are enough digits in the number
    # to make sense with a country code.
    if (!$cc_known &&
        ($raw =~ /^(\d+)[\s\-]+\d+/ && is_country_code($1)) ||
        $raw =~ tr/0-9// > 10) {
      ($num, $cc) = pull_cc_smart($raw);
    }
    else {
      # No country code to pull. Just strip non nums.
      $num = $raw;
      $num =~ s/\D+//g;
    }
  }
  else {
    # There was a leading '+' so we'll have a go at pulling a
    # country code, even if there is no valid one present.
    ($num, $cc) = pull_cc_smart($raw);
  }
  # Booty
  ($num, $cc);
}

sub pull_cc_smart {
  # Yank country codes by scanning for valid country codes
  my $raw_number = shift;
  $raw_number =~ s/\D+//g;
  my($num, $ccode) = pull_country_code($raw_number);
  ($num, $ccode);
}

sub pull_cc_guess {
  # Attempt to mechanically yank country codes without any
  # information on what represents a valid cc.
  my $raw_number = shift;
  my $pat = qr/^\s*\++[\s\+\-]*(\d+)/;
  my($ccode) = $raw_number =~ /$pat/;
  $raw_number =~ s/$pat// if defined $ccode;
  $raw_number =~ s/\D+//g;
  ($raw_number, $ccode);
}

sub split_nums {
  # Attempt to detect and split multiple numbers.
  my $raw_number = shift;
  my @numbers;
  if ($raw_number =~ tr/0-9// > 18) {
    if ($raw_number =~ /\+[^\+]+([^\+\s]\s*\+)/) {
      # Attempt to split on '+' in cases where there
      # are multiple country codes.
      @numbers = split("\Q$1\E", $raw_number);
      map($numbers[$_] = "+$numbers[$_]", 1..$#numbers);
    }
    else {
      # Otherwise go for slashes and length ratios. We
      # guess/golly chunk lengths of 9 digits or larger.
      my @chunks;
      ($numbers[0], @chunks) = split(/\s*[\\\/,]+\s*/, $raw_number);
      foreach my $chunk (@chunks) {
        my($ext_guard) = $chunk =~ /^([^x]+)/i;
        my $chunk_digits = $ext_guard =~ tr/0-9//;
        if ($chunk_digits >= 9 && $numbers[-1] =~ tr/0-9// >= 9) {
          push(@numbers, $chunk);
        }
        else {
          $numbers[-1] .= "/$chunk";
        }
      }
    }
    # Check for numbers such as +1 555 555 5555 1 444 444 4444
    # This is some hard-coded US-centric whack for sure.
    if (@numbers <= 1) {
      my @chunks = split(/[\s\+\-]+1[\s\+\-]+/, $raw_number);
      shift @chunks if $chunks[0] =~ /^\s*$/;
      if (@chunks >= 2) {
        @numbers = ();
        foreach (@chunks) {
          if (tr/0-9// >= 8) {
            push(@numbers, "+1 $_");
          }
          else {
            $numbers[$#numbers] .= " +1 $_";
          }
        }
      }
    }
  }
  # Booty
  @numbers ? @numbers : $raw_number;
}

sub interpolate {
  # Many times multiple suffixes, rather than whole numbers, are
  # indicated by slashes, etc. We take these suffixes and join
  # them to their common prefix.
  my $raw_num = shift;

  # Split on our chosen delimiters ( '/' or '\')
  my($base, @frags) = split(/\s*[\\\/]+\s*/, $raw_num);
  return $raw_num unless @frags;

  # Pull the digits from the first alternate in order to capture
  # the digit count
  my($nchunk) = $frags[0] =~ /(\d+)/;
  return $raw_num unless defined $nchunk;

  # Check to make sure our prefix isn't a shorter stub. If it is,
  # interpolation makes no sense.
  my $base_dcount  = $base =~ tr/0-9//;
  my $chunk_dcount = $nchunk =~ tr/0-9//;
  return $raw_num if $base_dcount <= $chunk_dcount;

  # Using that length, pull the root number from the string
  # containing that root plus the *first* alternative.
  my($prefix) = $base =~ /(.*)\d{$chunk_dcount}$/;

  # Entirely separate numbers if no prefix.
  return $raw_num unless defined $prefix;

  # Apply the prefix to the remaining alternatives. We drop
  # duplicates. In the real world this probably meant that the
  # alternative presented was an extension rather than an
  # alternative. Oh well.
  my @interpolated;
  my %seen;
  foreach ($base, map("$prefix$_", @frags)) {
    next if $seen{$_};
    push(@interpolated, $_);
    ++$seen{$_};
  }
  # Booty
  @interpolated;
}

1;
Listing 2. CountryCodes.pm
package CountryCodes;

use strict;
use Exporter;
use base qw(Exporter);
use vars qw( @EXPORT @EXPORT_OK );

@EXPORT    = qw( is_country_code pull_country_code );
@EXPORT_OK = qw( initialize_from_net );

use Carp;

### Initialization

# The qw() list of country codes is missing from this copy of
# the listing; supply your own or call initialize_from_net().
my @Icodes = qw( );

my $Fresh_Codes_url = 'http://kropla.com/dialcode.htm';

my(%Icodes, %Icodes_by_length, %Icodes_huff);

initialize();

sub initialize {
  # Set up data structures -- handy when we want to update with
  # fresh codes off the Net.
  if (@_) {
    @Icodes = @_;
  }
  %Icodes = %Icodes_by_length = %Icodes_huff = ();
  grep(++$Icodes{$_}, @Icodes);
  foreach my $code (@Icodes) {
    my $l = length $code;
    $Icodes_by_length{$l} ||= [];
    push(@{$Icodes_by_length{$l}}, $code);
  }
  foreach my $l (keys %Icodes_by_length) {
    @{$Icodes_by_length{$l}} = sort @{$Icodes_by_length{$l}};
  }
  # Build a nested hash keyed digit by digit, e.g. code 353
  # becomes $Icodes_huff{3}{5}{3}.
  foreach my $code (@Icodes) {
    my @digits = split(//, $code);
    my $str = join('}{', @digits);
    eval "++\$Icodes_huff{$str}";
  }
}

### Accessors

sub is_country_code {
  my $code = shift;
  return unless $code;
  $Icodes{$code};
}

sub country_codes_of_length {
  my $l = shift;
  return unless $Icodes_by_length{$l};
  @{$Icodes_by_length{$l}};
}

sub random_country_code_of_length {
  my $l = shift;
  return unless $Icodes_by_length{$l};
  $Icodes_by_length{$l}[rand(scalar @{$Icodes_by_length{$l}})];
}

sub pull_country_code {
  # Given a string of digits, pull a matching country code
  # from the beginning and return the remaining digits and
  # the resulting code.
  my $number = shift;
  return unless $number;
  croak "Non numeric data\n" unless $number =~ /^\d+$/;
  my @digits = reverse split(//,$number);
  my @pulled;
  my $ptr = \%Icodes_huff;
  while (@digits) {
    my $digit = pop @digits;
    unless ($ptr->{$digit}) {
      # Not part of any code: put the digit back so it stays
      # with the number.
      push(@digits, $digit);
      last;
    }
    push(@pulled, $digit);
    $ptr = $ptr->{$digit};
    last unless ref $ptr;
  }
  my $cc   = join('', @pulled);
  my $left = join('', reverse @digits);
  return $number unless $left =~ /\d/;
  return($left, $cc);
}

### Get new country codes from the Net

sub initialize_from_net {
  # earlier versions of TE will not work for this site
  require LWP::Simple;
  eval "use HTML::TableExtract 1.08";
  die "Oops: $@\n" if $@;
  my $html = LWP::Simple::get($Fresh_Codes_url);
  my $te = HTML::TableExtract->new
    (
     headers => ['Country\s+Code', 'Country\s+Name'],
     br_translate => 1,
    );
  $te->parse($html);
  my(@ccodes, %seen);
  foreach my $row ($te->rows) {
    my($cruft, $country) = @$row;
    my($code) = $cruft =~ /^\s*(\d+)/;
    next unless defined $code;
    next if length $code > 3;
    next if $seen{$code};
    push(@ccodes, $code);
    ++$seen{$code};
  }
  initialize(sort @ccodes);
}

1;
Listing 3. PhoneNumber.pm
package PhoneNumber;

# Simple class to store various bits of a phone number and roll
# them out as a string when needed.

use strict;
use Carp;

my @Valid_Parms = qw( num idd cc ext );
my $Ppat = join('|', @Valid_Parms);

sub new {
  my $class = shift;
  my %parms = @_;
  foreach (keys %parms) {
    # Group the alternation so the anchors apply to every name
    croak "Invalid parameter '$_' passed.\n" unless /^(?:$Ppat)$/o;
  }
  my $self = \%parms;
  $self->{ext} ||= [];
  $self->{_native_cc} = 1;
  bless $self, $class;
}

sub number {
  my $self = shift;
  if (@_) {
    $self->{num} = shift;
    delete $self->{_chunked};
  }
  $self->{num};
}

sub idd {
  my $self = shift;
  @_ ? $self->{idd} = shift : $self->{idd};
}

sub _native_cc {
  # hack for scrambling original dataset; uses the same key
  # that new() initializes
  my $self = shift;
  @_ ? $self->{_native_cc} = shift : $self->{_native_cc};
}

sub country_code {
  my $self = shift;
  @_ ? $self->{cc} = shift : $self->{cc};
}

sub extensions {
  my $self = shift;
  if (@_) {
    @{$self->{ext}} = @_;
  }
  @{$self->{ext}};
}

sub chunked_number {
  # Regurgitate a phone number with a 4 digit grouping last,
  # preceded by 3 digit groups prior to that.
  my $self = shift;
  my $num = $self->number;
  return unless defined $num;
  if (!$self->{_chunked}) {
    # Optimize for chunking
    $num = reverse $num;
    my @tphn;
    if ($num =~ s/^(\d{1,4})//) {
      push(@tphn, $1);
    }
    push(@tphn, $num =~ /(\d{1,3})/g);
    # Undo reversals
    grep($_ = reverse, @tphn);
    @tphn = reverse @tphn;
    # Cache
    $self->{_chunked} = \@tphn;
  }
  @{$self->{_chunked}};
}

sub as_string {
  # Attempt some nice formatting
  my $self = shift;
  my $str;
  my $icode = $self->country_code;
  $str = "+$icode " if $icode;
  $str .= join(' ', $self->chunked_number);
  my @exts = $self->extensions;
  $str .= (' x ' . join('/', @exts)) if @exts;
  $str;
}

1;
Listing 4. phone.pl
#!/usr/bin/perl

use strict;
use FindBin;
use lib $FindBin::Bin;

use PhoneParse;

my $Show_Line_Count = 1;
my $col1w = 35;

while (<DATA>) {
  chomp;
  s/^\s*\S+\s+//; # clip pct
  next if /^\s*$/;
  my $entry = $_;
  our $attempt;
  ++$attempt;
  printf("%4d. ", $attempt) if $Show_Line_Count;
  printf("%${col1w}s : ", $entry);
  # Main voodoo
  my @numbers = parse_phone($entry);
  foreach (0 .. $#numbers) {
    if ($_ > 0) {
      printf("%6s", ' ') if $Show_Line_Count;
      printf("%${col1w}s   ", ' ');
    }
    print $numbers[$_]->as_string, "\n";
  }
}

__DATA__
18.5765% +1 234 567 8923
14.3681% 234-567-8923
9.0357% +21 30 450 6789
7.6921% +21 10 3456 7189
5.6347% +20 1 3456 7809
4.1844% +00 121 340 5670
3.7609% +23 1456 71 8000
2.7520% (203) 456-7000
2.1150% +203 4567892
2.0478% +01 2345 6789
1.5463% +21 134 56 7892
1.4852% +231 45 6789
1.2932% +23 4 560 0070
1.2176% +23 11 4516078
0.9285% +23 4 567 8000 ext 9000
0.8841% +23 1 45 67 1819
0.8493% 0231
0.6802% +010 231 4516
0.6658% +20 1 3456 07890
0.6550% +21 3 45 16 71 89
0.6202% +234 5678 9234
0.6094% 0123456107
0.5938% 234-567-8901 x 2304
0.5302% +21 0134 567189
0.5122% +02 3 4516789
0.4775% +21 134 567 892
0.4547% +20 034 516780
0.4379% +234 1 5601700 ext 0189
0.4151% +1 234 567 8902 ext 3001
0.4091% +23 41 5611 000
0.3035% +20 1 03456718
0.2903% +02 34156789
0.2759% +21 30 405 111
0.2735% +12314567892
0.2699% +020 3 456 7089
0.2627% +213 45 67 8901
0.2567% +234 1-5601700 80192
0.2507% +21 0345 678 902
0.2207% +234 506 78 9020
0.2027% +20 31 456171
0.2015% +21 10 345 61789
0.2015% +210103456011
0.1943% +21 10 34111567
0.1931% +23 04 50016 708
0.1871% 203 456-0780
0.1871% +1 234-567-8192
0.1871% +23 41 560 78 90
0.1727% 1-234-567-1189
0.1703% +231 41 567 8923
0.1548% +23 41 5601711 ext 1089
0.1428% +21 345 67 100
0.1404% +23 040 5678 9010
0.1344% +23 4 56789 2340
0.1320% +23 145678900
0.1308% +21 341 5 60781
0.1296% +21 345 67 89011
0.1176% 203.456.7018
0.1140% +21 30 40 5678
0.1128% +23 45 6789 2034 ext. 1156
0.1128% +234 567 08 09 ext 2345
0.1092% +213 1 450067 ext 1008
0.1080% +21 030 4056781
0.1008% +21 134 56 78 90
0.0996% +23 4 516 78 92
0.0972% +1 234 561-7892
0.0924% + 21 10 341 5067
0.0912% +234 51 6701111
0.0900% +21 0300 405 6780
0.0876% +234 51 6708111 ext 9111
0.0852% +234 51670018 ext 9210
0.0852% +230 4 56078 923
0.0840% 020 304 5678
0.0816% +23 40 5678-9200
0.0804% +21 0345678902
0.0780% +23 1 40115 ext 600
0.0780% +23 1 45678 ext. 9023
0.0768% +234 5 67 8920
0.0756% +230 4 50116789
0.0756% +1 213 456 1789 X2300
0.0732% +234 5 67 111
0.0696% +203 450 6111 ext 1007
0.0684% +21 010 3456 789
0.0660% +2 01 3041506
0.0648% +231 4567892 ext. 340
0.0648% (213)456-1178
0.0648% +213 1 450067
0.0636% +01 23 45 67 89
0.0624% +21 3415 60117
0.0600% +2011 340056
0.0600% +0123451678
0.0600% +23 1 4567800 ext. 901
0.0588% +21 345 607809 ext. 203
0.0588% + 1 234 501 6078
0.0588% + 21 010 341 5016
0.0576% +1-231-456-7189
0.0564% +23 10 45067801 ext.9023
0.0564% +23 11 4150 6000 ext 7819
0.0564% +23 04 51101 ext. 6789
0.0552% +234 51670018
0.0540% +231 41 5116 789
0.0540% +1 234 567 8902 ext. 3456
0.0540% +2034 501 111
0.0516% +23 141 506 7000 ext. 8092
0.0504% +234 5111611 ext 7189
0.0504% +234 1 506 7 819
0.0504% +234 1 5617180
0.0492% 01213
0.0492% + 23 04 5110 6007
0.0480% +234 567892 ext 3456
0.0480% +23 01 45 67 18 92
0.0468% 0121 304 5607
0.0468% (203) 456 0781
0.0444% +231 4 5678 9123
0.0444% 23-456-0700
0.0444% +23 45106 718
0.0444% +23 4567 89230 ext 111
0.0432% 100
0.0420% +23 1 4567 89 02
0.0420% 23-145-678-9200
0.0420% +23 1 4567800 ext. 9234
0.0420% +23 045110 6718
0.0408% +230 456078
0.0396% 23-4-567-8911
0.0372% +1 234 5678923
0.0360% +020-3045167
0.0360% +23 456 780 00 ext 10
0.0360% +23 0 141 567 8923
0.0348% +23 4567 89 000
0.0348% +23 04 506789 20
0.0348% 2304 5000
0.0336% +234 1 5617180 ext. 9010
0.0336% +23 0456 7892 31
0.0336% +21-10-3456789
0.0336% +02 341 5678
0.0336% +0121 304 5067
0.0312% +23 01456 718000
0.0312% +23 45607892 ext. 3401
0.0300% +231 45678 9123
0.0300% +20 341 56789
0.0300% 201-345-6700 ext. 892
0.0288% +23 11451 6100
0.0288% +213 45 678923
0.0288% +1 23 4567 8092
0.0288% + 23 1 45 06 07 81
0.0288% +234-56078902
0.0276% +21 341 560 70089
0.0264% +23 1 4567800 ext. 11923
0.0264% +23 40 56789-110
0.0264% +2345 6007892 ext. 100
0.0252% 01203410567
0.0252% +1 (203) 456-7892
0.0252% + 203 456 0789
0.0252% +231 415116718
0.0252% 020 3456 0170
0.0240% +21-10-341-5678
0.0240% +21 10 34 56 708
0.0240% +1 231 4 56 1789
0.0240% +23 40 5678923 ext. 1401
0.0228% +203-4516-7819
0.0228% +23 04 51101 ext. 601
0.0228% +234 5 678 92 31
0.0228% +2301
0.0216% 123-1456
0.0216% +23 10 45067801 ext. 9023
0.0216% +203-456 0708
0.0216% + 23 1 4567 8923
0.0204% +23 04 5110 ext. 6789
0.0204% +231 4567892 ext. 3456
0.0204% 01234 516789
0.0204% +2 031 456789
0.0192% +23 (0)40 5617 8902
0.0192% +20 1 3456 710
0.0192% 203- 456-7892
0.0192% +20 1 345617
0.0192% +234 567 80923
0.0192% +2 03 451 6708
0.0192% +23 0145 67 89 20
0.0192% +23 4 05 6789 2345
0.0180% + 21 10 3045678
0.0180% +23 411 567892 13
0.0180% +23 4 5678920 ext 3401
0.0180% +23 451 67890 11
0.0168% +21 3 45 167 892
0.0168% +23 10 45067801-9023
0.0168% +23-04-5110-6708
0.0168% +23 415678920 ext. 3415
0.0168% +1 234 567 8923 x4500/IR 61
0.0168% 203-456-7000 x 101
0.0168% 213-456-0178 x911
0.0168% +21 10 304
0.0156% +23 01456 71 8921
0.0156% +230 4567
0.0156% +23 40 5678 -9234
0.0156% +02 3405670
0.0156% 231/456-7890
0.0156% 234-567-8921 x3415
0.0156% +203-4516 7892
0.0156% +23 141 560 78 92
0.0156% +2110341 5678
0.0156% +21-10-341 5601
0.0156% +23 141 506 7000 ext 8092
0.0156% 010 2345167
0.0144% 23456789
0.0144% +234 56789
0.0144% +234 1 5067 801
0.0144% +23 4567 89-2345
0.0144% 203-456-7189 x 10
0.0144% +23 45 671-8923
0.0144% +2 01 03041560
0.0132% +231-456-7892
0.0132% +23 04 5110.6789
0.0132% +23 04 51101 6789
0.0132% +234 506 781
0.0132% +23 45067 80921
0.0132% +011 23 40 5678 1092
0.0132% +21 31 4506 00710
0.0132% +23 4567 809 0
0.0132% +1 000 000 000
0.0132% +234 567 8923 405
0.0132% +20 31 45678
0.0132% 201-345-6700 x18
0.0120% +23 456 708900 ext. 2301
0.0120% +23 14 1 567 8923
0.0120% +20-34-567891
0.0120% +23 (0)405 678 9100
0.0120% + 21 0 341 5607
0.0120% +2311 4516700
0.0120% +23 1 4567 ext. 8923
0.0120% 234-567-
0.0120% 2134560178EXT923
0.0108% 234-567 8092
0.0108% 1 213 451 6780
0.0108% +1 231456 7811
0.0108% + 231 4567 1892
0.0108% + 23 040 5678 1923
0.0108% +20 3 4516-7892
0.0108% +21 0 10 341 5011
0.0108% +23 4567 8000 ext 9020
0.0108% 231.456.0789 x10
0.0108% +001 231 456 7892
0.0108% +23 4560 100
0.0108% 234-567-8923 Ext. 10
0.0108% + 23 1456 718923
0.0108% 2034567891ext11
0.0108% +23 1456 789230 ext. 4506
0.0096% +23 11 40567891 ext 2340
0.0096% +23 45 678 9230 10
0.0096% +01-231-456-7189
0.0096% +0021345678923
0.0096% +2304 5110 6789
0.0096% + 01 23 40 56 78
0.0096% +21 3 110 451 67
0.0096% +231 415 116 789
0.0096% 2034567
0.0096% +203 4 561171 ext 181
0.0096% +23.40.5678-9230
0.0096% + 23 1456 789 234
0.0096% +234 5 600 7890 ext 120
0.0096% +203 - 4516 7892
0.0096% +21 10 3456 0
0.0096% + 21 3 45 16 7892
0.0096% +20 1 34 56 000
0.0096% +231 450101 ext. 600
0.0096% +23 4 567 8000 ext 000
0.0084% +23-4567 8923
0.0084% +23 04 51101 ext. 06789
0.0084% +21-0345-678902
0.0084% + 21 34 567 89 23
0.0084% +21 10 345 6700 ext. 8900
0.0084% +23 456 70 0 8900
0.0084% +231456 789201
0.0084% + 20 1 3456789
0.0084% 01234 56 7809
0.0084% +231 456 78 92
0.0084% +23 4 50016 718
0.0084% +231 41 511 67 18
0.0084% 234-567-8902 ext 103
0.0084% +1 234 501
0.0084% +23 1 4567
0.0084% +23 045 6780 9123 ext. 4005
0.0072% 213/4567018
0.0072% +23 4567 89 21
0.0072% +21 134-567892
0.0072% +23 45 670 0000 ext. 00080
0.0072% +234 1-5601700 -89234
0.0072% +203 4516-7892
0.0072% +21 0300 4056780
0.0072% +20 34 56 7 892
0.0072% +023405607
0.0072% +23 40 560 710 89
0.0072% + 23 1 4560 708
0.0072% +2(3456)708902
0.0072% +23 04 5110 - 6708
0.0072% +23 1 4567800 ext. 9 234
0.0072% +234 51 11 61 781
0.0072% 234-50-67-89-213
0.0072% 023-4567892
0.0072% +21 345 607809 ext 234
0.0072% +21 30 405 0
0.0072% +23 145 67 89 0
0.0072% +23 40 5678 9200 3456
0.0072% + 21 3 45167892
0.0072% +23 1 45 678923
0.0072% +23 456 780 00 ext 0
0.0072% +213 41 56 07 89
0.0072% +21 0 345 678 912
0.0072% +1 234 5671
0.0072% +23 4 567 0891 ext.1231
0.0072% +23 0145 678911 ext 234
0.0072% +21 345 607809 0000
0.0072% 234-567-8921 x 3
0.0072% 01 23 45 61 17
0.0072% +21 134 567 0
0.0060% +23 01456 789 110
0.0060% +23 40 560710-11
0.0060% +1 234 567 8923 x4567/8923
0.0060% +23 1 0400 516 789
0.0060% +23 4567-8912
0.0060% +23 141 567 ext 1811
0.0060% 21-30-411-1506
0.0060% 2134516789x213
0.0060% +00 23 141 506 7892
0.0060% +02 341 561 78
0.0060% +21 31 4506 78 91
0.0060% +20034 501106
0.0060% 230 0451
0.0060% 00-234-5-678-9234
0.0060% +23 04567892314
0.0060% +234 567892 ext 34
0.0060% +23 1 4567800 ext. 0-920
0.0060% +1 2314567189
0.0060% 01234 567 892
0.0060% +23-04-51106780
0.0060% +21-345-607891
0.0060% +2130 405 67 89
0.0060% 0121 3456708
0.0060% +234 506708 ext. 9203
0.0060% +234 1 506 789
0.0060% 1234567081ex109
0.0060% +23 141 560 -1789
0.0060% +203 - 451 6789
0.0060% +21 345-6 78923
0.0060% +2 013456710
0.0060% 231456-7892
0.0060% +00 21 30 411 5167
0.0060% + 23 4567 8921
0.0060% +234 1 506 07181
0.0060% +21 30 4100
0.0060% +2 034 5678920 ext. 3456
0.0060% +1 234
0.0060% +23 141 567
0.0048% (230)456-7800 #9023
0.0048% +23 141 567 0 8923
0.0048% +1 231.456.7892
0.0048% +231 41 567
0.0048% X-2345
0.0048% +02345 161 789
0.0048% +23 1456 78 92304
0.0048% +23 1 451161 ext 789
0.0048% +23 04 5110 6789-2341
0.0048% +23 045 6780 9123 ext.4005
0.0048% +234 1561 70892
0.0048% +231 4151167891
0.0048% +0234 56789231
0.0048% +234 1 5601700 ext 89020
0.0048% +23 4 567-8923
0.0048% +21 030-4056708
0.0048% + 23 4 567 00 18
0.0048% -1231
0.0048% + 213 451-6708
0.0048% 23-1456-789213
0.0048% 0234 561789
0.0048% +23 10 450678019102
0.0048% + 234 1 5067801
0.0048% +00231415067819
0.0048% + 21 134 567108
0.0048% +23141 567 8901
0.0048% +21 (0)10 304 5678
0.0048% +21 30-4056178
0.0048% +23 4567 8192/3145
0.0048% +23 41 56789213 ext.104
0.0048% +1 203 456 7819/1023
0.0048% +21.10.3004156
0.0048% +2 0341 56780
0.0048% +231 451 678923
0.0048% +21 345 67 0
0.0048% +234 1 567 0892 0345
0.0048% 012034567
0.0048% 002301045061
0.0048% +21-3-45167089
0.0048% + 23 04 516789
0.0048% 23 01 4567 8923
0.0048% +011 23 141 567 8902
0.0048% +23 01 45 67 8912
0.0048% +21 0341-560789
0.0048% +234 567089 ext. 23
0.0048% + 2 03 4516 7892
0.0048% 0
0.0048% + 201 3456789
0.0048% 213-456-7089 Ext 10
0.0036% + 1 231-456-7892
0.0036% +21 10 345678902
0.0036% +231 41 567 89 02-00
0.0036% +23 04 51101 ext. .6781
0.0036% +1 234 567 8923x4501
0.0036% +23 4 567 0891 ext 1213,4516
0.0036% 231--456-1789
0.0036% +23 (0) 40 5678 9123
0.0036% +23 141-567-8902
0.0036% +213 41 560078 ext 1902
0.0036% 231-456-7891.
0.0036% +23 1 45610 ext 7892 0.0036% 213-451-11111 0.0036% +23 04 51101 ext 6107 0.0036% +23 4561 78100 19 0.0036% +23045607 0.0036% +23 41 516 7000 ext 8092 0.0036% 231-456-789002 0.0036% +234 1 5601700 89123 0.0036% +23-141-506 7892 0.0036% + 23 14567800 0.0036% +23 456 789200 34560 0.0036% +23 1 456 10 7892 0.0036% + 23 0141 567 8900 0.0036% +23 4567 xxxx 0.0036% 203-45-6789 0.0036% 23 141 567 8923 0.0036% +234 1 567 8902 ext 3451 0.0036% +23 1 4506000 ext.7892 0.0036% + 23 04 56171890 0.0036% +234 56 78923 ext. 111 0.0036% +213456 78912 0.0036% +21 10 345.... 0.0036% 203-456-7189 ext.12 0.0036% +1 231 456 -7892 0.0036% +21 10 300.4156 0.0036% +23 456 78912 3415 0.0036% 23 1 45 67 89 23 0.0036% +23 40 567008-0 0.0036% +23045110 6789 0.0036% +23 4 567 0891ext 1121 0.0036% +00 1234 56 0.0036% +2-03-4516-7819 0.0036% + 21 30 41 15 678 0.0036% +23 1 01 40 51 67 00 0.0036% +02-30456171 0.0036% +21 3 456 70 819 0.0036% 2.34E+15 0.0036% +1 (234) 567 8923 0.0036% +023 04 5110 6781 0.0036% 234-506-789 0.0036% +23 40 5678-0 0.0036% (203)-456-7892 0.0036% +21 10 34 56781 0.0036% +203 456-1718 0.0036% +234 567 8923 ext 456 0.0036% +234 50 67819 0.0036% +234 51 678923 ext 456 0.0036% +234567 8000 0.0036% +23 456 7892000 ext.3405 0.0036% +23 4567 8192-3456 0.0036% +23 45 678- 92030 0.0036% +234 51 671 891 0.0036% +230 4 5116 07108 0.0036% +23 4 516 7801 ext. 9213 0.0036% +23 141 506 7000 ext.8019 0.0036% +23 0 040 5678 9231 0.0036% +23 1 45.67.80.92 0.0036% + 23 1456 71 8902 0.0036% + 21 345 67 8923 0.0036% +23-40-5678 9234 0.0036% +23 1 45 67 0.0036% +23 45 6789123 ext. 100 0.0036% +2130411 0.0036% +23 040 56789234 0.0036% +2 0310 456078 0.0036% 203-456-XXXX 0.0036% +23 1 4560 7892^03^ 0.0036% +2 01 0.0024% +23 405678-0923 0.0024% 0021 10 304 5678 0.0024% +23 41 567 0800 ext. 901 0.0024% +231 41 511 16010 0.0024% +23 014560 100 0.0024% +21 3 4500 6001 ext. 
7892 0.0024% + 2345 0.0024% +23 141 567 8902 - 3045 0.0024% +21 030 405 6780/9234 0.0024% +2 034 567 89 23 0.0024% +1 234-501 6789 0.0024% +23 141 560 .... 0.0024% 23 1 4567 8101 0.0024% +23 4 5678921 0345 0.0024% +234 5 67892 ext 3456 0.0024% + 1 213 456-1007 0.0024% +23 1456 781921 1341 0.0024% + 23 405678 9234 0.0024% +23 40 516 789-23 0.0024% +23 4 5678 .... 0.0024% +20-3-456-7189 0.0024% 21 3456789234 0.0024% +234 11 5617890 ext 23456 0.0024% +203 456 1700 8923 0.0024% +21.10.345 6780 0.0024% +23.1456.789023 0.0024% 231-456 0.0024% EXT 1213 0.0024% 0023 141 567 8923 0.0024% +23 40 567 89 120 0.0024% +2034 50 1167 0.0024% + 2130456 7811 0.0024% +23 045110 ext. 6789 0.0024% +23-1-450617 0.0024% 234-5-670-0008 0.0024% +23 1 4567800 ext. -9234 0.0024% +231-415116708 0.0024% +23 4 5671 89.23 0.0024% +21311045 607 0.0024% +23-1456-781-923 0.0024% +2 011 34 56 780 9203 0.0024% +21 10 - 345 6071 0.0024% (213) 451- 6078 0.0024% +23 40 1567 892340 0.0024% 234 56 711892 0.0024% + 231 41 5116789 0.0024% +23 4 56 7800 0.0024% 0230 4501 106 0.0024% +23 4 516 780 0.0024% 203-4516-7809 0.0024% +23 40 5678.... 0.0024% +20 34 -567892 0.0024% +2 34516 78923 0.0024% +21.10.341.5678 0.0024% 23-1-456-789002 0.0024% +20 34 - 567 892 0.0024% 234-506-789-230 0.0024% +21 - 30 - 405 6789 0.0024% 234-567-8921 x 3456 or 7809 0.0024% +21 30 405 -6789 0.0024% +23 40-5678-9234 0.0024% 2345-6708 0.0024% +2-03-4516789 0.0024% +23 4 56 78 92345 0.0024% + 2341 506 7819 0.0024% +23 45607 892 10 0.0024% 2345678912x11 0.0024% +23 4 567-0891 ext 2134 0.0024% +23 40 56789-0 0.0024% +23 411 561-1789 0.0024% +234 1 5601700 ext. 89200 0.0024% 203) 456-7892 0.0024% +23 40 56 107 890 0.0024% +21 0304156 789 0.0024% + 21 103045678 0.0024% +23-405-6789234 0.0024% +234 511161-789 0.0024% +234 1 567 8902 ext 34 0.0024% +2034-567809 0.0024% +234 51 678191 ext. 2311 0.0024% 234 5 601789 0.0024% +0200-3045670 0.0024% +23-40-56708-921 0.0024% ++ 234 1 5167 809 0.0024% +231 45 6789123-4 ext. 
5016 0.0024% 231.456-7892 0.0024% +0021 30 405 6789 0.0024% +23.451.6700118 0.0024% +234 5678 923 456 0.0024% 203451 0.0024% +21 30 450 6789-2034 0.0024% +21 30 4115678-9234567 0.0024% 234 1 5067892 0.0024% +234-1-5067 891 0.0024% +23 4567.... 0.0024% +23-10-45067801 ext. 9200 0.0024% + 234 1 506 7891 0.0024% 231-456- 7892 0.0024% +23 10 45067801 9234 0.0024% + 21 3 45 167 892 0.0024% +23 (0) 1456 781923 0.0024% +23 04 5110.1 ext. 16789 0.0024% +23 04-51106789 0.0024% 234-5-678902 0.0024% + 213-456-7892 0.0024% +1 213 456 78923 0.0024% +21 - 30 - 411 56 17 0.0024% +234-1-506-7819 0.0024% +23 04 5110 ext. .6789 0.0024% +23 4 56 78 9 0.0024% 010230456017892 0.0024% + 23 45678923456 0.0024% +20 3-4516 7811 0.0024% +23 141 506 ext.7089 0.0024% 230-456--1789 0.0024% +2 0310 0.0024% +23 045 67809123 ext.4015 0.0024% +23 451 671-1 0.0024% 01234-567892 0.0024% 02-30456117 0.0024% 0234567-8920 0.0024% +23-405-1607 892 0.0024% +21 345 607809 ext. 02345 0.0024% +23 405 1607 0 0.0024% +21-304561789 0.0024% +23 4561 100-0 0.0024% +230405 678 9234 0.0024% +23 1 45678191 ext. 2131 0.0024% 203-456-7811 x -9234 0.0024% +2 01 3041 0.0024% +23 45 6178 92 0.0024% +23 145 67 08 923 0.0024% +23 141 560 7892-3450 0.0024% (203) 456- 0.0024% +21 030-411 5678 0.0024% +23 1456 789230... 0.0024% +21-30-405.6789 0.0024% +23 141 506 ext. 7189 0.0024% +23 4 516 0 0.0024% +21 3456 0.0024% +23 (0)40 5617 0.0024% 234-567-8921 ext 3456 0.0024% +23 4 567 0891 ext 0.0024% +203 456 0 1708 0.0024% +23 0.0024% 234-567-8902 ext103 0.0024% +1 0.0024% (203) 456-7189X 12 0.0024% 213-451-1161 ext 7 0.0024% +234 506708 ext 0.0024% + 23 011 4561708 0.0024% +21 10 0.0024% 2345678011Ext9002 0.0024% 2134567892ex13 0.0012% +23 40 561070 - 819 0.0012% +23 4567-89230 0.0012% +23 141 506 7000 x 8923 0.0012% +1 203-456-1789 X234 0.0012% +23 141 567 ext 0 0.0012% +23 4156 789 - 234 0.0012% +2311 451-6178 0.0012% +231 4567892 ext. 
- 0.0012% +23 456 789200 345 0.0012% +23 0 0405 678 9234 0.0012% +23 1 405 678923 0.0012% + 23 1 451678 0.0012% 213- 456- 7800 0.0012% +23 04 5110(1) ext. 6718 0.0012% +203 - 45167892 0.0012% +23 1456 789... 0.0012% +231 41 5116... 0.0012% 01 234 560780 0.0012% +20 1 34 56178 0.0012% 0234-567811 0.0012% +23 1 4567 ext. 892 0.0012% +21 30 456 1780/9231 0.0012% 2034567819/2034567809 0.0012% +23 451 ???????? 0.0012% 234-567- 8920 x13 0.0012% +23 04 5110 6789-+23 04 50678 9234 0.0012% 2-345 0.0012% +23 4 567 0892 ext345 0.0012% +234 51 678000-9 0.0012% 0200 34156178 0.0012% +1 -213-456-0780 0.0012% +21 0 30 456 0.0012% +231456-718923 0.0012% +1 203-456-7809. 0.0012% +0021 0304 567 8923 0.0012% 213 - 456 - 7819 0.0012% +23 456 0.0012% +23 4567 89-0 0.0012% +23 4 56 78 92 0.0012% 213 456-7108- 0.0012% +23 4 567 8921/+31 45 678 912 0.0012% (234) 567-8912X113 0.0012% +23405678-9230 0.0012% +2 011 34 506 780 9234 0.0012% +1 230 456 78 0.0012% 01.23.45.16.71. 0.0012% +234 5 600 7892 ext. 134 0.0012% (234) 567-8921 x 3456 0.0012% +23 141 567.... 0.0012% +1 231 4 561789 0.0012% + 23 40 567 08 910 0.0012% +23 40- 5006 7819 0.0012% + 23 1 +4567800 0.0012% +231 456 0.0012% +23 41- 567892 0.0012% _203 451 6789 0.0012% +231415067000 ext.8923 0.0012% +21 134 56.... 0.0012% +234 51 6701111 ext. 1891 0.0012% +23 40 5678 - 0.0012% +21-345-678 192 0.0012% +1 234 501- 0.0012% +0023-40-56789230 0.0012% +203 4 561171 ext 1 0.0012% +203- 4516 7892 0.0012% +1 231 456 0000 ext 111 0.0012% +21 (0)345 678 923 0.0012% +23 0145 678911 ext. 
234 0.0012% +011 23 0 405 670 8923 0.0012% +234 5 610789-2 0.0012% +234 567 8923456 0.0012% +23 01456 - 78- 9102 0.0012% +21 134 56 0.0012% +234-1-5067892 0.0012% 0023 40 5678 9231 0.0012% 231 456 0.0012% +203 4156701- 18 0.0012% +23 40 5678-902 0.0012% 23- 145-678-1923 0.0012% +23 (0) 141 567 8092 0.0012% +23 141 567 8923 - 4567 - 8923 0.0012% +21 345 678-923 0.0012% +2 031 0 0.0012% +2-034-5678920 0.0012% 0023145678923 0.0012% +23 45 607-89230 0.0012% 2345678092*103 0.0012% 23 0 400 516 789 0.0012% +23 40 567892 0 0.0012% +20 1 010-3041567 0.0012% +234 5 678 - 9234, 5678 or 9234 ext 56 0.0012% 23-450-6789-213 0.0012% +234 5 610789/2 0.0012% 011-231-45 678-1923 0.0012% +20 1 304151617 0.0012% +23 1 456 01 70 11 0.0012% +00 23 1 45678923 0.0012% +1-231-456 7891 0.0012% +231 415 678 92 0.0012% +23 0 1405 678923 0.0012% +23 4567 89 2345-6789 0.0012% +23- 40 5678 9234 0.0012% + 23 01456 71 8923 0.0012% +1 23 40 5678 9213 0.0012% +01 234 501 xxxx 0.0012% +21-345-67892 0.0012% +1 213 451 1678^09^ 0.0012% +2301456 71 8923 0.0012% + 234 5670181 ext. 0.0012% +21 345 - 6 78902 0.0012% (234)567-8912 ext.113 0.0012% + 23 01456 789 234 0.0012% +234 05678923405 0.0012% +21 30456 0.0012% +234 1-5601700 - 0.0012% +21 101-3456-789 0.0012% +1 203 456 7892 Gondor City 0.0012% +21 11 3456178 ext. 9 2 3 0.0012% +20 1 3456 780-1 0.0012% +234 51 670 1111ext. 1892 0.0012% +23 4 56 78 9234 5670 0.0012% 0234 50678 9203 0.0012% +23 01 450678, 092314561 0.0012% +00231 451 678 092 0.0012% +20-34 567892 0.0012% 21 030405 6710 0.0012% +23 40 560 710-0 0.0012% +231 45-67-1892 0.0012% +231 4 567892 ext. 34567 0.0012% +23 4 567.00.80 0.0012% + 234 15 670891 0.0012% + 23 4 511 161 708 0.0012% 0023-45678923 0.0012% +23141 5067892 0.0012% 2134567118;9123415678 0.0012% 23 40 5678 0 0.0012% 010 230456017892 0.0012% +23 40 56789-0 Ext. 
2 311 0.0012% + 23 04 51 678 9234 0.0012% +21-30.4567081 0.0012% 234-567-0892 (Pager) 0.0012% +23 45678923 4567 0.0012% +23.04.5110.6789 0.0012% +21 - 3 - 45 16 71 80 0.0012% + 23456078 0.0012% +01 234-5678 0.0012% ++234 1 5167 809 0.0012% +21 - 3- 11045678 0.0012% Sorgum 010 200 3(101) 0.0012% +23 1456 789230 ext 4105 0.0012% 21 10 341 5670 0.0012% + 234 506 78 9234 0.0012% 230/456/7819 0.0012% +201 3 40506 ext. 0708 0.0012% Folley: 231-4516 0.0012% 00231 4 5670892 0.0012% +23(0)1415678923 0.0012% +23 1 45 60 - 78 90 0.0012% +234 56-7892 0.0012% 1123,1145 0.0012% 12134516789Ext213 0.0012% 200-345-6780x9023 0.0012% +1 (203)456 7809 0.0012% 2345 ou 6789 0.0012% +23 45601 789 0 0.0012% 0021 0 30411 5678 0.0012% +21 11 345 6178-9-2 0.0012% +2 0345 0 67892 0.0012% +1 234-567 1891 x2341 0.0012% + 21304115678 0.0012% +23 40- 567 08 923 0.0012% +23 1 45 67 800 ext. 923 0.0012% +234 5607.... 0.0012% +23 4056789 0 0.0012% 234.567.8902x103 0.0012% +21 - 30 - 456-7189 0.0012% +23-405678-9234 0.0012% + 203 4516-7810 0.0012% +23-4567-8923 0.0012% +23 0 141 5678923 0.0012% +231 4516 718 0.0012% +21(0)30 456 1789 0.0012% +21(0)3045 67890 0.0012% +201 34 51 670 0.0012% +23 40 56789 0 0.0012% 234 506 789 213 0.0012% 010-23045601 0.0012% +002310 45067801-9213 0.0012% +23-405-678 ext. 9012 0.0012% 234/567-8912ext34 0.0012% +23 041 56789234-0 0.0012% +23 4561 7 892 0.0012% +23 0456718-0 0.0012% +23-140115 0.0012% 00 234 511161708 0.0012% +23 4156 789 02 0.0012% +23 4 05 67892300 0.0012% +21345-678192 0.0012% +234 567 892 340 5 0.0012% +23 0405678 9234 0.0012% +230 45 678-9231 0.0012% +234 1 5678190-23 Ext. 4005 0.0012% +23 41 567891/2 0.0012% + 234567800 0.0012% +0023 0141-567-8912 0.0012% +20.3. 
4516780 0.0012% 2345670890 x 12 0.0012% +234 (0)1 5678902 0.0012% +23 4567 890-2345 0.0012% +1213-456-7189 0.0012% +23 40 56107 08920 0.0012% +23 1 456 10 ext 7892 0.0012% +230 4 560 78 923 0.0012% + 23 1 45 67 800 0.0012% +1 213-4567181 0.0012% +00213 041 56789 0.0012% +234 5678923 405 0.0012% 230 456 7800 ext.923 0.0012% +21 30 45 0.0012% 213-451-6789 / 231-456-7892 0.0012% +2311 451 - 6171 0.0012% 234 561 78923 0.0012% +23 40 5678923 ext.1415 0.0012% +23 04 50678.912 0.0012% 234 567 8923 ext. 405 0.0012% +234-51-6789234 0.0012% +2-034-567-89-21 0.0012% +21 345 67 8923-1 0.0012% +23 1 415 67809 2345 0.0012% +23 4 567 89 21 ext. 345 0.0012% + 23 0145678923 0.0012% 02030456789HAL 0.0012% 234-567-8111 x.111 0.0012% +23 1456 78 9 234 0.0012% +23 04 506789.1 0.0012% 234.516.7080 x923 0.0012% +2 03 4516-7892 0.0012% +234 51 670 1111 ext. 1890 0.0012% +23 04 506789.. 0.0012% + 23 1 451678923 0.0012% +20 1 3456... 0.0012% +21 345 678 910 2 0.0012% +234 1 56781902 3405 0.0012% +21 10 300 4156 7118 0.0012% 203.450678 0.0012% +23 4567xxxx 0.0012% +2 01 3456 178 0.0012% +2 314 567 1819 ext. 23 0.0012% +2 034 56 7189 0.0012% 01 234 510 670 0.0012% +234 1 56 78092 0.0012% +23 40560017-0 0.0012% +23 1 4560700 ext. 89 0.0012% + 2 03 451 6789 0.0012% +21-345 678 923 0.0012% +23-1-40115 0.0012% +23 451 67892-0 0.0012% 234.567.8921 x3456 0.0012% +1 23 45 670 890 0.0012% +23.01.45.67.89.21 0.0012% 1-203-456-01781 0.0012% +23 4 5607892-31 0.0012% +23 0451101-6781 0.0012% +23 141 567 8192 - 3405 - 678 9123 0.0012% +234 50 67 89 0.0012% + 23 415678920 ext. 3456 0.0012% +23.4567.89020 0.0012% +23 450 - 678 09 0.0012% +231045067801--9231 0.0012% +203 456 78923/40 0.0012% +21-3-4500-6017 0.0012% +23 1 4567 ext 8923 0.0012% 23 45 678 000 0.0012% 234-567-8902-103 0.0012% +23 4567.8923 0.0012% +23 045 6780 9123ext. 
4105 0.0012% +20 10 3456700 - 1 0.0012% 234-567-8923x405 0.0012% +21 345 6-78923 0.0012% +203 41506 789 0.0012% +23 0141-560-7892 0.0012% + 23 0 4056789234 0.0012% +23 11 4567 0892 ext 103 0.0012% 011-23-1456-789001 0.0012% (234) 51 67 89 12 0.0012% +1 213 456 7000 ext 8902 (Baz) 0.0012% 00230145678923 0.0012% 230-456-7891, x-23. 0.0012% +23.405 678 9234 0.0012% +1 234 567 0891 ext-213 0.0012% +230-415-678921 0.0012% +23 4156 789 2-1 0.0012% +23 04506 7892-13 0.0012% (0)1234 567892 0.0012% +23 04 5110 . 6718 0.0012% +2034516 ext 7892 0.0012% +23 4 567 18920 0.0012% 234 05 00 0.0012% +234 0506 78 9213 0.0012% +20 3-4516071 0.0012% +23 40 56708 9-123 0.0012% +23 4567 89 -2345 0.0012% +203 4156701 - 08 0.0012% 23 0141 560 1789 0.0012% +231 4567892 ext - 0.0012% +1 234 567 8912 1-304-567-8092 0.0012% +1 23 45 670 8901 0.0012% +23 (0)-45-67-08-92 0.0012% +201 3 40506 ext 789 0.0012% 234- 0.0012% +2034516-7892 0.0012% +23 04 5110.1 ext.6789 0.0012% +23 (0)4561 178 1921 0.0012% +1 231 -456 7891 0.0012% +21-30-4156 789 0.0012% 012 345678 0.0012% +234 56 789200-13 0.0012% +234 1 5678190-2 Ext. 3405 0.0012% 0200 345 678 0.0012% +2 03-451 6789 0.0012% +21 30 405 67892 ext 3456 0.0012% +0023 41567 89231 0.0012% +23 40 5 6 07 10 80 0.0012% 23 01456 789 200 0.0012% +1 203 456 7819-2345 0.0012% + 23 0.0012% +2340-5678-9234 0.0012% + 011 23 1456 7892 31 0.0012% (23) 456-789-2345 0.0012% +21 3 4567 89 0 0.0012% +2340-5678 9234 0.0012% 2345- 0.0012% 0203045-6789, -2345 0.0012% + 23 4 56 78 0.0012% +23 04 506789.1. 0.0012% +23 04 506789.23 0.0012% +230405678 9234 0.0012% +203 456 1700 -892 0.0012% +23 04 5110.1 0.0012% +1 203 456 7 0.0012% +2 314 50 67 089 0.0012% (234) 516-000 0.0012% + 234 567 0891 ext. 2345 0.0012% +23-451-60078-19 0.0012% +23 01456 78 92 34 0.0012% 2034567890/EXT112 0.0012% +23 045-67809234 0.0012% +2-03-45167891 0.0012% +23 41 5678... 0.0012% +23 1 4516.... 0.0012% +234 56 789. 
0.0012% +23 45678921 ext 103 0.0012% +23 456 78 92 03 / 41 5670 819 0.0012% 0021 (0) 30 456 1781 0.0012% +21.30.405.67.80 0.0012% +23 0 4056789023 0.0012% +21 34 5678 ext 9234 0.0012% +1 213 456- 7892 0.0012% 00234 5 678912 0.0012% +21 (0)30 405 6780/9234 0.0012% +23 4567 8091\\23 0.0012% +2301456 718920 0.0012% 234-567-0890ext12 0.0012% +23 01045067801 ext.9021 0.0012% 203-456-7811x 9230 0.0012% +23-40-516789-12 0.0012% 210-345-6170q 0.0012% +23 45.67.80.00 0.0012% +23 1 451161 ext708 0.0012% +0 1 231-456-7892 0.0012% 231 4567-8923 0.0012% + 234 567 0891 ext 1213 0.0012% + 23 01405 67890 0.0012% +234567 89 2345 0.0012% 2130456 7892 0.0012% +20 34 - 567892 0.0012% +1 213 456 ???? 0.0012% 213-405-1678 pgr 0.0012% 12134567108ext92 0.0012% +23 04 5110.1 ext. 6789 0.0012% +1 EMAIL ONLY 0.0012% +23 4516-780 0.0012% 213-450-6708E x 19 0.0012% +23.04.5110. 6789 0.0012% +23 04567 189 2345 0.0012% [213]4567811 0.0012% +2-034 567 8920 0.0012% +2310-45067801-9231 0.0012% +1 23 4 567 8092 0.0012% 213.-451-6789 0.0012% +23-10-4506-7892 ext. 3045 0.0012% +23 41 5678901-2 0.0012% + 21-30-4567800 0.0012% (234) 567-8192 EXT 3 0.0012% +234 567 8900 ext. 234 0.0012% +23 145.67.89.02 0.0012% +23 4567 8920\\31\\45 0.0012% +23 4 5670891 ext. 0.0012% 231- 456- 0.0012% 213-451-6000x7 0.0012% 21-345-67-8923 0.0012% 23 04 51 678 9234 0.0012% +23 4 5678921 ext 345 0.0012% + 2340 5678 9234 0.0012% +20 1 3045100 ext 67 0.0012% +2 03-4567 8923 0.0012% "+1 213 456 7892" 0.0012% + 00 23 1456 718 092 0.0012% +21 30 - 456 78 92 0.0012% +234 567891 ext. 21-3 0.0012% 23 0.0012% 203-456-7800x92304 0.0012% 23 1456 781923 0.0012% 231 41 5116781 0.0012% +1 213 456 XXXX 0.0012% 231 456-0789 ext. 10 0.0012% 01-23-45-60-78 0.0012% +23 - 405 - 678 9123 0.0012% 213-451-1161 ext. 
7 0.0012% +23 4567- 819230 0.0012% +23 141 560 7089 <bk> 0.0012% ++ 20 3456 0.0012% 2-34 567892 0.0012% 23-040-5678-9023 0.0012% +011-23-45-670-0892 0.0012% +234 5 67809023 ext.451 0.0012% +23 45 671 89-2 0.0012% +23 4561 789 23 4 0.0012% 231- 456-0789 12 0.0012% 0021 10 34 56 718 0.0012% (213) 451-6789 ext. 230 0.0012% +23 4156 789-231 0.0012% '+21 30 405 6789 0.0012% + 2 034 567181 0.0012% +20-3-4516 7892 0.0012% +234 5 - 6780 9023 0.0012% +23 4151 6178-10 0.0012% ++23 141 567 8019 0.0012% +21xx xxx xxxx 0.0012% +234 156 780-092 0.0012% +213456-78912 0.0012% +1 23456789023 0.0012% +23 04 500161 ext. 789 0.0012% +203- 456 7189 0.0012% +23141-567 8923 0.0012% ++23-0451106789 0.0012% 213-451-0001- 0.0012% +21 30 411 .... 0.0012% +23 405 160718 0 0.0012% +23 - 04 - 50678 910 0.0012% 1-200-340-5678 ext-912 0.0012% 0021-10-300 4567 0.0012% +23 45 67.81.92 0.0012% +0023 04 5110 6708 0.0012% +23-0-456789231 0.0012% +234 1 5601700-18 0.0012% +23.45.67.89.23 0.0012% +21-345-6 78110 0.0012% +23 45 67 089 0.0012% +20 314 561 7000 8192 0.0012% 2314 5601 7809 0.0012% +20-34- 567800 0.0012% +23 451 678 9234, 5678 0.0012% +23 4156 78912 0 0.0012% (203) 456-7891 EXT.12 0.0012% (213 ) 456-7819 0.0012% 2034567892Hobson/3456789203/PogI 0.0012% +23.1.45.67.89.23 0.0012% +23 4567 89 -234 0.0012% +234 5 0.0012% +23 4-567 8100 0.0012% +23 0141- 567-8923 0.0012% +23 1456 78.... 0.0012% +201 3 40506 0.0012% +23 456789231 4567 0.0012% +23 - 4105 -617 - 189 0.0012% + 21010 341 5016 0.0012% +0021134 56 10 78 0.0012% +1 231- 456 -7189 0.0012% 203-456-7809 - Bobco 0.0012% +20 34 56 7892 1 0.0012% +23 45 6789200 301 0.0012% +23 1456 789 23456 0.0012% Ext. 
2310 0.0012% +23 1 45678 0.0012% +21 3 4567 1 1 0.0012% +23.40.56789234 0.0012% +2-1304 0.0012% +23 40 567892 3456 0.0012% +2340-56789234 0.0012% +23-040-560789-21 0.0012% +23 41567189 21345000 0.0012% +011 23 141 567-8901 0.0012% +23 040 5678-9234 0.0012% +21 345 607809 ext 02345 0.0012% 234 51670018 0.0012% +2 010 3456 789 0.0012% +23 456789 231ext. 4567 0.0012% +21 30 411 5678-9 0.0012% +2 03-4516-7892 0.0012% 21 10 341 0.0012% + 20 345 678 92 00 0.0012% (pager) 203-401-5678 0.0012% +23 4567 89 0 0.0012% (0121) 3456781 0.0012% +23-45678-92310 0.0012% +203 450 6111 ext 78 0.0012% + 23 4 567 0891 ext 1234 0.0012% +23 4 56 11 ext. 7080 0.0012% 23 040 5678 9234 0.0012% 213/ 456-7892 0.0012% +00 21 30 411- 5671 0.0012% +23 40 5678..9234.. 0.0012% +23 4567 8923-40567 0.0012% +1 231- 456- 7892 0.0012% + 23 1 451161 Ext 789 0.0012% +203- 4516789 0.0012% +234 5 06 78 9234 0.0012% 231 456- 7819 0.0012% +23 141 560 7800ext 9230 0.0012% +1 23 45 1167 89 0.0012% +20 3 4516789 2 0.0012% +23 45 607 89 2 0.0012% 0231 0101 45 0.0012% (213456-7892 0.0012% 0231 45670 0.0012% +-23 -405-6789234 0.0012% +230 4 56 078 923 0.0012% + 23 01456 789002 0.0012% +23 - 4567 - 81 923 0.0012% 234-567-8921 x 0.0012% 1-200-345-6780x 9023 0.0012% 020 31 14500 0.0012% ext:2-3104 0.0012% +23.04.5110.1 0.0012% 011-23-41-567891 0.0012% +23 045 67809123 ext. 
4110 0.0012% +23-0141-567-8921 0.0012% +21 0 30 405 67 89 0.0012% + 23 (0) 405 678 1912 0.0012% + 21 10 341 0.0012% +23 41 5670 0 892 0.0012% + 203 - 451-6789 0.0012% 23 45 6700000 0.0012% +02-341 5678 0.0012% +23-4-567 8921 0.0012% 2 3456 708921 0.0012% +234567892 13 0.0012% +23-45-61-78-92 0.0012% +1 (203) xxx xxxx 0.0012% 23 01456 781923 0.0012% +1 234 567 8902-110 0.0012% +2110 34 56 718 0.0012% +23 11 451 - 6789 0.0012% +23 04 51101.6718 0.0012% 21-30-41-11-506 0.0012% +21 345 67 89231 4 0.0012% .213.451.6178 0.0012% +234 1-5601700 0.0012% 00 23 1 45 67 80 19 0.0012% +234 567 89 23456 0.0012% +21 10 134 561078 0.0012% +2310450678019231 0.0012% +20.3.451-6789 0.0012% +21 01-34-56-78-92 0.0012% +23 4156 0 0.0012% +23 456789 234 ext. 5617 0.0012% +23 1 4567ext. 8009 0.0012% +23 (0) 40 - 56 78 91 20 0.0012% +1 213 456 7089 (x203) 0.0012% + 23 0 141 567 8923 0.0012% +23 4 5670891 ext 0.0012% 02.34.56.78.19 0.0012% + 2-3456-78-9102 0.0012% 1-200-340-5678 ext.912 0.0012% +23 40 506 1708 921 0.0012% + 23 40 516789-23 0.0012% +234 51 11 61 ext. 780 0.0012% +23 40 56789- 234 0.0012% +2- 3 4516 7809 0.0012% +2310 45067801-9234 0.0012% +23 1 4567800 923 0.0012% + 23 1 45610 ext. 7891 0.0012% +23 45607892 ext. 03401 0.0012% + 0023145671809 0.0012% 00 234516700809 0.0012% +2345 6007892 ext 0.0012% + 21 30 4156 780 0.0012% +23 -141-567-8091 0.0012% +234 5 67892 3415 0.0012% + 23 1456 718923 ext 4150 0.0012% +23 1456 789211 .... 0.0012% +23 4 5678923 ext -456 0.0012% +23 1456-780-923 0.0012% 213-451-6789 xt. 213 0.0012% +23 40 561 07 - 892 0.0012% +234-51-671891 0.0012% +23 4 567 0891 ext. 
0.0012% +23 4 56078 91 23 0.0012% 203-456-7180 x-902 0.0012% +23 45 6789 2034 -1156 0.0012% +23 451 67892345/67890234/56789234 ext.1510 0.0012% +23 041 516101718 0.0012% + 02345 607 892 0.0012% +203 4 561171-89 0.0012% +21-(0)30-4111506 0.0012% +23 40 506 7 -8 923 0.0012% +1 234 567 8090 x 234 0.0012% +1 203 456 7892 341 567 8921 0.0012% 234 56 0.0012% +21134 56 7892 0.0012% +2130 4156 780 0.0012% 234-5167-Hexnet 0.0012% + 231 4 5067 892 0.0012% +23 410 56107 892 0.0012% +21 300 - 4056780 0.0012% +23 040 56 00 78 92 0.0012% 0213 410 0 0.0012% +23-40-5678-912 0.0012% + 21 0 10 341 5016 0.0012% +234 506 78 -9230 0.0012% 203-456-7892???????????????????? 0.0012% + 23 4567 89 234 0.0012% +23 405 ????????? 0.0012% 011 21 34 567 8923 0.0012% +0121 3456000 EX 0718 0.0012% +23 45678 ????? 0.0012% +234 5 678 0-91 0.0012% +2 0 0.0012% +23 40 5678 .... 0.0012% +23 415 678 900 2340 0.0012% 213 456 7890 x2304 0.0012% +231 4 56789 110 ext. 112 0.0012% 2310 450678019234 0.0012% +21 341-5-67809

Re: Beast of the Number: Parsing the Feral Phone
by strat (Canon) on Apr 17, 2002 at 14:26 UTC
    I've just seen a recommendation for telephone numbers. It is called E.164, from the ITU-T (Telecommunication Standardization Sector of ITU).

    Due to copyright issues, I can't post it here. But if you want additional input, you could contact the ITU and ask them for a copy.

    Best regards,
    perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"

Re: Beast of the Number: Parsing the Feral Phone
by demerphq (Chancellor) on Apr 17, 2002 at 16:21 UTC
    Big time ++ dude!

    A couple of quick comments before I start trying to run your code against the 10 million German CLIs (call line identifiers) that I have access to, and the 100k or so UK numbers that are on hand as well.

    Regarding parsing extensions: in some countries (like Germany) you aren't allowed to have extensions. I believe this is due to the authorities needing to be able to uniquely identify the location of every handset in the country. This of course means that if you can find the list of countries that have such a law, you can simplify the logic of parsing out extensions.

    Regarding number formats, I believe that you can take advantage of the +1 code. All of these numbers are in a 3-3-4 pattern (with optional extension). These should be easy to parse. OTOH Germany uses a floating format: anywhere from 6 digits (maybe fewer!) for a local number to a full-blown 14 digits (including +, country code and area code) for my own phone number, and they can get longer.
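To make the 3-3-4 idea concrete, here is a minimal sketch. The helper name `parse_nanp` and the set of extension markers it recognises (`x`, `ext`, `ext.`) are assumptions for illustration, not part of the original node's modules, and it makes no attempt to check that the area code actually exists.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a +1 (North American) number into its
# 3-3-4 parts plus an optional extension. Returns an empty list on failure.
sub parse_nanp {
    my ($raw) = @_;

    # Peel off a trailing extension such as "x4500", "ext.12" or "Ext 7".
    my $ext;
    if ($raw =~ s/\s*(?:ext\.?|x)\s*(\d+)\s*$//i) {
        $ext = $1;
    }

    # Keep digits only, then drop a leading country code 1 if ten digits follow.
    (my $digits = $raw) =~ tr/0-9//cd;
    $digits =~ s/^1(?=\d{10}$)//;

    # Whatever is left must be exactly 3 + 3 + 4 digits.
    return unless $digits =~ /^(\d{3})(\d{3})(\d{4})$/;
    return ($1, $2, $3, $ext);
}

for my $num ('+1 234 567 8923 x4500', '203-456-7189 ext.12', '(203)-456-7892') {
    my ($area, $exchange, $line, $ext) = parse_nanp($num);
    printf "%-24s => %s-%s-%s%s\n", $num, $area, $exchange, $line,
        defined $ext ? " ext $ext" : '';
}
```

Anything that does not reduce to exactly ten digits (after the optional leading 1) is rejected rather than guessed at, which seems the safer default for data this noisy.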

    Which brings me to area codes. These are/should be easy to parse in the +1 area. But there's no way to do so in a country that uses floating-length area codes (like Germany, with 2-5 digit area codes) short of knowing the full list for that country. Of course that's not really feasible considering that Germany alone has 5226 of them... (I know, I converted the DTAG list into the AOC data used on our switches...) (Actually I've always thought it interesting that Germany has so many, but the entire NA uses less than a thousand. I guess that's why extensions are so common in NA, in order to work around the (currently) antiquated telecoms industry that is the result of NA's early lead in the area.)
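If the full list for a country is available, splitting off a floating-length area code becomes a table lookup. The sketch below assumes a tiny sample of codes taken from the German list in this reply and a hypothetical helper `split_area_code`; it expects a digits-only national number with the trunk "0" already removed. Longer prefixes are tried first, which is a safe default; if no code in the table is a prefix of another, the order does not matter.

```perl
use strict;
use warnings;

# Sample subset of area codes from the list in this post; a real table
# would hold the full set for the country.
my %area_code = map { $_ => 1 } qw(30 40 201 221 2041 33051);

# Hypothetical helper: split a national number into (area code, subscriber
# number) by trying candidate prefixes from 5 digits down to 2.
sub split_area_code {
    my ($national) = @_;
    for my $len (reverse 2 .. 5) {
        next if length($national) <= $len;    # must leave a subscriber part
        my $prefix = substr($national, 0, $len);
        return ($prefix, substr($national, $len)) if $area_code{$prefix};
    }
    return;    # no known area code matched
}

for my $num (qw(3012345678 3305112345 9912345)) {
    my ($area, $rest) = split_area_code($num);
    print defined $area ? "$num => ($area) $rest\n" : "$num => no match\n";
}
```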

    Anyway, these are just quick off-the-cuff comments. A node this big and serious will need a lot more time for thought.

    Big ++ once again!

    Oh, btw, here's a list of the German area codes in ranged form (i.e. 2051-2054 means 2051, 2052, 2053, 2054).

    :-)
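For anyone who wants the ranged form as a flat list, a minimal sketch of the expansion follows. The helper name `expand_ranges` is an assumption; it treats every entry as either a single code or a `low-high` pair of digit strings, which is the convention described above.

```perl
use strict;
use warnings;

# Expand entries such as "2051-2054" into individual codes; single
# entries such as "2041" pass through unchanged.
sub expand_ranges {
    return map {
        my ($lo, $hi) = /^(\d+)-(\d+)$/;
        defined $lo ? ($lo .. $hi) : $_;
    } @_;
}

my @codes = expand_ranges(qw(201-203 2041 2051-2054));
print "@codes\n";    # prints: 201 202 203 2041 2051 2052 2053 2054
```

The result can be fed straight into a hash (`my %known = map { $_ => 1 } @codes;`) for prefix lookups.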

    <super>

    my @zones=qw( 201-203  2041  2043  2045  2051-2054  2056  2058  2064-2066  208-209  2102-2104  211  2120-2129  2131-2133 
     2137  214  2150-2154  2156-2159  2161-2166  2171  2173-2175  2181-2183  2191-2193  2195-2196  2202-2208  221  2222-2228 
     2232-2238  2241-2248  2251-2257  2261-2269  2271-2275  228  2291-2297  2301-2309  231  2323-2325  2327  2330-2339  234  2351-2355 
     2357-2369  2371-2375  2377-2379  2381-2385  2387-2389  2391-2395  2401-2409  241  2421-2429  2431-2436  2440-2441  2443-2449 
     2451-2456  2461-2465  2471-2474  2482  2484-2486  2501-2502  2504-2509  251  2520-2529  2532-2536  2538  2541-2543  2545-2548 
     2551-2558  2561-2568  2571-2575  2581-2588  2590-2599  2601-2608  261  2620-2628  2630-2639  2641-2647  2651-2657  2661-2664 
     2666-2667  2671-2678  2680-2689  2691-2697  271  2721-2725  2732-2739  2741-2745  2747  2750-2755  2758-2759  2761-2764 
     2770-2779  2801-2804  281  2821-2828  2831-2839  2841-2845  2850-2853  2855-2859  2861-2867  2871-2874  2902-2905  291  2921-2925 
     2927-2928  2931-2935  2937-2938  2941-2945  2947-2948  2951-2955  2957-2958  2961-2964  2971-2975  2977  2981-2985  2991-2994 
     30  3301-3304  33051  33053-33056  3306-3307  33080  33082-33089  33093-33094  331  33200-33209  3321-3322  33230-33235 
     33237-33239  3327-3329  3331-3332  33331-33338  3334-3335  33361-33369  3337-3338  33393-33398  3341-3342  33432-33439  3344 
     33451-33452  33454  33456-33458  3346  33470  33472-33479  335  33601-33609  3361-3362  33631-33638  3364  33652-33657  3366 
     33671-33679  33701-33704  33708  3371-3372  33731-33734  33741-33748  3375  33760  33762-33769  3377-3379  3381-3382  33830-33839 
     33841  33843-33849  3385-3386  33870  33872-33878  3391  33920-33926  33928-33929  33931-33933  3394-3395  33962-33979  33981-33984 
     33986  33989  340-341  34202-34208  3421  34221-34224  3423  34241-34244  3425  34261-34263  34291-34299  3431  34321-34322 
     34324-34325  34327-34328  3433  34341-34348  3435  34361-34364  3437  34381-34386  3441  34422-34426  3443  34441  34443-34446 
     3445  34461-34467  3447-3448  34491-34498  345  34600-34607  34609  3461-3462  34632-34633  34635-34639  3464  34651-34654 
     34656  34658-34659  3466  34671-34673  34691-34692  3471  34721-34722  3473  34741-34743  34745-34746  3475-3476  34771-34776 
     34779  34781-34783  34785  34901  34903-34907  34909  3491  34920-34929  3493-3494  34953-34956  3496  34973  34975-34979 
     3501  35020-35028  35032-35033  3504  35052-35058  351  35200-35209  3521-3523  35240-35249  3525  35263-35268  3528-3529 
     3531  35322-35327  35329  3533  35341-35343  3535  35361-35365  3537  35383-35389  3541-3542  35433-35436  35439  3544  35451-35456 
     3546  35471-35478  355  35600-35609  3561-3564  35691-35698  3571  35722-35728  3573-3574  35751-35756  3576  35771-35775 
     3578  35792-35793  35795-35797  3581  35820  35822-35823  35825-35829  3583  35841-35844  3585-3586  35872-35877  3588  35891-35895 
     3591-3592  35930-35939  3594  35951-35955  3596  35971  35973-35975  3601  36020-36029  3603  36041-36043  3605-3606  36071-36072 
     36074-36077  36081-36085  36087  361  36200-36209  3621-3624  36252-36259  3628-3629  3631-3632  36330-36338  3634-3636 
     36370-36379  3641  36421-36428  3643-3644  36450-36454  36458-36459  36461-36465  3647  36481-36484  365  36601-36608  3661 
     36621-36626  36628  3663  36640  36642-36649  36651-36653  36691-36695  36701-36705  3671-3672  36730-36739  36741-36744 
     3675  36761-36762  36764  36766  3677  36781-36785  3679  3681-3683  36840-36849  3685-3686  36870-36871  36873-36875  36878 
     3691  36920-36929  3693  36940-36941  36943-36949  3695  36961-36969  371  37200  37202-37204  37206-37209  3721-3727  37291-37298 
     3731  37320-37329  3733  37341-37344  37346-37349  3735  37360-37369  3737  37381-37384  3741  37421-37423  37430-37439 
     3744-3745  37462-37465  37467-37468  375  37600-37609  3761-3765  3771-3774  37752  37754-37757  381  38201-38209  3821 
     38220-38229  38231-38234  38292-38297  38300-38309  3831  38320-38328  38331-38334  3834  38351-38356  3836  38370-38379 
     3838  38391-38393  3841  38422-38429  3843-3844  38450-38459  38461-38462  38464  38466  3847  38481-38486  38488  385  3860-3861 
     3863  3865-3869  3871  38720-38729  38731-38733  38735-38738  3874  38750-38759  3876-3877  38780-38785  38787-38789  38791-38794 
     38796-38797  3881  38821-38828  3883  38841-38845  38847-38848  38850-38856  38858-38859  3886  38871-38876  39000-39009 
     3901-3902  39030-39039  3904  39050-39059  39061-39062  3907  39080-39089  3909  391  39200-39209  3921  39221-39226  3923 
     39241-39248  3925  39262-39268  3928  39291-39298  3931  39320-39325  39327-39329  3933  39341-39349  3935  39361-39366 
     3937  39382-39384  39386-39409  3941  39421-39428  3943-3944  39451-39459  3946-3947  39481-39485  39487-39489  3949  395 
     39600-39608  3961-3969  3971  39721-39724  39726-39728  3973  39740-39749  39751-39754  3976  39771-39779  3981  39820-39829 
     39831-39833  3984  39851-39859  39861-39863  3987  39881-39889  3991  39921-39929  39931-39934  3994  39951-39957  39959 
     3996  39971-39973  39975-39978  3998  39991-39999  40  4101-4109  4120-4129  4131-4144  4146  4148-4149  4151-4156  4158-4159 
     4161-4169  4171-4189  4191-4195  4202-4209  421  4221-4224  4230-4249  4251-4258  4260-4269  4271-4277  4281-4289  4292-4298 
     4302-4303  4305  4307-4308  431  4320-4324  4326-4340  4342-4344  4346-4349  4351-4358  4361-4367  4371-4372  4381-4385 
     4392-4394  4401-4409  441  4421-4423  4425-4426  4431-4435  4441-4447  4451-4456  4458  4461-4469  4471-4475  4477-4489 
     4491-4499  4501-4506  4508-4509  451  4521-4529  4531-4537  4539  4541-4547  4550-4559  4561-4564  4602-4609  461  4621-4627 
     4630-4639  4641-4644  4646  4651  4661-4668  4671-4674  4681-4684  4702-4708  471  4721-4725  4731-4737  4740-4749  4751-4758 
     4761-4779  4791-4796  4802-4806  481  4821-4830  4832-4839  4841-4849  4851-4859  4861-4865  4871-4877  4881-4885  4892-4893 
     4902-4903  491  4920-4929  4931-4936  4938-4939  4941-4948  4950-4959  4961-4968  4971-4977  5021-5028  5031-5037  5041-5045 
     5051-5056  5060  5062-5069  5071-5074  5082-5086  5101-5103  5105  5108-5109  511  5121  5123  5126-5132  5135-5139  5141-5149 
     5151-5159  5161-5168  5171-5177  5181-5187  5190-5199  5201-5209  521  5221-5226  5228  5231-5238  5241-5242  5244-5248 
     5250-5255  5257-5259  5261-5266  5271-5278  5281-5286  5292-5295  5300-5309  531  5320-5329  5331-5337  5339  5341  5344-5347 
     5351-5358  5361-5368  5371-5379  5381-5384  5401-5407  5409  541  5421-5429  5431-5439  5441-5448  5451-5459  5461-5462 
     5464-5468  5471-5476  5481-5485  5491-5495  5502-5509  551  5520-5525  5527-5529  5531-5536  5541-5546  5551-5556  5561-5565 
     5571-5574  5582-5586  5592-5594  5601-5609  561  5621-5626  5631-5636  5641-5648  5650-5659  5661-5665  5671-5677  5681-5686 
     5691-5696  5702-5707  571  5721-5726  5731-5734  5741-5746  5751-5755  5761  5763-5769  5771-5777  5802-5808  581  5820-5829 
     5831-5846  5848-5855  5857-5859  5861-5865  5872-5875  5882-5883  5901-5909  591  5921-5926  5931-5937  5939  5941-5948 
     5951-5957  5961-5966  5971  5973  5975-5978  6002-6004  6007-6008  6020-6024  6026-6029  6031-6036  6039  6041-6059  6061-6063 
     6066  6068  6071  6073-6074  6078  6081-6087  6092-6096  6101-6109  611  6120  6122-6124  6126-6136  6138-6139  6142  6144-6147 
     6150-6152  6154-6155  6157-6159  6161-6167  6171-6175  6181-6188  6190  6192  6195-6196  6198  6201-6207  6209  6211-6218 
     62190-62199  6220-6224  6226-6229  6231-6239  6241-6247  6249  6251-6258  6261-6269  6271-6272  6274-6276  6281-6287  6291-6298 
     6301-6308  631  6321-6329  6331-6349  6351-6353  6355-6359  6361-6364  6371-6375  6381-6387  6391-6398  6400-6409  641  6420-6436 
     6438-6447  6449  6451-6458  6461-6462  6464-6468  6471-6479  6482-6486  6500-6509  651  6522-6527  6531-6536  6541-6545 
     6550-6559  6561-6569  6571-6575  6578  6580-6589  6591-6597  6599  661  6620-6631  6633-6639  6641-6648  6650-6661  6663-6670 
     6672-6678  6681-6684  6691-6698  6701  6703-6704  6706-6709  671  6721-6728  6731-6737  6741-6747  6751-6758  6761-6766 
     6771-6776  6781-6789  6802-6806  6809  681  6821  6824-6827  6831-6838  6841-6844  6848-6849  6851-6858  6861  6864-6869 
     6871-6876  6881  6887-6888  6893-6894  6897-6898  69  7021-7026  7031-7034  7041-7046  7051-7056  7062-7063  7066  7071-7073 
     7081-7085  711  7121-7136  7138-7139  7141-7148  7150-7154  7156-7159  7161-7166  7171-7176  7181-7184  7191-7195  7202-7204 
     721  7220-7229  7231-7237  7240  7242-7269  7271-7277  7300  7302-7309  731  7321-7329  7331-7337  7340  7343-7348  7351-7358 
     7361-7367  7371  7373-7376  7381-7389  7391-7395  7402-7404  741  7420  7422-7429  7431-7436  7440-7449  7451-7459  7461-7467 
     7471-7478  7482-7486  7502-7506  751  7520  7522  7524-7525  7527-7529  7531-7534  7541-7546  7551-7558  7561-7579  7581-7587 
     7602  761  7620-7629  7631-7636  7641-7646  7651-7657  7660-7669  7671-7676  7681-7685  7702-7709  771  7720-7729  7731-7736 
     7738-7739  7741-7748  7751  7753-7755  7761-7765  7771  7773-7775  7777  7802-7808  781  7821-7826  7831-7839  7841-7844 
     7851-7854  7903-7907  791  7930-7955  7957-7959  7961-7967  7971-7977  8020-8029  8031-8036  8038-8039  8041-8043  8045-8046 
     8051-8057  8061-8067  8071-8076  8081-8086  8091-8095  8102  8104-8106  811  8121-8124  8131  8133-8139  8141-8146  8151-8153 
     8157-8158  8161  8165-8168  8170-8171  8176-8179  8191-8196  8202-8208  821  8221-8226  8230-8234  8236-8239  8241  8243 
     8245-8254  8257-8259  8261-8263  8265-8269  8271-8274  8276  8281-8285  8291-8296  8302-8304  8306  831  8320-8338  8340-8349 
     8361-8370  8372-8389  8392-8395  8402-8407  841  8421-8424  8426-8427  8431-8435  8441-8446  8450  8452-8454  8456-8469 
     8501-8507  8509  851  8531-8538  8541-8558  8561-8565  8571-8574  8581-8586  8591-8593  861  8621-8624  8628-8631  8633-8642 
     8649-8652  8654  8656-8657  8661-8667  8669-8671  8677-8679  8681-8687  8702-8709  871  8721-8728  8731-8735  8741-8745 
     8751-8754  8756  8761-8762  8764-8766  8771-8774  8781-8785  8801-8803  8805-8809  881  8821-8825  8841  8845-8847  8851 
     8856-8858  8860-8862  8867-8869  89  906  9070-9078  9080-9094  9097  9099  9101-9107  911  9120  9122-9123  9126-9129  9131-9135 
     9141-9149  9151-9158  9161-9167  9170-9199  9201-9209  921  9220-9223  9225  9227-9229  9231-9236  9238  9241-9246  9251-9257 
     9260-9289  9292-9295  9302-9303  9305-9307  931  9321  9323-9326  9331-9360  9363-9367  9369  9371-9378  9381-9386  9391-9398 
     9401-9409  941  9420-9424  9426-9429  9431  9433-9436  9438-9439  9441-9448  9451-9454  9461-9469  9471-9474  9480-9482 
     9484  9491-9493  9495  9497-9499  9502-9505  951  9521-9529  9531-9536  9542-9549  9551-9556  9560-9569  9571-9576  9602-9608 
     961  9621-9622  9624-9628  9631-9639  9641-9648  9651-9659  9661-9666  9671-9677  9681-9683  9701  9704  9708  971  9720-9729 
     9732-9738  9741-9742  9744-9749  9761-9766  9771-9779  9802-9805  981  9820  9822-9829  9831-9837  9841-9848  9851-9857 
     9861  9865  9867-9869  9871-9876  9901  9903-9908  991  9920-9929  9931-9933  9935-9938  9941-9948  9951-9956  9961-9966 
     9971-9978 );
    
    </super>

    Yves / DeMerphq
    ---
    Writing a good benchmark isn't as easy as it might look.

      Interesting data regarding Germany -- I had no idea.

      It was for precisely this sort of reason, however, that I made no attempt to try and figure out area codes for numbers from various countries around the world. The result of this parsing could easily be passed along to a country-specific module for more appropriate parsing and beautification.

      There are a couple of things to point out here that I did not mention in the article (due to the 64k limit on nodes). I made no attempt to parse valid IDD prefixes even though lists for each country are available on the net. The reason is that the IDD prefixes are not mutually exclusive with the Country Codes. Nor, unfortunately or unexpectedly, are area codes for a particular locale.

      This reality produces ambiguous areas where I could be slurping up an area code or IDD as a country code. What's needed in *that* case is some concept of the natural phone number length for that locality. Rather than get that specific, though, I relied on a threshold length and size percentages as measured against the remainder of the number. It's not perfect, but for my data set it worked surprisingly well.

      Given your information about the variability of German numbers, particularly the 14-digit monsters, this technique might fail if the area/province codes happen to match valid country codes elsewhere. This of course only applies to numbers that are presented *without* their Country Code.

      Once this code has its hands on what it thinks is the local number, it's just stored as a single number. I chunk it for display purposes, but generically and in a U.S.-centric kind of way: 4 digits on the suffix, preceded by groups of three digits as long as there are digits left.
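
      To make the chunking rule concrete, here is a minimal sketch of it. The name chunk_number() is made up for this illustration and is not the method from the modules above; it assumes the input has already been reduced to a plain string of digits.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's actual method): split a digit
# string into a 4-digit suffix preceded by groups of three digits,
# working right to left. The leading group may be shorter than three.
sub chunk_number {
    my ($digits) = @_;
    return ($digits) if length($digits) <= 4;

    my $suffix = substr($digits, -4);
    my $rest   = substr($digits, 0, -4);

    my @groups;
    while (length $rest) {
        my $take = length($rest) >= 3 ? 3 : length($rest);
        unshift @groups, substr($rest, -$take);
        $rest = substr($rest, 0, length($rest) - $take);
    }
    return (@groups, $suffix);
}

print join(' ', chunk_number('12345678')), "\n";   # prints "1 234 5678"
```

      Working from the right keeps the suffix at four digits regardless of how long the number is, so any odd leftover digits land in the leading group.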

      Also, keep in mind that this code is intended to operate on raw, unrestricted data fields. Typos, blippoes (???? or xxxx) and all of it are present in the data. There's not a whole lot you can do in these cases to pull out a valid number without knowing in excruciating detail the particulars of the intended country.

      GIGO, GIGO, it's off to the dumpster we go!

      BTW, I suspect this code might take quite a while to run on 10 million numbers, even if they are well-behaved.

      Thanks for the comments and feedback. Any thoughts on whether any of this should be CPAN-bound once cleaned up? (new names and POD, obviously, but beyond that...)

      Matt

Re: Beast of the Number: Parsing the Feral Phone
by htoug (Deacon) on Jan 16, 2003 at 08:20 UTC
    Just another 0.02€:

    In Denmark there is no area code. Phone numbers are just 8-digit numbers with a possible extension (there is no standard for that; the format depends on the local switchboard). The number is traditionally formatted as dd dd dd dd, but even that has begun to vary: some people use 2 groups of 3 digits and a 2-digit group, others 2 groups of 4 digits!

    Earlier you could figure out which telephone exchange the number was attached to, but following (EU-instigated?) rule changes you can take your number with you when you move from one part of the country to another. Thus there is no area code, or all of Denmark (including cellular phones!) is in the same area.

    Things do vary.

      Interesting factoid about Denmark having no area codes.

      This is why I do not bother attempting to interpret the core number once I have dealt with IDD codes, country codes, extensions, and various representations of multiple numbers. Figuring out an area code is beyond the scope of this set of tools -- it does, however, make a huge step in providing a base phone number suitable for interpretation by a module tailored to a particular country or region.

      I admit that my *display* functionality is US-centric. It chunks the core number with 4 digits on the suffix, preceded by groups of three (or fewer for the leading group). So in your example, dd dd dd dd would come out looking like d ddd dddd.

      That's just cosmetic, however. The internal representation makes no distinction for area codes of any sort. The PhoneNumber.pm module can be provided with new chunked_number() and as_string() methods suitable for any locale. If I ever put it on CPAN I would attempt to structure it so that subclasses could easily provide for internationalization (perhaps based on country code, but with a default format for the local region).
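
      As a rough sketch of what formatting by country code with a default fallback could look like: everything here (%format_for, default_format(), format_local()) is hypothetical and is not the PhoneNumber.pm interface. The one fact it relies on is that Denmark's country code is 45.

```perl
use strict;
use warnings;

# Hypothetical table of per-country display formatters, keyed on country
# code. Only Denmark (45) is filled in, using the traditional dd dd dd dd.
my %format_for = (
    '45' => sub { join ' ', $_[0] =~ /(\d{1,2})/g },
);

# Default display: 4-digit suffix, groups of three before it.
sub default_format {
    my ($digits) = @_;
    return $digits if length($digits) <= 4;
    my $head = substr($digits, 0, -4);
    my $tail = substr($digits, -4);
    # Put a space before every run of three digits counted from the right.
    $head =~ s/(?<=\d)(?=(?:\d{3})+\z)/ /g;
    return "$head $tail";
}

# Use the country-specific formatter if one exists, else the default.
sub format_local {
    my ($country_code, $digits) = @_;
    my $formatter = defined $country_code ? $format_for{$country_code} : undef;
    return $formatter ? $formatter->($digits) : default_format($digits);
}

print format_local('45',  '12345678'), "\n";   # prints "12 34 56 78"
print format_local(undef, '12345678'), "\n";   # prints "1 234 5678"
```

      A lookup table like this is only one option; subclasses overriding the display methods would serve the same purpose.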

      (it's worth repeating that the actual display of these numbers was more of an afterthought -- the main thrust is the normalization and parsing of unverified and unruly international phone number strings)

      Matt

Re: Beast of the Number: Parsing the Feral Phone
by Abigail-II (Bishop) on Jan 16, 2003 at 09:35 UTC
    Interesting stuff. I've been considering adding phone numbers to Regexp::Common, and this work might be helpful.

    Abigail

      I was just looking at Regexp::Common, and noticed that phone numbers are still on the TODO list according to the POD. Do you know if phone numbers are any closer to being added to that module?