
Recognizing numbers and creating links

by htmanning (Friar)
on Apr 13, 2015 at 20:32 UTC

htmanning has asked for the wisdom of the Perl Monks concerning the following question:

I posted something about this the other day and got some good suggestions. Unfortunately, some were above my pay grade. I am using the following to recognize 3-digit and 4-digit numbers in a text field. It works, but with a few issues. For one thing, it tags phone numbers. No big deal, but I wish I could recognize a 10-digit or 7-digit number with a dash and not link it. The main question I have now is this: if the same number appears more than once in the text field, it screws up the first link and doesn't link the second. Obviously, the code I have here has a limitation. I need a suggestion for a way around this.
    my @numbers4 = $text =~ /\b \d{4} \b/gx;
    foreach $unit4 (@numbers4) {
        $text =~ s/$unit4/\<a href=\"unit=$unit4\"\>\<b\>$unit4\<\/b\>\<\/a\>/i;
    }
    # look for 3 digit numbers and make link to Resident Info card.
    my @numbers3 = $text =~ /\b \d{3} \b/gx;
    foreach $unit3 (@numbers3) {
        $text =~ s/$unit3/\<a href=\"?unit=$unit3\"\><b\>$unit3<\/b\>\<\/a\>/i;
    }
How can I work around the same 3 or 4 digit number appearing multiple times in the $text field? Thanks,

Replies are listed 'Best First'.
Re: Recognizing numbers and creating links
by AnomalousMonk (Archbishop) on Apr 13, 2015 at 21:15 UTC

    As for avoiding phone numbers, I can't really think of much that's better than the solutions you received recently. Here's my take for substitution:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use 5.010;
         ;;
         my $s = 'Sent 1 12 to 1402, 222-2222, 1304, 555.555.5555 and 501, 666 666 6666 & 6789';
         print qq{'$s'};
         ;;
         my $sep = qr{ [-. ] }xms;
         my $pn = qr{ \d{3} ($sep) (?: \d{3} \1)? \d{4} }xms;
         my $an = qr{ \d{3,4} }xms;
         ;;
         $s =~
           s{ (?| $pn (*SKIP)(*F) | (?<! \d) ($an) (?! \d)) }
            {<a $1><b>$1</b></a>}xmsg;
         print qq{'$s'};
        "
        'Sent 1 12 to 1402, 222-2222, 1304, 555.555.5555 and 501, 666 666 6666 & 6789'
        'Sent 1 12 to <a 1402><b>1402</b></a>, 222-2222, <a 1304><b>1304</b></a>, 555.555.5555 and <a 501><b>501</b></a>, 666 666 6666 & <a 6789><b>6789</b></a>'

    Of course, this still uses regex features introduced with Perl version 5.10, and you still haven't said what version you have available, so a possible sticking point there...
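If the available Perl turns out to be older than 5.10, the same idea can be sketched without the branch-reset group `(?|...)` and the `(*SKIP)(*F)` verbs. This is only a sketch under assumptions: `linkify_without_510` is a hypothetical helper name, the tags are simplified as in the reply above, and the phone-number pattern is the same loose one. A phone-like number is matched by the first alternative and put back unchanged, so the second alternative never gets to see its digits.

```perl
use strict;
use warnings;

# Hypothetical helper name; tags are simplified as in the reply above.
# Capture 1 is a whole phone-like number, capture 2 is its separator,
# capture 3 is a standalone 3- or 4-digit number.
sub linkify_without_510 {
    my ($s) = @_;
    $s =~ s{
        ( \d{3} ([-. ]) (?: \d{3} \2 )? \d{4} )
      |
        (?<! \d ) ( \d{3,4} ) (?! \d )
    }{
        defined $1 ? $1 : "<a $3><b>$3</b></a>"
    }gex;
    return $s;
}

print linkify_without_510(
    'Sent 1 12 to 1402, 222-2222, 1304, 555.555.5555 and 501, 666 666 6666 & 6789'
), "\n";
```

The `/e` flag makes the replacement an expression, which is what lets one pass decide per match whether to link or to leave the text alone.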


    Give a man a fish:  <%-(-(-(-<

      Sorry, I think I'm on Perl 5 but I'll try this. It's a little above my head but I'll give it a whirl. Thanks!

        You can find out what your version of Perl is with the  -v (lowercase v) command line switch:

            c:\@Work\Perl\monks>perl -v

            This is perl 5, version 14, subversion 4 (v5.14.4) built for MSWin32-x86-multi-thread

            Copyright 1987-2013, Larry Wall
            ... more stuff ...
        The  -V (that's uppercase V) switch will give much more info, but you don't need that for now. See all the command line switches in perlrun.

        (And BTW: I hope you aren't running Perl version 5.5!)



Re: Recognizing numbers and creating links
by AnomalousMonk (Archbishop) on Apr 13, 2015 at 20:51 UTC

    Maybe something like:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $s = '1 x 12 x 123 x 1234 x 12345 x 123 x 1234 x';
         print qq{'$s'};
         ;;
         $s =~ s{ \b (\d{3,4}) \b }{<a $1><b>$1</b></a>}xmsg;
         print qq{'$s'};
        "
        '1 x 12 x 123 x 1234 x 12345 x 123 x 1234 x'
        '1 x 12 x <a 123><b>123</b></a> x <a 1234><b>1234</b></a> x 12345 x <a 123><b>123</b></a> x <a 1234><b>1234</b></a> x'
    (with the HTML tags simplified). I.e., just make one substitution pass through the string instead of many.

    Update: Or did you mean that you do not want the second instances of '123' or '1234' in the string to be en-tag-ified?
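As a standalone script, the single-pass idea might look like the sketch below. It assumes the `?unit=` link format from the question, and `linkify_units` is a hypothetical helper name. Because `s///g` continues scanning after each replacement, it never rescans the HTML it has just inserted, which is what goes wrong when the same number is substituted in a loop.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every standalone 3- or 4-digit number in a link.
sub linkify_units {
    my ($text) = @_;
    # One global pass: every match is replaced exactly once, in place.
    $text =~ s{ \b (\d{3,4}) \b }{<a href="?unit=$1"><b>$1</b></a>}xg;
    return $text;
}

print linkify_units('1 x 12 x 123 x 1234 x 12345 x 123 x 1234 x'), "\n";
```

This sketch does nothing about phone numbers; `222-2222` would still be linked, since `-` counts as a word boundary.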



Re: Recognizing numbers and creating links
by FreeBeerReekingMonk (Deacon) on Apr 14, 2015 at 07:54 UTC

    I also suggest a two-pass solution: first grab everything that looks numeric, then process each match with a function. In our case the function accepts only numbers of 3 or 4 digits, optionally followed by a comma. Then you can experiment with adding more cases, such as a trailing dot or negative numbers. Sure, it can be done in fewer lines, but this way you can easily document each line with its purpose.

        $_="ent 1 12 to 1402, 222-2222, 1304, 555.555.5555 and 501, 777 7777 12,4567 11.111";
        sub GO{
            return "LINK($1)$2" if $_[0]=~/^(\d{3,4})(,?)$/;
            # here more cases
            return "NO($_[0])";
        };
        s/([\.\-\,\d]+)/&GO($1)/gexi;
        print $_ . "\n";

Re: Recognizing numbers and creating links
by Anonymous Monk on Apr 13, 2015 at 22:31 UTC
    perldoc -q 'remove duplicate'
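That FAQ entry describes the `%seen` hash idiom. One way to apply it here, as a rough sketch: collect each number only once, then link all of its occurrences with a single `/g` substitution. `linkify_deduped` is a hypothetical helper name and the `?unit=` link format is taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: dedupe the numbers first, then link each one everywhere.
sub linkify_deduped {
    my ($text) = @_;
    my %seen;
    # Keep each number once, in order of first appearance.
    my @units = grep { !$seen{$_}++ } $text =~ /\b(\d{3,4})\b/g;
    for my $unit (@units) {
        # /g links every occurrence of this number; the \b anchors stop
        # 123 from matching inside 1234 or inside a link made for 1234.
        $text =~ s/\b$unit\b/<a href="?unit=$unit"><b>$unit<\/b><\/a>/g;
    }
    return $text;
}

print linkify_deduped('123 x 1234 x 123'), "\n";
```

Since each distinct number is substituted exactly once, the loop never revisits a link it has already written.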
Re: Recognizing numbers and creating links
by bitingduck (Chaplain) on Apr 14, 2015 at 04:21 UTC

    There are certainly more clever ways to do it, but for small data sets (as in takes less than a few minutes to process) I'd just do multiple passes, where on the first one I'd find all the phone numbers and wrap them in some kind of tag (typically an xml-ish tag, but really anything you won't see in the regular data set, like a double underscore) so that I know to ignore them on the next pass. Then on the next pass I'd make the regex and/or logic such that it ignores things tagged as phone numbers. If I get motivated later this evening I'll post some code.
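A minimal sketch of that multi-pass approach, with several assumptions: `linkify_skipping_phones` is a hypothetical helper name, the phone pattern is the loose one used elsewhere in this thread, and the placeholder `__PHONEn__` is assumed never to occur in real input. Instead of teaching the second pass to ignore tagged text, this variant swaps each phone number for a placeholder the number regex cannot match, then restores it at the end.

```perl
use strict;
use warnings;

# Hypothetical helper: hide phone numbers, link unit numbers, restore phones.
sub linkify_skipping_phones {
    my ($text) = @_;
    my @phones;

    # Pass 1: replace each phone-like number with a placeholder and remember it.
    $text =~ s{ ( \d{3} ([-. ]) (?: \d{3} \2 )? \d{4} ) }
              { push @phones, $1; '__PHONE' . $#phones . '__' }gex;

    # Pass 2: link standalone 3- or 4-digit numbers. The index digits in a
    # placeholder sit between word characters, so \b cannot match around them.
    $text =~ s{ \b (\d{3,4}) \b }{<a href="?unit=$1"><b>$1</b></a>}xg;

    # Pass 3: put the original phone numbers back.
    $text =~ s{ __PHONE (\d+) __ }{$phones[$1]}xg;

    return $text;
}

print linkify_skipping_phones(
    'Sent 1 12 to 1402, 222-2222, 1304, 555.555.5555 and 501, 666 666 6666 & 6789'
), "\n";
```

Three passes cost more than one combined regex, but each step can be tested and adjusted on its own.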
