
Replacing numbers with links

by danambroseUK (Beadle)
on Jul 16, 2006 at 14:18 UTC (#561572=perlquestion)
danambroseUK has asked for the wisdom of the Perl Monks concerning the following question:


I have unstructured strings which contain ID reference numbers inline. A reference number is made up of 3 to 5 digits, e.g. 200, 50210, 121, 3222 etc.

What I would like to do is change all of the numbers into hyperlinks to the URL lookup.cgi?id=XXXXX.

I tried doing a match to extract the numbers, then a substitution to replace them. This works fine, except that the URL itself also contains a number 3 to 5 digits long, and that screws up the substitution.

sub Linker {
    ($mystring) = @_;
    @myarray = ($mystring =~ m/(\d\d\d+)/g);
    foreach $a (@myarray) {
        $url = "<b><a href='lookup.cgi?id=$a'>$a</a></b>";
        $mystring =~ s/$a/$url/g;
    }
    return $mystring;
}

Any ideas? Maybe a solution that does the match, and substitution in one go?
Thanks in advance
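
For readers who want to see the failure concretely, here is a minimal sketch that reproduces the match-then-substitute loop described above. The sub name linker_loop and the sample string are made up for illustration. The assumption is that the problem shows up whenever the same ID occurs more than once (or one ID is contained in another), because a later pass rewrites digits that already sit inside a generated link.

```perl
use strict;
use warnings;

# Hypothetical reproduction of the loop from the question (names are assumptions).
sub linker_loop {
    my ($text) = @_;
    my @ids = $text =~ /(\d\d\d+)/g;    # collect every run of 3 or more digits
    for my $id (@ids) {
        my $url = "<b><a href='lookup.cgi?id=$id'>$id</a></b>";
        # /g rewrites every occurrence of $id, including the copies that an
        # earlier iteration already placed inside href='...' and the link text.
        $text =~ s/$id/$url/g;
    }
    return $text;
}

my $broken = linker_loop("see 200 and 200");
print "$broken\n";
```

With "200" appearing twice, the second iteration replaces the "200" inside each link produced by the first iteration, so the href ends up containing a nested <b><a ...> fragment instead of a plain ID.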

Re: Replacing numbers with links
by liverpole (Monsignor) on Jul 16, 2006 at 14:28 UTC
    Hi danambroseUK,

    What I think you want to do is replace all of the number strings with a single, global regular expression substitution.

    You could do this with, for example:

    $mystring =~ s/(\d{3,5})/<b><a href='lookup.cgi?id=$1'>$1<\/a>/g;

    Several things to note:
    1) You can change the \d{3,5} if it turns out you have numeric strings of fewer than 3 or more than 5 digits.
    2) You need to escape the '/' in <\/a>, since it would improperly terminate the regex otherwise.
    3) The expression $1 is a capture variable; it holds whatever the first set of parentheses matched.
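
    The single-pass idea can be wrapped in a small sub, roughly as follows. This is only a sketch: the name link_ids is made up, it uses brace delimiters so nothing needs escaping, and it closes the bold tag, which is an assumption about the markup that is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper name; the lookup.cgi?id= URL comes from the question.
sub link_ids {
    my ($text) = @_;
    # One pass: each run of 3 to 5 digits is captured in $1 and wrapped once.
    # Text produced by the replacement is not rescanned by the same s///g,
    # so the digits inside the generated href are left alone.
    $text =~ s{(\d{3,5})}{<b><a href='lookup.cgi?id=$1'>$1</a></b>}g;
    return $text;
}

print link_ids("see 200 and 200"), "\n";
# prints: see <b><a href='lookup.cgi?id=200'>200</a></b> and <b><a href='lookup.cgi?id=200'>200</a></b>
```

    Working on a copy in $text and returning it leaves the caller's string untouched, which matches how the Linker sub in the question is used.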

      you need to escape the '/' in <\/a>, since it would improperly terminate the regex otherwise.

      Or use something other than a forward slash as your regex delimiter, for example paired curly brackets like this:

      $mystring =~ s{(\d{3,5})} {<b><a href='lookup.cgi?id=$1'>$1</a>}g;

      You often see matching against *nix paths using the default forward slash delimiter (and omitting the m as it is not required with / ... /) when using a different delimiter would be more readable.

      if( $path =~ /\/path\/to\/file/ ) { ... }

      is, I feel, much harder to read than

      if( $path =~ m{/path/to/file} ) { ... }



      As it stands, the regex will match a minimum of 3 and a maximum of 5 digits, but doesn't care whether they are part of a longer run of digits or are embedded in text. Adding \b word-boundary anchors on both sides restricts it to standalone numbers:

      use strict;
      use warnings;

      while (<DATA>) {
          s|\b(\d{3,5})\b|<b><a href='lookup.cgi?id=$1'>$1</a></b>|g;
          print;
      }

      __DATA__
      foo 300 bar bas.
      foo300bar bas.
      foo 123456 bar 1234 bas 56789.
      foo bar bas.
      1 foo 1 bar 2 bas bas.

      Output:

      foo <b><a href='lookup.cgi?id=300'>300</a></b> bar bas.
      foo300bar bas.
      foo 123456 bar <b><a href='lookup.cgi?id=1234'>1234</a></b> bas <b><a href='lookup.cgi?id=56789'>56789</a></b>.
      foo bar bas.
      1 foo 1 bar 2 bas bas.

      Is it a concern that the substitution introduces an orphaned bold tag for each substitution made?

      Update: fixed bold tags and changed regex delimiters to avoid picket fence
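
      If IDs glued to letters (such as foo300bar) should still be linked while longer digit runs are skipped, one possible variant swaps the \b anchors for digit lookarounds. This is a sketch under that assumption and not something the thread asked for; the name link_ids_lookaround is made up.

```perl
use strict;
use warnings;

# Hypothetical variant: skip longer digit runs but allow adjacent letters.
sub link_ids_lookaround {
    my ($text) = @_;
    # (?<!\d) and (?!\d) require that no digit sits directly before or after
    # the captured run, so 123456 is skipped while foo300bar is still linked.
    $text =~ s{(?<!\d)(\d{3,5})(?!\d)}{<b><a href='lookup.cgi?id=$1'>$1</a></b>}g;
    return $text;
}

print link_ids_lookaround("foo300bar 123456 1234"), "\n";
# prints: foo<b><a href='lookup.cgi?id=300'>300</a></b>bar 123456 <b><a href='lookup.cgi?id=1234'>1234</a></b>
```

      Whether \b or the lookarounds fit better depends on whether an ID embedded in a word counts as an ID in your data.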

      DWIM is Perl's answer to Gödel
      Liverpole... A perfect solution to my problem within 10 minutes! It works a treat. That's going to save me all morning tomorrow scratching my head!

      Much appreciate your help :)

