Replacing numbers with links

by danambroseUK (Beadle)
on Jul 16, 2006 at 14:18 UTC
danambroseUK has asked for the wisdom of the Perl Monks concerning the following question:


I have unstructured strings that contain ID reference numbers inline. A reference number is 3 to 5 digits long, e.g. 200, 50210, 121, 3222.

What I would like to do is turn each of the numbers into a hyperlink to the URL lookup.cgi?id=XXXXX.

I tried a match to extract the numbers, followed by a substitution to replace them. This works fine, except that the inserted URL itself contains a number of 3 to 5 digits, so later substitutions match inside the links that were already added and break them.

sub Linker {
    ($mystring)=@_;
    @myarray = ($mystring =~ m/(\d\d\d+)/g);
    foreach $a (@myarray){
        $url = "<b><a href='lookup.cgi?id=$a'>$a</a></b>";
        $mystring=~s/$a/$url/g;
    }
    return $mystring;
}

Any ideas? Maybe a solution that does the match, and substitution in one go?
Thanks in advance

Replies are listed 'Best First'.
Re: Replacing numbers with links
by liverpole (Monsignor) on Jul 16, 2006 at 14:28 UTC
    Hi danambroseUK,

    What I think you want is to replace all of the digit strings with a single, global regular expression substitution.

    You could do this with, for example:

    $mystring =~ s/(\d{3,5})/<b><a href='lookup.cgi?id=$1'>$1<\/a>/g;

    Several things to note:
    1) you can change the \d{3,5} if it turns out you have numeric strings with fewer than 3 or more than 5 digits.
    2) you need to escape the '/' in <\/a>, since it would improperly terminate the regex otherwise.
    3) the expression $1 refers to the first capture group, i.e. the digits matched by the parentheses.
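    As a rough sketch, the one-pass substitution could be wrapped in a sub like the one below. The name linker and the sample string are only illustrative, and the closing </b> is added on the assumption that the markup should match the question's original link format.

```perl
use strict;
use warnings;

# Hypothetical helper name; wraps the single-pass substitution.
sub linker {
    my ($text) = @_;
    # Each run of 3 to 5 digits is matched once and replaced once.
    # The inserted link text is never rescanned, so the digits inside
    # the URL cannot be matched again.
    $text =~ s/(\d{3,5})/<b><a href='lookup.cgi?id=$1'>$1<\/a><\/b>/g;
    return $text;
}

print linker("see 200 and 200 again"), "\n";
```

    With a repeated ID such as the 200 above, a loop that substitutes each extracted number in turn would hit the 200 inside the link it had just inserted; the single s///g pass avoids that because it only moves forward through the original text.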

      you need to escape the '/' in <\/a>, since it would improperly terminate the regex otherwise.

      Or use something other than a forward slash as your regex delimiter, for example paired curly brackets:

      $mystring =~ s{(\d{3,5})} {<b><a href='lookup.cgi?id=$1'>$1</a>}g;

      You often see matching against *nix paths using the default forward slash delimiter (and omitting the m as it is not required with / ... /) when using a different delimiter would be more readable.

      if( $path =~ /\/path\/to\/file/ ) { ... }

      is, I feel, much harder to read than

      if( $path =~ m{/path/to/file} ) { ... }



      As it stands, the regex will match a minimum of 3 and a maximum of 5 digits, but doesn't care whether they are part of a longer run of digits or are embedded in text. Adding \b word-boundary anchors on both sides restricts it to standalone numbers:

      use strict;
      use warnings;

      while (<DATA>) {
          s|\b(\d{3,5})\b|<b><a href='lookup.cgi?id=$1'>$1</a></b>|g;
          print;
      }

      __DATA__
      foo 300 bar bas.
      foo300bar bas.
      foo 123456 bar 1234 bas 56789.
      foo bar bas.
      1 foo 1 bar 2 bas bas.

      Output:

      foo <b><a href='lookup.cgi?id=300'>300</a></b> bar bas.
      foo300bar bas.
      foo 123456 bar <b><a href='lookup.cgi?id=1234'>1234</a></b> bas <b><a href='lookup.cgi?id=56789'>56789</a></b>.
      foo bar bas.
      1 foo 1 bar 2 bas bas.
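      If IDs can sit directly against letters (as in foo300bar) and should still be linked, one possible variation is to guard only against neighbouring digits with lookarounds instead of \b. This is a sketch under that assumption, not something the original poster asked for; the name link_ids and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: links runs of 3 to 5 digits that are not part
# of a longer run of digits, even when they touch letters.
sub link_ids {
    my ($text) = @_;
    # (?<!\d) : no digit immediately before the match
    # (?!\d)  : no digit immediately after the match
    $text =~ s{(?<!\d)(\d{3,5})(?!\d)}{<b><a href='lookup.cgi?id=$1'>$1</a></b>}g;
    return $text;
}

print link_ids("foo300bar 123456 ref 1234"), "\n";
```

      Here 300 and 1234 are linked while the six-digit 123456 is left untouched, whereas the \b version would also skip foo300bar. Which behaviour is right depends on what the real data looks like.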

      Is it a concern that the substitution introduces an orphaned bold tag for each substitution made?

      Update: fixed bold tags and changed regex delimiters to avoid picket fence

      DWIM is Perl's answer to Gödel
      Liverpole... A perfect solution to my problem within 10 minutes! It works a treat. That's going to save me all morning tomorrow scratching my head!

      Much appreciate your help :)

