PerlMonks  

Generalizing Regex with Multiple Match

by neversaint (Deacon)
on Nov 25, 2008 at 08:32 UTC ( #725789=perlquestion )

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,
My code below has no problem capturing such data:
__DATA__
>r4713.1 |SOURCES={GI=162960844,bw,7184325-7184361}
>r4714.1 |SOURCES={GI=162960844,fw,6257219-6257255}

How can I generalize the regex in my code so that it also supports the following entries?

# note that there can be more than two "GI"s inside the {} bracket
__DATA__
>r7.1 |SOURCES={GI=162960844,bw,0-4;GI=162960844,bw,9025576-9025608}|
>r6.1 |SOURCES={GI=152989753,bw,0-30;GI=152989753,bw,1877925-1877931}|
My code:
use Data::Dumper;

my %all_entry;

while (<DATA>) {
    chomp;
    next unless (/^>/);
    my $line = $_;
    $line =~ />.*\{GI=(\d+),(\w+),(\d+\-\d+)\}/g; # 'g' doesn't seem to work
    #print "$line --- $1 $str{$2} $3\n";
    push @{ $all_entry{$1}{$2} }, $3;
}

print Dumper \%all_entry;

__DATA__


---
neversaint and everlastingly indebted.......

Replies are listed 'Best First'.
Re: Generalizing Regex with Multiple Match
by ccn (Vicar) on Nov 25, 2008 at 08:36 UTC
    print "$1, $2, $3\n" while $line =~ /GI=(\d+),(\w+),(\d+\-\d+)/g;
Re: Generalizing Regex with Multiple Match
by prasadbabu (Prior) on Nov 25, 2008 at 08:51 UTC

    Hi neversaint,

    You have to add a while loop to get all the matches.

    use strict;
    use warnings;
    use Data::Dumper;

    my %all_entry;

    while (<DATA>) {
        chomp;
        next unless (/^>/);
        my $line = $_;
        while ($line =~ /GI\=(\d+)\,(\w+)\,(\d+\-\d+)/g){
            push @{ $all_entry{$1}{$2} }, $3;
        }
    }

    print Dumper \%all_entry;

    output:
    -------
    $VAR1 = {
        '162960844' => {
            'bw' => [
                '0-4',
                '9025576-9025608'
            ]
        },
        '152989753' => {
            'bw' => [
                '0-30',
                '1877925-1877931'
            ]
        }
    };

    Prasad

Re: Generalizing Regex with Multiple Match
by shmem (Chancellor) on Nov 25, 2008 at 09:59 UTC
    # 'g' doesn't seem to work

    It doesn't work because your pattern includes the curly braces, and there is only one pair of them per line, so it can match at most once. Also, a m//g in scalar context matches once and sets the position for further matches at the end of the match (see pos).

    $_ = ">r7.1 |SOURCES={GI=162960844,bw,0-4;GI=162960844,bw,9025576-9025608}|";
    $_ =~ /GI=(\d+),(\w+),(\d+\-\d+)/g;
    print "$_\n";
    print "-" x pos(),"^\n";
    print pos(),"\n";
    __END__
    >r7.1 |SOURCES={GI=162960844,bw,0-4;GI=162960844,bw,9025576-9025608}|
    -----------------------------------^
    35
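
To make the scalar-versus-list distinction concrete, here is a minimal sketch that is not part of the original posts. The helper name gi_triples is made up for illustration; the regex is the one used in the replies. In scalar context each evaluation of m//g finds one match and pos() records where it ended, while in list context the same expression returns the captures of every match as one flat list.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every GI triple in a line.
sub gi_triples {
    my ($line) = @_;

    # List context: m//g returns the captures of ALL matches as one flat list.
    my @flat = $line =~ /GI=(\d+),(\w+),(\d+-\d+)/g;

    # Regroup the flat list into [id, strand, range] triples.
    my @triples;
    while ( my @t = splice @flat, 0, 3 ) {
        push @triples, [@t];
    }
    return @triples;
}

my $line = ">r7.1 |SOURCES={GI=162960844,bw,0-4;GI=162960844,bw,9025576-9025608}|";

# Scalar context: one match per evaluation; pos() marks where the next one resumes.
my @positions;
while ( $line =~ /GI=(\d+),(\w+),(\d+-\d+)/g ) {
    push @positions, pos($line);
}

print "match end positions: @positions\n";
print join( ",", @$_ ), "\n" for gi_triples($line);
```

The while loop is what the replies above use; the list-context form is handy when you want all the pieces first and decide what to do with them afterwards.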

    If you want to match the stuff inside the curlies and then build your structure from multiple matches, you need two passes - first isolate what's inside the curlies, then match with m//g:

    use Data::Dumper;

    my %all_entry;

    while (<DATA>) {
        chomp;
        next unless (/^>/);
        my ($line) = />.*\{((?:GI=\d+,\w+,\d+\-\d+;?)+)\}/;
        push @{ $all_entry{$1}{$2} }, $3
            while $line =~ /GI=(\d+),(\w+),(\d+\-\d+)/g;
    }

    print Dumper \%all_entry;

    __DATA__
    >r7.1 |SOURCES={GI=162960844,bw,0-4;GI=162960844,bw,9025576-9025608}|
    >r6.1 |SOURCES={GI=152989753,bw,0-30;GI=152989753,bw,1877925-1877931}|
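
The two-pass idea can also be wrapped in a small subroutine. This is only a sketch under a few assumptions: the name add_sources is hypothetical, the first pattern is anchored with ^ instead of relying on a separate "next unless" check, and a line whose braces do not match is skipped rather than handed to the second pass as undef.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the thread): fold one line into a hash ref
# shaped like %all_entry. Returns the number of GI triples it stored.
sub add_sources {
    my ( $all_entry, $line ) = @_;

    # Pass 1: isolate the text between the curlies; give up if the line does not fit.
    my ($inside) = $line =~ /^>.*\{((?:GI=\d+,\w+,\d+-\d+;?)+)\}/
        or return 0;

    # Pass 2: walk every GI triple found inside the braces.
    my $count = 0;
    while ( $inside =~ /GI=(\d+),(\w+),(\d+-\d+)/g ) {
        push @{ $all_entry->{$1}{$2} }, $3;
        $count++;
    }
    return $count;
}

my %all_entry;
my @lines = (
    ">r4713.1 |SOURCES={GI=162960844,bw,7184325-7184361}",
    ">r7.1 |SOURCES={GI=162960844,bw,0-4;GI=162960844,bw,9025576-9025608}|",
    "not a record",    # ignored: does not start with '>'
);
add_sources( \%all_entry, $_ ) for @lines;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper \%all_entry;
```

Both the single-GI and the multi-GI entries go through the same code path here, so the original data from the question keeps working.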
