PerlMonks  

Regular expressions - match

by flaviusm (Acolyte)
on Jan 24, 2011 at 22:19 UTC (id://884012)

flaviusm has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to filter a CSV data file in which some columns are optional. This is the expected behavior:

e.g.

data line: "one=1,two=2,three=3,four=4,five=5"
result: $1==1; $2==2; $3==3; $4==4; $5==5

or

data line: "one=1,three=3,five=5"
result: $1==1; $2==''; $3==3; $4==''; $5==5

This is the code I have:

my $string1 = "one=1,two=2,three=3,four=4,five=5";
my $string2 = "one=1,three=3,five=5";
$string1 =~ m/
    (?:one=(.*?),)?
    .*
    (?:two=(.*?),)?
    .*
    (?:three=(.*?),)?
    .*
    (?:four=(.*?),)?
    .*
    (?:five=(.*?),)?
/x;
print "one: $1, two: $2, three: $3, four: $4, five: $5\n";

The above code doesn't work as expected for either string1 or string2. I would appreciate your feedback and ideas.

Thank you.

Re: Regular expressions - match
by eff_i_g (Curate) on Jan 24, 2011 at 22:31 UTC
    For CSV data see Text::CSV.

    Here's a simple example using regex:
    use warnings;
    use strict;

    my $string1 = "one=1,two=2,three=3,four=4,five=5";
    my $string2 = "one=1,three=3,five=5";
    for ($string1, $string2) {
        print "$_\n";
        print "$1=$2\n" while /(\w+)=(\d+)/g;
    }
    You can modify the RE as you see fit and/or put the data in a hash.

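    Update/FYI: Your RE does not work because the .*'s are greedy: the first .* consumes the rest of the line, so every later optional group can only match the empty string.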
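    To see the greedy behaviour concretely, the sketch below (an illustration, not part of the reply) runs the question's original pattern against $string1 in list context and reports which capture groups are defined. The first optional group captures '1', the following greedy .* swallows the rest of the line, and all the later optional groups are satisfied by matching nothing.

```perl
use strict;
use warnings;

my $string1 = "one=1,two=2,three=3,four=4,five=5";

# The question's pattern: optional groups separated by greedy .*
# In list context, m// returns ($1, $2, ...) including undef for groups
# that did not take part in the match.
my @greedy = $string1 =~ m/
    (?:one=(.*?),)?
    .*
    (?:two=(.*?),)?
    .*
    (?:three=(.*?),)?
    .*
    (?:four=(.*?),)?
    .*
    (?:five=(.*?),)?
/x;

# Show each capture, marking the ones that never matched.
for my $i (0 .. $#greedy) {
    printf "\$%d: %s\n", $i + 1, defined $greedy[$i] ? $greedy[$i] : 'undef';
}
```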
Re: Regular expressions - match
by jethro (Monsignor) on Jan 24, 2011 at 22:34 UTC

    Could I interest you in a more modern data structure for your result, a hash for example?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $string = "one=1,three=3,five=5";
    my %f = map { m/(\w+)=(.*)/ } split /,/, $string;
    print Dumper(\%f);

    # prints (hash key order may vary):
    # $VAR1 = {
    #           'three' => '3',
    #           'five' => '5',
    #           'one' => '1'
    #         };
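    The question's expected result uses '' for a field that is absent. Starting from a hash built as above, a minimal sketch (assuming the five field names from the question) can turn it back into an ordered list, substituting '' for missing keys with the defined-or operator // (Perl 5.10 or later):

```perl
use strict;
use warnings;

my $string = "one=1,three=3,five=5";

# Build name => value pairs, as in the reply above.
my %f = map { m/(\w+)=(.*)/ } split /,/, $string;

# Field order assumed from the question; missing fields become ''.
my @names  = qw(one two three four five);
my @values = map { $f{$_} // '' } @names;

print join('; ', map { "$names[$_]='$values[$_]'" } 0 .. $#names), "\n";
```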
Re: Regular expressions - match
by wind (Priest) on Jan 24, 2011 at 23:33 UTC

    There are a few problems with your regex, but the first is the extraneous use of .* as delimiters. The .* on the second line of your regex greedily matches everything else in the line, so none of the other optional groups is able to match anything.

    I would advise that you use Text::CSV to process the CSV file and then use a regex to split apart each section. However, forgoing that, something like this would work as well:

    use strict;

    while (my $line = <DATA>) {
        chomp $line;
        my %vals = ($line =~ m{(\w+)=([^,]*)}g);
        print "one: $vals{one}, two: $vals{two}, three: $vals{three}, four: $vals{four}, five: $vals{five}\n";
    }

    1;

    __DATA__
    one=1,two=2,three=3,four=4,five=5
    one=1,three=3,five=5
Re: Regular expressions - match
by flaviusm (Acolyte) on Jan 24, 2011 at 23:05 UTC

    I tried to simplify the requirements and I ended up confusing you. These are some more details:

    The data lines look like CSV, but I cannot rely on the comma delimiters (the syntax of a data line is not always correct), so I chose to match each value based on the text before and after its capture group.
    Because of the above, "split" is not an option, and I am not interested in using a hash to store the values (I have about 6 fields in total).
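    Under those constraints (no split, no hash, unreliable delimiters), one possible approach is to look up each field by name with its own small regex, so that a missing field or a stray delimiter does not disturb the others. This is a sketch, not the poster's method: the helper name extract_fields and the field list are hypothetical, and it assumes each value consists only of word characters (\w*).

```perl
use strict;
use warnings;

# Hypothetical field list; the real data reportedly has about 6 fields.
my @names = qw(one two three four five);

# Hypothetical helper: match each field independently by name.
# Assumes values are made of word characters; missing fields become ''.
sub extract_fields {
    my ($line) = @_;
    return map { $line =~ /\b\Q$_\E=(\w*)/ ? $1 : '' } @names;
}

my @r1 = extract_fields("one=1,two=2,three=3,four=4,five=5");
my @r2 = extract_fields("one=1,three=3;five=5");   # note the stray ';'

print join(', ', @r1), "\n";
print join(', ', @r2), "\n";
```

    Because no pattern depends on its neighbours, field order and broken separators only matter insofar as they end a value early.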

    I have modified the initial regexp based on eff_i_g's suggestion (making the .* non-greedy) and it seems to work:

    $string1 =~ m/
        (?:one=(.*?),)?
        .*?
        (?:two=(.*?),)?
        .*?
        (?:three=(.*?),)?
        .*?
        (?:four=(.*?),)?
        .*?
        (?:five=(.*?),)?
    /x;
    Thank you very much.
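    One caveat with the .*? version: every group requires a trailing comma, and the last field on each sample line (five=5) has none, so $5 stays undefined for both strings. A possible adjustment, not taken from the thread, is to let each value end at either a comma or the end of the string (\z); the sketch below also maps non-participating groups to '' as in the question's expected result.

```perl
use strict;
use warnings;

my %result;
for my $line ("one=1,two=2,three=3,four=4,five=5", "one=1,three=3,five=5") {
    # Same shape as the .*? pattern, but each value may end at a comma
    # or at the end of the string (\z), so the final field is captured too.
    my @v = $line =~ m/
        (?:one=(.*?)(?:,|\z))?   .*?
        (?:two=(.*?)(?:,|\z))?   .*?
        (?:three=(.*?)(?:,|\z))? .*?
        (?:four=(.*?)(?:,|\z))?  .*?
        (?:five=(.*?)(?:,|\z))?
    /x;
    # Groups that did not take part in the match come back as undef.
    $result{$line} = [ map { defined $_ ? $_ : '' } @v ];
    print "$line => ", join('; ', @{ $result{$line} }), "\n";
}
```

    This still relies on the fields appearing in a fixed order with nothing unexpected between them, since the lazy .*? only expands when the rest of the pattern would otherwise fail.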
Re: Regular expressions - match
by shree (Acolyte) on Jan 25, 2011 at 06:39 UTC
    my $line = 'one=1,two=2,three=3,four=4,five=5';
    $line =~ /\w+\=(\d+)\,\w+\=(\d+)\,\w+\=(\d+)\,\w+\=(\d+)\,\w+\=(\d+)/ig;
    print "\$1=$1, \$2=$2, \$3=$3, \$4=$4, \$5=$5\n";
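    This pattern hard-codes exactly five name=value pairs, so it only matches when every field is present. A quick check against the question's second sample line (an illustration, not part of the reply; the /i and /g flags and the backslash escapes are dropped because they are not needed) shows the match failing there:

```perl
use strict;
use warnings;

# shree's pattern without the unnecessary escapes and flags.
my $re = qr/\w+=(\d+),\w+=(\d+),\w+=(\d+),\w+=(\d+),\w+=(\d+)/;

my %matched;
for my $line ('one=1,two=2,three=3,four=4,five=5', 'one=1,three=3,five=5') {
    $matched{$line} = $line =~ $re ? 1 : 0;
    print "$line: ", ($matched{$line} ? "matched" : "no match"), "\n";
}
```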

Node Type: perlquestion [id://884012]
Approved by kennethk