PerlMonks  

Re: dumb regex question

by Nkuvu (Priest)
on Apr 07, 2009 at 00:10 UTC


in reply to dumb regex question

I'd change the regex to exclude quotes, rather than match everything: m,"?(/[^"]*)"?,

Test script (including some lines where I tried to break the match):

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ m,"?(/[^"]*)"?,) {
        print "Line matched: $line ($1)\n";
    }
    else {
        print "Line didn't match: $line\n";
    }
}

__DATA__
"/moreIters 10"
"/bootMe any text here"
/fewIter
/some stuff here
"/albatross" foo bar baz
monkeys
leprechauns /not monkeys
/gnomes "not leprechauns though"

Output:

Line matched: "/moreIters 10" (/moreIters 10)
Line matched: "/bootMe any text here" (/bootMe any text here)
Line matched: /fewIter (/fewIter)
Line matched: /some stuff here (/some stuff here)
Line matched: "/albatross" foo bar baz (/albatross)
Line didn't match: monkeys
Line matched: leprechauns /not monkeys (/not monkeys)
Line matched: /gnomes "not leprechauns though" (/gnomes )


Re^2: dumb regex question
by linuxfan (Beadle) on Apr 07, 2009 at 00:38 UTC
    I just noticed that this regex fails for the following input:
    /gnomes more data here
    My expected string is only /gnomes, whereas it matches everything up to the end of the line. Any idea how to fix this?

    thanks

      if (m{"(/[^"]+)"|(/\S+)}) { my $match = defined $1 ? $1 : $2; ... }
      Or whatever's appropriate instead of \S.

      Update: Fixed slashes
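A minimal sketch of the alternation above, run against a few lines from this thread. The helper name extract_command and the sample list are only for illustration and are not part of the original reply; swap \S for whatever class suits the real input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# extract_command is a hypothetical helper name, not from the thread.
# It returns the quoted /command (spaces allowed) when the first
# alternative matches, the bare /word when the second one does,
# and undef when the line contains neither.
sub extract_command {
    my ($line) = @_;
    if ($line =~ m{"(/[^"]+)"|(/\S+)}) {
        return defined $1 ? $1 : $2;
    }
    return;
}

# Sample lines taken from the test data used earlier in the thread.
my @lines = (
    '"/bootMe any text here"',
    '/gnomes more data here',
    '/gnomes "not leprechauns though"',
    'monkeys',
);

for my $line (@lines) {
    my $match = extract_command($line);
    if (defined $match) {
        print "Line matched: $line ($match)\n";
    }
    else {
        print "Line didn't match: $line\n";
    }
}
```

With these inputs the quoted line keeps its embedded spaces, while both /gnomes lines yield just /gnomes. An unquoted command with arguments is cut at the first whitespace, which is the behaviour asked for above.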

        ...yeah. Or that. Although the regex as given needs a tweak, with embedded slashes in there.

        If it weren't late in the day on a Monday, I might have come up with a regex that would work. Maybe. But at least the Text::CSV_XS solution is not totally wrong.

      With that additional qualification, it will get a bit more tricky. My first thought was to add a space to the character class: m,"?(/[^" ]*)"?,

      But that doesn't work, because the character class can't tell whether a space falls inside or outside the quotes, so the match stops at the first space either way. That means it would capture just "/bootMe" from the line "/bootMe any text here".
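A short sketch of that failure, assuming the same regex with the space added to the character class. The sub name capture_with_space_class is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# capture_with_space_class is a hypothetical name for illustration only.
# It applies the attempted fix: a space added to the negated class.
sub capture_with_space_class {
    my ($line) = @_;
    my ($captured) = $line =~ m,"?(/[^" ]*)"?,;
    return $captured;
}

# One unquoted line (where the fix helps) and one quoted line
# (where it cuts the capture short).
for my $line ('/gnomes more data here', '"/bootMe any text here"') {
    my $captured = capture_with_space_class($line);
    print "Captured ($captured) from $line\n";
}
```

The unquoted line now gives /gnomes as wanted, but the quoted line gives only /bootMe: the class stops at the first space whether or not a quote is open.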

      I'd suggest looking into a module like Text::xSV or Text::CSV_XS and setting the delimiter to spaces. Then reject any entry that doesn't have a leading slash. This means dropping the regex entirely.

      Something like:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::CSV_XS;

      my $csv = Text::CSV_XS->new ({sep_char => ' '});

      while (my $line = <DATA>) {
          chomp $line;
          # See perldoc Text::CSV_XS for warnings
          # about this approach with possible embedded
          # newlines:
          my $status = $csv->parse($line);
          my @fields;
          if ($status) {
              @fields = $csv->fields();
          }
          else {
              warn "Problem parsing $line\n";
          }
          for my $field (@fields) {
              print "Captured ($field) from $line\n" if $field =~ m!^/!;
          }
      }

      __DATA__
      "/moreIters 10"
      "/bootMe any text here"
      /fewIter
      /some stuff here
      "/albatross" foo bar baz
      monkeys
      leprechauns /not monkeys
      /gnomes "not leprechauns though"
      /gnomes more data here

      Which gives the output:

      Captured (/moreIters 10) from "/moreIters 10"
      Captured (/bootMe any text here) from "/bootMe any text here"
      Captured (/fewIter) from /fewIter
      Captured (/some) from /some stuff here
      Captured (/albatross) from "/albatross" foo bar baz
      Captured (/not) from leprechauns /not monkeys
      Captured (/gnomes) from /gnomes "not leprechauns though"
      Captured (/gnomes) from /gnomes more data here

Re^2: dumb regex question
by linuxfan (Beadle) on Apr 07, 2009 at 00:24 UTC
    Thank you so much. This is exactly what I wanted.
