PerlMonks  

Re^2: dumb regex question

by linuxfan (Beadle)
on Apr 07, 2009 at 00:38 UTC (#755901)


in reply to Re: dumb regex question
in thread dumb regex question

I just noticed that this regex fails for the following input:

/gnomes more data here
My expected string is only /gnomes, whereas it matches everything up to the end of the line. Any idea how to fix this?

thanks


Re^3: dumb regex question
by Nkuvu (Priest) on Apr 07, 2009 at 01:01 UTC

    With that additional qualification, it will get a bit more tricky. My first thought was to add a space to the character class: m,"?(/[^" ]*)"?,

    But that doesn't work, because the character class doesn't care whether the space it finds is inside or outside quotes; the match stops at the first space either way. That means it would capture just /bootMe from the line "/bootMe any text here".
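A minimal sketch of that failure, running the regex above against two of the sample lines from this thread. The variable names ($re, @captured) are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The attempt with a space added to the character class.
my $re = qr{"?(/[^" ]*)"?};

my @captured;
for my $line ('"/bootMe any text here"', '/gnomes more data here') {
    if ($line =~ $re) {
        # The capture stops at the first space, quoted or not.
        push @captured, $1;
        print "Captured ($1) from $line\n";
    }
}
```

The unquoted line now gives /gnomes as wanted, but the quoted line is cut short at /bootMe.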

    I'd suggest looking into a module like Text::xSV or Text::CSV_XS and setting the delimiter to spaces. Then reject any entry that doesn't have a leading slash. This means dropping the regex entirely.

    Something like:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({sep_char => ' '});

    while (my $line = <DATA>) {
        chomp $line;
        # See perldoc Text::CSV_XS for warnings
        # about this approach with possible embedded
        # newlines:
        my $status = $csv->parse($line);
        my @fields;
        if ($status) {
            @fields = $csv->fields();
        }
        else {
            warn "Problem parsing $line\n";
        }

        for my $field (@fields) {
            print "Captured ($field) from $line\n" if $field =~ m!^/!;
        }
    }

    __DATA__
    "/moreIters 10"
    "/bootMe any text here"
    /fewIter
    /some stuff here
    "/albatross" foo bar baz
    monkeys
    leprechauns /not monkeys
    /gnomes "not leprechauns though"
    /gnomes more data here

    Which gives the output:

    Captured (/moreIters 10) from "/moreIters 10"
    Captured (/bootMe any text here) from "/bootMe any text here"
    Captured (/fewIter) from /fewIter
    Captured (/some) from /some stuff here
    Captured (/albatross) from "/albatross" foo bar baz
    Captured (/not) from leprechauns /not monkeys
    Captured (/gnomes) from /gnomes "not leprechauns though"
    Captured (/gnomes) from /gnomes more data here

Re^3: dumb regex question
by ikegami (Pope) on Apr 07, 2009 at 01:15 UTC
    if (m{"(/[^"]+)"|(/\S+)}) {
        my $match = defined $1 ? $1 : $2;
        ...
    }
    Or whatever's appropriate instead of \S.

    Update: Fixed slashes
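For what it's worth, here is a rough sketch of that alternation run against a few of the sample lines from this thread. The loop, @lines and @captured are illustrative additions, not part of the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quoted alternative first; the bare /\S+ alternative handles unquoted input.
my $re = qr{"(/[^"]+)"|(/\S+)};

my @lines = (
    '"/moreIters 10"',
    '"/bootMe any text here"',
    '/fewIter',
    '/gnomes more data here',
);

my @captured;
for my $line (@lines) {
    next unless $line =~ $re;
    # Only one of the two groups is defined, depending on which branch matched.
    my $match = defined $1 ? $1 : $2;
    push @captured, $match;
    print "Captured ($match) from $line\n";
}
```

Quoted entries keep their embedded spaces, while unquoted ones stop at the first whitespace, which is the behaviour asked for with the /gnomes line.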

      ...yeah. Or that. Although the regex as given needs a tweak, with embedded slashes in there.

      If it wasn't late in the day on a Monday, I might have come up with a regex that would work. Maybe. But at least the Text::CSV_XS solution is not totally wrong.
