PerlMonks  

Extract pattern match from file

by kepler (Scribe)
on Sep 17, 2016 at 16:02 UTC ( [id://1172009]=perlquestion )

kepler has asked for the wisdom of the Perl Monks concerning the following question:

Hello again

I'm trying to extract from a file (which is loaded into a variable) all the strings between double quotes; that is, "example 1 for instance" should be saved in the array as example 1 for instance. I'm trying to use this code: my @strings = $data =~ /\"[^\"]+\"/g; but it's extracting almost every line... Any help would be appreciated...

Kepler

Replies are listed 'Best First'.
Re: Extract pattern match from file
by haukex (Archbishop) on Sep 17, 2016 at 16:14 UTC

    Hi kepler,

    Try Regexp::Common:

    use warnings;
    use strict;
    use Regexp::Common qw/delimited/;

    while (<DATA>) {
        while (/$RE{delimited}{-delim=>'"'}{-keep}/g) {
            my $str = $3;
            print "<$str>\n";
        }
    }
    __DATA__
    nothing "hello" foo "bar" quz "hello" "world" foo "bar" quz "baz" blah "" blah "" blah nothing

    Outputs:

    <hello>
    <bar>
    <hello>
    <world>
    <bar>
    <baz>
    <>
    <>

    There's also the core module Text::Balanced, but I don't like its API as much.

    Hope this helps,
    -- Hauke D

Re: Extract pattern match from file
by AnomalousMonk (Archbishop) on Sep 17, 2016 at 20:07 UTC

    FWIW and just as a matter of interest, the reason your OPed regex
        my @strings = $data =~ /\"[^\"]+\"/g;
    was "... extracting almost every line..." may be because it will not handle an empty (i.e., zero-length) string properly: the  [^\"]+ regex sub-expression requires at least one non-double-quote character. If there is any  "" empty string in the text, parsing would get "out of sync" by taking the end quote of the empty quote as the start of the spurious body of a quote.

    use warnings;
    use strict;
    use Data::Dump qw(dd);

    my $data = do { local $/; <DATA> };
    my @strings = $data =~ /\"[^\"]+\"/g;
    dd \@strings;
    __DATA__
    nothing "hello" foo "bar" quz "hello2" "world" foo2 "bar2" quz2 "baz" blah blah2 "" blah3
    many
    lines
    of
    unquoted stuff
    "example 1 for instance"
    Output:
    c:\@Work\Perl\monks\kepler>perl extract_double_quote_bodies_2.pl
    [
      "\"hello\"",
      "\"bar\"",
      "\"hello2\"",
      "\"world\"",
      "\"bar2\"",
      "\"baz\"",
      "\" blah3\nmany\nlines\nof\nunquoted stuff\n\"",
    ]
    Note that  [^"] "not a double-quote" includes the newline character.

    Update: Also note that  /"[^"]+"/g and  /"[^"]*"/g will not properly handle a double-quoted string containing an escaped double-quote (e.g., "x\"y") and will end up "out of sync" in the same way as  /"[^"]+"/g with an empty string.
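
    As a rough sketch of one way around the escaped-quote problem noted in the update above: the pattern below treats a backslash plus the following character as a single unit, so an escaped quote no longer ends the match. The sub name extract_quoted and the sample text are made up for illustration, and the assumption that backslash is the only escape mechanism in the data is mine, not the OP's.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the bodies of all
# double-quoted strings in $text, assuming backslash is the escape character.
sub extract_quoted {
    my ($text) = @_;
    # "        opening quote
    # [^"\\]   any character that is neither a quote nor a backslash, or
    # \\.      a backslash followed by any one character (an escaped pair)
    # "        closing quote
    # /s lets . match a newline, /g returns every capture in list context.
    return $text =~ /"((?:[^"\\]|\\.)*)"/gs;
}

# Single-quoted so the backslash reaches the regex untouched.
my @found = extract_quoted('a "x\"y" b "" c "z"');
print "<$_>\n" for @found;
# prints:
# <x\"y>
# <>
# <z>
```

    The captured bodies still contain the backslashes; removing them would be a separate step. The sub must be called in list context to get the captures back.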


    Give a man a fish:  <%-{-{-{-<

Re: Extract pattern match from file
by Marshall (Canon) on Sep 17, 2016 at 17:02 UTC
    Hi Kepler,

    I just tweaked your regex a bit and used test data from haukex. I like his regexp-common solution, but for something simple, you are close.

    use warnings;
    use strict;

    my $data = do { local $/; <DATA> };
    my @strings = $data =~ /\"([^\"]*)\"/g;
    print map{"<$_>\n"}@strings;

    =Prints
    <hello>
    <bar>
    <hello>
    <world>
    <bar>
    <baz>
    <>
    <>
    <example 1 for instance>
    =cut
    __DATA__
    nothing "hello" foo "bar" quz "hello" "world" foo "bar" quz "baz" blah "" blah "" blah nothing "example 1 for instance"

    Update: I saw the post from AnomalousMonk regarding "". That is why I changed the + to a *, to handle that situation. And yes, if a quoted string spanned two lines, the newline would be captured and would have to be dealt with in some way.

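
    One possible way of "dealing with" a captured newline, sketched under the assumption that a line break inside a quoted string should simply become a single space (the thread does not say what the OP wants here). The sub name extract_quoted_flat and the sample text are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): extract the quoted bodies, then
# fold each line break inside a body, with any whitespace around it, into
# a single space.
sub extract_quoted_flat {
    my ($text) = @_;
    my @strings = $text =~ /"([^"]*)"/g;
    # "for" aliases $_ to each element, so the substitution edits @strings in place.
    s/\s*\n\s*/ /g for @strings;
    return @strings;
}

my @found = extract_quoted_flat(qq{foo "two\nlines" bar "one"\n});
print "<$_>\n" for @found;
# prints:
# <two lines>
# <one>
```

    If multi-line strings should be rejected rather than folded, reading the file line by line, as in the Regexp::Common reply, is probably the simpler route.
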
Re: Extract pattern match from file
by johngg (Canon) on Sep 17, 2016 at 21:47 UTC

    It is also worth pointing out that double quotes are not regular expression meta-characters and do not need to be escaped.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
    open my $inFH, q{<}, \ <<EOD or die $!;
    nothing "hello" foo "bar" quz "hello2" "world" foo2 "bar2" quz2 "baz" blah blah2 "" blah3
    many
    lines
    of
    unquoted stuff
    "example 1 for instance"
    EOD

    my $data = do { local $/; <$inFH>; };
    close $inFH or die $!;

    say qq{-->$1<--} while $data =~ m{"([^"]*)"}g;'
    -->hello<--
    -->bar<--
    -->hello2<--
    -->world<--
    -->bar2<--
    -->baz<--
    --><--
    -->example 1 for instance<--
    johngg@shiraz:~/perl/Monks >

    I hope this is helpful.

    Cheers,

    JohnGG
