PerlMonks  

Re^2: Match on line, read backwards to opening xml tag then forward to closing tag

by shadowfox (Beadle)
on Nov 14, 2011 at 20:41 UTC [id://938021]


in reply to Re: Match on line, read backwards to opening xml tag then forward to closing tag
in thread Match on line, read backwards to opening xml tag then forward to closing tag

Several interesting ideas are floating around, but I'd like to try one like this; jethro's is the closest to what I'd like to use. I realized my initial XML example was flawed, so let me try again with a clearer example.
<DataStore>
  <DataRecord>
    <Data>123456</Data>
    <Data2>654321</Data2>
    <Data>123456</Data>
  </DataRecord>
  <DataRecord>
    <Data>123456</Data>
    <Data>123456</Data>
    <Data2>123456</Data2>
    <Data>1234/3456</Data>
    <Data>123456</Data>
    <Data>1234/3456</Data>
    <Data3>123456</Data3>
    <Data>123456</Data>
  </DataRecord>
  <DataRecord>
    <Data>123456</Data>
    <Data>123456</Data>
    <Data5>123456</Data5>
  </DataRecord>
</DataStore>

From that I want it to loop through and store each <DataRecord> ... </DataRecord>. Then, if the record matches on 4 digits followed by a forward slash, I want it to output the whole <DataRecord> to the screen, not just the matched lines from the second filter. For that, I've tried this:

open(FILE, "< $FILE") or die "ERROR: $!";
while (<>) {
    if (/<DataRecord>/ ... /<\/DataRecord>/) {
        @cache=();
    }
    push(@cache, $_);
    if (m/<Data>\d{4}\//){
        print @cache;
    }
}
close (FILE);

The output of that is:

<Data>1234/3456</Data>
<Data>1234/3456</Data>

where I would prefer to see:

<Data>123456</Data>
<Data>123456</Data>
<Data2>123456</Data2>
<Data>1234/3456</Data>
<Data>123456</Data>
<Data>1234/3456</Data>
<Data3>123456</Data3>
<Data>123456</Data>
I wrote it several different ways, and it either prints every <DataRecord> or only the filtered <Data> lines; neither is what I need. I want it to print the entire <DataRecord> if that record matches the second pattern. Clearly I'm doing something wrong, but I'm not seeing what, so I assume it's glaringly obvious.

Re^3: Match on line, read backwards to opening xml tag then forward to closing tag
by choroba (Cardinal) on Nov 14, 2011 at 23:39 UTC
    Move the push inside the if block, so that only the lines between the matches are cached. Also, your condition for printing is tested on every line, so the program might print too early: only set a flag, and print after the whole record has been read if the flag is set.
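A minimal sketch of choroba's suggestion (not code from the thread), assuming one tag per line as in the sample data; the sub name matching_records and the trimmed in-memory sample are hypothetical. It relies on the return value of the ... flip-flop, which numbers the lines of the current range starting at 1 and appends "E0" to the last one, so the cache is cleared only on a record's first line and the flag is checked only on its last.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every <DataRecord> ... </DataRecord> block
# (tags included, as one string) that contains a <Data> value starting
# with four digits and a slash. Assumes one tag per line.
sub matching_records {
    my ($fh) = @_;
    my (@cache, @out);
    my $found = 0;
    while (my $line = <$fh>) {
        # True from the <DataRecord> line through the </DataRecord> line.
        # $seq is 1, 2, 3, ...; the last value in the range ends in "E0".
        my $seq = ($line =~ /<DataRecord>/ ... $line =~ m{</DataRecord>});
        next unless $seq;                     # outside any record
        @cache = () if $seq == 1;             # first line of a new record
        push @cache, $line;                   # cache only lines inside a record
        $found = 1 if $line =~ m{<Data>\d{4}/};
        if ($seq =~ /E0\z/) {                 # last line: decide only now
            push @out, join('', @cache) if $found;
            $found = 0;
        }
    }
    return @out;
}

# Trimmed sample for illustration only.
my $xml = <<'XML';
<DataStore>
  <DataRecord>
    <Data>123456</Data>
    <Data2>654321</Data2>
  </DataRecord>
  <DataRecord>
    <Data>123456</Data>
    <Data>1234/3456</Data>
  </DataRecord>
</DataStore>
XML

open my $fh, '<', \$xml or die "Cannot open in-memory data: $!";
print for matching_records($fh);
```

To read the real file, open it with open my $fh, '<', $FILE or die "ERROR: $!"; and pass that handle in instead.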
Re^3: Match on line, read backwards to opening xml tag then forward to closing tag
by jethro (Monsignor) on Nov 15, 2011 at 10:32 UTC

    Probably you changed my script because I used "<DataStart>" and "<DataEnd>" instead of the correct "<Dataentry>" and "</Dataentry>" in my regexes. Sorry about that mistake, but apart from that my script works (I tested it just now to be sure). Just use the right strings in the regexes and the script will work, even with the new data you provided.

    my @cache;
    my $found = 0;
    while (<$file>) {
        if (/stringtobefound/) {
            $found++;
        }
        if (/<start of record>/) {
            @cache = ();
        }
        push(@cache, $_);
        # print "-----------\nFound is $found, Cache is\n@cache\n--------------\n";
        if (/<\/end of record>/ and $found) {
            print @cache;
            $found = 0;
        }
    }

    A tip on general debugging: if something doesn't work, print out the important variables, watch what your script is doing, and find the first place where it does something different from what it should. See the commented-out line for an example; with it you can see whether the cache works or not.
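For reference, here is a sketch of jethro's script with the placeholder regexes replaced by the tags from the new sample data. Wrapping it in a sub, the name print_matching_records, and the extra output-filehandle parameter are assumptions added so the result can be sent to STDOUT or captured; the logic follows his version: reset the cache at each <DataRecord>, count matches in $found, and print the cache at </DataRecord> only when $found is set.

```perl
use strict;
use warnings;

# jethro's approach with the placeholders filled in (hypothetical sub name).
# $file is the input handle, $out the handle the matching records go to.
sub print_matching_records {
    my ($file, $out) = @_;
    my @cache;
    my $found = 0;
    while (<$file>) {
        $found++ if m{<Data>\d{4}/};      # four digits and a slash
        @cache = () if /<DataRecord>/;    # a new record starts: drop old lines
        push @cache, $_;
        # Debugging aid: uncomment to watch the cache grow.
        # print STDERR "Found is $found, cache is:\n@cache\n";
        if (m{</DataRecord>} and $found) {
            print {$out} @cache;          # whole record, tags included
            $found = 0;
        }
    }
}

# Trimmed sample for illustration only.
my $xml = <<'XML';
<DataStore>
  <DataRecord>
    <Data>123456</Data>
  </DataRecord>
  <DataRecord>
    <Data>1234/3456</Data>
  </DataRecord>
</DataStore>
XML

open my $fh, '<', \$xml or die "Cannot open in-memory data: $!";
print_matching_records($fh, \*STDOUT);
```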

      Thanks jethro, that is exactly what I wanted. Also thanks, choroba, you're right about where my logic failed; I was focusing too hard on it to see what the issue was. And thanks to everyone else who replied, too. I know XML parsing is the better way to go when possible, and it looks much easier to use.
Re^3: Match on line, read backwards to opening xml tag then forward to closing tag
by furry_marmot (Pilgrim) on Nov 16, 2011 at 20:08 UTC

    I want it to loop through and store each <DataRecord>...</DataRecord>. Then, if it matches on 4 digits followed by a forward slash I want it to output the whole <DataRecord> to screen, not just the matched lines from second filter.

    Okay, try this. You were very close, but it seems a bit more complicated than necessary. Also, is the cache just to hold the matches until you print them? If so, you could eliminate that step entirely.

    open(FILE, "< $FILE") or die "ERROR: $!";
    my $data;
    {
        local $/ = undef;
        $data = <FILE>;
    }
    while ( $data =~ m{<DataRecord>(.+?)</DataRecord>}sg ) {
        my $rec = $1;
        if ( $rec =~ m{\d+/\d+} ) {
            push @cache, $rec;
            print "$rec";
        }
    }
    close (FILE);
    Prints:

    $ test.pl

    <Data>123456</Data>
    <Data>123456</Data>
    <Data2>123456</Data2>
    <Data>1234/3456</Data>
    <Data>123456</Data>
    <Data>1234/3456</Data>
    <Data3>123456</Data3>
    <Data>123456</Data>
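The loop above prints only $1, the text between the tags, so the <DataRecord> and </DataRecord> lines themselves are not part of the output. A variant that captures the whole element and uses the stricter four-digits-and-a-slash test from the original question might look like this; it is a sketch, and the sub name matching_records_slurp and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return each complete <DataRecord> ... </DataRecord>
# element (tags included) that has a <Data> value starting with four digits
# and a slash. /s lets . match newlines; .*? stops at the first close tag.
sub matching_records_slurp {
    my ($data) = @_;
    my @out;
    while ($data =~ m{(<DataRecord>.*?</DataRecord>)}sg) {
        my $rec = $1;
        push @out, $rec if $rec =~ m{<Data>\d{4}/};
    }
    return @out;
}

# Trimmed sample; in practice $data would hold the slurped file.
my $data = <<'XML';
<DataStore>
  <DataRecord>
    <Data>123456</Data>
  </DataRecord>
  <DataRecord>
    <Data>123456</Data>
    <Data>1234/3456</Data>
  </DataRecord>
</DataStore>
XML

print "$_\n" for matching_records_slurp($data);
```

As in the code above, the file can be read into $data in one go by setting local $/ = undef before reading. Regexes over raw XML like this only hold up while the markup stays as regular as the sample.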
