
Split files based on regexp

by TonySoprano (Initiate)
on May 17, 2012 at 10:51 UTC ( [id://971037]=perlquestion )

TonySoprano has asked for the wisdom of the Perl Monks concerning the following question:

Hello there.

I am reading one big text file that contains many lines describing events. The values below are the real ones: it's a combat log from a game that I am parsing.

Here is what I want to do: read the whole file and find the places where combat starts. This part is working; I match against /EnterCombat/.

    foreach my $line (@AR_WHOLE_FILE) {
        if ( $line =~ /EnterCombat/ ) {
            # split's first argument is the pattern; ' ' splits on whitespace
            # (this assumes the timestamp is the first field on the line)
            ( $timestamp ) = split ' ', $line;
            print $timestamp . "\n";
        }
    }

I want to keep parsing the data after the match until I find ExitCombat, and collect all the targets between EnterCombat and ExitCombat. This is so I can find the 'boss' fights in the game.

I need some hints on how to approach this. I don't know how to keep reading ahead when I am already iterating line by line: how do I advance while I am inside a foreach loop?

Replies are listed 'Best First'.
Re: Split files based on regexp
by RichardK (Parson) on May 17, 2012 at 11:20 UTC
Re: Split files based on regexp
by druthb (Beadle) on May 17, 2012 at 11:26 UTC

    Here's some meta-code that should show the logic of how I'd approach this:

    my $in_combat = 0;
    # open the file for input
    LINE: while (my $line=readline($input_file)) {
        if (# line contains start-combat flag) {
            $in_combat = 1;
        }
        next LINE if (!$in_combat);
        # do stuff with lines *during* combat--beginning and end will be here, too!
        if (# line contains end-combat flag) {
            $in_combat = 0;
        }
    }
    # close files, etcetc
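
For anyone who wants to run the meta-code above, here is a minimal sketch of the same flag idea. It assumes the markers are the literal strings EnterCombat and ExitCombat from the question; the sub name fights_by_flag and the sample log lines are made up for illustration, and an in-memory filehandle stands in for the real log file.

```perl
use strict;
use warnings;

# Group the lines of each fight, using a flag to remember whether we are
# between an EnterCombat line and an ExitCombat line.
# fights_by_flag is a hypothetical name, not something from the thread.
sub fights_by_flag {
    my ($fh) = @_;
    my $in_combat = 0;
    my @fights;                          # one array ref of lines per fight
    LINE: while (my $line = readline($fh)) {
        chomp $line;
        if ($line =~ /EnterCombat/) {
            $in_combat = 1;
            push @fights, [];            # a new fight starts here
        }
        next LINE if !$in_combat;
        push @{ $fights[-1] }, $line;    # start and end lines are kept, too
        $in_combat = 0 if $line =~ /ExitCombat/;
    }
    return @fights;
}

# Made-up sample log; only the two marker names come from the question.
my $log = <<'END';
idle chatter
EnterCombat
hit the boss
ExitCombat
more chatter
EnterCombat
hit a rat
ExitCombat
END

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my @fights = fights_by_flag($fh);
printf "fight %d: %s\n", $_ + 1, join(' | ', @{ $fights[$_] })
    for 0 .. $#fights;
```

With a real log you would open the file instead of the string; the loop still reads one line at a time, so the whole file never has to sit in memory.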

    TIMTOWTDI, of course.

    D Ruth Bavousett

      druthb:

      You may want to experiment with the flip/flop operator "..". It basically does the work of your $in_combat logic. It takes a little while to get used to, but it certainly makes the code a bit simpler once you're used to it:

      $ cat krunk.pl
      #!/usr/bin/perl
      #
      # Demo for flip-flop operator
      #
      use 5.14.0;
      use warnings;
      use autodie;

      while (<DATA>) {
          if (/start/ .. /stop/) {
              # in combat!
              print "COMBAT: $_";
          }
      }

      __DATA__
      soon the combat will begin
      start combat!
      Biff! Pow! Bam!
      (Looks like Batman!)
      stop combat
      fight had three superlatives

      $ perl krunk.pl
      COMBAT: start combat!
      COMBAT: Biff! Pow! Bam!
      COMBAT: (Looks like Batman!)
      COMBAT: stop combat
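
A possible extension of the demo above, as a sketch only: in scalar context the flip-flop returns a sequence number while it is true, and the last value of each range has "E0" appended, so it can also tell you where one fight ends and the next begins. The markers EnterCombat and ExitCombat are taken from the question; the sub name fights_by_flipflop and the sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Split a log into fights with the flip-flop operator.
# fights_by_flipflop is a hypothetical name, not something from the thread.
sub fights_by_flipflop {
    my ($fh) = @_;
    my (@fights, @current);
    while (my $line = <$fh>) {
        chomp $line;
        # scalar() makes sure ".." is the flip-flop, not a list range.
        # It yields "" outside a range, and 1, 2, ... inside it; the final
        # line of each range gets "E0" appended (for example "3E0").
        my $seq = scalar( $line =~ /EnterCombat/ .. $line =~ /ExitCombat/ );
        next unless $seq;
        push @current, $line;
        if ($seq =~ /E0\z/) {            # ExitCombat line: close this fight
            push @fights, [@current];
            @current = ();
        }
    }
    return @fights;
}

# Made-up sample log; only the two marker names come from the question.
my $log = <<'END';
idle chatter
EnterCombat
hit the boss
ExitCombat
more chatter
EnterCombat
hit a rat
ExitCombat
END

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my @fights = fights_by_flipflop($fh);
print scalar(@fights), " fights found\n";
```

One thing to keep in mind: the operator remembers its state between calls, so a log that ends in the middle of a fight leaves it "on" for the next file read by the same code.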

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        Brilliant! I had not seen that operator before. Thanks oodles, roboticus!

        D Ruth Bavousett
Re: Split files based on regexp
by ww (Archbishop) on May 17, 2012 at 14:15 UTC

    Yet another approach (TIMTOWTDI): set the input record separator, $/, so that you can collect the log entries stanza by stanza. IOW, look for the end of a log entry, "stop combat", and slurp what precedes it, repeatedly.

    Borrowing from and extending roboticus' excellent sample data:

    #!/usr/bin/perl
    use 5.014;
    # 971051

    $/ = "stop combat";
    my @array;
    my $stanza;

    for ( <DATA> ) {
        push @array, $_;
    }

    for $stanza ( @array ) {
        $stanza =~ s/\n/ /g;
        if ( $stanza =~ /.*?(?=start combat!)(.*?)(?=stop combat)/s ) {
            my $values = $1;
            $values =~ s/start combat!//;
            say "Log Entry: $values";
        } else {
            say "Whoops.";
        }
    }

    __DATA__
    soon the combat will begin
    start combat!
    Biff! Pow! Bam!
    (Looks like Batman!)
    stop combat
    fight had three superlatives
    start combat!
    Smash Bang Boom!
    stop combat
    more garbage... here (even tho log is not described as having non-value inclusions.
    start combat!
    foo
    stop combat

    Output:

    Log Entry: Biff! Pow! Bam! (Looks like Batman!)
    Log Entry: Smash Bang Boom!
    Log Entry: foo
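
A variation on the record-separator idea above, offered as a sketch rather than a drop-in replacement: it localizes $/ so the change does not leak into other code, reads one stanza at a time instead of collecting them all first, and uses chomp to strip the "stop combat" marker. The sub name log_entries is hypothetical; the sample data is a shortened copy of the data used in this thread.

```perl
use strict;
use warnings;

# Return the text between each "start combat!" and "stop combat".
# log_entries is a hypothetical name, not something from the thread.
sub log_entries {
    my ($fh) = @_;
    local $/ = "stop combat";            # each read now ends at this marker
    my @entries;
    while (my $stanza = <$fh>) {
        # chomp removes a trailing $/ and returns how many characters it
        # removed, so 0 means this chunk never reached "stop combat".
        next unless chomp $stanza;
        next unless $stanza =~ /start combat!(.*)\z/s;
        my $values = $1;
        $values =~ s/\s+/ /g;            # fold newlines into single spaces
        $values =~ s/^ //;
        $values =~ s/ \z//;
        push @entries, $values;
    }
    return @entries;
}

my $text = <<'END';
soon the combat will begin
start combat!
Biff! Pow! Bam!
(Looks like Batman!)
stop combat
fight had three superlatives
start combat!
Smash Bang Boom!
stop combat
END

open my $fh, '<', \$text or die "cannot open in-memory log: $!";
my @entries = log_entries($fh);
print "Log Entry: $_\n" for @entries;
```

Because the trailing newline after the last "stop combat" is read as one more chunk without a marker, the chomp check quietly skips it.
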
Re: Split files based on regexp
by Anonymous Monk on May 17, 2012 at 11:12 UTC

    If it is a big file, read it one line at a time instead of slurping into an array.

    Look up Finite-State Machine (FSM) on Google. Your app toggles between SEARCHING_FOR_ENTERCOMBAT and FIND_THE_BOSS states.
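
A rough sketch of what such a two-state machine might look like, reading one line at a time. The state names are the ones suggested above. Everything about the log format is an assumption made for illustration: the question does not show real log lines, so the target=Name field, the sample log, and the sub name targets_per_fight are all hypothetical.

```perl
use strict;
use warnings;

# Collect the distinct targets seen in each fight with a two-state machine.
# targets_per_fight is a hypothetical name, not something from the thread.
sub targets_per_fight {
    my ($fh) = @_;
    my (@fights, %seen);

    # One handler per state; each returns the name of the next state.
    my %handler = (
        SEARCHING_FOR_ENTERCOMBAT => sub {
            my ($line) = @_;
            return 'SEARCHING_FOR_ENTERCOMBAT' unless $line =~ /EnterCombat/;
            %seen = ();                  # fresh target list for this fight
            return 'FIND_THE_BOSS';
        },
        FIND_THE_BOSS => sub {
            my ($line) = @_;
            if ($line =~ /ExitCombat/) {
                push @fights, [ sort keys %seen ];
                return 'SEARCHING_FOR_ENTERCOMBAT';
            }
            # ASSUMPTION: targets appear as "target=Name"; adjust to the
            # real log format.
            $seen{$1}++ if $line =~ /target=(\w+)/;
            return 'FIND_THE_BOSS';
        },
    );

    my $state = 'SEARCHING_FOR_ENTERCOMBAT';
    while (my $line = <$fh>) {           # one line at a time, no slurping
        $state = $handler{$state}->($line);
    }
    return @fights;
}

# Entirely made-up log lines, for illustration only.
my $log = <<'END';
12:00:00 EnterCombat
12:00:01 Damage target=Rat
12:00:02 Damage target=Rancor
12:00:03 Damage target=Rat
12:00:04 ExitCombat
12:00:05 Heal target=Self
12:01:00 EnterCombat
12:01:01 Damage target=Droid
12:01:02 ExitCombat
END

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my @fights = targets_per_fight($fh);
print "fight targets: @$_\n" for @fights;
```

The Heal line falls outside any fight, so it is ignored; deciding which target is the 'boss' is left to whatever rule fits the real data.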
