
Extract Block Of Text From Log

by ImJustAFriend (Scribe)
on Nov 20, 2018 at 22:02 UTC

ImJustAFriend has asked for the wisdom of the Perl Monks concerning the following question:

Greetings again. I have tried I don't know how many different things to get this block extracted from this log file. Here is a sample of the log showing 2 records:
    <13>Nov 13 17:27:25 OamCOMM[12260]:
    TIMESTAMP=Tue Nov 13 17:27:25 2018
    MSGCLS=OAMOPE
    Title=OAM Create OPERATION
    Severity=Inform
    message={username:xxxx@::xxxx:xxx.xxx.xxx.x;
    causeDISTINGUISH_NAME=epsProfileId=xxxxx,pSProfileDataId=x,subscriptionProfileDataId=x,managedElementId=xxxx
    USER_LABEL=
    ******************* parameters after change =*******************
    epsMaxRequestedBandwidthUL=xxxxxxxxx
    epsMaxRequestedBandwidthDL=xxxxxxxxx
    epsQosAllocRetPrioVulnerabilit=xxxxxxx
    epsQosAllocRetPrioCapability=xxxxxxx
    epsQosAllocRetPrioLevel=xxxxxxxxx
    epsQosClassId=xxxxxx
    sessionTimeout=xxxxxxx
    idleTimeout=xxxx
    epsChargingCharacteristics=xxxxxx
    epsVplmnDynamicAddrAllowed=xxxxxxxxxx
    epsGwAllocType=xxxxxx
    epsPdnType=xxxxxx
    epsAccessPointName=xxxxxx.xxxxxxx
    ******************* NRG location =*******************
    NRG_KEY=x
    }
    Message Id=10011
    END OF REPORT
    <13>Nov 13 17:27:25 OamCOMM[12260]:
    TIMESTAMP=Tue Nov 13 17:27:25 2018
    MSGCLS=OAMOPE
    Title=OAM Create OPERATION
    Severity=Inform
    message={username:xxxx@::xxxx:xxx.xxx.xxx.x;
    causeDISTINGUISH_NAME=packetDataProtocolProfileId=xxxxx,gPacketProtocolProfileDataId=x,pSProfileDataId=x,subscriptionProfileDataId=x,managedElementId=xxxx
    USER_LABEL=
    ******************* parameters after change =*******************
    meanThroughPutClass=xxxxxx
    peakThroughPutClass=xxxxx
    reliabilityClass=xxxx
    delayClass=xxxx
    precedenceClass=xxxxx
    hSDPAguaranteeFlag=x
    hSUPAguaranteeFlag=x
    hSUPAflag=x
    hSDPAflag=x
    downlinkGuaranteedBR=xxx
    uplinkGuaranteedBR=xx
    trafficHandlingPriority=xxxxxxx
    transferDelay=xx
    ratioSduError=xxxx
    residualBER=xxxx
    downlinkMaxBR=xxx
    uplinkMaxBR=xx
    maxSduSize=xxx
    deliveryOfErroneousSdu=xx
    deliveryOrder=xx
    trafficClass=xxxxxxxx
    priorityOfUmtsBearer=xxxxxxx
    vplmnAddressAllowedFlag=x
    pdpChargingCharacteristic=xxxxxx
    accessPointName=xxxxxx.xxxxxxx
    pdpType=xxxx
    ******************* NRG location =*******************
    NRG_KEY=x
    }
    Message Id=10011
    END OF REPORT
There are MANY entries in the log, all formatted like above - more or less data between "parameters after change" and before "NRG location", sometimes other blocks of data that are irrelevant to what I'm trying to accomplish here. So far, I have tried using the flip-flop operator in various combinations, regex in various combinations... I don't know what else to try. This file is read in a while loop. My goal is to basically extract the block of lines between "parameters after change" and "NRG location". My current test code looks like this:
    #!/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $file = "t.txt";
    local $/ = "END OF REPORT\n";
    open IN, "<", $file or die "IN: $!\n";
    while (<IN>) {
        print if /\*\sparameters after change\s\=/ .. /\*\sNRG location\s\=/;
    }
This really shouldn't be this hard, which tells me I must be missing something simple. Any thoughts?

Replies are listed 'Best First'.
Re: Extract Block Of Text From Log
by davido (Cardinal) on Nov 20, 2018 at 22:29 UTC

    You're almost there. If you intend to read record by record, but then use the flip-flop operator as though it's operating line by line, you'll have to modify your code's flow a little:

        local $/ = 'END OF REPORT';
        while (<DATA>) {
            foreach my $line (split /\n/) {
                print "$line\n"
                    if $line =~ /\*\sparameters after change\s=\*/
                    .. $line =~ /\*\sNRG location\s=/;
            }
        }

    But it might be easier to keep the existing "record" file read, and then treat the record as a multi-line string (since it is):

        local $/ = 'END OF REPORT';
        while (<DATA>) {
            print "$1\n"
                if m/(^\*+\sparameters after change\s=\*+\n.+?^\*+\sNRG location\s=\*+$)/ms;
        }

    Another strategy is to go back to the flip-flop operator and read the file line by line. If you do, take care to check that a record termination line never shows up while the flip-flop is in its true state, as that would indicate that the record ended prematurely or was otherwise poorly formed.
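
A minimal sketch of that line-by-line variant, under some assumptions: the helper name `extract_blocks`, the inline sample data, and the choice to discard a partial block with a warning are all illustrative and not from the post. The record terminator is folded into the right-hand side of the flip-flop, so a record that lacks its "NRG location" line cannot leave the operator stuck in the true state.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: reads line by line from an open filehandle and
# returns one array ref per complete block (marker lines included).
sub extract_blocks {
    my ($fh) = @_;
    my (@blocks, @current);

    while (my $line = <$fh>) {
        chomp $line;

        # The right operand also matches the record terminator, so the
        # flip-flop always switches off by the end of a record.
        my $seq = $line =~ /\*\sparameters after change\s=/
               .. $line =~ /\*\sNRG location\s=|^END OF REPORT/;
        next unless $seq;

        # Still inside the range when the terminator arrives: the record
        # ended before its "NRG location" line.
        if ($line =~ /^END OF REPORT/) {
            warn "record ended before 'NRG location'; partial block discarded\n";
            @current = ();
            next;
        }

        push @current, $line;

        # The last value returned for a range carries an "E0" suffix.
        if ($seq =~ /E0$/) {
            push @blocks, [@current];
            @current = ();
        }
    }
    return @blocks;
}

# Shortened sample: one well-formed record, one missing its end marker.
my $sample = <<'LOG';
<13>Nov 13 17:27:25 OamCOMM[12260]:
******************* parameters after change =*******************
sessionTimeout=xxxxxxx
idleTimeout=xxxx
******************* NRG location =*******************
NRG_KEY=x
END OF REPORT
<13>Nov 13 17:27:25 OamCOMM[12260]:
******************* parameters after change =*******************
delayClass=xxxx
END OF REPORT
LOG

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for map { @$_ } extract_blocks($fh);
close $fh;
```

Only the first record's block is printed here; the second one triggers the warning instead. Whether to warn, die, or keep the partial block is a policy choice this sketch makes arbitrarily.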


    Dave

Re: Extract Block Of Text From Log
by GrandFather (Saint) on Nov 20, 2018 at 22:21 UTC

    Your while loop is handling blocks, not lines in each block. One fix is:

        #!/bin/perl
        use strict;
        use warnings;

        local $/ = "END OF REPORT\n";
        while (<DATA>) {
            my @lines = split /\n/;
            for (@lines) {
                # split removed the newlines, so add one back when printing
                print "$_\n" if /\*\sparameters after change\s\=/ .. /\*\sNRG location\s\=/;
            }
        }

        __DATA__
        <13>Nov 13 17:27:25 OamCOMM[12260]:
        TIMESTAMP=Tue Nov 13 17:27:25 2018
        MSGCLS=OAMOPE
        Title=OAM Create OPERATION
        Severity=Inform
        ...

    or just:

        #!/bin/perl
        use strict;
        use warnings;

        while (<DATA>) {
            print if /\*\sparameters after change\s\=/ .. /\*\sNRG location\s\=/;
        }

        __DATA__
        <13>Nov 13 17:27:25 OamCOMM[12260]:
        TIMESTAMP=Tue Nov 13 17:27:25 2018
        ...
    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Extract Block Of Text From Log
by LanX (Cardinal) on Nov 20, 2018 at 22:19 UTC
    > I must be missing something simple

    My guess:

    the flip/flop op is nice if you are reading single lines, but you are swallowing whole records, since you changed the $/ separator.

    You either need to apply a multiline-regex or you re-open the record-string from IN to read line by line in order to apply the flip/flop. NB: It's possible to open from a scalar-ref \$chunk .

    Though most people would probably nest flip-flops:

    • One to define the record
    • a nested one to read the part you wanted.

    TIMTOWTDI :)
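
The scalar-ref idea mentioned above could look roughly like this. It is only a sketch: the helper name `block_lines` and the shortened sample log are made up for illustration. Each record is still read with the changed `$/`, then re-opened as an in-memory filehandle so the flip-flop sees one line at a time.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes one whole record (a multi-line string) and
# returns the lines from the "parameters after change" marker through
# the "NRG location" marker, inclusive.
sub block_lines {
    my ($record) = @_;

    # The caller changed $/ to read whole records, so set it back to a
    # newline here; otherwise the inner read returns everything at once.
    local $/ = "\n";

    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $rec_fh, '<', \$record
        or die "Cannot open in-memory record: $!";

    my @wanted;
    while (my $line = <$rec_fh>) {
        push @wanted, $line
            if $line =~ /\*\sparameters after change\s=/
            .. $line =~ /\*\sNRG location\s=/;
    }
    close $rec_fh;
    return @wanted;
}

# Shortened stand-in for the real log file.
my $log = <<'LOG';
<13>Nov 13 17:27:25 OamCOMM[12260]:
******************* parameters after change =*******************
epsQosClassId=xxxxxx
******************* NRG location =*******************
NRG_KEY=x
END OF REPORT
LOG

open my $in, '<', \$log or die "Cannot open sample log: $!";
{
    local $/ = "END OF REPORT\n";
    while (my $record = <$in>) {
        print block_lines($record);
    }
}
close $in;
```

Note the `local $/ = "\n"` inside the helper: without it the inner read would inherit the record separator set by the caller.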

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

    ) doh, GrandFather's solution to simply split on "\n" is indeed easier! :)

      > Though most people would probably nest flip-flops:

      Here's a demonstration:

          use strict;
          use warnings;

          my $count = 0;

          while (<DATA>) {
              if ( /^<13>/ .. /^END OF REPORT$/ ) {
                  if (/\*\sparameters after change\s\=/ .. /\*\sNRG location\s\=/) {
                      print "$count: $_";
                  }
              }
              else {
                  $count++
              }
          }

          __DATA__
          <13>Nov 13 17:27:25 OamCOMM[12260]:
          TIMESTAMP=Tue Nov 13 17:27:25 2018
          MSGCLS=OAMOPE
          Title=OAM Create OPERATION
          Severity=Inform
          message={username:xxxx@::xxxx:xxx.xxx.xxx.x;
          causeDISTINGUISH_NAME=epsProfileId=xxxxx,pSProfileDataId=x,subscriptionProfileDataId=x,managedElementId=xxxx
          USER_LABEL=
          ******************* parameters after change =*******************
          epsMaxRequestedBandwidthUL=xxxxxxxxx
          epsMaxRequestedBandwidthDL=xxxxxxxxx
          epsQosAllocRetPrioVulnerabilit=xxxxxxx
          epsQosAllocRetPrioCapability=xxxxxxx
          epsQosAllocRetPrioLevel=xxxxxxxxx
          epsQosClassId=xxxxxx
          sessionTimeout=xxxxxxx
          idleTimeout=xxxx
          epsChargingCharacteristics=xxxxxx
          epsVplmnDynamicAddrAllowed=xxxxxxxxxx
          epsGwAllocType=xxxxxx
          epsPdnType=xxxxxx
          epsAccessPointName=xxxxxx.xxxxxxx
          ******************* NRG location =*******************
          NRG_KEY=x
          }
          Message Id=10011
          END OF REPORT
          <13>Nov 13 17:27:25 OamCOMM[12260]:
          TIMESTAMP=Tue Nov 13 17:27:25 2018
          MSGCLS=OAMOPE
          Title=OAM Create OPERATION
          Severity=Inform
          message={username:xxxx@::xxxx:xxx.xxx.xxx.x;
          causeDISTINGUISH_NAME=packetDataProtocolProfileId=xxxxx,gPacketProtocolProfileDataId=x,pSProfileDataId=x,subscriptionProfileDataId=x,managedElementId=xxxx
          USER_LABEL=
          ******************* parameters after change =*******************
          meanThroughPutClass=xxxxxx
          peakThroughPutClass=xxxxx
          reliabilityClass=xxxx
          delayClass=xxxx
          precedenceClass=xxxxx
          hSDPAguaranteeFlag=x
          hSUPAguaranteeFlag=x
          hSUPAflag=x
          hSDPAflag=x
          downlinkGuaranteedBR=xxx
          uplinkGuaranteedBR=xx
          trafficHandlingPriority=xxxxxxx
          transferDelay=xx
          ratioSduError=xxxx
          residualBER=xxxx
          downlinkMaxBR=xxx
          uplinkMaxBR=xx
          maxSduSize=xxx
          deliveryOfErroneousSdu=xx
          deliveryOrder=xx
          trafficClass=xxxxxxxx
          priorityOfUmtsBearer=xxxxxxx
          vplmnAddressAllowedFlag=x
          pdpChargingCharacteristic=xxxxxx
          accessPointName=xxxxxx.xxxxxxx
          pdpType=xxxx
          ******************* NRG location =*******************
          NRG_KEY=x
          }
          Message Id=10011
          END OF REPORT

      Cheers Rolf

Re: Extract Block Of Text From Log
by cavac (Curate) on Nov 21, 2018 at 06:58 UTC

    Well, i'd use a classic state machine for this problem, then we can work line-by-line, without having to load the whole file into memory.

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use diagnostics;

        my @blocklines;
        my $inblock = 0;

        open(my $IFH, '<', 'blockextract.txt') or die($!);
        while((my $line = <$IFH>)) {
            chomp $line;

            if($line =~ /parameters\ after\ change/) {
                # Start of a block we want to read
                $inblock = 1;
                next;
            }

            # Skip handling line unless we are in a block
            next unless($inblock);

            if($line =~ /NRG\ location/) {
                # Block ends here

                # Do whatever you want to do to the block lines
                # stored in @blocklines.
                # I'm just dumping them to STDOUT
                print "*** START ***\n";
                print join("\n", @blocklines), "\n";
                print "*** END ***\n\n";

                # Clean up block
                @blocklines = ();
                $inblock = 0;
                next;
            }

            # just some line within the interesting block
            # remember it for later in @blocklines
            push @blocklines, $line;
            next;
        }
        close $IFH;
        exit(0);

    That way, we can even modify the program very slightly to make it work via pipes, working live on a stream of data generated by some other program. We just remove the open and close calls and change the while loop a bit:

    while((my $line = <>)) {

    Then we can use the program on an arbitrary stream of this kind of data, and it extracts each block as soon as it is pushed into the program's STDIN:

    cat blockextract.txt | perl blockextract.pl

    And the only thing the state machine has to hold in memory is the block it is currently working on and a single state variable...

    perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
      > Well, i'd use a classic state machine for this problem, then we can work line-by-line, without having to load the whole file into memory.

      Sorry for nitpicking, but ... :)

      Flip/Flop solutions are state machines and do it in one pass.

      I even demonstrated that you can safely nest them, hence the technology is "scalable".

      The advantage of your solution is that it's easily portable to other languages; in other words, it ignores Perl's possibilities. ;-)
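
To make the "flip-flops are state machines" point concrete, here is a small sketch that uses the operator's return value rather than just its truthiness. In scalar context the range operator returns an empty string while off, then a sequence number starting at 1, and the last number of a range has "E0" appended. The helper name `inner_lines` and the short sample list are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns only the lines strictly between the two
# markers, dropping the marker lines themselves.
sub inner_lines {
    my @out;
    for my $line (@_) {
        my $seq = $line =~ /parameters after change/
               .. $line =~ /NRG location/;

        # ''      : outside the range
        # 1       : the opening marker line
        # "<n>E0" : the closing marker line
        next if !$seq || $seq == 1 || $seq =~ /E0$/;

        push @out, $line;
    }
    return @out;
}

my @demo = (
    'USER_LABEL=',
    '******************* parameters after change =*******************',
    'sessionTimeout=xxxxxxx',
    'idleTimeout=xxxx',
    '******************* NRG location =*******************',
    'NRG_KEY=x',
);

print "$_\n" for inner_lines(@demo);
```

This prints just the two parameter lines, which is the same effect the explicit `$inblock` variable and `next` calls achieve in the hand-written state machine.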

      Cheers Rolf

        I fully support your argument. I just like to write easy-to-read easy-to-port examples.

        When you write stuff that might still have to be supported in a couple of decades, then making it easy to read and understand is much more important than making it "nice and fast and only understandable by perlmonks" ;-)

Re: Extract Block Of Text From Log
by kcott (Bishop) on Nov 21, 2018 at 08:41 UTC

    G'day ImJustAFriend,

    Looking at the code you've posted, I'm wondering if you were trying to adapt the solution I provided to you in "Re: Multiple Line Regex Not Working". If so, it seems there may be some part(s) you didn't understand. Please ask if this is still the case.

    I put your input data in a file this time and wrote the following script. Compare it with the last one.

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use autodie;

        my $file = 'pm_1226079_input.txt';
        my $SEP = '-' x 40 . "\n";

        open my $fh, '<', $file;

        {
            local $/ = "\n******************* NRG location";

            while (<$fh>) {
                chomp;
                /parameters after change .+?\n(.*)\z/ms && print "$SEP$1\n";
            }
        }

    Notes:

    • Use a lexical filehandle. See open.
    • Always localise changes to special variables in the smallest scope possible. These are global variables and you don't want these changes propagated throughout your entire script. See also perlvar (and its multiple uses of local throughout).
    • The autodie pragma takes care of all of the "... or die "...";" parts of your script. It saves you having to write them: they're easy to forget; and easy to get wrong (such as leaving out the name of the file that failed to open :-)
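
The first two notes can be seen in isolation with a short sketch. The helper name `read_records` and the tiny in-memory log are made up for illustration, and this version uses a plain `or die` instead of autodie so that nothing beyond core `open` is assumed. The point is that `$/` is changed only inside the sub and is back to its default once the sub returns.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: reads every record from $source (a file name or a
# reference to a string) using $separator as the input record separator.
sub read_records {
    my ($source, $separator) = @_;

    # Lexical filehandle: visible only inside this sub.
    open my $fh, '<', $source
        or die "Cannot open input: $!";

    # Localised change: undone automatically when the sub returns.
    local $/ = $separator;

    my @records = <$fh>;
    chomp @records;    # chomp strips the current $/, i.e. the separator
    close $fh;
    return @records;
}

my $log = "a\nEND OF REPORT\nb\nEND OF REPORT\n";
my @records = read_records(\$log, "END OF REPORT\n");

print scalar(@records), " records read\n";
print "separator is back to a newline\n" if $/ eq "\n";
```

Had `$/` been assigned without `local`, every later line-by-line read in the same program would silently start returning whole records.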

    Here's the output from that script:

        ----------------------------------------
        epsMaxRequestedBandwidthUL=xxxxxxxxx
        epsMaxRequestedBandwidthDL=xxxxxxxxx
        epsQosAllocRetPrioVulnerabilit=xxxxxxx
        epsQosAllocRetPrioCapability=xxxxxxx
        epsQosAllocRetPrioLevel=xxxxxxxxx
        epsQosClassId=xxxxxx
        sessionTimeout=xxxxxxx
        idleTimeout=xxxx
        epsChargingCharacteristics=xxxxxx
        epsVplmnDynamicAddrAllowed=xxxxxxxxxx
        epsGwAllocType=xxxxxx
        epsPdnType=xxxxxx
        epsAccessPointName=xxxxxx.xxxxxxx
        ----------------------------------------
        meanThroughPutClass=xxxxxx
        peakThroughPutClass=xxxxx
        reliabilityClass=xxxx
        delayClass=xxxx
        precedenceClass=xxxxx
        hSDPAguaranteeFlag=x
        hSUPAguaranteeFlag=x
        hSUPAflag=x
        hSDPAflag=x
        downlinkGuaranteedBR=xxx
        uplinkGuaranteedBR=xx
        trafficHandlingPriority=xxxxxxx
        transferDelay=xx
        ratioSduError=xxxx
        residualBER=xxxx
        downlinkMaxBR=xxx
        uplinkMaxBR=xx
        maxSduSize=xxx
        deliveryOfErroneousSdu=xx
        deliveryOrder=xx
        trafficClass=xxxxxxxx
        priorityOfUmtsBearer=xxxxxxx
        vplmnAddressAllowedFlag=x
        pdpChargingCharacteristic=xxxxxx
        accessPointName=xxxxxx.xxxxxxx
        pdpType=xxxx

    — Ken

Re: Extract Block Of Text From Log
by tybalt89 (Prior) on Nov 21, 2018 at 08:30 UTC
        #!/usr/bin/perl

        # https://perlmonks.org/?node_id=1226079

        use strict;
        use warnings;

        my $file = "t.txt";
        local $/ = "NRG location";
        open IN, "<", $file or die "IN: $!\n";
        print /.*parameters after change\V*\n(.*\n)\V*NRG location/s while <IN>;
Re: Extract Block Of Text From Log
by ImJustAFriend (Scribe) on Nov 21, 2018 at 16:53 UTC
    Thank you all for the help. I got this all squared away now, and learned a few new things in the process!! :)
