http://www.perlmonks.org?node_id=1049377


in reply to Re: Epoch based parser
in thread Epoch based parser

Hi Ken,

Finally, I made some changes and also added code to read the JSON string from a file and parse it. However, my file is going to have the same kind of JSON output appended to it continuously over time.

When I try to decode_json the entire file, it doesn't work, as I believe decode_json looks for the braces to see if it has finished. How can I parse multiline JSON from a file?

Here's the code I have now:

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::Local;
use JSON;
use Time::Piece;
use Math::Round;
use File::Read;
use Text::Table;

my $DestIP = "192.168.127.111";

my $jsonc = read_file ('C:\Documents and Settings\SonicWALL User\My Documents\Scripts\logs\json.txt');
my $data  = decode_json join '' => map { chomp; $_ } ($jsonc);
#my $data = decode_json $jsonc;

my $tb = Text::Table->new(
    "Date", "Start Time", "End Time", "Time Taken",
    "Egress: Conveyed", "Egress : Sent", "Ingress: Conveyed", "Ingress: Sent"
);

for (@{$data->{aaData}}) {
    my $difftime  = $_->[1] - $_->[0];
    my $destip    = $_->[6];
    my @egress    = split(',', $_->[8]);
    my @ingress   = split(',', $_->[9]);
    my $econveyed = $egress[0] / 1000000;
    my $esent     = $egress[1] / 1000000;
    my $iconveyed = $ingress[0] / 1000;
    my $isent     = $ingress[1] / 1000;
    my $startTime = scalar(localtime($_->[0]));    # Time::Piece object
    my $endTime   = scalar(localtime($_->[1]));

    # Filter only for required traffic and eliminate any traffic less than 1MB
    if ($destip eq $DestIP && $egress[0] > 1000000) {
        #print "@$_";
        my $date   = $startTime->mdy;
        my $stime1 = $startTime->hms;
        my $etime1 = $endTime->hms;
        $tb->load(
            [ $date, $stime1, $etime1,
              convert_time($difftime),    # convert_time() is not defined in this post
              nearest(.01, $econveyed), nearest(.01, $esent),
              $iconveyed, $isent ],
        );
    }
}

print $tb;

The file input would be something like:
{"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
+375976271"
,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
+1034,348"]]}
{"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
+375976271"
,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
+1034,348"]]}
.... and so on

I couldn't figure this part out. Thanks for your help so far.
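A minimal sketch reproducing the failure described in the question. It assumes JSON::PP (Perl's core JSON module, whose decode_json() behaves like the one from JSON.pm here) and a trimmed-down copy of the sample record. decode_json() accepts exactly one JSON text, so a string holding two records back to back makes it croak, while either record on its own decodes fine.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;    # core module; exports decode_json() by default

# Two complete JSON objects, one per line (trimmed from the sample record).
my $two_records = qq({"aaData":[["1375976271","1375976430"]]}\n)
                . qq({"aaData":[["1375976271","1375976430"]]}\n);

# decode_json() expects exactly one JSON text; anything after it is an error.
my $data  = eval { decode_json $two_records };
my $error = $@;
print defined $data ? "decoded\n" : "decode_json failed: $error";

# A single record on its own decodes without complaint.
my ($first) = split /\n/, $two_records;
my $one = decode_json $first;
print "first record start time: $one->{aaData}[0][0]\n";
```

With JSON::PP the error message should begin with "garbage after JSON object", which matches the poster's suspicion that the parser stops once the first object's closing brace is reached.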

Re^3: Epoch based parser
by kcott (Archbishop) on Aug 14, 2013 at 06:48 UTC
    "When I try to decode_json the entire file, it doesnt allow, as I believe decode_json looks for the braces to see if it finished. How can I parse multiline json from a file?"

    With the sample input you posted, you can just read records as being delimited with "}\n" by setting the input record separator: $/ (see perlvar: Variables related to filehandles). You can then remove the embedded "\n" and "\n+" strings with s/\n[+]?//gm (see perlre if you're unfamiliar with that). Here's a modification of my original code that does this. [Note: you haven't supplied data that matches your original 12:15 or subsequent any:20 — I've made an additional change in order to get some output.]

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    use JSON;
    use Time::Piece;

    my $wanted_minute = 40;

    {
        local $/ = "}\n";

        while (<DATA>) {
            s/\n[+]?//gm;
            my $data = decode_json $_;

            for (@{$data->{aaData}}) {
                print "@$_" if is_wanted_time(@$_[0,1]);
            }
        }
    }

    sub is_wanted_time {
        for (@_) {
            my $t = gmtime $_;
            return 1 if $t->min == $wanted_minute;
        }
        return 0;
    }

    __DATA__
    {"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
    remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
    +375976271"
    ,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
    +1034,348"]]}
    {"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
    remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
    +375976271"
    ,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
    +1034,348"]]}

    Output:

    $ pm_epoch_from_json_2.pl
    1375976271 1375976430 LAN D0:05:FE 172.20.30.2 1093 172.20.28.2 1330 1034,348
    1375976271 1375976430 LAN D0:05:FE 172.20.30.2 1093 172.20.28.2 1330 1034,348

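As a quick check of why $wanted_minute = 40 produces output for this sample, the sketch below converts the record's startTime and endTime with Time::Piece's gmtime, as is_wanted_time() does. The epoch values come from the sample data; the printf layout is only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;    # overrides gmtime() so that it returns a Time::Piece object

# startTime and endTime (epoch seconds) taken from the sample record
my %sample = (start => 1375976271, end => 1375976430);

for my $field (qw(start end)) {
    my $t = gmtime $sample{$field};    # UTC, as in is_wanted_time()
    printf "%-5s %d => %s %s (minute %02d)\n",
        $field, $sample{$field}, $t->ymd, $t->hms, $t->min;
}

# Expected output:
# start 1375976271 => 2013-08-08 15:37:51 (minute 37)
# end   1375976430 => 2013-08-08 15:40:30 (minute 40)
```

Only endTime falls in minute 40 (UTC), which is enough for is_wanted_time() to return 1. The original poster's script uses localtime instead, so the times it shows depend on the local time zone.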
    "The file input would be something like: ..."

    I doubt it!

    That looks like you've just pasted it from the web page including the leading '+'s indicating text wrapping at 70 characters. Assuming that's right, you only need "s/\n//gm" for the regex.
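If the wrapping really is a web-page artifact and the actual log holds complete, unwrapped JSON objects written one after another, a different technique (not the one used above) avoids depending on "}\n" altogether: JSON::PP's incremental parser. In list context, incr_parse() returns every complete JSON text it finds, allowing whitespace between them. This is a sketch under that assumption; the log text below is the sample record with the wrapping removed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

# Assumption: the real log holds complete, unwrapped JSON objects written one
# after another (here one per line), with no newlines or '+' inside strings.
my $log_text = <<'END_LOG';
{"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1375976271","1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"1034,348"]]}
{"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1375976271","1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"1034,348"]]}
END_LOG

# In list context, incr_parse() extracts every complete JSON text in its buffer;
# whitespace (including newlines) between the objects is allowed.
my $json    = JSON::PP->new;
my @records = $json->incr_parse($log_text);

for my $record (@records) {
    for my $row (@{ $record->{aaData} }) {
        print "@$row\n";
    }
}
```

Because incr_parse() appends whatever text it is given to the parser's buffer, the same parser object could in principle be fed a growing file a chunk at a time. It will not cope with '+' markers or raw newlines inside strings, so data like the posted sample would still need the substitution.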

    -- Ken

      Hi Ken,

      Thanks for the quick response.

      "Note: you haven't supplied data that matches your original 12:15 or subsequent any:20 — I've made an additional change in order to get some output."

      Indeed, I have two scripts: one that does the time-based filtering you gave me, and another that takes a file as input and then parses the JSON. This script (in the post above) takes its input (JSON output from a web page, logged to the file) and parses it.

      I want to attach the file, but I am not sure how to do that. I tried it like this, but it gives me the following error:

      use strict;
      use warnings;
      use Time::Local;
      use JSON;
      use File::Read;

      my $jsonc = read_file ('H:\Work\perl\latest\Scripts\logs\json.txt');
      {
          local $/ = "}\n";
          while ($jsonc) {
              s/\n[+]?//gm;
              my $data = decode_json $_;
              for (@{$data->{aaData}}) {
                  #my print/parse function
              }
          }
      }

      What does having while(<DATA>) do? Does it read until the end of the __DATA__ section? If so, is it correct that in my case above I use while($jsonc)? I understood the substitution part, thanks a lot. However, when I parse my file I get the error below:
      c:\perl>perl json.pl
      Use of uninitialized value $_ in substitution (s///) at get.pl line 14.
      malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at get.pl line 15.

      The content of json.txt looks pretty much the same as what you have in the __DATA__ section above, and I can clearly see that the first JSON block ends with "]]}\n". So it's supposed to work as you described.

        When I first read your latest post, the code looked like this:

        my $jsonc = read_file ('H:\Work\perl\latest\Scripts\logs\json.txt');
        local $/ = "}";
        while ($jsonc) {
            s/\n[+]?//gm;
            my $data = decode_json $jsonc;
            for (@{$data->{aaData}}) {
                #my print function
            }

        I wrote some more example code, dug up a few more references to help you out and started to respond. Upon doing so, I find you've changed that code to this:

        my $jsonc = read_file ('H:\Work\perl\latest\Scripts\logs\json.txt');
        {
            local $/ = "}\n";
            while ($jsonc) {
                s/\n[+]?//gm;
                my $data = decode_json $_;
                for (@{$data->{aaData}}) {
                    #my print/parse function
                }
            }
        }

        If you make changes then clearly indicate what you've changed! See "How do I change/delete my post?".

        Here's the new example code I wrote:

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        use JSON;
        use Time::Piece;

        my $json_file = 'json.txt';
        my $wanted_minute = 40;

        open my $json_fh, '<', $json_file or die "Can't read '$json_file': $!";

        {
            local $/ = "}\n";

            while (<$json_fh>) {
                s/\n[+]?//gm;
                my $data = decode_json $_;

                for (@{$data->{aaData}}) {
                    print "@$_" if is_wanted_time(@$_[0,1]);
                }
            }
        }

        close $json_fh;

        sub is_wanted_time {
            for (@_) {
                my $t = gmtime $_;
                return 1 if $t->min == $wanted_minute;
            }
            return 0;
        }

        With this input:

        $ cat json.txt
        {"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
        remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
        +375976271"
        ,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
        +1034,348"]]}
        {"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
        remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
        +375976271"
        ,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
        +1034,348"]]}

        Here's the output (it's the same as from the last example, i.e. pm_epoch_from_json_2.pl):

        $ pm_epoch_from_json_3.pl
        1375976271 1375976430 LAN D0:05:FE 172.20.30.2 1093 172.20.28.2 1330 1034,348
        1375976271 1375976430 LAN D0:05:FE 172.20.30.2 1093 172.20.28.2 1330 1034,348
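The error in the earlier post comes from while ($jsonc): that condition only tests whether the string is true and never assigns anything to $_, so s/// and decode_json see an undefined $_ (and, had decode_json not died, the loop would never end). By contrast, while (<DATA>) reads one record per iteration, as delimited by $/, into $_ until the end of the __DATA__ section. To keep read_file(), one option (a sketch, not the code above) is to open a filehandle on the string itself; <$fh> then returns one "}\n"-terminated record per iteration and undef at the end. The heredoc is a stand-in for the string read_file() would return, and JSON::PP stands in for JSON.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;    # core module; exports decode_json() by default

# Stand-in for: my $jsonc = read_file('...json.txt');
# Two records, wrapped like the sample posted in this thread.
my $jsonc = <<'END_JSON';
{"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
+375976271"
,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
+1034,348"]]}
{"DisplayRecords":"12","Records":"12","sColumns":"startTime,endTime,
remoteNode,srcIP,srcPort,destIP,destPort,egress,ingress","aaData":[["1
+375976271"
,"1375976430","LAN","D0:05:FE","172.20.30.2",1093,"172.20.28.2",1330,"
+1034,348"]]}
END_JSON

# Open a read-only filehandle on the string, so <$fh> hands out records
# just as <DATA> or <$json_fh> would.
open my $fh, '<', \$jsonc or die "Can't open in-memory filehandle: $!";

my @rows;
{
    local $/ = "}\n";                   # a record ends with "}\n"
    while (my $record = <$fh>) {        # undef at end of string ends the loop
        $record =~ s/\n[+]?//g;         # drop line breaks and '+' wrap markers
        my $data = decode_json $record;
        push @rows, @{ $data->{aaData} };
    }
}
close $fh;

print "@$_\n" for @rows;
```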

        Here are some more references:

        -- Ken