
Re^3: parsing text files continued

by AltBlue (Chaplain)
on Jul 21, 2008 at 18:51 UTC ( [id://699110] )

in reply to Re^2: parsing text files continued
in thread parsing text files continued

Hm, at first glance your reply sounds kinda messy, so I guess providing some more details could help, while still trying to avoid doing your homework for you (you know, the rules) ;-)

First, let's drop some lines from your second record, so we can see what happens when fields are missing:

1/3/2007 12:20:01 AM
1/3/2007 12:49:22 AM

Now let's rewrite my previous HoH solution and move the "key" (time stamp) *inside* the hash, using an AoH:

$ perl -MData::Dumper -F, -lane '@F % 2 and push @D, {q{Stamp},@F} or $D[-1] = { %{$D[-1]}, @F } }{ print Dumper @D' input.txt
$VAR1 = {
          'Stamp' => '1/3/2007 12:20:01 AM',
          'ClientAdd' => '0.784989',
          'CMALoad' => '1.859894',
          'SearchDelete' => '2.066482',
          'CMASave' => '3.249620',
          'CMADelete' => '0.450952',
          'Login' => '12.588309',
          'ClientDelete' => '0.305768',
          'SearchCount' => '20:0.196329',
          'SearchDetails' => '6.873061',
          'Logout' => '0.823402',
          'SearchResults' => '7.418672',
          'SearchSave' => '3.616305',
          'SearchLoad' => '9.432586'
        };
$VAR2 = {
          'Stamp' => '1/3/2007 12:49:22 AM',
          'Login' => '10.958312',
          'SearchCount' => '41:0.483233'
        };

And now, let's print the fields we need from this data as CSV.

$ perl -F, -lane '@F % 2 and push @D, {q{Stamp},@F} or $D[-1] = { %{$D[-1]}, @F } }{ $,=","; print @{$_}{qw(Stamp Login SearchResults SearchLoad SearchCount Logout)} for @D' input.txt
1/3/2007 12:20:01 AM,12.588309,7.418672,9.432586,20:0.196329,0.823402
1/3/2007 12:49:22 AM,10.958312,,,41:0.483233,

As you may notice, values that are missing from a record produce empty fields, which should be just fine for CSV.

Obviously, this lazy toy will trigger "undefined" warnings, but I'm sure you'll know how to handle them in your real/production code. ;-)
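One possible way to quiet those warnings, as a minimal sketch rather than the one true fix: replace each missing value with an empty string before printing. The helper name `csv_row` is made up for this illustration, and the defined-or operator (`//`) assumes perl 5.10 or newer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the one-liner above): build one CSV
# line from a record, substituting '' for any field the record lacks.
sub csv_row {
    my ($record, @fields) = @_;
    return join ',', map { $record->{$_} // '' } @fields;
}

# The second record from the example, which is missing several fields.
my %record = (
    Stamp       => '1/3/2007 12:49:22 AM',
    Login       => '10.958312',
    SearchCount => '41:0.483233',
);
my @wanted = qw(Stamp Login SearchResults SearchLoad SearchCount Logout);

print csv_row(\%record, @wanted), "\n";
```

The empty fields still show up in the output, but no "uninitialized value" warning is raised for them.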

My apologies if this code looked too messy for you, I'll try adding some spoilers...

  1. run perl in "autosplit mode" (using "comma" as field separator) iterating over the lines contained in the "input.txt" file;
  2. on each line run a sloppy test (@F % 2) to determine if the current line contains one or two fields;
  3. if the line contains one field then this is a "new" record (a time stamp marker), so use it to initialize a new hash (push @D, {q{Stamp},@F});
  4. if the line contains two fields then use them to populate the latest record's hash ($D[-1] = { %{$D[-1]}, @F });
  5. in the end (}{) modify the "output field separator" ($,) to "comma", preparing to print "comma separated values" and
  6. iterate through the gathered data (for @D) ...
  7. ... printing only the fields we need from each record (print @{$_}{qw(Stamp Login SearchResults SearchLoad SearchCount Logout)})
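For those who prefer it spelled out, the steps above could look roughly like this as a plain script. This is only a sketch under assumptions: the helper names `parse_records` and `csv_line` are invented here, the input is assumed to be one time stamp per record followed by "Name,value" lines, and the sample lines are rebuilt from the Dumper output above, not copied from a real input.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw lines into an array of hashes (AoH).
sub parse_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /,/, $line;
        if (@fields == 1) {
            # One field: a time stamp, so start a new record.
            push @records, { Stamp => $fields[0] };
        }
        elsif (@fields == 2 && @records) {
            # Two fields: name and value for the latest record.
            $records[-1]{ $fields[0] } = $fields[1];
        }
        # Anything else (blank or malformed lines) is skipped.
    }
    return @records;
}

# Hypothetical helper: one CSV line per record; missing fields become ''.
sub csv_line {
    my ($record, @wanted) = @_;
    return join ',', map { defined $record->{$_} ? $record->{$_} : '' } @wanted;
}

# Sample lines in the assumed input format, used when no file is given.
my @sample = (
    '1/3/2007 12:20:01 AM',
    'Login,12.588309',
    'SearchResults,7.418672',
    'SearchLoad,9.432586',
    'SearchCount,20:0.196329',
    'Logout,0.823402',
    '1/3/2007 12:49:22 AM',
    'Login,10.958312',
    'SearchCount,41:0.483233',
);

my @wanted = qw(Stamp Login SearchResults SearchLoad SearchCount Logout);
my @lines  = @ARGV ? <> : @sample;    # read the named files, else the sample

print csv_line($_, @wanted), "\n" for parse_records(@lines);
```

Saved under some name of your choice (say parse.pl), it would be run as "perl parse.pl input.txt", which also sidesteps any shell quoting trouble.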

Finally, I have to warn you again: DON'T use this code in production, this is just a proof of concept :)



Replies are listed 'Best First'.
Re^4: parsing text files continued
by grashoper (Monk) on Jul 31, 2008 at 19:55 UTC
    I see you ran this from the command line, didn't you? I tried it as a separate script initially, then I just tried typing it all in. Now I am down to one error, "unable to find string terminator", but I am not sure where the problem is. I am really impressed with your example though; I wish I had as deep an understanding and command of the language as you possess. :)

    My error is: Can't find string terminator "'" anywhere before EOF at -e line 1.

    Doh, single versus double quotes... OK, now it does run, but all it outputs is 6 followed by 9 commas, then 7 followed by 9 commas. I don't understand why it's not showing all of it.

      Ok, I guess you are in a Microsoft environment. Let's test using Strawberry Perl under Microsoft Windows XP SP3:

      C:\> perl -F, -lane "@F % 2 and push @D, {q{Stamp},@F} or $D[-1] = { %{$D[-1]}, @F } }{ $,=q{,}; print @{$_}{qw(Stamp Login SearchResults SearchLoad SearchCount Logout)} for @D" input.txt
      1/3/2007 12:20:01 AM,12.588309,7.418672,9.432586,20:0.196329,0.823402
      1/3/2007 12:49:22 AM,10.958312,,,41:0.483233,

      Seems to work as expected, so I think you made a typo when trying to comply with Gates' rules.
