PerlMonks  

Parsing a text file

by nimajneb (Initiate)
on Apr 16, 2008 at 14:05 UTC ( [id://680803]=perlquestion )

nimajneb has asked for the wisdom of the Perl Monks concerning the following question:

I have a plain text file containing entries such as these:
01/03/2008 angie 53
01/03/2008 kristen 95
01/03/2008 MaryT 123
01/03/2008 Nicole 27
01/03/2008 sylvarius 33
01/03/2008 yasmin 67
02/03/2008 angie 2
02/03/2008 kristen 121
02/03/2008 MaryT 81
02/03/2008 Nicole 47
02/03/2008 sylvarius 15
02/03/2008 Tanya 22
02/03/2008 yasmin 60
03/03/2008 angie 3
03/03/2008 donna 78
03/03/2008 Kimberly 9
03/03/2008 kristen 257
03/03/2008 MaryT 181
...and so on and so forth, all the way through to the end of the month. I'm completely new to Perl and have been trying to come up with a script to add all the numbers up and associate them with the corresponding name; the date field is irrelevant at this point. Like I said, I don't really know what I'm doing at all, so point, laugh and taunt away, I probably deserve it. This is as far as I have gotten so far:
#!/usr/bin/perl
# invoice.pl
$file = 'msgcount.txt';
open(LOG, $file);
#Store the file in an array, split by newlines
while (<LOG>){
    $string .= $_;
}
@array = split(/\n/, $string);
#Loop through the array
foreach (@array) {
    #Split the lines up into words for easy referencing
    @tmpwords = split(/ /);
    #If the operator name is not already in the array, add it.
    unless (grep /@tmpwords[1]/, @operators) {
        push (@operators, @tmpwords[1]);
    }
    foreach (@operators) {
        if (@tmpwords[1] eq $_) {
            $tmptotal = $tmptotal+@tmpwords[2];
        }
        push (@totals, $tmptotal);
    }
}
print "@totals[1]\n";
I was thinking of having the names in one array then their totals in another, in the same order as they are in the first array (a ridiculous way to do it, I'm sure..) then somehow print them all together (Total for <NAME> is <TOTAL>..) The print line at the end was just a bit of a debugging test, but that's basically where my brain has frozen over and I just have no idea what I'm doing. Any pointers would be greatly appreciated...

Replies are listed 'Best First'.
Re: Parsing a text file
by kyle (Abbot) on Apr 16, 2008 at 14:27 UTC

    I would first encourage you to Use strict and warnings.

    A hash would work well for this application. If you're using strict, you can declare it like so:

    my %total_for;

    Then you can access the elements by operator name.

    my $operator = 'Nicole';
    $total_for{$operator} = 123; # total for Nicole is 123
    $total_for{$operator} += 45; # increase Nicole's total by 45.

    You can loop over the names using keys like so:

    foreach my $operator ( sort keys %total_for ) {
        print "Total for $operator: $total_for{$operator}\n";
    }

    I also used sort here because the keys come out in no particular order.
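If an alphabetical listing is not what you want, the same loop can be ordered by the totals instead. The following is only a sketch: `names_by_total` is a hypothetical helper name, and the three totals are simply the sums of the sample lines shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names ordered by total (largest first),
# breaking ties alphabetically so the output order is stable.
sub names_by_total {
    my ($total_for) = @_;
    return sort {
        $total_for->{$b} <=> $total_for->{$a} or $a cmp $b
    } keys %{$total_for};
}

# Totals taken from the sample data in the question.
my %total_for = ( angie => 58, kristen => 473, MaryT => 385 );

for my $operator ( names_by_total( \%total_for ) ) {
    print "Total for $operator: $total_for{$operator}\n";
}
```

The helper takes a hash reference so it can be reused for any name-to-number hash; call it in list context, as in the loop above.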

    A few other notes:

    • Check that your open succeeded.
      open(LOG, $file) or die "Can't read '$file': $!";
    • Rather than read the whole file before you begin processing, process it a line at a time.
      while (<LOG>) {
          @tmpwords = split;
          # etc.
      }
    • If you're reading a line at a time like that, you'll want to chomp your incoming data. (Your code as-is takes care of this with split.)
    • In general, it's a good idea to name your loop variables instead of using $_ everywhere.
      foreach my $blah ( @array ) {
          # use $blah instead of $_ for things
      }
    • When you interpolate a variable into a regular expression as in "grep /@tmpwords[1]/, @operators", any regexp metacharacters get interpreted as metacharacters. You can avoid this by using \Q like so: /\Q@tmpwords[1]\E/. In your case, this doesn't make a difference because there aren't any metacharacters in your data file. One problem you might have, however, is that an operator named "Jen" would "match" another operator named "Jennifer" because the pattern is not anchored at the ends. In general, if you're looking for an exact match, you should say something like "grep $_ eq @tmpwords[1], @operators".
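The last point in the list above can be seen in a small experiment. This is a sketch with made-up operator names ('Jennifer', 'Mary.T', 'MaryXT'), not data from the original file; `grep` in scalar context returns the number of matching elements.

```perl
use strict;
use warnings;

my @operators = ('Jennifer');    # made-up name for illustration

# Unanchored pattern: 'Jen' is found inside 'Jennifer'.
my $loose = grep { /Jen/ } @operators;

# Exact string comparison: no operator is literally named 'Jen'.
my $exact = grep { $_ eq 'Jen' } @operators;

# Without \Q the dot in the name is a metacharacter and matches any character.
my $name     = 'Mary.T';
my $unquoted = grep { /^$name$/ } ('MaryXT');

# With \Q...\E the dot only matches a literal dot.
my $quoted = grep { /^\Q$name\E$/ } ('MaryXT');

print "loose=$loose exact=$exact unquoted=$unquoted quoted=$quoted\n";
```

For a plain "is this name already in the list" check, the `eq` form is the simplest, and a hash lookup avoids the search altogether.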

    Update: Since others have posted full working solutions now, I might as well also (I was trying to treat this as a student).

    use strict;
    use warnings;

    my $file = 'msgcount.txt';
    open my $fh, '<', $file or die "Can't read '$file': $!";

    my %total_for;

    while ( <$fh> ) {
        my $line = $_;
        chomp;
        s{ \A \d\d / \d\d / \d{4} \s+ }{}xms
            or die "line does not match: $line";
        my ( $name, $numb ) = m{ \A ( .+ ) \s+ ( \d+ ) \s* \z }xms;
        if ( ! $name ) {
            die "line does not match: $line";
        }
        $total_for{ $name } += $numb;
    }

    close $fh or die "Failed to close: $!";

    foreach my $operator ( sort keys %total_for ) {
        print "Total for '$operator': $total_for{$operator}\n";
    }

    This accounts for the suggestion from mr_mischief (in case names are not all non-spaces). It will die if it hits a line outside the format it expects. I haven't run it.

Re: Parsing a text file
by moritz (Cardinal) on Apr 16, 2008 at 14:29 UTC
    The trick is to use a hash:
    use strict;
    use warnings;
    my $file = 'msgcount.txt';
    open(LOG, '<', $file) or die "Can't read '$file': $!";
    my %sum;
    while (<LOG>){
        chomp; # remove newline
        my ($date, $name, $number) = split m/ /, $_;
        $sum{$name} += $number;
    }
    # now print the result:
    while (my ($name, $total) = each %sum){
        print "$name\t$total\n";
    }
Re: Parsing a text file
by ww (Archbishop) on Apr 16, 2008 at 14:27 UTC
    No taunts. No laughter.

    But, think "hash" for your data.

    Yes, this can be done with arrays or multiple arrays, but for your purpose here (and for your future code), understanding hashes and the many ways you can use them will be "priceless."

    And I hope this doesn't bring the credit card company's lawyers down on us :-)

Re: Parsing a text file
by apl (Monsignor) on Apr 16, 2008 at 14:39 UTC
    Now that you've gotten answers to your specific question, something general. If you really wanted to store the file in an array, you could replace
    #Store the file in an array, split by newlines
    while (<LOG>){
        $string .= $_;
    }
    @array = split(/\n/, $string);

    with

    while (<LOG>){
        chomp;
        push( @array, $_ );
    }
      ... or even with
      my @array = <LOG>;
      # now remove the newlines at the end:
      chomp @array;
        ... or even

        chomp( my @array = <LOG> );
Re: Parsing a text file
by wade (Pilgrim) on Apr 16, 2008 at 16:50 UTC

    Other posters have already provided fine solutions so I won't go there but, in general, your code should always include:

    use strict;
    use warnings;
    use diagnostics; # not strictly necessary but really nice

    The other posters did hint at that but I wanted to be more explicit.

    Also, in the future it would be especially cool if you would clearly flag homework in the header line.

    Hope this helps!

    (update: minor clarification mod)
    --
    Wade
Re: Parsing a text file
by elmex (Friar) on Apr 16, 2008 at 14:30 UTC
    Hmm, your code is a bit chaotic. But here is my try to solve your problem:
    #!/usr/bin/perl
    $file = 'msgcount.txt';
    open (LOG, $file);
    my %records; # create a hash
    while (<LOG>) {
        # each line is parsed by this regular expression:
        if (/^\S+ (\S+) (\d+)/) {
            # and the value for the name, which is now in $1,
            # is increased by the number in $2
            $records{$1} += $2;
        }
    }
    # then we go through all keys (the names)
    # in our %records hash
    for (keys %records) {
        # and print out their sum:
        print "$_: $records{$_}\n";
    }
    Hope this was useful?
      Thank you all for your replies, extraordinarily helpful, I'm looking into the use of hashes as we speak, as the rest of the application is bound to need them also. What a wonderful community.
Re: Parsing a text file
by holli (Abbot) on Apr 16, 2008 at 18:34 UTC
    For the experienced Perl programmer it's easy to laugh and taunt away about code like
    while (<LOG>){
        $string .= $_;
    }
    @array = split(/\n/, $string);
    foreach (@array) {
    which would be more perlish as
    while (<LOG>){
        chomp;
    However, you old horses, remember the last time you were exploring a larger Perl module/framework/whatever and failed to see that Framework::Foo::Bar is actually a subclass of XY::Unknown (via eval in Framework::Factory and some invisible magic in Framework::Magician).

    :-D


    holli, /regexed monk/
Re: Parsing a text file
by mr_mischief (Monsignor) on Apr 16, 2008 at 20:24 UTC
    For the problem as stated and the example data, I think you've got plenty of nice solutions. In particular, I think I like the one from moritz. However, before you use a solution which splits on spaces, are you really sure your example data is representative and that none of the names in your names field will ever contain a space?
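To make that concern concrete, here is a sketch of a parser that tolerates spaces inside the name by anchoring on the date at the start and the count at the end of the line. `parse_entry` is a hypothetical helper, and "Mary T" is a made-up name with a space in it; neither comes from the original data.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "date name count" line into name and count,
# allowing spaces inside the name. Returns an empty list if the line does
# not look like an entry.
sub parse_entry {
    my ($line) = @_;
    return $line =~ m{ \A \S+ \s+ (.+?) \s+ (\d+) \s* \z }x ? ( $1, $2 ) : ();
}

my %sum;
for my $line ( '01/03/2008 Mary T 123', '02/03/2008 Mary T 81', '01/03/2008 angie 53' ) {
    # Skip lines that do not parse instead of silently miscounting them.
    my ( $name, $number ) = parse_entry($line) or next;
    $sum{$name} += $number;
}

print "$_\t$sum{$_}\n" for sort keys %sum;
```

A plain `split` on spaces would have taken "Mary" as the name and "T" as the number for the first two lines, which is exactly the failure being warned about.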
Re: Parsing a text file
by sirrobert (Acolyte) on Apr 16, 2008 at 14:51 UTC

    nimajneb, try something like this:

    ### Always use strict =)
    use strict;

    ### Set up a hash table ("associative array") to associate
    ### numbers with names
    my %totals;

    ### Open the file
    open my $fIN, "<msgcount.txt";

    ### Read through the file line by line. Inside the loop, the
    ### special variable $_ will refer to the "current line" and
    ### this loop will move to the next line in turn with each
    ### loop iteration. It will quit when it runs out of file to read.
    ###
    ### I'll define some regexp patterns here, but you could do it
    ### all at once, of course. I'm only splitting it here to make
    ### the code in the loop more readable.
    my $date  = '\d\d\/\d\d\/\d\d\d\d'; ### Matches MM/DD/YYYY
    my $name  = '\w+';                  ### Matches one or more word characters (a-z, A-Z, 0-9, or _)
    my $count = '\d+';                  ### Matches one or more digits.

    while( <$fIN> ) {
        ### We only care about this line if it's our special format. This
        ### will ignore lines that don't have valid data, such as
        ### trailing blank lines in the file or something.

        ### matches 01/03/2008 yasmin 67
        if( $_ =~ /($date) ($name) ($count)/ ) {
            ### The parentheses above captured the data in this line (if
            ### applicable). Now we can access the first, second, and third
            ### matches:
            my $this_date  = $1; ### The first () field
            my $this_name  = $2; ### The second () field
            my $this_count = $3; ### The third () field

            ### Do one thing if the user already has data, and something
            ### else if not.
            if( not exists $totals{$this_name} ) {
                ### If this person isn't already in the hash, add her with
                ### this data.
                $totals{$this_name} = $this_count;
            }
            else {
                ### This person is already in the hash, so just add to the
                ### existing totals count.
                $totals{$this_name} += $this_count;
            }
        }
    }

    ### Don't forget to close your file.
    close $fIN;

    ### That's it! Now you've got a hash full of your data (guaranteed
    ### not to have duplicates =) Now you can access someone's
    ### data directly:
    print 'yasmin has ' . $totals{'yasmin'} . ' posts.';

    ### or with a loop
    foreach my $username (keys(%totals)) {
        print "User $username has " . $totals{$username} . " posts.\n";
    }

    That could all be done in a much more compact manner, but that makes it harder to learn at first, of course =)

    For information about using hashes in Perl, do a Google search for "perl hash tutorial".

    For information about capturing data as I did above (called "regular expressions") check out the Perl Regular Expressions documentation page.

    Hope it helps!

      That's good advice, "Always use strict." Unfortunately, you've fallen almost at the first hurdle by forgetting to use my when opening your lexical filehandle. It is also recommended practice to test for the success of the open statement (and close as well) and to use its three-argument form.

      open my $fIN, q{<}, q{msgcount.txt} or die qq{open: msgcount.txt: $!\n};

      Failing to test for success can lead to "readline() on closed filehandle $fIN at myscript.pl line nnn" errors if, say, you mis-type the path or the file has been deleted or ...
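A short sketch of what the checked open buys you: the failure is reported where it happens, with the operating system's reason in $!, instead of surfacing later as a readline warning. `open_for_reading` is a hypothetical wrapper, and 'no-such-file.txt' is a made-up name assumed not to exist in the current directory.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the OS error.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "open: $path: $!\n";
    return $fh;
}

# 'no-such-file.txt' is assumed not to exist, so the open fails here
# and eval traps the die so the message can be inspected.
my $fh = eval { open_for_reading('no-such-file.txt') };
print "caught: $@" if !$fh;
```

In a real script you would normally let the die propagate; the eval here is only to show the message without aborting the example.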

      I hope this is of interest.

      Cheers,

      JohnGG

Re: Parsing a text file
by raj8 (Sexton) on Apr 18, 2008 at 05:47 UTC

    Hope this helps.

    while($input = <DATA>) {
        @fields = split(" ",$input);
        print "\n$fields[1] => $fields[2]";
    };
    __DATA__
    01/03/2008 angie 53
    01/03/2008 kristen 95
    01/03/2008 MaryT 123
    01/03/2008 Nicole 27
    01/03/2008 sylvarius 33
    01/03/2008 yasmin 67
    02/03/2008 angie 2
    02/03/2008 kristen 121
    02/03/2008 MaryT 81
    02/03/2008 Nicole 47
    02/03/2008 sylvarius 15
    02/03/2008 Tanya 22
    02/03/2008 yasmin 60
    03/03/2008 angie 3
    03/03/2008 donna 78
    03/03/2008 Kimberly 9
    03/03/2008 kristen 257
    03/03/2008 MaryT 181

    Output

    angie => 53
    kristen => 95
    MaryT => 123
    Nicole => 27
    sylvarius => 33
    yasmin => 67
    angie => 2
    kristen => 121
    MaryT => 81
    Nicole => 47
    sylvarius => 15
    Tanya => 22
    yasmin => 60
    angie => 3
    donna => 78
    Kimberly => 9
    kristen => 257
    MaryT => 181
