Parsing a text file

nimajneb has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parsing a text file by kyle (Abbot) on Apr 16, 2008 at 14:27 UTC
I would first encourage you to Use strict and warnings. A hash would work well for this application. If you're using strict, you can declare it like so:
`my %total_for;`
Then you can access the elements by operator name.
`my $operator = 'Nicole';
$total_for{$operator} = 123;   # total for Nicole is 123
$total_for{$operator} += 45;   # increase Nicole's total by 45.`
You can loop over the names using keys like so:
`foreach my $operator ( sort keys %total_for ) {
    print "Total for $operator: $total_for{$operator}\n";
}`
I also used sort here because the keys come out in no particular order. A few other notes:
Check that your open succeeded.
`open(LOG, $file) or die "Can't read '$file': $!";`
Rather than read the whole file before you begin processing, process it a line at a time.
`while (<LOG>) {
    @tmpwords = split;
    # etc.
}`
If you're reading a line at a time like that, you'll want to chomp your incoming data. (Your code as-is takes care of this with split.)
In general, it's a good idea to name your loop variables instead of using `$_` everywhere.
`foreach my $blah ( @array ) {
    # use $blah instead of $_ for things
}`
When you interpolate a variable into a regular expression as in "`grep /@tmpwords[1]/, @operators`", any regexp metacharacters get interpreted as metacharacters. You can avoid this by using `\Q` like so: `/\Q@tmpwords[1]\E/`. In your case, this doesn't make a difference because there aren't any metacharacters in your data file. One problem you might have, however, is that an operator named "Jen" would "match" another operator named "Jennifer" because the pattern is not anchored at the ends. In general, if you're looking for an exact match, you should say something like "`grep $_ eq @tmpwords[1], @operators`".
Update: Since others have posted full working solutions now, I might as well also (I was trying to treat this as a student).
`use strict;
use warnings;
my $file = 'msgcount.txt';
open my $fh, '<', $file
    or die "Can't read '$file': $!";
my %total_for;
while ( <$fh> ) {
    my $line = $_;
    chomp;
    s{ \A \d\d / \d\d / \d{4} \s+ }{}xms
        or die "line does not match: $line";
    my ( $name, $numb ) = m{ \A ( .+ ) \s+ ( \d+ ) \s* \z }xms;
    if ( ! $name ) {
        die "line does not match: $line";
    }
    $total_for{ $name } += $numb;
}
close $fh or die "Failed to close: $!";
foreach my $operator ( sort keys %total_for ) {
    print "Total for '$operator': $total_for{$operator}\n";
}`
This accounts for the suggestion from mr_mischief (in case names are not all non-spaces). It will die if it hits a line outside the format it expects. I haven't run it.
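To illustrate the matching point in kyle's reply, here is a minimal sketch comparing an unanchored regex, a `\Q...\E`-quoted regex, and an exact `eq` comparison. The operator list and the sub names (`regex_matches`, `quoted_matches`, `exact_matches`) are made up for the example and are not taken from the original code.

```perl
use strict;
use warnings;

# Hypothetical operator list, chosen to expose both problems:
# a name that is a prefix of another, and a name with a metacharacter.
my @operators = ( 'Jen', 'Jennifer', 'A.J', 'AxJ' );

# Unanchored regex: the wanted string is used as a pattern,
# so metacharacters are live and substrings match.
sub regex_matches {
    my ($wanted) = @_;
    return grep { /$wanted/ } @operators;
}

# \Q...\E quotes metacharacters, but the pattern is still unanchored.
sub quoted_matches {
    my ($wanted) = @_;
    return grep { /\Q$wanted\E/ } @operators;
}

# Plain string equality: no pattern matching at all.
sub exact_matches {
    my ($wanted) = @_;
    return grep { $_ eq $wanted } @operators;
}
```

With this list, `regex_matches('Jen')` returns both `Jen` and `Jennifer`, and `regex_matches('A.J')` also picks up `AxJ` because the dot matches any character. `quoted_matches('A.J')` returns only `A.J`, but only `exact_matches('Jen')` narrows the result to `Jen` alone.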
Re: Parsing a text file by moritz (Cardinal) on Apr 16, 2008 at 14:29 UTC
The trick is to use a hash:
`use strict;
use warnings;
my $file = 'msgcount.txt';
open(LOG, '<', $file) or die "Can't read '$file': $!";
my %sum;
while (<LOG>){
    chomp;  # remove newline
    my ($date, $name, $number) = split m/ /, $_;
    $sum{$name} += $number;
}
# now print the result:
while (my ($name, $total) = each %sum){
    print "$name\t$total\n";
}`
Re: Parsing a text file by ww (Archbishop) on Apr 16, 2008 at 14:27 UTC
No taunts. No laughter. But, think "hash" for your data. Yes, this can be done with arrays or multiple arrays, but for your purpose here (and for your future code), understanding hashes and the many ways you can use them will be "priceless." And I hope this doesn't bring the credit card company's lawyers down on us :-)
Re: Parsing a text file by apl (Monsignor) on Apr 16, 2008 at 14:39 UTC
Now that you've gotten answers to your specific question, something general. If you really wanted to store the file in an array, you could replace
`#Store the file in an array, split by newlines
while (<LOG>){
    $string .= $_;
}
@array = split(/\n/, $string);`
with
`while (<LOG>){
    chomp;
    push( @array, $_ );
}`
Re^2: Parsing a text file by moritz (Cardinal) on Apr 16, 2008 at 14:42 UTC
... or even with
`my @array = <LOG>;
# now remove the newlines at the end:
chomp @array;`
Re^3: Parsing a text file by johngg (Canon) on Apr 16, 2008 at 14:47 UTC
... or even `chomp( my @array = <LOG> );`
Re: Parsing a text file by wade (Pilgrim) on Apr 16, 2008 at 16:50 UTC
Other posters have already provided fine solutions so I won't go there but, in general, your code should always include:
`use strict;
use warnings;
use diagnostics; # not strictly necessary but really nice`
The other posters did hint at that but I wanted to be more explicit. Also, in the future it would be especially cool if you would clearly flag homework in the header line. Hope this helps! (update: minor clarification mod) -- Wade
Re: Parsing a text file by elmex (Friar) on Apr 16, 2008 at 14:30 UTC
Hmm, your code is a bit chaotic. But here is my attempt at solving your problem:
`#!/usr/bin/perl
$file = 'msgcount.txt';
open (LOG, $file);
my %records; # create a hash
while (<LOG>) {
    # each line is parsed by this regular expression:
    if (/^\S+ (\S+) (\d+)/) {
        # and the value for the name, which is now in $1
        # is increased by the number in $2
        $records{$1} += $2;
    }
}
# then we go through all keys (the names)
# in our %records hash
for (keys %records) {
    # and print out their sum:
    print "$_: $records{$_}\n";
}`
Hope this was useful?
Re^2: Parsing a text file by nimajneb (Initiate) on Apr 16, 2008 at 14:38 UTC
Thank you all for your replies, extraordinarily helpful, I'm looking into the use of hashes as we speak, as the rest of the application is bound to need them also. What a wonderful community.
Re: Parsing a text file by holli (Abbot) on Apr 16, 2008 at 18:34 UTC
For the experienced Perl programmer it's easy to laugh and taunt away about code like
`while (<LOG>){
    $string .= $_;
}
@array = split(/\n/, $string);
foreach (@array) {`
which would be more perlish as
`while (<LOG>){
    chomp;`
However, you old horses, remember the last time you were exploring a larger Perl module/framework/whatever and failed to notice that `Framework::Foo::Bar` is actually a subclass of `XY::Unknown` (via eval in `Framework::Factory` and some invisible magic in `Framework::Magician`) :-D
holli, /regexed monk/
Re: Parsing a text file by mr_mischief (Monsignor) on Apr 16, 2008 at 20:24 UTC
For the problem as stated and the example data, I think you've got plenty of nice solutions. In particular, I think I like the one from moritz. However, before you use a solution which splits on spaces, are you really sure your example data is representative and that none of the names in your names field will ever contain a space?
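As a rough sketch of mr_mischief's concern: with a name such as "Mary Ann", `split m/ /` would put "Ann" where the count is expected. One way around that, assuming the date is always the first field and the count is always the last, is to let the name be whatever lies between them. The sub names `parse_line` and `totals_for` and the sample name are invented for this example.

```perl
use strict;
use warnings;

# Parse one "DATE NAME COUNT" line where NAME may contain spaces.
# Returns (name, count), or an empty list if the line does not fit.
sub parse_line {
    my ($line) = @_;
    # First run of non-spaces is the date, last run of digits is the
    # count, and the name is everything between them.
    my ( $name, $count ) =
        $line =~ m{ \A \S+ \s+ (.+?) \s+ (\d+) \s* \z }xms;
    return defined $name ? ( $name, $count ) : ();
}

# Sum the counts per name over a list of lines, skipping lines
# that do not match the expected format.
sub totals_for {
    my @lines = @_;
    my %total_for;
    for my $line (@lines) {
        my ( $name, $count ) = parse_line($line) or next;
        $total_for{$name} += $count;
    }
    return %total_for;
}
```

Called as `my %t = totals_for(@lines);`, this keeps "Mary Ann" as a single key. Whether skipping bad lines or dying on them (as kyle's version does) is the better choice depends on how much you trust the input.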
Re: Parsing a text file by sirrobert (Acolyte) on Apr 16, 2008 at 14:51 UTC
nimajneb, try something like this:
`### Always use strict =)
use strict;
### Set up a hash table ("associative array") to associate
### numbers with names
my %totals;
### Open the file
open my $fIN, "<msgcount.txt";
### Read through the file line by line. Inside the loop, the
### special variable $_ will refer to the "current line" and
### this loop will move to the next line in turn with each
### loop iteration. It will quit when it runs out of file to read.
###
### I'll define some regexp patterns here, but you could do it
### all at once, of course. I'm only splitting it here to make
### the code in the loop more readable.
my $date = '\d\d\/\d\d\/\d\d\d\d'; ### Matches MM/DD/YYYY
my $name = '\w+'; ### Matches one or more word characters (a-z, A-Z, 0-9, or _)
my $count = '\d+'; ### Matches one or more digits.
while( <$fIN> ) {
    ### We only care about this line if it's our special format. This
    ### will ignore lines that don't have valid data, such as
    ### trailing blank lines in the file or something.
    ### matches 01/03/2008 yasmin 67
    if( $_ =~ /($date) ($name) ($count)/ ) {
        ### The parentheses above captured the data in this line (if
        ### applicable). Now we can access the first, second, and third
        ### matches:
        my $this_date = $1; ### The first () field
        my $this_name = $2; ### The second () field
        my $this_count = $3; ### The third () field
        ### Do one thing if the user already has data, and something
        ### else if not.
        if( not exists $totals{$this_name} ) {
            ### If this person isn't already in the hash, add her with
            ### this data.
            $totals{$this_name} = $this_count;
        }
        else {
            ### This person is already in the hash, so just add to the
            ### existing totals count.
            $totals{$this_name} += $this_count;
        }
    }
}
### Don't forget to close your file.
close $fIN;
### That's it! Now you've got a hash full of your data (guaranteed
### not to have duplicates =) Now you can access someone's
### data directly:
print 'yasmin has ' . $totals{'yasmin'} . ' posts.';
### or with a loop
foreach my $username (keys(%totals)) {
    print "User $username has " . $totals{$username} . " posts.\n";
}`
That could all be done in a much more compact manner, but that makes it harder to learn at first, of course =) For information about using hashes in Perl, do a Google search for "perl hash tutorial". For information about capturing data as I did above (called "regular expressions") check out the Perl Regular Expressions documentation page. Hope it helps!
Re^2: Parsing a text file by johngg (Canon) on Apr 16, 2008 at 17:45 UTC
That's good advice, "Always use strict." Unfortunately, you've fallen almost at the first hurdle by forgetting to use `my` when opening your lexical filehandle. It is also recommended practice to test for the success of the open statement (and close as well) and to use its three-argument form.
`open my $fIN, q{<}, q{msgcount.txt}
    or die qq{open: msgcount.txt: $!\n};`
Failing to test for success can lead to "`readline() on closed filehandle $fIN at myscript.pl line nnn`" errors if, say, you mis-type the path or the file has been deleted or ... I hope this is of interest. Cheers, JohnGG
Re: Parsing a text file by raj8 (Sexton) on Apr 18, 2008 at 05:47 UTC
Hope this helps.
`while($input = <DATA>) {
    @fields = split(" ",$input);
    print "\n$fields[1] => $fields[2]";
};
__DATA__
01/03/2008 angie 53
01/03/2008 kristen 95
01/03/2008 MaryT 123
01/03/2008 Nicole 27
01/03/2008 sylvarius 33
01/03/2008 yasmin 67
02/03/2008 angie 2
02/03/2008 kristen 121
02/03/2008 MaryT 81
02/03/2008 Nicole 47
02/03/2008 sylvarius 15
02/03/2008 Tanya 22
02/03/2008 yasmin 60
03/03/2008 angie 3
03/03/2008 donna 78
03/03/2008 Kimberly 9
03/03/2008 kristen 257
03/03/2008 MaryT 181`
Output
angie => 53
kristen => 95
MaryT => 123
Nicole => 27
sylvarius => 33
yasmin => 67
angie => 2
kristen => 121
MaryT => 81
Nicole => 47
sylvarius => 15
Tanya => 22
yasmin => 60
angie => 3
donna => 78
Kimberly => 9
kristen => 257
MaryT => 181

