PerlMonks  

Re: Extracting information from file to Hash

by tobyink (Abbot)
on Jan 18, 2012 at 21:52 UTC (#948630)


in reply to Extracting information from file to Hash

Here's a quick stab at a solution. It works with your example data...

use strict;
use Data::Dumper;

my %hash;
my $regexp = qr{ ^
    \s* Collection \s* => \s* (\d+)?
    \s* ImageCount \s* => \s* (\d+)?
    \s* Status     \s* => \s* (\w+)?
    \s* Missing    \s* => \s* ([\d,]+)?
    \s* Modified   \s* => \s* ([\d/]+\s[\d:]+)?
    \s* $}x;

while (defined(my $line = <DATA>))
{
    chomp $line;
    my %linehash;

    if ($line =~ $regexp)
    {
        %linehash = (
            Collection => $1,
            ImageCount => $2,
            Status     => $3,
            Missing    => $4,
            Modified   => $5,
        );
    }

    next unless defined $linehash{Collection};
    $hash{ $linehash{Collection} } = \%linehash;
}

my @sorted_by_status = sort { $a->{Status} cmp $b->{Status} } values %hash;
print Dumper \@sorted_by_status;

__DATA__
Collection=>168245 ImageCount=>6 Status=>SI Missing=>1,3 Modified=>01/18/2012 11:14:30
Collection=>161745 ImageCount=>6 Status=>I Missing=>2,3 Modified=>01/18/2012 11:16:38
Collection=>162451 ImageCount=>6 Status=>SC Missing=> Modified=>01/20/2012 11:16:38
Collection=>117481 ImageCount=>8 Status=>C Missing=> Modified=>01/18/2011 7:16:38

It would be nice if the regular expression could be made less specific, but some features of your data format make that tricky (e.g. the fact that the value following "=>" can be a zero-length string).
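For what it's worth, here is one way a less specific pattern might look. This is only a sketch, and it rests on assumptions the sample data happens to satisfy: keys are plain \w+ words, values never contain "=>", and pairs are separated by whitespace. The parse_pairs helper is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the solution above): parse one line of
# "Key=>value" pairs without naming the keys in the pattern.
sub parse_pairs {
    my ($line) = @_;
    my %record;

    # A value runs lazily up to the next " Key=>" or to the end of the line,
    # so it may be empty (Missing=>) or contain a space (the timestamp).
    while ($line =~ /(\w+)=>(.*?)(?=\s+\w+=>|\s*$)/g) {
        $record{$1} = $2;
    }
    return \%record;
}

my $rec = parse_pairs(
    'Collection=>162451 ImageCount=>6 Status=>SC Missing=> Modified=>01/20/2012 11:16:38'
);
print "$_ => '$rec->{$_}'\n" for sort keys %$rec;
```

The lookahead is what copes with the zero-length value: it ends a value without consuming the following key, so an empty Missing field simply yields an empty string.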


Re^2: Extracting information from file to Hash
by bart (Canon) on Jan 18, 2012 at 22:07 UTC
    It would be nice if the regular expression could be made less specific, but some features of your data format make that tricky (e.g. the fact that the value following "=>" can be a zero-length string).
    However, I think it's pretty much guaranteed that there will be whitespace between the key/value pairs, yet you use /\s*/. Also, nowhere do I see it specified that there may even be whitespace around the "=>"; you just made that up. As this file looks to be computer generated, I sincerely doubt that this will ever be the case. Finally, the only place where I see whitespace inside a column value is in the final column of the line: the timestamp.

    In short: I think this regex will do:

    /^Collection=>(\S*) \s+ ImageCount=>(\S*) \s+ Status=>(\S*) \s+ Missing=>(\S*) \s+ Modified=>(.*\S) /x

    And if you do this:

    my %r = /^ (Collection)=>(\S*) \s+ (ImageCount)=>(\S*) \s+ (Status)=>(\S*) \s+ (Missing)=>(\S*) \s+ (Modified)=>(.*\S) /x;
    you even get a nice hash record out of it, even though it is restricted to one match per line (with /g, a match in list context returns the captures of every match instead, which behaves differently).
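A minimal sketch of that capture-to-hash trick in use, applied to an explicit $line rather than $_. The pattern is the one from the reply above; the parse_record wrapper and its undef-on-failure return are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above.
sub parse_record {
    my ($line) = @_;

    # In list context a successful match without /g returns ($1, $2, ...),
    # here alternating key and value, which fills the hash directly.
    my %r = $line =~ /^ (Collection)=>(\S*) \s+ (ImageCount)=>(\S*) \s+
                        (Status)=>(\S*)     \s+ (Missing)=>(\S*)    \s+
                        (Modified)=>(.*\S) /x;

    # A failed match returns the empty list, so %r is simply empty.
    return %r ? \%r : undef;
}

my $rec = parse_record(
    'Collection=>168245 ImageCount=>6 Status=>SI Missing=>1,3 Modified=>01/18/2012 11:14:30'
);
print "$rec->{Collection}: $rec->{Status}, modified $rec->{Modified}\n" if $rec;
```

An empty Missing=> field still matches, because (\S*) is allowed to capture nothing; the key then maps to an empty string rather than undef.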
Reaped: Re^2: Extracting information from file to Hash
by NodeReaper (Curate) on Jan 18, 2012 at 22:11 UTC
