Here's a quick stab at a solution. It works with your example data...
use strict;
use warnings;
use Data::Dumper;
my %hash;
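# Every capture group is optional, so a field with an empty value
# (e.g. "Missing=>") still matches; its capture is then undef.
my $regexp = qr{
    ^ \s*
    Collection \s* => \s* (\d+)? \s*
    ImageCount \s* => \s* (\d+)? \s*
    Status     \s* => \s* (\w+)? \s*
    Missing    \s* => \s* ([\d,]+)? \s*
    Modified   \s* => \s* ([\d/]+\s[\d:]+)? \s*
$}x;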
while (defined(my $line = <DATA>))
{
    chomp $line;
    my %linehash;
    if ($line =~ $regexp)
    {
        %linehash = (
            Collection => $1,
            ImageCount => $2,
            Status     => $3,
            Missing    => $4,
            Modified   => $5,
        );
    }
    next unless defined $linehash{Collection};
    $hash{ $linehash{Collection} } = \%linehash;
}
my @sorted_by_status = sort { $a->{Status} cmp $b->{Status} } values %hash;
print Dumper \@sorted_by_status;
__DATA__
Collection=>168245 ImageCount=>6 Status=>SI Missing=>1,3 Modified=>01/18/2012 11:14:30
Collection=>161745 ImageCount=>6 Status=>I Missing=>2,3 Modified=>01/18/2012 11:16:38
Collection=>162451 ImageCount=>6 Status=>SC Missing=> Modified=>01/20/2012 11:16:38
Collection=>117481 ImageCount=>8 Status=>C Missing=> Modified=>01/18/2011 7:16:38
It would be nice if the regular expression could be made less specific, but some features of your data format make that tricky (e.g. the fact that the value following "=>" can be a zero-length string).
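For what it's worth, here is a rough sketch of one way the parsing might be loosened. It splits each line only at whitespace that is immediately followed by a "key=>" token, then splits each field on the first "=>". This assumes keys are single words with no whitespace before the "=>" (true of the sample data, but an assumption about the real format), and parse_record is just a name made up for illustration. Note that an empty value comes back as an empty string here, not undef as in the regexp version.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "key=>value key=>value ..." line into a hash ref.
# Assumes each key is a single \w+ word written directly before "=>".
sub parse_record {
    my ($line) = @_;
    my %record;

    # Split on whitespace only where a "key=>" token follows, so the space
    # inside a value such as "01/18/2012 11:14:30" does not end the field.
    for my $field (split /\s+(?=\w+=>)/, $line) {
        # Limit of 2 keeps a trailing empty value ("Missing=>" gives '').
        my ($key, $value) = split /=>/, $field, 2;
        next unless defined $value;    # skip fragments without "=>"
        $value =~ s/^\s+|\s+$//g;      # trim stray whitespace
        $record{$key} = $value;
    }
    return \%record;
}

my $rec = parse_record(
    'Collection=>162451 ImageCount=>6 Status=>SC Missing=> Modified=>01/20/2012 11:16:38'
);
print "$_ => '$rec->{$_}'\n" for sort keys %$rec;
```

The lookahead is what copes with zero-length values: the split point is defined by the next key rather than by what the value looks like, so the field names no longer have to be hard-coded.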