Re: Parsing a Large file with no reason

by mrras25 (Acolyte)
on Jan 28, 2010 at 23:23 UTC (#820270)


in reply to Parsing a Large file with no reason

I was just looking for people to bounce some ideas off of. This is what I came up with: it's crude and runs slowly on large files (the one file I am testing with is 80,000+ lines long). If someone sees something I could do differently, I am open to suggestions.

#!/usr/bin/perl
use strict;
use warnings;
no warnings 'uninitialized';
use Data::Dumper;
use Tie::File;

my $base = $ARGV[0];
open(FILE, $base) || die "Unable to locate file: $!\n";
my (@searray,@flarray);
tie(@flarray, 'Tie::File',$base);

while(<FILE>) {
    my ($start,$end);
    chomp;
    if($_ =~ /-El\s+vg/../vgserial_id/) {
        $start = (split /\s+/,$_)[3] if($_ =~/-El/);
        $end = (split /\s+/, $_)[1] if($_ =~/vgserial_id/);
    }
    if(defined $start) { push(@searray, $start); } else { $start = ''; }
    if(defined $end) { push(@searray, $end); } else { $end = ''; }
}

my %hash_ref = @searray;
#print Dumper \%hash_ref;

foreach my $hkey(keys %hash_ref) {
    my $hvalue = $hash_ref{$hkey};
    my $count = 0;
    for (my $i = 0; $i < @flarray; $i++) {
        next unless $flarray[$i] =~ /$hvalue/;
        next if($flarray[$i] =~ /vgserial_id/);
        my ($mc,$lvsip) = (($i-1),($i-5));
        my $mount = (split /\s+/, $flarray[$mc])[1];
        my $lvnam = (split /\s+/, $flarray[$lvsip])[3];
        next if($mount =~ /None/);
        # NOTE: $size is never declared in the script as posted, so this line fails under "use strict"
        print "$i: VG: $hkey : MOUNT: $mount : LV_name: $lvnam : SIZE: $size\n";
    }
}


Re^2: Parsing a Large file with no reason
by Cristoforo (Deacon) on Jan 29, 2010 at 02:25 UTC
    It is so slow on large files because, for each matching record, you loop through all 80,000 lines. So if you had 4,000 matching records, you would have 4,000 * 80,000 = 320,000,000 iterations. There must be a better method, I think.
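
    One common way to avoid a nested scan like this is to index the lines once and then look each value up in a hash. The sketch below is only an illustration under assumptions: the sample lines and the names @lines, @wanted and %seen_at are made up, and it assumes the value being searched for appears as a whole whitespace-delimited token, which may not hold for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in data; the real file layout is not known.
my @lines  = ('alpha 001', 'beta 002', 'gamma 001', 'delta 003');
my @wanted = ('001', '003');

# One pass over the lines: map each whitespace-delimited token
# to the list of line indexes where it appears.
my %seen_at;
for my $i (0 .. $#lines) {
    for my $token (split ' ', $lines[$i]) {
        push @{ $seen_at{$token} }, $i;
    }
}

# Each lookup is now a hash access instead of a rescan of every line.
my %hits;
for my $value (@wanted) {
    $hits{$value} = $seen_at{$value} || [];
}

for my $value (@wanted) {
    print "$value: lines @{ $hits{$value} }\n";
}
```

    With this layout the work is roughly one pass over the file plus one hash lookup per sought value, rather than one full pass per sought value.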

    I also don't know whether you can tie a file as an array (with Tie::File) while the same file is open for reading through a separate filehandle.

    Note that I set the input record separator, $/, to "---- lsattr " (with a space following lsattr) to read one record at a time.
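
    The effect of changing $/ can be seen in isolation with a small sketch. The sample text here is invented for illustration; only the "---- lsattr " separator comes from this thread. Reading from an in-memory filehandle keeps the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; only the "---- lsattr " separator comes from the post.
my $text = "---- lsattr -El vg00\nlabel /home\n"
         . "---- lsattr -El vg01\nlabel /var\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = "---- lsattr ";        # restored automatically at the end of this block
    while (my $record = <$fh>) {
        chomp $record;                # chomp removes the current value of $/
        next unless length $record;   # the chunk before the first separator is empty
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
print "first record starts with: ", (split /\n/, $records[0])[0], "\n";
```

    Because the separator itself is chomped off, each record begins with whatever followed "---- lsattr ", which is why a pattern anchored with ^-El can match at the start of a record.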

    Without more sample data, I made a guess at what might work, and it did work with your sample data. But again, it's difficult to tell.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $base = $ARGV[0] or die "Must supply a filename to open. $!";
    open my $fh, "<", $base or die "Unable to locate file: $!\n";

    my %data;
    {
        local $/ = "---- lsattr ";
        while (<$fh>) {
            chomp;
            next unless /^-El\s+(\S+)/;
            my $vg = $1;
            next unless /^label\s+(\S+)/m;
            my $label = $1;
            next if $label eq "None";
            next unless /^lvserial_id\s+(\S+)/m;
            my $name = $1;
            next unless /^size\s+(\d+)/m;
            my $size = $1;
            @{ $data{ $vg } }{ qw/ label name size / } = ($label, $name, $size);
            #print "VG: $vg : MOUNT: $label : LV_name: $name : SIZE: $size\n";
        }
    }
    print Dumper \%data;
    Update: The data structure created above will only work if there is only 1 record for each sought key, ($vg). If there is more than 1 record with the same key, the data structure will only contain the last fields parsed from the file. It will silently give you incorrect results.
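
    If several records can share the same key, one possible alternative is to keep a list of records per key instead of a single record. This is only a sketch under that assumption: the @parsed sample values are invented, and whether a list per key suits the real file is not something this thread settles.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parsed records: two of them share the key "vg00".
my @parsed = (
    [ 'vg00', '/home', 'lv01', 512  ],
    [ 'vg00', '/var',  'lv02', 1024 ],
    [ 'vg01', '/opt',  'lv03', 256  ],
);

my %data;
for my $rec (@parsed) {
    my ($vg, $label, $name, $size) = @$rec;
    # push appends to a per-key list instead of overwriting the previous entry
    push @{ $data{$vg} }, { label => $label, name => $name, size => $size };
}

for my $vg (sort keys %data) {
    for my $entry (@{ $data{$vg} }) {
        print "VG: $vg : MOUNT: $entry->{label} : LV_name: $entry->{name} : SIZE: $entry->{size}\n";
    }
}
```

    Here $data{vg00} holds both records, so nothing is silently dropped when a key repeats.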

    That said, I would need to know more about your file to be able to suggest a suitable data structure.
