PerlMonks  

Re: How to manage a big file

by AnomalousMonk (Archbishop)
on Apr 23, 2014 at 08:24 UTC ([id://1083298])


in reply to How to manage a pattern matching & counting with big data file

Some very discursive comments after a very brief inspection...

    my @resilt = ();
    while (<fh>) {
        my @a = ();
        @a = &get_data_path_report ($_);
        my %seen = ();
        push (@result, @a);
        @result = grep { !$seen{$_}++ } @result;
    }
    shift(@result);

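You seem to have used lexicals pretty consistently, but then undermined their use by not enabling strictures (and warnings, for good measure). Under strict, for example, the @resilt/@result mismatch in the snippet above would most likely be reported as an undeclared variable. See warnings and strict. Add these two lines at the very start of your program: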
    use warnings;
    use strict;
and then fix all the errors and warnings.

push (@result, @a); @result = grep { !$seen{$_}++ } @result;

The push statement is redundant. The statement
    @result = grep { !$seen{$_}++ } @a;
would have the same effect, and a further simplification would be to use List::MoreUtils::uniq as in
    @result = uniq get_data_path_report();
alone, all other statements in the while-loop being needless. The statement
    use List::MoreUtils qw(uniq);
must be added at the start of the script to import uniq. (The function get_data_path_report() does not need to have $_ passed to it because, as far as I can see from a quick inspection, it takes no arguments.)
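The grep { !$seen{$_}++ } idiom works because $seen{$_}++ returns the old count, which is 0 (false) only the first time a value appears; so it keeps first occurrences in their original order, just as uniq does. A minimal core-only sketch of that behaviour follows. The get_data_path_report() below is a hypothetical stub (the OP's real function is not shown here), and dedup is a hypothetical name for the idiom wrapped in a sub:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the OP's get_data_path_report(); the real
# function is not shown in this thread.
sub get_data_path_report {
    return qw(a/b c/d a/b e/f c/d);
}

# Return the unique elements of @_, keeping the first occurrence of each
# and preserving order (the same result uniq gives for defined strings).
sub dedup {
    my %seen;    # value => how many times it has been seen so far
    # $seen{$_}++ yields the old count: 0 the first time, so the negation
    # is true only for a value's first occurrence.
    return grep { !$seen{$_}++ } @_;
}

my @result = dedup(get_data_path_report());
print "@result\n";    # a/b c/d e/f
```

Because %seen is declared inside dedup, each call starts fresh, which mirrors the my %seen = (); inside the OP's loop body.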

The function

    sub get_first_elements_of_string { my @a = (split (' ' ,"$_[0]")); return $a[0]; }
could be rewritten (untested) as
    sub get_first_elements_of_string { my ($first) = split ' ', $_[0], 2; return $first; }
(see split for the LIMIT parameter), which will not change its effect but may improve its performance.
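A quick side-by-side check that the two versions agree is sketched below. The subs are renamed (hypothetically) to first_field_full and first_field_limited so both can live in one script; with split ' ' (awk-style splitting), leading whitespace is skipped in both cases:

```perl
use strict;
use warnings;

# Original approach: split the whole string on whitespace, keep field 0.
sub first_field_full {
    my @a = split ' ', $_[0];
    return $a[0];
}

# LIMIT approach: split stops after 2 fields, so the remainder of a long
# line is left as one unsplit string instead of many discarded fields.
sub first_field_limited {
    my ($first) = split ' ', $_[0], 2;
    return $first;
}

for my $line ("  alpha beta gamma", "single", "x\ty z") {
    print first_field_full($line), ' ', first_field_limited($line), "\n";
}
# alpha alpha
# single single
# x x
```

Note that when split is assigned to a list of scalars with LIMIT omitted, Perl already treats LIMIT as one more than the number of variables, so my ($first) = split ' ', $_[0]; behaves the same way; the explicit 2 just makes the intent obvious.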

Updates:

  1. Also WRT the while (<fh>) { ... } loop: As it stands in the OP, this loop reads and processes the entire file, but then only keeps the final record read for further processing. Is this what you intended? Did you perhaps intend something like
        push @result, uniq get_data_path_report();
    instead?
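If the goal is to collect unique values from every record of the file, one option is to keep a single %seen hash outside the loop. A minimal sketch, assuming a hypothetical per-record function parse_record (it simply returns the whitespace-separated fields of a line, standing in for whatever the OP's report function extracts) and reading from an in-memory filehandle so it runs as-is; for a real file, open a lexical handle with open my $fh, '<', $path or die ...:

```perl
use strict;
use warnings;

# Hypothetical per-record parser: returns the whitespace-separated fields
# of one line. Stands in for the OP's real extraction logic.
sub parse_record {
    my ($line) = @_;
    return split ' ', $line;
}

# Read every record from $fh and return all values in first-seen order,
# with duplicates removed across the whole file, not just within a line.
sub collect_unique {
    my ($fh) = @_;
    my %seen;      # declared outside the loop, so it spans all records
    my @result;
    while (my $line = <$fh>) {
        chomp $line;
        push @result, grep { !$seen{$_}++ } parse_record($line);
    }
    return @result;
}

# Demo on an in-memory file (open on a scalar reference).
my $data = "a b\nb c\na d\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @result = collect_unique($fh);
close $fh;
print "@result\n";    # a b c d
```

The design difference from push @result, uniq get_data_path_report(); is where duplicates are removed: uniq on each call only drops duplicates within that call's output, while the shared %seen drops them across all records.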
