Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

Re: Grep Effeciency

by baku (Scribe)
on Feb 28, 2001 at 01:46 UTC ( #61184=note: print w/replies, xml ) Need Help??

in reply to Grep Effeciency

What scares me (potential-efficiency-wise) is the fact that you loop over the same array for each key. Not knowing the rest of your code, I'm going to take a stab that generating the list for all keys in a single pass down the list would be better...

To preserve the "remnants" in @ECL_DATA:

# create a "cache": "is this a NETID?" # (if I understand your variables correctly...) my %ids = map { $_ => 1 } values %totalnetname; my %ECLDATA = (); { # lexical scope for leftovers my @leftovers = (); while (my $entry = shift @ECL_STAT) { # ignore non-|-delimited lines next unless my ($netid) = split /\|/, $_; # push lines which begin with any NETID push ( $ids{$netid} ? @{ $ECLDATA{$netid} } : @leftovers ), $_; } @ECL_STAT = @leftovers; } for my $key (@totalkeys) { my $NETID = $totalnetname{$key}; my @ECLDATA = @{ $ECLDATA{$NETID} }; # ... lots, as before. }

If you're reasonably certain that every line begins with a NETID, you could get rid of some of that...

Replies are listed 'Best First'.
Re: Re: Grep Effeciency
by ImpalaSS (Monk) on Feb 28, 2001 at 01:49 UTC
    Hey Baku,
    The reason i loop over the same array, because the "#--- more code" line is actually, taking that array, and doing a foreach loop on it, splitting every line at the | then, doing the calulations and printing the results. Is this inneffecient? I figure searching just once for the NETID and just parsing that data, then moving on is the best way? But, I dont knonw nearly as much about perl as many people here :)



      Sorry for my terseness yesterday, that bloody "real world" kept interfering with my PerlMonks time :-)

      Here's a breakdown of what I came up with, with the change that we now split the data only once.

      # create a "cache": "is this a NETID?" # (if I understand your variables correctly...) # # map takes in the list of values of %totalnetname, which # I am "assuming" are all of the "interesting" net ID's. # In order to look them up quickly, we build a hash by # emitting (from inside the {}) key=>value pairs of the # form $id => 1. # # If there were "uninteresting" ID's, now would be the # ideal time (IMHO) to "filter" them out. This could be # most easily done by changing the BLOCK to something like # { &is_it_interesting($_) ? ( $_ => 1 ) : () } # # Better yet, never put in the "uninteresting" values # (via %totalnetname) to begin with... # # (As a guess: is %totalnetname perhaps a tied DB_File? # That would probably make editing it directly unwise in # this situation. If it's not, and the NETIDs are fairly # constant, putting it into a DB_File would likely be a # good idea, YMMV/TIMTOWTDI/standard disclaimers...) # my %ids = map { $_ => 1 } values %totalnetname; # # Clear out the 'output' hash, declaring it lexically # (lexically ~~ "my") # my %ECLDATA = (); # # This brace starts a lexical scope so that our messy # temporary variables are garbage-collected when we're # done with them. There's likely a way to do this without # temporary variables -- it seems that there always is -- # but that will have to be [merlyn]'s problem to right ;-) # { # lexical scope for leftovers # # Where do bad records go? If the dataset is known (or at # least expected) to contain only NETID|data... records, # there's no need to have a @leftovers array. # my @leftovers = (); # # This is almost a 'for my $entry (@ECL_STAT),' but it # isn't. Looking at it again today, I don't remember why # it isn't, so it probably could be, or even should be, # if only for readability. # # UPDATED: It's not, because, of course, shift removes its # value from @ECL_STAT, leaving it empty for @leftovers. # This is marginally unimportant, since @ECL_STAT will be # clobbered afterwards anyways, but might decrease memory # usage somewhat since each record is in RAM only ~~ 1ce # at a time. # while (my $entry = shift @ECL_STAT) { # ignore non-|-delimited lines # # This 'rejects' any lines which don't contain |'s. # Reading 'inner-first:' # 1. split $entry on | (note my typo yesterday of $_!) # 2. assign (in list context) to $netid and @stuff # 3. assignment (in list context) returns the number # of items assigned, so the 'unless' is getting # a number >= 1 (since split will return $entry # if we have no |'s). The missing >1 was another # hasty mistake of mine yesterday :-P # 4. go through the loop again if we got 1 (or 0?) next unless (my ($netid, @stuff) = split /\|/, $entry) > 1; # push lines which begin with any NETID # # This stores a reference to a new list in either # $ECLDATA{$netid} (if the $netid we got, above, is in # %ids) or @leftovers (if not). This could be simplified # to push ( @{ $ECLDATA{$netid} } ), [ $netid, @stuff ]; # if every possible netid is "interesting." (This would # also remove the my %ids = map... above) # # NB. that @{ } is the 'array dereference' operator, # which just means 'give me the array which this is a # reference to, rather than the reference.' I read it # aloud as "the list/array referred to by ..." # # The use of references is due to the fact that an # array cannot be (directly) the 'value' of an hash # element. # push ( $ids{$netid} ? @{ $ECLDATA{$netid} } : @leftovers ), [ $netid, @stuff ]; } # # For compatibility with the original script, we push # back 'leftovers' into @ECL_STAT; this might be # unnecessary. # @ECL_STAT = @leftovers; # # At this point, @leftovers goes out of scope and is # removed from memory. # } # # This routine doesn't need to split the data, or any of # that, because it's picking up its data from the @ECLDATA # array. # for my $key (@totalkeys) { my $NETID = $totalnetname{$key}; # # Don't try to do any work if no data was found. # next unless exists $ECLDATA{$NETID}; # # This could also assign to the various 'field' # variables you might have, eg. # my ($name, $address, $city) = @{ $ECLDATA{$NETID} }; # my @ECLDATA = @{ $ECLDATA{$NETID} }; # ... lots, as before. }

      Hopefully, that's a bit easier to understand. Good luck in your efforts!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://61184]
[Corion]: But maybe it's also due to that I play with friends, which makes a game more enjoyable anyway ;)
[Corion]: Oh - I released a new version of some module, thanks to a pull request. But I don't consider "update Makefile.PL" and "update author tests" as "writing code" ;-D
[marto]: sounds fun, the opposite of my weekend :P
[Corion]: marto: You wrote Perl? Or did you have to work (and wrote Perl)?
[marto]: the fun part, I had no fun this weekend, very stressful :)
[Corion]: But I feel an introductory talk gestating, working title "Reading CPAN" - how to read module documentation, how to judge a module, how to read the module tests/examples
[Corion]: marto: Ouch :-/ The kids are making trouble?
[Corion]: Meh - food becons, and colleagues are staring at me, talk to you later, sorry!

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (7)
As of 2017-08-21 09:30 GMT
Find Nodes?
    Voting Booth?
    Who is your favorite scientist and why?

    Results (319 votes). Check out past polls.