Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Ra: Grep Effeciency

by baku (Scribe)
on Feb 28, 2001 at 23:56 UTC ( #61411=note: print w/ replies, xml ) Need Help??


in reply to Re: Re: Grep Effeciency
in thread Grep Effeciency

Sorry for my terseness yesterday, that bloody "real world" kept interfering with my PerlMonks time :-)

Here's a breakdown of what I came up with, with the change that we now split the data only once.

# create a "cache": "is this a NETID?" # (if I understand your variables correctly...) # # map takes in the list of values of %totalnetname, which # I am "assuming" are all of the "interesting" net ID's. # In order to look them up quickly, we build a hash by # emitting (from inside the {}) key=>value pairs of the # form $id => 1. # # If there were "uninteresting" ID's, now would be the # ideal time (IMHO) to "filter" them out. This could be # most easily done by changing the BLOCK to something like # { &is_it_interesting($_) ? ( $_ => 1 ) : () } # # Better yet, never put in the "uninteresting" values # (via %totalnetname) to begin with... # # (As a guess: is %totalnetname perhaps a tied DB_File? # That would probably make editing it directly unwise in # this situation. If it's not, and the NETIDs are fairly # constant, putting it into a DB_File would likely be a # good idea, YMMV/TIMTOWTDI/standard disclaimers...) # my %ids = map { $_ => 1 } values %totalnetname; # # Clear out the 'output' hash, declaring it lexically # (lexically ~~ "my") # my %ECLDATA = (); # # This brace starts a lexical scope so that our messy # temporary variables are garbage-collected when we're # done with them. There's likely a way to do this without # temporary variables -- it seems that there always is -- # but that will have to be [merlyn]'s problem to right ;-) # { # lexical scope for leftovers # # Where do bad records go? If the dataset is known (or at # least expected) to contain only NETID|data... records, # there's no need to have a @leftovers array. # my @leftovers = (); # # This is almost a 'for my $entry (@ECL_STAT),' but it # isn't. Looking at it again today, I don't remember why # it isn't, so it probably could be, or even should be, # if only for readability. # # UPDATED: It's not, because, of course, shift removes its # value from @ECL_STAT, leaving it empty for @leftovers. # This is marginally unimportant, since @ECL_STAT will be # clobbered afterwards anyways, but might decrease memory # usage somewhat since each record is in RAM only ~~ 1ce # at a time. # while (my $entry = shift @ECL_STAT) { # ignore non-|-delimited lines # # This 'rejects' any lines which don't contain |'s. # Reading 'inner-first:' # 1. split $entry on | (note my typo yesterday of $_!) # 2. assign (in list context) to $netid and @stuff # 3. assignment (in list context) returns the number # of items assigned, so the 'unless' is getting # a number >= 1 (since split will return $entry # if we have no |'s). The missing >1 was another # hasty mistake of mine yesterday :-P # 4. go through the loop again if we got 1 (or 0?) next unless (my ($netid, @stuff) = split /\|/, $entry) > 1; # push lines which begin with any NETID # # This stores a reference to a new list in either # $ECLDATA{$netid} (if the $netid we got, above, is in # %ids) or @leftovers (if not). This could be simplified # to push ( @{ $ECLDATA{$netid} } ), [ $netid, @stuff ]; # if every possible netid is "interesting." (This would # also remove the my %ids = map... above) # # NB. that @{ } is the 'array dereference' operator, # which just means 'give me the array which this is a # reference to, rather than the reference.' I read it # aloud as "the list/array referred to by ..." # # The use of references is due to the fact that an # array cannot be (directly) the 'value' of an hash # element. # push ( $ids{$netid} ? @{ $ECLDATA{$netid} } : @leftovers ), [ $netid, @stuff ]; } # # For compatibility with the original script, we push # back 'leftovers' into @ECL_STAT; this might be # unnecessary. # @ECL_STAT = @leftovers; # # At this point, @leftovers goes out of scope and is # removed from memory. # } # # This routine doesn't need to split the data, or any of # that, because it's picking up its data from the @ECLDATA # array. # for my $key (@totalkeys) { my $NETID = $totalnetname{$key}; # # Don't try to do any work if no data was found. # next unless exists $ECLDATA{$NETID}; # # This could also assign to the various 'field' # variables you might have, eg. # my ($name, $address, $city) = @{ $ECLDATA{$NETID} }; # my @ECLDATA = @{ $ECLDATA{$NETID} }; # ... lots, as before. }

Hopefully, that's a bit easier to understand. Good luck in your efforts!


Comment on Ra: Grep Effeciency
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://61411]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (9)
As of 2015-07-03 13:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (53 votes), past polls