Re^4: please help me to resolve the Line comments and appending issue

by suno (Acolyte)
on Aug 10, 2012 at 12:51 UTC (node #986735)

in reply to Re^3: please help me to resolve the Line comments and appending issue
in thread please help me to resolve the Line comments and appending issue

Hi,

I am still not clear about the macro file.

My requirement is:

I have hardcoded all the keywords in an array (keywords appear only at the beginning of a sentence). I want to capture every word other than the keywords and push it to the macro file.

So I thought of comparing each word in my input file (only the first word) with the keyword array and, if it is not in the array, pushing it to the macro file.

In the example above, CLI is a keyword, whereas @PUT is a macro.

But I couldn't figure out how to do it. Can you please help me with this?
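A minimal sketch of the approach described above, assuming the input arrives as lines and a macro is simply the first word of a line that is not a keyword. Following the reply below, it checks a hash rather than grepping the array. The keyword list, the sample lines and the helper names `find_macros` and `write_macros` are hypothetical; to append to a real macro file, open it with `open my $out, '>>', $macro_file or die "...: $!";` and pass `$out` instead of `\*STDOUT`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical keyword list: CLI comes from the post, the rest from the
# reply below. Replace it with the real hardcoded keywords.
my %is_keyword = map { $_ => 1 } qw(L CLI BNE LTR);

# Return the first word of every line whose first word is not a keyword.
sub find_macros {
    my @lines = @_;
    my @macros;
    for my $line (@lines) {
        my ($first) = split ' ', $line;   # ' ' also skips leading whitespace
        next unless defined $first;       # blank line: no first word to check
        push @macros, $first unless $is_keyword{$first};
    }
    return @macros;
}

# Write the macros, one per line, to an already-open filehandle.
sub write_macros {
    my ($fh, @macros) = @_;
    print {$fh} "$_\n" for @macros;
}

# Hypothetical sample input; in practice these lines come from the input file.
my @input = ("CLI 0x10", "\@PUT A, B", "", "  BNE LOOP");
write_macros(\*STDOUT, find_macros(@input));
```

Only the line starting with `@PUT` produces output here, since CLI and BNE are in the keyword hash and the blank line has no first word.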

Re^5: please help me to resolve the Line comments and appending issue
by aitap (Deacon) on Aug 10, 2012 at 14:04 UTC
    Array search can be done using grep, but if you only want to check whether an item exists in the list, using a hash is faster:
        my %keywords = map { $_ => 1 } qw/L CLI BNE LTR .../;
        ...
        # later in the code
        if (exists $keywords{$whatever_is_suspected_to_be_a_keyword}) {
            # it is a keyword
            ...
        }
        ...
    Sorry if my advice was wrong.
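A side-by-side sketch of the two membership tests mentioned in the reply, assuming the keyword list from that reply with its "..." left out. The helper names `is_keyword_grep` and `is_keyword_hash` are hypothetical. The grep version scans the whole array on every call; the hash version pays once to build `%is_keyword` and then does a single lookup per check.

```perl
use strict;
use warnings;

# The keyword list from the reply above, minus its elided "...".
my @keywords   = qw(L CLI BNE LTR);
my %is_keyword = map { $_ => 1 } @keywords;   # built once, reused by every lookup

# grep in scalar context: number of matching elements (scans the whole array).
sub is_keyword_grep {
    my ($word) = @_;
    return scalar grep { $_ eq $word } @keywords;
}

# exists: one hash lookup, no scan.
sub is_keyword_hash {
    my ($word) = @_;
    return exists $is_keyword{$word};
}

for my $word (qw(CLI @PUT)) {
    printf "%-4s grep: %s, hash: %s\n", $word,
        (is_keyword_grep($word) ? 'yes' : 'no'),
        (is_keyword_hash($word) ? 'yes' : 'no');
}
```

Both tests give the same answers; which one is cheaper depends on how many lookups you do, as the next reply measures.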

      Almost always true in practice. But there is more overhead in a hash than in an array, and that's magnified if you already have an array and have to duplicate it as a hash. If you're going to check more than one value against the list, as in the FAQ "How do I search file2 for all the values in file1?" then a hash is almost certainly the best solution. But if you only have to check one or two values, you may be better off sticking with grep:

      % cat
      #!/usr/bin/env perl
      use Modern::Perl;
      use Benchmark qw(:all);

      my @words = split /\s+/, `cat bigfile`;   # 8.5MB file
      say scalar @words, " words in bigfile";   # 1.3M words
      my $match = 'professional';   # appears 16 times scattered through bigfile

      cmpthese( 10, {
          'hash it'  => \&hashit,
          'grep it'  => \&grepit,
          'first it' => \&firstit,
      });

      sub hashit {
          my %h;
          @h{@words} = ();
          my $exists = exists $h{$match};
      }

      sub grepit {
          my $exists = grep { $_ eq $match } @words;
      }

      sub firstit {
          use List::Util 'first';
          my $exists = first { $_ eq $match } @words;
      }

      % perl
      1293687 words in bigfile
                 Rate  hash it  grep it first it
      hash it  3.33/s       --     -43%     -89%
      grep it  5.80/s      74%       --     -81%
      first it 29.8/s     795%     413%       --

      So if I only need to search the list once, grep wins over a hash. For multiple searches, a hash comes out ahead. List::Util's first() routine splits the difference a little; with my dataset it beats the hash for up to about 8 searches, but I assume that would vary greatly depending on how early in the array the match is made. It'd take more testing with matches found earlier/later/never in the array to form a good comparison there.
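A rough worked version of that break-even estimate, reading per-run times off the `cmpthese` rates above. It assumes that once the hash is built each further lookup costs almost nothing, and that every grep or first() search costs what it did in this benchmark (which, as noted, depends on where the match sits). With $n$ searches the hash costs about $t_{\text{hash}}$ once, while a scan costs $n \cdot t_{\text{scan}}$:

```latex
t_{\text{hash}} = \tfrac{1}{3.33} \approx 0.300\ \text{s}, \qquad
t_{\text{grep}} = \tfrac{1}{5.80} \approx 0.172\ \text{s}, \qquad
t_{\text{first}} = \tfrac{1}{29.8} \approx 0.0336\ \text{s}

n^{*} = \frac{t_{\text{hash}}}{t_{\text{scan}}}: \qquad
n^{*}_{\text{grep}} \approx \frac{0.300}{0.172} \approx 1.74, \qquad
n^{*}_{\text{first}} \approx \frac{0.300}{0.0336} \approx 8.9
```

So under these assumptions the hash already beats grep at 2 searches, and first() stays cheaper than building the hash for up to 8 searches, matching the figure above.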

      My simple conclusion would be: always build a hash unless you know your program will only search it once; then use grep or List::Util::first. Also, if I'm pulling data in from somewhere (like a file) and I know I'm going to be searching it this way, I put it straight into a hash from the start.
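A small sketch of both halves of that advice, using an in-memory filehandle and a made-up word list in place of a real file (both hypothetical). `List::Util::first` returns the matching element itself rather than a boolean, so the check uses `defined`, which also copes with a matched "0" or empty string; List::Util 1.33 and later also provide `any`, which returns a plain true/false.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical input: an in-memory filehandle stands in for a real word file.
my $text = "alpha beta professional\ngamma professional delta\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# Many lookups expected: fill the hash while reading, with no intermediate array.
my %seen;
while (my $line = <$fh>) {
    $seen{$_} = 1 for split ' ', $line;
}
close $fh;

print "hash: found\n" if exists $seen{professional};

# A single lookup on a list you already have: scan it, stopping at the first hit.
my @words = qw(alpha beta professional gamma);
my $hit = first { $_ eq 'professional' } @words;
print "first: found\n" if defined $hit;
```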

      Aaron B.