PerlMonks  

Re^2: searching in large file

by sabas (Novice)
on Jan 07, 2018 at 17:39 UTC (#1206857)


in reply to Re: searching in large file
in thread searching in large file

What excellent code. Thank you, sir! I timed it and it took less than 7 seconds to complete the process! Unbelievable... If it's not too much to ask, I have a few more questions:

1. Is while(<$smallfile>) the same as reading the file until end of file?
2. Kindly explain or comment on this expression: $search_expression .="\Q$_\E|";
3. Also this one: $search_expression = qr($search_expression);
4. next unless m/$search_expression/;

Respectfully Yours,
Sabas

Re^3: searching in large file
by Cristoforo (Curate) on Jan 07, 2018 at 23:37 UTC
    Hello sabas

    To answer your questions.

    1. Yes. while (<$smallfile>) is shorthand for while (defined($_ = <$smallfile>)): each pass reads the next line into $_, and the loop ends at end of file.
    2. In Perl regular expressions, these metacharacters must be escaped if they are to be matched literally: \ | ( ) [ { ^ $ * + ? . (also called the "dirty dozen"). Wrapping the variable in \Q ... \E escapes any metacharacters it contains (the same as calling quotemeta on it), in this case in $_.
    3. He is compiling the regular expression in $search_expression. As perlop's "Regexp Quote-Like Operators" section explains: "Precompilation of the pattern into an internal representation at the moment of qr() avoids the need to recompile the pattern every time a match /$pat/ is attempted." So this avoids recompiling the regular expression each time it is used in the while loop below: next unless m/$search_expression/;
    4. Unless the regular expression matches the current line, go back to the top of the while loop and read the next line. In other words, the rest of the loop body runs only for lines that match.
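The four answers fit together as one small program. Below is a minimal Perl sketch of that pattern, not the original poster's code: the helper names build_pattern and matching_lines, the chomp, the blank-line skip and the empty-pattern check are assumptions, and in-memory filehandles stand in for the real small and large files.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build one compiled regex
# that matches any of the lines read from $terms_fh.
sub build_pattern {
    my ($terms_fh) = @_;
    my $search_expression = '';
    # Same as while (defined($_ = <$terms_fh>)): one line into $_ per pass.
    while (<$terms_fh>) {
        chomp;                              # assumption: strip the newline
        next unless length;                 # assumption: ignore blank lines
        $search_expression .= "\Q$_\E|";    # escape metacharacters, add "|"
    }
    # Added safeguard: an empty pattern would match every line.
    die "no search terms\n" unless length $search_expression;
    chop $search_expression;                # drop the trailing "|"
    return qr($search_expression);          # compile once, reuse in the loop
}

# Hypothetical helper: collect the lines of $big_fh that match $re.
sub matching_lines {
    my ($big_fh, $re) = @_;
    my @hits;
    while (<$big_fh>) {
        next unless m/$re/;                 # no match: read the next line
        push @hits, $_;                     # only matching lines get here
    }
    return @hits;
}

# Usage: in-memory filehandles stand in for the two files.
my $terms = "a.b\n(x)\n";
my $big   = "a.b here\naxb here\n(x) here\nx here\n";
open my $small_fh, '<', \$terms or die "open terms: $!";
open my $large_fh, '<', \$big   or die "open big: $!";
my $re = build_pattern($small_fh);
print for matching_lines($large_fh, $re);
```

Because the . and ( ) are escaped, only "a.b here" and "(x) here" are printed; an unescaped pattern a.b|(x) would also have matched "axb here" and "x here".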
      Thank you, Cristoforo (++), for your excellent explanation.

      I wanted to add that for point (2), in addition to escaping the content ($_) with \Q and \E, the line appends the alternation metacharacter (|).

      This makes the regular expression match ANY of the lines in the small file.
      Note also the "chop" statement, which deletes the extra "|" appended after the last line. Without it, the pattern would end in an empty alternative, and an empty alternative matches every line.
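The point about the trailing "|" is easy to check. This is a minimal Perl sketch with made-up search terms (foo and a+b): the unchopped pattern ends in an empty alternative and matches any line, while the chopped one does not. The join/quotemeta variant at the end is an alternative idiom, not what the original code does.

```perl
use strict;
use warnings;

my @terms = ('foo', 'a+b');            # made-up search terms

# Built as in the thread: escape each term and append "|".
my $with_pipe = '';
$with_pipe .= "\Q$_\E|" for @terms;    # foo|a\+b|

# Left unchopped, the last alternative is empty, and an empty
# alternative matches a zero-length string in every line.
my $unchopped = qr($with_pipe);

# With chop, the stray "|" is gone.
my $chopped = $with_pipe;
chop $chopped;                         # foo|a\+b
my $re = qr($chopped);

# Alternative idiom (not from the original code): join puts "|"
# only between items, so there is nothing to chop.
my $joined = join '|', map { quotemeta($_) } @terms;

for my $line ('no match here', 'x a+b y') {
    printf "%-15s unchopped=%d chopped=%d\n",
        $line,
        ($line =~ $unchopped ? 1 : 0),
        ($line =~ $re        ? 1 : 0);
}
print "join gives the same pattern\n" if $joined eq $chopped;
```

Only the unchopped pattern reports a match for "no match here"; both report a match for "x a+b y".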

                      We're living in a golden age. All you need is gold. -- D.W. Robertson.
