PerlMonks  

Don't re-read file for each new pattern...

by cgmd (Beadle)
on May 30, 2007 at 12:03 UTC (#618159, perlquestion)
cgmd has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to understand one of the requirements within the final exercise of the Llama book. The exercise reads:

"Make a program that reads a list of strings from a file, one string per line, and then lets the user interactively enter patterns that may match some of the strings. For each pattern, the program should tell how many strings from the file matched, then which ones those were. DON'T RE-READ THE FILE FOR EACH NEW PATTERN; KEEP THE STRINGS IN MEMORY. the filename...."
The code offered to satisfy this exercise is:

use strict;
my $filename = '/home/cgmd/bin/learning_perl/sample_text';
open FILE, $filename or die "Can't open '$filename': $!";
chomp(my @strings = <FILE>);
while (1) {
    print "Please enter a pattern: ";
    chomp(my $pattern = <STDIN>);
    last if $pattern =~ /^\s*$/;
    my @matches = eval { grep /$pattern/, @strings; };
    if ($@) {
        print "Error: $@";
    } else {
        my $count = @matches;
        print "There were $count matching strings:\n",
            map "$_\n", @matches;
    }
    print "\n";
}
My confusion is with the segment of the example (shown above in uppercase) regarding "keep the strings in memory". What segment of the script is designed to meet that requirement? And could someone please explain if the infinite "while (1)" loop is required for this?

Thanks!

Re: Don't re-read file for each new pattern...
by mickeyn (Priest) on May 30, 2007 at 12:11 UTC
    The file's content is kept in @strings, and the pattern match is done on @strings each time without re-reading the file.

    Enjoy,
    Mickey

      So, trying to put this together: it appears that assigning the text to the variable ("@strings = <FILE>"), and then keeping the script running (via the infinite loop) so that it can re-use the contents of @strings as many times as required, is the essence of the mechanism to avoid re-reading the file...

      Can I assume that to be the case?

      Edited to correct the typo... $strings

      Thanks!

        It's pretty common to see scripts process files one line at a time. So you have code like this:
        open FILE, $filename or die "cant open: $!";
        while (<FILE>) {
            chomp;
            print "read line: $_ \n";
        }
        close FILE;
        This loop will iterate once for each line of the file. The instruction "Don't re-read the file..." means don't put a loop like this inside the infinite while(1) loop. Instead, you read all the lines (once) and save them in the @strings array. This is more efficient, because memory access is faster than file I/O.
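As a rough sketch of the "read once, match many times" idea described above: the file is stood in for by an in-memory filehandle, the sample contents are invented, and `matches_for` is a hypothetical helper name, not something from the Llama book.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over sample text.
# (The contents are invented for this example.)
my $text = "fred\nbarney\nwilma\nbetty\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

# Read every line exactly once; after this the file is no longer needed.
chomp(my @strings = <$fh>);
close $fh;

# Hypothetical helper: return the strings that match a pattern.
sub matches_for {
    my ($pattern, @list) = @_;
    return grep { /$pattern/ } @list;
}

# Several patterns, all answered from @strings without touching the file again.
for my $pattern ('e', '^b', 'z') {
    my @matches = matches_for($pattern, @strings);
    my $count   = @matches;
    print "/$pattern/ matched $count: @matches\n";
}
```

The file is opened and closed before the first pattern is tried, so every later match can only be using the copy held in @strings.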
        So, trying to put this together: It appears the combination of assigning the text to the variable ("$strings = <FILE>"), followed by the script, continuing to run (via the infinite loop), that can re-use the contents of $strings as many times as required, is the essence of the mechanism to avoid re-reading the file...

        Not really. For one thing you have a mistake (repeated twice) which may be a typo but which is also substantial, so I'm pointing it out: it's not $strings, but @strings! The difference is that the former, on the lhs of an assignment, imposes scalar context. Thus $strings = <FILE> puts a single "line" into $strings. I write "line" in double quotes because it may even be the whole file, as a single string, depending on the input record separator ($/ - look it up in perldoc perlvar). For simplicity let's assume that the latter has not been changed from the default and that lines are actually lines: when you do @strings = <FILE> you're in list context instead, and each element of the @strings array is a line. Then you iterate over it as over any other list. That's it.
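To make the scalar-versus-list distinction concrete, here is a small sketch. It reads from an in-memory filehandle instead of a real file, and the three-line sample text is made up for the illustration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";   # invented sample data

# Scalar context: <$fh> returns just the next record (here, the first line).
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $first = <$fh>;
close $fh;

# List context: <$fh> returns all remaining records, one per element.
open $fh, '<', \$text or die "Can't open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# With $/ undefined, a single scalar read returns the whole file.
open $fh, '<', \$text or die "Can't open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;

chomp $first;
print "scalar read: '$first'\n";
print "list read:   ", scalar(@lines), " lines\n";
print "slurp read:  ", length($whole), " characters\n";
```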

        Let's move on: perhaps a bigger and more severe misunderstanding on your part is with the infinite loop. It has nothing to do with looping over @strings; the two are orthogonal. Indeed the latter is nested in the former: here you have two loops, one within the other, the second of which is disguised as a grep.
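To show the inner loop that the grep is hiding, this sketch writes the same filtering both ways. The sample strings and pattern are invented for the example.

```perl
use strict;
use warnings;

my @strings = qw(fred barney wilma betty);   # invented sample data
my $pattern = 'e';

# The grep form, as used in the script under discussion:
my @by_grep = grep { /$pattern/ } @strings;

# The same work written as an explicit loop over @strings:
my @by_loop;
foreach my $string (@strings) {
    push @by_loop, $string if $string =~ /$pattern/;
}

print "grep: @by_grep\n";
print "loop: @by_loop\n";
```

In the original script this inner pass over @strings runs once per pattern, while the outer while (1) loop only decides how many patterns get asked for.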

        In all earnestness, I'm not familiar with the Llama book, but if this is its final exercise I must presume you've gone through all of it. Please don't take it as a personal offense, but I find it a bit surprising that you're still this confused...

Re: Don't re-read file for each new pattern...
by citromatik (Curate) on May 30, 2007 at 12:32 UTC
    What segment of the script is designed to meet that requirement?
    chomp(my @strings = <FILE>);

    @strings is a variable stored in memory that contains the lines of FILE (one line per position in the array).

    And could someone please explain if the infinite "while (1)" loop is required for this?

    The infinite while asks for an input pattern; the only way of exiting from it is "typing" an empty pattern (i.e. one matching /^\s*$/). Because the program doesn't know when the user will enter an empty pattern, it iterates forever, checking each time whether the pattern is empty.

    citromatik

    UPDATE: Of course, TIMTOWTDI:

    perl -e 'while (<>!~/^\s*$/){print "Not empty\n"}'
      I'm interested, but not sure how I would use...
      perl -e 'while (<>!~/^\s*$/){print "Not empty\n"}'
      ...in the script I provided.

      Could you please elaborate?

      Thanks!

        use strict;
        my $filename = '/home/cgmd/bin/learning_perl/sample_text';
        open FILE, $filename or die "Can't open '$filename': $!";
        chomp(my @strings = <FILE>);
        print "Please enter a pattern: ";
        # Loop until the user enters an empty (or whitespace-only) line.
        while ((my $pattern = <STDIN>) !~ /^\s*$/) {
            chomp $pattern;
            my @matches = eval { grep /$pattern/, @strings; };
            if ($@) {
                print "Error: $@";
            } else {
                my $count = @matches;
                print "There were $count matching strings:\n",
                    map "$_\n", @matches;
            }
            # Prompt again after every pattern, including one that failed to compile.
            print "\nPlease enter a pattern: ";
        }

        In this version the while loop is not infinite: it ends when the input pattern is empty.

        citromatik

Re: Don't re-read file for each new pattern...
by Util (Priest) on May 30, 2007 at 14:39 UTC
    could someone please explain if the infinite "while (1)" loop is required for this?

    To really see why an infinite while() loop was used, let's try re-writing the code to remove or replace that loop.

    In all of the code below, we will assume that the user already knows to just press ENTER to exit.

    Here is the original code, shrunk to the minimum needed to illustrate the issue:

    while (1) {
        print 'Please enter a word: ';
        chomp(my $word = <STDIN>);
        last if $word =~ /^\s*$/;
        print "You entered '$word'\n";
    }
    If we did not need to prompt for input, and STDIN was magically auto-chomping, we could use a while(my $word=<STDIN>){...} construct. Since neither of those is true, what *can* we put in the while() condition to make it finite?

    Let's try checking for the empty $word in the while() condition:

    # Had to duplicate the prompt, <read>, and chomp.
    # Duplication is bad!
    # Use the DRY principle: [D]on't [R]epeat [Y]ourself!
    my $word;
    print 'Please enter a word: ';
    chomp($word = <STDIN>);
    while ( $word !~ /^\s*$/ ) {
        print "You entered '$word'\n";
        print 'Please enter a word: ';
        chomp($word = <STDIN>);
    }

    Let's try that again, but not do any work outside the loop:

    # Had to add `not defined $word` to handle the special case
    # during the first time through the loop. Very Ugly!
    # Had to duplicate code `$word !~ /^\s*$/` to prevent the
    # final ENTER from printing "You entered ''" as it exits
    # the loop.
    my $word;
    while ( ( not defined $word ) or ( $word !~ /^\s*$/ ) ) {
        print 'Please enter a word: ';
        chomp($word = <STDIN>);
        if ( $word !~ /^\s*$/ ) {
            print "You entered '$word'\n";
        }
    }

    Maybe initializing $word will help:

    # Instead of using `not defined $word`, I force
    # non-whitespace into $word.
    # Still very ugly.
    # Still had to duplicate code `$word !~ /^\s*$/`
    my $word = 'JunkToPreventFailingTheFirstLoop';
    while ( $word !~ /^\s*$/ ) {
        print 'Please enter a word: ';
        chomp($word = <STDIN>);
        if ( $word !~ /^\s*$/ ) {
            print "You entered '$word'\n";
        }
    }

    Invert the while() loop into a do...while loop?

    # Still had to duplicate code `$word !~ /^\s*$/`
    my $word;
    do {
        print 'Please enter a word: ';
        chomp( $word = <STDIN> );
        if ( $word !~ /^\s*$/ ) {
            print "You entered '$word'\n";
        }
    } while $word !~ /^\s*$/;

    We *can* use a finite while() loop by moving the prompt/<read>/chomp into a sub-routine.

    sub prompt {
        my ($prompt_string) = @_;
        print $prompt_string;
        my $input = <STDIN>;
        chomp $input;
        return $input;
    }

    # Success!
    while ( my $word = prompt('Please enter a word: ') ) {
        print "You entered '$word'\n";
    }
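One caveat with the prompt() loop above: the while condition tests the truth of $word, so typing 0 also ends the loop, and a whitespace-only line does not. A possible variant is sketched below. The name `prompt_from`, the filehandle argument, and the simulated input are assumptions made for this illustration so the loop can run without a terminal.

```perl
use strict;
use warnings;

# Hypothetical variant of prompt(): reads from a filehandle passed in
# and returns undef on end-of-file or on an empty/whitespace-only line.
sub prompt_from {
    my ($fh, $prompt_string) = @_;
    print $prompt_string;
    my $input = <$fh>;
    return undef unless defined $input;      # end of input
    chomp $input;
    return $input =~ /\S/ ? $input : undef;  # blank line means "stop"
}

# Simulated user input; a real program would pass \*STDIN instead.
my $typed = "fred\n0\n\nnever reached\n";
open my $in, '<', \$typed or die "Can't open in-memory input: $!";

my @seen;
# Testing defined() rather than truth lets the user enter "0".
while ( defined( my $word = prompt_from($in, 'Please enter a word: ') ) ) {
    print "You entered '$word'\n";
    push @seen, $word;
}
```

The loop stops at the blank third line, so "never reached" is never read.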

    Since that prompt() sub would be useful in many different programs, it should be turned into a module.
    TheDamian already did, and his IO::Prompt module is on CPAN.

    use IO::Prompt;
    while (my $word = prompt 'Please enter a word: ', -while => qr/\S/) {
        print "You entered '$word'\n";
    }

    I would conclude that the Llama's use of the infinite while() loop is the clearest option that does not involve a prompt() sub or module.

      You have provided an excellent "walk through" of the TIMTOWTDI choices, and of why the Llama's suggested script is superior. I'm very grateful for your help!

      Since you mentioned the IO::Prompt module, I downloaded it to try. The script you provided:

      use IO::Prompt;
      while (my $word = prompt 'Please enter a word: ', -while => qr/\S/) {
          print "You entered '$word'\n";
      }
      ...gives me the following error:

      Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/site_perl/5.8.8/IO/Prompt.pm line 91.

      What should I make of that?
