http://www.perlmonks.org?node_id=674908

negzero7 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to get this script to find words with the letter 'P' in them (case insensitive, so p or P) and then print out each word it finds in the file. I'm reading in from a text document.

I thought using the split function would be the best way to go. Here's my code so far:

#!usr/bin/env perl
use warnings;
use strict;

open my $test_fh, '<', @ARGV or die "Can not open file $!\n";

while (<$test_fh>) {
    my @line = split ();
    # if ($line[1] =~ /\wP\w/i);
    print "$line[1]";
}
close ($test_fh);

The text I'm testing it with is this:

CPAN stands for comprehensive Perl Archive Network.
^ and $ are used as anchors in a regular expression.
/pattern/ is a pattern match operator.
Perl is very easy to learn.
Enter 'H' or 'h' for help.

It only prints out standsandisis'H'. Can someone point out my errors and what I should be doing?

Replies are listed 'Best First'.
Re: Help with split() function
by GrandFather (Saint) on Mar 19, 2008 at 01:58 UTC

    There are actually two questions here: "Why is my code doing that" and "How do I do what I want". First the "Why".

    You read a line at a time, so far so good. Then, for each line, you split the line on whitespace to generate a list of words (in @line?). Then you print the second word in the list - that is, you print the second word from each line - with no whitespace.

    What you want to do is find each word with a p in it. "Find each" should generally fire the grep neuron in your brain. Consider:

    #!/usr/bin/env perl
    use warnings;
    use strict;

    while (<DATA>) {
        my @words  = split ();
        my @pWords = grep {/p/i} @words;

        print "Line $. contains ";

        if (@pWords) {
            print "the following words containing p: @pWords\n";
        } else {
            print "no words containing p\n";
        }
    }

    __DATA__
    CPAN stands for comprehensive Perl Archive Network.
    ^ and $ are used as anchors in a regular expression.
    /pattern/ is a pattern match operator.
    Perl is very easy to learn.
    Enter 'H' or 'h' for help.

    Prints:

    Line 1 contains the following words containing p: CPAN comprehensive Perl
    Line 2 contains the following words containing p: expression.
    Line 3 contains the following words containing p: /pattern/ pattern operator.
    Line 4 contains the following words containing p: Perl
    Line 5 contains the following words containing p: help.

    Perl is environmentally friendly - it saves trees
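A minimal sketch of the diagnosis above, assuming the sample lines from the question are held in an array rather than read from a file. The helper names second_word and p_words are hypothetical and exist only for illustration: the first mimics what the original loop prints, the second applies the grep idea to one line.

```perl
use strict;
use warnings;

# Sample text copied from the question, one element per input line.
my @sample = (
    q{CPAN stands for comprehensive Perl Archive Network.},
    q{^ and $ are used as anchors in a regular expression.},
    q{/pattern/ is a pattern match operator.},
    q{Perl is very easy to learn.},
    q{Enter 'H' or 'h' for help.},
);

# Hypothetical helper: returns what the original loop body prints,
# i.e. only the second whitespace-separated field of the line.
sub second_word {
    my ($line) = @_;
    my @fields = split ' ', $line;    # same splitting rule as a bare split()
    return $fields[1];                # index 1 is the SECOND field
}

# Hypothetical helper: every whitespace-separated token containing p or P.
sub p_words {
    my ($line) = @_;
    return grep { /p/i } split ' ', $line;
}

# Reproduces the puzzling output from the question.
print join( '', map { second_word($_) } @sample ), "\n";    # standsandisis'H'

# One line of matching tokens per input line.
print join( ' ', p_words($_) ), "\n" for @sample;
```

The point of the comparison is that indexing with [1] can only ever yield one token per line, whereas grep filters the whole list.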
Re: Help with split() function
by driver8 (Scribe) on Mar 19, 2008 at 01:03 UTC

    This question has been asked before: Pattern matching.

    Out of curiosity, what book are you using? It seems to be a poor one (based on the number of times these questions have been asked).

    -driver8

    PS: I also posted about this in your old thread.

      I'm using Beginning Perl. It's a good book; it's just that regular expression pattern matching is a whole new concept to me. I read through most of that stuff, but I don't feel like I gained any insight into this problem. What am I doing wrong that it's printing out the wrong stuff?

        I don't want to sound too harsh here, but you are missing many of the basics of Perl. You are using split(), arrays, regular expression matching, and "if" statements wrong. The only parts you have right are the ones you took from answers to your previous questions. You need to actually read your book and get familiar with the perl documentation if you want to Learn Perl. So far I can't see that you've learned anything but how to copy and paste.

        I'll try to point you in the right direction anyways, though. Get one thing working at a time, and don't move on to the next thing until everything before it is doing what you want. I can tell you that "my @line = split ();" is not doing what you want it to. Read the perldoc for the split function.


        -driver8

      I've edited it down to this:

      #!usr/bin/env perl
      use warnings;
      use strict;

      open my $test_fh, '<', @ARGV or die "Can not open file $!\n";

      while (<$test_fh>) {
          my @line = split ();
          # if ($line[1] =~ /P/i);
          print "$line[1]";
      }
      close ($test_fh);

      I'm still getting the same thing. I think my split() needs something but I have no idea what. Any help?

        You have several problems here. The main one is that you decided to use split for no obvious reason. Yes it can be done that way, but why would you want to?

        The simplest approach would be just to use a regular expression to find a 'p' in the line. All you need is: print if /p/i; No split is required.

        Where split could be used is split '', which breaks the string up so that each character becomes a separate list element. You could then test each one in a loop. This method is occasionally useful, but only for specialised character-by-character operations, not for a simple task like this. You also seem unclear as to what accessing an array is doing. I suggest you read the section on arrays before you try to use them.

        Keep plugging with the book, and keep writing Perl.
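For comparison, a small sketch of the character-by-character use of split mentioned in the reply above. The helper name count_p is hypothetical; it assumes the goal is merely to count p/P characters, which is the kind of specialised job where splitting into single characters makes sense.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the p/P characters in a string by
# splitting it into single characters with an empty pattern.
sub count_p {
    my ($text) = @_;
    my @chars = split //, $text;    # one list element per character
    return scalar grep { lc($_) eq 'p' } @chars;
}

print count_p('comprehensive Perl'), "\n";    # prints 2
```

For finding whole words this is the wrong granularity, which is why the other replies split on whitespace instead.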
        Well, here's what I would do; presuming I can still read, the output is tested and correct:

        #!/usr/bin/perl -w
        use strict;

        while (<DATA>) {
            my @line = split ();
            foreach my $word (@line) {
                if ($word =~ /P/i) {
                    print "$word\n";
                }
            }
        }

        __DATA__
        CPAN stands for comprehensive Perl Archive Network.
        ^ and $ are used as anchors in a regular expression.
        /pattern/ is a pattern match operator.
        Perl is very easy to learn.
        Enter 'H' or 'h' for help.

        Why?

        Your use of split is fine. But you need to loop through your array:
      • line 6: if it doesn't make sense, figure it out
      • line 7: ditto
      • line 8: ditto
Re: Help with split() function
by toolic (Bishop) on Mar 19, 2008 at 02:01 UTC
    I believe what you are trying to do is print out all words in which a p or P is surrounded by other letters. Using \w in the regular expression will actually grab numbers as well as letters, so I'll assume it is ok to grab "words" with numbers in them. You are on the right track using split and splitting your lines on whitespace. I think where you might be confused is that split creates a list (or array) of "words", which I will generalize and call "tokens". Once you do the split, you want to loop through all your tokens and print those which match your regex condition.
    #!/usr/bin/env perl
    use warnings;
    use strict;

    my $filename = shift;
    open my $test_fh, '<', $filename or die "Can not open file $filename: $!\n";

    while (<$test_fh>) {
        my @tokens = split;
        for my $token (@tokens) {
            if ($token =~ /\wp\w/i) { print $token, "\n" }
        }
    }
    close ($test_fh);

    Prints out:

    CPAN
    comprehensive
    expression.
    operator.

    Please also note that I got rid of the @ARGV in the open line. While it is legal to do so, it is not usually done that way since @ARGV may contain several items, but open will only accept one filename. Please read about shift to understand what's going on there.
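To see why the output above omits "Perl", "pattern" and "help.", here is a hedged sketch comparing the \wp\w pattern with a plain /p/i match. The helper names inner_p and any_p are hypothetical, and the token list is a hand-picked subset of the sample text.

```perl
use strict;
use warnings;

# Hypothetical helpers comparing the two patterns discussed in the thread.
# inner_p: a p/P with a word character on BOTH sides.
sub inner_p { my ($token) = @_; return $token =~ /\wp\w/i ? 1 : 0 }

# any_p: a p/P anywhere in the token.
sub any_p { my ($token) = @_; return $token =~ /p/i ? 1 : 0 }

for my $token (qw(CPAN Perl pattern help. operator.)) {
    printf "%-10s inner_p=%d any_p=%d\n",
        $token, inner_p($token), any_p($token);
}
```

A token that starts or ends with the p (or has punctuation next to it) fails the stricter test, so which pattern is right depends on whether "surrounded by letters" is really the requirement.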

Re: Help with split() function
by wrinkles (Pilgrim) on Mar 19, 2008 at 10:18 UTC
    I'm working through some of these "Seekers" questions as a learning tool. This is my first perlmonks post. :)
    To print one "word" per line, stripping out punctuation at the end, try this:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $filename = shift;
    open my $TEST_FH, '<', "$filename" or die "Cannot open file $filename\n$!\n";

    while (<$TEST_FH>) {
        chomp;
        my @words  = split;
        my @pmatch = grep {/p/i} @words;
        if (@pmatch) {
            foreach (@pmatch) {
                s/[^\w]$//;
                print "$_\n";
            }
        }
    }
    close ($TEST_FH);

      Looks pretty good, but I'll make some suggestions. Instead of "[^\w]" you can use "\W". It means the same thing. Also putting "if (@pmatch)" before your "foreach" is redundant. If there is nothing in the array, it won't enter the loop. Here's my solution:

      while (<DATA>) {
          for (split /\W/) {
              print "$_\n" if /p/i;
          }
      }
      __DATA__
      CPAN stands for comprehensive Perl Archive Network.
      ^ and $ are used as anchors in a regular expression.
      /pattern/ is a pattern match operator.
      Perl is very easy to learn.
      Enter 'H' or 'h' for help.

      It separates punctuation and other "\W" characters in the split statement. Then it skips any temporary arrays and just prints if the element contains a "p".


      -driver8
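A short sketch of what splitting on non-word characters produces, assuming a single line from the sample text as input. The helper name p_words_nonword is hypothetical, and it uses /\W+/ (runs of non-word characters) rather than the single-character /\W/ shown above; either works for this task because empty fields never match /p/i.

```perl
use strict;
use warnings;

# Hypothetical helper: words containing p/P, splitting on runs of
# non-word characters. Empty fields never match /p/i, so they drop out.
sub p_words_nonword {
    my ($line) = @_;
    return grep { /p/i } split /\W+/, $line;
}

my $line = '/pattern/ is a pattern match operator.';

# Splitting on a single \W leaves empty fields where two non-word
# characters are adjacent, and a leading one before the first "/".
print join( '|', split /\W/, $line ), "\n";

# The filtered result has no punctuation attached.
print join( ' ', p_words_nonword($line) ), "\n";    # pattern pattern operator
```

The trade-off is that anything joined by punctuation (for example a word with an apostrophe) is broken into separate pieces.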
Re: Help with split() function
by hipowls (Curate) on Mar 19, 2008 at 10:19 UTC

    As others have pointed out, your code is very wrong in places. I'll go over it line by line.

    #!usr/bin/env perl
    use warnings;
    use strict;
    So far so good, although the shebang line is missing its leading slash (it should read #!/usr/bin/env perl).

    Now it starts to go awry

    open my $test_fh, '<', @ARGV or die "Can not open file $!\n";
    If your program is invoked with no arguments it will exit with the error "Can not open file No such file or directory", and if it has two or more arguments it will exit with "More than one argument to open(,':perlio') at ./Perl-1.pl line 11.". Neither error is helpful to the user. You should:
    • Check the number of arguments and exit with a helpful usage message
    • Put the name of the file that couldn't be opened in the die message.
    Marks for $!.

    while (<$test_fh>) {
    OK, although many people would explicitly assign <$test_fh> to a lexical variable.

    Now we get to the real problems

    my @line = split ();
    # if ($line[1] =~ /\wP\w/i);
    print "$line[1]";

    First you split each line on white space, which is almost good, but given your placement of the parentheses I'm not sure you realize that you are calling split with no arguments. Perhaps you think the parentheses are an argument to split.

    Splitting on white space doesn't give you words, you get a list of things that could be words such as words, $var1 and "quoted?". That may or may not be what you want. I'm just saying you need to think about it.

    The next two lines look like they were originally one and I'll treat them as such. Having got your list of words, you then test only the second one, and if it contains the pattern (an alphanumeric or '_', then a 'P' or 'p', followed by another alphanumeric or '_') you print it, without any surrounding white space.

    What you want to do is test every word to see if it has a 'P' or 'p' regardless of surrounding characters. 'Perl' will fail your test because there isn't anything to match the first \w.

    You are back on track with the last line.

    close ($test_fh);

    Since this is homework, and I'd like full marks for it, I'll submit my complete worked solution:

    #!/net/perl/5.10.0/bin/perl

    use strict;
    use warnings;

    die "Invalid number of arguments\nUsage: program <file_name>\n" if @ARGV != 1;

    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";

    while ( my $line = <$fh> ) {
        print "$_\n"                       # print one word per line
            for grep {/p/i}                # select those with a p or P
            $line =~ /(\p{IsAlpha}+)/g;    # Words only have letters
                                           # and match all of them
    }

    close $fh;

    __END__
    CPAN
    comprehensive
    Perl
    expression
    pattern
    pattern
    operator
    Perl
    help

Re: Help with split() function
by carol (Beadle) on May 29, 2008 at 22:41 UTC
    I thought using the split function would be the best way to go.
    As an alternative you can also set the input record separator to space:
    use strict;
    use warnings;

    local $/ = ' ';
    while ( <DATA> ) {
        print "$_\n" if ( /p/i );
    }
    __DATA__
    CPAN stands for comprehensive Perl Archive Network.
    ^ and $ are used as anchors in a regular expression.
    /pattern/ is a pattern match operator.
    Perl is very easy to learn.
    Enter 'H' or 'h' for help.
    See $/ for more information about the input record separator.
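One thing to watch with the record-separator approach, sketched here under the assumption that the input contains newlines as the sample text does: with $/ set to a space, a newline no longer ends a record, so the last word of one line and the first word of the next arrive together, and each record keeps its trailing space. The helper name space_records_with_p is hypothetical, and it reads from an in-memory filehandle purely to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $text through an in-memory filehandle with
# the input record separator set to a single space, and returns the
# records that contain p or P.
sub space_records_with_p {
    my ($text) = @_;
    local $/ = ' ';    # records now end at a space, not at a newline
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!\n";
    my @records = grep { /p/i } <$fh>;
    close $fh;
    return @records;
}

# Two words separated only by a newline come back as ONE record.
my @found = space_records_with_p("regular expression.\n/pattern/ is a");
print scalar(@found), " record(s) matched\n";
print "[$_]\n" for @found;
```

If one word per record is required, splitting each record on whitespace afterwards is one way to separate the pieces again.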