Extract word

by etheral (Acolyte)
on Oct 07, 2011 at 17:41 UTC ( [id://930225]=perlquestion )

etheral has asked for the wisdom of the Perl Monks concerning the following question:

####################################################################
## MAIN
####################################################################
my $input_file = get_file ('no_sense_hogwash.txt'); #define an input file
my %ignore = (
    'and' => 1,
);
while($input_file =~ /(\w[\w'-]*)/g) {
    my $word = lc $1;
    if (defined $ignore{$word}) {
        next;
    }
}
print "$input_file\n";
####################################################################
## SUBS
####################################################################
sub get_file {
    #to open a file (currently restriction.txt)
    #and store all the data within $sequence
    my ($input_file) = @_;
    open (IN, $input_file) or die "Cannot open $input_file for reading: $OS_ERROR\n"; #open a filehandle or die
    my $sequence = '';
    foreach my $line (<IN>) { #for each line in the filehandle IN
        if ($line =~ /^\s*$/) { # discard blank line
            next; #skip the rest of the statement block and continue with the next iteration of the loop
        }
        $sequence .= $line # add (concatenate) to a string sequence
    }
    return $sequence; #return the string sequence
    close IN;
}

I'd like to extract all the text from a file except the word 'and', but this code doesn't do that (though I tell it to, don't I?). Please help me: where is the bug?

Replies are listed 'Best First'.
Re: Extract word
by ikegami (Patriarch) on Oct 07, 2011 at 17:52 UTC

    The first problem is that you do the same thing regardless of the truth of defined $ignore{$word}.

    You could use the following:

    perl -pe's/\band\b//g;' in > out
    Or maybe you want to skip the entire line:
    perl -ne'print if !/\band\b/;' in > out
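For anyone who prefers a script to a one-liner, here is a minimal sketch of the same two variants as plain subroutines. The names strip_and and drop_lines_with_and and the sample lines are made up for illustration; only the two regexes come from the one-liners above.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines of the file.
my @lines = (
    "salt and pepper\n",
    "sandy beaches\n",
);

# Variant 1: delete every standalone "and" (what the -pe one-liner does).
sub strip_and {
    my @copy = @_;              # work on a copy, leave the caller's data alone
    s/\band\b//g for @copy;
    return @copy;
}

# Variant 2: keep only the lines that contain no standalone "and"
# (what the -ne one-liner does).
sub drop_lines_with_and {
    return grep { !/\band\b/ } @_;
}

print strip_and(@lines);
print "--\n";
print drop_lines_with_and(@lines);
```

The \b anchors are what keep "sandy" intact. As written, both patterns are case-sensitive; adding the /i modifier would also catch "And" and "AND", which is closer to the lc in the original post.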
Re: Extract word
by suaveant (Parson) on Oct 07, 2011 at 19:05 UTC
    You ARE skipping 'and'... but you aren't actually DOING anything when you skip it. You load the WHOLE file into $input_file, match words, and then skip, but you aren't removing the word from the text.

    You'd need a substitution (s///) to REMOVE text, something like s/(\w[\w'-]*)/$ignore{lc $1} ? '' : $1/ge (note the /e, so that the replacement is evaluated as code), though you'd probably be better off generating a regular expression that matches just the words you want to remove, rather than matching all words and comparing against a hash. There are modules out there to do that; I see Regexp::List, though I think there was another one...

    That's assuming you just want to remove certain words, not the whole line.

    Edit: Regexp::Assemble is the one I was trying to think of.

                    - Ant
                    - Some of my best work - (1 2 3)
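A rough sketch of both ideas from this reply, for comparison. It uses only core Perl: the alternation is built by hand with join and quotemeta rather than with Regexp::Assemble, so treat it as a stand-in for that module, not a demonstration of it. The subroutine names and the sample text are invented for the example. Note the /e flag on the first substitution; without it the replacement would be interpolated as a string instead of being evaluated.

```perl
use strict;
use warnings;

my %ignore = ( 'and' => 1 );

# Approach 1: match every word and replace the ignored ones with nothing.
# /e makes Perl evaluate the replacement as an expression.
sub remove_by_hash {
    my ($text) = @_;
    $text =~ s/(\w[\w'-]*)/$ignore{lc $1} ? '' : $1/ge;
    return $text;
}

# Approach 2: build one alternation from the ignore list and match only
# those words. quotemeta guards against regex metacharacters in a word.
my $alternation = join '|', map { quotemeta($_) } sort keys %ignore;
my $ignore_re   = qr/\b(?:$alternation)\b/i;

sub remove_by_regex {
    my ($text) = @_;
    $text =~ s/$ignore_re//g;
    return $text;
}

my $sample = "Sand and gravel AND stone\n";   # hypothetical sample text
print remove_by_hash($sample);
print remove_by_regex($sample);
```

The two are not quite equivalent. The first treats a hyphenated token such as "and-or" as a single word and leaves it alone, while the \b-based pattern would remove the "and" and leave "-or" behind. Both leave the surrounding spaces in place, so a follow-up pass to squeeze whitespace may be wanted.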

Re: Extract word
by Lotus1 (Vicar) on Oct 07, 2011 at 19:22 UTC
    return $sequence; #return the string sequence
    close IN;

    The 'return' statement should come after the close. Otherwise the close statement will never be executed.
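One way get_file might look with the close moved ahead of the return. This is a sketch, not the original poster's code: it also switches to a lexical filehandle and three-argument open, and reports the error with $! because $OS_ERROR only exists under use English, which the posted snippet does not show. The temporary file at the end is there only to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file and return its non-blank lines joined into one string.
sub get_file {
    my ($path) = @_;

    # Lexical filehandle and three-argument open; $! holds the OS error.
    open my $in, '<', $path
        or die "Cannot open $path for reading: $!\n";

    my $sequence = '';
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;   # discard blank lines
        $sequence .= $line;         # append the line to the result
    }

    close $in;                      # reached, because it precedes the return
    return $sequence;
}

# Demo on a throwaway file (hypothetical content).
my ($fh, $tmp) = tempfile(UNLINK => 1);
print {$fh} "first line\n\n   \nsecond line\n";
close $fh;
print get_file($tmp);
```

Reading with while instead of foreach also avoids loading every line into a list before the loop starts, which matters for large files.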
