PerlMonks  

Printing out matches for two regular expressions

by Maire (Scribe)
on Oct 22, 2017 at 08:29 UTC ( [id://1201828]=perlquestion )

Maire has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to get a very basic script to print out the matches from two regular expressions at once. Specifically, I am trying to print out all of the numbers (digits) and all of the words in between a "#" and the word "fin" in .txt files which take the following format:

The 2 cats and the dog. The 8 cats and the 6 dogs. The 3 pigs and the 2 sheep. #story fin #cats and dogs fin #sheep fin

So, for example, from the above file, I would expect the output to be:

2 8 6 3 2 story cats and dogs sheep

At the moment, I am using the following script:

    open(FILE, 'C:\Users\li\perl\animals.txt');
    $/ = " ";
    while (<FILE>) {
        if (m/((\d)+?)|((?<=#)(.*?)(?= fin))/g) {
            print "$1\n";
        }
    }

However, while this returns the numbers, it does not return the desired words. I believe that my mistake is using the | operator, which I think is telling the script to finish because it has found the first part of the regex and doesn't need to continue for the rest?

A Google search suggested that lookaheads could be used in a way that mirrors an "and" operator: (?=.*word1)(?=.*word2)(?=.*word3) (http://www.ocpsoft.org/tutorials/regular-expressions/and-in-regex/). However, the following regex, created using the lookaheads suggested above, returns no results for me:

    open(FILE, 'C:\Users\li\perl\animals.txt');
    $/ = " ";
    while (<FILE>) {
        if (m/(?=.*((\d)+?))(?=.*((?<=#)(.*?)(?= fin)))/g) {
            print "$1\n";
        }
    }

I also read about using smartmatch in "How do I efficiently match many regular expressions at once?". However, when I run the following script, the only output is the warning "Smartmatch is experimental at C:\Users\li\perl\animalscript2.pl line 3."

    open(FILE, 'C:\Users\li\perl\animals.txt');
    my @patterns = ( qr/((\d)+)/, qr/((?<=#)(.*?)(?= fin))/);
    if( $string ~~ @patterns ) { print "$1\n"; };

Any help would be greatly appreciated!

Replies are listed 'Best First'.
Re: Printing out matches for two regular expressions
by choroba (Cardinal) on Oct 22, 2017 at 09:08 UTC
    Why do you set $/ to a space? That makes the script read the input file word by word, so the second part of the expression can never match: it never sees the #cats together with their fin.
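To make the effect of $/ concrete, here is a minimal sketch. It assumes the text is read from an in-memory filehandle instead of animals.txt, and read_records is a hypothetical helper written only for this illustration.

```perl
use strict;
use warnings;

# Part of the sample text from the question, read through an in-memory
# filehandle (an assumption, so the sketch needs no file on disk).
my $text = "The 3 pigs and the 2 sheep. #story fin #cats and dogs fin\n";

# Hypothetical helper: read $text with the given input record separator
# and return the list of records.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @by_space = read_records(" ");    # one record per word
my @by_line  = read_records("\n");   # one record per line (the default)

print scalar(@by_space), " records when \$/ is a space\n";
print scalar(@by_line),  " record when \$/ is a newline\n";

# Count the records in which a '#' is followed by ' fin'. No single word
# contains both, so only the whole line qualifies.
my $pairs_by_space = grep { /#.*? fin/ } @by_space;
my $pairs_by_line  = grep { /#.*? fin/ } @by_line;
print "records matching /#.*? fin/: $pairs_by_space vs $pairs_by_line\n";
```

Leaving $/ at its default, or localizing any change as above, keeps each "#... fin" pair inside a single record.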

    To find all the matches on one line, use while instead of if.
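A small sketch of the difference, using one sentence from the sample data (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $line = "The 8 cats and the 6 dogs.";

# 'if' runs the match once, so only the first number is seen.
my @with_if;
if ($line =~ /(\d+)/g) {
    push @with_if, $1;
}

# Reset the match position that the scalar-context /g match left behind.
pos($line) = undef;

# 'while' repeats the match; with /g each attempt resumes where the
# previous one ended, until no further match is found.
my @with_while;
while ($line =~ /(\d+)/g) {
    push @with_while, $1;
}

print "if:    @with_if\n";
print "while: @with_while\n";
```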

    Finally, capture groups are numbered left to right across the whole pattern, so a group after the vertical bar populates $2 or higher, never $1. Use a branch reset pattern, (?|...), to make every alternative start numbering from $1 again:

    while (m/(?|(\d)+?|(?<=#)(.*?)(?= fin))/g) {

    Which could be simplified to

    while (m/(?|(\d)+?|#(.*?) fin)/g) {

    Update: Are you sure about (\d)+? ? Have you tested it with numbers of more than one digit? You probably wanted just plain (\d+) .
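Putting these suggestions together, a complete script might look like the sketch below. It assumes the sample text is held in a string rather than read from the file, and it collects numbers and words into separate arrays (an addition here, made only to reproduce the output order given in the question).

```perl
use strict;
use warnings;

# Sample data from the question; a real script would read it from the file.
my $text = 'The 2 cats and the dog. The 8 cats and the 6 dogs. '
         . 'The 3 pigs and the 2 sheep. '
         . '#story fin #cats and dogs fin #sheep fin';

# The branch reset (?|...) makes both alternatives capture into $1.
my (@numbers, @words);
while ($text =~ /(?|(\d+)|#(.*?) fin)/g) {
    my $match = $1;
    if ($match =~ /^\d+$/) { push @numbers, $match }
    else                   { push @words,   $match }
}
print join(' ', @numbers, @words), "\n";

# (\d)+? is lazy and stops after one digit, so a two-digit number is
# returned as two separate matches; (\d+) returns it whole.
my @lazy   = '42 cats' =~ /(\d)+?/g;
my @greedy = '42 cats' =~ /(\d+)/g;
print "lazy:   @lazy\n";
print "greedy: @greedy\n";
```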

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Brilliant! Thank you very much for your help!

      Also thanks for your tip about setting $/ to a space. In an earlier version of the script, I was trying to just locate multiple numbers across several lines; setting $/ to a space stopped the script from only printing the first number on each line. However, I realise now that it is inappropriate for the current task. Thanks again!

        The only thing I would add to choroba's comprehensive comments is that you might consider adding some kind of boundary assertion to the fin delimiter pattern: see what happens when the \b assertion in the match used below is omitted. I agree that the look-arounds don't seem needed, so I've left them out. (I use \x23 instead of # in my pattern only because my REPL doesn't like octothorpes.)

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use 5.010;
         ;;
         my @lines = (
           'The 2 cats and the dog.',
           'The 8 cats and the 6 dogs.',
           'The 3 pigs and the 2 sheep.',
           '',
           '#story fin #cats and dogs fin #sheep fin',
           'blah yada',
           '#sharkfin soup fin #fish fingers fin',
           '9 fleas, 87 ticks, 654 lice.',
           '42 cats #some sheep fin and 1 dog',
           );
         ;;
         for my $line (@lines) {
           printf qq{'$line' -> };
           ;;
           my $parsed = my @extracted =
               $line =~ m{ (?| (\d+) | \x23 (.*?) \s+ fin \b) }xmsg;
           ;;
           print $parsed ? map qq{'$_' }, @extracted : 'nothing parsed';
           }
        "
        'The 2 cats and the dog.' -> '2'
        'The 8 cats and the 6 dogs.' -> '8' '6'
        'The 3 pigs and the 2 sheep.' -> '3' '2'
        '' -> nothing parsed
        '#story fin #cats and dogs fin #sheep fin' -> 'story' 'cats and dogs' 'sheep'
        'blah yada' -> nothing parsed
        '#sharkfin soup fin #fish fingers fin' -> 'sharkfin soup' 'fish fingers'
        '9 fleas, 87 ticks, 654 lice.' -> '9' '87' '654'
        '42 cats #some sheep fin and 1 dog' -> '42' 'some sheep' '1'
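For anyone who wants to see the effect of omitting \b directly, here is a minimal side-by-side sketch on the "fish fingers" line from the example above (the array names are only illustrative):

```perl
use strict;
use warnings;

my $line = '#sharkfin soup fin #fish fingers fin';

# \x23 is '#'; under /x a bare, unescaped # would start a comment.
# Without \b, "fin" is also accepted as the start of "fingers".
my @without_boundary = $line =~ / \x23 (.*?) \s+ fin    /xg;

# With \b, "fin" must end at a word boundary.
my @with_boundary    = $line =~ / \x23 (.*?) \s+ fin \b /xg;

print "without \\b: ", join(' | ', @without_boundary), "\n";
print "with \\b:    ", join(' | ', @with_boundary),    "\n";
```

Without the assertion the second capture is cut short at "fish"; with it, the capture runs to the real delimiter.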


        Give a man a fish:  <%-{-{-{-<
