Printing out matches for two regular expressions

Maire has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to get a very basic script to print out the matches from two regular expressions at once. Specifically, I am trying to print out all of the numbers (digits) and all of the words in between a "#" and the word "fin" in .txt files which take the following format:

The 2 cats and the dog.
The 8 cats and the 6 dogs.
The 3 pigs and the 2 sheep.

#story fin #cats and dogs fin #sheep fin

So, for example, from the above file, I would expect the output to be:

2
8
6
3
2
story
cats and dogs
sheep

At the moment, I am using the following script:

open(FILE, 'C:\Users\li\perl\animals.txt');  
$/ = " ";
while (<FILE>) {
    if (m/((\d)+?)|((?<=#)(.*?)(?= fin))/g) {
        print "$1\n";
}
        
}

However, while this returns the numbers, it does not return the desired words. I believe that my mistake is using the | operator, which I think is telling the script to finish because it has found the first part of the regex and doesn't need to continue for the rest?

A Google search suggested that lookaheads could be used in a way that mirrors an "and" operator: (?=.*word1)(?=.*word2)(?=.*word3) (http://www.ocpsoft.org/tutorials/regular-expressions/and-in-regex/). However, the following regex, built from the lookaheads suggested above, returns no results for me:

 
open(FILE, 'C:\Users\li\perl\animals.txt');  
$/ = " ";
while (<FILE>) {
    if (m/(?=.*((\d)+?))(?=.*((?<=#)(.*?)(?= fin)))/g) {
        print "$1\n";
}
        
}

I also read about using Smart Match (see "How do I efficiently match many regular expressions at once?"). However, when I run the following script, the only thing that appears is a notification that "Smartmatch is experimental at C:\Users\li\perl\animalscript2.pl line 3."


open(FILE, 'C:\Users\li\perl\animals.txt');  
    my @patterns = ( qr/((\d)+)/, qr/((?<=#)(.*?)(?= fin))/);
    if( $string ~~ @patterns ) {
        print "$1\n";
    };

Any help would be greatly appreciated!

Re: Printing out matches for two regular expressions by choroba (Cardinal) on Oct 22, 2017 at 09:08 UTC
Why do you set $/ to a space? It makes the script read the input file word by word, so it can never match the second part of the expression: it never sees the `#cats` together with their `fin`.

To find all the matches on one line, use `while` instead of `if`.

Finally, the second capture group populates $2, even after the vertical bar. Use a branch reset pattern, `(?|...)`, to always start populating $1 in each alternative:

    while (m/(?|(\d)+?|(?<=#)(.*?)(?= fin))/g) {

Which could be simplified to

    while (m/(?|(\d)+?|#(.*?) fin)/g) {

Update: Are you sure about `(\d)+?`? Have you tested it with numbers of more than one digit? You probably wanted just plain `(\d+)`.
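
Putting the three points of this reply together, a minimal sketch might look like the following. It is not the poster's exact script: it assumes the sample data from the question is inlined in a here-doc rather than read from the file, it uses the `(\d+)` form suggested in the update, and the names `$text` and `@matches` are illustrative only.

```perl
use strict;
use warnings;

# Sample data from the question, inlined so the sketch runs without the file.
my $text = <<'END';
The 2 cats and the dog.
The 8 cats and the 6 dogs.
The 3 pigs and the 2 sheep.

#story fin #cats and dogs fin #sheep fin
END

my @matches;

# (?|...) is a branch reset group: each alternative numbers its captures
# starting from $1, so both kinds of match arrive in the same variable.
# The /g modifier with while() walks through every match in the string.
while ($text =~ m/(?|(\d+)|#(.*?) fin)/g) {
    push @matches, $1;
}

print "$_\n" for @matches;
```

Because $/ is left at its default here and the whole text is searched in one pass, each `#...` marker is seen together with its `fin`.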
Re^2: Printing out matches for two regular expressions by Maire (Scribe) on Oct 22, 2017 at 10:07 UTC
Brilliant! Thank you very much for your help! Also thanks for your tip about setting $/ to a space. In an earlier version of the script, I was trying to just locate multiple numbers across several lines; setting $/ to a space stopped the script from only printing the first number on each line. However, I realise now that it is inappropriate for the current task. Thanks again!
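
For readers wondering why a space as the record separator defeats the `#... fin` alternative, here is a small sketch. It assumes an in-memory filehandle in place of the real file, and the names `$line`, `@records` and `$hits` are hypothetical.

```perl
use strict;
use warnings;

my $line = "#cats and dogs fin\n";

# Open an in-memory filehandle on the string so no file is needed.
open(my $fh, '<', \$line) or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = " ";      # records now end at each space, not at each newline
    @records = <$fh>;    # one word (plus its trailing separator) per record
}
close $fh;

# No single record contains both the '#' marker and the closing 'fin'.
my $hits = grep { /#(.*?) fin/ } @records;

print scalar(@records), " records, $hits of them match\n";
```

Each record holds a single word, so the pattern that needs `#` and ` fin` in the same string has nothing to match against.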
Re^3: Printing out matches for two regular expressions by AnomalousMonk (Archbishop) on Oct 22, 2017 at 14:50 UTC
The only thing I would add to choroba's comprehensive comments is that you might consider adding some kind of boundary assertion to the `fin` delimiter pattern: see what happens when the `\b` assertion in the match used below is omitted. I agree that the look-arounds don't seem needed, so I've left them out. (I use `\x23` instead of `#` in my pattern only because my REPL doesn't like octothorpes.)

c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
 ;;
 my @lines = (
   'The 2 cats and the dog.',
   'The 8 cats and the 6 dogs.',
   'The 3 pigs and the 2 sheep.',
   '',
   '#story fin #cats and dogs fin #sheep fin',
   'blah yada',
   '#sharkfin soup fin #fish fingers fin',
   '9 fleas, 87 ticks, 654 lice.',
   '42 cats #some sheep fin and 1 dog',
   );
 ;;
 for my $line (@lines) {
   printf qq{'$line' -> };
   ;;
   my $parsed = my @extracted =
     $line =~ m{ (?| (\d+) | \x23 (.*?) \s+ fin \b) }xmsg;
   ;;
   print $parsed ? map qq{'$_' }, @extracted : 'nothing parsed';
   }
 "
'The 2 cats and the dog.' -> '2'
'The 8 cats and the 6 dogs.' -> '8' '6'
'The 3 pigs and the 2 sheep.' -> '3' '2'
'' -> nothing parsed
'#story fin #cats and dogs fin #sheep fin' -> 'story' 'cats and dogs' 'sheep'
'blah yada' -> nothing parsed
'#sharkfin soup fin #fish fingers fin' -> 'sharkfin soup' 'fish fingers'
'9 fleas, 87 ticks, 654 lice.' -> '9' '87' '654'
'42 cats #some sheep fin and 1 dog' -> '42' 'some sheep' '1'

Give a man a fish: <%-{-{-{-<
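
As a sketch of the experiment suggested above, the following compares the delimiter pattern with and without the `\b` assertion on the trickiest sample line. The variable names `@with_b` and `@without_b` are illustrative only.

```perl
use strict;
use warnings;

my $line = '#sharkfin soup fin #fish fingers fin';

# With the word boundary, "fin" must end at a word boundary,
# so it cannot be the start of a longer word.
my @with_b = $line =~ m{ \x23 (.*?) \s+ fin \b }xg;

# Without it, "fin" may also be the start of a longer word such as
# "fingers", so the second capture is cut short.
my @without_b = $line =~ m{ \x23 (.*?) \s+ fin }xg;

print "with \\b:    ", join(', ', map { "'$_'" } @with_b),    "\n";
print "without \\b: ", join(', ', map { "'$_'" } @without_b), "\n";
```

With the assertion the captures are 'sharkfin soup' and 'fish fingers'; without it the second capture stops at 'fish', because ` fin` is found inside ` fingers`.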
Re^4: Printing out matches for two regular expressions by Maire (Scribe) on Oct 23, 2017 at 06:40 UTC

