regular expressions

wannabeboy has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regular expressions by brian_d_foy (Abbot) on Mar 01, 2005 at 08:45 UTC
I think you want the regex I show in this little program. It starts with an "a", has one or more non-newline characters, then uses the alternation `(f|g)` to denote that the whole thing can end with either of those characters.

```perl
#!/usr/bin/perl

while( <DATA> ) {
    chomp;
    print "$_ matches!\n" if /^a.+(f|g)$/;
}

__DATA__
axxxi
bxg
cxxf
axxxh
axxxg
axxxf
```

-- brian d foy <bdfoy@cpan.org>
Re^2: regular expressions by reasonablekeith (Deacon) on Mar 01, 2005 at 08:50 UTC
Slightly picky, but I'd use a character class to match the f or g. I'm pretty certain it'd be quicker, as perl isn't jumping through hoops to catch $1. `print "$_ matches!\n" if /^a.+[fg]$/;`
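The speed claim can be measured rather than guessed. A minimal sketch using the core Benchmark module; the sample strings and sub names are made up for illustration, and the timings will vary with Perl version and input, so no result is shown here:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data, for illustration only
my @samples = qw(axxxi bxg cxxf axxxh axxxg axxxf);

# Count matches using the capturing alternation
sub count_alternation {
    my $n = grep { /^a.+(f|g)$/ } @samples;
    return $n;
}

# Count matches using the character class
sub count_char_class {
    my $n = grep { /^a.+[fg]$/ } @samples;
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table
cmpthese(-1, {
    alternation => \&count_alternation,
    char_class  => \&count_char_class,
});
```

Both subs count the same lines, so any difference in the table comes from the pattern alone.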
Re^2: regular expressions by wannabeboy (Novice) on Mar 01, 2005 at 08:56 UTC
Hi, it's me again. This is really what I'm trying to do: read each line of a file, and if the word on the line begins with a, has 1 or more characters, then ends with f or g (image files like amm.gif or ammre.jpg), put the word into an array @amourss.

```perl
#!usr/bin/perl
# 2005-03-01 :
# open the file containing the list of cards
open(CARTES, "imagescartes.txt") or die "Ouverture du ficher imagescartes.txt impossible: $!\n";
-T "imagescartes.txt" or print "ceci n'est pas un fichier texte\n";
while ($ligne = <CARTES>) {
    chop($ligne);
    while ($amourss .= /^a.+(f|g)$/g ) {
        $total++;
    }
}
```
Re^3: regular expressions by brian_d_foy (Abbot) on Mar 01, 2005 at 09:01 UTC
If you are looking for particular filenames, I would expand the regular expression. I'd specify the file extension as much as possible, including the literal full stop that separates the name and extension. The /i flag is sometimes a good idea since some things like to make everything upper case. `/^a.*\.(jpg|gif)$/i`

-- brian d foy <bdfoy@cpan.org>
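A minimal sketch of that stricter pattern applied to a list of names. The helper `image_names` and the sample file names are hypothetical, and a non-capturing group is used here because nothing needs to be captured:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names that start with "a" and end in .jpg or .gif,
# ignoring case. The dot before the extension is escaped so it
# only matches a literal full stop.
sub image_names {
    my @names = @_;
    return grep { /^a.*\.(?:jpg|gif)$/i } @names;
}

# Hypothetical file names for illustration
my @found = image_names(qw(amm.gif AMMRE.JPG agif bmm.gif amm.gif.txt a.jpg));
print "$_\n" for @found;
```

With these inputs the names kept are amm.gif, AMMRE.JPG and a.jpg; `agif` is rejected because it has no literal dot, and `amm.gif.txt` because the extension is not at the end.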
Re^3: regular expressions by Random_Walk (Prior) on Mar 01, 2005 at 09:18 UTC
I think this will do what you want.

```perl
#!/usr/bin/perl
# 2005-03-01:

# Please always use strict and warnings, they will capture
# many common mistakes and typos
use strict;
use warnings;

# we may as well test it before we try to open it
unless (-T 'imagescartes.txt') {
    print "ceci n'est pas un fichier texte\n";
    exit 1;
}

# open the file containing the list of cards
# (double quotes so that $! and \n are interpolated)
open CARTES, '<', 'imagescartes.txt'
    or die "Ouverture du ficher imagescartes.txt impossible: $!\n";

# this is the array where we store the image file names
my @amourss;

while (my $ligne = <CARTES>) {
    chomp $ligne;
    if ($ligne =~ /^a\S+\.(?:gif|jpg)$/) # will match exactly one word
                                          # beginning a and ending in .gif or .jpg
                                          # the \S+ is one or more non space chrs
                                          # a.gif not allowed use \S* to allow it
    {
        push @amourss, $ligne;
    }
}
close CARTES;

print "found ", scalar @amourss, " images: ";
print join ", ", @amourss;
print "\n";

__END__

# input file used
test.gif
this asilly.gif
abc.jpg
abcd.gif
absolutely not
a.jpg

# results of running
>./amourss
found 2 images: abc.jpg, abcd.gif
>
```

Update: as Jasper points out, a space is valid in filenames in most OSes, so please feel free to change the regex to `/^a.+\.(?:gif|jpg)$/` if you wish to allow spaces in filenames, or `/^a.*\.(?:gif|jpg)$/` if you wish to allow a.gif and a.jpg.

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!
Re^4: regular expressions by Jasper (Chaplain) on Mar 01, 2005 at 10:13 UTC
Re^5: regular expressions by Random_Walk (Prior) on Mar 01, 2005 at 13:47 UTC
Re: regular expressions by saintmike (Vicar) on Mar 01, 2005 at 08:49 UTC
Hmmm ... seems like some of the choices of operators and regular expressions are somewhat arbitrary. Did you guess :) ? A couple of hints:

- The operator to match a string with a regular expression is `=~`, not `.=`
- Capturing a single dot in parentheses (`(.)`) doesn't really make sense unless you want to capture the very character this dot matches and save it in a variable.
- To match either f or g, use a character class: `[fg]`

The question remains where you got those ideas from ...
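The difference between the two operators in the first hint can be seen in a small sketch. The helper name `append_if_match` and the sample strings are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append $ligne (plus a space) to $liste only when $ligne matches.
# Returns the possibly extended string.
sub append_if_match {
    my ($liste, $ligne) = @_;
    # =~ tests $ligne against the pattern;
    # .= appends to $liste and does no matching at all
    $liste .= "$ligne " if $ligne =~ /^a.+[fg]$/;
    return $liste;
}

my $liste = '';
$liste = append_if_match($liste, $_) for qw(amm.gif bmm.gif ammre.jpg);
print "$liste\n";
```

Here the match decides whether anything is appended, and the append itself is a separate step.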
Re: regular expressions by tirwhan (Abbot) on Mar 01, 2005 at 09:00 UTC
Appending the matched text like that won't work, you need to do it in two steps:

```perl
/<regex>/;
$amourss.=$1;
```

This is what I'm guessing you want to do with your regular expression (with comments):

```perl
/^        # Match the start of the string
 a        # Single occurrence of the character a
 (.+)     # Grab one or more other characters
          # and put into $1
 [fg]     # a single character, either "f" or "g"
 $        # match the end of the string
/x        # x modifier to allow comments in regex
```

So your complete code would look something like this:

```perl
/^a(.+)[fg]$/;
$amourss.=$1;
```
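One caveat with the two-step form: after a failed match, `$1` keeps its value from the last successful match, so the append is best guarded by the match itself. A minimal sketch, with a hypothetical helper name and sample lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the text between the leading "a" and the trailing f or g
# for every line that matches; non-matching lines are skipped.
sub middles {
    my @lines = @_;
    my $amourss = '';
    for my $ligne (@lines) {
        # Only use $1 when this particular match succeeded
        if ($ligne =~ /^a(.+)[fg]$/) {
            $amourss .= $1;
        }
    }
    return $amourss;
}

print middles('axxxg', 'bxg', 'ayyf'), "\n";   # prints "xxxyy"
```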
Re: regular expressions by TedPride (Priest) on Mar 01, 2005 at 08:46 UTC
You want `if ($amourss =~ /^a.+[fg]$/) { }` or `if ($amourss =~ /^a.+(?:f|g)$/) { }`, assuming you're doing a test and not trying to return part of the motif. EDIT: The example given above is inefficient because it uses `(f|g)` instead of `(?:f|g)`, requiring the regex to return this part of the match as $1 when not needed.
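To make the capturing versus non-capturing distinction concrete, here is a small sketch. The sub names and test strings are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With a capturing group, the final character is available in $1.
# Returns that character, or undef if the string does not match.
sub last_char_captured {
    my ($s) = @_;
    return $s =~ /^a.+(f|g)$/ ? $1 : undef;
}

# With a non-capturing group nothing is saved; only true/false comes back.
sub ends_in_f_or_g {
    my ($s) = @_;
    return $s =~ /^a.+(?:f|g)$/ ? 1 : 0;
}

print last_char_captured('axxxg'), "\n";   # prints "g"
print ends_in_f_or_g('axxxg'), "\n";       # prints 1
```

If the captured character is never used, the second form (or the character class `[fg]`) states the intent more directly.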
Re: regular expressions by inman (Curate) on Mar 01, 2005 at 09:04 UTC
The following code shows some variations on what you want. It matches and captures a string containing one or more non-whitespace characters (represented by \S). The various options show the difference between greedy and non-greedy matching as well as anchoring on word boundaries.

```perl
#! /usr/bin/perl -w
use strict;
use warnings;

my $data = "abcdefghijklmnopFqrstuvwxyz arrrrhhhhg!";

# Greedily match as much non-whitespace (\S+) as we can that
# starts with an a and ends with an f or g
print "Greedily matched $1\n" if $data =~ /(a\S+(f|g))/i;

# Add the ? modifier to make the expression match minimally
print "Minimally matched $1\n" if $data =~ /(a\S+?(f|g))/i;

# Anchor each end of the expression to a word boundary (\b)
# so that we only match words that start with an a and end with an f or g
print "Word matched $1\n" if $data =~ /(\ba\S+(f|g)\b)/;

# Apply the same regular expression to pick out all of the
# matches in the data
while ($data =~ /(a\S+?(f|g))/ig) {
    print "Word: $1\n";
}
```