regular expressions. help

apocalyptica has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regular expressions. help by gaal (Parson) on Jun 29, 2004 at 19:35 UTC
First of all, is that assignment to `$evalme` on purpose? Are you in fact `eval`ing the variable after this?

Second, you don't need the `s/^ //;` line. What it does is remove a single space from the beginning of the line. But if you want to ignore one leading space, you may as well ignore any leading whitespace: `if (/^\s*HISTOGRAM ...rest of re.../)`. In fact, it may (or may not, depending on how well-formed your data is) be reasonable to just drop the `^` anchor.

Finally, the reason why this regexp fails. You have `OF(\w+)$`, which reads "the word characters that continuously occupy from immediately after the letters "OF" to the end of the line". This must fail because you have non-word characters in the rest of your line; indeed, you have a space immediately after "OF"!

I couldn't understand if you're looking for the word that comes between the next two asterisks ("gpa" in this example) or if the next text inside parentheses (" 226" in this example) is what you want to capture. If the former, the following should work. I'm using extended regexp syntax for added readability:

    m{
        HISTOGRAM \s+ OF \s+
        \* \s*      # a literal "*", escaped because * is a metacharacter
        ([^*]+?)    # (capture) anything that isn't a "*", nongreedy
        \s* \*
    }x;

The group `([^*]+?)` is "nongreedy", which means that (since it is followed by `\s*`, whitespace) it will automatically not include any trailing whitespace between the word and the following asterisk.
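As a rough illustration of the extended pattern above, here is a minimal sketch. The original sample data is not shown in this thread, so the input line is an assumption pieced together from the replies ("gpa" between asterisks, " 226" in parentheses), and `histogram_var` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical sample line, reconstructed from the replies in this thread.
my $line = " HISTOGRAM OF * gpa * ( 226)";

# Return the text between the two asterisks, or undef if the line
# does not look like a HISTOGRAM header.
sub histogram_var {
    my ($text) = @_;
    return $text =~ m{
        HISTOGRAM \s+ OF \s+
        \* \s*        # a literal asterisk, then optional whitespace
        ([^*]+?)      # capture anything that is not an asterisk, non-greedy
        \s* \*        # optional whitespace, then the closing asterisk
    }x ? $1 : undef;
}

my $var = histogram_var($line);
print defined $var ? "captured: '$var'\n" : "no match\n";
```

Because the capture is non-greedy and is followed by `\s*`, the trailing space before the second asterisk should end up outside the captured group.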
Re: regular expressions. help by matija (Priest) on Jun 29, 2004 at 19:26 UTC
Of course it's not matching: there's an asterisk in the way. You should change it to: `/^HISTOGRAM OF\s*\*\s*(\w+)$/`
Re^2: regular expressions. help by apocalyptica (Acolyte) on Jun 29, 2004 at 19:48 UTC
Hmmm... That looks like it should be correct, yes (like I said, I'm no good at regular expressions, but it looks right to me), but it's still not working. Let me just post the whole stupid program to give you an idea what I am trying to do:

    #!/usr/local/bin/perl
    $fl = '-?\d+\.\d+';
    $evalme = q[
    while(<>) {
        s/^ //;
        if(^HISTOGRAM OF\s*\*\s*(\w+)$/) {
            printf ("In loop.\n"); #just here for testing purposes.
            write if $header;
            undef($cache);
            $header=$1;
            $varnum=$2;
        }
        if($header) {
    ];
    eval <<EOM;
    $evalme
            (\$meanH, \$usersH) = (\$1, \$2) if /^GROUP\\s+(\\S+)\\s+(\\S+)/;
            (\$mean, \$users) = (\$1, \$2) if /^MEAN\\s+(${fl})\\s+(${fl})/;
            \$levene = \$1 if /\\s+VARIABILITY\\s+${fl}\\s+(${fl})/;
            \$pooled = \$1 if /\\s+POOLED T\\s+${fl}\\s+(${fl})/;
            \$separate = \$1 if /\\s+SEPARATE T\\s+${fl}\\s+(${fl})/;
            \$mann = \$1 if /\\s+MANN-WHIT.\\s+${fl}\\s+(${fl})/;
        }
    }
    EOM
    write STDOUT;
    format STDOUT_TOP =
    | @|||| | @|||| | Levene-P | Pooled-P | Mann-P | Separate
    $meanH, $usersH
    ----------+----------+----------+----------+----------+----------+----------
    .
    format STDOUT =
    @<<<<<<<< | @##.#### | @##.#### | @##.#### | @##.#### | @##.#### | @##.####
    $header, $mean, $users, $levene, $pooled, $mann, $separate
    ----------+----------+----------+----------+----------+----------+----------
    .

It reads through the input file until it finds HISTOGRAM OF and then begins pulling out the data as per above. Does any of it work? Well, I don't know, I still can't get this one stupid thing to work.
Re^3: regular expressions. help by shemp (Deacon) on Jun 29, 2004 at 20:22 UTC
The `(\w+)$` is killing you again. You match 'HISTOGRAM OF', whitespace, asterisk, whitespace, but the rest of your string is not all `\w` (word chars), and since you added the '$' to match until the end, the `\w+` fails to match when it hits whitespace again. I cannot stress enough to regex learners that whitespace NEEDS to be treated like all other characters.
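To make the point about the trailing `$` concrete, here is a small sketch comparing the pattern as described above (whitespace, asterisk, whitespace, then `\w+` anchored to end of line) with the same pattern minus the anchor. The sample line is an assumption based on the values mentioned in this thread, and both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample line (leading space already stripped by s/^ //).
my $line = "HISTOGRAM OF * gpa * ( 226)";

# \w+ must reach the end of the line, which it cannot do here
# because a space follows the word.
sub match_anchored {
    my ($text) = @_;
    return $text =~ /^HISTOGRAM OF\s*\*\s*(\w+)$/ ? $1 : undef;
}

# Same pattern without the trailing $: \w+ simply stops at the space.
sub match_unanchored {
    my ($text) = @_;
    return $text =~ /^HISTOGRAM OF\s*\*\s*(\w+)/ ? $1 : undef;
}

for my $check ([anchored => \&match_anchored], [unanchored => \&match_unanchored]) {
    my ($name, $matcher) = @$check;
    my $got = $matcher->($line);
    print "$name: ", (defined $got ? "captured '$got'" : "no match"), "\n";
}
```

On a line shaped like this one, only the unanchored version should capture the word; the anchored one has nowhere to go once `\w+` meets the space.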
Re^4: regular expressions. help by apocalyptica (Acolyte) on Jun 29, 2004 at 20:40 UTC
Re: regular expressions. help by pzbagel (Chaplain) on Jun 29, 2004 at 19:42 UTC
If it's always the 4th field, use split. `my $data=(split())[3];` Later
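A minimal sketch of the split approach, assuming the fields are whitespace-separated and the asterisk counts as its own field, as in the hypothetical sample line below (`fourth_field` is an illustrative helper name):

```perl
use strict;
use warnings;

# Hypothetical sample line, reconstructed from the replies in this thread.
my $line = " HISTOGRAM OF * gpa * ( 226)";

# Split on runs of whitespace; the special pattern ' ' also skips any
# leading whitespace, so no s/^ // is needed first.
sub fourth_field {
    my ($text) = @_;
    my @fields = split ' ', $text;
    return $fields[3];
}

my $data = fourth_field($line);
print defined $data ? "fourth field: $data\n" : "fewer than four fields\n";
```

This only holds up if the variable name never contains whitespace and the asterisks are always separated from it by spaces.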
Re: regular expressions. help by shemp (Deacon) on Jun 29, 2004 at 19:29 UTC
Your regex in the `if()` has problems. `\w` represents 'word characters': letters, numbers, and underscore. Your regex is looking for a `\w+` immediately after the 'OF'. You need to account for the whitespace in your regex. I'm not exactly sure what your data will all potentially look like, but if the value you are looking for is the only thing in parentheses, you could: `if ( /\(([^)]+)\)/ ) { $thing = $1; }`
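If the parenthesized value is the target, a pattern that captures whatever sits between the first pair of parentheses could look like the sketch below. The sample line is again an assumption drawn from the values quoted in this thread, and `in_parens` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical sample line, reconstructed from the replies in this thread.
my $line = " HISTOGRAM OF * gpa * ( 226)";

# Capture everything between a literal "(" and the next ")".
sub in_parens {
    my ($text) = @_;
    return $text =~ /\(([^)]+)\)/ ? $1 : undef;
}

my $thing = in_parens($line);
print defined $thing ? "captured: '$thing'\n" : "no parentheses found\n";
```

Note that the capture keeps any padding inside the parentheses, so a value like " 226" would still need its leading space trimmed (or the pattern could be changed to `/\(\s*([^)]+?)\s*\)/`).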
Re: regular expressions. help by NovMonk (Chaplain) on Jun 29, 2004 at 19:27 UTC
Let me see if I understand what you want-- if the line begins with `"HISTOGRAM OF"` you want to print the words "in loop"? Why would you need anything more than this for the match: `if (/^HISTOGRAM OF/){...etc}`? As to what you're doing with it, you could get at the gpa value by splitting the data on the white space and/or the asterisk. I'm not sure what you're after, but you could make an array of the gpa values that way and use them. That's how I'd start anyway. Hope this is helpful. Good luck. Pax, NovMonk
Re^2: regular expressions. help by apocalyptica (Acolyte) on Jun 29, 2004 at 19:36 UTC
Well, what I'm doing is taking the value of the data in the fourth field (in this example, it is gpa, but it could be a whole host of random letters strung together) and putting it into a variable. But, that isn't the problem I'm having right now -- the problem is getting the blasted thing to match and acknowledge that there is anything there.
Re^3: regular expressions. help by Fletch (Bishop) on Jun 29, 2004 at 19:42 UTC
When trying to construct an re to match something, `perl -de 0` can be very helpful. Set `$_` to your sample data and then you can iteratively construct your re with `x /blah/` (seeing if it matches and what matches at each step).
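The same build-it-up-step-by-step idea can also be scripted outside the debugger. The sketch below is not a debugger session; it is a plain script, with a hypothetical sample line and a hypothetical helper `first_failing_step`, that tries progressively longer patterns and reports the first one that stops matching.

```perl
use strict;
use warnings;

# Hypothetical sample line (leading space already stripped).
my $line = "HISTOGRAM OF * gpa * ( 226)";

# Candidate patterns, each one extending the previous.
my @steps = (
    qr/^HISTOGRAM/,
    qr/^HISTOGRAM OF/,
    qr/^HISTOGRAM OF\s*\*/,
    qr/^HISTOGRAM OF\s*\*\s*(\w+)/,
    qr/^HISTOGRAM OF\s*\*\s*(\w+)$/,
);

# Return the index of the first pattern that fails, or -1 if all match.
sub first_failing_step {
    my ($text, @patterns) = @_;
    for my $i (0 .. $#patterns) {
        return $i unless $text =~ $patterns[$i];
    }
    return -1;
}

my $failed = first_failing_step($line, @steps);
print $failed < 0
    ? "all steps match\n"
    : "step $failed fails: $steps[$failed]\n";
```

Whichever step fails first points at the piece of the pattern that was added last, which is usually where the problem lies.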
Re: regular expressions. help by ercparker (Hermit) on Jun 29, 2004 at 23:07 UTC
This worked for me: `/^HISTOGRAM OF\s+?\*\s+(\w+)\s+/`

