PerlMonks  

regexp: extracting info

by jeanluca (Deacon)
on Dec 23, 2005 at 14:37 UTC (#518762=perlquestion)
jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I've the following issue:
$str = "abc dddd 10 "; $str1 = "abc dddd ee 10";

I need to extract "10"; however, there might be two or three words preceding it. Somehow I cannot construct the regexp. My guess was something like:
$number = $str =~ /abc\sdddd\s[ee]\s+(\d+)/gs ;
but it didn't work :(
Furthermore, I realized that I don't understand the ?:, ?= etc. patterns. I tried reading perlre, but that didn't help very much.
Any suggestions for a good tutorial or another post?

Thanks in advance
Luca

Re: regexp: extracting info
by prasadbabu (Prior) on Dec 23, 2005 at 14:43 UTC

    If I understood your question correctly, this will work. Roy Johnson recently explained this in his tutorial Using Look-ahead and Look-behind, which will help you understand '?=' etc.

    my $str = 'abc dddd 10 ';
    my ($number) = $str =~ /(\d+)\s*$/;
    print "$number\n";
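    Since the tutorial mentioned above is about look-ahead, here is a minimal sketch of the same extraction written with one, checked only against the two sample strings from the question. (\d+)(?=\s*$) captures the digits and then merely checks that nothing but optional whitespace follows, so the trailing space is tested but not consumed.

```perl
use strict;
use warnings;

# The two sample strings from the question (the first keeps its trailing space).
my @samples = ('abc dddd 10 ', 'abc dddd ee 10');

my @numbers;
for my $str (@samples) {
    # (\d+) captures the digits; (?=\s*$) is a zero-width look-ahead that
    # only checks that optional whitespace and then the end of string follow.
    my ($number) = $str =~ /(\d+)(?=\s*$)/;
    push @numbers, $number;
    print "$number\n";
}
```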

    Prasad

Re: regexp: extracting info
by Joost (Canon) on Dec 23, 2005 at 14:45 UTC
Re: regexp: extracting info
by tirwhan (Abbot) on Dec 23, 2005 at 14:50 UTC

    You almost have it. Two things are wrong:

    1. You need to put $number into list context to retrieve $1
    2. Reread the section on character classes in perlre. A character class matches exactly one character out of a set; you still need to quantify it if you want to match more than once. [ee] and [e] are equivalent, and both match a single character "e".
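    A small sketch of point 2, using the made-up test strings 'e' and 'ee': for each string it records whether a plain class, a quantified class and an optional non-capturing group match the whole string (1 for a match, 0 otherwise).

```perl
use strict;
use warnings;

my %result;
for my $s ('e', 'ee') {
    $result{$s} = [
        ($s =~ /^[ee]$/     ? 1 : 0),  # class: exactly one "e"
        ($s =~ /^[e]{1,2}$/ ? 1 : 0),  # class with a quantifier: one or two
        ($s =~ /^(?:ee)?$/  ? 1 : 0),  # optional literal "ee" (or nothing)
    ];
    print "'$s': @{ $result{$s} }\n";
}
```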

    This will work:

    ($number) = $str =~ /abc\sdddd\s+(?:ee\s+)?(\d+)/;

    As for your question about the two extended constructs: (?:...) groups an expression in parentheses without capturing it into $1, $2, etc., and (?=...) is a zero-width look-ahead that requires the pattern to follow without consuming it. perlre really explains them the best way I know, though you may want to take a look at Mastering Regular Expressions, which is a very worthwhile book to read when learning regular expressions.
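    A short illustration of the (?:...) part, assuming the second sample string from the question: with plain parentheses the optional word is captured too and the number moves to the second capture, while (?:...) leaves the number as the only capture.

```perl
use strict;
use warnings;

my $str = 'abc dddd ee 10';

# Plain parentheses capture, so the optional word becomes the first
# capture and the number is pushed to the second.
my @capturing = $str =~ /abc\sdddd\s+(ee\s+)?(\d+)/;

# (?:...) groups without capturing, so the number is the only capture.
my @non_capturing = $str =~ /abc\sdddd\s+(?:ee\s+)?(\d+)/;

print scalar(@capturing), " captures: '$capturing[0]', '$capturing[1]'\n";
print scalar(@non_capturing), " capture: '$non_capturing[0]'\n";
```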

    Update: Removed "gs", as those modifiers aren't necessary (thanks to prasadbabu for pointing that out).


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Re: regexp: extracting info
by jeanluca (Deacon) on Dec 23, 2005 at 14:52 UTC
    My problem with this string was a bit misunderstood. Assume you have a very large multi-line string, and only one line looks like the one described above!
    Thanks for the tutorial suggestions!

    Luca

      If you want to match a line in a multi-line string, you need the /m modifier together with the line-start (^) and line-end ($) anchors.
      You can read about those in the perldocs already mentioned.

      Going with your specification so far:

      I need to extract "10" however there might be 2 or 3 words preceding.
      and
      a very large multi-line string, and only one line looks like the one described

      Going with this, I would suggest:

      $str =~ /^(?:\w+\s+){2,3}10$/m

      There might be other restrictions you want to place on your match (more specific whitespace preceding the 10? Optional whitespace, or even other words, after the 10?).

      We don't know. You have to make up your mind about what the appropriate pattern is.
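      A minimal sketch of that approach which also captures the number instead of matching a literal 10. The multi-line input is made up for illustration, and [ \t]+ is used instead of \s+ because \s also matches newlines and could let a match run across lines; whether that restriction fits your data is an assumption.

```perl
use strict;
use warnings;

# Made-up multi-line input; only the middle line has the expected shape.
my $text = "first line of text\nabc dddd ee 10\nlast line\n";

# /m lets ^ and $ match at the start and end of each line.
# (?:\w+[ \t]+){2,3} accepts two or three words before the number, and
# [ \t] (unlike \s) cannot match a newline, so a match stays on one line.
my ($number) = $text =~ /^(?:\w+[ \t]+){2,3}(\d+)[ \t]*$/m;

print defined $number ? "found $number\n" : "no matching line\n";
```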

Re: regexp: extracting info
by ptum (Priest) on Dec 23, 2005 at 14:55 UTC

    Well, your description of the extraction rules and of what you expect to encounter is somewhat sparse, but if the number will always be at the end of the string, something simple might do the trick for you:

    use strict;
    my $str = "abc dddd 10 ";
    my $number;
    if ($str =~ /^.*?(\d+)\s*$/) {
        $number = $1;
    }
    else {
        # do something with unexpected record
    }

    Alternatively, you may be looking for two or three words, like this:

    if ($str =~ /^(\w+\s+){2,3}(\d+)/) { $number = $2; }

    (code not tested)
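    One detail of the second pattern worth seeing: because (\w+\s+) is a capturing group inside a quantifier, $1 keeps only its last repetition, which is why the number lands in $2. A quick check, assuming the two sample strings without trailing spaces:

```perl
use strict;
use warnings;

my @results;
for my $str ('abc dddd 10', 'abc dddd ee 10') {
    if ($str =~ /^(\w+\s+){2,3}(\d+)/) {
        # $1 holds only the LAST repetition of (\w+\s+); $2 is the number.
        push @results, [$1, $2];
        print "last word: '$1', number: $2\n";
    }
}
```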


    No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: regexp: extracting info
by ikegami (Pope) on Dec 23, 2005 at 15:40 UTC

    No one has clearly explained your problem, IMHO, so allow me.

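    1) While square brackets (as in [ee]) usually denote optional components in BNF and command usage syntax, in regexps they define a character class, which matches exactly one character from the set. To make an atom (a non-special character, a character escape, something in brackets or something in parentheses) optional, append a question mark (e.g. a?, (?:regexp)?). In this case, \s[ee]\s+ should be changed to \s+(?:ee\s+)?; the whitespace after "ee" has to be inside the optional group, or "abc dddd 10" (with a single space before the number) will not match.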

    2) Regexps do not return captured values in scalar context; there, a match only returns true (1) or false (an empty string).
    my $number = $str =~ /.../;
    should be
    my ($number) = $str =~ /.../;

    3) Did you really mean to say /g? Also, /s is useless since there's no "." in your regexp.
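    A minimal sketch of point 2, using the second sample string from the question: the same match assigned once in scalar and once in list context.

```perl
use strict;
use warnings;

my $str = 'abc dddd ee 10';

# Scalar context: the match only reports success (1) or failure (""),
# so $flag does not receive the captured digits.
my $flag = $str =~ /(\d+)/;

# List context (parentheses around the variable): the match returns
# its captures, and the first one is assigned to $number.
my ($number) = $str =~ /(\d+)/;

print "scalar: $flag, list: $number\n";
```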

     

    Here are some solutions (probably already mentioned):

    From the front:
    /^(\w+\s+){2,3}(\d+)/
    From the back:
    /(\d+)\s*$/
    If it's the only number:
    /(\d+)/
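    A quick way to compare the three patterns is to run them over both sample strings from the question (the first with its trailing space). On these inputs all three find 10, but they can disagree on other input, e.g. a line that contains a second number.

```perl
use strict;
use warnings;

my @samples = ('abc dddd 10 ', 'abc dddd ee 10');

my %found;
for my $str (@samples) {
    # From the front: two or three words, then the number (second capture).
    my (undef, $front) = $str =~ /^(\w+\s+){2,3}(\d+)/;
    # From the back: digits followed only by optional trailing whitespace.
    my ($back) = $str =~ /(\d+)\s*$/;
    # Only number: the first run of digits anywhere in the string.
    my ($only) = $str =~ /(\d+)/;
    $found{$str} = [$front, $back, $only];
    print "'$str' => $front $back $only\n";
}
```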

Re: regexp: extracting info
by doctor_moron (Scribe) on Dec 23, 2005 at 15:45 UTC

    Furthermore I realized that I don't understand the ?:, ?=

    The Perl for Dummies book says:
    if for some odd reason you want to suppress $1 or $2 to prevent them from getting assigned, add ?: after the left parenthesis.

    Example:

    $test = 'NESTING'; $test =~ /(?:N.)S(.I)/; print "$1 \n";

    When you run this program, Perl displays "TI": (?:N.) matches "NE" without capturing, so $1 holds what (.I) matched.

    zak

Re: regexp: extracting info
by planetscape (Canon) on Dec 23, 2005 at 23:11 UTC

Node Type: perlquestion [id://518762]
Approved by neversaint