PerlMonks  

Re: Regex Extraction Help

by Kenosis (Priest) on Aug 09, 2012 at 16:27 UTC


in reply to Regex Extraction Help

Here's another option:

use Modern::Perl;

my $dat  = 'DR Pfam; PF00070; Pyr_redox; 2.';
my $info = (split '; ', $dat)[1];    # second field (index 1)
say $info;

Output:

PF00070

Hope this helps!


Re^2: Regex Extraction Help
by Flexx (Pilgrim) on Aug 09, 2012 at 17:26 UTC

    invaderzard, I just wanted to make clear that Kenosis's solution is the far quicker and easier one, which I'd of course use any time I just need a quick split on a field separator.

    But there is one caveat to keep in mind: split, of course, does not test the format of its input. So if you wanted the second field of a record like this:

    $record = 'A;B;C;D';
    then
    $second_field = (split ';', $record)[1];

    does work. But it works just as well for inputs like these:

    A;B
    #foo;B;ar
    ;B

    All of the above inputs leave a B in $second_field. That might be correct in a particular case, but in general we don't want to silently accept malformed records. So if you, say, iterate over records, make sure to test and capture using a regexp inside an if:

    if ($record =~ m/^.;(.);.;.$/) {
        $second_field = $1;
    }

    Now $second_field is set only if the record matches the format of four single-character fields separated by single semicolons. Even if the input is ';;;;;;;', since . happily matches a semicolon, too. ;)
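    A minimal sketch (not code from the thread) comparing the two approaches on the records above. The sub names second_by_split, second_by_regex and second_by_strict_regex are hypothetical, and the [^;] variant is an added tightening that stops . from matching a semicolon. It uses plain strict/warnings instead of Modern::Perl so it runs on a stock Perl 5.10+.

```perl
use strict;
use warnings;

# Hypothetical helper: second field via split, no format check at all.
sub second_by_split {
    my ($record) = @_;
    my @fields = split ';', $record;
    return $fields[1];    # undef if there is no second field
}

# Hypothetical helper: second field only if the record is exactly
# four single-character fields separated by semicolons.
sub second_by_regex {
    my ($record) = @_;
    return $record =~ m/^.;(.);.;.$/ ? $1 : undef;
}

# Added tightening: a field may not itself be a semicolon.
sub second_by_strict_regex {
    my ($record) = @_;
    return $record =~ m/^[^;];([^;]);[^;];[^;]$/ ? $1 : undef;
}

for my $record ('A;B;C;D', 'A;B', '#foo;B;ar', ';B', ';;;;;;;') {
    my $by_split  = second_by_split($record);
    my $by_regex  = second_by_regex($record);
    my $by_strict = second_by_strict_regex($record);
    printf "%-10s split=%-6s regex=%-6s strict=%s\n",
        $record, map { defined($_) ? $_ : 'undef' } $by_split, $by_regex, $by_strict;
}
```

    split reports B for each of the first four records, while both regex versions accept only 'A;B;C;D'. For ';;;;;;;' the original regex captures ';', and only the [^;] variant rejects it.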

    Have fun with regexen. They're cool. ;)

    So long,
    Flexx

      You make a good point about splitting on a field separator within possibly malformed records. Based upon the OP's regex, it appears that the pattern's stable--with one space after the semi-colon. However, we can ask split to 'test' the format of the input, like this:

      my $info = (split /\s*;\s*/, $dat)[1];

      This returns the field the OP wants whether or not there are spaces before or after the semicolon.
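      A quick sketch of that whitespace-tolerant split applied to a few spacing variants of the OP's line. The variant strings and the info_field name are made up for illustration; the split pattern is the one above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: second field, with any whitespace around ';'
# swallowed by the separator pattern itself.
sub info_field {
    my ($dat) = @_;
    my @fields = split /\s*;\s*/, $dat;
    return $fields[1];
}

# Hypothetical spacing variants of the OP's record.
my @lines = (
    'DR Pfam; PF00070; Pyr_redox; 2.',
    'DR Pfam;PF00070;Pyr_redox;2.',
    'DR Pfam ;  PF00070 ; Pyr_redox ; 2.',
);

say info_field($_) for @lines;    # PF00070 for each line
```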

      And with a regex on the OP's data:

      use Modern::Perl;

      my $dat = 'DR Pfam; PF00070; Pyr_redox; 2.';
      $dat =~ /;\s*(\w+)\s*;.+;/ and say $1;    # prints PF00070

      It was a good call to address this issue...

        « Based upon the OP's regex, it appears that the pattern's stable--with one space after the semi-colon »

        Oh, indeed: my "warning" was meant more as a general tip, not about this particular example. I just meant that split and if (m//) with a rather "strict" regexp typically lead to different levels of defensiveness in the code. Again, I mean just typically. Hey, "just use split" would have been my first answer, too. But you had already written that, so I had to come up with something to nitpick. ;)

        « However, we can ask split to 'test' the format of the input »

        Umm... ok, you wrote 'test' in quotes, so alright... ;)

        Sure, you can combine the split and trim operations, but this split will still happily work on any input you throw at it (even undef, though with a warning). Unlike a regexp that simply fails to match, it won't tell you that your input looks a bit strange.

        Now, again, I'm not so much talking about the OP's concrete problem; I was trying to explain a bit about which method to use when, since his use of \d\d\d\d\d instead of \d{5} suggested that regexen aren't something he has worked with for years. (No offence meant.)
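        To make the split-versus-match difference concrete on the OP's kind of data, here is a sketch of a strict pattern (not code from the thread). It assumes a valid line looks like "DR Pfam; PF" plus five digits, then a name and a trailing number, which is inferred only from the single sample line; pfam_accession is a hypothetical name. \d{5} is just the quantified form of \d\d\d\d\d.

```perl
use strict;
use warnings;

# Hypothetical helper: return the accession only if the whole line has
# the assumed shape; otherwise return undef so the caller can react.
sub pfam_accession {
    my ($line) = @_;
    # \d{5} means exactly five digits, same as \d\d\d\d\d.
    return $line =~ /^DR\s+Pfam;\s*(PF\d{5});\s*[^;]+;\s*\d+\.$/ ? $1 : undef;
}

for my $line ('DR Pfam; PF00070; Pyr_redox; 2.',   # well-formed
              'DR Pfam; PF0007; Pyr_redox; 2.',    # only four digits
              'foo; bar; baz') {                   # not a Pfam line at all
    my $by_split = (split /\s*;\s*/, $line)[1];
    my $by_match = pfam_accession($line);
    printf "split=%-8s match=%s\n", $by_split, $by_match // 'undef';
}
```

        Only the first line yields an accession from the match, while split quietly returns PF0007 and bar for the other two.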

        So long,
        Flexx

        Cheers, Kenosis, Flexx and Ratazong for your help!

        Kenosis, your method worked like a charm for my data, but kudos to Flexx and Ratazong for giving me better insight into how to handle regexes in Perl.

        Thanks again!
