PerlMonks  

Complicated regexp problem

by stefanches7 (Initiate)
on Aug 02, 2017 at 16:51 UTC

stefanches7 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, dear perl monks!

I have a problem with a regexp. The code looks like this:

    my $regexp = "(?:ftp:\/\/)?\/{5}(a-z_)\/";
    my $link = "ftp://ftp.ensemblgenomes.org/pub/release-36/metazoa/vcf/ixodes_scapularis/ixodes_scapularis_incl_consequences.vcf.gz";
    if ($link =~ /$regexp/) {
        print "Captured info: $1 \n";
    }

What I wanted to say with this regexp: optionally match "ftp://" without capturing it, then match the front slash exactly five times, and after that capture all characters a-z or underscore before the next front slash. So in this exact case I would expect the regexp to match and $1 to return "ixodes_scapularis". However, the if block seems to fail (nothing is printed).

What am I doing wrong? (I guess there could be several mistakes)

Replies are listed 'Best First'.
Re: Complicated regexp problem
by hippo (Bishop) on Aug 02, 2017 at 17:03 UTC
    then match the front slash exactly five times

    The string against which you are matching does not contain 5 consecutive slashes.
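
The point above can be checked directly. The following is a minimal sketch, not part of the original reply, that tests the link for five slashes in a row and also counts the slashes that do occur; the variable names are chosen here for illustration only.

```perl
use strict;
use warnings;

my $link = 'ftp://ftp.ensemblgenomes.org/pub/release-36/metazoa/vcf/ixodes_scapularis/ixodes_scapularis_incl_consequences.vcf.gz';

# /{5} asks for five slashes in a row, not five slashes anywhere in the string.
my $has_five_in_a_row = $link =~ m{/{5}} ? 1 : 0;

# Find the longest run of consecutive slashes that actually occurs.
my $longest_run = 0;
while ($link =~ m{(/+)}g) {
    $longest_run = length($1) if length($1) > $longest_run;
}

# Count the slashes anywhere in the string.
my $total_slashes = () = $link =~ m{/}g;

print "five in a row: $has_five_in_a_row\n";
print "longest run:   $longest_run\n";
print "total slashes: $total_slashes\n";
```

The longest run is the two slashes in "ftp://", so a quantifier applied to the slash alone cannot express "five directory levels"; the quantifier has to cover a whole path segment plus its slash.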

Re: Complicated regexp problem
by stevieb (Canon) on Aug 02, 2017 at 17:05 UTC
    /
        (?:             # non capture group start
            ftp:\/{2}   # ftp://
        )?              # end optional non-capture group
        (?:             # non capture group start
            .*?\/       # anything non-greedy, followed by a /
        ){5}            # end non-capture group, match five times
        (.*?)           # capture everything, non-greedy
        \/              # until the very next fwd slash
    /x

    Note that you can stringify that into a single line and put it back into the variable. The x modifier allows you to have whitespace in the regex for clarity, and to add comments.
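
One way to keep the commented layout and still store the pattern in a variable is a compiled qr object with the x modifier. This is only a sketch of that idea; it uses qr{...} delimiters (an assumption, the reply itself uses slashes) so that the slashes need no escaping.

```perl
use strict;
use warnings;

# Same structure as the pattern above, compiled once and kept in a variable.
my $regexp = qr{
    (?: ftp:// )?      # optional scheme, not captured
    (?: .*? / ){5}     # five chunks, each ending in a slash
    ( .*? )            # capture the next chunk, non-greedy
    /                  # up to the very next slash
}x;

my $link = 'ftp://ftp.ensemblgenomes.org/pub/release-36/metazoa/vcf/ixodes_scapularis/ixodes_scapularis_incl_consequences.vcf.gz';

# In list context a successful match returns the captured groups.
my ($species) = $link =~ $regexp;

print "Captured info: $species\n" if defined $species;
```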

Re: Complicated regexp problem
by kevbot (Vicar) on Aug 03, 2017 at 05:12 UTC
    I see that you have already received good advice from other monks. I just wanted to mention that item 9 in Basic debugging checklist shows how to use the YAPE::Regex::Explain module to demystify regular expressions. For example, you can see an explanation of your regex by running this
    perl -MYAPE::Regex::Explain -E 'say YAPE::Regex::Explain->new("(?:ftp:\/\/)?\/{5}(a-z_)\/")->explain()'
    The output is
    The regular expression:

    (?-imsx:(?:ftp://)?/{5}(a-z_)/)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        ftp://                   'ftp://'
    ----------------------------------------------------------------------
      )?                       end of grouping
    ----------------------------------------------------------------------
      /{5}                     '/' (5 times)
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        a-z_                     'a-z_'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      /                        '/'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re: Complicated regexp problem
by Anonymous Monk on Aug 02, 2017 at 17:24 UTC
    (a-z_) should probably be ([a-z_]*), or better yet, (\w*).
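
The difference between the parenthesised group and a character class can be seen in a short sketch. The test strings below are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

my $dir = 'ixodes_scapularis/';

# (a-z_) is a capture group around the literal four characters "a-z_".
my $group_on_dir     = $dir    =~ m{(a-z_)/} ? 1 : 0;   # no literal "a-z_" here
my $group_on_literal = 'a-z_/' =~ m{(a-z_)/} ? 1 : 0;   # matches the literal text

# ([a-z_]*) captures a run of lowercase letters and underscores.
my ($class_capture) = $dir =~ m{([a-z_]*)/};

# \w additionally accepts uppercase letters and digits.
my ($word_capture) = 'Ixodes_scapularis2/' =~ m{(\w*)/};

print "group on dir:     $group_on_dir\n";
print "group on literal: $group_on_literal\n";
print "class capture:    $class_capture\n";
print "word capture:     $word_capture\n";
```

Whether \w is really "better" depends on the data: it is more permissive, which helps with mixed-case or numbered names but also lets through names the stricter class would reject.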

    You're asking for trouble by putting your pattern in a string before using it in a regex. To avoid multiple-quoting headaches, use qr like this:

    my $regexp = qr{(?:ftp:\/\/)?\/{5}(a-z_)\/};
    if ($link =~ /$regexp/) { ... }

      No need to escape the slashes as you are not using them as delimiters.

      my $regexp = qr{(?:ftp://)?/{5}(a-z_)/};

      Cheers,

      JohnGG
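
A small sketch of the quoting issue discussed above, using \b as the example escape (chosen here for illustration; it does not appear in the original pattern). In a double-quoted string \b is a backspace character, while inside qr it is a word boundary, so the same source text produces two different patterns.

```perl
use strict;
use warnings;

my $path = 'metazoa/vcf/ixodes_scapularis';

# The string is processed by Perl's double-quote rules first:
# "\b" becomes a backspace character before the regex engine sees it.
my $as_string = "\bvcf\b";

# qr passes \b to the regex engine unchanged, where it means "word boundary".
my $as_qr = qr{\bvcf\b};

my $string_matches = $path =~ /$as_string/ ? 1 : 0;
my $qr_matches     = $path =~ $as_qr       ? 1 : 0;

# With brace delimiters, slashes need no escaping at all.
my $scheme_matches = 'ftp://host/pub' =~ qr{\Aftp://} ? 1 : 0;

print "string pattern matches: $string_matches\n";
print "qr pattern matches:     $qr_matches\n";
print "scheme matches:         $scheme_matches\n";
```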

Re: Complicated regexp problem (Updated)
by AnomalousMonk (Archbishop) on Aug 02, 2017 at 22:38 UTC

    This is a variation on stevieb's reply. The optional ftp:// doesn't make much sense unless it's anchored to something; I anchor it with \A, the absolute start of the string. (Update: By the same token, the (?: [^/]+ /){5} five-level-deep directory nesting pattern only makes sense if you specify five levels deep from what. Again, \A is used, but I don't know if this is appropriate to stefanches7's true requirement.)

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $regexp = qr{ \A (?: ftp://)? (?: [^/]+ /){5} ([a-z_]+) / }xms;
     ;;
     my $link = 'ftp://ftp.ensemblgenomes.org/pub/release-36/metazoa/vcf/ixodes_scapularis/ixodes_lapuscaris_incl_consequences.vcf.gz';
     ;;
     print qq{captured '$1'} if $link =~ $regexp;
    "
    captured 'ixodes_scapularis'
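
To illustrate why the anchor matters, here is a sketch comparing the pattern with and without \A. The link is hypothetical and was constructed so that its sixth level ("release-36") contains characters outside [a-z_].

```perl
use strict;
use warnings;

my $anchored   = qr{ \A (?: ftp://)? (?: [^/]+ /){5} ([a-z_]+) / }xms;
my $unanchored = qr{    (?: ftp://)? (?: [^/]+ /){5} ([a-z_]+) / }xms;

# Hypothetical link: counting from the host, level six is "release-36",
# which [a-z_]+ cannot match in full.
my $link = 'ftp://host/a/b/c/d/release-36/species_name/f.gz';

# List-context matches: undef on failure, the capture on success.
my ($from_anchored)   = $link =~ $anchored;
my ($from_unanchored) = $link =~ $unanchored;

print 'anchored:   ', (defined $from_anchored   ? $from_anchored   : '(no match)'), "\n";
print 'unanchored: ', (defined $from_unanchored ? $from_unanchored : '(no match)'), "\n";
```

Without the anchor the engine is free to start counting its five levels from a later position in the string, so it finds a match one level deeper than intended instead of failing.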


    Give a man a fish:  <%-{-{-{-<

Re: Complicated regexp problem
by Anonymous Monk on Aug 02, 2017 at 17:41 UTC
    Use URI instead!
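
A minimal sketch of what the URI approach might look like. URI is a CPAN module, not part of core Perl, and picking the second-to-last path segment is an assumption about what the original poster wants.

```perl
use strict;
use warnings;
use URI;    # CPAN module, must be installed separately

my $link = 'ftp://ftp.ensemblgenomes.org/pub/release-36/metazoa/vcf/ixodes_scapularis/ixodes_scapularis_incl_consequences.vcf.gz';

my $uri = URI->new($link);

# path_segments returns the path split on "/"; for an absolute path
# the first element is the empty string.
my @segments = $uri->path_segments;

# Assumption: the directory of interest is the one holding the file,
# i.e. the second-to-last segment.
my $species_dir = $segments[-2];

print 'scheme:  ', $uri->scheme, "\n";
print 'host:    ', $uri->host,   "\n";
print "species: $species_dir\n";
```

This sidesteps counting slashes entirely: the position is taken relative to the end of the path rather than from the start of the string.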
Re: Complicated regexp problem
by stefanches7 (Initiate) on Aug 03, 2017 at 15:17 UTC

    All of your comments were helpful, thanks everybody! It works now, and with your help I do understand regex better now :)

Node Type: perlquestion [id://1196568]
Approved by Athanasius