PerlMonks  

Re: extract text between slashes

by halley (Prior)
on Oct 31, 2007 at 16:26 UTC ( [id://648293]=note )


in reply to extract text between slashes

Your current attempt is good, but the match is greedy: it looks for the longest possible match, not the shortest. Adding a ? after the .+ would work fine. To understand "greediness," check the perldoc for regular expressions: perlre
m{/(.*?)/}

A second issue is if the string has empty slots between slashes, such as the string "%///US1252691001". You probably want to be able to return an empty result in this case, so I changed your use of .+ (one or more) to .* (zero or more) characters. Otherwise, you might get a match back of "/" for strings like my example.
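A minimal sketch of both points, using the sample string from this thread and the empty-slot string above. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $str = '%/%/ISIN/US1252691001';

# Greedy: .+ runs on to the last slash that still lets the match succeed.
my ($greedy) = $str =~ m{/(.+)/};     # '%/ISIN'

# Non-greedy: .+? stops at the first closing slash.
my ($lazy) = $str =~ m{/(.+?)/};      # '%'

# With empty slots, .+? must take at least one character,
# so it swallows a slash; .*? is allowed to match nothing.
my $empty = '%///US1252691001';
my ($plus) = $empty =~ m{/(.+?)/};    # '/'
my ($star) = $empty =~ m{/(.*?)/};    # ''

print "greedy=[$greedy] lazy=[$lazy] plus=[$plus] star=[$star]\n";
```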

Update: As others mentioned but I didn't parse correctly, getting the THIRD field (e.g., "~/~/THIS/~") takes a little more work. Rather than a bunch of complicated lookaheads and lookbehinds, or switching to split(), I would just parse through. This has the advantage that the pattern is easy to change to capture the other fields if the requirements change.

m{/.*?/(.*?)/}
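As a rough illustration of changing the pattern to capture other fields, the sketch below runs the pattern above on the thread's sample string, then adds a second pair of parentheses to pick up the skipped field as well. The variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $str = '%/%/ISIN/US1252691001';

# Skip one slash-delimited field, then capture the next one.
my ($third) = $str =~ m{/.*?/(.*?)/};                    # 'ISIN'

# Capture the skipped field too by parenthesising it.
my ($second, $third_again) = $str =~ m{/(.*?)/(.*?)/};   # '%', 'ISIN'

print "$third $second $third_again\n";
```

Writing [^/]* in place of .*? would keep each piece from ever spanning a slash; that is a variation, not what the post uses.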

--
[ e d @ h a l l e y . c c ]

Replies are listed 'Best First'.
Re^2: extract text between slashes
by johngg (Canon) on Oct 31, 2007 at 16:44 UTC
    Adding a ? after the .+ would work fine

    I might be missing something but I don't think that will work as desired. It will return the first item between slashes which is %, not ISIN. I think split might be better here. Something like

    my $str = '%/%/ISIN/US1252691001';
    my @elems = split m{/}, $str;
    my $isin = $elems[2];

    Cheers,

    JohnGG

    Update:

    Without split, you need a more complex regex that uses zero-width look-around assertions: an alternation of two look-behinds, and a look-ahead that contains an alternation.

    my @elems = $str =~ m{(?(?<=\A)|(?<=/))(.*?)(?=/|\z)}g;

Re^2: extract text between slashes
by EvanK (Chaplain) on Oct 31, 2007 at 17:00 UTC
    Also, keep in mind that he wants the contents of the *second* pair of slashes. Assuming that the first one with the percent sign is static, m{/\%/(.*?)/} might work. Otherwise, he could grab all matches and filter out the wrong ones, or split the whole string beforehand:

    # method 1
    @matches = $string =~ m{/(.*?)/}g;
    # method 2
    @matches = split m{/}, $string;
    # print the one you want (with split, ISIN lands at index 2)
    print $matches[1];

    __________
    Systems development is like banging your head against a wall...
    It's usually very painful, but if you're persistent, you'll get through it.

      Unfortunately, your method 1 isn't going to do the trick. The regex consumes %/%/ during the first match, so the next attempt is left with only ISIN/US1252691001 to work with and fails.

      $ perl -le '
      > $string = q{%/%/ISIN/US1252691001};
      > @matches = $string =~ m{/(.*?)/}g;
      > print for @matches;'
      %
      $
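One possible way around this, offered as a sketch and not something proposed in the thread, is to test for the closing slash with a look-ahead so that it is not consumed and can open the next match.

```perl
use strict;
use warnings;

my $string = '%/%/ISIN/US1252691001';

# Consuming the closing slash leaves nothing to open the next match.
my @consuming = $string =~ m{/(.*?)/}g;       # ('%')

# A look-ahead checks for the closing slash without consuming it,
# so the same slash can start the following match.
my @lookahead = $string =~ m{/(.*?)(?=/)}g;   # ('%', 'ISIN')

print "consuming: @consuming\n";
print "lookahead: @lookahead\n";
```

With the look-ahead version, $lookahead[1] holds ISIN for this input.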

      Cheers,

      JohnGG

Re^2: extract text between slashes
by RaduH (Scribe) on Oct 31, 2007 at 17:21 UTC
    I think we don't know enough about what he's looking for. It was said that he's looking for the text between the second pair of slashes. What if he is looking for the string between the last %/ and the very next / ? I think the input string is not described well enough in the original question. For all we know, he could treat %/%/ as a fixed token and take all of the following characters up to the first /, but this assumes that all his input strings begin with %/%/ followed by what he needs to extract, which may not be a correct assumption.
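If the fixed-token reading is the right one, a sketch might look like the following. It assumes, as noted above, that every input begins with the literal prefix %/%/; inputs that do not will simply fail to match.

```perl
use strict;
use warnings;

# Assumption: the input always starts with the literal prefix %/%/ .
my $str = '%/%/ISIN/US1252691001';

# [^/]* takes everything up to, but not including, the next slash.
my ($field) = $str =~ m{\A%/%/([^/]*)};

print defined $field ? "$field\n" : "no match\n";
```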
