
special regexp

by alfie (Pilgrim)
on Mar 26, 2001 at 11:32 UTC (#67124)

alfie has asked for the wisdom of the Perl Monks concerning the following question:

Morning :-)

A dear friend of mine asked me about a very special regular expression problem, which I tried to solve. It goes like this:
He needs to extract every character that sits between two x's in a string, so xaxlxfxixex -> alfie. Now the problem is that it should also work with an x in between. So I tried a little here and there, and the best I could come up with is the following:

alfie:~$ perl -ne 'while (m/\G[^x]*?x(.)(?=x)/g) {print $1;}'
xaxbxcxdexfx
abc
So, I am missing the f in the output... So, what happened here? I'm slightly confused. Thanks for any hint you can offer.

Replies are listed 'Best First'.
Re: special regexp
by Corion (Patriarch) on Mar 26, 2001 at 11:44 UTC

    From what I can see, the only thing wrong with your regular expression is that you don't allow for arbitrary characters between the last match and the next match. With your data, this regular expression works:

    F:\>perl -ne "while (m/\G.*?x(.)(?=x)/g) { print $1; }"
    axbxxcxdsxxxtx
    bcxt

    An interesting boundary case is xxxx. What do you expect to be printed? My solution prints x, as it matches xxx and is then left with x, which does not match. Conceivably, xx would also be a valid answer, as you could first match xxx and then move to the second x and match xxx again.

      Uhm, wasn't it you who told me about Death to Dot Star! last time? ;-)
      I had [^x]*? where you are now using .*? - so why is .*? working here but [^x]*? not? *totallypuzzled*

        Yours is failing because you've anchored each match to where the previous one left off (with \G), and the negated character class [^x] prevents the regex from moving across any 'x' that doesn't fit the x(.)(?=x) pattern. The \G.*?x version may step over an 'x' if the remainder of the expression fails.
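To see the difference described above side by side, here is a minimal sketch that runs both patterns over the original input. The helper name extract() is made up for this illustration and does not appear in the thread.

```perl
use strict;
use warnings;

# extract() is a hypothetical helper: it applies a pattern repeatedly
# with /g in scalar context and joins every $1 it captures.
sub extract {
    my ($re, $str) = @_;
    my @found;
    while ($str =~ /$re/g) {
        push @found, $1;
    }
    return join '', @found;
}

my $input = 'xaxbxcxdexfx';

# [^x]*? cannot step over the 'x' in front of "de", and \G forbids
# starting anywhere else, so the loop stops after "c".
print extract(qr/\G[^x]*?x(.)(?=x)/, $input), "\n";   # abc

# .*? is allowed to cross that 'x' and the "de" after it,
# so the scan reaches "xfx" as well.
print extract(qr/\G.*?x(.)(?=x)/, $input), "\n";      # abcf
```

Dropping \G altogether, as in the simpler pattern that follows, lets the regex engine do the skipping by itself.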

        A simpler solution if you do not want to allow a "matched" 'x' to also count as a boundary 'x' is just:

        $_ = 'xaxbxcxdexfxxxx'; print $1 while /x(.)(?=x)/g; # abcfx

        If you do want to count 'matched' 'x' chars as potential boundary chars as well -- ie, 'xxxx' would produce 'xx' because there are two 'x' characters that have an 'x' on either side -- then:

        $_ = 'xaxbxcxdexfxxxx'; print $1 while /(?<=x)(.)(?=x)/g; # abcfxx
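As a quick check of the boundary case raised earlier, the two patterns can be run on 'xxxx' alone. This is only a sketch; the sub names non_overlapping() and overlapping() are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two patterns; in list context
# m//g returns every capture at once.
sub non_overlapping { my ($s) = @_; return join '', $s =~ /x(.)(?=x)/g; }
sub overlapping     { my ($s) = @_; return join '', $s =~ /(?<=x)(.)(?=x)/g; }

print non_overlapping('xxxx'), "\n";   # x
print overlapping('xxxx'), "\n";       # xx
```

The first pattern consumes the leading 'x' of each match, so a captured 'x' cannot open the next match; the lookbehind version consumes only the captured character.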
(Ovid) Re: special regexp
by Ovid (Cardinal) on Mar 26, 2001 at 12:42 UTC
    What are your boundary conditions (the point Corion raised)? The following seems to be fine:
    perl -e "print \"xaxbxxxxdexfx\" =~ /x(.)(?=x)/g"
    That will print "abxf". This may or may not be appropriate. If you can clarify what to do with multiple x's, it will be easier to come up with a match.



Re: special regexp
by mirod (Canon) on Mar 26, 2001 at 13:36 UTC

    A pure regexp solution, although I would use jeroenes's solution:

    s{(x+)}{"x" x (length( $1) /2);}eg;
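The substitution above replaces each run of x's with half as many, rounded down, so a lone delimiter disappears and a run of three leaves one 'x' behind. A minimal sketch, assuming the input is well formed (every run of x's has odd length); halve_runs() is a hypothetical wrapper and int() is added here only to spell out the rounding:

```perl
use strict;
use warnings;

# halve_runs() is a hypothetical name for the substitution, applied to a copy.
sub halve_runs {
    my ($str) = @_;
    # A run of one 'x' becomes zero copies, a run of three becomes one.
    $str =~ s{(x+)}{"x" x int(length($1) / 2)}eg;
    return $str;
}

print halve_runs('xaxlxfxixex'), "\n";     # alfie
print halve_runs('xaxlxxxfxixex'), "\n";   # alxfie
```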
Re: special regexp
by jeroenes (Priest) on Mar 26, 2001 at 12:48 UTC
    Am I missing something? It seems to me this is a perfect target for split. You only have to check for empty strings (if you are sure you always begin and end with 'x'). Or join right away.

    $str='xjxexrxoxexnexsx'; print join '', split /x/, $str; #prints my monkname


    "We are not alone"(FZ)

      The tricky bit you and nysus seem to have missed is that he would need the 'x' when he matches 'xjxoxrxgxXx' (uppercase X for clarity) and extracts 'jorgx' rather than just 'jorg'.
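The objection can be reproduced with a short sketch that feeds the same string (with a lowercase 'x' as the payload character) to split and to the lookahead regex from earlier in the thread. The sub names via_split() and via_regex() are made up for the comparison.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparing the two approaches.
sub via_split { my ($s) = @_; return join '', split /x/, $s; }
sub via_regex { my ($s) = @_; return join '', $s =~ /x(.)(?=x)/g; }

my $str = 'xjxoxrxgxxx';   # the last payload character is itself an 'x'

print via_split($str), "\n";   # jorg   (the payload 'x' is treated as a delimiter)
print via_regex($str), "\n";   # jorgx
```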


      "Do or do not, there is no try" -- Yoda

        Let's keep trying it regex-less.

        If you need just every second character, you could use substr:

        my $str = 'xjxexrxoxexnxexsxxx';
        my $idx = -1;
        my $str_2nd = '';
        # take the characters at offsets 1, 3, 5, ... and stop before running past the end
        $str_2nd .= substr($str, $idx += 2, 1) while $idx + 2 < length $str;
        print "\n$str_2nd\n";
        You can achieve the same with split:
        my $str = 'xjxexrxoxexnxexsxxx';
        my @chars = split //, $str;
        my @chars_2nd;
        while (@chars) {
            shift @chars;                             # drop the delimiter
            push @chars_2nd, shift @chars if @chars;  # keep the character after it, if any
        }
        print join '', @chars_2nd;

        Is this more like it? Just note that these solutions never check that the skipped characters really are 'x'. It gets trickier if you allow more than one character between delimiters.

        "We are not alone"(FZ)

        Here's a two-step regular expression that takes care of that problem:
        $line = "xaxlxxxfxixex"; $line =~ s/(x(\w)x)/$2/g; $line =~ s/x$//;
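A small sketch of what the two steps do, wrapped in a hypothetical sub two_step() so it can be tried on several inputs. Note that \w only matches word characters, so this assumes the payload never contains punctuation or whitespace.

```perl
use strict;
use warnings;

# two_step() is a hypothetical wrapper around the two substitutions.
sub two_step {
    my ($line) = @_;
    $line =~ s/x(\w)x/$1/g;   # step 1: collapse each x?x triple to its middle character
    $line =~ s/x$//;          # step 2: drop a delimiter left over at the end
    return $line;
}

print two_step('xaxlxfxixex'), "\n";     # alfie
print two_step('xaxlxxxfxixex'), "\n";   # alxfie
```

Because \w also matches 'x', the run 'xxx' is treated as a triple whose middle character is kept.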
Re: special regexp
by nysus (Parson) on Mar 26, 2001 at 13:14 UTC
    Unless I don't understand what you want to do, this will work:
    $line = "xaxlxfxixex"; $line =~ s/x//g;
