RegEx Doubt

by mecrazycoder (Sexton)
on Sep 30, 2009 at 11:34 UTC (#798296)
mecrazycoder has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a line like
<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>
Now I want to write a regex to fetch a, a3, a2, a1 alone. How can I do that? The regex I wrote was something like [...]
This doesn't seem to work. Please guide me.

Re: RegEx Doubt
by ccn (Vicar) on Sep 30, 2009 at 11:37 UTC
      Tanx man
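ccn's code is not shown in this copy of the thread. As a hedged sketch only (not necessarily what ccn posted), a global match in list context is one common way to collect the text between the tags. The sample line is the one quoted in the split examples in this thread; `$line` and `@values` are names made up for illustration.

```perl
use strict;
use warnings;

# Sample input, as quoted in the split examples in this thread.
my $line = '<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>';

# In list context, m//g returns the captures from every match.
# (.*?) is non-greedy, so each match stops at the nearest closing tag.
my @values = $line =~ m{<xyz>(.*?)</xyz>}g;

print join(',', @values), "\n";   # prints: a,a3,a2,a1
```

This assumes the tags are never nested and carry no attributes; for anything less regular, a real parser is likely the safer choice.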
Re: RegEx Doubt
by Bloodnok (Vicar) on Sep 30, 2009 at 11:45 UTC
    TIMTOWTDI - why not use split ...
    $ perl -e '@a = split /<\/?xyz>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print qq/@a\n/'
    a a3 a2 a1

    Ahhh, maybe I see one reason: using Data::Dumper to print the output gives:

    $ perl -MData::Dumper -e '@a = split /\<\/?xyz\>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print Dumper \@a'
    $VAR1 = [
              '',
              'a',
              '',
              'a3',
              '',
              'a2',
              '',
              'a1'
            ];
    The question, for me at least, is: why doesn't split swallow the substrings on which the string is split? I'm obviously missing something, but can't see it. Any enlightenment appreciated.


    A user level that continues to overstate my experience :-))

      Hm, it's non-zero-width, so it's still nice and easy: you have multiple 'split-points' in sequence in the source. Try either grep /./ on the split results, or use split /(?:...)+/ to 'combine' them into one 'split-point'.

      Aren't they cute, those little regexes? Remembering apocalypse5 fondly :).
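Both suggestions above can be sketched as follows. This is a minimal illustration, assuming the same sample line used elsewhere in the thread; the variable names `@filtered` and `@combined` are made up here.

```perl
use strict;
use warnings;

my $line = '<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>';

# Option 1: split on every tag, then keep only the non-empty fields.
my @filtered = grep { /./ } split m{</?xyz>}, $line;

# Option 2: treat a run of adjacent tags as one split point.
# A leading empty field remains because the line starts with a tag.
my @combined = split m{(?:</?xyz>)+}, $line;

print "filtered: @filtered\n";                              # filtered: a a3 a2 a1
print "combined: ", join('|', map { "[$_]" } @combined), "\n";  # combined: []|[a]|[a3]|[a2]|[a1]
```

Note that grep /./ also drops a field consisting only of a newline, since . does not match "\n" by default.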

        "...use split /(?:...)+/ to 'combine' them into one 'split-point'."

        ...and then filter out the empty elements, i.e. grep /./, ..., since the first element is still empty. So you might as well use grep /./, ... on the lot to start with.

        I tried the non-capturing approach, but a) transposed the '?' and the ':' and b) didn't use '+' ... doh!!!


        To reduce any confusion, the transposition to which I referred above was entirely down to the paucity of my typing, i.e. I typed ':?' instead of '?:' and didn't notice. .oO(Maybe I ought to use a larger font...) ;-)

        A user level that continues to overstate my experience :-))
      It puzzled me for a minute, but I found the explanation. You have two matching sub-strings each time, separated by nothing: </xyz>_nothing_<xyz>. And this nothing is what you find in your output.
      It does swallow the substrings on which the string is split. I don't see any <\/?xyz>s in the Dumper output.

      The blank entries being returned are the zero-length substrings in the middle of </xyz><xyz> - that combination is two matches of the split pattern with nothing in between.
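The same effect shows up with any delimiter. A small sketch, using a comma example that is not from the thread alongside a shortened version of the sample line:

```perl
use strict;
use warnings;

# Two delimiters with nothing between them yield one empty field.
my @fields = split /,/, 'a,,b';
print scalar(@fields), " fields\n";                  # 3 fields

# Likewise at each '</xyz><xyz>' boundary. The leading empty field
# comes from the tag at the very start of the string; trailing
# empty fields are dropped by split by default.
my @parts = split m{</?xyz>}, '<xyz>a</xyz><xyz>a3</xyz>';
print join('|', map { "[$_]" } @parts), "\n";        # []|[a]|[]|[a3]
```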

Re: RegEx Doubt
by moritz (Cardinal) on Sep 30, 2009 at 11:45 UTC
    Use an XML or HTML parser (like XML::Twig or HTML::TreeBuilder). Parsing HTML with regexes can be very painful and time-consuming.
    Perl 6 - links to (nearly) everything that is Perl 6.
