PerlMonks  

RegEx Doubt

by mecrazycoder (Sexton)
on Sep 30, 2009 at 11:34 UTC (#798296)
mecrazycoder has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a line like
<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>
Now I want to write a regex that fetches a, a3, a2, a1 alone. How can I do that? The regex I wrote was something like
$text =~ /<xyz>(.*)<\/xyz>/
This doesn't seem to work. Please guide me.
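
For readers following along, here is a minimal sketch of one likely reason the pattern misbehaves, assuming the input line shown above. The greedy `.*` runs to the last `</xyz>`, so the single capture swallows everything in between. A non-greedy `.*?` combined with `/g` in list context returns one capture per tag pair. The variable names are illustrative only, not the original poster's.

```perl
use strict;
use warnings;

my $text = '<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>';

# Greedy: .* extends to the LAST </xyz>, so one capture spans the whole middle.
my ($greedy) = $text =~ /<xyz>(.*)<\/xyz>/;

# Non-greedy plus /g in list context: one capture per <xyz>...</xyz> pair.
my @values = $text =~ /<xyz>(.*?)<\/xyz>/g;

print "greedy: $greedy\n";
print "values: @values\n";
```

This only holds up for flat, non-nested tags on a single line, which is what the question shows.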

Re: RegEx Doubt
by ccn (Vicar) on Sep 30, 2009 at 11:37 UTC
      Tanx man
Re: RegEx Doubt
by Bloodnok (Vicar) on Sep 30, 2009 at 11:45 UTC
    TIMTOWTDI - why not use split ...
    $ perl -e '@a = split /<\/?xyz>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print qq/@a\n/'
    a a3 a2 a1
    Update:

    Ahhh, maybe I see one reason, using Data::Dumper to print the output gives:

    $ perl -MData::Dumper -e '@a = split /\<\/?xyz\>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print Dumper \@a'
    $VAR1 = [ '', 'a', '', 'a3', '', 'a2', '', 'a1' ];
    The question, for me at least, is: why doesn't split swallow the substrings on which the string is split? I'm obviously missing something but can't see it. Any enlightenment appreciated.

    TIA

    A user level that continues to overstate my experience :-))

      Hm, it's non-zero-width, so it's still nice and easy: you have multiple 'split-points' in sequence in the source. Try either grep /./ on the split results, or use split /(?:...)+/ to 'combine' them into one 'split-point'.

      Aren't they cute, those little regexes? Remembering apocalypse5 fondly :).
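
A short sketch of the two suggestions, assuming the same sample line from the question. The `...` in `split /(?:...)+/` is filled in here with the tag pattern from the earlier one-liner, which is my reading of the suggestion and not necessarily the exact pattern the poster had in mind.

```perl
use strict;
use warnings;

my $line = '<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>';

# Option 1: split on every tag, then drop the empty fields with grep.
my @filtered = grep { /./ } split /<\/?xyz>/, $line;

# Option 2: treat a run of adjacent tags as one split point.
# The leading <xyz> still produces one empty field at the front.
my @combined = split /(?:<\/?xyz>)+/, $line;

print "filtered: @filtered\n";
print "combined has ", scalar(@combined), " fields\n";
```

Option 1 yields exactly the four values. Option 2 removes the interior empties but keeps an empty first field, because the string begins with a delimiter.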

        "...use split /(?:...)+/ to 'combine' them into one 'split-point'."

        ...and then grep out the empty elements, i.e. grep /./, ..., since the first element is still empty. So one might as well use grep /./, ... on the lot to start with.

        I tried the non-capturing approach, but a) transposed the '?' and the ':' and b) didn't use '+' ... doh !!!

        Update:

        To reduce any confusion: the transposition I referred to above was entirely down to my poor typing, i.e. I typed ':?' instead of '?:' and didn't notice .oO(Maybe I ought to use a larger font...) ;-)

      It puzzled me for a minute, but I found the explanation. You have two delimiter substrings every time, separated by nothing: </xyz>_nothing_<xyz>. And this nothing is what you find in your output.
      It does swallow the substrings on which the string is split. I don't see any <\/?xyz>s in the Dumper output.

      The blank entries being returned are the zero-length substrings in the middle of </xyz><xyz> - that combination is two matches of the split pattern with nothing in between.
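
To make the empty fields visible, here is a small sketch that brackets each field returned by split, again assuming the sample line from the question. It also passes a negative limit, which makes split keep the trailing empty field after the final `</xyz>` that it would otherwise drop.

```perl
use strict;
use warnings;

my $line = '<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>';

# Default split: leading and interior empty fields are kept, trailing ones are dropped.
my @default = split /<\/?xyz>/, $line;

# A negative limit also keeps the trailing empty field after the final </xyz>.
my @all = split /<\/?xyz>/, $line, -1;

# Bracket each field so the empty ones show up as [].
my $shown = join '', map { "[$_]" } @default;
print scalar(@default), " fields: $shown\n";
print scalar(@all), " fields with limit -1\n";
```

Each `[]` corresponds to the "nothing" sitting between two adjacent delimiter matches, or before the first one.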

Re: RegEx Doubt
by moritz (Cardinal) on Sep 30, 2009 at 11:45 UTC
    Use an XML or HTML parser (like XML::Twig or HTML::TreeBuilder) - parsing HTML with regexes can be very painful and time consuming.
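
A minimal sketch of the parser route using XML::Twig, assuming the module is installed from CPAN. The sample line has no single root element, so it is wrapped in a hypothetical `<root>` tag here purely to make it well-formed XML. That wrapper is my assumption, not part of the original data.

```perl
use strict;
use warnings;
use XML::Twig;

my $line = '<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>';

# Wrap the fragment in a made-up root element so it parses as one document.
my $twig = XML::Twig->new;
$twig->parse("<root>$line</root>");

# Collect the text of every <xyz> child of the root.
my @values = map { $_->text } $twig->root->children('xyz');

print "@values\n";
```

Compared with a regex, this copes with attributes, entities and whitespace inside the tags without further changes to the code.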
    Perl 6 - links to (nearly) everything that is Perl 6.
