PerlMonks  

Re: An "ethical" use of dot-star ..?

by broquaint (Abbot)
on Jun 02, 2003 at 14:12 UTC (#262378=note)


in reply to An "ethical" use of dot-star ..?

I guess I'm interested to know what the general consensus for the use of .* is. Is it something to be avoided at all costs, or is it a powerful, oft-misused tool that can be useful and beneficial in carefully controlled circumstances?
My rule of thumb for .* versus .*? is that the former is for grabbing everything after a certain point (I can't be bothered with $') and the latter is for grabbing data between two points. So I guess it's a 'powerful oft-misused tool', but it's misused mostly because people aren't aware of quantifier greediness: .* matches as much as it can, while .*? matches as little as it can.
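A minimal sketch of that rule of thumb; the sample strings are made up for illustration. It shows a trailing (.*) and $' (the text after the match) yielding the same "everything after this point", then contrasts .* with .*? when grabbing data between two points.

```perl
use strict;
use warnings;

my $line = 'key: value one; value two';

# greedy .* grabs everything after "key: " (up to a newline, since there is no /s)
my ($rest) = $line =~ /key: (.*)/;

# the same result via $', the text following the last successful match
my $postmatch;
if ($line =~ /key: /) {
    $postmatch = $';
}

my $html = '<b>one</b> and <b>two</b>';

# greedy: .* runs on to the LAST </b> in the string
my ($greedy) = $html =~ m{<b>(.*)</b>};

# non-greedy: .*? stops at the FIRST </b>
my ($lazy) = $html =~ m{<b>(.*?)</b>};

print "rest:      $rest\n";
print "postmatch: $postmatch\n";
print "greedy:    $greedy\n";
print "lazy:      $lazy\n";
```

The greedy capture comes back as `one</b> and <b>two`, which is the classic surprise when .* is used between two points.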
While I'm at it *grin*, does anyone have a "better idea" for pulling the data out of the tags?
Given the nature of XML, it might be a good idea to build the regex up in layers, e.g.
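```perl
use strict;
use warnings;

## *very* simplistic stuff (e.g. doesn't deal with nested tags)

# a name: a letter (case-insensitive) followed by one or more word characters
my $token     = qr{ (?: \b [A-Z]\w+ \b ) }xi;
# a single name="value" attribute plus any trailing whitespace
my $attrib    = qr{ (?: $token \s* = \s* "[^"]+" \s* ) }x;
# an opening tag: captures the tag name and the attribute list
my $begin_tag = qr{ < ( $token ) \s* ( $attrib* ) > }x;
# any closing tag (its name is not checked against the opening tag)
my $end_tag   = qr{ </$token> }x;

my $example = q[<ClientID type="String">A1234BX</ClientID>];

# non-greedy (.*?) stops at the first closing tag
my($tag, $attribs, $data) = $example =~ m{ $begin_tag (.*?) $end_tag }x;

print "tag - $tag\n";
print "attribs - $attribs\n";
print "data - $data\n";

__END__
__output__
tag - ClientID
attribs - type="String"
data - A1234BX
```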
That could be simplified into a single regex, but as with most complex things, it's much easier to digest when broken down into smaller components.
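For comparison, a sketch of the layered patterns collapsed by hand into one regex. It is assumed to behave the same on this sample: [A-Za-z] stands in for the case-insensitive [A-Z] that the /i on $token provided, since /i on a qr// only applies inside that piece.

```perl
use strict;
use warnings;

my $example = q[<ClientID type="String">A1234BX</ClientID>];

# the $token, $attrib, $begin_tag and $end_tag pieces inlined into one pattern
my ($tag, $attribs, $data) = $example =~ m{
    < ( \b [A-Za-z]\w+ \b ) \s*                              # tag name
      ( (?: \b [A-Za-z]\w+ \b \s* = \s* "[^"]+" \s* )* ) >   # attributes
    (.*?)                                                    # data
    </ \b [A-Za-z]\w+ \b >                                   # any closing tag
}x;

print "tag - $tag\n";
print "attribs - $attribs\n";
print "data - $data\n";
```

It gives the same three captures, but the token pattern now appears three times and is much harder to scan.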
HTH

_________
broquaint


Re: Re: An "ethical" use of dot-star ..?
by sauoq (Abbot) on Jun 03, 2003 at 00:23 UTC
    My rule of thumb for .* versus .*? is that the former is for grabbing everything after a certain point (I can't be bothered with $') and the latter for grabbing data between 2 points.

    I'd say that .*? is most useful when grabbing things between two points where the second point is defined by a string of more than one character. If the right-hand side can be recognized by a single character, I'd suggest a negated character class instead. For example, I'd almost always prefer /[^x]*/ to /.*?x/ because the former is explicit in its exclusion of x's. :-)
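A quick sketch of that advice in Perl; the sample strings are made up for illustration, with ; as the single-character terminator and --> as a multi-character one. It also shows one practical difference between the two forms: without /s, . does not match a newline, while a negated class such as [^;] does.

```perl
use strict;
use warnings;

my $text = 'name=Alice;age=30';

# non-greedy: stop at the first ';'
my ($lazy) = $text =~ /name=(.*?);/;
# negated class: explicitly "anything except ';'"
my ($negated) = $text =~ /name=([^;]*);/;

# '.' does not match "\n" without /s, but [^;] does
my $multi = "note=line one\nline two;";
my ($lazy_ml)    = $multi =~ /note=(.*?);/;    # no match: .*? cannot cross the newline
my ($negated_ml) = $multi =~ /note=([^;]*);/;  # matches across the newline

# a multi-character terminator is where .*? earns its keep
my $html = 'a <!-- one --> b <!-- two -->';
my @comments = $html =~ /<!--(.*?)-->/g;

print "lazy: $lazy, negated: $negated\n";
print 'lazy_ml: ', (defined $lazy_ml ? $lazy_ml : 'no match'), "\n";
print "negated_ml: $negated_ml\n";
print "comments: ", join('|', @comments), "\n";
```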

    -sauoq
    "My two cents aren't worth a dime.";