PerlMonks  

Re: An "ethical" use of dot-star ..?

by broquaint (Abbot)
on Jun 02, 2003 at 14:12 UTC (#262378=note)


in reply to An "ethical" use of dot-star ..?

I guess I'm interested to know what the general consensus for the use of .* is. Is it something to be avoided at all costs, or is it a powerful, oft-misused tool that can be useful and beneficial in carefully controlled circumstances?
My rule of thumb for .* versus .*? is that the former is for grabbing everything after a certain point (I can't be bothered with $') and the latter is for grabbing data between two points. So I guess it is a 'powerful, oft-misused tool', but that's mostly because people aren't aware of quantifier greediness.
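A minimal sketch of that rule of thumb, using made-up sample strings (`$line` and `$html` are illustrative only). Greedy `.*` takes everything after a point, which is the same text the postmatch variable `$'` would give you. Non-greedy `.*?` stops at the first closing delimiter instead of the last one.

```perl
use strict;
use warnings;

my $line = 'key: value one; value two';

# Greedy .* runs to the end of the string: "everything after a point".
my ($rest) = $line =~ /:\s*(.*)/;

# The same text via $' (postmatch) after matching only the anchor point.
my $post;
if ($line =~ /:\s*/) {
    $post = $';
}

my $html = '<b>one</b> and <b>two</b>';

# Greedy: .* backtracks from the end, so it runs up to the LAST </b>.
my ($greedy) = $html =~ m{<b>(.*)</b>};

# Non-greedy: .*? stops at the FIRST </b>, i.e. data between two points.
my ($lazy) = $html =~ m{<b>(.*?)</b>};

print "rest   - $rest\n";
print "post   - $post\n";
print "greedy - $greedy\n";
print "lazy   - $lazy\n";
```

With two `<b>` elements in the string, the greedy capture is `one</b> and <b>two`, which is exactly the kind of greediness surprise that gives `.*` its bad reputation.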
While I'm at it *grin*, does anyone have a "better idea" for pulling the data out of the tags?
Due to the nature of XML, it might be a good idea to build the regex up in layers, e.g.
    use strict;
    use warnings;

    ## *very* simplistic stuff (e.g. doesn't deal with nested tags,
    ## and the end tag isn't required to match the begin tag)
    my $token     = qr{ (?: \b [A-Z]\w+ \b ) }xi;               # tag or attribute name
    my $attrib    = qr{ (?: $token \s* = \s* "[^"]+" \s* ) }x;  # name="value"
    my $begin_tag = qr{ < ( $token ) \s* ( $attrib* ) > }x;     # captures name, attributes
    my $end_tag   = qr{ </$token> }x;

    my $example = q[<ClientID type="String">A1234BX</ClientID>];
    my($tag, $attribs, $data) = $example =~ m{ $begin_tag (.*?) $end_tag }x;

    print "tag - $tag\n";
    print "attribs - $attribs\n";
    print "data - $data\n";

    __output__
    tag - ClientID
    attribs - type="String"
    data - A1234BX
That could be collapsed into a single regex, but like most complex things, it's much easier to digest when broken down into smaller components.
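One possible refinement of the layered approach, sketched under the assumption of Perl 5.10 or later (for named captures): name the tag capture so the closing tag can be required to match it via `\k<tag>`. The backreference has to live in the same compiled pattern as the named group, so the hypothetical `$element` combines begin tag, data and end tag into one regex. It is still the same simplistic handling, with no support for nesting.

```perl
use strict;
use warnings;

# Same building blocks as above (still simplistic: no nesting, no single quotes).
my $token  = qr{ (?: \b [A-Z]\w+ \b ) }xi;
my $attrib = qr{ (?: $token \s* = \s* "[^"]+" \s* ) }x;

# Hypothetical variant: name the captures so the end tag can refer back to the tag.
my $begin_tag = qr{ < (?<tag> $token ) \s* (?<attribs> $attrib* ) > }x;

# \k<tag> must be in the same pattern as (?<tag>...), so combine everything here.
my $element = qr{ $begin_tag (?<data> .*? ) </ \k<tag> > }x;

my $doc = q[<ClientID type="String">A1234BX</ClientID><Name>Bob</Name>];

# Walk through each element with //g and collect name=data pairs.
my @found;
while ($doc =~ /$element/g) {
    push @found, "$+{tag}=$+{data}";
}

# A mismatched pair no longer matches.
my $bad      = q[<ClientID type="String">A1234BX</Other>];
my $rejected = $bad !~ $element;

print "$_\n" for @found;
print "mismatched pair rejected\n" if $rejected;
```

Looping with `//g` pulls out each top-level element in turn, while `<ClientID ...>A1234BX</Other>` is no longer accepted the way it would be with the unrestricted `$end_tag`.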
HTH

_________
broquaint

Replies are listed 'Best First'.
Re: Re: An "ethical" use of dot-star ..?
by sauoq (Abbot) on Jun 03, 2003 at 00:23 UTC
    My rule of thumb for .* versus .*? is that the former is for grabbing everything after a certain point (I can't be bothered with $') and the latter for grabbing data between 2 points.

    I'd say that .*? is most often useful when grabbing things between two points where the second point is defined by a string of more than one character. If the right-hand side can be recognized by a single character, I'd suggest a negated character class instead. For example, I'd almost always prefer /[^x]*x/ to /.*?x/ because the former is explicit in its exclusion of x's. :-)
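A small illustration using a made-up string: both patterns stop at the first `x`, but they are not quite interchangeable. A negated character class like `[^x]` also matches newlines, whereas `.` only does so under the `/s` modifier, so on multi-line data the results can differ.

```perl
use strict;
use warnings;

my $str = "a1x b2\nb3x";

# Lazy dot-star: grows one char at a time until an 'x' follows,
# but without /s it cannot cross the newline.
my @lazy = $str =~ /(.*?)x/g;

# Negated class: explicitly "anything but x", newlines included.
my @class = $str =~ /([^x]*)x/g;

# With /s, '.' matches newlines too, and the lazy version agrees with the class.
my @lazy_s = $str =~ /(.*?)x/gs;

print "lazy:   ", join('|', @lazy),   "\n";
print "class:  ", join('|', @class),  "\n";
print "lazy/s: ", join('|', @lazy_s), "\n";
```

Here the lazy match without `/s` skips the text before the newline entirely, while the negated class captures ` b2\nb3` as one piece.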

    -sauoq
    "My two cents aren't worth a dime.";
    
