PerlMonks  

match whitespace or beginning/end of string

by azadian (Sexton)
on Oct 30, 2009 at 12:28 UTC (#804146)
azadian has asked for the wisdom of the Perl Monks concerning the following question:

I want to match either whitespace or the beginning or end of a string. The obvious way to do this is with assertions, or with something like (\s|^). I'm wondering whether there is some simpler way to do it. The motivation is that I have a string which looks like:

alpha="first" beta="second" gamma="third"

So to see if beta is there, and has the correct value, I could match it with something like m/\sbeta="second"\s/, but this fails at the beginning and end of the string. What I'd like is something like \b which also works as expected at the ends. I'd like something simpler because colleagues who don't know Perl need to write simple regexes for testing with my interpreter.

OK, folks, thanks for your contributions. I now know that I was not missing anything obvious, which was my chief aim.
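For reference, the assertion approach mentioned in the question can be sketched with a pair of negative lookarounds. This is only an illustration: the helper name has_token is hypothetical and not from the thread, and it treats the token as a literal string rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if $token occurs in $str
# delimited by whitespace or by either end of the string.
sub has_token {
    my ($str, $token) = @_;
    # (?<!\S) : there is no non-whitespace character just before the token
    # (?!\S)  : there is no non-whitespace character just after the token
    # \Q...\E : treat the token as literal text (an assumption of this sketch)
    return $str =~ /(?<!\S)\Q$token\E(?!\S)/ ? 1 : 0;
}

my $str = 'alpha="first" beta="second" gamma="third"';
for my $token ('alpha="first"', 'beta="second"', 'gamma="third"', 'eta="second"') {
    printf "%-15s %s\n", $token, has_token($str, $token) ? 'found' : 'not found';
}
```

Both lookarounds are zero-width, so they succeed at either end of the string as well as next to whitespace, and they do not consume the surrounding spaces.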

Re: match whitespace or beginning/end of string
by gmargo (Hermit) on Oct 30, 2009 at 12:57 UTC

    What's wrong with \b? It's only needed at the beginning, since you have the closing quote to mark the end. (UPDATED below with fancier pattern)

    Test code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;

    my $alpha = qq(alpha="first");
    my $beta  = qq(beta="second");
    my $gamma = qq(gamma="third");

    my $s1 = qq($alpha $beta $gamma);
    my $s2 = qq($alpha $gamma $beta);
    my $s3 = qq($beta $alpha $gamma);

    #my $pat = qr/\b(beta="second")/;
    my $pat = qr/(?:\A|\s)(beta="second")(?:\s|\Z)/;

    print "s1: \'$s1\', match=".($s1 =~ $pat ? $1 : "nada")."\n";
    print "s2: \'$s2\', match=".($s2 =~ $pat ? $1 : "nada")."\n";
    print "s3: \'$s3\', match=".($s3 =~ $pat ? $1 : "nada")."\n";

    Produces:

    s1: 'alpha="first" beta="second" gamma="third"', match=beta="second"
    s2: 'alpha="first" gamma="third" beta="second"', match=beta="second"
    s3: 'beta="second" alpha="first" gamma="third"', match=beta="second"
    Updated: Removed half the test cases, since there's really only 3 possibilities. Also added fancy pattern that ensures either start/end of string or space is present.

      Works fine for beta, but the same code would not work for alpha. The point is, I don't know the location in the string of the word I'm matching. You're right that, in this case, I don't need to worry about the end of string. But still, I'd like to be able to give my users a general-purpose rule which works even without the quotes.
        See updated code above using fancier pattern. (Why do you think the original would not work for alpha?)
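To make the difference between the two patterns in the test code above concrete, here is a small comparison sketch. The helper name compare and the extra test strings are assumptions added for illustration. They are not from the thread.

```perl
use strict;
use warnings;

# The two patterns from the test code above.
my $simple = qr/\b(beta="second")/;
my $strict = qr/(?:\A|\s)(beta="second")(?:\s|\Z)/;

# Hypothetical helper: report which of the two patterns match a string.
sub compare {
    my ($s) = @_;
    return {
        simple => ($s =~ $simple ? 1 : 0),
        strict => ($s =~ $strict ? 1 : 0),
    };
}

for my $s (
    'beta="second" alpha="first"',   # token at the start of the string
    'alpha="first" beta="second"',   # token at the end of the string
    'alphabeta="second"',            # token embedded in a longer word
    'x-beta="second"',               # token preceded by punctuation
) {
    my $r = compare($s);
    printf "%-30s simple=%d strict=%d\n", $s, $r->{simple}, $r->{strict};
}
```

The patterns agree on the first three strings. They differ only on the last one, where \b accepts the boundary between the hyphen and the word while the stricter pattern insists on whitespace or the start of the string.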
Re: match whitespace or beginning/end of string
by ikegami (Pope) on Oct 30, 2009 at 16:35 UTC

    Some solutions:

    However, I don't see why the following doesn't suffice:

    /\b(beta="second")/

    You're obviously not writing a validator, and the double quote mark is unambiguously the end of what you want to match. The \b at the start is necessary to avoid accidentally matching alphabeta="fourth", but there's no need for anything similar at the end since the double quote can't be part of anything else.

    Update: I just noticed

    a general-purpose rule which works even without the quotes.

    Again, no problem

    /\b$id=(?:"[^"]*"|\w+)/
      \b and \s don't work if the substring to be matched comes at the beginning of the string. There are, as you pointed out, ways to deal with this, but they are too complicated for my users.

        \b and \s don't work if the substring to be matched comes at the beginning of the string.

        If you're going to contradict, please test first. You would have found yourself wrong. For \b, the beginning and the end of the string count as non-word characters, so \b matches there whenever the adjacent character is a word character. This is documented and observable:

        $_ = 'alpha="first" beta="second" gamma="third"';
        for my $id (qw( alpha beta gamma )) {
            my ($val) = /\b$id=("[^"]*"|\w+)/
                or next;
            print("$id: $val\n");
        }

        alpha: "first"
        beta: "second"
        gamma: "third"

        Or

        $_ = 'alpha="first" beta="second" gamma="third"';
        while (/(\w+)=("[^"]*"|\w+)/g) {
            print("$1: $2\n");
        }

        alpha: "first"
        beta: "second"
        gamma: "third"
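If the goal is to check a value rather than to write a regex per test, the global-match loop above can be turned into a lookup table. The following is a minimal sketch under that assumption. The helper name parse_pairs and the quote stripping are additions for illustration, not part of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parse name=value pairs into a hash reference.
# Quoted values have their surrounding double quotes removed.
sub parse_pairs {
    my ($str) = @_;
    my %pairs;
    # Same shape as the pattern above, with separate captures for the
    # quoted ($2) and the unquoted ($3) form of the value.
    while ($str =~ /(\w+)=(?:"([^"]*)"|(\w+))/g) {
        $pairs{$1} = defined $2 ? $2 : $3;
    }
    return \%pairs;
}

my $pairs = parse_pairs('alpha="first" beta="second" gamma=third');
for my $id (sort keys %$pairs) {
    print "$id: $pairs->{$id}\n";
}
print "beta has the expected value\n"
    if defined $pairs->{beta} && $pairs->{beta} eq 'second';
```

With this shape, a test reduces to a string comparison on the parsed value, which may be easier for colleagues who do not know regexes.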
Re: match whitespace or beginning/end of string
by AnomalousMonk (Abbot) on Oct 30, 2009 at 18:05 UTC
    If you don't want to discombobulate your colleagues with any mention of regexen, maybe try secretly wrapping their simple patterns in whatever regex sub-patterns do the trick:
    >perl -wMstrict -le
    "my $pre  = qr{ \A | \s }xms;
     my $post = qr{ \s | \z }xms;
     my $str = q{foo='123' barfoo='987' bar='555'};
     for my $user_supplied (@ARGV) {
       my $rx = qr{ $pre \Q$user_supplied\E $post }xms;
       $str =~ m{ ($rx) }xms;
       print qq{[$user_supplied] matches }, $1 ? qq{[$1]} : q{nothing};
       }
    "
    foo='123' bar='555' barfoo='987' foo='987'
    [foo='123'] matches [foo='123' ]
    [bar='555'] matches [ bar='555']
    [barfoo='987'] matches [ barfoo='987' ]
    [foo='987'] matches nothing
    Note that the  $pre and  $post patterns can be adjusted to include as much or as little of the surrounding context (i.e., the spaces) as desired.
      That's a good idea which I hadn't thought of. Unfortunately it is not necessarily so easy to do in my application because the user is allowed to write any valid Perl regex. Maybe I can define a new class of matching which does have this wrapping.
        The approach still 'works' (in some sense) because the user can only supply the regex in the form of a string which is later compiled into and/or included into the compilation of a real regex object. Just don't use the metacharacter escaping mechanism of  \Q...\E or the quotemeta function.

        However, there seems to be another problem here: you say users may write any valid Perl regex, but you have also said (or implied) that they are not required to know anything about regexes nor to suffer any danger of exposure to such knowledge. This seems like a prescription for a major headache, if not disaster. User-supplied patterns would usually be validated or meta-quoted up the wazoo.
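A minimal sketch of the wrapping idea for the case where users supply real regexes rather than literal strings. The helper name wrap_user_pattern is hypothetical, and the zero-width lookarounds are swapped in for the $pre and $post patterns above so that surrounding spaces are not captured. The user's string is compiled on its own first, inside eval, so an invalid pattern is reported instead of aborting the program.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern string and wrap it so
# that a match must be delimited by whitespace or by the ends of the string.
# Returns undef (in scalar context) if the user's pattern does not compile.
sub wrap_user_pattern {
    my ($user_pattern) = @_;
    # Compiling separately keeps the user's pattern grouped as one unit,
    # so an alternation such as a|b cannot escape the wrapper.
    my $inner = eval { qr/$user_pattern/ };
    return unless defined $inner;
    return qr/(?<!\S)($inner)(?!\S)/;
}

my $str = q{foo='123' barfoo='987' bar='555'};
for my $pattern (q{foo='\d+'}, q{bar='5+'}, q{foo='987'}, q{foo=(}) {
    my $rx = wrap_user_pattern($pattern);
    if (!defined $rx) {
        print "[$pattern] is not a valid regex\n";
        next;
    }
    print "[$pattern] matches ", ($str =~ $rx ? "[$1]" : 'nothing'), "\n";
}
```

This only checks that the pattern compiles. It does not limit what a valid pattern may match, so a pattern such as foo.* can still span several tokens.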
