http://www.perlmonks.org?node_id=951539

davies has asked for the wisdom of the Perl Monks concerning the following question:

Again, I have some code that works but I'm sure it could be better written using a regex that I don't know how to write. My code is:

$sTest = uc($sTest);
if (uc($sLine) eq "REM $sTest" || Left(uc($sLine), 5 + length($sTest)) eq "REM $sTest ") {

Left is a sub I have written to help me until I get out of the habit of writing VBA instead of Perl. It's a very basic call to substr.
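
For readers who have not met such a helper, here is a minimal sketch of what Left might look like. The original sub is not shown in the post, so its body is an assumption, and matches_rem_substr is a hypothetical wrapper around the test from the code above.

```perl
use strict;
use warnings;

# Assumed reconstruction of the VBA-style Left helper: the first $count characters.
sub Left {
    my ($string, $count) = @_;
    return substr($string, 0, $count);
}

# Hypothetical wrapper around the comparison in the question:
# either the whole line is "REM <keyword>", or it starts with "REM <keyword> ".
sub matches_rem_substr {
    my ($sLine, $sTest) = @_;
    $sTest = uc($sTest);
    return ( uc($sLine) eq "REM $sTest"
          || Left(uc($sLine), 5 + length($sTest)) eq "REM $sTest " ) ? 1 : 0;
}

for my $line ('rem foo', 'rem foo bar', 'rem foobar') {
    print "'$line' => ", matches_rem_substr($line, 'foo'), "\n";
}
```

The 5 in the length calculation covers the four characters of "REM " plus the trailing space.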

I have been trying to write something along the lines of:

if ($sLine =~ m/rem $sTest€/i) {

The problem I have is replacing the € with something that means "a space or the end of the string". I'm not sure whether Corion is hinting at how to do this in "Re: Regex - Matching prefixes of a word", or whether it means "one or the other, but not both in a single construct". Either way, that is the closest my searching has come to an answer. Is there a better one, please?

TIA & Regards,

John Davies

Re: Space or end of string in regex
by BrowserUk (Patriarch) on Feb 02, 2012 at 22:44 UTC

    You are almost there:

    if( $line =~ m[^REM $sTest ?$]i ) {
        # ...
    }

    The $ anchors the match at the end of the string (or just before a trailing newline). The ? after the space that precedes the $ makes that space optional.

    More formally, zero or one spaces may exist here.

    That said, allowing for just one trailing space is unusually mean. More normal would be to allow only whitespace to follow the required text before the end-of-line. Which could be written:

    if( $line =~ m[^REM $sTest\s*$]i ) {
        # ...
    }

    Formally zero or more whitespace characters may precede the end-of-line.
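
The difference between the two patterns can be seen with a small sketch. The helper names and sample lines are hypothetical; only the two patterns come from the reply, written here with slash delimiters.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns from the reply.
sub matches_optional_space {
    my ($line, $sTest) = @_;
    # At most one space may follow the keyword.
    return $line =~ /^REM $sTest ?$/i ? 1 : 0;
}

sub matches_trailing_whitespace {
    my ($line, $sTest) = @_;
    # Any amount of whitespace may follow the keyword.
    return $line =~ /^REM $sTest\s*$/i ? 1 : 0;
}

for my $line ('rem foo', 'REM FOO ', 'REM FOO   ', 'REM FOO bar') {
    printf "%-14s one-space=%d any-whitespace=%d\n",
        "'$line'",
        matches_optional_space($line, 'FOO'),
        matches_trailing_whitespace($line, 'FOO');
}
```

Only the line with three trailing spaces separates the two: the first pattern rejects it and the second accepts it. Neither accepts further text after the keyword.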


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      This works perfectly, thank you. I'm not worried about multiple spaces as a single space is enough to indicate that this is a keyword that needs special treatment and multiple spaces are handled in that routine. My problem was not realising that I needed to anchor at the start. I was therefore looking for a character class that contained space or end of string and therefore, once again, flailing at demons.

      Am I right in thinking that replacing the slashes with square brackets is optional?

      Regards,

      John Davies
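
The "space or end of string" idea from the question can also be written directly as an alternation once the pattern is anchored at the start. This is a minimal sketch, not something proposed in the thread: the helper name is hypothetical, and $sTest is assumed to contain no regex metacharacters. Unlike the patterns above, it accepts any text after the space, which is what the original substr comparison did.

```perl
use strict;
use warnings;

# Hypothetical helper: the keyword must be followed by a space or by the end of the string.
sub is_rem_keyword {
    my ($line, $sTest) = @_;
    # (?: |$) is a non-capturing group: either a literal space or the end anchor.
    return $line =~ /^REM $sTest(?: |$)/i ? 1 : 0;
}

for my $line ('rem foo', 'REM FOO bar', 'REM FOOBAR', 'xREM FOO') {
    print "'$line' => ", is_rem_keyword($line, 'FOO'), "\n";
}
```

A character class cannot express this, because the end of the string is a position rather than a character; an alternation (or an optional space before $) is needed instead.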

        Am I right in thinking that replacing the slashes with square brackets is optional?

        Indeed. Purely personal preference.



        if( $line =~ m[^REM $sTest\s*$]i ) {
        If you use // as the delimiters, you don't need the "m" in front; otherwise it's just a style preference.

        There is one other fine point: if $sTest could contain characters that are special to the regex engine, such as "(", you can have the contents of $sTest treated literally, so that those characters lose their special meaning. This is done by surrounding $sTest with a \Q ... \E pair.

        if( $line =~ /^REM \Q$sTest\E\s*$/i ) {
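
A short sketch of the effect of \Q ... \E follows. The helper names and the sample keyword "a.c" are hypothetical, chosen only because the dot is a regex metacharacter.

```perl
use strict;
use warnings;

# Hypothetical helper: $sTest is interpolated as a pattern, so "." matches any character.
sub matches_unquoted {
    my ($line, $sTest) = @_;
    return $line =~ /^REM $sTest\s*$/i ? 1 : 0;
}

# Hypothetical helper: \Q...\E escapes metacharacters, so "." matches only a literal dot.
sub matches_literal {
    my ($line, $sTest) = @_;
    return $line =~ /^REM \Q$sTest\E\s*$/i ? 1 : 0;
}

my $sTest = 'a.c';
for my $line ('REM a.c', 'REM abc') {
    print "'$line' unquoted=", matches_unquoted($line, $sTest),
          " literal=", matches_literal($line, $sTest), "\n";
}
```

Without the quoting, "REM abc" is accepted as a match for the keyword "a.c"; with it, only the literal text matches.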