PerlMonks  

Variable assignment confusion

by sweetblood (Prior)
on Dec 15, 2003 at 17:21 UTC

sweetblood has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a function that expects either a 4-digit number or a 4-digit number followed by an underscore and 3 more digits (i.e. 9999 or 9999_002). If the value only contains 4 digits, I want to append "_001" to the end. I could have sworn that all I needed to do was: (pseudo code follows)

my ($new_retid) = $retid =~ /_\d{3}$/ || "${retid}_001";

This of course does not work. $new_retid is always a 1.
It always bugs me that I can't remember stuff like this. And trying to find answers like this quickly is a challenge (but I do try).
As always,

Thank you!

Replies are listed 'Best First'.
Re: Variable assignment confusion
by fruiture (Curate) on Dec 15, 2003 at 17:48 UTC

    Well, you're assigning either the result of a match or, if the match is without success, the retid + "_001". It's not true that this always assigns 1; it only assigns 1 when the match is successful, because a simple m// in scalar context returns a boolean (the empty string or 1) and || will return its left operand's result if it's true.

    What you want is, as you said:

    # parens not necessary here
    my $new_retid =
    # if the retid has the suffix
        $retid =~ /_\d{3}$/ ?
    # then leave it as is
        $retid :
    # otherwise add _001 suffix
       $retid.'_001';
    

    See perlop for "?:" and the behaviour of m//. HTH
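
    A minimal sketch of the behaviour described above, not code from the original posts: the sample ids and the helper name add_suffix are made up for illustration. It shows that || puts the match in scalar context, so a successful match contributes 1, while the ternary returns the id itself.

```perl
use strict;
use warnings;

# Hypothetical sample ids, chosen for illustration only.
my $with_suffix    = '9999_002';
my $without_suffix = '9999';

# The original attempt: || puts the match in scalar context, so a
# successful match contributes its boolean result (1), not the id.
my ($broken_a) = $with_suffix    =~ /_\d{3}$/ || "${with_suffix}_001";
my ($broken_b) = $without_suffix =~ /_\d{3}$/ || "${without_suffix}_001";

# The ternary form chooses between the two strings instead.
sub add_suffix {
    my ($retid) = @_;
    return $retid =~ /_\d{3}$/ ? $retid : $retid . '_001';
}

print "$broken_a $broken_b\n";    # 1 9999_001
print add_suffix($with_suffix), ' ', add_suffix($without_suffix), "\n";
```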

    --
    http://fruiture.de
Re: Variable assignment confusion
by delirium (Chaplain) on Dec 15, 2003 at 17:54 UTC
    You need to enclose the regex match you're looking for in parentheses to assign it to a variable that way. For example:

    my ($new_retid) = $retid =~ /(^\d{4}_\d{3})$/;
    $new_retid ||= $retid.'_001';
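
    To see why the parentheses matter, here is a small sketch; the wrapper name normalize_retid is hypothetical. In list context a match returns its captures, or an empty list when it fails, so the variable stays undefined and ||= can supply the default.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the capture-then-default idiom.
sub normalize_retid {
    my ($retid) = @_;
    # List context: the match returns its captures, or an empty
    # list on failure, which leaves $new_retid undefined.
    my ($new_retid) = $retid =~ /(^\d{4}_\d{3})$/;
    $new_retid ||= $retid . '_001';
    return $new_retid;
}

print normalize_retid('9999_002'), "\n";    # 9999_002
print normalize_retid('9999'), "\n";        # 9999_001
```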
      This is very clever. It's not what I had been thinking of but would work nicely.

      Thanks!

Re: Variable assignment confusion
by Roy Johnson (Monsignor) on Dec 15, 2003 at 18:39 UTC
    Is there some compelling reason to make this a one-line assignment? Here's what I'd consider nicely readable:
    my $new_retid = $retid;
    $new_retid .= '_001' unless $retid =~ /_\d{3}$/;
    Apart from the ternary operator suggested by several, here are a few bletcherous but possibly instructive ways to turn it into one assignment:
    my $new_retid = $retid . '_001' x $retid !~ /_\d{3}$/;
    (my $new_retid = $retid) .= do {'_001' if $retid !~ /_\d{3}$/};
    my $new_retid = $retid . ($retid !~ /_\d{3}$/ and '_001');
    Update:
    One more:
    (my $new_retid = $retid) =~ s/(?<!_\d{3})$/_001/;
    And one inspired by delirium's post above:
    my ($new_retid) = grep( $_, $retid =~ /(^\d{4}_\d{3})$/, $retid.'_001');

    The PerlMonk tr/// Advocate
Re: Variable assignment confusion
by Aristotle (Chancellor) on Dec 15, 2003 at 18:42 UTC
    I'd throw in some stricter error checking and write it like this:
    $retid =~ /\A(\d{4})(_\d{3})?\z/ or die "Malformed ret. id\n";
    my $new_retid = $1 . ( $2 ? $2 : '_001' );
    Though unless you need the copy, I'd just change the original value in place:
    $retid .= '_001' if not $2;
    Update: s/\Q\d{4}/(\\d{4})/ of course. Thanks to Not_a_Number for catching this.
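
    A sketch of how the strict check behaves on good and bad input, assuming it is wrapped in a sub; the name checked_retid and the sample values are hypothetical, and eval is used only to catch the die for demonstration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the validating match.
sub checked_retid {
    my ($retid) = @_;
    $retid =~ /\A(\d{4})(_\d{3})?\z/ or die "Malformed ret. id\n";
    # $2 is undefined when the optional suffix group did not match.
    return $1 . ( $2 ? $2 : '_001' );
}

my $plain    = checked_retid('9999');        # suffix gets added
my $suffixed = checked_retid('9999_002');    # left as is
my $bad      = eval { checked_retid('99x9') };
my $error    = $@;                           # message from die

print "$plain $suffixed\n";
print "error: $error" unless defined $bad;
```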

    Makeshifts last the longest.

Re: Variable assignment confusion
by ysth (Canon) on Dec 15, 2003 at 17:56 UTC
    The 1 is the return value from the successful match. Try:
    my $new_retid = ($retid =~ /_\d{3}\z/ ? $retid : $retid . "_001");
      Which is (factoring out the common prefix):
      my $new_retid = $retid . ($retid =~ /_\d{3}\z/ ? '' : '_001');

      The PerlMonk tr/// Advocate
      This is no doubt what I was thinking about. Thanks! One question: why a \z instead of $ at the end of the expression? Don't get me wrong, I'm sure you're correct; I just don't understand.

      Thanks to All!

        Please don't assume that someone else is right just because they are more experienced than you; always try (as you have done :) to get an explanation. Cargo cultism often stems from unquestioningly assuming that unusually-written code is that way for a reason.

        As to your inquiry, '\z' matches only at the very end of the string, whereas '$' matches at the end, or just before a newline at the end of a string.
        This doesn't make any difference in your case, so I'd use '$' to avoid confusion.

        If you're curious, '\z' becomes much more useful when you switch on the '/m' (multi-line) flag on a regex to allow '$' to match before any newlines in the string. Have a look at perldoc perlre or Mastering Regular Expressions for more details.

        With $, it would match either "9999_999\n" or "9999_999", which is not what your original post requested.
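
        A small sketch of that difference; the variable names and sample strings are assumptions for illustration. It runs both anchors against an id with and without a trailing newline.

```perl
use strict;
use warnings;

my $clean   = "9999_999";
my $trailer = "9999_999\n";    # same id with a trailing newline

# Record whether each anchor accepts each string (1 = match, 0 = no match).
my %matched = (
    dollar_clean   => ( $clean   =~ /_\d{3}$/  ? 1 : 0 ),
    dollar_trailer => ( $trailer =~ /_\d{3}$/  ? 1 : 0 ),
    z_clean        => ( $clean   =~ /_\d{3}\z/ ? 1 : 0 ),
    z_trailer      => ( $trailer =~ /_\d{3}\z/ ? 1 : 0 ),
);

printf "%-15s %d\n", $_, $matched{$_} for sort keys %matched;
```

        Only the \z pattern rejects the string that ends in a newline.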
Re: Variable assignment confusion
by pg (Canon) on Dec 15, 2003 at 18:02 UTC

    Use a function to wrap it, so that it can be reused:

    print foo("1234"), "\n";
    print foo("1234_002");

    sub foo {
        return ($_[0] =~ /^\d{4}$/) ? $_[0] . "_001" : $_[0];
    }
Re: Variable assignment confusion
by TomDLux (Vicar) on Dec 15, 2003 at 19:48 UTC

    If you're certain you won't get invalid data, but only either four digits or else four digits, an underscore, and three digits, then using length or index or substr may be simpler than using a regex.

    $short = "1234";
    $result_1 = $short;
    $result_1 .= "_001" unless ( 4 < length $result_1 );
    $result_2 = $short;
    $result_2 .= "_001" unless ( substr( $result_2, 4 ) );
    $result_3 = $short;
    # index returns -1 (a true value) when nothing is found, so compare explicitly
    $result_3 .= "_001" unless ( index( $result_3, '_', 4 ) == 4 );

    index is probably the fastest and most direct; it simply has to locate and return the fifth character (character number four), if there is one. substr returns the remainder of the string, starting after the fourth character, assuming that isn't past the end of the string. length has to count every character in the string. All of these are quite direct. Personally, I would use length, or maybe index.
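
    For reference, a sketch of what the three built-ins return for each kind of id; the helper name probe is hypothetical. Note that index reports "not found" as -1, which is a true value in Perl, so its result needs an explicit comparison when used as a condition.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (length, remainder after 4 chars, index of '_').
sub probe {
    my ($id) = @_;
    return ( length($id), substr( $id, 4 ), index( $id, '_', 4 ) );
}

for my $id ( '1234', '1234_002' ) {
    my ( $len, $rest, $pos ) = probe($id);
    print "$id: length=$len rest='$rest' index=$pos\n";
}
```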

    If it's important to you to use one line, all of these could be used as the condition in a ternary expression, but two lines is clearer, in my opinion. I like to stretch ternaries over three lines, unless they are very simple:

    $result_4 = $short
              . ( index( $short, '_', 4 ) == 4 ? ""
                                               : "_001" );

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      length has to count every character in the string.
      This isn't C. (I've been writing this a lot lately for some reason.) The size of a Perl scalar is stored in its metadata; all length has to do is look it up. This is much faster than either of the other functions. You can put NULs in the middle of Perl strings without any problems, remember?
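
      A tiny sketch of the NUL point, with a made-up sample string: the embedded "\0" is an ordinary character as far as length is concerned, not a terminator.

```perl
use strict;
use warnings;

# Sample string with a NUL byte in the middle (illustrative only).
my $with_nul = "ab\0cd";
my $count    = length $with_nul;

print "$count\n";    # 5
```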

      Makeshifts last the longest.

Re: Variable assignment confusion
by antirice (Priest) on Dec 16, 2003 at 04:17 UTC

    Just for fun:

    my $new_retid = $retid.("_001","")[0+$retid=~/_\d{3}/];

    Yeah. Fun stuff.

    antirice    
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1

Re: Variable assignment confusion
by qq (Hermit) on Dec 15, 2003 at 23:24 UTC

    I'd just use s/^(\d{4})$/$1_001/

    ~>perl -e '@id = (1234,"4323_003"); foreach ( @id ) { s/^(\d{4})$/$1_001/; print $_, "\n"; }'
    1234_001
    4323_003

    If the input needs to be checked, I'd do it separately (and before).

    qq

Node Type: perlquestion [id://314858]
Approved by sunadmn