Re^2: what is the best way to seperate the digits and strings from variable ?

by dbwiz (Curate)
on May 09, 2005 at 10:58 UTC (#455142)

in reply to Re: what is the best way to seperate the digits and strings from variable ?
in thread what is the best way to seperate the digits and strings from variable ?

Please, test your code.

\d* will match 0 (ZERO) or more digits.

Thus, it will happily match an empty string at the beginning of a string like "abc123" and return a 0 length string. Try it.

The right expression to use in this case is \d+.
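
A minimal sketch of the difference, using the sample string from this post (the variable names $star and $plus are only illustrative):

```perl
use strict;
use warnings;

my $variable = "abc123";

# \d* may match zero digits, so it succeeds at position 0 with an empty match.
my ($star) = $variable =~ /(\d*)/;

# \d+ needs at least one digit, so the engine moves along until it finds "123".
my ($plus) = $variable =~ /(\d+)/;

printf "star: '%s' (length %d)\n", $star, length $star;
printf "plus: '%s' (length %d)\n", $plus, length $plus;
```

The first line printed shows an empty capture of length 0, the second shows '123'.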

Moreover, a capturing regular expression should always be used with a test:

    my $num;
    my $variable = "abc123";
    if ( $variable =~ /(\d+)/ ) {
        $num = $1;
    }

Replies are listed 'Best First'.
Re^3: what is the best way to seperate the digits and strings from variable ?
by polettix (Vicar) on May 09, 2005 at 13:04 UTC
    Please read the OP and the variable names in the test code before slapping hands. You may be right from your implied point of view, but I had my rationale when posting (which implies that I tested my code, of course).

    Here, I'm considering digits as a character class, not as components of a number whose semantics differ from those of a string; this is why I name my variable $start_digits, not $num as you do. As you've surely noted, the OP never talks about numbers, always about digits (anyway, as I noted in my post, the OP was not clear about the usage of this extracted data).

    Thus, when I thought about putting "*" or "+"*, I decided that having no digits was fine, and that the returned string would then be empty.

    I think you can agree with me that, had the OP asked for initial letters, the regex:

    my $variable = "abc123"; my ($letters) = $variable =~ /^([a-zA-Z]*)/;
    would do the job.

    *I swear I thought about that!

    Flavio (perl -e 'print(scalar(reverse("\nti.xittelop\@oivalf")))')

    Don't fool yourself.

      Testing your code is a great concept. Of course, we all have to agree on the specs so we can all agree on what tests are needed. Your code works just fine for a certain subset of possibilities, dbwiz's code works just fine for a different subset of possibilities, both work just fine for the subset of possibilities as presented by the OP. In the absence of a better spec, we all make assumptions that show the world that we, individually, live in more than they show the world that the OP lives in. (Which is why, if you look back at questions I pose, they're usually quite long-winded - to reduce the "absence of a better spec".)

      As for the initial letters, not that we're straying from the initial thread here ;-), I'd recommend matching with /^([[:alpha:]]*)/ instead. Again, we have to agree on a spec of what "initial letters" means (does it mean English letters, or can it include accented characters, or letters in other non-Roman lettering systems?). If it includes other languages, I like letting perl worry about that stuff for me ;-). Note that it is perfectly reasonable to only accept straight-ascii for some things. We just can't tell from what has been stated so far. (And I've just revealed a bit more about the world I live in.)
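
A small sketch of how the two character classes can differ, assuming the input may contain non-ASCII letters (the Greek sample string and the variable names are made up for illustration and do not come from the OP):

```perl
use strict;
use warnings;

# Greek "alpha beta" followed by digits, written with escapes so that
# the source file itself stays plain ASCII.
my $variable = "\x{3b1}\x{3b2}123";

# Only the 52 unaccented ASCII letters.
my ($ascii_letters) = $variable =~ /^([a-zA-Z]*)/;

# Whatever perl considers alphabetic for this string.
my ($posix_letters) = $variable =~ /^([[:alpha:]]*)/;

printf "a-zA-Z:    %d leading letter(s)\n", length $ascii_letters;
printf "[:alpha:]: %d leading letter(s)\n", length $posix_letters;
```

Here the ASCII range captures nothing, while [[:alpha:]] captures both Greek letters. Which result is "right" still depends on the spec.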

        In doing some theatre I think it's (should be) allowed to stretch specs at will - at least, until they STRRRRAAAAAAAPPP miserably! I'm a lazy guy (as you implicitly, and correctly, noted with your /^([[:alpha:]]*)/ regex :), but I couldn't accept that reply the one time I'd actually actively thought about "+" and "*"!

        One of the things that PM lacks is a pub section, where dbwiz and I could enjoy a beer laughing about all that! (Others would be welcome as well).

        Flavio (perl -e 'print(scalar(reverse("\nti.xittelop\@oivalf")))')

        Don't fool yourself.

      frodo72, don't fool yourself, as your signature says.

      dbwiz has given you good advice. Assigning $1 without testing is almost a capital sin in Regex parlance. Your initial code would pass a test against the only example provided by the OP, but it would fail in many other cases.
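
One common reason behind this advice is that a failed match leaves $1 untouched, so an untested read can pick up a stale capture from an earlier match in the same scope. A minimal sketch, with sample strings and variable names invented for illustration:

```perl
use strict;
use warnings;

my $first  = "abc123";
my $second = "no digits here";

$first =~ /(\d+)/;
my $from_first = $1;            # "123"

$second =~ /(\d+)/;             # fails, so $1 is left as it was
my $untested = $1;              # still "123", which is stale

my $tested;
if ( $second =~ /(\d+)/ ) {     # only read $1 after a successful match
    $tested = $1;
}

print "untested: $untested\n";
print "tested:   ", ( defined $tested ? $tested : 'undef' ), "\n";
```

Note that this concerns reading $1 directly; assigning the match result in list context, as in `my ($x) = $s =~ /(\d+)/`, never sees a stale capture.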


        I know perfectly well that dbwiz gave good advice, as you can note in Re^5: what is the best way to seperate the digits and strings from variable ?. I only objected to his jumping to conclusions, pointing out that he was working under assumptions that were different from mine, and that he basically made my very same error: thinking in a restricted context.

        And, jokes apart, I'm not fooling myself (even if you may think I'm trying to fool you); I'd like to see a single post of mine in which I was even slightly wrong without taking responsibility for it. If you find one, please point it out to me.

        If you read my answer carefully, you'd see that:

        • I thought about what I was writing, maybe for the very single time;
        • dbwiz's suggestion was based upon the assumption that the OP wanted to extract a number. I was answering a question that dealt with a string. We can argue about that at will, but the fact remains that we were talking about different things and that my answer applied to what I was talking about. Incidentally, the OP's language seems to be biased towards my interpretation, at least in my understanding - but here I'm repeating myself.
        On with the rest:
        "Assigning $1 without testing is almost a capital sin in Regex parlance"
        This is like saying that I shouldn't use symbolic references. Why not, if they're in a controlled environment? As dbwiz correctly noted, my regex ALWAYS matches, because it matches the empty string. So, why should I check if $1 is set if I already know that it's set? Just because otherwise I'm committing a capital sin?
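
The point about a pattern that always matches can be sketched as follows, assuming the list-context assignment style used earlier in the thread (the hashes %with_star and %with_plus exist only to collect results for this illustration):

```perl
use strict;
use warnings;

my ( %with_star, %with_plus );

for my $variable ( "123abc", "abc123" ) {
    # In list context a successful match returns its captures,
    # and a failed match returns the empty list.
    my ($start_digits) = $variable =~ /^(\d*)/;    # cannot fail
    my ($some_digits)  = $variable =~ /^(\d+)/;    # fails without a leading digit

    $with_star{$variable} = $start_digits;
    $with_plus{$variable} = $some_digits;

    printf "%-8s star: %-7s plus: %s\n", $variable,
        "'$start_digits'",
        defined $some_digits ? "'$some_digits'" : 'undef';
}
```

With the star the captured value is always defined (possibly empty), so a test adds nothing; with the plus the value is undef whenever there is no leading digit, and a test is needed.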
        "Your initial code would pass a test against the only example provided by the OP"
        I'm happy that you noticed, because I clearly stated that the question was not clear. On the other hand, I completely disagree with the following claim:
        "but it would fail in many other cases."
        Of which you provide not a single example. And I can tell you why: you continue to assume that you have to extract numbers, and I continue to say that we might need digits. What if he wanted to swap the digit and alpha parts? Would you be willing to move the files:
        0001-ciccio.txt 001-ciccio.txt 01-ciccio.txt
        all to ciccio-1.txt?
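
A sketch of that renaming scenario, assuming a hypothetical swap of the digit and name parts (the file-name pattern and the hash names are assumptions made for this illustration; nothing is actually renamed):

```perl
use strict;
use warnings;

my @files = ( "0001-ciccio.txt", "001-ciccio.txt", "01-ciccio.txt" );

my ( %as_digits, %as_number );

for my $file (@files) {
    if ( my ( $digits, $name ) = $file =~ /^(\d+)-(\w+)\.txt$/ ) {
        # Treating the capture as a string of digits keeps the leading zeros.
        my $digit_target = "$name-$digits.txt";

        # Treating it as a number (%d) drops them.
        my $number_target = sprintf( "%s-%d.txt", $name, $digits );

        $as_digits{$digit_target}++;
        $as_number{$number_target}++;
    }
}

printf "distinct targets as digits: %d\n", scalar keys %as_digits;
printf "distinct targets as number: %d\n", scalar keys %as_number;
```

Kept as digit strings, the three files map to three different targets; converted to numbers, all three collide on ciccio-1.txt.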

        To definitively conclude this mess:

        • I hereby state that I thank dbwiz for pointing out that I'm a moron who'd better think about all the possible contexts in which his answers could be used.
        • I'm stopping this thread here. Sorry, I don't like it any more, and I like repeating myself even less. I accept my limits in expressing myself; unfortunately, I'm Italian and English is not my first language.
        • I beg the pardon of all who've been bored by my posts in this thread.
        • I invite all, me first, to consider that there can be other points of view, and that the discussion should be open enough to benefit from them all without excluding any a priori.
        Cin-cin with our beer pints! (or whatever you like to drink)

        Flavio (perl -e 'print(scalar(reverse("\nti.xittelop\@oivalf")))')

        Don't fool yourself.
Re^3: what is the best way to seperate the digits and strings from variable ?
by Errto (Vicar) on May 10, 2005 at 02:49 UTC
    Please, test your code.

    Testing code in replies is encouraged (if that), but not required. One must simply have the common sense to accept that occasionally untested code will prove itself wrong, and one must be willing to correct it should that occur.
