PerlMonks  

regex help

by kelscat18 (Initiate)
on Oct 05, 2013 at 19:00 UTC (#1057061)
kelscat18 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm using the following code to extract some words from a file:

    my @words = grep(s/[^a-zA-Z0-9]/ /g, @lines);

The problem is that the words I want must contain a mix of both letters and numbers:

jHj8nniO - good
I87jjj8y - good
jUjngnkk - bad
ikbHH - bad

The good words are the words that are a mix of letters and numbers. Thanks for any help.

Re: regex help
by Corion (Pope) on Oct 05, 2013 at 19:03 UTC

    Why not simply test the two conditions separately? First test that the word contains a letter, then, in a second test, check that it contains a number.

      Well... that works fine too ^^ Thanks.
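Corion's two-test idea, applied to the original task of extracting words from a file, might look like the sketch below. It assumes that a "word" is a run of ASCII letters and digits separated by anything else; the sample text and the in-memory filehandle stand in for the real input file, which the question does not name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input standing in for the real file (assumption: words are
# separated by spaces or punctuation, possibly several per line).
my $text = <<'END';
jHj8nniO I87jjj8y
jUjngnkk, ikbHH; 12345
END

# In-memory filehandle; for a real file use: open my $fh, '<', $path
open my $fh, '<', \$text or die "Cannot open input: $!";

my @words;
while (my $line = <$fh>) {
    chomp $line;
    # split returns copies of the pieces, so $line itself is never modified.
    for my $word (split /[^A-Za-z0-9]+/, $line) {
        # Two simple tests: at least one letter AND at least one digit.
        push @words, $word if $word =~ /[A-Za-z]/ && $word =~ /\d/;
    }
}
close $fh;

print "@words\n";    # prints: jHj8nniO I87jjj8y
```

Splitting first keeps "find the words" and "decide whether a word is good" as two separate steps, which is the same separation of concerns Corion suggests for the two conditions.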
Re: regex help
by kcott (Abbot) on Oct 05, 2013 at 19:38 UTC

    G'day kelscat18,

    You're using a substitution (i.e. s/pattern/replacement/) when you really want a pattern match (i.e. /pattern/). You're also using the 'g' modifier, which is unnecessary here. Take a look at "perlretut - Perl regular expressions tutorial" to get an understanding of the basics. Here's how I might have coded that (which, I suspect, is close to what Corion had in mind):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my @tests = qw{jHj8nniO I87jjj8y jUjngnkk ikbHH 12345 !@$%^&*};
    my @words = grep { /[A-Za-z]/ && /\d/ } @tests;
    print "@words\n";

    Output:

    jHj8nniO I87jjj8y

    -- Ken

Re: regex help
by jethro (Monsignor) on Oct 05, 2013 at 20:06 UTC

    Another solution, with only one regex:

    m/\d[a-zA-Z]|[a-zA-Z]\d/;

    This works because in a string that consists only of letters and numbers and contains both, there has to be at least one location where a letter and a number touch.
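A quick way to check the adjacency argument is to run jethro's pattern and the two separate tests side by side. A minimal sketch; the test words are the OP's plus a few made-up ones, and `two_tests` is a hypothetical helper name. On words made only of letters and digits the two approaches agree; on `a==1`, where punctuation separates the letter from the digit, only the two-test version matches.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# jethro's single regex: a digit and a letter directly next to each other.
my $adjacent = qr/\d[a-zA-Z]|[a-zA-Z]\d/;

# The two independent tests (hypothetical helper name).
sub two_tests {
    my ($s) = @_;
    return $s =~ /[a-zA-Z]/ && $s =~ /\d/;
}

for my $w (qw(jHj8nniO I87jjj8y jUjngnkk ikbHH 12345 a1==a1 a==1)) {
    my $one = $w =~ $adjacent ? 'match' : 'no';
    my $two = two_tests($w)   ? 'match' : 'no';
    printf "%-9s adjacent: %-6s two tests: %s\n", $w, $one, $two;
}
```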

    Update: To Laurent_R: Absolutely. Clarity and simplicity always win. Except when this line is in the 3% of code that accounts for 99.7% of a program's runtime and you have to optimise for speed.
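Whether the single regex is actually faster than two simple ones depends on the perl version and the data, so it is worth measuring before optimising. A rough sketch using the core Benchmark module; the word list is made up for illustration, and no particular result is implied:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data; real measurements should use realistic input.
my @words = (qw(jHj8nniO I87jjj8y jUjngnkk ikbHH 12345)) x 200;

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a table comparing the rates.
cmpthese(-1, {
    two_regexes => sub { my @hits = grep { /[A-Za-z]/ && /\d/ } @words },
    one_regex   => sub { my @hits = grep { /\d[a-zA-Z]|[a-zA-Z]\d/ } @words },
});
```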

Re: regex help
by Laurent_R (Prior) on Oct 05, 2013 at 21:27 UTC

    This last solution from jethro is clever and effective, but for a problem like this I would rather use the solution offered by Corion. Faced with such a problem, I think it is often better to think in terms of several simple regexes, each checking an individual condition, rather than building a single, more complicated regex to match all cases. If I have to read and understand some undocumented code, I certainly prefer something like:

    do_something() if /\d/ and /[A-Za-z]/;

    which tells me immediately that I need at least one letter and one digit, rather than:

    do_something() if /\d[a-zA-Z]|[a-zA-Z]\d/;

    which is quite clear in terms of what it does, but less obvious in terms of what the intended underlying rule really is. Having said that, I also sometimes use these supposedly clever shortcuts when they save some typing. But that often means I need to add a comment to explain the whole shebang, so I don't save that much typing after all.
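One middle ground, not taken from the thread, is to let the code document the rule itself: a named predicate for the two-test form, and the /x modifier so the single regex can carry inline comments. A sketch; the names `has_letter_and_digit` and `$letter_digit_pair` are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Rule: a word needs at least one letter and at least one digit.
# Readable form: two independent checks, one per condition.
sub has_letter_and_digit {
    my ($word) = @_;
    return $word =~ /[A-Za-z]/     # at least one letter
        && $word =~ /\d/;          # at least one digit
}

# Compact form: under /x, whitespace is ignored and # starts a comment.
# Note: this enforces the rule only for words made of letters and digits.
my $letter_digit_pair = qr{
      \d [a-zA-Z]     # a digit immediately followed by a letter
    | [a-zA-Z] \d     # or a letter immediately followed by a digit
}x;

for my $w (qw(jHj8nniO I87jjj8y jUjngnkk ikbHH)) {
    my $verdict = has_letter_and_digit($w) ? 'good' : 'bad';
    my $check   = $w =~ $letter_digit_pair ? 'good' : 'bad';
    print "$w - $verdict ($check)\n";
}
```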

Re: regex help
by AnomalousMonk (Abbot) on Oct 06, 2013 at 03:27 UTC

    (Further to kcott's reply:)

    kelscat18: Not only does the substitution you show in the OP select the wrong strings when used with grep, it changes them and also changes strings in the input array.

    >perl -wMstrict -le "my @lines = qw(aaa 111 a2a2 a2==a2 aa==aa);
     printf '@lines before: '; printf qq{'$_' } for @lines; print '';
     ;;
     my @words = grep(s/[^a-zA-Z0-9]/ /g, @lines);
     printf '@lines after: '; printf qq{'$_' } for @lines; print '';
     printf '@words: '; printf qq{'$_' } for @words; print '';
    "
    @lines before: 'aaa' '111' 'a2a2' 'a2==a2' 'aa==aa'
    @lines after: 'aaa' '111' 'a2a2' 'a2  a2' 'aa  aa'
    @words: 'a2  a2' 'aa  aa'
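The damage comes from grep aliasing $_ to each element of @lines, so s/// edits the array in place, and grep keeps exactly those elements in which at least one character was replaced. One way to avoid both problems, sketched here assuming perl 5.14 or later for the non-destructive /r modifier, is to clean copies with map and then select with a plain match:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;    # needed for the non-destructive s///r modifier

my @lines = qw(aaa 111 a2a2 a2==a2 aa==aa);

# s///r returns a modified copy; $_ (an alias into @lines) is left untouched.
my @cleaned = map { s/[^a-zA-Z0-9]/ /gr } @lines;

# A plain match only tests, so grep cannot change anything here.
my @words = grep { /[A-Za-z]/ && /\d/ } @cleaned;

print "\@lines:   @lines\n";
print "\@cleaned: @cleaned\n";
print "\@words:   @words\n";
```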
Re: regex help
by AnomalousMonk (Abbot) on Oct 06, 2013 at 04:39 UTC
    ... must contain a mix of both letters and numbers.
    ... good words are the words that are a mix of letters and numbers.

    The specification and example in the OP are a bit unclear to me, but, taken with some of the other replies, they lead me to think that a "word" is a string that either:

    1. must contain only alphanumeric characters, with at least one alphabetic character and at least one numeric character; or
    2. may contain any characters, but with at least one alphabetic character and at least one numeric character; or
    3. may contain any characters, but with at least one contiguous alphabetic and numeric character pair in any order.

    The other replies seem to lean toward alternatives 2 and 3 above. My own first guess was for alternative 1, as in the last code examples below:

    >perl -wMstrict -le "my @lines = qw(abc 345 a1 1a a1a 1a1 abc1 1abc a1==a1 a==1);
     printf '@lines: '; printf qq{'$_' } for @lines; print qq{\n};
     ;;
     printf 'and 1: '; printf qq{'$_' } for grep { /[[:alpha:]]/ && /\d/ } @lines; print '';
     ;;
     printf 'regex 1: '; printf qq{'$_' } for grep m{ [[:alpha:]] \d | \d [[:alpha:]] }xms, @lines; print qq{\n};
     ;;
     ;;
     printf 'and 2: '; printf qq{'$_' } for grep { !/[^[:alnum:]]/ && /[[:alpha:]]/ && /\d/ } @lines; print '';
     ;;
     my $al_num = qr{ [[:alpha:]] \d | \d [[:alpha:]] }xms;
     printf 'regex 2: '; printf qq{'$_' } for grep m{ \A [[:alnum:]]* $al_num [[:alnum:]]* \z }xms, @lines; print qq{\n};
     ;;
     ;;
     printf '@lines as was: '; printf qq{'$_' } for @lines;
    "
    @lines: 'abc' '345' 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'

    and 1: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'
    regex 1: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1'

    and 2: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc'
    regex 2: 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc'

    @lines as was: 'abc' '345' 'a1' '1a' 'a1a' '1a1' 'abc1' '1abc' 'a1==a1' 'a==1'
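Alternative 1 can also be expressed as a single anchored pattern using lookaheads, a technique not used elsewhere in the thread. A sketch, checked against the same test strings; `$alnum_mix` is a hypothetical name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Alternative 1: only alphanumerics, at least one letter and at least one digit.
my $alnum_mix = qr{
    \A
    (?= .* [[:alpha:]] )   # lookahead: a letter appears somewhere
    (?= .* \d )            # lookahead: a digit appears somewhere
    [[:alnum:]]+           # every character is alphanumeric
    \z
}xms;

my @lines = qw(abc 345 a1 1a a1a 1a1 abc1 1abc a1==a1 a==1);
my @words = grep { $_ =~ $alnum_mix } @lines;

print "@words\n";    # prints: a1 1a a1a 1a1 abc1 1abc
```

The lookaheads check each condition without consuming characters, so the pattern still reads as "letter present, digit present, all alphanumeric", much like the separate tests in the "and 2" line above.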

Node Type: perlquestion [id://1057061]
Approved by Corion