PerlMonks

using regex to capture a string and an array

by blackadder (Hermit)

on Nov 01, 2009 at 12:33 UTC (#804335=perlquestion)
blackadder has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have strings like these:

uk1sxve01205.gfjgjf5.fdhd5
usasxve513.gfdhf4.hgfd4
I am trying to capture the first 3 characters and all the digits up to the first dot from the left, so I put this bit of code together:
my ($Site_Code, @RS) = ($String=~ /^(...)....(\d+)/g);
But @RS contains only one element, instead of what I expected: that (\d+)/g would place each digit in @RS as a separate element!
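A minimal sketch, using the first sample string, of what that line actually returns. Each capture group contributes exactly one string to the result list, however many characters it matched, so (\d+) hands back the whole digit run as a single element.

```perl
use strict;
use warnings;

my $String = 'uk1sxve01205.gfjgjf5.fdhd5';

# Same regex as in the question. ^ anchors to the start of the string,
# so /g cannot find a second match; the list holds one value per group.
my ($Site_Code, @RS) = ($String =~ /^(...)....(\d+)/g);

print "Site_Code: $Site_Code\n";
print "RS has ", scalar(@RS), " element(s): @RS\n";
```

Run as-is, this should report the site code 'uk1' and a single element, '01205'.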

Enlightenment, please!

Blackadder

Re: using regex to capture a string and an array
by jettero (Prior) on Nov 01, 2009 at 13:33 UTC
    That /g makes the whole regex try to match again, and I don't think it's helping you here. Try a non-greedy "anything" match for the noise instead, and be specific about your delimiter. You *said* what you wanted; it looks like this:
    my ($site, $digits) = $String =~ m{
        ^(.{3})   # the site code
        .+?       # noise, as little as possible though
        (\d+)     # the digits (keepers)
        \.        # the delimiter
    }x;
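A small comparison sketch of why the non-greedy .+? matters here. The greedy variant is added only for contrast: .+ first swallows the rest of the string and then backs off just far enough to find digits followed by a dot, which lands on the *last* such run rather than the first.

```perl
use strict;
use warnings;

# jettero's pattern: non-greedy noise plus an explicit \. delimiter.
my $lazy = qr{
    ^(.{3})   # the site code
    .+?       # noise, as little as possible
    (\d+)     # the digits (keepers)
    \.        # the delimiter
}x;

# The same pattern with a greedy .+ (for comparison only).
my $greedy = qr{ ^(.{3}) .+ (\d+) \. }x;

my %result;
for my $String ('uk1sxve01205.gfjgjf5.fdhd5', 'usasxve513.gfdhf4.hgfd4') {
    my ($site, $digits)  = $String =~ $lazy;
    my (undef, $gdigits) = $String =~ $greedy;
    $result{$String} = [ $site, $digits, $gdigits ];
    print "$String: site=$site lazy=$digits greedy=$gdigits\n";
}
```

With the sample strings, the non-greedy version should find '01205' and '513', while the greedy one picks up the single digit before the second dot ('5' and '4').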

    -Paul

      Once you've got the digits as a string, you can turn that into an array thus:
      my @RS = split "", $digits;
      (This is included in the suggestion below from BioLion, but it's a bit buried in the code, so I thought I'd post it by itself. No votes required.)
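A minimal sketch of two equivalent ways to go from the digit string to one element per digit. The value of $digits is assumed here to be the run captured from the first sample string. split with an empty pattern breaks a string into single characters; a /g match of a single \d in list context returns every match, which gives the same list.

```perl
use strict;
use warnings;

my $digits = '01205';    # assumed: the string captured by (\d+)

# split with an empty pattern returns one element per character.
my @RS = split "", $digits;

# Alternative: match one digit at a time with /g in list context.
my @RS_alt = $digits =~ /\d/g;

print "split: @RS\n";
print "match: @RS_alt\n";
```

Either way the elements stay strings, so the leading '0' is preserved.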

      --
      use JAPH;
      print JAPH::asString();

Re: using regex to capture a string and an array
by BioLion (Friar) on Nov 01, 2009 at 13:53 UTC

    Check out perlre, especially the part about greedy matching (in fact the whole thing is worth a read). The (\d+) captures one *or more* digits, and because it is greedy it takes all the digits in one slurp, as a single captured string. Also, the dot metacharacter matches any character (except a newline), so a run of dots can match things you did not intend...

    Capturing an unknown number of things with regexes is difficult (cf. capturing a known number of elements of unknown length), so I would suggest keeping it simple and adding an intermediate step:

    use strict;
    use warnings;

    while (<DATA>) {
        chomp( my $input = $_ );    ## drop the newline so the output stays tidy
        if (    ## are you sure the format is correct?
            $input =~ m/^(\w{3})    ## match 3 alphanumerics at the start
                        [^\d]*      ## non-digits in the middle
                        (\d+)       ## capture all the digits before
                        \.          ## an actual dot
                       /x
          ) {
            my $site_code = $1;
            my @rs = split '', $2;    ## split the digits up into an array
            print "Input : $input\n\$site_code : \'$site_code\'\n\@rs :\n\t",
              +(join "\n\t", @rs), "\n";
        }
        else {
            ## ... process some other way?
            print "input \'$input\' cannot be processed.\n";
        }
    }

    __DATA__
    uk1sxve01205.gfjgjf5.fdhd5
    usasxve513.gfdhf4.hgfd4
    how_did_this_get_here?
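A small sketch of the "unexpected nasties" that fixed-width dots can cause. The host name below is hypothetical (its middle part is three letters instead of four). The question's four dots silently eat the first digit, while BioLion's [^\d]* skips only non-digits and so keeps the whole run.

```perl
use strict;
use warnings;

# Hypothetical host name whose middle part is 3 letters instead of 4.
my $host = 'uk1abc0123.x';

# The question's fixed-width pattern: the four dots consume "abc0".
my ($site_a, $digits_a) = $host =~ /^(...)....(\d+)/;

# BioLion's pattern: only non-digits are skipped before the capture.
my ($site_b, $digits_b) = $host =~ /^(\w{3})[^\d]*(\d+)\./;

print "fixed dots : $site_a $digits_a\n";
print "non-digits : $site_b $digits_b\n";
```

Here the first pattern should report '123' and the second '0123'.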
    Just a something something...
Re: using regex to capture a string and an array
by ikegami (Saint) on Nov 01, 2009 at 18:36 UTC

    /g indicates the match should be performed repeatedly, so

    /^(...)....(\d+)/g
    is basically
    / ^(...)....(\d+)
      (?: (?s:.*?) ^(...)....(\d+)
          (?: (?s:.*?) ^(...)....(\d+)
              (?: (?s:.*?) ^(...)....(\d+)
                  etc
              )?
          )?
      )?
    /xg

    (But with less ability to backtrack)
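A minimal sketch of what the repetition means in practice, on a hypothetical sample string. Without an anchor, /g resumes where the previous match ended and list context collects the captures of every match. With ^ (and no /m), only the very start of the string can match, so the repetition stops after the first match.

```perl
use strict;
use warnings;

my $s = 'a1 b2 c3';    # hypothetical sample string

# Unanchored: every match contributes its two captures.
my @all = $s =~ /(\w)(\d)/g;

# Anchored with ^ (no /m): only position 0 can match.
my @first = $s =~ /^(\w)(\d)/g;

print scalar(@all), " vs ", scalar(@first), "\n";
```

This should print 6 vs 2, which is the same reason the question's anchored pattern yields just one site code and one digit run.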

Re: using regex to capture a string and an array
by AnomalousMonk (Deacon) on Nov 01, 2009 at 18:38 UTC
    The following does the job with a single regex (well, almost), but is perhaps a bit too cute to live in production code:

    >perl -wMstrict -le "for my $String (@ARGV) { my ($Site_Code, @RS) = grep defined, $String =~ m{ \A (...) .{4} | \G (\d) }xmsg ; local $\" = q{' '}; print qq{site code: '$Site_Code' digits: '@RS'}; } " uk1sxve01205.gfjgjf5.fdhd5 usasxve513.gfdhf4.hgfd4
    site code: 'uk1' digits: '0' '1' '2' '0' '5'
    site code: 'usa' digits: '5' '1' '3'
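    If you remove the grep, you will see that the regex produces a rain of undefined elements: each successful match returns both capture groups, and whichever group did not take part in that match comes back as undef.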
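A short sketch that prints the raw list from AnomalousMonk's regex, before any filtering, for the second sample string. The first match fills only the \A (...) group; each later \G (\d) match fills only the digit group.

```perl
use strict;
use warnings;

my $String = 'usasxve513.gfdhf4.hgfd4';

# AnomalousMonk's regex without the grep: every match returns BOTH
# capture groups, and the one that did not participate is undef.
my @raw = $String =~ m{ \A (...) .{4} | \G (\d) }xmsg;

my @shown = map { defined $_ ? "'$_'" : 'undef' } @raw;
print "@shown\n";

# Filtering out the undefs leaves the site code followed by the digits.
my @clean = grep defined, @raw;
print "@clean\n";
```

This should show eight elements, four of them undef, and 'usa 5 1 3' after the grep.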

    I think I would prefer an approach more in line with those given in other replies:

    1. Extract the decimal digits as a single string. This allows you to be very specific about what you want.
    2. If you are interested in the individual digit characters, split the string of digits to an array to get them.
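The two steps above can be sketched as a small helper. The name parse_host is hypothetical, and the pattern (three word characters, non-digits, digits, a dot) is one assumed reading of the format; the helper returns an empty list when a string does not fit, so callers can detect bad input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (site_code, @digits), or an empty list
# when the string does not have the expected shape.
sub parse_host {
    my ($string) = @_;

    # Step 1: extract the digits as a single string, being specific
    # about the shape: 3 word characters, non-digits, digits, a dot.
    my ($site, $digits) = $string =~ /\A(\w{3})\D*(\d+)\./
        or return;

    # Step 2: split the digit string into individual characters.
    return ($site, split //, $digits);
}

my @parsed;
for my $s ('uk1sxve01205.gfjgjf5.fdhd5', 'usasxve513.gfdhf4.hgfd4',
           'how_did_this_get_here?') {
    my ($site, @digits) = parse_host($s);
    push @parsed, [ $site, @digits ];
    print defined $site ? "$site: @digits\n" : "cannot parse '$s'\n";
}
```

Keeping the regex and the split as separate steps means each can be tightened independently if the real host names turn out to have a different shape.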
Node Status
Node Type: perlquestion [id://804335]
Approved by biohisham
Front-paged by wfsp