PerlMonks  

regex problem

by Anonymous Monk
on Mar 04, 2018 at 19:31 UTC (#1210321)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The desired outcome should be :
abcd 723
abcd 724
abcde 552
abcde 554
abcde 553
abcded 756
but instead I get :
abcd 723
abcd -724
abcde 552
abcde -554-553
abcdef 756
abcdef
The code:
while ($line=<DATA>) {
    my @c=($line=~/^(\w+)\t(\d+)((?:-\d+)*)/);
    my @d=@c[1..$#c];
    foreach $e (@d) {
        print $c[0]," ", $e,"\n";
    }
}
__DATA__
abcd 723-724
abcde 552-554-553
abcdef 756

Replies are listed 'Best First'.
Re: regex problem
by tybalt89 (Priest) on Mar 04, 2018 at 19:55 UTC

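    A regex doesn't work that way: each capture group returns a single value, no matter how many times the pattern it wraps repeats.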

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1210321
    use strict;
    use warnings;

    while(<DATA>)
      {
      my ($first, $rest) = /^(\w+)\s+([-\d]+)/;
      print "$first $_\n" for $rest =~ /\d+/g;
      }

    __DATA__
    abcd 723-724
    abcde 552-554-553
    abcdef 756
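To see why the original pattern cannot hand back one value per number, it may help to compare three variants side by side. This is only an illustrative sketch: the sub names `captures_whole_tail`, `captures_last_only`, and `captures_all` are made up for this example, and the input line assumes a tab between the name and the numbers.

```perl
use strict;
use warnings;

# A capture group yields exactly one value per match, however often
# the pattern inside or around it repeats.

# The original pattern: group 3 wraps the whole repetition, so the
# entire tail comes back as a single string.
sub captures_whole_tail {
    my ($line) = @_;
    return $line =~ /^(\w+)\t(\d+)((?:-\d+)*)/;
}

# Moving the capture inside the repetition does not help: only the
# last repetition is kept.
sub captures_last_only {
    my ($line) = @_;
    return $line =~ /^(\w+)\t(\d+)(?:-(\d+))*/;
}

# A /g match in list context returns every match, one per number.
sub captures_all {
    my ($line) = @_;
    my ($name, $rest) = $line =~ /^(\w+)\s+([-\d]+)/ or return;
    return ($name, $rest =~ /\d+/g);
}

my $line = "abcde\t552-554-553";
print join('|', captures_whole_tail($line)), "\n";   # abcde|552|-554-553
print join('|', captures_last_only($line)), "\n";    # abcde|552|553
print join('|', captures_all($line)), "\n";          # abcde|552|554|553
```

Only the third variant produces a list whose length follows the data, which is why a second `/\d+/g` match over the captured tail is used.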
Re: regex problem
by johngg (Abbot) on Mar 05, 2018 at 00:43 UTC

    A solution using splits and maps to build a hash. The order of the output might be problematic if you desire something other than sorted.

    johngg@shiraz ~/perl/Monks $ perl -Mstrict -Mwarnings -E '
    open my $inFH, q{<}, \ <<EOD or die $!;
    abcd 723-724
    abcde 552-554-553
    abcdef 756
    EOD

    my %hash =
        map { $_->[ 0 ] => [ split m{-}, $_->[ 1 ] ] }
        map { [ split ] }
        <$inFH>;

    foreach my $key ( sort keys %hash )
    {
        say qq{$key $_} for @{ $hash{ $key } };
    }'
    abcd 723
    abcd 724
    abcde 552
    abcde 554
    abcde 553
    abcdef 756

    I hope this is useful.

    Update: This version dispenses with the hash so items will be output in the same order as input.

    johngg@shiraz ~/perl/Monks $ perl -Mstrict -Mwarnings -E '
    open my $inFH, q{<}, \ <<EOD or die $!;
    abcd 723-724
    abcde 552-554-553
    abcdef 756
    EOD

    say qq{@$_} for
        map {
            my $key = $_->[ 0 ];
            my @nos = split m{-}, $_->[ 1 ];
            map { [ $key => shift @nos ] } 1 .. scalar @nos;
        }
        map { [ split ] }
        <$inFH>;'
    abcd 723
    abcd 724
    abcde 552
    abcde 554
    abcde 553
    abcdef 756

    Update 2: Even simpler.

    johngg@shiraz ~/perl/Monks $ perl -Mstrict -Mwarnings -E '
    open my $inFH, q{<}, \ <<EOD or die $!;
    abcd 723-724
    abcde 552-554-553
    abcdef 756
    EOD

    say qq{@$_} for
        map {
            my( $key, $rest ) = split;
            my @nos = split m{-}, $rest;
            map { [ $key => shift @nos ] } 1 .. scalar @nos;
        }
        <$inFH>;'
    abcd 723
    abcd 724
    abcde 552
    abcde 554
    abcde 553
    abcdef 756

    Cheers,

    JohnGG

Re: regex problem
by AnomalousMonk (Chancellor) on Mar 04, 2018 at 19:58 UTC

    Try something like:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "for my $s (
       qq{abcd\t723-724}, qq{abcde\t552-554-553}, qq{abcdef\t756},
       qq{abcdef\tfoo},
       ) {
       my $parsed = my ($base, $groups) =
           $s =~ m{ \A ([[:alpha:]]+) \t (\d+ (?: - \d+)*) \z }xms;
       ;;
       die qq{bad string '$s'} unless $parsed;
       ;;
       print qq{'$s' -> };
       for my $g ($groups =~ /\d+/g) {
         print qq{ '$base' '$g'};
         }
       }
     "
    'abcd 723-724' ->
     'abcd' '723'
     'abcd' '724'
    'abcde 552-554-553' ->
     'abcde' '552'
     'abcde' '554'
     'abcde' '553'
    'abcdef 756' ->
     'abcdef' '756'
    bad string 'abcdef foo' at -e line 1.

    Update: Well, basically the same idea as tybalt89's approach, but more effort at data validation.


    Give a man a fish:  <%-{-{-{-<

Re: regex problem
by Marshall (Abbot) on Mar 05, 2018 at 19:47 UTC
    I figure that your thinking is too complicated, especially when it comes to regex and processing the input lines.
    Consider this code:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;  # skip blank lines
        $line =~ s/^\s*//;         # remove leading spaces
        $line =~ s/\s*$//;         # remove trailing space and line ending
        my ($name, @nums) = split /[\s-]+/, $line;
        foreach my $num (@nums)
        {
            print "$name\t$num\n";
        }
    }
    # PRINTS
    #abcd 723
    #abcd 724
    #abcde 552
    #abcde 554
    #abcde 553
    #abcdef 756
    __DATA__
    abcd 723-724
    abcde 552-554-553
    abcdef 756
    I don't think the statements that trim leading and trailing spaces are needed here, given your DATA. However, you should become familiar with how to do that.
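For reference, the two trimming substitutions can also be wrapped in a small helper. This is a minimal sketch; the name `trim` is simply chosen for this example and is not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without leading or
# trailing whitespace (the line ending counts as whitespace).
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;   # strip leading whitespace
    $text =~ s/\s+$//;   # strip trailing whitespace, including "\n"
    return $text;
}

print '[', trim("  abcd 723-724 \n"), "]\n";   # [abcd 723-724]
```

Because the helper works on a copy, the caller's original line is left unchanged.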

    Update: As a general rule:

    • Use Regex when you know what to keep.
    • Use Split when you know what to throw away.
    This code works the same:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;  # skip blank lines
        my ($name, @nums) = $line =~ /(\w+)/g;
        foreach my $num (@nums)
        {
            print "$name\t$num\n";
        }
    }
    # PRINTS
    #abcd 723
    #abcd 724
    #abcde 552
    #abcde 554
    #abcde 553
    #abcdef 756
    __DATA__
    abcd 723-724
    abcde 552-554-553
    abcdef 756
    I almost always have a statement to throw away blank lines. They often appear at the end of a file and are hard to see when you just type or cat the file.
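The rule of thumb above can be seen on a single line of the sample data. The following is only a sketch, with variable names chosen for illustration.

```perl
use strict;
use warnings;

my $line = "abcde 552-554-553";

# Split: describe what to throw away (whitespace and hyphens).
my @by_split = split /[\s-]+/, $line;

# Regex match: describe what to keep (runs of word characters).
my @by_match = $line =~ /(\w+)/g;

print "split: @by_split\n";   # split: abcde 552 554 553
print "match: @by_match\n";   # match: abcde 552 554 553
```

One difference worth knowing: if the line starts with whitespace, this `split` returns an empty first field, whereas the `/(\w+)/g` match simply skips the leading whitespace.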
Re: regex problem
by writch (Acolyte) on Mar 06, 2018 at 16:06 UTC
    I just split it into the two thoughts you had: grabbing the text, and then the potential numbers in the line.
    while ($line=<DATA>) {
        my @c=($line=~/^(\w+)\t/);
        my @d=$line =~ /(\d+)/gsm;
        foreach $e (@d) {
            print $c[0]," ", $e,"\n";
        }
    }
    __DATA__
    abcd 723-724
    abcde 552-554-553
    abcdef 756
Re: regex problem
by hippo (Abbot) on Mar 08, 2018 at 12:29 UTC

    Taking our anonymous brother's spec literally, here is a solution which produces the actual output he says is desired:

    use strict;
    use warnings;

    while (my $line = <DATA>) {
        my ($key, @nums) = split /[\s-]+/, $line;
        $key =~ s/f$/d/;
        print "$key $_\n" for @nums;
    }
    __DATA__
    abcd 723-724
    abcde 552-554-553
    abcdef 756
Re: regex problem SHORT SOLUTION
by python_guy (Initiate) on Mar 08, 2018 at 05:34 UTC
    HERE! If it's only the printing you worry about, I would recommend this short piece of code!
    while ($line=<DATA>) {
        my @items = split /\s+/,$line;
        $items[1] =~ s/\-/\n$items[0] /g;
        print join " ",@items,"\n";
    }
