Indexing multiplicate firstnames with regex

by stringZ (Acolyte)
on May 27, 2009 at 15:22 UTC ( [id://766449]=perlquestion )

stringZ has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks! I have a text file containing names, like

George Fortran
Jessi Heavens
Bill Clinton
Barack Obama
Bill Gates
Steve Jobs
Bill Green
George Bush

These names are just for the sake of example. I would like to mark every first name that occurs more than once with an appended, incrementing number, so this list of names would become:

George1 Fortran
Jessi Heavens
Bill1 Clinton
Barack Obama
Bill2 Gates
Steve Jobs
Bill3 Green
George2 Bush

I can do this with hashes and arrays, but I want to do it with regex substitution.

My code:

$dat = join(" ", @a); $i = 1;
$dat =~ s/([a-zA-Z]+) (.*) \1/${1}$i ${2} ${1}$i/g;
print $dat;


This is as far as I got: it does not find all occurrences and doesn't increment $i. Those are my two problems. @a is an array containing all the first names, and only them. $dat is a string in which the first names are separated by spaces.

I don't know whether there's a way to manipulate an array with substitution. It would be great if I could do this for an array containing whole records ('Jessi Heavens' would be an element instead of only the first name 'Jessi') and still do it with a regex.

Thanks in advance
stringZ

Replies are listed 'Best First'.
Re: Indexing multiplicate firstnames with regex
by ikegami (Patriarch) on May 27, 2009 at 15:58 UTC
    my %tot_count;
    ++$tot_count{ (/(\S+)/)[0] } for @full_names;

    my %cur_count;
    s/(\S+)/ $1 . ( $tot_count{$1} > 1 ? ++$cur_count{$1} : '' ) /e for @full_names;

    Update: Fixed bug.

Re: Indexing multiplicate firstnames with regex
by whakka (Hermit) on May 27, 2009 at 15:54 UTC
    This works (no regexes necessary):
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %first;
    while ( <DATA> ) {
        chomp;
        my ($fn,$ln) = split;
        push @{$first{$fn}}, $ln;
    }
    for my $fn ( keys %first ) {
        my @ln = @{$first{$fn}};
        if ( @ln == 1 ) {
            print $fn,' ',$ln[0],"\n";
        }
        else {
            my $c = 1;
            print $fn,$c++,' ',$_,"\n" for @ln;
        }
    }
    __DATA__
    George Fortran
    Jessi Heavens
    Bill Clinton
    Barack Obama
    Bill Gates
    Steve Jobs
    Bill Green
    George Bush
    This doesn't preserve the ordering, though.
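If the input order matters, one possible variation is to count the first names in a first pass and then walk the records again in their original order. The following is only a sketch: the names @records, %total, %seen and @out are made up for illustration, and it assumes every record is a first name followed by the rest of the name, separated by whitespace.

```perl
use strict;
use warnings;

# Sample records from the question (hypothetical array name).
my @records = (
    'George Fortran', 'Jessi Heavens', 'Bill Clinton', 'Barack Obama',
    'Bill Gates',     'Steve Jobs',    'Bill Green',   'George Bush',
);

# First pass: count how many times each first name occurs.
my %total;
for my $rec (@records) {
    my ($fn) = split ' ', $rec;
    $total{$fn}++;
}

# Second pass: walk the records in input order and append a running
# number only to first names that occur more than once.
my %seen;
my @out;
for my $rec (@records) {
    my ( $fn, $rest ) = split ' ', $rec, 2;
    my $suffix = $total{$fn} > 1 ? ++$seen{$fn} : '';
    push @out, "$fn$suffix $rest";
}

print "$_\n" for @out;
```

Because the second loop iterates over the array rather than over the hash keys, the output lines come out in the same order as the input.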
Re: Indexing multiplicate firstnames with regex
by moritz (Cardinal) on May 27, 2009 at 16:01 UTC
    You can't count in regexes, unless you actually call code blocks. This can be done either in the substitution part with s/.../.../e or in the regex with (?{ code }) assertions.
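As an illustration of the s/.../.../e route, here is a minimal sketch that works on one string holding whole records, one per line. It assumes the first name is the first run of non-whitespace characters on each line; the variables $str, %total and %seen are names chosen for this example only. It still needs a counting pass before the substitution, so it is not a single regex.

```perl
use strict;
use warnings;

# Whole records joined into one string, one record per line.
my $str = join '', map { "$_\n" } (
    'George Fortran', 'Jessi Heavens', 'Bill Clinton', 'Barack Obama',
    'Bill Gates',     'Steve Jobs',    'Bill Green',   'George Bush',
);

# Counting pass: /m lets ^ match at every line start, /g iterates.
my %total;
$total{$1}++ while $str =~ /^(\S+)/mg;

# Substitution pass: /e evaluates the replacement as Perl code, so a
# per-name counter can be incremented on each match.
my %seen;
$str =~ s/^(\S+)/ $total{$1} > 1 ? $1 . ++$seen{$1} : $1 /meg;

print $str;
```

The counting itself happens in ordinary Perl code inside the replacement; the regex only locates the first name on each line.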

    But all in all it seems foolish to do that kind of stuff in a single regex, because it's not what they are made for; hashes and a loop are a much better fit for this situation.

    Update: nonetheless I tried to write a regex which does the substitution, but it only works for the first name unless you use it in a loop. Here it goes:

    use strict;
    use warnings;

    my $str = join '', <DATA>;
    my $i = 0;
    1 while $str =~ s{
        ^((?>\w+))(?!\d)(.*?)
        ^\1(?!\d)
    }{"$1$2$1" . ++$i}msxeg;
    print $str;
    __DATA__
    George Fortran
    Jessi Heavens
    Bill Clinton
    Barack Obama
    Bill Gates
    Steve Jobs
    Bill Green
    George Bush

    Second update: Uhm, I didn't read it too carefully, the code doesn't do what it should; ignore it...
