Indexing multiplicate firstnames with regex

stringZ has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks! I have a text file containing names, like

George Fortran
Jessi Heavens
Bill Clinton
Barack Obama
Bill Gates
Steve Jobs
Bill Green
George Bush

These names are just examples. I would like to mark each first name that occurs more than once by appending an incrementing number, so the list would become:

George1 Fortran
Jessi Heavens
Bill1 Clinton
Barack Obama
Bill2 Gates
Steve Jobs
Bill3 Green
George2 Bush

I can do this with hashes and arrays, but I want to do it with regex substitution.

My code:

$dat = join(" ", @a); $i = 1;
$dat =~ s/([a-zA-Z]+) (.*) \1/${1}$i ${2} ${1}$i/g;
print $dat;

This is as far as I got: it does not find all occurrences and doesn't increment $i. Those are my two problems. @a is an array containing all the first names, and only them. $dat is a string in which the first names are separated by spaces.

I don't know whether an array can be manipulated with a substitution. It would be great if this also worked for an array of whole records (where 'Jessi Heavens' is an element rather than just the first name 'Jessi') and could still be done with a regex.

Thanks in advance
stringZ


Re: Indexing multiplicate firstnames with regex by ikegami (Patriarch) on May 27, 2009 at 15:58 UTC
my %tot_count;
++$tot_count{ (/(\S+)/)[0] } for @full_names;

my %cur_count;
s/(\S+)/ $1 . ( $tot_count{$1} > 1 ? ++$cur_count{$1} : '' ) /e for @full_names;

Update: Fixed bug.
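
For anyone who wants to try the two-pass idea above on the sample list, here is a minimal runnable sketch. The wrapper name `index_first_names` and the hard-coded sample array are assumptions added for illustration; the two statements inside are the ones from the reply.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread): takes whole records and
# returns copies with repeated first names numbered.
sub index_first_names {
    my @full_names = @_;    # work on a copy so the caller's list is untouched

    # Pass 1: count how often each first name (leading run of non-spaces) occurs.
    my %tot_count;
    ++$tot_count{ (/(\S+)/)[0] } for @full_names;

    # Pass 2: append a running index, but only to names counted more than once.
    my %cur_count;
    s/(\S+)/ $1 . ( $tot_count{$1} > 1 ? ++$cur_count{$1} : '' ) /e for @full_names;

    return @full_names;
}

# Sample records from the question.
my @sample = (
    'George Fortran', 'Jessi Heavens', 'Bill Clinton', 'Barack Obama',
    'Bill Gates',     'Steve Jobs',    'Bill Green',   'George Bush',
);
print "$_\n" for index_first_names(@sample);
```

Because the substitution has no `/g`, it only touches the first word of each record, which is why whole records can be passed in rather than first names alone.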
Re: Indexing multiplicate firstnames with regex by whakka (Hermit) on May 27, 2009 at 15:54 UTC
This works (no regexes necessary):

#!/usr/bin/perl
use strict;
use warnings;

my %first;
while ( <DATA> ) {
    chomp;
    my ($fn,$ln) = split;
    push @{$first{$fn}}, $ln;
}
for my $fn ( keys %first ) {
    my @ln = @{$first{$fn}};
    if ( @ln == 1 ) {
        print $fn,' ',$ln[0],"\n";
    }
    else {
        my $c = 1;
        print $fn,$c++,' ',$_,"\n" for @ln;
    }
}
__DATA__
George Fortran
Jessi Heavens
Bill Clinton
Barack Obama
Bill Gates
Steve Jobs
Bill Green
George Bush

This doesn't preserve the ordering, though.
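
The ordering is lost because the output loop walks `keys %first`, and hash keys come back in no particular order. One possible way to keep the input order, sketched here under the assumption that every record is "first name, space, rest", is to remember the records in an array and use hashes only for counting. The name `number_in_order` is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: numbers repeated first names, keeping input order.
# Assumes each record looks like "Firstname Rest of name".
sub number_in_order {
    # Split each record once into [first name, remainder].
    my @records = map { [ split ' ', $_, 2 ] } @_;

    # Total occurrences per first name.
    my %total;
    $total{ $_->[0] }++ for @records;

    # Walk the records in their original order, numbering only repeats.
    my %seen;
    return map {
        my ( $fn, $rest ) = @$_;
        my $suffix = $total{$fn} > 1 ? ++$seen{$fn} : '';
        "$fn$suffix $rest";
    } @records;
}

print "$_\n" for number_in_order(
    'George Fortran', 'Jessi Heavens', 'Bill Clinton', 'Barack Obama',
    'Bill Gates',     'Steve Jobs',    'Bill Green',   'George Bush',
);
```

The array carries the order and the hashes carry the counts, so no sorting step is needed afterwards.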
Re: Indexing multiplicate firstnames with regex by moritz (Cardinal) on May 27, 2009 at 16:01 UTC
You can't count in regexes, unless you actually call code blocks. This can be done either in the substitution part with `s/.../.../e` or in the regex with `(?{ code })` assertions. But all in all it seems foolish to do that kind of stuff in a single regex, because it's not what they are made for; hashes and a loop are a much better fit for this situation.

Update: nonetheless I tried to write a regex which does the substitution, but it only works for the first name unless you use it in a loop. Here it goes:

use strict;
use warnings;
my $str = join '', <DATA>;
my $i = 0;
1 while $str =~ s{
    ^((?>\w+))(?!\d)(.*?)
    ^\1(?!\d)
}{"$1$2$1" . ++$i}msxeg;
print $str;
__DATA__
George Fortran
Jessi Heavens
Bill Clinton
Barack Obama
Bill Gates
Steve Jobs
Bill Green
George Bush

Second update: Uhm, I didn't read it too carefully, the code doesn't do what it should; ignore it...
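
To illustrate the `s/.../.../e` route mentioned above on a single multi-line string, here is a rough sketch. It is not a repair of the regex in the update: it swaps in a two-pass approach, with a counting match loop first and one `s///mge` afterwards, and the counting itself still lives in hashes. The function name `number_in_string` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: numbers repeated first names inside one string
# that holds one "Firstname Lastname" record per line.
sub number_in_string {
    my ($str) = @_;

    # Pass 1: with /m, ^ matches at every line start; /g in scalar
    # context steps through the matches one at a time.
    my %total;
    $total{$1}++ while $str =~ /^(\w+)/mg;

    # Pass 2: /e evaluates the replacement as Perl code, which is where
    # the per-name counter gets incremented.
    my %seen;
    $str =~ s/^(\w+)/ $1 . ( $total{$1} > 1 ? ++$seen{$1} : '' ) /mge;

    return $str;
}

my $names = join '', map { "$_\n" }
    'George Fortran', 'Jessi Heavens', 'Bill Clinton', 'Barack Obama',
    'Bill Gates',     'Steve Jobs',    'Bill Green',   'George Bush';
print number_in_string($names);
```

The first pass is needed because a name's first occurrence must already be numbered when it is reached, and at that point the substitution cannot know whether a repeat follows.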

