Re: How can I make a string unique (quicker than my approach at least)
by syphilis (Archbishop) on Apr 02, 2024 at 00:20 UTC
... and then make this array unique and print the unique elements
For this part of the operation I would think that List::Util::uniqstr() is what you want.
use warnings;
use strict;
use List::Util qw(uniqstr);
my $str = 'olivia-niels-peter-lars-niels-lars-olivia-olivia';
my @s = split /\-/, $str;
print "@s\n";
my @s_new = uniqstr(@s);
print "@s_new\n";
__END__
Outputs:
olivia niels peter lars niels lars olivia olivia
olivia niels peter lars
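Applied to the whole input from the original question, a minimal sketch might look like the following. It assumes tab-separated `ID<TAB>names` lines and a List::Util recent enough to export `uniqstr`; the sub name `dedupe_line` and the inline sample data are illustrative only. `uniqstr` keeps the first occurrence of each name, so the original order survives.

```perl
use strict;
use warnings;
use List::Util qw(uniqstr);

# Hypothetical helper: dedupe the dash-separated names on one
# "ID<TAB>names" line, keeping the first occurrence of each name.
sub dedupe_line {
    my ($line) = @_;
    chomp $line;
    my ($id, $names) = split /\t/, $line, 2;
    return join "\t", $id, join '-', uniqstr split /-/, $names;
}

# Sample input from the question, written with real tab characters.
my @lines = (
    "ID1\tnick-john-helena\n",
    "ID2\tgeorge-andreas-lisa-anna-matthew-andreas-lisa\n",
    "ID3\tolivia-niels-peter-lars-niels-lars-olivia-olivia\n",
);
print dedupe_line($_), "\n" for @lines;
```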
Cheers, Rob
Re: How can I make a string unique (quicker than my approach at least)
by johngg (Canon) on Apr 02, 2024 at 10:29 UTC
This solution preserves order, if that is an issue.
johngg@aleatico:~/perl/Monks$ perl -Mstrict -Mwarnings -E 'say q{};
open my $inFH, q{<}, \ <<__EOD__ or die $!;
ID1<TAB>nick-john-helena
ID2<TAB>george-andreas-lisa-anna-matthew-andreas-lisa
ID3<TAB>olivia-niels-peter-lars-niels-lars-olivia-olivia
__EOD__
while ( <$inFH> )
{
chomp;
my( $pre, $post ) = split m{(?<=>)};
say $pre, join q{-},
do { my %seen; grep { ! $seen{ $_ } ++ } split m{\s?-\s?}, $post }
}'
ID1<TAB>nick-john-helena
ID2<TAB>george-andreas-lisa-anna-matthew
ID3<TAB>olivia-niels-peter-lars
I hope this is of interest.
Re: How can I make a string unique (quicker than my approach at least)
by LanX (Saint) on Apr 02, 2024 at 01:14 UTC
DB<3> p $a
olivia-niels-peter-lars-niels-lars-olivia-olivia
DB<4> @hash{ split '-',$a } = ()
DB<5> x keys %hash
0 'lars'
1 'niels'
2 'peter'
3 'olivia'
DB<6> p join '-', keys %hash
lars-niels-peter-olivia
DB<7>
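Outside the debugger, the same hash-slice idea can run once per input line. A minimal sketch, assuming tab-separated `ID<TAB>names` lines: the hash is declared inside the sub so names from one line cannot leak into the next, and the keys are sorted because Perl's hash order is not something to rely on. The sub name `dedupe_hash_slice` is illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: unique names of one "ID<TAB>names" line via a hash slice.
sub dedupe_hash_slice {
    my ($line) = @_;
    chomp $line;
    my ($id, $names) = split /\t/, $line, 2;
    my %hash;                          # fresh hash for every line
    @hash{ split /-/, $names } = ();   # names become keys; duplicates collapse
    return "$id\t" . join '-', sort keys %hash;
}

for my $line ("ID2\tgeorge-andreas-lisa-anna-matthew-andreas-lisa\n",
              "ID3\tolivia-niels-peter-lars-niels-lars-olivia-olivia\n") {
    print dedupe_hash_slice($line), "\n";
}
```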
Re: How can I make a string unique (quicker than my approach at least)
by Marshall (Canon) on Apr 02, 2024 at 15:38 UTC
use strict;
use warnings;
use List::Util qw(uniq);
while (<DATA>)
{
my ($id, @names) = split (/\n|\t|-/,$_);
print "$id\t",join("-",uniq(@names)),"\n"; #update:parens needed for join args
}
=Prints
ID1 nick-john-helena
ID2 george-andreas-lisa-anna-matthew
ID3 olivia-niels-peter-lars
=cut
__DATA__
ID1 nick-john-helena
ID2 george-andreas-lisa-anna-matthew-andreas-lisa
ID3 olivia-niels-peter-lars-niels-lars-olivia-olivia
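Marshall's split pattern includes `\n`, which is why no chomp is needed: the trailing newline becomes one more separator, and split discards trailing empty fields when no limit is given. A quick check, assuming tab-separated input as in the question:

```perl
use strict;
use warnings;

# The newline acts as a separator; the empty field after it is
# dropped because split removes trailing empty fields by default.
my @fields = split /\n|\t|-/, "ID1\tnick-john-helena\n";
print scalar(@fields), " fields: @fields\n";
```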
Re: How can I make a string unique (quicker than my approach at least)
by stevieb (Canon) on Apr 01, 2024 at 22:59 UTC
Quick and dirty. If the order of the names matters, this won't work. If there are duplicate ID tags, this won't work.
use warnings;
use strict;
my %seen;
while (my $line = <DATA>) {
chomp $line;
my ($id, $data) = split /\s+/, $line;
next if ! $id || ! $data;
$seen{$id}{$_}++ for split /-/, $data;
}
for my $id (sort keys %seen) {
printf(
"%s\t%s\n",
$id,
join '-', keys %{ $seen{$id} }
);
}
__DATA__
ID1 nick-john-helena
ID2 george-andreas-lisa-anna-matthew-andreas-lisa
ID3 olivia-niels-peter-lars-niels-lars-olivia-olivia
Output:
ID1 helena-nick-john
ID2 george-lisa-anna-matthew-andreas
ID3 niels-peter-lars-olivia
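If order does matter, one variation keeps an array of IDs and an array of names per ID alongside the `%seen` hash. This is a sketch under an assumption the question does not settle: that lines with a repeated ID should be merged into one output line. The sub name `merge_ordered` and the repeated `ID1` line in the sample are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: dedupe names per ID, keeping first-seen order of both
# IDs and names. Assumes lines with a repeated ID should be merged.
sub merge_ordered {
    my (@ids, %names, %seen);
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;
        my ($id, $data) = split /\s+/, $copy;
        next if !$id || !$data;
        unless ($names{$id}) {            # first time this ID appears
            push @ids, $id;
            $names{$id} = [];
        }
        for my $name (split /-/, $data) {
            push @{ $names{$id} }, $name unless $seen{$id}{$name}++;
        }
    }
    my @out;
    for my $id (@ids) {
        push @out, "$id\t" . join('-', @{ $names{$id} });
    }
    return @out;
}

my @input = (
    "ID1 nick-john-helena\n",
    "ID2 george-andreas-lisa-anna-matthew-andreas-lisa\n",
    "ID1 helena-zoe\n",   # made-up repeated ID
);
print "$_\n" for merge_ordered(@input);
```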
Re: How can I make a string unique (quicker than my approach at least)
by jdporter (Paladin) on Apr 02, 2024 at 15:43 UTC
perl -MList::Util=uniq -ple "s((\S+)$){ join '-', sort { $a cmp $b } uniq split /-/, $1 }e" < 11158635.dat
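A note on sorting a function's result: `perldoc -f sort` warns that in `sort NAME LIST` a bare sub name is taken as the comparison routine, so `sort uniq split ...` would use `uniq` as the comparator instead of sorting its output. An explicit comparison block avoids the ambiguity. A minimal sketch; `sorted_unique` is an illustrative name:

```perl
use strict;
use warnings;
use List::Util qw(uniq);

# Hypothetical helper: unique names in sorted order. The explicit
# { $a cmp $b } block stops perl from treating uniq as the comparator.
sub sorted_unique {
    my ($names) = @_;
    return join '-', sort { $a cmp $b } uniq split /-/, $names;
}

print sorted_unique('george-andreas-lisa-anna-matthew-andreas-lisa'), "\n";
```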
Re: How can I make a string unique (quicker than my approach at least) (sort)
by LanX (Saint) on Apr 03, 2024 at 14:29 UTC
While most answers have concentrated on the ambiguity of preserving order or not (which you never clarified), there is another aspect of uniqueness to consider.
Are nick-john-helena and john-helena-nick really considered two different solutions?
If not, you'll need to sort the result set.
It goes without saying that solutions preserving order are overkill in that case.
FWIW: AFAIK, my hash solution should always return the keys in the same randomized order, hence sorting wouldn't be strictly necessary, at least within the limits of the same process.
But personally I'd rather play safe and sort.
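If nick-john-helena and john-helena-nick should count as the same solution, one way to apply that is to reduce each list to a canonical key (unique names, sorted) and then dedupe on the key. A sketch under that assumption; the names `name_set_key` and `unique_name_sets` and the sample lists are illustrative.

```perl
use strict;
use warnings;
use List::Util qw(uniq);

# Hypothetical helper: canonical key for a name list, so that
# "nick-john-helena" and "john-helena-nick" map to the same string.
sub name_set_key {
    my ($names) = @_;
    return join '-', sort { $a cmp $b } uniq split /-/, $names;
}

# Hypothetical helper: keep only the first list of each distinct name set.
sub unique_name_sets {
    my %seen;
    return grep { !$seen{ name_set_key($_) }++ } @_;
}

my @lists = ('nick-john-helena', 'john-helena-nick', 'helena-anna');
print "$_\n" for unique_name_sets(@lists);
```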
Re: How can I make a string unique (quicker than my approach at least)
by harangzsolt33 (Deacon) on Apr 03, 2024 at 03:30 UTC
# Usage: LIST = RemoveDuplicates(LIST)
sub RemoveDuplicates
{
my %seen;
grep !$seen{$_}++, @_;
}
Maybe somebody who is more knowledgeable can explain how this works, because I don't understand it myself. I just know it works. I tested it.
Hello harangzsolt33,
Maybe somebody ... can explain how this works
Since there is no explicit return, the sub returns the value of its final statement, namely grep !$seen{$_}++, @_;. @_ contains the arguments passed into the sub, and grep filters out those elements that do not make the expression !$seen{$_}++ true. So let’s look at that expression in detail.
%seen is a hash, initially empty. When an element that does not yet exist is modified, as $seen{$_}++ modifies it here, that element is autovivified. So if $_ is 'x' and the hash has no 'x' key, a hash element is created with key 'x' and value undef.
Now the clever part: postfix ++ returns the item's old value and only then increments it, so the surrounding expression sees the value from before the increment. Further, incrementing undef produces the value 1, because undef is taken to be zero (perlop notes that a post-increment of undef returns 0 rather than undef). So if the current value of $_ is not already in the hash %seen, the expression !$seen{$_}++ autovivifies a hash element with key $_, takes its old value (0, which is false) and applies the logical negation operator ! to it, while the stored value becomes 1. Since the old value is false, its negation is true and the value of $_ passes through the grep filter into the eventual output of the subroutine.
But the next time $_ has that value, the hash item $seen{$_} exists and has a value of 1 (from the previous application of postfix ++). And since !1 is false, grep filters this item out. In this way, only the first occurrence of any item passes through the filter. So all repeated items are removed from the original list.
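To see the mechanism in action, here is an instrumented version of the same filter. The name `trace_uniq` is illustrative; the test inside grep is unchanged, but each step is logged with the value of `$seen{$_}` before the increment and whether grep keeps the element.

```perl
use strict;
use warnings;

# Hypothetical instrumented version of RemoveDuplicates: same filter,
# but it logs what !$seen{$_}++ sees for every element.
sub trace_uniq {
    my %seen;
    my @log;
    my @kept = grep {
        my $before = $seen{$_} // 'undef';   # value before the postfix ++
        my $keep   = !$seen{$_}++;           # the original test
        push @log, sprintf '%-6s before=%-5s keep=%s',
            $_, $before, $keep ? 'yes' : 'no';
        $keep;                               # grep keeps the element if true
    } @_;
    print "$_\n" for @log;
    return @kept;
}

my @names = trace_uniq(qw(olivia niels olivia lars niels));
print "@names\n";
```

For the second `olivia`, the log shows `before=1 keep=no`, which is exactly the step described above.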
Hope that helps,
sub uniq {
my %h; # Keep track of things seen.
grep { # 4: Return each item the first time it is seen.
not $h{$_}++ # 2: True if the item has not (yet) been seen.
# 3: ++ then records that the item was seen.
} @_; # 1: For each input...
}
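Since the original question asks for speed, one way to compare this hand-rolled sub with List::Util's `uniq` (normally implemented in C via XS) is the core Benchmark module. A rough sketch, assuming List::Util exports `uniq`; the sub name `uniq_pp` and the sample data are made up, and no timings are claimed here since they depend on the machine and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(uniq);

# Hand-rolled version from the post above, renamed to avoid
# clashing with List::Util::uniq.
sub uniq_pp {
    my %h;
    grep { not $h{$_}++ } @_;
}

# Made-up sample: 1000 names drawn from 50 distinct values.
my @names = map { 'name' . ($_ % 50) } 1 .. 1000;

# Sanity check: both versions must agree before timing them.
die "results differ\n"
    unless join(',', uniq_pp(@names)) eq join(',', uniq(@names));

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    hand_rolled => sub { my @u = uniq_pp(@names) },
    list_util   => sub { my @u = uniq(@names) },
});
```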
cat in
ID1 nick-john-helena
ID2 george-andreas-lisa-anna-matthew-andreas-lisa
ID3 olivia-niels-peter-lars-niels-lars-olivia-olivia
Code:
perl -MList::Util=uniq -ple '
s/
^ # Beginning of line (BOL).
\w+ # One or more word characters (the ID).
\s+ # Any whitespace (like tabs).
\K # "Keep" what is to the left.
(\S+) # Capture the next run of non-whitespace (the names) for replacement.
/
join "-", # 4: n1-n2-n3
uniq # 3: [ "n1", "n2", "n3" ]
split "-", # 2: [ "n1", "n2", "n1", "n3" ]
$1 # 1: n1-n2-n1-n3
/xe # Freespace regex and eval replacement.
' in
Output:
ID1 nick-john-helena
ID2 george-andreas-lisa-anna-matthew
ID3 olivia-niels-peter-lars
Agreed.
Curiously, in this case the top answer at Stack Overflow points to the identical FAQ link you pointed to ...
while the second top answer cites perldoc -q duplicate (which emits identical content)
and then further goes to the bother of embedding verbatim brian_d_foy's excellent FAQ entry in the SO response! ...
so you'd think harangzsolt33 must have seen it (or requires a new pair of glasses) ...
maybe he can comment further to clear up this mystery. :)