Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Removing certain lines from array

by Dr Manhattan (Beadle)
on Jul 30, 2013 at 08:54 UTC ( #1046977=perlquestion: print w/ replies, xml ) Need Help??
Dr Manhattan has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I have a input file with some data that I have to sort trough which looks like this(this is just some short silly example)

hello 1234 5698 7458 hi 1457 7459 6214 good_day 1458 hi 1258 3658 good_morning 4758 hi 1453

I need to remove every element where the first word(first part of array element) appears twice or more in the array, so in this example the three elements starting with 'hi...' needs to be removed, even though the entire element is not identical.

I tried this but it is not working. Any ideas or easier ways to do this?

my $length = $#array; my $woord1; my $woord2; for (my $q = 0; $q <= $length; $q++) { if ($array[$q] =~ /^(\w+|\_)(\s)(\d{4})/) { $woord1 = "$1"; for (my $x = $length-1; $x == 0; $x--) { if ($array[$x] =~ /^(\w+|\_)(\s)(\d{4})/) { $woord2 = "$1"; if ("$woord1" eq "$woord2") { print "$woord1\t$woord2\n are the same"; delete $array[$x]; } } } } }

Thanks in advance for any help

Comment on Removing certain lines from array
Select or Download Code
Re: Removing certain lines from array
by rjt (Deacon) on Jul 30, 2013 at 09:16 UTC

    Ouch. This should be a bit faster. The trick is to do a quick first pass through and count the number of occurrences of each (1st) word, and then go through a 2nd time and output only those lines where $words{$1} == 1.

    #!/usr/bin/env perl use 5.012; use warnings FATAL => 'all'; my %words; # Number of times word appears first in a string my @lines = <DATA>; chomp @lines; $words{$_}++ for map { /^(\S+)/ ? $1 : () } @lines; say for grep { /^(\S+)/ ? $words{$1} == 1 : 1 } @lines; __DATA__ hello 1234 5698 7458 hi 1457 7459 6214 good_day 1458 hi 1258 3658 good_morning 4758 hi 1453 oneword hi

    Output:

    hello 1234 5698 7458 good_day 1458 good_morning 4758 oneword
Re: Removing certain lines from array
by arkturuz (Curate) on Jul 30, 2013 at 09:17 UTC
    How about this approach:
    my @array = ( "hello 1234 5698 7458", "hi 1457 7459 6214", "good_day 1458", "hi 1258 3658", "good_morning 4758", "hi 1453", ); # count all occurences my %freqs; for my $elem (@array) { my ($word) = $elem =~ /^(\w+)/; $freqs{$word}++; } # remove all multifrequent words my @new_array; for my $elem (@array) { my ($word) = $elem =~ /^(\w+)/; unless ($freqs{$word} > 1) { push @new_array, $elem; } }
    edit: corrected - element needs to be removed
Re: Removing certain lines from array
by VGavara (Novice) on Jul 30, 2013 at 11:53 UTC

    If original order doesn't matter try this approach. Broadly, it indexes the array items by using as key the first word. If an key becomes duplicated the item is removed. In detail it uses a hash where keys are the first word of every line in the array. If it tries to add a hash item that does exist (ie, a duplicated key) it sets the hash value to empty string. Finally it filters the hash values discarding those being an empty string.

    my @array = ( "hello 1234 5698 7458", "hi 1457 7459 6214", "good_day 1458", "hi 1258 3658", "good_morning 4758", "hi 1453", ); my %indexedArray; my $key; foreach (@array) { $key = (split())[0]; if ( defined $indexedArray{$key} ) { $indexedArray{$key} = ''; } else { $indexedArray{$key} = $_; } } @array = grep(/.+/,values(%indexedArray));
      You should turn warnings on and rerun this script.

      Iterating over the array in reverse will avoid the problem of the loop indices needing adjustment after an element is deleted.

      Update: You changed the script while I was replying. Your description doesn't match now, although I suspect it will in a moment. ;o)

        Sorry Loops, I made some changes before reading your comment. In fact the original script didn't work and I think your comment is not suitable for the new one. Sorry again, newbie behind the wheel :/

Re: Removing certain lines from array
by VGavara (Novice) on Jul 30, 2013 at 13:52 UTC

    Anyway, I think your code doesn't work because of the initial value of $x in the second loop: Instead of $length-1 it might be $q-1.

    That second loop has another bug: the exit condition ($x == 0) is wrong, it might be $x >= 0 in order to process all the lines starting from the previous being processed and ending when reaching the beginning of the file (array).

    Finally the length of the array changes every time you delete an item, but your variable $length doesn't refresh its value.

    The best way to do what you try to do in the way you try to do is processing the array backwards as the attached code shows. I've not tested it I think it reflects "the concept"

    my $woord1; my $woord2; my $q = $@array-2; while ( $q >= 0 ) { if ($array[$q] =~ /^(\w+|\_)(\s)(\d{4})/) { $woord1 = $1; $x = $q+1; while ($x < $@array) { if ($array[$x] =~ /^(\w+|\_)(\s)(\d{4})/) { $woord2 = $1; if ($woord1 eq $woord2) { print "$woord1($q)\t$woord2($x)\n are the same"; delete $array[$x]; last; } } else { $x++; } } } $q--; }
Re: Removing certain lines from array
by johngg (Abbot) on Jul 30, 2013 at 16:34 UTC

    Similar to other solutions but storing each line along with a reference to the relevant word count hash element in an AoA then using grep and map to print only the lines we want. Note that I have to dereference ( ${ $_->[ 1 ] } ) to get at the value in the hash.

    $ perl -Mstrict -Mwarnings -E ' open my $inFH, q{<}, \ <<EOD or die $!; hello 1234 5698 7458 hi 1457 7459 6214 good_day 1458 hi 1258 3658 good_morning 4758 hi 1453 EOD my @lines; my %counts; while ( <$inFH> ) { my $first = ( split )[ 0 ]; $counts{ $first } ++; push @lines, [ $_, \ $counts{ $first } ]; } print for map { $_->[ 0 ] } grep { ${ $_->[ 1 ] } == 1 } @lines;' hello 1234 5698 7458 good_day 1458 good_morning 4758 $

    I hope this is of interest.

    Cheers,

    JohnGG

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1046977]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2014-09-15 10:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (146 votes), past polls