PerlMonks  

a one-liner for this trivial problem?

by Anonymous Monk
on Apr 16, 2013 at 14:42 UTC ( #1028930=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
so, because I am no expert in Perl, I was wondering what is (since I am sure there is) an efficient one-liner script to do the following:
If you have a file with, let's say, IDs (1000 of them) and another, smaller one with 20 or 30 of them that also appear in the big one, what you want to do is remove these 20 from the big file, so that in the end the big file has 980 IDs instead of 1000.
My poor knowledge of Perl would create a proper script, where I would open the big file, store the IDS in an array or hash, then open the smaller one, and, for each of the IDS in the smaller file, if they existed in the array or hash that had been created based on the big file, I would erase them from there.
Any easy-to-use one-liner for this? I looked around for AWK and I found this:
Example:
Remove the line containing the string "awk": sed '/awk/d' filename.txt
but I don't know if this can be useful...
Thanks

Re: a one-liner for this trivial problem?
by hdb (Parson) on Apr 16, 2013 at 14:46 UTC

    You do it the other way round: first you read the smaller file into a hash. Then you open the bigger file, check for each ID if it exists in the hash and only print if it does not.
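As a rough illustration of the approach just described (small file into a hash first, then filter the big file), here is a minimal sketch using awk rather than Perl. The file names and sample IDs are made up for the example, and it assumes one ID per line and a non-empty small file.

```shell
# Hypothetical sample data: one ID per line (file names are invented for this sketch).
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 > "$tmp/big.txt"
printf '%s\n' 2 5 6 8 > "$tmp/small.txt"

# NR == FNR holds only while awk is reading the first file (small.txt):
# remember each ID as a key of the "seen" array and move on to the next line.
# For the second file (big.txt), print only the IDs that were never seen.
result=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$tmp/small.txt" "$tmp/big.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The output goes to standard output; to change the big file itself you would redirect to a new file and rename it afterwards.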



      Yeah, true, I was just hoping there could be a one-liner solution to this, and not need to write a "normal" script...

        You can do that in one line. The easiest way is to use a really long line and to fill that line by removing all line breaks from the script.

        Maybe you can show us the code you've already written to solve the problem. Then we can help you with the technical hurdles that might prevent that code from working as a one-liner.

Re: a one-liner for this trivial problem?
by CountOrlok (Friar) on Apr 16, 2013 at 14:52 UTC
    grep can do this for you: grep -v -f small_file big_file
      Oh yes! Great tip!
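One caveat that may matter for IDs: with plain -f, grep treats every line of the small file as a regular expression that can match anywhere in a line, so an ID such as 1 would also remove 10 and 11. A hedged sketch of the difference, using invented sample data and the standard -F (fixed strings) and -x (whole-line match) options:

```shell
# Hypothetical sample data: one ID per line.
tmp=$(mktemp -d)
printf '%s\n' 1 10 11 2 21 3 > "$tmp/big_file"
printf '%s\n' 1 2 > "$tmp/small_file"

# Patterns are regular expressions matched anywhere in the line,
# so "1" and "2" also knock out 10, 11 and 21.
loose=$(grep -v -f "$tmp/small_file" "$tmp/big_file")

# -F takes the patterns as fixed strings, -x requires the whole line to match,
# so only the exact IDs 1 and 2 are removed.
strict=$(grep -v -x -F -f "$tmp/small_file" "$tmp/big_file")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

If the IDs are all the same length and contain no regex metacharacters, the plain form behaves the same; otherwise the -x -F variant is probably the safer choice.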
Re: a one-liner for this trivial problem?
by space_monk (Chaplain) on Apr 16, 2013 at 17:09 UTC
    This solution is just if you really insist on using Perl; the grep suggestion above is better....
    perl -ne ' BEGIN { open $fh, "<" , "littlefile" or die "Cannot read file"; our %hash = map { $_ => 1} <$fh>; } print if not $hash{$_}; ' < bigfile

    All the above will go on one line. Any minor bugs are your problem (chomps may be needed, for example) :-)

    A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: a one-liner for this trivial problem?
by kcott (Abbot) on Apr 16, 2013 at 17:37 UTC

    Something like this perhaps:

    $ cat 1028930_short
    2
    5
    6
    8
    $ cat 1028930_long
    1
    2
    3
    4
    5
    6
    7
    8
    9
    $ perl -i -nE 'chomp; if ($s) { $d{$_} or say } else { ++$d{$_}; ++$s if eof; say }' 1028930_short 1028930_long
    $ cat 1028930_short
    2
    5
    6
    8
    $ cat 1028930_long
    1
    3
    4
    7
    9

    -- Ken

Re: a one-liner for this trivial problem?
by NetWallah (Abbot) on Apr 16, 2013 at 23:35 UTC
    How about this:
    perl -ne "$small||=$ARGV; $small eq $ARGV? $h{$_}++: $h{$_}?0:print" small.txt large.txt
    This takes advantage of the fact that $ARGV contains the file name.

    Use single-quotes if you run in *nix.
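For comparison, awk exposes the current file name in much the same way through its FILENAME variable. The following is only a sketch with invented file names; it is an awk analogue of the idea, not the Perl one-liner itself. Comparing against the first argument (ARGV[1] in awk) also copes with an empty small file, which the example deliberately uses.

```shell
# Hypothetical sample data; small.txt is deliberately empty.
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 > "$tmp/large.txt"
: > "$tmp/small.txt"

# While FILENAME is the first argument, collect IDs; afterwards print
# only the lines that were not collected. With an empty small file
# nothing is collected, so every line of large.txt is printed.
result=$(awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$tmp/small.txt" "$tmp/large.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```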

                 "I'm fairly sure if they took porn off the Internet, there'd only be one website left, and it'd be called 'Bring Back the Porn!'"
            -- Dr. Cox, Scrubs

      Good idea, but removing the hard coding of file names perhaps gives....

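      perl -ne " @ARGV ? $h{$_}++ : $h{$_} ? 0 : print; " small.txt large.txt
      Here @ARGV is still non-empty while the first file (small.txt) is being read, because the name of the file currently open has already been shifted off @ARGV into $ARGV.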
Re: a one-liner for this trivial problem?
by jakeease (Friar) on Apr 17, 2013 at 08:32 UTC

    Here's another way; I'm going to grab some sleep instead of turning it into a one-liner. :-)

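    #!/usr/bin/perl -w
    use strict;

    # Return a file's lines in list context, or its whole contents in scalar context.
    sub slurp {
        my $file = shift;
        open my $fh, '<', $file or die "Cannot open $file: $!";
        local $/ unless wantarray;
        return <$fh>;
    }

    # chomp so that the IDs compare and print cleanly
    chomp(my @long  = slurp 'long');
    chomp(my @short = slurp 'short');

    my %diff;
    @diff{@long} = ();       # every ID from the long file becomes a key
    delete @diff{@short};    # drop the IDs listed in the short file
    my @diff = sort keys %diff;
    print "@diff\n";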
Re: a one-liner for this trivial problem?
by PrakashK (Pilgrim) on Apr 19, 2013 at 19:11 UTC
    perl -ne 'BEGIN {open my $sh, "<", shift; %h = map {$_ => 1} <$sh>} print unless $h{$_}' small large

    Read the small file in the BEGIN block and shift it out of @ARGV.

    A minor plus of this approach is that any number of "large" files can be passed, after the "small" file.

    If you have File::Slurp available:

    perl -MFile::Slurp=slurp -ne 'BEGIN {%h = map {$_ => 1} slurp shift} print unless $h{$_}' small large
      Exclude the lines matching smallfile from largefile:
      grep -i -v -f smallfile largefile
      Keep only the lines of largefile that match smallfile:
      grep -i -f smallfile largefile
      Keep only the lines of largefile that match smallfile, sorted in reverse numeric order by column 2:
      grep -i -f smallfile largefile | sort -r -b -g -k 2
