PerlMonks  

Marsel:

Another way you might be able to do the job is with a file merge. To do so, sort both files on the key(s) of interest, then read records in order and merge them as appropriate.

Example:

#!/usr/bin/perl -w
use strict;
use warnings;

open F1, 'sort -k3 mergefile.1|' or die "opening file 1";
open F2, 'sort -k2 mergefile.2|' or die "opening file 2";
open OUF, '>', 'mergefile.out' or die "opening output file";

my @in1;
my @in2;

sub getrec1 {
    @in1 = ();
    if (!eof(F1)) {
        (@in1) = split /\t/, <F1>;
        chomp $in1[2];
    }
}

sub getrec2 {
    @in2 = ();
    if (!eof(F2)) {
        (@in2) = split /\t/, <F2>;
        chomp $in2[2];
    }
}

sub write1 {
    print OUF "$in1[2]\t$in1[0]\t$in1[1]\tnull\tnull\n";
    getrec1;
}

sub write2 {
    print OUF "$in2[1]\tnull\tnull\t$in2[0]\t$in2[2]\n";
    getrec2;
}

sub writeboth {
    print OUF "$in1[2]\t$in1[0]\t$in1[1]\t$in2[0]\t$in2[2]\n";
    getrec1;
    getrec2;
}

# Prime the pump
getrec1;
getrec2;

while (1) {
    last if $#in1<0 and $#in2<0;
    if ($#in1<0 or $#in2<0) {
        # Only one file is left...
        write2 if $#in1<0;
        write1 if $#in2<0;
    }
    elsif ($in1[2] eq $in2[1]) {
        # Matching records, merge & write 'em
        writeboth;
    }
    elsif ($in1[2] lt $in2[1]) {
        # unmatched item in file 1, write it & get next rec
        write1;
    }
    else {
        # unmatched item in file 2, write it & get next rec
        write2;
    }
}

Example output:

root@swill ~/PerlMonks $ cat mergefile.1
15	20	foo
22	30	bar
30	33	baz
14	22	fubar
root@swill ~/PerlMonks $ cat mergefile.2
alpha	baz	17.30
gamma	foobar	22.35
gamma	bar	19.01
delta	fromish	33.03
sigma	bear	14.56
root@swill ~/PerlMonks $ ./file_merge.pl
root@swill ~/PerlMonks $ cat mergefile.out
bar	22	30	gamma	19.01
baz	30	33	alpha	17.30
bear	null	null	sigma	14.56
foo	15	20	null	null
foobar	null	null	gamma	22.35
fromish	null	null	delta	33.03
fubar	14	22	null	null
root@swill ~/PerlMonks $
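
The comparison step at the heart of the merge can also be looked at on its own, without the file handling. What follows is only a minimal sketch, not part of the script above: `merge_sorted` is a hypothetical helper, and it assumes each input is an in-memory list of `[key, value]` pairs that is already sorted by key in plain string order, with no duplicate keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an illustration, not from the script above):
# merge two lists of [key, value] pairs, each already sorted by key
# in string order and free of duplicate keys.
sub merge_sorted {
    my ($left, $right) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$left or $j < @$right) {
        # An exhausted list compares as "greater", so the other one drains.
        my $cmp = $i >= @$left  ?  1
                : $j >= @$right ? -1
                : $left->[$i][0] cmp $right->[$j][0];
        if ($cmp == 0) {        # key present in both lists
            push @out, [ $left->[$i][0], $left->[$i][1], $right->[$j][1] ];
            $i++; $j++;
        }
        elsif ($cmp < 0) {      # key present only in the left list
            push @out, [ $left->[$i][0], $left->[$i][1], 'null' ];
            $i++;
        }
        else {                  # key present only in the right list
            push @out, [ $right->[$j][0], 'null', $right->[$j][1] ];
            $j++;
        }
    }
    return @out;
}

my @merged = merge_sorted(
    [ [ bar => 22 ],    [ baz => 30 ],     [ foo => 15 ] ],
    [ [ bar => 19.01 ], [ bear => 14.56 ] ],
);
print join("\t", @$_), "\n" for @merged;
```

Whatever form the merge takes, the comparison has to agree with the order the inputs were sorted in. Perl's `cmp` and `lt` compare strings by character code unless `use locale` is in effect, while an external `sort` follows the current locale, so running the sort under `LC_ALL=C` may be needed for keys where the two orderings differ.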
--Roboticus

In reply to Re: How to deal with Huge data by roboticus
in thread How to deal with Huge data by Marsel
