PerlMonks  

selecting corresponding lines in multiple files

by Anonymous Monk
on May 31, 2024 at 15:44 UTC ( [id://11159735]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have two files, a and b. Each has the same number of records (over 10,00,000 actually), and each line in either file complements the data on the same line in the other file. I want to do some pattern matching on one of the files (say a), and output the matching lines to a third file, while at the same time writing the corresponding lines in file b to a fourth file. So both output files should have the same number of records with the data relationship for each line in both files maintained.

I imagine some trivial way exists to do this, but I haven't thought of it yet.

Replies are listed 'Best First'.
Re: selecting corresponding lines in multiple files
by choroba (Cardinal) on May 31, 2024 at 16:06 UTC
    To demonstrate the process, I prepared two files, a and b (it would have been kind of you to prepare them for us).

    File a contains abbreviations plus their type (day or month), tab separated:

    Sun	d
    Mon	d
    Tue	d
    Wed	d
    Thu	d
    Fri	d
    Sat	d
    Jan	m
    Feb	m
    Mar	m
    Apr	m
    May	m
    Jun	m
    Jul	m
    Aug	m
    Sep	m
    Oct	m
    Nov	m
    Dec	m

    File b contains the full names:

    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    January
    February
    March
    April
    May
    June
    July
    August
    September
    October
    November
    December

    Perl makes it possible to keep several files open at the same time, some for reading and some for writing. Let's output only the months to the two output files. Note the checks that both input files end at the same line, i.e. that neither of them is shorter than the other.

    #!/usr/bin/perl
    use warnings;
    use strict;

    # True if the line from file a describes a month (type column is "m").
    sub is_month {
        my ($line) = @_;
        return $line =~ /\tm$/ ? 1 : 0
    }

    open my $in_a,  '<', 'a'     or die "a: $!";
    open my $in_b,  '<', 'b'     or die "b: $!";
    open my $out_a, '>', 'a.out' or die "a.out: $!";
    open my $out_b, '>', 'b.out' or die "b.out: $!";

    # Read both inputs in lockstep, one line from each per iteration.
    while (my $line_a = <$in_a>) {
        my $line_b = <$in_b>;
        die "File b shorter!\n" unless defined $line_b;

        if (is_month($line_a)) {
            print {$out_a} $line_a;
            print {$out_b} $line_b;
        }
    }
    close $out_a;
    close $out_b;

    # If b still has unread lines, a was the shorter file.
    die "File a shorter!\n" unless eof $in_b;

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: selecting corresponding lines in multiple files
by hippo (Archbishop) on May 31, 2024 at 16:07 UTC

    The trivial way is to open A and B for reading, read one line from each, perform your match on the A line and if it matches, do work on both lines.

    A markedly less trivial way would be tying the input files to arrays, looping over the A array, and only when the match succeeds doing some work on the nth element of the B array.
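A minimal sketch of the tied-array idea, assuming the core Tie::File module is what is meant here (the reply does not name a module). The helper name `select_pairs` and its argument order are made up for illustration. Tie::File strips the line ending from each record by default, so the newline is added back on output.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper (not from the thread): copy every line of $file_a
# that matches $regex, plus the line with the same index in $file_b,
# to the two named output files. Returns the number of pairs written.
sub select_pairs {
    my ($regex, $file_a, $file_b, $out_a_name, $out_b_name) = @_;

    # O_RDONLY keeps Tie::File from creating or modifying the inputs.
    tie my @lines_a, 'Tie::File', $file_a, mode => O_RDONLY
        or die "$file_a: $!";
    tie my @lines_b, 'Tie::File', $file_b, mode => O_RDONLY
        or die "$file_b: $!";

    # Counting the elements forces a scan of each file to its end.
    die "Files differ in length!\n" unless @lines_a == @lines_b;

    open my $out_a, '>', $out_a_name or die "$out_a_name: $!";
    open my $out_b, '>', $out_b_name or die "$out_b_name: $!";

    my $count = 0;
    for my $i (0 .. $#lines_a) {
        next unless $lines_a[$i] =~ $regex;
        # Records come back without their line ending, so restore it.
        print {$out_a} "$lines_a[$i]\n";
        print {$out_b} "$lines_b[$i]\n";
        ++$count;
    }

    close $out_a or die "$out_a_name: $!";
    close $out_b or die "$out_b_name: $!";
    untie @lines_a;
    untie @lines_b;
    return $count;
}
```

With the sample files from the first reply, `select_pairs(qr/\tm$/, 'a', 'b', 'a.out', 'b.out')` would select the months. For a purely sequential pass like this one, the tied arrays are likely slower than reading both handles in lockstep; they pay off mainly when random access by line number is needed.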

    You could also paste the two input files as a preprocessing step and then use that one file as input to your script.
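A rough sketch of the preprocessing route, assuming the combined file was produced with something like `paste a b > ab` (the file name `ab` and both helper names are invented for this example). Since lines in file a may contain tabs themselves, the sketch splits on the last tab, which only works on the assumption that lines in file b contain no tab.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: split one line of `paste a b` output back into
# its a-part and b-part. Assumes the b-part contains no tab, so the
# LAST tab on the line is the one paste inserted.
sub split_pasted {
    my ($line) = @_;
    chomp $line;
    my ($part_a, $part_b) = $line =~ /^(.*)\t([^\t]*)$/
        or die "No tab in line: $line\n";
    return ($part_a, $part_b);
}

# Hypothetical helper: read pasted lines from $in and, when the a-part
# matches $regex, write the two parts to $out_a and $out_b.
# Returns the number of pairs written.
sub filter_pasted {
    my ($regex, $in, $out_a, $out_b) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        my ($part_a, $part_b) = split_pasted($line);
        next unless $part_a =~ $regex;
        print {$out_a} "$part_a\n";
        print {$out_b} "$part_b\n";
        ++$count;
    }
    return $count;
}
```

After opening `ab` for reading and the two output files for writing, a call such as `filter_pasted(qr/\tm$/, $in, $out_a, $out_b)` would do the selection. Note that paste pads the shorter input with empty fields, so a length mismatch between a and b has to be checked separately.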

    However, the best way IMHO is to bite the bullet and use a database instead as this is the sort of thing at which they perform really well.


    🦛

Node Type: perlquestion [id://11159735]
Approved by choroba
Front-paged by Corion