Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re: Processing files column-wise

by BrowserUk (Patriarch)
on Feb 22, 2012 at 22:49 UTC ( [id://955637]=note: print w/replies, xml ) Need Help??


in reply to Processing files column-wise

I've also looked at Tie::Handle::CSV and Text::CSV, but these modules seem to only process a file line-wise, not column-wise, which considering the size of my files would be quite inefficient and complex (once the column header is read, this is all the information necessary to determine where to copy the entire column to).

There is no mechanism for reading a column from a file without reading the file line by line. That's just the way files work.

But, line-by-line processing of files is perfectly efficient. Provided that you do not have to re-process each line for each column. That means placing all the fields from the first line into their respective files, before reading and processing the second line.

This makes a lot of assumptions about the formatting of your ids and data files, but may serve to illustrate the technique even if you need to use one of the bastardised csv format processors.

Update: reversing the %ids array -- ie. using the filenos as the keys and pushing the field nos to an array as the value would save having to grep the hash 4 times for every record.

This is untested beyond basic syntax checking:

#! perl -slw use strict; use Data::Dump qw[ pp ]; ## Assumes that the IDs file consists of space separated lines ## zero-based-column-no-in-data-file zero-based-fileno-destination open IDS, '<', 'ids.map' or die $!; my %ids = map{ my( $columnNo, $fileNo ) = split; $columnNo -= 10; ## adjust column numbers ( $columnNo, $fileNo ); }<IDS>; close IDS; chomp %ids; ## for each data filenmae supplied on the command line for my $filename ( @ARGV ) { ## open that file for input open IN, '<',.$filename or die $!; ## open 4 output files named as $filename.out.n my @outs; open $outs[ $_ ], '>', "$filename.out.$_" for 0 .. 3; ## read the data file line by line while( <IN> ) { ## split the line into fields -- assumes sane csv definition my @fields = split '\s*,\s*', $_; ## print the first 10 fields to each of the 4 files ## and remove them from the @fields array printf { $outs[ $_ ] } "%s, ", join ', ', splice @fields[ 0 .. 9 ] for 0 .. 3; splice @fields, 0, 9; ## for each of the output files for my $fileNo ( 0 .. 3 ) { ## print those fields ... print { $outs[ $fileNo ] } join ', ', @fields[ ## that are mapped to this file grep{ $ids{ $_ } == $fileNo } 0 .. $#fields ]; } } ## cleanup close $outs[ $_ ] for 0 .. 3; close IN; }

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://955637]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (2)
As of 2026-03-09 03:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    hippoepoptai's answer Re: how do I set a cookie and redirect was blessed by hippo!
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.