PerlMonks  

Easy matrix builder

by hj4jc (Beadle)
on Jan 18, 2006 at 17:07 UTC ( [id://524011]=CUFP )

I created 200 text files. Each file contained 30 thousand rows and 1 column. I wanted the quickest, easiest way to build a matrix with 30 thousand rows and 200 columns. I am sure many of you here can do this in less than 10 minutes. If you are one of them, I would be grateful if you could share other ways to do this or modules I could use. I also hope somebody out there finds this code easy and useful.
#!/usr/bin/perl
# This script takes multiple files that contain the same number of rows
# and builds a matrix whose number of columns equals the number of files.
use warnings;
use strict;

# Collect the names of all .txt files in the current directory in @files.
# The output file is skipped so that a rerun does not read its own output.
my $out_name = "matrix.txt";
opendir(my $dir, ".") || die "Cannot open the current directory: $!\n";
my @files = grep { /\.txt$/ && $_ ne $out_name } readdir($dir);
closedir($dir);

my @matrix = ();
my $n = scalar(@files);
print "This directory contains $n .txt files\n";

# Open the output file, which will contain all the ratios.
open(my $out, ">", $out_name) || die "Cannot write $out_name: $!\n";

# Go through each file in the @files array.
for my $i (0 .. $#files) {
    print "Working on ... $files[$i].\n";
    open(my $fh, "<", $files[$i]) || die "Cannot read $files[$i]: $!\n";
    my $j = 0;
    while (my $line = <$fh>) {
        chomp $line;
        # The split matters only if the file contains more than one
        # tab-delimited column; only the first column is used.
        my @line = split("\t", $line);
        # @matrix is an array of arrays: the jth array in @matrix
        # contains the records from the different files for the jth line.
        push @{ $matrix[$j] }, $line[0];
        $j++;
    }
    close($fh);
}

for my $tmp (@matrix) {
    # $records joins the elements of one row by tab.
    my $records = join "\t", @$tmp;
    print $out "$records\n";
}
close($out);
exit;
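The core of the script is the push onto @{$matrix[$j]}, which turns a series of columns into rows. A minimal sketch of just that step, using small in-memory arrays in place of the files (the data and the build_matrix name are made up for illustration, not part of the original script):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the contents of three one-column files.
my @columns = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ],
);

# Append the jth value of every column to row j. Perl creates the
# inner array $matrix[$j] on the first push (autovivification).
sub build_matrix {
    my @cols = @_;
    my @matrix;
    for my $col (@cols) {
        my $j = 0;
        for my $value (@$col) {
            push @{ $matrix[$j] }, $value;
            $j++;
        }
    }
    return @matrix;
}

# Print one tab-separated row per line.
print join( "\t", @$_ ), "\n" for build_matrix(@columns);
```

Each input column ends up as one column of the printed matrix, in the order the columns were supplied.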

Replies are listed 'Best First'.
Re: Easy matrix builder
by jdporter (Paladin) on Jan 18, 2006 at 17:32 UTC

    I think you've done a fine job. However, there are numerous things in your code that could be simpler. For starters, I would leave the determination of the input and output files to the shell. I.e. use @ARGV as the file list, and redirect output. Those are things the shell is better at.

    I'd also be wary of any solution that reads all that data into memory, even if in small chunks.

    Here's one way to do it that happens to make use of a standard module, Tie::File, the doco of which says:

    The file is not loaded into memory, so this will work even for gigantic files.

    use Tie::File;
    use strict;
    use warnings;

    my @files_as_arrays = map { my @a; tie @a, 'Tie::File', $_; \@a } @ARGV;

    {
        local( $,, $\ ) = ( "\t", "\n" );
        my $i = 0;
        while (1) {
            my @a = map { $_->[$i] } @files_as_arrays;
            $i++;
            grep { defined $_ } @a or last;
            no warnings;
            print @a
        }
    }

    Of course, an array of filehandles would work. It's a little bit simpler, too:

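    use IO::File;
    use strict;
    use warnings;

    my @filehandles = map {
        my $fh = IO::File->new( $_, 'r' ) or die "Cannot open $_: $!\n";
        $fh;
    } @ARGV;

    while (1) {
        my @a = map { $_->getline } @filehandles;
        grep { defined $_ } @a or last;
        no warnings;    # files that have run out yield undef
        chomp @a;       # getline keeps the trailing newline
        print join( "\t", @a ), "\n";
    }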
    We're building the house of the future together.
Re: Easy matrix builder
by jdporter (Paladin) on Jan 18, 2006 at 17:14 UTC
    paste *.txt > matrix.txt

    (See paste in the Perl Power Tools project for an OS-neutral implementation.)


      This is also available under *nix (which is where the idea for many of the ppt programs came from IIRC). However, watch for the number of open files. 200 may be approaching the per user, or per process limit on some machines / OS.

      If paste does what I think it does, it will open all of the files and then pull a line from each in a loop.

      --MidLifeXis
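If the open-file limit does turn out to be a problem, one possible workaround is to paste the files in batches and then paste the batch results. The sketch below assumes every input has the same number of rows, as in the original post. The names paste_files and paste_in_batches and the batch size of 50 are hypothetical choices for illustration, not taken from paste or ppt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Paste the named files side by side into $out_fh, one row per line.
# All of the named files are open at the same time.
sub paste_files {
    my ( $out_fh, @names ) = @_;
    my @fhs = map {
        open( my $fh, '<', $_ ) or die "Cannot read $_: $!\n";
        $fh;
    } @names;
    while (1) {
        my @row = map { scalar readline($_) } @fhs;
        last unless grep { defined } @row;
        # A file that has run out contributes an empty field.
        for (@row) { $_ = '' unless defined; chomp; }
        print {$out_fh} join( "\t", @row ), "\n";
    }
    close($_) for @fhs;
}

# Paste at most $batch files at a time into temporary files, then
# paste the temporary files. Assumes the number of batches is itself
# no larger than the open-file limit.
sub paste_in_batches {
    my ( $out_fh, $batch, @names ) = @_;
    my @parts;
    while ( my @chunk = splice( @names, 0, $batch ) ) {
        my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
        paste_files( $tmp_fh, @chunk );
        close($tmp_fh) or die "Cannot close $tmp_name: $!\n";
        push @parts, $tmp_name;
    }
    paste_files( $out_fh, @parts );
}

# Usage (hypothetical script name): perl paste_batches.pl *.txt > matrix.out
paste_in_batches( \*STDOUT, 50, @ARGV ) if @ARGV;
```

With 200 inputs and a batch size of 50, no more than 50 files are open at once, at the cost of writing the data to disk one extra time. If the inputs had different lengths, the batch outputs would lose column alignment, so this is only a sketch for the equal-length case.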

        Indeed, he most probably meant the *NIX tool, and pointed to ppt as a Perl implementation to look at for inspiration.
