<?xml version="1.0" encoding="windows-1252"?>
<node id="101196" title="cols2lines.pl" created="2001-07-31 15:41:27" updated="2005-08-13 06:47:22">
<type id="1748">
sourcecode</type>
<author id="40469">
tfrayner</author>
<data>
<field name="doctext">
&lt;CODE&gt;
#! /usr/bin/perl -w

# To Do: more error checking, get --delimiter to work for \t. 

# cols2lines.pl takes an input file with fields separated 
# by the chosen delimiter (default is the tab stop) and 
# switches the columns into rows (and vice versa, 
# obviously). The script can be called with the input file, 
# or both input and output files as its arguments. If one or 
# both are omitted from the command line the user is prompted 
# for their location. The delimiter can be defined using the 
# -d or --delimiter switch. The delimiter may be double-
# quoted if necessary. In non-interactive mode the output 
# file will not overwrite an existing file unless the -f or 
# --force switch is used. Use the -h or --help
# switch to get a usage summary.

use strict;
use Getopt::Long;

# DECLARE GLOBALS
my $delimiter = "\t"; # default delimiter is tab stop 
my $infile = "";
my $outfile = "";

# PARSE ARGUMENTS
sub parseargs{
    my $force;
    my $helptext;
    &amp;GetOptions("d|delimiter=s" =&gt; \$delimiter, 
                "f|force"       =&gt; \$force, 
                "h|help"        =&gt; \$helptext);
    if ($helptext){
        die ("Usage: cols2lines [-h] [-f] [-d &lt;delimiter&gt;]".
                        " [input file] [output file]\n");
    }
    if (@ARGV){
        $infile=shift (@ARGV);
        unless (-e $infile){die ("File \'$infile\' not found.\n");}
        $outfile=shift (@ARGV);
        if ((-e $outfile) and !$force){
            die ("File \'$outfile\' aleady exists. Use -f to".
                        " force overwrite.\a\n");
        }
    } else {
        $infile="";
        $outfile="";
    }
    return ($infile, $outfile);
}

# ASSIGN INPUT FILE
sub queryinfile {
    $infile="";
    until (-e $infile){
        print ("Please enter the name of the input file:\n");
        $infile = &lt;STDIN&gt;;
        chomp ($infile);
        unless (-e $infile){
            print STDERR ("File \'$infile\' not found. ".
                                        "Ctrl-C to exit.\a\n");
        }
    }
    $infile;
}

# ASSIGN OUTPUT FILE
sub queryoutfile {
    print ("Please enter the name of the output file:\n");
    $outfile = &lt;STDIN&gt;;
    chomp ($outfile);
    if (-e $outfile){
        print STDERR ("File \'$outfile\' aleady exists. ".
                                        "Overwrite? [Y\/N]\a\n");
        my $answer = &lt;STDIN&gt;;
        chomp ($answer);
        if (lc($answer) ne 'y'){die ("Script aborted by user.\n");}
    }
    $outfile;
}

# PERFORM OPERATION
sub bigfile_colstolines {
    my $infile = shift;
    my $outfile = shift;
    my $infilehandle = "&lt;$infile";          # read-only
    open (INFILE, $infilehandle) or die ("File error.\a\n");
    my $outfilehandle = "&gt;$outfile";       # write only
    open (OUTFILE, $outfilehandle) or die ("Output failure.\a\n");
    my $line = &lt;INFILE&gt;;
    my @testarray = split (/$delimiter/, $line);
    seek(INFILE,0,0);
    for (my $counter=0; $counter &lt;= $#testarray; $counter++){
        my @columnarray = undef();
        while (defined ($line = &lt;INFILE&gt;)){
            chomp ($line);
            my @linearray = split (/$delimiter/, $line);
            push (@columnarray, $linearray [$counter]);
        }
        shift (@columnarray);               # removes unwanted characters
        my $newline = join $delimiter, (@columnarray);
        print OUTFILE ($newline, "\n");
        seek(INFILE,0,0);
    }
    close(INFILE);
    close(OUTFILE);
}

# MAIN LOOP

($infile, $outfile)=&amp;parseargs;
unless ($infile){$infile=&amp;queryinfile;}
unless ($outfile){$outfile=&amp;queryoutfile;}
&amp;bigfile_colstolines ($infile, $outfile);
print ("Done.\n");
exit
&lt;/CODE&gt;</field>
<field name="codedescription">
Time once again to reinvent the wheel, I suspect. I wrote this a couple of years back as an exercise in perl. Specifically, a friend of mine was wanting to manipulate large (&amp;gt;10GB) tables of data. Part of his analysis involved transposing a table such that the columns became lines and vice versa. As a result I wrote this rather convoluted script. It only loads one line of the table into memory at a time, rather than the whole table. 
&lt;p&gt;
The other points of this exercise were to make the code as well-structured (whatever that means :-P) and as user-friendly as possible. I imagine it's possible to load the essential functionality into a single line, but that wasn't my aim here. 
&lt;p&gt;
Of course, I imagine there's a perfectly good CPAN module out there which would do this much better than this :-)
&lt;p&gt;
&lt;b&gt;Update:&lt;/b&gt; The original script opened and closed the input file with each column that it read. I've changed to the more efficient [seek] function suggested by [RhetTbull].</field>
<field name="codecategory">
Utility scripts</field>
<field name="codeauthor">
Tim Rayner
&lt;br&gt;tfrayner@yahoo.co.uk</field>
</data>
</node>
