build hash from csv file

by tcf03 (Deacon)
on Sep 19, 2005 at 13:06 UTC
Description: This function builds and returns a hash when passed a filename and, optionally, the number of fields per line. If the field count is omitted, the function takes the count from the first line and checks every later line against it; lines that don't match are written to an error file named after the input with .err appended. It would be called like this: my %hash = build_hash("somefile.csv", "10");

This is from a work in progress which takes a CSV file, pulls out the fields specified by the user, and drops them in another file. That file is processed by mailing software. Once the mailing software is finished, it spits out a CSV file and the script puts it all back together again with the new data.
use strict;
use warnings;
use Carp qw(confess);

#############
sub build_hash
#############
{
    my ( $file_, $numfields_ ) = @_;
    my $line                   = 0;
    my ( %hash, $csvfile, $errorfile );

    # Three-argument open, so the filename is never parsed for a mode
    open $csvfile, '<', $file_ or
        confess "Unable to open $file_: $!\n";
    open $errorfile, '>', "${file_}.err" or
        confess "Unable to open ${file_}.err: $!\n";

    # Read one line at a time instead of slurping the file into a list
    while ( my $record = <$csvfile> )
    {
        chomp $record;
        $record =~ s/"//g;

        $line++;

        my @linedata = split /,/, $record;

        # Make a best guess (using line 1)
        if ( $line == 1 and !defined $numfields_ )
        {
            $numfields_ = scalar @linedata;
        }

        # Lines with the wrong number of fields go to the error file
        unless ( scalar(@linedata) == $numfields_ )
        {
            print $errorfile "$record\n";
            next;
        }

        my $fieldnum = 0;

        for my $info (@linedata)
        {
            $fieldnum++;
            $hash{$line}{$fieldnum} = $info;
        }
    }
    close $csvfile;
    close $errorfile;

    # The old "return %hash or confess ..." could never reach confess,
    # because return binds tighter than "or"
    return %hash;
}
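The hash that comes back is keyed by line number, then by field number, both counting from 1. Here is a minimal sketch of walking that structure in order. The %hash below is hand-built with made-up values and stands in for a real call to build_hash; line 2 is left out on purpose, as it would be if that line had gone to the .err file.

```perl
use strict;
use warnings;

# Hand-built stand-in for build_hash's return value:
# line number => field number => value (sample data, not from a real file).
my %hash = (
    1  => { 1 => 'name',  2 => 'city'   },
    3  => { 1 => 'Alice', 2 => 'Boston' },
    10 => { 1 => 'Bob',   2 => 'Denver' },
);

my @rows;

# Hash keys are strings, so sort numerically; a plain sort puts 10 before 3.
for my $line ( sort { $a <=> $b } keys %hash ) {
    my $fields = $hash{$line};
    my @values = map { $fields->{$_} } sort { $a <=> $b } keys %$fields;
    push @rows, join ',', @values;
}

print "$_\n" for @rows;
```

Having to sort the keys just to get the lines back in file order is one sign that the data is really a list.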
Replies are listed 'Best First'.
Re: build hash from csv file
by davorg (Chancellor) on Sep 19, 2005 at 13:21 UTC

    Hope you don't mind a few comments on your code.

    Firstly, just splitting on commas and removing double quotes is only going to work on the most basic CSV files. Much better to look at using Text::ParseWords (which comes with Perl) or Text::CSV_XS (which doesn't). Both of these will handle more complex CSV files than your code does.
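
    As a rough illustration of the Text::ParseWords route, here is a sketch using its parse_line function on a made-up record. Note that parse_line also treats single quotes and backslashes specially, so it is a step up from a bare split rather than a complete CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up record: two of the three fields contain a comma inside quotes.
my $record = '"Acme, Inc.",1999,"Portland, OR"';

# parse_line( delimiter, keep, line ): a false "keep" strips the quotes
# from the returned fields while leaving the quoted commas alone.
my @fields = parse_line( ',', 0, $record );

print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```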

    Secondly, it seems a bit wasteful to loop round your @linedata array copying each element separately into a second-level hash. In fact I'd question the use of a hash there at all. If you're using a hash whose keys are low-numbered integers, then you're much better off using an array. And as you've already got an array, you can just store a reference to that in your data structure.

    $hash{$lineno} = \@linedata;
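
    In context, that assignment might look like the sketch below. The two input strings are made up and stand in for lines read from the file. The array is declared with my inside the loop, so every pass stores a reference to a fresh array rather than to the same one.

```perl
use strict;
use warnings;

my %hash;
my $lineno = 0;

# Made-up records standing in for the lines of the CSV file.
for my $record ( 'a,b,c', 'd,e,f' ) {
    $lineno++;

    # Declared inside the loop: a new array on every pass.
    my @linedata = split /,/, $record;

    # Store a reference to the array instead of copying field by field.
    $hash{$lineno} = \@linedata;
}

# Fields are now zero-based array elements: line 2, second field.
print $hash{2}[1], "\n";
```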

    Hope this is useful.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Don't mind at all. I was looking at Text::CSV::Simple, but I'll look at the others now. I was just a bit perplexed as to how to merge all the data back together from different files with unknown fields. Thanks for your input.

      Ted
      --
      "That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved."
        --Ralph Waldo Emerson
Re: build hash from csv file
by dragonchild (Archbishop) on Sep 19, 2005 at 13:29 UTC
    First off, the hash you're building is ... wonky. Why not just use an array of arrays instead of a hash of hashes? All your keys are numbers ... that indicates you really want an array, not a hash.

    Second, unless your CSV format is vastly different than the ones I'm used to, your s/"//g; will LOSE information, badly. Like, it'd be trivial to make a correct CSV file that would make your code break.
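
    A small sketch of the kind of breakage meant here, using a made-up but valid two-field record and the same strip-then-split steps as the snippet:

```perl
use strict;
use warnings;

# Made-up, valid CSV record: two fields, the first containing a comma.
my $record = '"Smith, John",42';

# The snippet's approach: delete every double quote, then split on commas.
( my $stripped = $record ) =~ s/"//g;
my @linedata = split /,/, $stripped;

# The quoted comma is now indistinguishable from a field separator,
# so the name is cut in two and the field count is off by one.
print scalar(@linedata), " fields: ", join( ' | ', @linedata ), "\n";
```

    With a field count of 2, the snippet would send this line to the .err file even though nothing is wrong with it.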

    Much better is to let a CPAN module do this for you. I like tilly's Text::xSV best, but Text::CSV_XS is perfectly acceptable. (Text::CSV isn't feature-complete.)

    use Text::xSV;

    sub build_hash {
        my ($file_) = @_;

        my $reader = Text::xSV->new();
        $reader->open_file( $file_ );

        my @result;
        while ( my $row = $reader->get_row() ) {
            push @result, $row;
        }

        return @result;
    }

    If you absolutely have to have a hash of hashes, then you could do something like:
    use Text::xSV;

    sub build_hash {
        my ($file_) = @_;

        my $reader = Text::xSV->new();
        $reader->open_file( $file_ );

        my %result;
        my $line = 0;
        while ( my $row = $reader->get_row() ) {
            # @{ $result{ ++$line } }{ 1 .. scalar(@$row) } = @$row;
            $line++;
            for my $i ( 1 .. @$row ) {
                $result{$line}{$i} = $row->[ $i-1 ];
            }
        }

        return %result;
    }

    That commented-out wonky line is a hash slice. It's exactly equivalent to (but faster than) the 4 lines below it.
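
    To make the equivalence concrete, here is a self-contained sketch that fills one hash with the loop and another with the slice. The row is made up and the line number is fixed at 1. The speed claim is not measured here.

```perl
use strict;
use warnings;

my $row = [ 'a', 'b', 'c' ];    # made-up row, held as an array reference

# Loop version: one assignment per field, keys counting from 1.
my %by_loop;
for my $i ( 1 .. @$row ) {
    $by_loop{1}{$i} = $row->[ $i - 1 ];
}

# Slice version: the inner hash is autovivified and every key is
# assigned in a single statement.
my %by_slice;
@{ $by_slice{1} }{ 1 .. scalar(@$row) } = @$row;

for my $i ( sort { $a <=> $b } keys %{ $by_slice{1} } ) {
    print "$i: loop=$by_loop{1}{$i} slice=$by_slice{1}{$i}\n";
}
```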

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?