Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Parsing CSV into a hash

by McDarren (Abbot)
on Sep 19, 2005 at 21:42 UTC ( #493311=perlquestion: print w/replies, xml ) Need Help??
McDarren has asked for the wisdom of the Perl Monks concerning the following question:

My first go at dealing with delimited data has not been as straightforward as I expected :(

After stuffing around with various Text/CSV modules and super searching all over the place for an hour or two, I finally gave up and decided to write my own parsing routine.

The data is tab delimited, with the first line being a header.
Each column represents user attributes, and each row represents the data for a particular user.
The only inconsistency in the data (if it is that), is that the header row has a leading hash (#), whereas the rest of the data doesn't.

All I wanted to do is build a hash from the data, in the form $data->{$user}{$attribute}

Although the code I have works, and is only about a dozen lines, I'm disappointed - because:

a) I'm sure this could be done easier with the appropriate module - but I'm damned if I could work out how, and
b) I've used both a flag and a counter, which I'm sure are redundant - but I don't know how to get rid of them

Any advice?
Thanks in advance
#!/usr/bin/perl -w use strict; my $datafile = "stuff.csv"; my @fields; my $data; my $flag; open DATA, "<$datafile" or die "Could not open $datafile:$!\n"; while (<DATA>) { chomp(); # If the flag hasn't been set, this must be the header (first) lin +e if (!$flag) { # Throw away the first two fields (#, username) in the header (undef, undef, @fields) = (split /\t/); $flag++; } else { my ($user, @userdata) = (split /\t/); my $i = 0; while ($i <= $#fields) { $data->{$user}{$fields[$i]} = $userdata[$i]; $i++; } } } close DATA;

Replies are listed 'Best First'.
Re: Parsing CSV into a hash
by GrandFather (Sage) on Sep 19, 2005 at 21:53 UTC

    You might like to replace you inner while loop with:

    $data->{$user}{$_} = shift @userdata for (@fields);

    Perl is Huffman encoded by design.
      Okay, that's one part of my problem solved, thanks :)

      Let me just work through that to make sure I have it....
      It works from right to left, yes?
      We iterate through each item in @fields, grabbing the associated item from @userdata, then assigning it to the hash, yes?

        That's right. Note also that if there are too few fields in a line this simply assigns undef rather than generating an invalid access as your original code did.

        If you are sure that the first line contains the header line then you can simply:

        $_ = <DATA> or die "Empty file"; chomp; # Throw away the first two fields (#, username) in the header (undef, undef, @fields) = (split /\t/);

        before the loop.

        Perl is Huffman encoded by design.
Re: Parsing CSV into a hash
by jZed (Prior) on Sep 19, 2005 at 23:53 UTC
    Here it is using Text::CSV_XS. Set the key_colnum to the number of the column you want to be key. Here it's set to 0 (the first column). This reads a "tab-delimited" file, set sep_char to something else if there's another delimiter.
    #!/usr/bin/perl -w use strict; use Text::CSV_XS; use IO::File; my $csv = Text::CSV_XS->new({binary=>1,sep_char=>"\t"}); my $fh = IO::File->new( 'hdi.csv') or die $!; my $hash; my $key_colnum = 0; my $cols = $csv->getline($fh); while(my $vals = $csv->getline($fh)){ last unless @$vals; my %row; @row{@$cols} = @$vals; $hash->{ $row{ $cols->[$key_colnum] } } = \%row; } use Data::Dumper; print Dumper $hash;
Re: Parsing CSV into a hash
by davidrw (Prior) on Sep 19, 2005 at 22:58 UTC
    here's a (untested) solution using Text::CSV .. what modules did you try? if you show your attempts w/specific modules (maybe as separate posts), we can probably help with them..
    my $data = {}; use Text::CSV; my $csv = Text::CSV->new(); open FILE, "foo.txt"; # open a file my $line = <FILE>; # get the header line $line =~ s/^#//; # strip off leading # $csv->parse($line); # parse line ... my @cols = $csv->fields(); # ... storing column name +s while(my $line = <FILE>){ # loop through rest of file $csv->parse($line); # parse line ... my @vals = $csv->fields(); # ... getting the values # my %row = map { $cols[$_] => $vals[$_] } 0 .. $#cols; # and hashi +ng up based on col names # $data->{ $row{ $cols[0] } } = \%row; # stick this hash into big + data structure. # NOTE: it's hashing on th +e first column.. # might want to change if +necessary # UPDATE: just use a hash slice: @{ $data->{ $row{ $cols[0] } } }{ @cols } = @vals; } close FILE; # close file -- all done!
Re: Parsing CSV into a hash
by dragonchild (Archbishop) on Sep 20, 2005 at 02:09 UTC
    Text::xSV was built for exactly this kind of situation. (Why don't people use it more often?!?)
    use Text::xSV; my $parser = Text::xSV->new; $parser->open_file( $datafile ); my @headers = $parser->read_header; shift @headers; # Remove the # field. my $user_header = shift @headers; my $data; while ( $parser->get_row ) { my $user = $parser->extract( $user_header ); $data->{ $user }{ @headers } = [ $parser->extract( @headers ) ]; }

    Update: Obviously, use tilly's corrections to my suggestion as he wrote the module in question. Plus, he's right. :-)

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      use Text::xSV; my $parser = Text::xSV->new( filename=>$datafile, sep=>"\t" ); $parser->read_header; my %data; while (my $row_ref = $parser->get_row) { $data{ $row_ref->[1] } = $parser->extract_hash; }
      The significant changes from your version are:
      1. xSV doesn't default to a tab. You have to tell it that you want that.
      2. You don't need to call open_file - it will do that for you if you pass the filename into the constructor.
      3. The return of read_header cannot be relied on, so don't rely on it.
      4. I'm using the return of get_row directly. Yes, you can mix positional and name-based logic in Text::xSV.
      5. Your slicing logic was buggy. But luckily there is no need to do something that complex.

      UPDATE: A closer reading of the original question shows that the format differs from the obvious. The first row has an extra # in it that is not lined up with the other fields and needs to be discarded. Here is a solution to that variant:

      use Text::xSV; my $parser = Text::xSV->new( filename=>$datafile, sep=>"\t" ); my @headers = $parser->get_row; shift @headers; # Get rid of the unwanted # $parser->bind_fields(@headers); my %data; while (my $row_ref = $parser->get_row) { $data{ $row_ref->[1] } = $parser->extract_hash; }
Re: Parsing CSV into a hash
by Skeeve (Vicar) on Sep 20, 2005 at 05:08 UTC
    Only Question I ask myself: You are removing the first 2 columns of the first row but using the second column in all othe rows... Is that correct?

    so here is my attempt notusing any module. As long as the data is not using quotes around the values it's, to my understanding, not necessary to use the modul.
    #!/usr/bin/perl -w use strict; my $datafile = "stuff.csv"; my @fields; my $data; open DATA, "<$datafile" or die "Could not open $datafile:$!\n"; #I handle the first line outside the loop # Throw away the first two fields (#, username) in the header # Are you sure about "first two?" (undef, undef, @fields) = split /\t/, <>; my ($user, @userdata); while (<DATA>) { chomp(); # If you're sure about the 2 fields above, # I think we need to drop the second field here too ($user, undef, @userdata) = split /\t/; @{$data->{$user}}{@fields} = @userdata; } close DATA;

Re: Parsing CSV into a hash
by McDarren (Abbot) on Sep 21, 2005 at 04:30 UTC
    Thanks everyone for the responses, and especially thanks for the examples using the various parsing modules. I now have those to refer to in the future.

    I've settled on a simplified version of my original code, incorporating the suggestions from GrandFather, as follows:
    #!/usr/bin/perl -w use strict; my $datafile = "stuff.csv"; my @fields; my $data; open DATA, "<$datafile" or die "Could not open $datafile:$!\n"; # Grab the first (header) line, discarding the first two fields $_ = <DATA> or die "Empty file"; chomp((undef, undef, @fields) = (split /\t/)); # Grab the rest, and populate the hashref while (<DATA>) { chomp(); my ($user, @userdata) = (split /\t/); $data->{$user}{$_} = shift @userdata for (@fields); } close DATA;
    This is much "nicer" than my original, and I'm now a happy vegemite :)


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://493311]
Approved by socketdave
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (4)
As of 2019-01-16 01:43 GMT
Find Nodes?
    Voting Booth?
    After Perl5, I'm mostly interested in:

    Results (274 votes). Check out past polls.