<?xml version="1.0" encoding="windows-1252"?>
<node id="912884" title="Re^4: Help parsing a complicated csv" created="2011-07-05 17:32:02" updated="2011-07-05 17:32:02">
<type id="11">
note</type>
<author id="590906">
linuxer</author>
<data>
<field name="doctext">
&lt;p&gt;Hi,
&lt;p&gt;I just read this thread again and saw your reply.
&lt;p&gt;Assuming that an empty string is not a valid value, I came up with this:

&lt;code&gt;
#! /usr/bin/perl
use strict;
use warnings;

use Text::CSV_XS;

my $csv = Text::CSV_XS-&gt;new({ binary =&gt; 1, allow_whitespace =&gt; 1, }) 
  or die "Cannot use CSV: " . Text::CSV_XS-&gt;error_diag();

# for testing; in real world, open file and use that handle
my $fh = \*DATA;

my (%hash, @hdr);

while ( my $row = $csv-&gt;getline( $fh ) ) { 

    # header not yet defined? or 1st cell starts with '&lt;' ==&gt; use row as header
    if ( !@hdr || $row-&gt;[0] =~ m/^&lt;/ ) {
        @hdr = @{$row};
        next;
    }
    # otherwise try to process data
    else {
        for my $i ( 0 .. $#hdr ) {
            # keep only cells that exist and contain at least one character,
            # so no "undef"s or empty strings end up in the result
            # (if empty strings are OK or wanted, test defined() alone)
            push @{ $hash{$hdr[$i]} }, ( defined $row-&gt;[$i] &amp;&amp; length $row-&gt;[$i] ? $row-&gt;[$i] : () );
        }
    }
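}

# getline() also returns false on a parse error, so report anything but EOF
$csv-&gt;eof or $csv-&gt;error_diag();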

# check created data structure
require Data::Dumper;
$Data::Dumper::Sortkeys = 1;
print Data::Dumper::Dumper( \%hash );

__DATA__
&lt;A1&gt;,   &lt;A2&gt;,   &lt;A3&gt;
a1,     aa1,    aaa1
a2,	    ,    aaa2
a3,     aa3
a4

&lt;B1&gt;,   &lt;B2&gt;
b1,     bb1
b2,     bb2
b3
&lt;/code&gt;

&lt;p&gt;That produces this result:

&lt;code&gt;
$VAR1 = {
          '&lt;A1&gt;' =&gt; [
                      'a1',
                      'a2',
                      'a3',
                      'a4'
                    ],
          '&lt;A2&gt;' =&gt; [
                      'aa1',
                      'aa3'
                    ],
          '&lt;A3&gt;' =&gt; [
                      'aaa1',
                      'aaa2'
                    ],
          '&lt;B1&gt;' =&gt; [
                      'b1',
                      'b2',
                      'b3'
                    ],
          '&lt;B2&gt;' =&gt; [
                      'bb1',
                      'bb2'
                    ]
        };
&lt;/code&gt;
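
&lt;p&gt;If empty strings should be kept, the filter can test defined() instead of length(). Here is a minimal sketch of the difference, not a drop-in replacement. It uses hand-written rows instead of Text::CSV_XS so that it runs with core Perl only. collect_columns is a hypothetical helper name, and the rows assume that an empty cell comes back as an empty string and a missing cell as undef:

&lt;code&gt;
#! /usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): collect each column's
# cells into a list, keeping a cell only if $keep returns true for it.
sub collect_columns {
    my ( $hdr, $rows, $keep ) = @_;
    my %hash;
    for my $row ( @{$rows} ) {
        for my $i ( 0 .. $#{$hdr} ) {
            my $cell = $row-&gt;[$i];
            push @{ $hash{ $hdr-&gt;[$i] } }, ( $keep-&gt;($cell) ? $cell : () );
        }
    }
    return \%hash;
}

# Hand-written stand-ins for the parsed rows of the first block
# (assumption: empty cell =&gt; '', missing cell =&gt; undef)
my @hdr  = ( '&lt;A1&gt;', '&lt;A2&gt;', '&lt;A3&gt;' );
my @rows = (
    [ 'a1', 'aa1', 'aaa1' ],
    [ 'a2', '',    'aaa2' ],
    [ 'a3', 'aa3' ],
    [ 'a4' ],
);

# drop missing cells and empty strings
my $by_length = collect_columns( \@hdr, \@rows,
    sub { defined $_[0] &amp;&amp; length $_[0] } );

# drop only missing cells, keep empty strings
my $by_defined = collect_columns( \@hdr, \@rows, sub { defined $_[0] } );

print scalar @{ $by_length-&gt;{'&lt;A2&gt;'} },  "\n";    # 2: 'aa1', 'aa3'
print scalar @{ $by_defined-&gt;{'&lt;A2&gt;'} }, "\n";    # 3: 'aa1', '', 'aa3'
&lt;/code&gt;

&lt;p&gt;With defined(), the empty cell in the "a2" row stays in the list for &lt;A2&gt;, while the cells missing from the short rows are still skipped.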
</field>
<field name="root_node">
901226</field>
<field name="parent_node">
905085</field>
</data>
</node>
