Why is my data structure wrong?

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing some tests for the Mainstreet Credit Verification Engine and I need to create some reports based upon CSV data that the engine parses for me. I don't have access to the raw CSV data, so I have to write a routine that uses their interface. I am trying to create a hashref with one key being the report headers (in the order they exist) and the second key being an arrayref of hashrefs with the data that I need. For example, with csv data like the following:

foo,bar
333,aaa
444,bbb
555,ccc

I'd like to create the following data structure:

$VAR1 = {
          'data' => [
                      {
                        'foo' => '333',
                        'bar' => 'aaa'
                      },
                      {
                        'foo' => '444',
                        'bar' => 'bbb'
                      },
                      {
                        'foo' => '555',
                        'bar' => 'ccc'
                      }
                    ],
          'headers' => [
                         'foo',
                         'bar'
                       ]
        };

The code that I am using is as follows (stripped of non-essential features):

#!/usr/bin/perl
use strict;
use Data::Dumper;

# testing _get_csv
local *MCVE::MCVE_Ub = sub { [qw/ 333 444 /] };
local *MCVE::MCVE_NumRows    = sub { 3 };
local *MCVE::MCVE_NumColumns = sub { 2 };
my @columns = ('foo', 'bar');
local *MCVE::MCVE_GetHeader  = sub { shift @columns };
my @rows = qw/ 333 aaa 444 bbb 555 ccc /;
local *MCVE::MCVE_GetCellByNum = sub { shift @rows };

my $report = _get_csv();
print Dumper $report;

sub _get_csv
{
    my $rows    = MCVE::MCVE_NumRows();
    my $columns = MCVE::MCVE_NumColumns();

    my %report;
    my @headers;
    my @data = ({}) x ($rows-1); # need an extra for the header

    foreach my $column ( 0 .. $columns - 1 )
    {
        push @headers, MCVE::MCVE_GetHeader( $column );
    }

    $report{headers} = \@headers;
    foreach my $row ( 0 .. $rows - 1 )
    {
        my @temp;
        foreach my $column ( 0 .. $columns - 1 )
        {
            push @temp, MCVE::MCVE_GetCellByNum($column,$row );
        }
        warn "Temp: '@temp' added to row $row ";
        @{$data[$row]}{ @headers } = @temp;
    }
    $report{data} = \@data;
    return \%report;
}

I think you can guess what the functions are trying to do :)

Note that this snippet is for unit testing (making sure my algorithm is correct) rather than integration testing, which is why I localize all of the functions.

The problem is in the output:

Temp: '333 aaa' added to row 0  at C:\test.pl line 39.
Temp: '444 bbb' added to row 1  at C:\test.pl line 39.
Temp: '555 ccc' added to row 2  at C:\test.pl line 39.
$VAR1 = {
          'data' => [
                      {
                        'foo' => '444',
                        'bar' => 'bbb'
                      },
                      $VAR1->{'data'}[0],
                      {
                        'foo' => '555',
                        'bar' => 'ccc'
                      }
                    ],
          'headers' => [ 
                         'foo',
                         'bar'
                       ]
        };

As you can see from the warning lines, I appear to be building my temp data correctly and even assigning it to the correct row in the array ref, but element 0 (444 bbb) should be (333 aaa). What the heck am I doing wrong?

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.


Replies are listed 'Best First'.
Re: Why is my data structure wrong? by kvale (Monsignor) on Jun 24, 2002 at 20:10 UTC
One problem may be the initialization `my @data = ({}) x ($rows-1); # need an extra for the header`. This sets up only 2 anon hashes, but you have 3 rows of data. -Mark
Re: Re: Why is my data structure wrong? by Ovid (Cardinal) on Jun 24, 2002 at 20:21 UTC
No, that wasn't it, but you put me on the right path! You're correct that I had the wrong number of hashrefs, but after changing the above line to `my @data;`, the output is correct. Thank you :)

Temp: '333 aaa' added to row 0 at test.pl line 39.
Temp: '444 bbb' added to row 1 at test.pl line 39.
Temp: '555 ccc' added to row 2 at test.pl line 39.
$VAR1 = {
          'data' => [
                      {
                        'foo' => '333',
                        'bar' => 'aaa'
                      },
                      {
                        'foo' => '444',
                        'bar' => 'bbb'
                      },
                      {
                        'foo' => '555',
                        'bar' => 'ccc'
                      }
                    ],
          'headers' => [
                         'foo',
                         'bar'
                       ]
        };

Now, does anyone care to tell my po' self why this worked? I feel unusually dense today.

Update: Apparently, using the 'x' operator to populate an array with references will populate it with the same reference for every element, which apparently is the problem. My intent in populating the array with hashrefs was to ensure strong type checking:

my @foo = ([]);
@{$foo[0]}{'one'}='uno'

The above code fails because you can't coerce an array into a hash.

Cheers,
Ovid
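For reference, here is a minimal sketch of the routine with the fix applied. It is not the original code: the `MCVE::` subs are hypothetical stand-ins defined only for this example (as in the question), and the names `@header_values`, `@cell_values` and `%record` are made up for illustration. Each row gets a fresh lexical hash, so no pre-sized `@data` is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the MCVE interface, for illustration only.
my @header_values = ( 'foo', 'bar' );
my @cell_values   = qw/ 333 aaa 444 bbb 555 ccc /;
*MCVE::MCVE_NumRows      = sub { 3 };
*MCVE::MCVE_NumColumns   = sub { 2 };
*MCVE::MCVE_GetHeader    = sub { shift @header_values };
*MCVE::MCVE_GetCellByNum = sub { shift @cell_values };

sub _get_csv {
    my $rows    = MCVE::MCVE_NumRows();
    my $columns = MCVE::MCVE_NumColumns();

    my @headers = map { MCVE::MCVE_GetHeader($_) } 0 .. $columns - 1;

    my @data;
    foreach my $row ( 0 .. $rows - 1 ) {
        my %record;    # a brand-new hash on every iteration
        foreach my $column ( 0 .. $columns - 1 ) {
            $record{ $headers[$column] } =
                MCVE::MCVE_GetCellByNum( $column, $row );
        }
        push @data, \%record;
    }
    return { headers => \@headers, data => \@data };
}

my $report = _get_csv();
print join( ',', @{ $report->{headers} } ), "\n";
print join( ',', @{$_}{ @{ $report->{headers} } } ), "\n"
    for @{ $report->{data} };
```

With the stub data above this prints the header line followed by `333,aaa`, `444,bbb` and `555,ccc`, one row per line.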
Re: Re: Re: Why is my data structure wrong? by japhy (Canon) on Jun 24, 2002 at 22:41 UTC
Yes, using the `x` operator with references employs the same reference each time. Why? Because that's the most efficient way of doing it, I'd gather, regardless of whether you're using references or not! As proof, run these in the Perl debugger. (The first `x` is the 'display this data' instruction to the debugger.) You could also check the source code (pp.c) where the `repeatcpy()` C function or macro is employed.

x map \$_, ("japhy") x 5;
x ([]) x 5;

_____________________________________________________
Jeff`[japhy]`Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
`s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`
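Outside the debugger, the same point can be checked with a short script. This is only a sketch: it uses `refaddr` from the core `Scalar::Util` module to count how many distinct hashes sit behind the array elements, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my @shared = ( {} ) x 3;             # one anonymous hash, repeated
my @fresh  = map { +{} } 1 .. 3;     # a new anonymous hash per element

# Count distinct underlying hashes by their memory addresses.
my %shared_addresses = map { ( refaddr($_), 1 ) } @shared;
my %fresh_addresses  = map { ( refaddr($_), 1 ) } @fresh;

printf "x operator: %d distinct hash(es)\n", scalar keys %shared_addresses;
printf "map:        %d distinct hash(es)\n", scalar keys %fresh_addresses;

# Writing through one element is visible through the others.
$shared[0]{foo} = 333;
print "shared[2]{foo} = $shared[2]{foo}\n";
```

The first count is 1 and the second is 3, and the final line shows the value written through `$shared[0]` turning up in `$shared[2]`.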
Re^4: Why is my data structure wrong? by grantm (Parson) on Jun 25, 2002 at 08:26 UTC
Re4: Why is my data structure wrong? by Hofmator (Curate) on Jun 25, 2002 at 08:58 UTC
(jeffa) Re: Why is my data structure wrong? by jeffa (Bishop) on Jun 24, 2002 at 20:17 UTC
This may or may not help your prob - how about an even simpler unit test:

use strict;
use Text::CSV;
use Data::Dumper;

my (%report,@headers);
my $csv = Text::CSV->new();

while(<DATA>) {
   die unless $csv->parse($_);
   if ($. == 1) {
      @headers = $csv->fields();
      $report{headers} = \@headers;
   }
   else {
      my @row = $csv->fields();
      push @{$report{data}}, {map{$headers[$_]=>$row[$_]} (0..$#row)};
   }
}

print Dumper \%report;

__DATA__
foo,bar
333,aaa
444,bbb
555,ccc

Update: "This fails because, as I mentioned, I don't have direct access to the CSV data and thus cannot use Text::CSV" I probably chose the words 'unit test' poorly. When i find myself in this situation i try to back away and try another route. I only used Text::CSV to abstract that portion of the problem, as the data structure was what was giving you the trouble.
Re: (jeffa) Re: Why is my data structure wrong? by Ovid (Cardinal) on Jun 24, 2002 at 21:09 UTC
This fails because I don't have direct access to the CSV data and thus cannot use Text::CSV (though I'd like to). Thanks for the suggestion, though.

Cheers,
Ovid
Re: Why is my data structure wrong? by George_Sherston (Vicar) on Jun 24, 2002 at 23:52 UTC
I think this is to do with an interesting feature of the `x` operator. The awfwy squoowy thing is happening right at the initialisation point:

my @data = ({}) x (3);
print Dumper \@data;

produces

$VAR1 = [
          {},
          $VAR1->[0],
          $VAR1->[0]
        ];

... and not

$VAR1 = [
          {},
          {},
          {},
        ];

- if it had done that (for which you need `@data = ({},{},{});`) then the rest of the script would have done what you expected. This tells something about how the `x` operator treats hashrefs (and arrayrefs for that matter) - it sets up the whole data structure so that all the data structures are whatever you set any one of them to, last time you set them.

It's instructive to put a `last if $row == $n;` at the end of your `foreach my $row...`. Then look at the contents of `$report->{data}` with

for (0..2) {
    print $report->{data}->[$_]->{foo}, " => ", $report->{data}->[$_]->{bar}, "\n";
}

- and goldarn if there isn't always the same thing in each of the three hashes, albeit a different same thing depending on what `$n` is. Strikes me this could be a rather useful feature of the `x` operator... if only I can remember it until the right moment arises. Thanks very much for this stimulating thread, by the way.

(I think part of the reason it was hard to work out what was going on is that (IMHO) `local *MCVE::MCVE_NumRows = sub { 3 };` should be `local *MCVE::MCVE_NumRows = sub { 4 };` - I think it's meant to be all the rows including the headers. Making it one short means that your sub is actually creating one hash in `@data` which is independent of the initialised hashes, and hence with a life of its own. That's why both the last and the second last key/value pairs in `@rows` are represented.)

George Sherston
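The shape of the wrong output in the question can be reproduced in isolation. The sketch below assumes the same three data rows and swaps the MCVE calls for a plain array of rows (the `@rows` layout here is invented for the example). Two slots share one hash, and the third slot is autovivified as a separate hash on first use.

```perl
use strict;
use warnings;

my @headers = qw/ foo bar /;
my @rows    = ( [qw/ 333 aaa /], [qw/ 444 bbb /], [qw/ 555 ccc /] );

# Two slots, but only one underlying hash (as with $rows - 1 == 2).
my @data = ( {} ) x 2;

for my $i ( 0 .. $#rows ) {
    # Rows 0 and 1 write into the shared hash, so row 1 overwrites row 0.
    # Row 2 has no slot yet, so Perl autovivifies a new, independent hash.
    @{ $data[$i] }{@headers} = @{ $rows[$i] };
}

print "$_->{foo} $_->{bar}\n" for @data;
```

This prints `444 bbb` twice and then `555 ccc`, matching the dump in the question, where element 1 is shown as `$VAR1->{'data'}[0]`.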
Re: Why is my data structure wrong? by jsprat (Curate) on Jun 24, 2002 at 21:52 UTC
C:\s\pldir>perl -MData::Dumper -e "@a=({}) x 2;print Dumper @a;"
$VAR1 = {};
$VAR2 = $VAR1;

When the array is initialized, you end up with the first element being a hash reference, the second element is a reference to the first! I haven't figured out why, yet. I ran the test without the array initialization (`my @data = ({}) x ($rows-1); # need an extra for the header`) and Dumper printed the data structure as expected.

Is there a reason you need to setup @data in advance? Maybe `push @data, {} for (1 .. $rows-1);` would work better if you do?
Re: Why is my data structure wrong? by tachyon (Chancellor) on Jun 24, 2002 at 20:58 UTC
This appears to be a perl bug. You can see in this very stripped down example that the correct hashes are made but they are not pushed into the array(ref or plain) as expected.

use Data::Dumper;
my (@data, @col_names, %tmp, $record);
@data = <DATA>;
chomp @data;
@col_names = split ',', shift @data;
while(my $line = shift @data) {
    last unless $line;
    @tmp{@col_names} = split ',', $line;
    print 'Inside ', Dumper \%tmp;
    push @{$record}, \%tmp;
}
push @{$record}, { 'what ', 'the £$%^& ?' };
print 'Outside ', Dumper $record;
print 'Record 0 ', %{$record->[0]}, "\n";
print 'Record 1 ', %{$record->[1]}, "\n";
print 'Record 2 ', %{$record->[2]}, "\n";
print 'Record 3 ', %{$record->[3]}, "\n";
__DATA__
foo,bar
333,aaa
444,bbb
555,ccc
__END__
Inside $VAR1 = {
          'foo' => '333',
          'bar' => 'aaa'
        };
Inside $VAR1 = {
          'foo' => '444',
          'bar' => 'bbb'
        };
Inside $VAR1 = {
          'foo' => '555',
          'bar' => 'ccc'
        };
Outside $VAR1 = [
          {
            'foo' => '555',
            'bar' => 'ccc'
          },
          $VAR1->[0],
          $VAR1->[0],
          {
            'what ' => 'the £$%^& ?'
          }
        ];
Record 0 foo555barccc
Record 1 foo555barccc
Record 2 foo555barccc
Record 3 what the £$%^& ?

cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Re: Why is my data structure wrong? by Anonymous Monk on Jun 24, 2002 at 21:08 UTC
That isn't a bug. Your %tmp hash is declared prior to the loop. You are pushing a reference to the same hash each time through the loop (your $record array ref holds a bunch of pointers to the same thing).
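A minimal sketch of the corresponding fix, assuming the same CSV lines held in an in-memory array (`@lines` and `@records` are names made up for this example): declaring `%tmp` inside the loop gives every record its own hash.

```perl
use strict;
use warnings;

my @lines     = ( 'foo,bar', '333,aaa', '444,bbb', '555,ccc' );
my @col_names = split /,/, shift @lines;

my @records;
for my $line (@lines) {
    my %tmp;    # declared inside the loop: a new hash for each record
    @tmp{@col_names} = split /,/, $line;
    push @records, \%tmp;
}

for my $record (@records) {
    print join( ',', @{$record}{@col_names} ), "\n";
}
```

This prints the three rows unchanged. If the hash has to stay declared outside the loop, pushing a shallow copy with `push @records, { %tmp };` is another way to avoid the sharing.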
(tye)Re: Why is my data structure wrong? by tye (Sage) on Jun 25, 2002 at 16:14 UTC
I understand your desire to avoid autovivification. I wish there was a Perl pragma to disable it so I could detect certain types of programming mistakes. Since we lack such, my next suggestion is of little practical value. However, you can initialize a list of references like so: `my @data= map( {}, 1..$rows );` (Just a tiny tidbit.)

- tye (but my friends call me "Tye")
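As a rough sketch of that suggestion, assuming three rows: the block form `map { +{} }` is used here in place of `map( {}, ... )` only because the unary plus makes the anonymous hash unambiguous to the parser. The second half illustrates the run-time type check mentioned earlier in the thread, using `eval` to trap the error, and `@wrong` is a name invented for the example.

```perl
use strict;
use warnings;

my $rows = 3;    # assumed row count for this example

# One distinct anonymous hash per slot.
my @data = map { +{} } 1 .. $rows;

@{ $data[0] }{qw/ foo bar /} = qw/ 333 aaa /;
@{ $data[1] }{qw/ foo bar /} = qw/ 444 bbb /;

printf "row 0: %s, row 1: %s, row 2 keys: %d\n",
    $data[0]{foo}, $data[1]{foo}, scalar keys %{ $data[2] };

# A slot holding the wrong kind of reference is caught at run time.
my @wrong = ( [] );
my $ok = eval { @{ $wrong[0] }{qw/ one two /} = qw/ uno dos /; 1 };
print "hash slice on an arrayref failed: ", ( $ok ? 'no' : 'yes' ), "\n";
```

Row 0 keeps `333` after row 1 is filled in, row 2 stays empty, and the hash slice on the array reference dies inside the `eval`.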