PerlMonks  

Why is my data structure wrong?

by Ovid (Cardinal)
on Jun 24, 2002 at 19:54 UTC

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing some tests for the Mainstreet Credit Verification Engine and I need to create some reports based upon CSV data that the engine parses for me. I don't have access to the raw CSV data, so I have to write a routine that uses their interface. I am trying to create a hashref with two keys: headers, an arrayref of the report headers in their original order, and data, an arrayref of hashrefs holding the data I need. For example, with CSV data like the following:

foo,bar
333,aaa
444,bbb
555,ccc

I'd like to create the following data structure:

$VAR1 = {
          'data' => [
                      {
                        'foo' => '333',
                        'bar' => 'aaa'
                      },
                      {
                        'foo' => '444',
                        'bar' => 'bbb'
                      },
                      {
                        'foo' => '555',
                        'bar' => 'ccc'
                      }
                    ],
          'headers' => [
                         'foo',
                         'bar'
                       ]
        };

The code that I am using is as follows (stripped of non-essential features):

#!/usr/bin/perl
use strict;
use Data::Dumper;

# testing _get_csv
local *MCVE::MCVE_Ub         = sub { [qw/ 333 444 /] };
local *MCVE::MCVE_NumRows    = sub { 3 };
local *MCVE::MCVE_NumColumns = sub { 2 };

my @columns = ('foo', 'bar');
local *MCVE::MCVE_GetHeader = sub { shift @columns };

my @rows = qw/ 333 aaa 444 bbb 555 ccc /;
local *MCVE::MCVE_GetCellByNum = sub { shift @rows };

my $report = _get_csv();
print Dumper $report;

sub _get_csv {
    my $rows    = MCVE::MCVE_NumRows();
    my $columns = MCVE::MCVE_NumColumns();
    my %report;
    my @headers;
    my @data = ({}) x ($rows-1); # need an extra for the header
    foreach my $column ( 0 .. $columns - 1 ) {
        push @headers, MCVE::MCVE_GetHeader( $column );
    }
    $report{headers} = \@headers;
    foreach my $row ( 0 .. $rows - 1 ) {
        my @temp;
        foreach my $column ( 0 .. $columns - 1 ) {
            push @temp, MCVE::MCVE_GetCellByNum($column,$row );
        }
        warn "Temp: '@temp' added to row $row ";
        @{$data[$row]}{ @headers } = @temp;
    }
    $report{data} = \@data;
    return \%report;
}

I think you can guess what the functions are trying to do :)

Note that this snippet is for unit testing (making sure my algorithm is correct) rather than integration testing, which is why I localize all of the functions.

The problem is in the output:

Temp: '333 aaa' added to row 0  at C:\test.pl line 39.
Temp: '444 bbb' added to row 1  at C:\test.pl line 39.
Temp: '555 ccc' added to row 2  at C:\test.pl line 39.
$VAR1 = {
          'data' => [
                      {
                        'foo' => '444',
                        'bar' => 'bbb'
                      },
                      $VAR1->{'data'}[0],
                      {
                        'foo' => '555',
                        'bar' => 'ccc'
                      }
                    ],
          'headers' => [ 
                         'foo',
                         'bar'
                       ]
        };

As you can see from the warning lines, I appear to be building my temp data correctly and even assigning it to the correct row in the array ref, but element 0 (444 bbb) should be (333 aaa). What the heck am I doing wrong?

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Why is my data structure wrong?
by kvale (Monsignor) on Jun 24, 2002 at 20:10 UTC
    One problem may be the initialization
    my @data = ({}) x ($rows-1); # need an extra for the header
    This sets up only 2 anon hashes, but you have 3 rows of data.

    -Mark

      No, that wasn't it, but you put me on the right path! You're correct that I had the wrong number of hashrefs, but when I changed the above line to my @data; the output was correct. Thank you :)

      Temp: '333 aaa' added to row 0 at test.pl line 39.
      Temp: '444 bbb' added to row 1 at test.pl line 39.
      Temp: '555 ccc' added to row 2 at test.pl line 39.
      $VAR1 = {
                'data' => [
                            {
                              'foo' => '333',
                              'bar' => 'aaa'
                            },
                            {
                              'foo' => '444',
                              'bar' => 'bbb'
                            },
                            {
                              'foo' => '555',
                              'bar' => 'ccc'
                            }
                          ],
                'headers' => [
                               'foo',
                               'bar'
                             ]
              };

      Now, does anyone care to tell my po' self why this worked? I feel unusually dense today.

      Update: Apparently, using the 'x' operator to populate an array with references fills every element with the same reference, and that is the problem. My intent in populating the array with hashrefs was to get strong type checking:

      my @foo = ([]); @{$foo[0]}{'one'}='uno'

      The above code dies at run time with "Not a HASH reference" because you can't use an array reference as a hash.
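Here is a minimal sketch of the type check Ovid was after. It assumes strict and warnings are on. Dereferencing an array reference as a hash dies at run time, and wrapping the assignment in eval lets a test observe the error instead of aborting. The sketch uses the single-element form $foo[0]{one}, which fails the same way as the slice above.

```perl
use strict;
use warnings;

# Pre-populate element 0 with an ARRAY reference, as in Ovid's snippet.
my @foo = ([]);

# Using that reference as a hash dies at run time; eval traps the
# exception so the caller can inspect $@ instead of aborting.
my $ok = eval { $foo[0]{one} = 'uno'; 1 };

my $result = $ok ? "assigned\n" : "died: $@";
print $result;
```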

      Cheers,
      Ovid


        Yes, using the x operator with references employs the same reference each time. Why? Because that's the most efficient way of doing it, I'd gather, regardless of whether you're using references or not!

        As proof, run these in the Perl debugger. (The first x is the 'display this data' instruction to the debugger.) You could also check the source code (pp.c) where the repeatcpy() C function or macro is employed.

        x map \$_, ("japhy") x 5;
        x ([]) x 5;
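japhy's point can also be checked without the debugger. This minimal sketch uses refaddr from the core Scalar::Util module. Counting the distinct addresses behind ({}) x 3 shows how many hashes were actually built, and a write through one element shows up through another.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# ({}) is evaluated once; x then repeats that single reference three times.
my @data = ({}) x 3;

# Count how many distinct hashes the three elements point to.
my %seen;
$seen{ refaddr($_) }++ for @data;
printf "%d element(s), %d distinct hash(es)\n", scalar @data, scalar keys %seen;

# A write through element 0 is visible through element 2.
$data[0]{foo} = 333;
print "data[2]{foo} is $data[2]{foo}\n";
```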

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

(jeffa) Re: Why is my data structure wrong?
by jeffa (Bishop) on Jun 24, 2002 at 20:17 UTC
    This may or may not help your prob - how about an even simpler unit test:
    use strict;
    use Text::CSV;
    use Data::Dumper;

    my (%report,@headers);
    my $csv = Text::CSV->new();
    while(<DATA>) {
        die unless $csv->parse($_);
        if ($. == 1) {
            @headers = $csv->fields();
            $report{headers} = \@headers;
        }
        else {
            my @row = $csv->fields();
            push @{$report{data}}, {map{$headers[$_]=>$row[$_]} (0..$#row)};
        }
    }
    print Dumper \%report;

    __DATA__
    foo,bar
    333,aaa
    444,bbb
    555,ccc

    Update:
    "This fails because, as I mentioned, I don't have direct access to the CSV data and thus cannot use Text::CSV"

    I probably chose the words 'unit test' poorly. When I find myself in this situation I try to back away and try another route. I only used Text::CSV to abstract that portion of the problem, as the data structure was what was giving you the trouble.

      This fails because I don't have direct access to the CSV data and thus cannot use Text::CSV (though I'd like to). Thanks for the suggestion, though.

      Cheers,
      Ovid


Re: Why is my data structure wrong?
by George_Sherston (Vicar) on Jun 24, 2002 at 23:52 UTC
    I think this is to do with an interesting feature of the x operator. The awfwy squoowy thing is happening right at the initialisation point:
    my @data = ({}) x (3); print Dumper \@data;
    produces
    $VAR1 = [ {}, $VAR1->[0], $VAR1->[0] ];
    ... and not
    $VAR1 = [ {}, {}, {}, ];
    - if it had done that (for which you need @data = ({},{},{});) then the rest of the script would have done what you expected.

    This says something about how the x operator treats hashrefs (and arrayrefs, for that matter): every element gets a copy of the same reference, so *all* the elements point at one data structure, and each of them shows whatever you last stored through any of them.

    It's instructive to put a last if $row == $n; at the end of your foreach my $row.... Then look at the contents of $report->{data} with
    for (0..2) { print $report->{data}->[$_]->{foo}," => ", $report->{data}->[$_]->{bar},"\n"; }
    - and goldarn if there isn't always the same thing in each of the three hashes, albeit a *different* same thing depending on what $n is.

    Strikes me this could be a rather useful feature of the x operator... if only I can remember it until the right moment arises.

    Thanks very much for this stimulating thread, by the way.


    (I think part of the reason it was hard to work out what was going on is that (IMHO) local *MCVE::MCVE_NumRows    = sub { 3 }; should be local *MCVE::MCVE_NumRows    = sub { 4 }; - I think it's meant to be all the rows *including* the headers. Making it one short means that your sub is actually creating *one* hash in @data which is independent of the initialised hashes, and hence with a life of its own. That's why both the last *and* the second last key/value pairs in @rows are represented.)
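George's parenthetical can be reproduced in isolation. This minimal sketch hard-codes the question's three data rows (the @rows layout here is illustrative, not the MCVE interface). The expression ({}) x (@rows - 1) fills two slots with one shared reference. $data[2] is still undef when row 2 is stored, so Perl autovivifies a separate hash there.

```perl
use strict;
use warnings;

my @headers = ('foo', 'bar');
my @rows    = ([333, 'aaa'], [444, 'bbb'], [555, 'ccc']);

# One slot too few, and both slots hold the SAME hash reference.
my @data = ({}) x (@rows - 1);

for my $i (0 .. $#rows) {
    # Rows 0 and 1 write into the shared hash. $data[2] is still undef,
    # so this assignment autovivifies a new, independent hash for row 2.
    @{ $data[$i] }{@headers} = @{ $rows[$i] };
}

print join(' | ', map { "$_->{foo} $_->{bar}" } @data), "\n";
```

The printed line matches the shape of the Dumper output in the question: the second row appears twice, followed by the last row.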



    § George Sherston

Re: Why is my data structure wrong?
by jsprat (Curate) on Jun 24, 2002 at 21:52 UTC
    C:\s\pldir>perl -MData::Dumper -e "@a=({}) x 2;print Dumper @a;"
    $VAR1 = {};
    $VAR2 = $VAR1;

    When the array is initialized, the first element is a hash reference and the second element is the very same reference, not a new hash! I haven't figured out why, yet.

    I ran the test without the array initialization (my @data = ({}) x ($rows-1); # need an extra for the header) and Dumper printed the data structure as expected. Is there a reason you need to set up @data in advance? If so, maybe
    push @data, {} for (1 .. $rows-1); would work better?
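jsprat's push version works because the statement modifier runs the whole push once per item, and each evaluation of {} builds a new anonymous hash. In this minimal sketch, $rows is a hypothetical count of data rows.

```perl
use strict;
use warnings;

my $rows = 3;    # hypothetical count of data rows
my @data;

# The modifier re-runs the push for each value of 1 .. $rows, and every
# evaluation of {} constructs a brand-new anonymous hash.
push @data, {} for 1 .. $rows;

# Writing through one element leaves the others untouched.
$data[0]{foo} = 333;
my $verdict = defined $data[1]{foo} ? 'shared' : 'independent';
print "$verdict\n";
```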

Re: Why is my data structure wrong?
by tachyon (Chancellor) on Jun 24, 2002 at 20:58 UTC

    This appears to be a perl bug. You can see in this very stripped-down example that the correct hashes are made, but they are not pushed into the array (ref or plain) as expected.

    use Data::Dumper;
    my (@data, @col_names, %tmp, $record);
    @data = <DATA>;
    chomp @data;
    @col_names = split ',', shift @data;
    while(my $line = shift @data) {
        last unless $line;
        @tmp{@col_names} = split ',', $line;
        print 'Inside ', Dumper \%tmp;
        push @{$record}, \%tmp;
    }
    push @{$record}, { 'what ', 'the £$%^& ?' };
    print 'Outside ', Dumper $record;
    print 'Record 0 ', %{$record->[0]}, "\n";
    print 'Record 1 ', %{$record->[1]}, "\n";
    print 'Record 2 ', %{$record->[2]}, "\n";
    print 'Record 3 ', %{$record->[3]}, "\n";
    __DATA__
    foo,bar
    333,aaa
    444,bbb
    555,ccc

    __END__
    Inside $VAR1 = {
              'foo' => '333',
              'bar' => 'aaa'
            };
    Inside $VAR1 = {
              'foo' => '444',
              'bar' => 'bbb'
            };
    Inside $VAR1 = {
              'foo' => '555',
              'bar' => 'ccc'
            };
    Outside $VAR1 = [
              {
                'foo' => '555',
                'bar' => 'ccc'
              },
              $VAR1->[0],
              $VAR1->[0],
              {
                'what ' => 'the £$%^& ?'
              }
            ];
    Record 0 foo555barccc
    Record 1 foo555barccc
    Record 2 foo555barccc
    Record 3 what the £$%^& ?

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      That isn't a bug. Your %tmp hash is declared prior to the loop. You are pushing a reference to the same hash each time through the loop (your $record array ref holds a bunch of pointers to the same thing).

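This is a minimal sketch of the fix that reply implies. tachyon's sample input is inlined as a list instead of being read from __DATA__. Declaring my %tmp inside the loop body gives every iteration a fresh hash, so each reference pushed onto $record points somewhere different. Pushing a copy with { %tmp } would also work.

```perl
use strict;
use warnings;

# tachyon's sample input, inlined rather than read from __DATA__.
my @lines = ('foo,bar', '333,aaa', '444,bbb', '555,ccc');

my @col_names = split /,/, shift @lines;
my $record    = [];

for my $line (@lines) {
    # A fresh %tmp is created on every pass, so each \%tmp below
    # refers to a different hash.
    my %tmp;
    @tmp{@col_names} = split /,/, $line;
    push @{$record}, \%tmp;
}

print join(' | ', map { "$_->{foo} $_->{bar}" } @{$record}), "\n";
```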
(tye)Re: Why is my data structure wrong?
by tye (Sage) on Jun 25, 2002 at 16:14 UTC

    I understand your desire to avoid autovivification. I wish there were a Perl pragma to disable it so I could detect certain types of programming mistakes. Since we lack such, my next suggestion is of little practical value.

    However, you can initialize a list of references like so: my @data= map( {}, 1..$rows ); (Just a tiny tidbit.)
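This minimal sketch pulls the thread together: it is Ovid's unit test with the initialization fixed. It assumes, as the mock in the question does, that MCVE_NumRows counts data rows only. The MCVE package below is a hypothetical stand-in, not the real interface. The initialization uses map +{}, which behaves like tye's map( {}, 1..$rows ) and gives each element its own anonymous hash. The leading + simply forces the braces to be read as a hash constructor.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the MCVE interface, mirroring the question's mocks.
{
    package MCVE;
    my @columns = ('foo', 'bar');
    my @cells   = qw/ 333 aaa 444 bbb 555 ccc /;
    sub MCVE_NumRows      { 3 }              # data rows, headers excluded
    sub MCVE_NumColumns   { 2 }
    sub MCVE_GetHeader    { shift @columns }
    sub MCVE_GetCellByNum { shift @cells }
}

sub _get_csv {
    my $rows    = MCVE::MCVE_NumRows();
    my $columns = MCVE::MCVE_NumColumns();

    my @headers = map { MCVE::MCVE_GetHeader($_) } 0 .. $columns - 1;

    # One new anonymous hash per row; ({}) x $rows would share a single hash.
    my @data = map +{}, 1 .. $rows;

    for my $row (0 .. $rows - 1) {
        my @temp = map { MCVE::MCVE_GetCellByNum($_, $row) } 0 .. $columns - 1;
        @{ $data[$row] }{@headers} = @temp;
    }
    return { headers => \@headers, data => \@data };
}

my $report = _get_csv();
print Dumper $report;
```

On current perls, Data::Dumper prints hash keys in no fixed order, so the dump may list 'bar' before 'foo'. Each row nevertheless holds its own values.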

            - tye (but my friends call me "Tye")

Node Type: perlquestion [id://176922]
Approved by arturo
Front-paged by George_Sherston