Ovid has asked for the wisdom of the Perl Monks concerning the following question:
I'm writing some tests for the Mainstreet Credit Verification Engine and I need to create some reports based upon CSV data that the engine parses for me. I don't have access to the raw CSV data, so I have to write a routine that uses their interface. I am trying to create a hashref with one key being the report headers (in the order they exist) and the second key being an arrayref of hashrefs with the data that I need. For example, with csv data like the following:
foo,bar
333,aaa
444,bbb
555,ccc
I'd like to create the following data structure:
$VAR1 = {
'data' => [
{
'foo' => '333',
'bar' => 'aaa'
},
{
'foo' => '444',
'bar' => 'bbb'
},
{
'foo' => '555',
'bar' => 'ccc'
}
],
'headers' => [
'foo',
'bar'
]
};
The code that I am using is as follows (stripped of non-essential features):
#!/usr/bin/perl
use strict;
use Data::Dumper;
# testing _get_csv
local *MCVE::MCVE_Ub = sub { [qw/ 333 444 /] };
local *MCVE::MCVE_NumRows = sub { 3 };
local *MCVE::MCVE_NumColumns = sub { 2 };
my @columns = ('foo', 'bar');
local *MCVE::MCVE_GetHeader = sub { shift @columns };
my @rows = qw/ 333 aaa 444 bbb 555 ccc /;
local *MCVE::MCVE_GetCellByNum = sub { shift @rows };
my $report = _get_csv();
print Dumper $report;
sub _get_csv
{
    my $rows = MCVE::MCVE_NumRows();
    my $columns = MCVE::MCVE_NumColumns();
    my %report;
    my @headers;
    my @data = ({}) x ($rows-1); # need an extra for the header
    foreach my $column ( 0 .. $columns - 1 )
    {
        push @headers, MCVE::MCVE_GetHeader( $column );
    }
    $report{headers} = \@headers;
    foreach my $row ( 0 .. $rows - 1 )
    {
        my @temp;
        foreach my $column ( 0 .. $columns - 1 )
        {
            push @temp, MCVE::MCVE_GetCellByNum($column,$row );
        }
        warn "Temp: '@temp' added to row $row ";
        @{$data[$row]}{ @headers } = @temp;
    }
    $report{data} = \@data;
    return \%report;
}
I think you can guess what the functions are trying to do :)
Note that this snippet is for unit testing (making sure my algorithm is correct) rather than integration testing, which is why I localize all of the functions.
The problem is in the output:
Temp: '333 aaa' added to row 0 at C:\test.pl line 39.
Temp: '444 bbb' added to row 1 at C:\test.pl line 39.
Temp: '555 ccc' added to row 2 at C:\test.pl line 39.
$VAR1 = {
'data' => [
{
'foo' => '444',
'bar' => 'bbb'
},
$VAR1->{'data'}[0],
{
'foo' => '555',
'bar' => 'ccc'
}
],
'headers' => [
'foo',
'bar'
]
};
As you can see from the warning lines, I appear to be building my temp data correctly and even assigning it to the correct row in the array ref, but element 0 (444 bbb) should be (333 aaa). What the heck am I doing wrong?
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
Re: Why is my data structure wrong?
by kvale (Monsignor) on Jun 24, 2002 at 20:10 UTC
One problem may be the initialization
my @data = ({}) x ($rows-1); # need an extra for the header
This sets up only 2 anon hashes, but you have 3 rows of data.
-Mark
No, that wasn't it, but you put me on the right path! You're correct that I had the wrong number of hashrefs, but after changing the above line to my @data; the output is correct. Thank you :)
Temp: '333 aaa' added to row 0 at test.pl line 39.
Temp: '444 bbb' added to row 1 at test.pl line 39.
Temp: '555 ccc' added to row 2 at test.pl line 39.
$VAR1 = {
'data' => [
{
'foo' => '333',
'bar' => 'aaa'
},
{
'foo' => '444',
'bar' => 'bbb'
},
{
'foo' => '555',
'bar' => 'ccc'
}
],
'headers' => [
'foo',
'bar'
]
};
Now, does anyone care to tell my po' self why this worked? I feel unusually dense today.
Update: Apparently, using the 'x' operator to populate an array with references populates it with the same reference for every element, which is the problem. My intent in populating the array with hashrefs was to ensure strong type checking:
my @foo = ([]);
@{$foo[0]}{'one'}='uno'
The above code fails because you can't coerce an array into a hash.
Cheers,
Ovid
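A minimal sketch of the effect described in the update above, assuming nothing beyond core Perl. The array names @shared and @separate are made up for illustration. It compares the x operator, which copies one reference, with a map block that builds a new hash on every pass.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# ({}) x 3 evaluates {} once and copies that one reference three times.
my @shared = ({}) x 3;
$shared[0]{foo} = 333;
$shared[1]{foo} = 444;    # writes into the same hash as element 0

# map runs its block once per element, so each +{} is a new hash.
my @separate = map { +{} } 1 .. 3;
$separate[0]{foo} = 333;
$separate[1]{foo} = 444;

printf "shared:   %s %s\n", $shared[0]{foo},   $shared[2]{foo};
printf "separate: %s %s\n", $separate[0]{foo}, $separate[1]{foo};
print "same hash\n" if refaddr( $shared[0] ) == refaddr( $shared[2] );
```

Every element of @shared reports 444 because there is only one hash behind all three slots, while @separate keeps 333 and 444 apart.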
x map \$_, ("japhy") x 5;
x ([]) x 5;
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
(jeffa) Re: Why is my data structure wrong?
by jeffa (Bishop) on Jun 24, 2002 at 20:17 UTC
This may or may not help your prob - how about an even
simpler unit test:
use strict;
use Text::CSV;
use Data::Dumper;
my (%report,@headers);
my $csv = Text::CSV->new();
while(<DATA>) {
    die unless $csv->parse($_);
    if ($. == 1) {
        @headers = $csv->fields();
        $report{headers} = \@headers;
    }
    else {
        my @row = $csv->fields();
        push @{$report{data}}, {map{$headers[$_]=>$row[$_]} (0..$#row)};
    }
}
print Dumper \%report;
__DATA__
foo,bar
333,aaa
444,bbb
555,ccc
Update:
"This fails because, as I mentioned, I don't have direct access to the CSV data and thus cannot use Text::CSV"
I probably chose the words 'unit test' poorly. When I find
myself in this situation I try to back away and try another
route. I only used Text::CSV to abstract that portion of
the problem, as the data structure was what was giving you
the trouble.
This fails because I don't have direct access to the CSV data and thus cannot use Text::CSV (though I'd like to). Thanks for the suggestion, though.
Cheers,
Ovid
Re: Why is my data structure wrong?
by George_Sherston (Vicar) on Jun 24, 2002 at 23:52 UTC
I think this is to do with an interesting feature of the x operator. The awfwy squoowy thing is happening right at the initialisation point:
my @data = ({}) x (3);
print Dumper \@data;
produces
$VAR1 = [
    {},
    $VAR1->[0],
    $VAR1->[0]
];
... and not
$VAR1 = [
    {},
    {},
    {},
];
- if it had done that (for which you need @data = ({},{},{});) then the rest of the script would have done what you expected.
This tells something about how the x operator treats hashrefs (and arrayrefs for that matter) - the {} is evaluated once and x copies the resulting reference, so *all* the elements point at the same anonymous hash, and they all show whatever you stored through any one of them, last time you set them.
It's instructive to put a last if $row == $n; at the end of your foreach my $row.... Then look at the contents of $report->{data} with
for (0..2) {
    print $report->{data}->[$_]->{foo}," => ", $report->{data}->[$_]->{bar},"\n";
}
- and goldarn if there isn't always the same thing in each of the three hashes, albeit a *different* same thing depending on what $n is.
Strikes me this could be a rather useful feature of the x operator... if only I can remember it until the right moment arises.
Thanks very much for this stimulating thread, by the way.
(I think part of the reason it was hard to work out what was going on is that (IMHO) local *MCVE::MCVE_NumRows = sub { 3 }; should be local *MCVE::MCVE_NumRows = sub { 4 }; - I think it's meant to be all the rows *including* the headers. Making it one short means that your sub is actually creating *one* hash in @data which is independent of the initialised hashes, and hence with a life of its own. That's why both the last *and* the second last key/value pairs in @rows are represented.)
§ George Sherston
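A rough trace of the original symptom, as a sketch rather than the exact test script: the MCVE stubs are replaced here by a plain array of rows (@cells is a made-up name), and @data is initialized the way the original post did it, with two slots for three rows.

```perl
use strict;
use warnings;

my @headers = qw(foo bar);
my @cells   = ( [qw(333 aaa)], [qw(444 bbb)], [qw(555 ccc)] );

# Same initialization as the original: two slots, one shared hash.
my @data = ({}) x 2;

for my $row ( 0 .. $#cells ) {
    # Rows 0 and 1 write through the same reference;
    # row 2 has no slot yet, so Perl autovivifies a fresh hash for it.
    @{ $data[$row] }{@headers} = @{ $cells[$row] };
}

print join( ' ', map { $_->{foo} } @data ), "\n";
```

This prints 444 444 555, which matches the dump in the question: row 1 overwrote row 0 in the shared hash, and only row 2 got a hash of its own.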
Re: Why is my data structure wrong?
by jsprat (Curate) on Jun 24, 2002 at 21:52 UTC
C:\s\pldir>perl -MData::Dumper -e "@a=({}) x 2;print Dumper @a;"
$VAR1 = {};
$VAR2 = $VAR1;
When the array is initialized, you end up with the first element being a hash reference and the second element being that very same reference! I haven't figured out why, yet. I ran the test without the array initialization (my @data = ({}) x ($rows-1); # need an extra for the header) and Dumper printed the data structure as expected. Is there a reason you need to set up @data in advance? Maybe push @data, {} for (1 .. $rows-1); would work better if you do?
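A small sketch of the push loop suggested above, assuming $rows counts data rows only (the value 3 is taken from the example CSV, not from the MCVE interface). Each pass through the loop evaluates {} again, so every slot gets its own hash.

```perl
use strict;
use warnings;

my $rows = 3;    # assumed number of data rows, as in the example CSV

my @data;
push @data, {} for 1 .. $rows;    # {} is evaluated once per iteration

$data[0]{foo} = 333;
$data[1]{foo} = 444;              # does not touch element 0

print "$data[0]{foo} $data[1]{foo}\n";
```

Here the first element keeps 333 after the second is set, which is the behaviour the original initialization was meant to give.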
Re: Why is my data structure wrong?
by tachyon (Chancellor) on Jun 24, 2002 at 20:58 UTC
This appears to be a perl bug. You can see in this very stripped down example that the correct hashes are made but they are not pushed into the array(ref or plain) as expected.
use Data::Dumper;
my (@data, @col_names, %tmp, $record);
@data = <DATA>;
chomp @data;
@col_names = split ',', shift @data;
while(my $line = shift @data) {
    last unless $line;
    @tmp{@col_names} = split ',', $line;
    print 'Inside ', Dumper \%tmp;
    push @{$record}, \%tmp;
}
push @{$record}, { 'what ', 'the £$%^& ?' };
print 'Outside ', Dumper $record;
print 'Record 0 ', %{$record->[0]}, "\n";
print 'Record 1 ', %{$record->[1]}, "\n";
print 'Record 2 ', %{$record->[2]}, "\n";
print 'Record 3 ', %{$record->[3]}, "\n";
__DATA__
foo,bar
333,aaa
444,bbb
555,ccc
__END__
Inside $VAR1 = {
'foo' => '333',
'bar' => 'aaa'
};
Inside $VAR1 = {
'foo' => '444',
'bar' => 'bbb'
};
Inside $VAR1 = {
'foo' => '555',
'bar' => 'ccc'
};
Outside $VAR1 = [
{
'foo' => '555',
'bar' => 'ccc'
},
$VAR1->[0],
$VAR1->[0],
{
'what ' => 'the £$%^& ?'
}
];
Record 0 foo555barccc
Record 1 foo555barccc
Record 2 foo555barccc
Record 3 what the £$%^& ?
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
That isn't a bug. Your %tmp hash is declared prior to the loop. You
are pushing a reference to the same hash each time through the loop
(your $record array ref holds a bunch of pointers to the same thing).
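One way to get a separate hash per record, sketched under the assumption that the input is the same three lines used above (held here in a made-up @lines array instead of being read from DATA): declare %tmp inside the loop so each iteration creates a new hash.

```perl
use strict;
use warnings;

my @lines     = ( '333,aaa', '444,bbb', '555,ccc' );
my @col_names = qw(foo bar);
my $record    = [];

for my $line (@lines) {
    my %tmp;    # declared inside the loop: a new hash each iteration
    @tmp{@col_names} = split /,/, $line;
    push @{$record}, \%tmp;
}

print join( ' ', map { $_->{foo} } @{$record} ), "\n";
```

This prints 333 444 555. If %tmp has to stay outside the loop, pushing a copy with push @{$record}, { %tmp }; should have the same effect.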
(tye)Re: Why is my data structure wrong?
by tye (Sage) on Jun 25, 2002 at 16:14 UTC
I understand your desire to avoid autovivification. I wish there was a Perl pragma to disable it so I could detect certain types of programming mistakes. Since we lack such, my next suggestion is of little practical value.
However, you can initialize a list of references like so:
my @data= map( {}, 1..$rows );
(Just a tiny tidbit.)
- tye (but my friends call me "Tye")
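Pulling the thread together, here is a sketch of the routine with one distinct hash per row. The MCVE subs below are stand-ins modelled on the stubs in the original post, not the real interface, and the sketch assumes MCVE_NumRows counts data rows only. The name get_csv and the variables @names and @cells are made up for illustration.

```perl
use strict;
use warnings;

# Stand-ins for the MCVE interface (assumptions, not the real API).
{
    package MCVE;
    my @names = qw(foo bar);
    my @cells = ( [qw(333 aaa)], [qw(444 bbb)], [qw(555 ccc)] );
    sub MCVE_NumRows      { scalar @cells }
    sub MCVE_NumColumns   { scalar @names }
    sub MCVE_GetHeader    { my ($column) = @_; $names[$column] }
    sub MCVE_GetCellByNum { my ( $column, $row ) = @_; $cells[$row][$column] }
}

sub get_csv {
    my $rows    = MCVE::MCVE_NumRows();
    my $columns = MCVE::MCVE_NumColumns();

    my @headers = map { MCVE::MCVE_GetHeader($_) } 0 .. $columns - 1;
    my @data    = map { +{} } 1 .. $rows;    # one distinct hash per row

    for my $row ( 0 .. $rows - 1 ) {
        # Hash slice: assign this row's cells to the header keys.
        @{ $data[$row] }{@headers}
            = map { MCVE::MCVE_GetCellByNum( $_, $row ) } 0 .. $columns - 1;
    }
    return { headers => \@headers, data => \@data };
}

my $report = get_csv();
print "$_->{foo} $_->{bar}\n" for @{ $report->{data} };
```

Because every slot already holds its own hashref, a row index past the end of @data would still autovivify, so this keeps the pre-initialization only as a partial guard.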