Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

How to combine rows of a spreadsheet

by wrinkles (Pilgrim)
on Sep 09, 2010 at 01:22 UTC ( [id://859450]=perlquestion: print w/replies, xml ) Need Help??

wrinkles has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks I need your help, PayPal produces an IPN history of shopping cart transactions in a csv format, with the first row of the report as column heading. Then the subsequent transactions span at least two rows: the first row is general info, and the (immediately) following rows contain product info (and redundant information). I want to flatten the data for one transaction into one row, and add relevant product headings as I go. For example:
Txn ID, Name , Type , Product ID, Qty, Gross 010 , Moe , Cart , , 1 , 15 010 , Moe , Item , widget-1 , 1 , 15 020 , Larry, Cart , , 3 , 35 020 , Larry, Item , widget-1 , 1 , 15 020 , Larry, Item , widget-2 , 2 , 20
And I'd like to convert to:
Txn ID, Name , Qty, Gross, widget-1 Qty, widget-1 Gross, widget-2 Qt +y, widget-2 Gross 010 , Moe , 1 , 20 , 1 , 15 , + , 020 , Larry, 3 , 35 , 1 , 15 , 2 + , 20
Of course the actual files have more fields and I left out the quote marks and added spaces for legibility. And I actually don't care if the columns are not in a particular order, as well as they contain the correct data. My strategy is to iterate down through the rows, adding unique column names (hash keys) and assigning an order (auto-incremented). Then as I iterate through the data, I can know which row to put the data in. Here's the code so far. It mostly works, but the final transaction is overwriting all the others, so I'm getting a spreadsheet with the same transaction multiple time. I also need to add the crucial Item data. I am planning to create new Item-specific headings to append to the data. Any help appreciated.
#!/usr/bin/perl use strict; use Text::CSV; use Data::Dumper; my $report_file = '/files/report.csv'; my $csv_file = '/files/custom.csv'; my $csv = Text::CSV->new ( { allow_whitespace => 1, binary => 1, sep_c +har => ',' } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); my $csv_fh; open ($csv_fh, "<:encoding(utf8)", $csv_file) or die "$csv_file : $!"; # declare column names for getline_hr $csv->column_names($csv->getline($csv_fh)) or die "$csv_file : $!"; my @records = (); my @temp = (); my %column_order = (); my $ci= 0; while (my $row = $csv->getline_hr($csv_fh)) { my $key; my $value; if (${$row}{'Type'} eq 'Shopping Cart Payment Received') { + # push (@records, \@temp) if @temp; EDITED: THIS LINE RESULTED I +N OVERWRITTEN DATA push (@records, [@temp]) if @temp; # THIS SEEMS TO WORK, S +EE MY COMMENT BELOW @temp = (); while (($key, $value) = each %{$row}) { ${%column_order}{$key} = $ci++ unless (defined ${%column_o +rder}{$key}); $temp["${%column_order}{$key}"] = $value; + } } elsif (${$row}{'Type'} eq 'Shopping Cart Item') { # more code to add append to first row ) } my $report_fh; open ($report_fh, ">:encoding(utf8)", $report_file) or die "$report_fi +le : $!"; for my $key (sort {$column_order{$a}<=>$column_order{$b}} keys %column +_order) { print $report_fh "$key" , ", "; } print $report_fh "\n"; for my $record (@records) { print $report_fh join(", ",@$record), "\n"; } print $report_fh "\n";

Replies are listed 'Best First'.
Re: How to combine rows of a spreadsheet
by JavaFan (Canon) on Sep 09, 2010 at 09:04 UTC
    Here's a quick program that does what you request. It doesn't do full CSV parsing (just splitting on commas), but I presume you don't need help with that (use a module). The output isn't lined up vertically either - but that shouldn't matter either I presume.
    #!/usr/bin/perl use 5.010; use strict; use warnings; my %data; my %seen; while (<DATA>) { chomp; my ($id, $name, $type, $pid, $qty, $gross) = split /\s*,\s*/; given ($type) { when ('Item') { $data{$id,$name}{$type}{$pid}{qty} += $qty; $data{$id,$name}{$type}{$pid}{gross} += $gross; $seen{$pid} = 1; next; } when ('Cart') { $data{$id,$name}{$type}{qty} = $qty; $data{$id,$name}{$type}{gross} = $gross; } } } my @pids = sort keys %seen; print "Txn ID, Name, Qty, Gross"; print ", $_ Qty, $_ Gross" for @pids; print "\n"; foreach my $ckey (sort keys %data) { my ($id, $name) = split $;, $ckey; print "$id, $name, ", $data{$ckey}{Cart}{qty} // "", ", ", $data{$ckey}{Cart}{gross} // ""; foreach my $pid (@pids) { print ", ", $data{$ckey}{Item}{$pid}{qty} // "", ", ", $data{$ckey}{Item}{$pid}{gross} // ""; } print "\n"; } __DATA__ Txn ID, Name , Type , Product ID, Qty, Gross 010 , Moe , Cart , , 1 , 15 010 , Moe , Item , widget-1 , 1 , 15 020 , Larry, Cart , , 3 , 35 020 , Larry, Item , widget-1 , 1 , 15 020 , Larry, Item , widget-2 , 2 , 20
    Output:
    Txn ID, Name, Qty, Gross, widget-1 Qty, widget-1 Gross, widget-2 Qty, +widget-2 Gross 010, Moe, 1, 15, 1, 15, , 020, Larry, 3, 35, 1, 15, 2, 20
Re: How to combine rows of a spreadsheet
by wrinkles (Pilgrim) on Sep 09, 2010 at 04:27 UTC
    I found a clue to the disappearing records here Re: How do i create array of arrays. Changing:
    push (@records, \@temp) if @temp;
    to:
    push (@records, [@temp]) if @temp;
    Gets around the problem of my records being over-written. I welcome any pointers on the rest of the code or the overall strategy. TIA
Re: How to combine rows of a spreadsheet
by wwe (Friar) on Sep 09, 2010 at 09:24 UTC
    Hi wrinkles, here a is an attempt of my. You need to adopt the output format to meet you requirements.

    I'm going through an input file and add all transactions and items of the transaction to a hash %input, which gives you a structure like this:

    $VAR1 = { '010' => { 'Gross' => '15', 'name' => 'Moe', 'widget-2' => { 'Gross' => '15', 'Qty' => '1' }, 'qty' => '1' }, '020' => { 'Gross' => '35', 'name' => 'Larry', 'widget-1' => { 'Gross' => '15', 'Qty' => '1' }, 'widget-2' => { 'Gross' => '20', 'Qty' => '2' }, 'qty' => '3' } };
    Be aware that input file should contain only unique transaction ids and no repeating items per transaction as there is no error handling or accumulation if multiple item lines with the same item id are included per transaction. If you need this it's easy to implement. After you have all data in a hash it's easy to iterate over it and create a desired output. Here is the complete script
    # perl # http://www.perlmonks.org/?node_id=859450 use strict; use warnings; use diagnostics; use 5.010; use Data::Dumper; my %input; my $line = <DATA>; # skip 1st line while ( my $line = <DATA> ) { chomp $line; my ($tx, $name, $type, $product, $qty, $gross) = split /\s*,\s*/, + $line; #say "$tx, $name, $type, $product, $qty, $gross"; if ( $type eq 'Cart') { #say 'process Cart line'; #say "data is $tx, $name, $qty, $gross"; $input{$tx} = { 'name' => $name, 'qty' => $qty, 'Gross'=> $gross, } } elsif ( $type eq 'Item' ) { #say 'process Item line'; #say "data is $tx, $name, $qty, $gross"; $input{$tx} = { %{$input{$tx}}, $product => { 'Qty' => $qty, 'Gross' => $gross, }, } } else { warn 'line not recognized'; } } #say Dumper(\%input); my @items = qw( widget-1 widget-2); # all available items, need to be +defined in advance to have a stable order of output items say 'Txn ID, Name , Qty, Gross, widget-1 Qty, widget-1 Gross, widget +-2 Qty, widget-2 Gross'; foreach my $tx (keys %input) { #say "prcess transaction $tx"; my $lineout = "$tx, $input{$tx}{'name'}, $input{$tx}{'qty'},$input +{$tx}{'Gross'}, "; foreach my $item ( @items ) { #say "found key $key"; #say "found data $input{$tx}{$key}"; #say "process item $input{$tx}{$item}"; $lineout .= ", "; $lineout .= $input{$tx}{$item}{'Qty'} // ''; $lineout .= ", "; $lineout .= $input{$tx}{$item}{'Gross'} // ''; } say $lineout; } __DATA__ Txn ID, Name , Type , Product ID, Qty, Gross 010 , Moe , Cart , , 1 , 15 010 , Moe , Item , widget-2 , 1 , 15 020 , Larry, Cart , , 3 , 35 020 , Larry, Item , widget-1 , 1 , 15 020 , Larry, Item , widget-2 , 2 , 20
    which creates this output:
    Txn ID, Name , Qty, Gross, widget-1 Qty, widget-1 Gross, widget-2 Qt +y, widget-2 Gross 010, Moe, 1,15, , , , 1, 15 020, Larry, 3,35, , 1, 15, 2, 20
    I made it easier to me using the build-in DATA handle so you'll need to adopt the code for reading the files and writing output to a file.

    P.S. I assume you have an error in the first line of the example output as it shows a total of 20 items included in the first transaction but only 15 are shown in the item line.

Re: How to combine rows of a spreadsheet
by wrinkles (Pilgrim) on Sep 09, 2010 at 19:48 UTC
    Many thanks to JavaFan and wwe for much-appreciated help. I will learn from your approaches to this problem. After replacing the code where I pushed an arrayref to pushing an anonymous array, my code fell into place. So I've pasted it below for completeness. (correction added 9-22-2010)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://859450]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (7)
As of 2024-04-23 20:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found