http://www.perlmonks.org?node_id=1192516

Lady_Aleena has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I'm currently rewriting one of my more loathed subs with Text::CSV_XS. I was wondering if there were a way to write it so that it returns key-value pairs instead of an array or hash of hashes when there are only two fields in the file.

My code so far...

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; use Text::CSV_XS qw(csv); sub make_list { my (%opt) = @_; my $headers = $opt{'headings'}; my $filters; for my $header (@$headers) { if ($header =~ s/\+$//) { $filters->{$header} = sub { $_ = [split(/;\s*/, $_)]; $_ }; } } my $list = csv ( in => $opt{file}, headers => $headers, key => $opt{key} ? $headers->[0] : undef, filter => $filters, sep_char => '|', quote_char => undef, empty_is_undef => 1, allow_whitespace => 1, ) || die Text::CSV_XS->error_diag; return $list; } my $file = 'Artifacts.txt'; my $headers = ['item', 'seen in+']; print Dumper make_list( file => $file, headings => $headers);

The file...

Crystal Skull|Stargate SG-1, Crystal Skull; The Librarian: Return to K +ing Soloman's Mines; Indiana Jones and the Kingdom of the Crystal Sku +ll Spear of Destiny|The Librarian: Quest for the Spear; Hellboy; Constant +ine Book or Key of Soloman|The Librarian: Return to King Soloman's Mines; +Season of the Witch Necronomicon|H.P. Lovecraft Ark of the Covenant|Raides of the Lost Ark; The Librarian: Quest for t +he Spear TARDIS|Doctor Who; The Sarah Jane Adventures; The Librarians, And the +Final Curtain Doc Brown's Delorean|Back to the Future; Back to the Future Part II; B +ack to the Future Part III; The Librarians, And the Final Curtain Pandora's Box|Warehouse 13; The Librarian: Quest for the Spear Excalibur|Excalibur; The Last Legion; The Librarian: Quest for the Spe +ar H.G. Wells' Time Machine|The Librarians; Warehouse 13 Holy Grail|Indiana Jones and the Last Crusade; The Librarian: Quest fo +r the Spear

So what I want from this file is the first field to be the keys and the second field to be an array ref. Here is an example of what I want.

$VAR1 = { 'Ark of the Covenant' => [ 'Raides of the Lost Ark', 'The Librarian: Quest for the Spe +ar' ], 'Necronomicon' => [ 'H.P. Lovecraft' ], 'Holy Grail' => [ 'Indiana Jones and the Last Crusade', 'The Librarian: Quest for the Spear' ], 'H.G. Wells\' Time Machine' => [ 'The Librarians', 'Warehouse 13' ], 'Spear of Destiny' => [ 'The Librarian: Quest for the Spear' +, 'Hellboy', 'Constantine' ], 'Book or Key of Soloman' => [ 'The Librarian: Return to King + Soloman\'s Mines', 'Season of the Witch' ], 'Crystal Skull' => [ 'Stargate SG-1, Crystal Skull', 'The Librarian: Return to King Soloman\ +'s Mines', 'Indiana Jones and the Kingdom of the C +rystal Skull' ], 'Doc Brown\'s Delorean' => [ 'Back to the Future', 'Back to the Future Part II', 'Back to the Future Part III', 'The Librarians, And the Final +Curtain' ], 'TARDIS' => [ 'Doctor Who', 'The Sarah Jane Adventures', 'The Librarians, And the Final Curtain' ], 'Pandora\'s Box' => [ 'Warehouse 13', 'The Librarian: Quest for the Spear' ], 'Excalibur' => [ 'Excalibur', 'The Last Legion', 'The Librarian: Quest for the Spear' ] };

Is it possible with Text::CSV_XS?

No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
Lady Aleena

Replies are listed 'Best First'.
Re: Can Text::CSV_XS return key-value pairs?
by Tux (Canon) on Jun 11, 2017 at 11:48 UTC

    Something like this?

    use 5.18.2; use warnings; use Data::Peek; use Text::CSV_XS qw(csv); my $file = "pm-1192516.csv"; sub make_list_2 { my %opt = @_; my $headers = $opt{headings}; my %foo; my $list = csv ( in => $opt{file}, headers => $headers, sep_char => "|", quote_char => undef, empty_is_undef => 1, allow_whitespace => 1, auto_diag => 1, on_in => sub { push @{$foo{$_{item}}}, split m/;\s*/ => $_{"seen in+"} }, ); return \%foo; } DDumper make_list_2 (file => $file, headings => [ "item", "seen in+" ] +);

    ->

    { 'Ark of the Covenant' => [ 'Raides of the Lost Ark', 'The Librarian: Quest for the Spear' ], 'Book or Key of Soloman' => [ 'The Librarian: Return to King Soloman\'s Mines', 'Season of the Witch' ], 'Crystal Skull' => [ 'Stargate SG-1, Crystal Skull', 'The Librarian: Return to King Soloman\'s Mines', 'Indiana Jones and the Kingdom of the Crystal Skull' ], 'Doc Brown\'s Delorean' => [ 'Back to the Future', 'Back to the Future Part II', 'Back to the Future Part III', 'The Librarians, And the Final Curtain' ], Excalibur => [ 'Excalibur', 'The Last Legion', 'The Librarian: Quest for the Spear' ], 'H.G. Wells\' Time Machine' => [ 'The Librarians', 'Warehouse 13' ], 'Holy Grail' => [ 'Indiana Jones and the Last Crusade', 'The Librarian: Quest for the Spear' ], Necronomicon => [ 'H.P. Lovecraft' ], 'Pandora\'s Box' => [ 'Warehouse 13', 'The Librarian: Quest for the Spear' ], 'Spear of Destiny' => [ 'The Librarian: Quest for the Spear', 'Hellboy', 'Constantine' ], TARDIS => [ 'Doctor Who', 'The Sarah Jane Adventures', 'The Librarians, And the Final Curtain' ] }

    Enjoy, Have FUN! H.Merijn

      Tux, can I add that to what I already have? Also, the value will not always be split. So can the value be first put through the filters then go to the on_in? And if this needs to use an additional hash, then how do I get everything to use the outside hash, the one you called %foo?

      sub make_list { my (%opt) = @_; my $file = $opt{file} && ref($opt{file}) eq 'ARRAY' ? data_file(@{$o +pt{file}}) : $opt{file}; my $headers = $opt{'headings'} ? $opt{'headings'} : ['heading']; my $filters; for my $header (@$headers) { if ($header =~ s/\+$//) { $filters->{$header} = sub { $_ = [split(/;\s*/, $_)]; $_ }; } } my $list = csv ( in => $file, headers => $headers, key => $opt{key} ? $headers->[0] : undef, filter => $filters, sep_char => '|', quote_char => undef, empty_is_undef => 1, allow_whitespace => 1, auto_diag => 1, ); return $list; }

      I did not want to include the code I am trying to replace with an all in one, but it looks like I have to.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena

        Like this then? (only requirement is to have "item" a required field

        use 5.18.2; use warnings; use Data::Peek; use Text::CSV_XS qw(csv); my $file = "pm-1192516.csv"; sub make_list { my %opt = @_; my %plus; my @head = map { s/\+$// && $plus{$_}++; $_ } @{$opt{headings}}; my %foo; my $list = csv ( in => $opt{file}, headers => \@head, sep_char => "|", quote_char => undef, empty_is_undef => 1, allow_whitespace => 1, auto_diag => 1, on_in => sub { foreach my $f (@head) { $f eq "item" and next; push @{$foo{$_{item}}{$f}}, $plus{$f} ? split m/;\s*/ +=> $_{$f} : $_{$f}; } }, ); return \%foo; } DDumper make_list (file => $file, headings => [ "item", "seen in+" ]);

        ->

        { 'Ark of the Covenant' => { 'seen in' => [ 'Raides of the Lost Ark', 'The Librarian: Quest for the Spear' ] }, 'Book or Key of Soloman' => { 'seen in' => [ 'The Librarian: Return to King Soloman\'s Mines', 'Season of the Witch' ] }, 'Crystal Skull' => { 'seen in' => [ 'Stargate SG-1, Crystal Skull', 'The Librarian: Return to King Soloman\'s Mines', 'Indiana Jones and the Kingdom of the Crystal Skull' ] }, 'Doc Brown\'s Delorean' => { 'seen in' => [ 'Back to the Future', 'Back to the Future Part II', 'Back to the Future Part III', 'The Librarians, And the Final Curtain' ] }, Excalibur => { 'seen in' => [ 'Excalibur', 'The Last Legion', 'The Librarian: Quest for the Spear' ] }, 'H.G. Wells\' Time Machine' => { 'seen in' => [ 'The Librarians', 'Warehouse 13' ] }, 'Holy Grail' => { 'seen in' => [ 'Indiana Jones and the Last Crusade', 'The Librarian: Quest for the Spear' ] }, Necronomicon => { 'seen in' => [ 'H.P. Lovecraft' ] }, 'Pandora\'s Box' => { 'seen in' => [ 'Warehouse 13', 'The Librarian: Quest for the Spear' ] }, 'Spear of Destiny' => { 'seen in' => [ 'The Librarian: Quest for the Spear', 'Hellboy', 'Constantine' ] }, TARDIS => { 'seen in' => [ 'Doctor Who', 'The Sarah Jane Adventures', 'The Librarians, And the Final Curtain' ] } }

        Enjoy, Have FUN! H.Merijn
Re: Can Text::CSV_XS return key-value pairs?
by choroba (Cardinal) on Jun 11, 2017 at 12:40 UTC
    You can use Text::CSV_XS for handling both the steps, splitting on "|" and splitting on ";". I wasn't able to achieve this using the csv method, so I had to go more procedural:
    #!/usr/bin/perl use warnings; use strict; use Text::CSV_XS qw(csv); use Data::Dumper; my $file = shift; my $csv = 'Text::CSV_XS'->new({ sep_char => '|', quote_char => undef, empty_is_undef => 1, allow_whitespace => 1, }) or die 'Text::CSV_XS'->error_diag; open my $CSV, '<', $file or die $!; my %structure; my $inner = 'Text::CSV_XS'->new({ sep_char => ';', quote_char => undef, allow_whitespace => 1, }) or die 'Text::CSV_XS'->error_diag; while (my $row = $csv->getline($CSV)) { $inner->parse($row->[1]); $structure{ $row->[0] } = [ $inner->fields ]; } print Dumper(\%structure);

    Update: Fixed along the suggestions by Tux below.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      That is close, but not what the OP requested. The second Text::CSV_XS->new also needs an allow_whitespace => 1. I'd also put the constructor *outside* of the second loop. No need to create it on every iteration. And yes, it is safer than using split, but I did not want to be pedantic


      Enjoy, Have FUN! H.Merijn

      choroba, could you please include where you got the headings/headers? You used the method new() instead of the function csv(). Since new() doesn't have a header option, I don't know where the headers are invoked if they are not the first line of the file being parsed.

      As for the secondary new() splitting (or whatever does it) at ;, that would be triggered whenever a header has a + at the end of it (and the + being removing from the header after parsing). So, for example, my movies.txt has the following headers...

      'headers' => ['title','start year','end year',qw(media format+ Wikiped +ia allmovie IMDb TV.com Flixster genre+ source company)],

      format and genre would be parsed at the ; while all other fields are strings.

      So for a file where I only want a key-value pair for each line, it looks like I will have to put in 2 headers, and if the second one has a + it is to be parsed with the value being split (or whatever) on the ;.

      I gravitated directly to csv() because it looked easier to use than new() since I could not figure out what did what. Like what makes an array of hashes and what makes a hash of hashes (what I use mostly). The whatchamacallits (like getline, parse, etc) are not grouped together in such a way as to make it obvious to me.

      So, would you please expand the code so I can see everything you are doing? I am a bit lost.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena
        I used new instead of csv because it gives you more control over what you can do. Here, no headers are defined at all, which means the row is split into an array reference $row. The second part of the row is processed by the very same module, as you recognised, to split the string on semicolons—if I understand your comment correctly, you'd need to do that for each column with the + in the general case. The output structure is being built on line 31, where the first part of the line is used as the key, and the result of the secondary processing is used as the value.

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,