PerlMonks  

Creating a Multi Level Hash from CSV

by workInProgress12 (Novice)
on May 12, 2021 at 19:11 UTC ( [id://11132513]=perlquestion )

workInProgress12 has asked for the wisdom of the Perl Monks concerning the following question:

New to Perl here and have been trying to figure this one out for a couple of days now. I think I need nested foreach loops or something along those lines. The task is to assign the columns to a multi-level hash. Suppose my CSV file looks something like the sample below. Important to note that there are repeated values, which is the case for the provided sample output, (ie info 01 = info11, info02 = info12, etc.), but this is not always the case.

    header1,header2,header3....
    info01,info02,info03...
    info11,info12,info13..
    :      :      :
The hash would look like (with the appropriate brackets)
$VAR1 = { info01 => { info02 => { info03 => { info11 => { info12 => { info13 => {
What I have right now:

    # MODULES
    use strict;
    use warnings;
    use Pod::Usage;
    use Data::Dumper;
    use Getopt::Long;
    use File::Basename;
    use Cwd 'abs_path';
    use Data::Dumper qw(Dumper);
    use Text::CSV;

    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, sep_char => ',' });
    my @columns;
    open(my $input, '<:utf8',"input.csv") or die;
    while (<$input>){
        $csv->parse($_) or die "parse() failed: ";
        my @data = $csv->getline($input);
        for my $i (0..$#data) {
            # push @{$columns[$i]}, $data[$i];
            push @{$columns[$i]}, $data[$i];
        }
    }
    close $input;
    my %hash = map {shift @$_ => $_} @columns;
    use Data::Dumper;
    print Dumper(\%hash);

This gives me all my values on one line, not in the required structure.

Replies are listed 'Best First'.
Re: Creating a Multi Level Hash from CSV
by tybalt89 (Monsignor) on May 12, 2021 at 20:04 UTC

    Something like this? I faked an input file because you did not include a valid input file.

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11132513
    use warnings;

    my %hash;

    while( <DATA> )
      {
      my ( $top, @rest ) = split /,|\n/; # FIXME faking CSV for testing
      $hash{$top} = chain(@rest);
      }

    use Data::Dump 'dd'; dd \%hash;

    sub chain
      {
      return @_ > 1 ? { shift() => chain(@_) } : shift();
      }

    __DATA__
    a,b,c,d,e
    f,g,h,i,j
    xx,yy,zz
    this,is,a,strange,type,of,data,organization

    Outputs:

    {
      a => { b => { c => { d => "e" } } },
      f => { g => { h => { i => "j" } } },
      this => {
        is => {
          a => { strange => { type => { of => { data => "organization" } } } },
        },
      },
      xx => { yy => "zz" },
    }

      Or you could do it functionally rather than explicit recursion using List::Util's reduce (not that that's likely to be of any help to trolls who're unfamiliar with a bog simple recursive sub . . .).

      ## (reduce #(hash-map %2 %1) 1 (reverse [:foo :bar :baz :quux]))
      use feature 'say';   # needed for say()
      use List::Util qw( reduce );
      use Data::Dumper qw( Dumper );

      my $hash = reduce { +{ $b => $a } } 1, reverse( qw( foo bar baz quux ) );
      say Dumper( $hash );
      __END__
      $VAR1 = {
                'foo' => {
                           'bar' => {
                                      'baz' => {
                                                 'quux' => '1'
                                               }
                                    }
                         }
              };

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

      This was so helpful, thank you so much; I've really been struggling. I was wondering if you could explain the sub chain{} function? I'm not at all sure what you're doing there, but it works! Also, do you have any advice as to what I should do if I want to remove the first row so the header doesn't do what the rest of the hash is doing? Again, I really appreciate all your help.
        ... explain the sub chain{} function ...

        The "Perl Magick" here is recursion; it is not peculiar to Perl in any way.

        sub chain { return @_ > 1 ? { shift() => chain(@_) } : shift(); }
        If there are two or more items in the @_ function argument array (@_ > 1), { shift() => chain(@_) } executes. shift takes the top argument off of @_ and makes it a hash key with the value of whatever the call to chain(@_) returns. Note that chain(@_) is called with whatever remains in the @_ array after one item has been removed by the shift call. This key/value pair is then returned within an anonymous hash reference.

        If there are fewer than two items in the @_ array, the call to chain(@_) simply returns the top item in the array. Note that this does not "properly" handle the case in which the @_ array is empty. What is "proper" handling in this case (if it can even arise)? Only you can figure this out.
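        To make the recursion easier to follow, here is a minimal sketch of the same idea written out step by step. The name chain_safe and the choice to return undef for an empty argument list are assumptions made for illustration, not something the OP specified.

```perl
use strict;
use warnings;

# Same recursive idea as chain() above, with the empty-list case made explicit.
# Returning undef for an empty list is an assumption, not the OP's requirement.
sub chain_safe {
    my @items = @_;
    # Explicit undef rather than a bare return: a bare return would yield an
    # empty list inside the hash constructor below and unbalance its pairs.
    return undef if !@items;
    return $items[0] if @items == 1;        # last item becomes the innermost value
    my $key = shift @items;                 # first item becomes the hash key
    return { $key => chain_safe(@items) };  # remaining items nest beneath it
}

# Trace for ('a', 'b', 'c'):
#   chain_safe('a','b','c') -> { a => chain_safe('b','c') }
#   chain_safe('b','c')     -> { b => chain_safe('c') }
#   chain_safe('c')         -> 'c'
my $nested = chain_safe(qw(a b c));
print $nested->{a}{b}, "\n";    # prints "c"
```

        Each call peels one item off the front, so the depth of the result is one less than the number of items passed in.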

        Win8 Strawberry 5.8.9.5 (32) Thu 05/13/2021 15:26:20
        C:\@Work\Perl\monks
        >perl -Mstrict -Mwarnings
        my %hash;

        # my ( $top, @rest ) = split /,|\n/; # FIXME faking CSV for testing
        my ($top, @rest) = qw(foo);
        $hash{$top} = chain(@rest);

        use Data::Dump 'dd'; dd \%hash;

        sub chain
          {
          return @_ > 1 ? { shift() => chain(@_) } : shift();
          }
        ^Z
        { foo => undef }

        ... remove the first row so the header doesn't do what the rest of the hash is doing.

        One might do the following:
            open(my $input, '<:utf8',"input.csv") or die;

            <$input>;  # read and discard one line/record
        (I think Text::CSV can do this for you (it does just about everything else :), but you'll have to check for yourself.)
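        As a rough sketch of the same header-skipping idea using Text::CSV itself: getline reads and parses one record per call and returns an array reference, so calling it once before the loop sets the header aside. The sample data and the in-memory filehandle are assumptions so the snippet runs on its own; with a real file you would open input.csv as in the OP's code. Note that this reads every record with getline alone; mixing it with <$input> in the loop condition, as in the original post, consumes a line before getline gets to see it.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample data held in a string so the sketch is self-contained;
# in the thread's setup this would be the contents of input.csv.
my $sample = "header1,header2,header3\n"
           . "info01,info02,info03\n"
           . "info11,info12,info13\n";
open(my $input, '<', \$sample) or die "Cannot open in-memory file: $!";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

my $header = $csv->getline($input);          # first record: the header, set aside
my @rows;
while (my $row = $csv->getline($input)) {    # each record arrives as an array ref
    push @rows, $row;
}
close $input;

print scalar(@rows), " data rows\n";         # prints "2 data rows"
```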


        Give a man a fish:  <%-{-{-{-<

Re: Creating a Multi Level Hash from CSV
by Marshall (Canon) on May 12, 2021 at 19:48 UTC
    Please forgive me if I ask this, but what are you trying to accomplish?
    Why do you think that you need the data structure that you are asking for?
    How does the rest of the code use this?
    I suspect that what you are asking for is not what you really need.
    We call this an X-Y problem.
    Please enlighten me.

    Perl is amazing. If you "back up" just a bit to explain a bit more about the application, my suspicion is that you will get some ideas that you haven't even considered.

    You give:

    info01,info02,info03...
    info11,info12,info13..
    And then:
    $VAR1 = { info01 => { info02 => { info03 => { info02 => { info12 => { info13 => {
    What happened to info11? Why does info02 become a higher level hash key?

    I re-read your problem statement. Sounds like you have some of what I would call aliases. Perhaps "Cathy Smith" is the same person as "Cathy Smith Jones" or whatever. I am not sure what is going on.

      OP definitely needs to elaborate for a better answer, but I've seen this kind of CSV-ish structure used once or twice to define something like a tree structure (not that it's a great idea, more there was a CSV hammer and hence . . .).

      Edit: Just to be slightly more concrete, they had something like this YAML

      ---
      Top A:
        - A1
        - A2:
          - A2.1
          - A2.2
      Top B:
        - B1:
          - B1.1
          - B1.2

      And they represented it as something like:

      Top A,
      Top A,A1,
      Top A,A2,
      Top A,A2,A2.1
      Top A,A2,A2.2
      Top B,
      Top B,B1
      Top B,B1,B1.1
      Top B,B1,B1.2

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

      I'm really sorry about the confusion; I've fixed the sample output now. The end goal is to print the CSV file with the fields separated by |, which I realize could be done with a simple split and join (which I was able to do), but I was asked specifically to do it with a multi-level hash. Gotta do what the boss asks, I guess.
        ... fixed the sample output ...

        Further to Marshall's post:

        ... it is good form to preserve the original version ...
        Please see How do I change/delete my post? The bottom line: Please do not destroy context.


        Give a man a fish:  <%-{-{-{-<

        I see your updates. BTW, when you update your post, it is good form to preserve the original version so later readers can figure out what is going on. You can "hide", say, a previous section with <readmore> tags. Note: I made a typo (<readme>), which works out to be a demo of a method of updating a post: the <strike> tags!

        I am still confused by this: "Important to note that there are repeated values, which is the case for the provided sample output, (ie info 01 = info11, info02 = info12, etc.), but this is not always the case." As you can see, there have been multiple interpretations of your problem statement. My interpretation is shown below; it allows for duplicated values in the CSV line(s).

        In general, the more you tell us about your problem, the more helpful the Monks can be. I didn't understand any more about what you are actually doing with this data structure. You may find that is a very awkward thing to work with.

        I would be curious to know how close my "crystal ball" got.

        use strict;
        use warnings;
        use Data::Dumper;

        my %hash;

        foreach my $line (<DATA>)  #simulated simple CSV file
        {
            my @cols = (split /,|\n/,$line);
            my $href = \%hash;
            while (my $col = shift @cols)
            {
                $href->{$col} = {} unless (exists ($href->{$col}) and keys %{$href->{$col}} );
                $href = $href->{$col};
            }
        }
        print Dumper \%hash;

        =OUTPUT:
        $VAR1 = {
                  'x' => {
                           '5' => {
                                    '6' => {}
                                  }
                         },
                  'a' => {
                           'x' => {
                                    'e' => {},
                                    'm' => {},
                                    'c' => {}
                                  },
                           'b' => {
                                    'c' => {}
                                  },
                           'y' => {
                                    'h' => {}
                                  }
                         }
                };
        =cut

        __DATA__
        a,x,c
        a,x,e
        a,y,h
        a,x,m
        a,b,c
        x,5,6
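        Since the stated end goal is |-separated output, here is one possible sketch of walking a hash of hashes shaped like the Dumper output above and turning each leaf path back into a line. The function name paths and the small sample tree are made up for illustration; keys are sorted only to make the output repeatable.

```perl
use strict;
use warnings;

# Walk a hash of hashes and return one "|"-joined string per leaf path.
# Keys are sorted so that the output order is repeatable.
sub paths {
    my ($node, @prefix) = @_;
    return join('|', @prefix) unless %$node;    # an empty hash marks a leaf
    return map { paths($node->{$_}, @prefix, $_) } sort keys %$node;
}

# A small hypothetical tree in the same shape as the Dumper output above.
my %tree = (
    a => { b => { c => {} }, x => { c => {}, e => {} } },
    x => { 5 => { 6 => {} } },
);

print "$_\n" for paths(\%tree);
# a|b|c
# a|x|c
# a|x|e
# x|5|6
```

        Keep in mind that a hash preserves neither the original row order nor duplicate rows, so the output will not necessarily match the input file line for line.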
Re: Creating a Multi Level Hash from CSV
by 1nickt (Canon) on May 13, 2021 at 11:16 UTC

    Hi,

    Important to note that there are repeated values, which is the case for the provided sample output, (ie info 01 = info11, info02 = info12, etc.), but this is not always the case.

    Although your sample data does not correspond with this comment, you should consider that hashes have unique keys.

    use strict; use warnings; use feature 'say';
    use Data::Dumper;

    my %hash;

    while (<DATA>) {
        chomp;
        my @items = split ',';
        $hash{ $items[0] } = $items[1];
    }

    say Dumper \%hash;

    __DATA__
    a,b
    a,c
    x,y
    Output:
    $ perl 11132513.pl
    $VAR1 = {
              'a' => 'c',
              'x' => 'y'
            };
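    If every value for a repeated key has to be kept, one common approach is to store an array reference per key instead of a single scalar. This is only a sketch under that assumption; the @lines array stands in for the sample data above.

```perl
use strict;
use warnings;

# Hypothetical input rows, matching the sample data above.
my @lines = ('a,b', 'a,c', 'x,y');

my %hash;
for my $line (@lines) {
    my ($key, $value) = split /,/, $line;
    push @{ $hash{$key} }, $value;    # autovivifies an array ref for each new key
}

print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
# a => b c
# x => y
```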

    Hope this helps!


    The way forward always starts with a minimal test.
