http://www.perlmonks.org?node_id=846603

walkingthecow has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks! I have a question on mapping data, and the best way to do it. I always find a way to do it, but I am almost positive that my way is not the best way. It may work, but I want it to look good too ;).

So, let's assume we have a file, that we for some odd reason cannot sort, and that file contains something like this:

sales:bob@foo.com
sales:joe@foo.com
retail:steve@bar.com
sales:debbie@foo.com
sales:john@foo.com
support:david@blah.com
retail:judy@bar.com
support:jose@blah.com
Now, suppose we need to take that and get output like this:
sales:bob@foo.com,joe@foo.com,debbie@foo.com,john@foo.com
retail:steve@bar.com,judy@bar.com
support:david@blah.com,jose@blah.com
I have always found a way to do it, using hash of hashes, like in the code below:
my %hash;
while (<>) {
    my ($dept,$email)=split(/:/);
    # Each assignment overwrites the previous one, so only the last email per department is kept
    $hash{$dept}{'email'} = $email;
}
That is a horrible way to do it, and I know this. One thing that has been very difficult for me in Perl is wrapping my mind around the best way to map data like this.

Any help/input is really appreciated. Thank you Monks!

By the way, sorry I only provide a short snippet of code, but at the moment I cannot come up with a good way to handle the problem given in the example :(

Re: Best Way to Map Data
by toolic (Bishop) on Jun 25, 2010 at 21:01 UTC
    I think it would be better as a Hash-of-Arrays:
    use strict;
    use warnings;

    my %hash;
    while (<DATA>) {
        chomp;
        my ($dept,$email)=split(/:/);
        push @{ $hash{$dept} }, $email;
    }
    for (sort keys %hash) {
        print "$_:", join(',', @{ $hash{$_} }), "\n";
    }

    __DATA__
    sales:bob@foo.com
    sales:joe@foo.com
    retail:steve@bar.com
    sales:debbie@foo.com
    sales:john@foo.com
    support:david@blah.com
    retail:judy@bar.com
    support:jose@blah.com

    Prints:

    retail:steve@bar.com,judy@bar.com
    sales:bob@foo.com,joe@foo.com,debbie@foo.com,john@foo.com
    support:david@blah.com,jose@blah.com

      Very simple, very sexy, and quite elegant. Thank you toolic. I know hashes, arrays, hash of hashes, array of arrays, and now thanks to you and the link you provided, I know of the almighty hash of arrays.

      Thank You!

        At some point the penny will drop for you and you will realise that you can generate data structures in Perl to match whatever you are trying to achieve. But first you need to figure out what you want to achieve.

        Indeed, in Perl it is generally trivial to match a structure to the form of your data, although it is not always so easy to manage the structure you've created. Consider, for example, that you have a file containing groups of three lines, where the first line is a header, the second line is a list of objects, and the third line is a mapping between objects and names. You could end up with a structure that looks like:

        my @records = (
            [ "Record 1", [qw(obj1 obj2 obj3)], { obj1 => ['Sam'], obj3 => ['Bob'] } ],
            [ "Record 2", [qw(obj1 obj5 obj6)], { obj5 => ['Lew', 'Fred'], obj6 => ['Bob'] } ],
        );

        So, what are you going to call that puppy? When your structures start getting that interesting, it is time to wrap them up with some code, most likely by turning the elements of @records into Record objects and providing appropriate accessor methods for the various parts of the record.
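
        As a rough illustration of that idea, here is a minimal sketch of what such a Record class might look like. The package name, constructor arguments, and accessor names (header, objects, names_for) are all assumptions made up for this example; nothing in the thread prescribes them.

```perl
use strict;
use warnings;

# Hypothetical Record class: all names here are illustrative assumptions.
package Record;

sub new {
    my ($class, $header, $objects, $names_for) = @_;
    my $self = {
        header    => $header,       # string from the first line of a group
        objects   => $objects,      # array ref from the second line
        names_for => $names_for,    # hash ref: object => array ref of names
    };
    return bless $self, $class;
}

sub header  { return $_[0]{header} }
sub objects { return @{ $_[0]{objects} } }

# Names mapped to one object; an empty list if the object has no mapping.
sub names_for {
    my ($self, $object) = @_;
    return @{ $self->{names_for}{$object} || [] };
}

package main;

my @raw = (
    [ "Record 1", [qw(obj1 obj2 obj3)], { obj1 => ['Sam'], obj3 => ['Bob'] } ],
    [ "Record 2", [qw(obj1 obj5 obj6)], { obj5 => ['Lew', 'Fred'], obj6 => ['Bob'] } ],
);

# Wrap each raw array in a Record object.
my @records = map { Record->new(@$_) } @raw;

for my $record (@records) {
    for my $object ($record->objects) {
        my @names = $record->names_for($object);
        print $record->header, ": $object => ",
            (@names ? join(', ', @names) : '(none)'), "\n";
    }
}
```

        The calling code now reads $record->names_for($object) instead of something like $records[0][2]{obj1}, so the layout of the underlying structure can change without touching the callers.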

        True laziness is hard work
        Actually, walkingthecow, I find your code easier to read and understand.
Re: Best Way to Map Data
by jwkrahn (Abbot) on Jun 26, 2010 at 00:49 UTC

    If you don't mind the extra step of removing the trailing comma then a hash should work:

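    my %hash;
    while ( <> ) {
        chomp;    # drop the newline so it does not end up in front of the comma
        my ( $dept, $email ) = split /:/;
        $hash{ $dept } .= "$email,";
    }
    s/,\z// for values %hash;    # strip the trailing comma from each value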
Re: Best Way to Map Data
by biohisham (Priest) on Jun 26, 2010 at 05:23 UTC
    Data structures are an interesting brain exercise, I bet. If you are comfortable with the basic ones (arrays, hashes, hashes of hashes, hashes of arrays, etc.), then building your own advanced structures is straightforward: you just combine these basic types. This can be particularly useful for manipulating data in different ways.

    If the order of the data matters to you and you want to access the data sequentially, then an array should be part of that data structure; if the order doesn't matter, a hash can be used instead.

    Check Q&A->Data Structures for more in-depth information.
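
    To make the array-versus-hash point concrete for the data in this thread, here is a small sketch that combines both: a hash of arrays collects the emails, and a separate array remembers the order in which departments were first seen. The sub name group_in_order and the variable names are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: group "dept:email" lines while remembering
# the order in which each department first appears.
sub group_in_order {
    my @lines = @_;
    my (%emails_for, @dept_order);
    for my $line (@lines) {
        chomp $line;
        my ($dept, $email) = split /:/, $line, 2;
        # Record the department only the first time it is seen.
        push @dept_order, $dept unless exists $emails_for{$dept};
        push @{ $emails_for{$dept} }, $email;
    }
    # Walk the array, not the hash keys, so first-seen order is preserved.
    return map { "$_:" . join(',', @{ $emails_for{$_} }) } @dept_order;
}

my @input = qw(
    sales:bob@foo.com
    sales:joe@foo.com
    retail:steve@bar.com
    sales:debbie@foo.com
    sales:john@foo.com
    support:david@blah.com
    retail:judy@bar.com
    support:jose@blah.com
);

print "$_\n" for group_in_order(@input);
```

    With this input the departments come out as sales, retail, support, which is the order shown in the original question, whereas sorting the hash keys puts retail first.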


    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .