which data structure do I need for this grouping problem?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: which data structure do I need for this grouping problem? by 1nickt (Canon) on Sep 04, 2018 at 14:32 UTC
Hi, I'd suggest you build a hash keyed by name, with a sub-hash per name keyed by date, and the values stored in an array. (To lovers of acronyms this would be a HoHoA, a hash of hashes of arrays.)

```
use strict;
use warnings;
use feature 'say';

my %result;

for my $line (<DATA>) {
    chomp $line;
    my ( $name, $date, $val ) = split ' ', $line;
    push @{ $result{ $name }{ $date } }, $val;
}

for my $name ( keys %result ) {
    say $name;
    for my $date ( keys %{ $result{ $name } } ) {
        say "\t$date: @{ $result{ $name }{ $date } }";
    }
}

__DATA__
nick 20/5/1950 one
john 18/2/1980 two
nick 19/6/1978 three
nick 20/5/1950 four
nick 12/9/2000 five
john 15/6/1997 six
nick 20/5/1950 seven
```

Output:

```
$ perl 1221691.pl
john
	15/6/1997: six
	18/2/1980: two
nick
	19/6/1978: three
	20/5/1950: one four seven
	12/9/2000: five
```

Hope this helps!

The way forward always starts with a minimal test.
Re^2: which data structure do I need for this grouping problem? by bliako (Monsignor) on Sep 04, 2018 at 20:15 UTC
And that would be a HoTHel! Edit: he said tab-separated, so `'\t'` probably?
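For what it's worth, here is a minimal sketch of the same hash of hashes of arrays built from tab-separated input. It assumes each record has three tab-separated fields and that the value may itself contain spaces. The sub name `group_lines` and the sample records are made up for illustration, they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: build the hash of hashes of arrays from
# tab-separated "name<TAB>date<TAB>value" records.
sub group_lines {
    my @lines = @_;
    my %result;
    for my $line (@lines) {
        chomp $line;
        # A limit of 3 keeps any further tabs inside the value field.
        my ( $name, $date, $val ) = split /\t/, $line, 3;
        push @{ $result{$name}{$date} }, $val;
    }
    return \%result;
}

my $grouped = group_lines(
    "nick\t20/5/1950\tone\n",
    "john\t18/2/1980\ttwo and a half\n",
    "nick\t20/5/1950\tfour\n",
);

# Sort the keys so the output order is stable between runs.
for my $name ( sort keys %$grouped ) {
    print "$name\n";
    for my $date ( sort keys %{ $grouped->{$name} } ) {
        print "\t$date: ", join( ', ', @{ $grouped->{$name}{$date} } ), "\n";
    }
}
```

Splitting on `/\t/` instead of `' '` is what lets a value such as "two and a half" stay in one piece. Note that `sort` here is a plain string sort, so the dates come out in string order, not chronological order.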
Re: which data structure do I need for this grouping problem? by haukex (Archbishop) on Sep 04, 2018 at 13:53 UTC
Here's one way: If there is a character that you know doesn't occur in either the name or the date (like a tab), you can use that to separate the two and make a single hash key out of it. Note that in the sample data that you've posted here, you don't have any tabs, so I've had to guess that all of your columns are separated by tabs. However, in that case, `my ( $name, $rest ) = split /\t/;` is only grabbing the first two columns. Also, you probably want to chomp your lines.

```
use strict;
use warnings;

my %res;
while (<DATA>) {
    chomp;
    my ( $name, $date, @rest ) = split /\t/;
    push @{ $res{"$name\t$date"} }, @rest;
}
for my $key ( sort keys %res ) {
    my ( $name, $date ) = split /\t/, $key, 2;
    print "$name,$date:", join( "|", @{ $res{$key} } ), "\n";
}
__DATA__
nick	20/5/1950	one
john	18/2/1980	two	two and a half
nick	19/6/1978	three
nick	20/5/1950	four
nick	12/9/2000	five
john	15/6/1997	six
nick	20/5/1950	seven	eight
```

Output:

```
john,15/6/1997:six
john,18/2/1980:two|two and a half
nick,12/9/2000:five
nick,19/6/1978:three
nick,20/5/1950:one|four|seven|eight
```

In the above code, using `\t` to separate the hash key is safe: because of the `split /\t/`, I know that none of the strings will contain tabs. If you choose a separator character that you're not sure is absent from the strings, say `|`, you may want to add a check like `die $name if $name=~/\|/; die $date if $date=~/\|/;` to play it safe. Also, you can use a separator that is very unlikely to appear, like `$/` or `\0` (but again, if you want to code defensively, check for its presence anyway).

(Update: `$/` is the input record separator, which chomp removes for you. Also made minor fix to the latter two regexes.)

Update 2: I should also mention that using a module like Text::CSV is generally better for reading this kind of data, because it handles things like quoted fields and escaped characters for you (also install Text::CSV_XS for speed).
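The defensive check described above could be wrapped up along these lines. This is only a sketch: the helper name `make_key` is hypothetical, and `"\0"` is just one possible choice of separator.

```perl
use strict;
use warnings;

# Hypothetical helper: join key parts with a separator, refusing any
# part that already contains the separator.
sub make_key {
    my ( $sep, @parts ) = @_;
    for my $part (@parts) {
        # index() does a literal substring search, so the separator
        # needs no regex escaping.
        die "separator found in key part '$part'\n"
            if index( $part, $sep ) >= 0;
    }
    return join $sep, @parts;
}

my %res;
push @{ $res{ make_key( "\0", 'nick', '20/5/1950' ) } }, 'one';
push @{ $res{ make_key( "\0", 'nick', '20/5/1950' ) } }, 'four';

for my $key ( sort keys %res ) {
    # Split the composite key back into its two parts.
    my ( $name, $date ) = split /\0/, $key, 2;
    print "$name,$date:", join( '|', @{ $res{$key} } ), "\n";
}
```

Using `index` rather than a regex sidesteps the escaping question entirely, which matters for separators like `|` that are regex metacharacters.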
Re^2: which data structure do I need for this grouping problem? by Anonymous Monk on Sep 04, 2018 at 20:49 UTC
There is a built-in mechanism for this: see `perldoc -v '$;'`.
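A rough sketch of what that looks like in practice: with a comma-separated list as the hash subscript, Perl joins the parts with `$;` (the subscript separator, `"\034"` by default). The sub name `group_by_name_date` and the sample records are made up for illustration.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );    # provides $SUBSEP as a readable alias for $;

# Hypothetical helper: group values under a two-part key using Perl's
# multidimensional hash emulation.
sub group_by_name_date {
    my @records = @_;    # each record is [ name, date, value ]
    my %res;
    for my $rec (@records) {
        my ( $name, $date, $val ) = @$rec;
        # $res{ $name, $date } means $res{ join( $SUBSEP, $name, $date ) }
        push @{ $res{ $name, $date } }, $val;
    }
    return \%res;
}

my $res = group_by_name_date(
    [ 'nick', '20/5/1950', 'one' ],
    [ 'john', '18/2/1980', 'two' ],
    [ 'nick', '20/5/1950', 'four' ],
);

for my $key ( sort keys %$res ) {
    # Split the composite key back apart on the subscript separator.
    my ( $name, $date ) = split /\Q$SUBSEP\E/, $key, 2;
    print "$name,$date:", join( '|', @{ $res->{$key} } ), "\n";
}
```

The same caveat as with any hand-picked separator applies: this is only safe as long as none of the key parts contains the `$;` character.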
Re^2: which data structure do I need for this grouping problem? by Anonymous Monk on Sep 04, 2018 at 14:03 UTC
Interesting approach, thank you!
Re: which data structure do I need for this grouping problem? by kevbot (Vicar) on Sep 05, 2018 at 04:41 UTC
Hello, I see that you have already received good replies. Here is another way to perform this task. The Data::Table module has many useful methods for manipulating tabular data. In this case, the `group` method is applicable.

The data.tsv file contains the following tab-delimited data:

```
nick	20/5/1950	one
john	18/2/1980	two
nick	19/6/1978	three
nick	20/5/1950	four
nick	12/9/2000	five
john	15/6/1997	six
nick	20/5/1950	seven
```

This code will group the data and prepare the concatenated values.

```
#!/usr/bin/env perl
use strict;
use warnings;

use Data::Table;

# Load input data from tsv file
# The first argument is the file name
# The second argument specifies that there is no header row (in this case
# the Data::Table object that is created will have auto-generated column
# names of col1, col2, etc.)
my $dt = Data::Table::fromTSV('data.tsv', 0);

print "The input table is:\n";
print $dt->tsv, "\n\n";

# Group by 'col1' and 'col2'
my $output_t = $dt->group(
    ['col1', 'col2'],    # Columns to group by
    ['col3'],            # Columns to perform calculation on
    [ \&join_vals ],     # Apply join_vals function to values found in 'col3'
    ['values']           # Put the joined values into these columns
);

print "The output table is:\n";
print $output_t->tsv, "\n\n";

sub join_vals {
    my @data = @_;
    return join("|", @data);
}

exit;
```

The output should be:

```
The input table is:
col1	col2	col3
nick	20/5/1950	one
john	18/2/1980	two
nick	19/6/1978	three
nick	20/5/1950	four
nick	12/9/2000	five
john	15/6/1997	six
nick	20/5/1950	seven

The output table is:
col1	col2	values
nick	20/5/1950	one|four|seven
john	18/2/1980	two
nick	19/6/1978	three
nick	12/9/2000	five
john	15/6/1997	six
```
Re: which data structure do I need for this grouping problem? by afoken (Chancellor) on Sep 05, 2018 at 17:20 UTC
The standard solution for handling files containing character-separated values (CSV), including tab-separated values, is to use Text::CSV and, if possible, its accelerating companion Text::CSV_XS. It is "the standard" because it not only splits (and joins) on the separating character(s), but also handles quoting, escaping, and all of those nasty edge cases you can find in CSV files.

If you are used to working with relational databases and DBI, try DBD::CSV. It sits on top of Text::CSV and allows you to treat CSV files like database tables in a relational database. In other words: You can use SQL to work directly with CSV files.

All of those modules are currently maintained by our helpful Tux.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

