http://www.perlmonks.org?node_id=404813

rupesh has asked for the wisdom of the Perl Monks concerning the following question:

Using Hashes would be the best option for this sort of pattern matching I suppose, but I would like to have your point of view.
I have been making reports for senior management for the past 2 days and I've been thinking in the same way for this activity as well. Anyone have any better ideas?

The data is in the form of:
Date(mm/dd/yyyy) Product Region .... 1/2/2003 Sample1 Site1 1/2/2003 Sample2 Site2 2/3/2003 Sample2 Site2 2/3/2003 Sample2 Site1 2/3/2003 Sample3 Site1 2/13/2003 SAmple2 Site4 1/5/2003 Sample1 SITE2 ..... ... ...
As you can see, the date is not sorted.
The case may change for the product and region.
And there may be (usually) multiple entries for the same date - the reason behind this activity.

Requirements:
1. For the year (2003), sort by month.
2. For one month (say Jan), get the product released for every day.
3. Automate that data to get into charts. What is required is a chart for every day for one month. For that day, I would need to show what products are released.

To begin with, I would need the data to understand which day of which month is a product(s) released.

Any thoughts?

Thanks,
Rupesh.

Replies are listed 'Best First'.
Re: Data interpretation
by FoxtrotUniform (Prior) on Nov 03, 2004 at 04:36 UTC
    1. For the year (2003), sort by month.
    2. For one month (say Jan), get the product released for every day.
    3. Automate that data to get into charts. What is required is a chart for every day for one month. For that day, I would need to show what products are released.

    Sounds like a problem for a (relational) database.

    --
    Yours in pedantry,
    F o x t r o t U n i f o r m

    "Anything you put in comments is not tested and easily goes out of date." -- tye


      An RDBMS would be simple and easier to manage I agree. The issue is that the data is in the form of .csv.
      Rupesh.

        You might find DBD::CSV handy, then.

        --
        Yours in pedantry,
        F o x t r o t U n i f o r m

        "Anything you put in comments is not tested and easily goes out of date." -- tye

        So use, DBD::CSV which will allow you to use DBI and SQL on your files and treat them like an RDBMS.
        So what? Load it into a database. I'm going to assume that this is data that will not change, as you mention it's sales data. So, you're probably going to want this in a database anyways.

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

        You mean, there's still a database out there that isn't able to import from the CSV file? Quick! Call the Smithsonian!
Re: Data interpretation
by aquarium (Curate) on Nov 03, 2004 at 10:17 UTC
    Do you need to script this? You can just import that data straight into your favorite spreadsheet and sort by a column or combination of columns, and generate charts while you're there. If you do need to script it, say to separate and pre-sort each month data: just write a custom sort routine and pass the data to the script on STDIN. Hashes could possibly be used, but i doubt they would serve you well...as you are not performing any computation on the data, just sorting it. Besides that, hashes are more suited to data that has unique key indexes....your data does not have any column that can be uniquely indexed. You could combine columns perhaps to create a unique key index, but that would only complicate matters for your sorting.
    the hardest line to type correctly is: stty erase ^H
Re: Data interpretation
by TedPride (Priest) on Nov 04, 2004 at 09:44 UTC
    Something like the following might work. Note that printing a line for each day of the specified month can be difficult when the month is February and you have to figure out whether the year is a leap year or not. Also note that the data is tab delimited, and may or may not come through a cut and paste correctly.
    use strict; use warnings; my ($keya, $keyb, %hash); while (<DATA>) { chomp; my @d = split(/[\/\t]/); $keya = $d[2].$d[0]; $keyb = $d[1]; if (exists($hash{$keya}{$keyb})) { push(@{$hash{$keya}{$keyb}}, \@d); } else { $hash{$keya}{$keyb} = [\@d]; } } my $year = 2003; my $month = 2; my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31); if (($year % 4) && ($year % 100)) { $days[1] = 29; } for (1..($days[$month-1])) { print sprintf('%02d/%02d', $month, $_)."/$year\n"; for (@{$hash{"$year$month"}{$_}}) { print ' '.@$_[3].' '.@$_[4]."\n"; } print "\n"; } __DATA__ 1/2/2003 Sample1 Site1 1/2/2003 Sample2 Site2 2/3/2003 Sample2 Site2 2/3/2003 Sample2 Site1 2/3/2003 Sample3 Site1 2/13/2003 SAmple2 Site4 1/5/2003 Sample1 SITE2