PerlMonks  

Organizing Links - Better to use hash, hash of arrays, or...?

by CalebH (Acolyte)
on Feb 02, 2014 at 00:54 UTC

CalebH has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks!

I have way too many links cluttering my computer. When I find an interesting link, I mail it to myself, and my email folder now holds 2000+ links. To combat that, I wrote a script (discussed in Perl MIME parser partially works) that pulls those links out of my email and adds them to a logfile.

The problem is that I also have Firefox bookmarks, Internet Explorer shortcuts created by right-clicking and bookmarking pages, etc.

What I am trying to do, and need help with, is organizing these links into a logfile. I have modified the logging in my current email parser to record three things: the link, the page title (taken from $Subject), and the date the email was sent (taken from $date).

I basically want to do the following:

- log the URLs to a text file (for this example, URLs.txt) in a format similar to Url|Title|Date(s)|Count (if a URL has been seen multiple times, its line would be Url|Title|10/31/05 5:54PM, 01/05/10 10:10AM|2)

- be able to parse more URLs (bookmarks exported from Firefox, shortcuts, or future emails) and merge them into the existing logfile (URLs.txt), so that if a link was seen before, its count is incremented and any new information (such as the date) is added.
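A minimal sketch of that merge step, assuming the log uses exactly the Url|Title|Date(s)|Count layout described above, with dates separated by ", " and no literal | inside titles. The sub name merge_log and its argument shapes are hypothetical. The stored Count field is ignored on input and recomputed from the number of dates, so it cannot drift out of sync; as a further assumption, a date already recorded for a URL is treated as the same sighting and not counted twice.

```perl
use strict;
use warnings;

# Hypothetical helper: merge previously logged lines with new entries.
#   $old_lines: array ref of lines in "Url|Title|Date(s)|Count" format
#   $new:       array ref of [ $url, $title, $date ] triples from the parsers
# Returns the merged log lines (without header), sorted by URL.
sub merge_log {
    my ( $old_lines, $new ) = @_;
    my %links;    # url => { title => ..., dates => [ ... ] }

    for my $line (@$old_lines) {
        chomp( my $copy = $line );
        next if $copy eq '' or $copy eq 'Url|Title|Date(s)|Count';    # skip blanks and header
        my ( $url, $title, $dates ) = split /\|/, $copy;    # old Count field is ignored
        $links{$url}{title} //= $title;
        push @{ $links{$url}{dates} }, split /, /, $dates;
    }

    for my $entry (@$new) {
        my ( $url, $title, $date ) = @$entry;
        $links{$url}{title} //= $title;
        my $dates = $links{$url}{dates} //= [];
        # Assumption: a date already on record is the same sighting, so skip it.
        push @$dates, $date unless grep { $_ eq $date } @$dates;
    }

    # Rebuild each line; the count is simply the number of dates.
    return map {
        my $d = $links{$_}{dates};
        join '|', $_, $links{$_}{title}, join( ', ', @$d ), scalar @$d;
    } sort keys %links;
}

my @old = ("http://www.perl.com|Perl|01/05/10 7:44PM|1\n");
my @new = (
    [ 'http://www.perl.com', 'Perl', '01/10/10 9:44PM' ],
    [ 'http://www.google.com/page?id=blah', 'Google Search results for blah', '05/10/12 8:12AM' ],
);
print "Url|Title|Date(s)|Count\n";
print "$_\n" for merge_log( \@old, \@new );
```

Reading and writing URLs.txt itself is then just a matter of passing its lines in and printing the returned lines back out.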

My first thought is to use a hash, like this:

    $seen{$item}{count}++;                    # to increment the Count
    push @urls, $item unless $seen{$item};
    $seen{$item}++;                           # To store the url
But how would I do this with all of the values I want to save? In its current form, the above looks like it would store only the URL and nothing else. I figure a push would take care of the dates, but I'm not sure how to implement it. How could I end up with something like this?
    @urls = (
        {
            url   => 'http://www.url.com',
            title => 'The secret to Perl',
            date  => '@dates that were pushed',
            count => '5',
        },
        {
            url   => 'http://www.google.com/page?id=blah',
            title => 'Google Search results for blah',
            date  => '01/05/10 7:44PM, 05/10/12 8:12AM',
            count => '2',
        },
    );
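One possible way to get there, sketched under the assumption that the parsers hand over (url, title, date) triples: collect into a hash keyed by URL, giving each URL its own inner hash with separate title, dates and count entries, and only flatten into the array of hashes at the end. In the %seen snippet above, $seen{$item} is used both as a hash reference ({count}) and as a plain counter, so the ++ overwrites the reference with a number and the next {count} access dies under use strict; separate inner keys avoid that. The @entries data is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: [ url, title, date ] triples as the parsers might produce them.
my @entries = (
    [ 'http://www.url.com', 'The secret to Perl', '10/31/05 5:54PM' ],
    [ 'http://www.google.com/page?id=blah', 'Google Search results for blah', '01/05/10 7:44PM' ],
    [ 'http://www.google.com/page?id=blah', 'Google Search results for blah', '05/10/12 8:12AM' ],
);

# Hash of hashes: url => { title => ..., dates => [ ... ], count => ... }
my %seen;
for my $entry (@entries) {
    my ( $url, $title, $date ) = @$entry;
    $seen{$url}{title} //= $title;          # keep the first title seen
    push @{ $seen{$url}{dates} }, $date;    # autovivifies the dates array
    $seen{$url}{count}++;                   # always equals the number of dates
}

# Flatten into the array of hashes from the question, sorted by URL.
# The leading + makes Perl read the braces as an anonymous hash, not a block.
my @urls = map +{
    url   => $_,
    title => $seen{$_}{title},
    date  => join( ', ', @{ $seen{$_}{dates} } ),
    count => $seen{$_}{count},
}, sort keys %seen;

printf "%s|%s|%s|%d\n", @{$_}{qw(url title date count)} for @urls;
```

The hash does the "have I seen this URL?" bookkeeping in a single lookup; the array of hashes is only needed if something downstream wants a list.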
Apologies for the long, rambling post; I hope I explained everything well enough. I couldn't quite figure out how to describe what I needed, so hopefully this gets it across!

Re: Organizing Links - Better to use hash, hash of arrays, or...?
by Kenosis (Priest) on Feb 02, 2014 at 01:19 UTC

    Consider a hash of arrays of arrays (HoAoA), where the keys are the URLs and each associated value is a reference to an array holding the title and a reference to an array of dates:

    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;

    while (<DATA>) {
        chomp;
        my ( $url, $subject, $date ) = split /\t/;
        $hash{$url}->[0] //= $subject;      # element 0: title (first one seen)
        push @{ $hash{$url}[1] }, $date;    # element 1: array of dates
    }

    local $" = ', ';    # separator used when interpolating arrays
    print "Url|Title|Date(s)|Count\n";

    for my $key ( keys %hash ) {
        print "$key|$hash{$key}->[0]|@{ $hash{$key}[1] }|"
          . scalar @{ $hash{$key}[1] } . "\n";
    }

    print "\n", Dumper \%hash;

    __DATA__
    http://www.perl.com	Perl	01/05/10 7:44PM
    http://www.google.com/page?id=blah	Google Search results for blah	01/05/10 7:44PM
    http://www.google.com/page?id=blah	Google Search results for blah	05/10/12 8:12AM
    http://www.perl.com	Perl	01/05/10 7:42AM
    http://www.perl.com	Perl	01/10/10 9:44PM

    Output:

    Url|Title|Date(s)|Count
    http://www.google.com/page?id=blah|Google Search results for blah|01/05/10 7:44PM, 05/10/12 8:12AM|2
    http://www.perl.com|Perl|01/05/10 7:44PM, 01/05/10 7:42AM, 01/10/10 9:44PM|3

    $VAR1 = {
              'http://www.google.com/page?id=blah' => [
                                                        'Google Search results for blah',
                                                        [
                                                          '01/05/10 7:44PM',
                                                          '05/10/12 8:12AM'
                                                        ]
                                                      ],
              'http://www.perl.com' => [
                                         'Perl',
                                         [
                                           '01/05/10 7:44PM',
                                           '01/05/10 7:42AM',
                                           '01/10/10 9:44PM'
                                         ]
                                       ]
            };
      Thank you, this is EXACTLY what I needed, and it works great.

      I realize that SQLite would have been a better choice, but to be honest I don't know much about SQL commands or how to use Perl to pull data out of a database.

      I agree that a database would be better for organizing links; this was just an easier solution for now. :)

Re: Organizing Links - Better to use hash, hash of arrays, or...?
by Athanasius (Archbishop) on Feb 02, 2014 at 03:03 UTC

    (Not an answer to the question asked, but more of a meta-observation:)

    As the logfile grows, it too will eventually become unwieldy. Why not adopt the scalable solution from the outset and use a database? DBD::SQLite is self-contained, and provides “A complete DB in a single disk file” — like your logfile, but much easier to grow, edit, and search.
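    A minimal sketch of what that could look like with DBI, assuming DBD::SQLite has been installed from CPAN. It stores one row per sighting of a link and lets SQL compute the dates list and count per URL. The table and column names (links, url, title, seen_at) are hypothetical, and an in-memory database keeps the example self-contained; a file name would make the data persist between runs.

```perl
use strict;
use warnings;
use DBI;    # requires DBD::SQLite (from CPAN) for the dbi:SQLite driver

# In-memory database for the sketch; use e.g. 'dbi:SQLite:dbname=links.db' (hypothetical) to persist.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# One row per sighting of a link; counts and date lists come from queries.
$dbh->do('CREATE TABLE links (url TEXT NOT NULL, title TEXT, seen_at TEXT)');

my $insert = $dbh->prepare('INSERT INTO links (url, title, seen_at) VALUES (?, ?, ?)');
$insert->execute(@$_) for (
    [ 'http://www.perl.com', 'Perl', '01/05/10 7:44PM' ],
    [ 'http://www.google.com/page?id=blah', 'Google Search results for blah', '01/05/10 7:44PM' ],
    [ 'http://www.perl.com', 'Perl', '01/10/10 9:44PM' ],
);

# Group sightings by URL; Slice => {} returns each row as a hash reference.
my $rows = $dbh->selectall_arrayref(
    q{SELECT url, MIN(title) AS title,
             GROUP_CONCAT(seen_at, ', ') AS dates, COUNT(*) AS count
        FROM links GROUP BY url ORDER BY url},
    { Slice => {} },
);

print "Url|Title|Date(s)|Count\n";
print join( '|', @{$_}{qw(url title dates count)} ), "\n" for @$rows;
```

    Note that SQLite does not promise any particular order for the values inside GROUP_CONCAT, so if the dates must be listed chronologically, storing them in a sortable form (e.g. YYYY-MM-DD HH:MM) and sorting them would be the safer choice.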

    Just a thought,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Organizing Links - Better to use hash, hash of arrays, or...?
by AnomalousMonk (Archbishop) on Feb 02, 2014 at 06:26 UTC

    I have to agree with Athanasius that a database seems like a better approach, but if you want to continue with your original thought, take a look at the discussion of complex data structures in perldsc. While perhaps not the best approach to this particular problem, such structures and the techniques for using them are, in general, well worth knowing.

Node Type: perlquestion [id://1073008]
Approved by boftx