PerlMonks  

Extracting icons from a bookmark file

by mirod (Canon)
on Jun 08, 2012 at 13:08 UTC (node #975159, CUFP)

Like many, I use a password manager for all my web passwords. KeePass lets you add a custom icon to each entry, so I thought I would use each site's favicon. The problem is that, at least in Firefox, the icons are stored in the bookmarks file as Base64-encoded data, so there is no easy way to get at them. Also, KeePass lists the icon files by name but doesn't seem to show a preview, so each icon file needs a meaningful name.
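Each link's ICON attribute holds a data: URI of the form data:image/<type>;base64,<payload>. The script's $HEADER pattern matches this format. To decode one on its own, you split off that header and pass the rest to decode_base64 from MIME::Base64. Here is a minimal sketch. decode_data_uri is a hypothetical helper name, and the sample payload is only the Base64 encoding of the GIF header bytes, not a real icon:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Split "data:image/<type>;base64,<payload>" into the image type and the
# decoded raw bytes. Returns an empty list for anything else.
# (decode_data_uri is a hypothetical helper, not part of the original script.)
sub decode_data_uri {
    my ($uri) = @_;
    my ($type, $payload) = $uri =~ m{^data:image/([^;]+);base64,(.*)\z}s
        or return;
    return ($type, decode_base64($payload));
}

# "R0lGODlh" is the Base64 encoding of the 6-byte GIF header "GIF89a"
my ($type, $bytes) = decode_data_uri('data:image/gif;base64,R0lGODlh');
print "$type: $bytes\n";    # prints "gif: GIF89a"
```

The image type (here gif) is what the full script turns into a file extension.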

I use the script below to extract the icons, eliminate duplicates, and save each icon in the current directory to a file named after its site, with an extension based on the image type.

I apologize for using XML::Twig (again!), but that's the module I know best.

To get all the files, export your bookmarks to an empty directory (the script reads a file named bookmarks.html), then run the script there.
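You may want to try the script without exporting your real bookmarks. The minimal sketch below writes a tiny, hypothetical bookmarks.html in the Netscape bookmark format that Firefox exports. It contains a single DT/A entry whose ICON attribute is a data: URI, which is the shape the script's 'dt/a[@icon]' handler looks for. The entry and its payload are made up for illustration. The payload is just the Base64 encoding of the 6-byte GIF header, and real exports also carry attributes such as ADD_DATE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample entry: the ICON payload is only "GIF89a",
# Base64-encoded, not a real favicon.
my $bookmarks = <<'HTML';
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks Menu</H1>
<DL><p>
    <DT><A HREF="http://perlmonks.org/" ICON="data:image/gif;base64,R0lGODlh">PerlMonks</A>
</DL><p>
HTML

# Write it under the file name the extraction script expects.
open( my $out, '>', 'bookmarks.html') or die "cannot create bookmarks.html: $!";
print {$out} $bookmarks;
close( $out) or die "cannot write bookmarks.html: $!";
```

If you then run the extraction script in the same directory, it should write a perlmonks.gif file that contains those six bytes.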

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;
use MIME::Base64;

# image header, eg "data:image/png;base64,"; the image type is captured in $1
my $HEADER= qr{data:image/([^;]+);base64,};

# if not listed here the type is used as extension (eg png, or jpeg)
my %type2ext= ( 'x-icon' => 'ico');

my %seen; # used to avoid duplicates, both for sites and for icons

XML::Twig->new( twig_handlers => { 'dt/a[@icon]' => \&save_icon })
         ->parsefile_html( 'bookmarks.html');

sub save_icon
  { my( $t, $link)= @_;
    my $url= $link->att( 'href');
    my $site= url2name( $url);
    return if( $seen{$site});
    $seen{$site}=1;

    my $icon= $link->att( 'icon');
    return unless $icon=~ s{^$HEADER}{}; # skip if icon is not a base-64 encoded image
    my $icon_type= $1;
    return if( $seen{$icon});
    $seen{$icon}=1;

    my $ext= $type2ext{$icon_type} || $icon_type;
    my $file= join '.', $site, $ext;
    $icon= decode_base64( $icon);
    warn "adding icon for $site ($icon_type) to $file\n";
    open( my $out, '>:raw', $file) or die "cannot create file $file: $!";
    print {$out} $icon;
    close( $out) or die "cannot write file $file: $!";
  }

# takes a full url and returns a short version of the site name
# eg http://perlmonks.org/?node=Newest%20Nodes => perlmonks
sub url2name
  { my( $url)= @_;
    $url=~ s{^https?://}{}; # remove protocol
    $url=~ s{^www\.}{};     # no need for the www. bit
    $url=~ s{/.*$}{};       # keep only the site name
    $url=~ s{\.[^.]*$}{};   # remove the tld
    return $url;
  }
```
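You can check which file names url2name produces by running it on a few URLs. This sketch repeats the same four substitutions as url2name in the script above, as a standalone sub. The sample URLs are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same steps as url2name in the script above
sub url2name {
    my ($url) = @_;
    $url =~ s{^https?://}{};   # remove protocol
    $url =~ s{^www\.}{};       # no need for the www. bit
    $url =~ s{/.*$}{};         # keep only the site name
    $url =~ s{\.[^.]*$}{};     # remove the tld
    return $url;
}

for my $url ( 'http://perlmonks.org/?node=Newest%20Nodes',
              'https://www.cpan.org/',
              'http://news.bbc.co.uk/' ) {
    printf "%-45s => %s\n", $url, url2name($url);
}
```

This prints perlmonks, cpan and news.bbc.co. Only the last dot-separated label is dropped, so subdomains and two-part suffixes such as .co.uk stay in the name. Also, the script skips any site name it has already seen. For example, perlmonks.org and perlmonks.com would both map to perlmonks, and only the first one's icon would be saved.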

Node Type: CUFP [id://975159]
Approved by toolic
Front-paged by Arunbear