PerlMonks  

XML Manipulation

by larsen (Parson)
on May 21, 2001 at 23:56 UTC ( [id://82075]=perlquestion )

larsen has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use XML to organize my Web bookmarks. The first structure that comes to my mind is something similar to:
<bookmark>
  <link category="Perl" name="Perlmonks" url="http://www.perlmonks.org">
    <description>Perl Monastery</description>
  </link>
  <link category="Perl" name="Perl.com" url="http://www.perl.com">
    <description>Perl Official Site</description>
  </link>
  <link category="Macintosh" name="Macity" url="http://www.macity.it">
    <description>Macintosh news site (italian resource)</description>
  </link>
</bookmark>
It's link-oriented, so when I want to add another link to the file I can simply add another <link> block and fill it with the appropriate information. So far so good.

Now I'd like to use this XML file to produce a tree of HTML pages containing my links organized by category. What I have to do is convert a link-oriented structure into a category-oriented one.

I wrote this piece of code:

#!/usr/bin/perl

use strict;
use XML::Simple;
use Data::Dumper;

my $bookmark = XMLin( './links.xml' );

# print Dumper( $bookmark );

# Data structure conversion
my $bookmark_by_category = {};

foreach my $n (keys %{$bookmark->{link}}) {
    push @{ $bookmark_by_category->{ $bookmark->{link}->{$n}->{'category'} }}, {
        'Name'        => $n,
        'Url'         => $bookmark->{link}->{$n}->{'url'},
        'Description' => $bookmark->{link}->{$n}->{'description'},
    };
}

print Dumper( $bookmark_by_category );

It is simple but since:

  1. I'm new to XML manipulation
  2. It seems to me it's a standard problem so I guess there's a standard solution
  3. I don't want to pollute the environment with re-invented wheels...
I ask you if there's that standard solution I was talking about above.
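
For reference, the same link-to-category conversion can be tried without any XML file. This is only a sketch: the literal hash below is an assumption, standing in for what XMLin returns for the sample above with its default settings (links folded into a hash keyed on the name attribute), and by_category is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed shape of XMLin's result for the sample file (default options):
# the <link> elements folded into a hash keyed on their "name" attribute.
my $bookmark = {
    link => {
        'Perlmonks' => {
            category    => 'Perl',
            url         => 'http://www.perlmonks.org',
            description => 'Perl Monastery',
        },
        'Perl.com' => {
            category    => 'Perl',
            url         => 'http://www.perl.com',
            description => 'Perl Official Site',
        },
        'Macity' => {
            category    => 'Macintosh',
            url         => 'http://www.macity.it',
            description => 'Macintosh news site (italian resource)',
        },
    },
};

# Hypothetical helper: regroup name-keyed links into category => [ links ].
sub by_category {
    my ($links) = @_;
    my %by_cat;
    foreach my $name (sort keys %$links) {    # sorted, so the output order is stable
        my $link = $links->{$name};
        push @{ $by_cat{ $link->{category} } }, {
            Name        => $name,
            Url         => $link->{url},
            Description => $link->{description},
        };
    }
    return \%by_cat;
}

my $bookmark_by_category = by_category( $bookmark->{link} );

foreach my $category (sort keys %$bookmark_by_category) {
    print "$category\n";
    print "  $_->{Name} <$_->{Url}>\n" foreach @{ $bookmark_by_category->{$category} };
}
```

One thing to keep in mind with this shape: because the links are keyed on their name, two links sharing a name would overwrite each other. Passing KeyAttr => [] to XMLin turns the folding off, at the price of getting an array of links instead of a hash.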

Replies are listed 'Best First'.
Re: XML Manipulation
by OeufMayo (Curate) on May 22, 2001 at 00:33 UTC

    larsen,
    You might be interested to look at a previous node of mine, which uses the XBEL DTD to create a bookmark XML file from Opera's bookmark file.

    XP whoring? What's that?

    --
    my $OeufMayo = new PerlMonger::Paris({http => 'paris.mongueurs.net'});
Re: XML Manipulation
by aardvark (Pilgrim) on May 22, 2001 at 01:29 UTC
    You may want to give mirod's XML::Twig a look. You can slice and dice your XML in lots of fun and interesting ways.
    use strict;
    use XML::Twig;

    my $file = "bookmark.xml";
    my $twig = new XML::Twig( TwigHandlers => {'link[@category]' => \&print_link});
    $twig->parsefile($file);

    # called for each <link> element that has a category attribute
    sub print_link {
        my ($t, $elt) = @_;
        print $elt->text . "\n";
        my $attributes_hr = $elt->atts;
        foreach my $attr (keys %$attributes_hr) {
            print "$attr: $attributes_hr->{$attr} \n";
        }
    }
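
    The handler approach can also do the regrouping the question asks for. What follows is a rough sketch rather than a tested recipe: it parses an inline copy of two links from the question instead of a file, and the %by_category hash and its lowercase keys are names made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Inline copy of two links from the question, so the sketch runs without a file.
my $xml = <<'XML';
<bookmark>
  <link category="Perl" name="Perlmonks" url="http://www.perlmonks.org">
    <description>Perl Monastery</description>
  </link>
  <link category="Macintosh" name="Macity" url="http://www.macity.it">
    <description>Macintosh news site (italian resource)</description>
  </link>
</bookmark>
XML

my %by_category;    # category => [ { name, url, description }, ... ]

my $twig = XML::Twig->new(
    twig_handlers => {
        # called once for each completely parsed <link> element
        link => sub {
            my ($t, $elt) = @_;
            push @{ $by_category{ $elt->att('category') } }, {
                name        => $elt->att('name'),
                url         => $elt->att('url'),
                description => $elt->first_child_text('description'),
            };
        },
    },
);
$twig->parse($xml);

# one line per category, listing the names of its links
foreach my $category (sort keys %by_category) {
    print "$category: ", join(', ', map { $_->{name} } @{ $by_category{$category} }), "\n";
}
```

    Collecting into a hash of arrays keeps links with identical names apart, which a structure keyed on the name cannot do.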
    You could also generate multiple HTML pages from a single XML file using XSL and XSLT. If you are interested in this technique, you should give chapter 10 of Michael Kay's XSLT Programmer's Reference a read. He talks about this on page 637. Good luck.

    Get Strong Together!!

Re: XML Manipulation
by mirod (Canon) on May 22, 2001 at 12:17 UTC

    So here is the XML::Twig version (warning: not tested, I can't this week, let's see how many bugs you find there!)

    #!/bin/perl -w
    use strict;
    use XML::Twig;

    my $MAIN_INDEX   = "links_main.html"; # main index, linked to categories
    my $INDEX_SUFFIX = "_links.html";     # used to generate the various files per category
    my $MAIN_TITLE   = "My links";        # Title for the main index
    my $INDEX_TITLE  = "Links for %s";    # printf format for low level index titles

    my $twig= XML::Twig->new;
    $twig->parsefile( './links.xml');          # load the xml doc in memory
    my @link= $twig->root->children( 'link');  # the link elements under the root

    # first lets get the categories
    my %category;
    $category{$_->att( 'category')}++ foreach (@link);

    # put the categories in an array, sorted by number of links in descending
    # order (ties are broken by name, so the order is fully determined)
    my @category= sort { $category{$b} <=> $category{$a} || $a cmp $b } keys %category;

    # generate the main link page
    open( MAIN, ">$MAIN_INDEX") or die "$0 cannot open $MAIN_INDEX: $!";
    # I know I coulda used CGI.pm...
    print MAIN qq{<html><head><title>$MAIN_TITLE</title></head>
    <body><h1>$MAIN_TITLE</h1>
    <ul>};
    foreach my $category (@category)
      { printf MAIN qq{<li><a href="%s">%s<small> (%s links)</small></a></li>},
               category_file( $category), $category, $category{$category};
      }
    print MAIN qq{</ul></body></html>};
    close MAIN;

    # now let's create the categories
    # it will be easier if we sort the links by category,
    # in the same order as the @category list
    # Hi [merlyn]!
    @link= map  { $_->[2] }
           sort { $b->[0] <=> $a->[0] || $a->[1] cmp $b->[1] }
           map  { [ $category{$_->att( 'category')}, $_->att( 'category'), $_ ] }
           @link;

    foreach my $category (@category)
      { my $category_file= category_file( $category);
        open( INDEX, ">$category_file") or die "$0 cannot open $category_file: $!";
        my $title= sprintf $INDEX_TITLE, $category;
        print INDEX qq{<html><head><title>$title</title></head>
    <body><h1>$title</h1>
    <ul>};
        # as the links are ordered we know the links for the
        # current category are at the beginning of @link
        while( @link && $link[0]->att( 'category') eq $category)
          { my $link= shift @link;
            # description is a child element of link, not an attribute
            printf INDEX qq{<li><a href="%s">%s</a> %s</li>\n},
                   $link->att( 'url'), $link->att( 'name'),
                   $link->first_child_text( 'description');
          }
        print INDEX qq{</ul><hr><p align="center"><a href="$MAIN_INDEX">$MAIN_TITLE</a></p></body></html>};
        close INDEX;
      }

    sub category_file
      { my $category= shift;
        return lc( $category) . $INDEX_SUFFIX;
      }

    This design does not really allow for a different way of sorting the categories. You would also need to modify it slightly if you want to have next/previous index links.

    Now the ObNoE (Obligatory Note on Encodings, yes I know it starts like obnoxious ;--). As you seem to have sites from various countries in your link list, I am pretty sure your system will break as soon as you include an accented description: if you have accented characters in a non-UTF-8 encoding (most likely latin1, aka ISO-8859-1 if my memory serves me well, that's what most Western sites use) in your original XML file you will have to add an XML declaration at the top of your document (something like <?xml version="1.0" encoding="ISO-8859-1"?>). This also means that you will not be able to mix encodings (like getting a link to a Japanese site with a shift-JIS encoded description). The output will be UTF-8 encoded, I hope your browser can display it, otherwise you will have to convert everything back to whatever your favourite encoding is, or use the KeepEncoding option when you create the XML::Twig object (if you are using a 1-byte encoding like latin1). Welcome to the beautiful world of XML encoding!
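
    To make the "convert everything back" step a bit more concrete, here is a small sketch. It assumes a Perl recent enough to ship the Encode module, and it uses a hard-coded UTF-8 byte string in place of a description that would really come out of the parser.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "cafe" with an acute accent on the e, as UTF-8 bytes; this stands in for
# a description taken from the parser's UTF-8 output.
my $utf8_bytes = "caf\xC3\xA9";

my $chars  = decode( 'UTF-8', $utf8_bytes );     # UTF-8 bytes -> Perl characters
my $latin1 = encode( 'ISO-8859-1', $chars );     # characters  -> latin1 bytes

# the accented letter takes two bytes in UTF-8 but only one in latin1
printf "UTF-8: %d bytes, latin1: %d bytes\n", length($utf8_bytes), length($latin1);
```

    This only works for characters that exist in the target encoding, which is another way of seeing why a Japanese description cannot live in a latin1 page.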
