Item Description: Good introduction to XML processing with Perl
Review Synopsis:
- XML and Perl
- Authors: Mark Riehl and Ilya Sterin
- ISBN 0-7357-1289-1
- Publisher: New Riders Publishing
- Released: 2002-10-14
One of Perl's great strengths is in processing text files. That is,
after all, why it became so popular for generating dynamic web pages -
web pages are just text (albeit text that is supposed to follow particular
rules). As XML is just another text format, it follows that Perl will be
just as good at processing XML documents. It's therefore surprising that
using Perl for XML processing hasn't received much attention until
recently. That's not to say that there hasn't been work going on in that
area - many of the Perl XML processing modules have long and honourable
histories - it's just that the world outside of the Perl community doesn't
seem to have taken much notice of this work. This is all set to change
with the publication of this book and O'Reilly's Perl and XML.
XML and Perl is written by two well-known members of the
Perl XML community. Both are frequent contributors to the "perl-xml"
mailing list, so there's certainly no doubt that they know what they
are talking about, which is always a good thing in a technical book.
The book is made up of five sections. The first section has a
couple of chapters which introduce you to the concepts covered in the
book. Chapter one introduces you separately to XML and Perl and then
chapter two takes a first look at how you can use Perl to process XML. This
chapter finishes with two example programs for parsing simple XML
documents.
Section two goes into a lot more detail about parsing XML
documents with Perl. Chapter three looks at event-driven parsing, using
XML::Parser and XML::Parser::PerlSAX to build example
programs before going on to talk in some detail about XML::SAX, which is
currently the state of the art in event-driven XML parsing in Perl. It
also looks at XML::Xerces, which is a Perl interface to the Apache
Software Foundation's Xerces parser. Chapter four covers tree-based
XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM
and XML::LibXML. In both of these chapters the pros and cons of each of
the modules are discussed in detail so that you can easily decide which
solution to use in any given situation.
Section three covers generating XML documents. In chapter five
we look at generating XML from text sources using simple
print statements and also the modules XML::Writer and
XML::Handler::YAWriter. Chapter six looks at taking data from a
database and turning that into XML using modules like XML::Generator::DBI
and XML::DBMS. Chapter seven looks at miscellaneous other input formats
and contains examples using XML::SAXDriver::CSV and
XML::SAXDriver::Excel.
Section four covers more advanced topics. Chapter eight is about
XML transformations and filtering. This chapter covers using XSLT to
transform XML documents. It covers the modules XML::LibXSLT,
XML::Sablotron and XML::XPath.
Chapter nine goes into detail about Matt Sergeant's AxKit, the
Apache XML Kit which allows you to create a website in XML and
automatically deliver it to your visitors in the correct format.
Chapter ten rounds off the book with a look at using Perl to create
web services. It looks at the two most common modules for creating web
services in Perl - XML::RPC and SOAP::Lite.
Finally, section five contains the appendices which provide more
background on the introductions to XML and Perl from chapter one.
There was one small point that I found a little annoying when reading
the book. Each example was accompanied by a sample of the XML documents to
be processed together with both a DTD and an XML Schema definition for the
document. This seemed to me to be overkill. Did we really need both DTDs and
XML Schemas for every example? I would have found it less distracting if one
(or even both) of these had been moved to an appendix.
That small complaint aside, I found it a useful and interesting book.
It will be very useful to Perl programmers (like myself) who will increasingly
be expected to process (and provide) data in XML formats.
Update: Added book details.
Re: XML and Perl by mirod (Canon) on Jan 27, 2003 at 20:49 UTC
I must say I did not quite like this book. It is not that it is awful, it's just that it rubbed me the wrong way I guess.
I have objective complaints: some of the examples I have checked are not as robust as they should be, some (minor) facts are wrong,
and most importantly some important points are not even discussed. I also have some more subjective problems with this book which overall made it a not-so-enjoyable read.
Overall the book can be useful as a source of commented code examples though, and it explains properly the various processing models available (tree vs event). I just think that it could really have used some more reviewing and editing.
Missed points
The book does not discuss encodings at all. This is a major problem, as in my experience most of the problems beginners have with XML come from misunderstanding encodings. If your data is either US-ASCII or UTF-8, and will remain so in the future, then encodings won't be too much of a problem. In real life this seems to be rarely the case, so I would expect a book about Perl and XML to give you an idea why your parser dies mysteriously when fed a French name and what to do in this case.
A whole chapter is dedicated to modules that interface with XSLT processors, but there is no discussion of why you would want to use XSLT as opposed to Perl, and how to choose when to use one or the other (or use both in cooperation). This is the kind of high-level introduction that I would have hoped to find in this book.
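To make the "French name" failure concrete, here is a minimal sketch using only the core Encode module (assuming Perl 5.8 or later). It is not from the book. XML requires a document with no encoding declaration and no byte order mark to be UTF-8, but a Latin-1 editor saves "ç" as the single byte 0xE7, which is not valid UTF-8 on its own. The `decodes_as` helper is hypothetical and stands in for what a parser does internally.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# "François" as a Latin-1 (ISO-8859-1) editor would save it:
# the c-cedilla is the single byte 0xE7.
my $latin1_bytes = "Fran\xE7ois";

# Hypothetical helper: true if $bytes decode cleanly as $encoding.
# FB_CROAK makes decode() die on malformed input instead of substituting.
sub decodes_as {
    my ($encoding, $bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode($encoding, $copy, FB_CROAK); 1 } ? 1 : 0;
}

# With no encoding declaration a parser must assume UTF-8, and 0xE7
# followed by 'o' is not a valid UTF-8 sequence, so decoding fails.
printf "as UTF-8:      %s\n",
    decodes_as('UTF-8', $latin1_bytes) ? 'ok' : 'malformed';

# Declaring the real encoding, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>,
# tells the parser how to read the very same bytes.
printf "as ISO-8859-1: %s\n",
    decodes_as('ISO-8859-1', $latin1_bytes) ? 'ok' : 'malformed';
```

The usual fixes are either to declare the encoding the file was really saved in, or to convert the data to UTF-8 before handing it to the parser.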
Examples
Some of the examples lack robustness.
Why people insist on advocating the DOM as a valid tool for generic XML transformation is beyond my grasp!
The example proposed in the book gets it half right by testing in its main loop whether a child is really an element, but a well-placed comment would still break it, because it then blindly assumes that the first child of an element node is the text of the element (a comment or processing instruction would break that assumption). Ironically, davorg's own (excellent) Data Munging with Perl had the same problem ;--). Tony Darugar's excellent article Effective XML processing with DOM and XPath in Perl gives a detailed analysis of the kind of problems you run into when using the DOM on real projects.
I gave up testing the examples after a while but I believe that the XML::LibXML example can also be broken with differently formatted XML or extra comments.
The whole chapter (5) advocating the use of XML::Writer above plain print statements completely misses the real reasons why you should use the module (it escapes XML special characters). Instead it focusses on a really contrived discussion against print (and even states that you cannot have a multi-line print, which is false).
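The escaping point can be shown in a few lines of core Perl. This is a minimal sketch, not code from the book: the `xml_escape` helper is hypothetical and only approximates the escaping that XML::Writer performs for you; it is not XML::Writer's API. The here-doc at the end also shows that a single print statement can span several lines.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the escaping a module such as
# XML::Writer applies automatically to character data and attributes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-escape the entities below
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $company = 'Smith & Sons <Ltd>';

# Naive print: the raw '&' and '<' make the output malformed XML.
my $naive = "<company>$company</company>\n";

# Escaped version: well-formed.
my $safe = '<company>' . xml_escape($company) . "</company>\n";

print $naive, $safe;

# One print statement, several lines of output, via a here-doc.
my $name = xml_escape('Fran & Co');
print <<"XML";
<customer>
  <name>$name</name>
</customer>
XML
```

Forgetting the escaping step is exactly the mistake that plain print makes easy and that a writer module makes hard.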
Miscellaneous problems
The book gets a host of details wrong, which is not crippling but gets irritating after a while:
- The book gives the impression that Perl is a good choice for processing XML because it is very good at processing text. In fact Perl's strength with XML depends mostly on modules... written in C (or based on C parsers).
- XML::DOM was NOT written by TJ Mather, but by Enno Derksen. This is even mentioned in the Perl and XML FAQ.
- use is not a pragma; strict, in use strict;, is the pragma.
- XML::XPath is listed in the XSLT chapter, with no mention that it is NOT an XSLT processor.
- The appendix titled "Perl Essentials" promises to be a Perl 101 but only explains how to install Perl modules (it does a good job of explaining that, BTW).
Style problems
I am not a fan of giving an entire listing of an example and then repeating the
example in extenso, broken up into commented sections. I'd rather have the
complete code available only for download from a web site, so that it doesn't take up page space (the book's web site is not up, BTW). As davorg mentioned, the use of both DTDs and W3C Schemas is unnecessary, especially as W3C Schema support in Perl is in its infancy (and did not exist when the book was published, see XML::Schema). Finally, I found the tone of the book a little too didactic for me: "I have shown" and "I will demonstrate" are repeated over and over again. The book also switches from 'I' to 'we' a couple of times.
Michel,
Respectfully, we all know you have a bee in your bonnet about the DOM ;-). But regardless of the problems processing XML with the DOM brings up, I'm not entirely sure that covering those problems in the presented code would be the best way to do it -- I can't stand books that present massively long examples -- I'd much rather be given something simple I can build on. Perhaps a box-out would be better, though. I haven't read this book yet, but I will be sure to suggest that to Ilya as a change for the second edition.
As far as encodings go, I seem to recall reading that the book covers this by simply stating that all XML parsers return their data in UTF-8, regardless of the input encoding, but then I've only flicked through it on the bookshelves so I can't be sure. I get the feeling that, because XML::Twig deals with the encoding issue by messing with the original_string (which is possibly scarier than the alternative of just leaving things in UTF-8), you think it's terribly important that this be covered in detail, when I think that in the majority of situations people need to come out of their encoding-specific shells and get used to the world of Unicode. I'd treat someone coding in perl4 style the same way.
Regarding discussing why you would want to use XSLT, I'd rather keep this out of a technical book. This is a wishy-washy issue, and I'd rather just get into the code, thanks. I guess mileage varies on this - personally I prefer nutshell-style books that just get down and dirty without any discussion of the whys.
Overall I think your response is rather damning of what is a much better book than "Perl and XML" from O'Reilly, and given the choice of the two I'd pick this book any day.
All, respectfully, IMHO ;-)
Overall I think your response is rather damning of what is a much better book than "Perl and XML" from O'Reilly, and given the choice of the two I'd pick this book any day.
I agree. I wasn't impressed with O'Reilly's Perl and XML either. I thought its examples were far too simple, and it spent too much time on trivial issues while providing only a far too brief discussion of the important points.
XML and Perl is definitely an improvement, whatever opinions people may have on the DOM :)
Re: XML and Perl by Anonymous Monk on Jan 28, 2003 at 11:45 UTC
Nice review :)
Reading through this a second time, I realized that some extra clarification of which book you're reviewing would be helpful. The title is rather generic, so just in case another comes along:
- XML and Perl
- Authors: Mark Riehl and Ilya Sterin
- ISBN 0-7357-1289-1
- Publisher: New Riders Publishing
- Released: 2002-10-14
Hope I got the right one :)
Excellent point. I made the assumption that the site automatically builds that info from the ISBN. It probably should :)
I've taken the liberty of stealing the details from your post and putting them into mine. Hope you don't mind.
--
<http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
Re: XML and Perl by Ryszard (Priest) on Jan 28, 2003 at 19:16 UTC
We got this book at work, and having never worked with XML before, I didn't find it very helpful. After reading a couple of chapters, skimming the rest and looking at the examples, my understanding of XML and of how to do XML processing in Perl was only a little better than it was before the book.
In order to reach a basic level of understanding, I got out #!/usr/bin/perl and started writing away.
I personally would want something that started very basic and went up the scale to complex, complete, real-world examples of XML-based applications.
In my opinion this book would not make a good addition to a library...