PerlMonks  

Hello XML fans, it's time to do some Prolog-like search and query on a small XML database. Shown below is an adjacency map: an XML document that records which cities are next to which other cities. Such a document would be useful if, say, a person had an inter-city travel ticket and wanted to look up which cities were next to his, which were two away, and so on.

While you could use normal nested Perl data structures to deal with this, XML is in vogue, so we have to be just as fashionable. Actually, that isn't quite true: we can always use Gisle Aas' Data::XMLDumper to convert XML to and from Perl nested data structures. But for the purpose of this tutorial, we will act as if that module doesn't exist.
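For comparison, here is a minimal sketch of what the plain-Perl route might look like. The array-of-pairs layout and the helper name neighbors_of are my own assumptions for illustration, and only a few borders are listed:

```perl
use strict;
use warnings;

# Hypothetical plain-Perl encoding of an adjacency map:
# one array ref per border, holding the two city names.
my @borders = (
    [ 'mountain view', 'sunnyvale'  ],
    [ 'mountain view', 'palo alto'  ],
    [ 'menlo park',    'palo alto'  ],
    [ 'atherton',      'menlo park' ],
);

# Return the cities that share a border with $city.
# (neighbors_of is an illustrative name, not from any module.)
sub neighbors_of {
    my ($city, @pairs) = @_;
    my @found;
    for my $pair (@pairs) {
        my ($first, $second) = @$pair;
        push @found, $second if $first  eq $city;
        push @found, $first  if $second eq $city;
    }
    return @found;
}

print join(', ', neighbors_of('menlo park', @borders)), "\n";
# prints: palo alto, atherton
```

The rest of this post does the same kind of lookup against XML elements instead of array refs.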

So without further ado, I present the XML document detailing the (far too windy) part of the world I currently live in (and will be escaping from as soon as Christmas is here):

<border_list>
  <pair><city>mountain view</city><city>sunnyvale</city></pair>
  <pair><city>mountain view</city><city>palo alto</city></pair>
  <pair><city>menlo park</city><city>palo alto</city></pair>
  <pair><city>atherton</city><city>menlo park</city></pair>
  <pair><city>atherton</city><city>redwood city</city></pair>
  <pair><city>san carlos</city><city>redwood city</city></pair>
  <pair><city>san carlos</city><city>belmont</city></pair>
  <pair><city>hillesdale</city><city>belmont</city></pair>
  <pair><city>hillesdale</city><city>san mateo</city></pair>
</border_list>

Ok, so now what

So, now that I have shown the data, it is time to grok it, munge it, eat it for breakfast as a meal replacement and basically bring it to its knees to do our bidding.

Program One: find all cities next to menlo park

Ok, here is a program to grok this XML-base for all cities next to menlo park:

use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new(PrettyPrint => 'record');
$t->parsefile('adj.xml');
my $root = $t->root;

# @pair has all the pairs of adjacent cities in it
my @pair = $root->children;

# target city we are looking for
my $city = 'menlo park';

# this routine takes a search text and a list of XML elements and
# searches them for the text
sub candidate_generator {
    my ($search_text, @data) = @_;
    grep { grep { $_->text eq $search_text } $_->children } @data;
}

# take the entire XML-base and search for records which have our
# target city in them
my @adj = candidate_generator($city, @pair);

# print them out in a human-readable form
$_->print for @adj;

and here is the pretty output:

<pair>
  <city>menlo park</city>
  <city>palo alto</city>
</pair>
<pair>
  <city>atherton</city>
  <city>menlo park</city>
</pair>
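
If you want just the neighbor names rather than whole pair elements, something along these lines might do. This is only a sketch: neighbor_names is a hypothetical helper of mine (not part of XML::Twig), and the XML is a trimmed copy inlined as a string so the example does not depend on adj.xml:

```perl
use strict;
use warnings;
use XML::Twig;

# A trimmed copy of the border list, inlined for the example.
my $xml = <<'XML';
<border_list>
  <pair><city>mountain view</city><city>palo alto</city></pair>
  <pair><city>menlo park</city><city>palo alto</city></pair>
  <pair><city>atherton</city><city>menlo park</city></pair>
</border_list>
XML

my $t = XML::Twig->new;
$t->parse($xml);

# For every pair that mentions $city, collect the *other* city's name.
# (neighbor_names is an illustrative helper, not an XML::Twig method.)
sub neighbor_names {
    my ($city, @pairs) = @_;
    my @names;
    for my $pair (@pairs) {
        my @cities = map { $_->text } $pair->children('city');
        next unless grep { $_ eq $city } @cities;
        push @names, grep { $_ ne $city } @cities;
    }
    return @names;
}

my @names = neighbor_names('menlo park', $t->root->children('pair'));
print "$_\n" for @names;
```

Passing a tag name to children restricts the result to elements with that tag, which keeps the helper from tripping over anything else that might sit inside a pair.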

all done

The program was documented, so it should make sense, but let's take a closer look at candidate_generator().

# this routine takes a search text and a list of XML elements and
# searches them for the text
sub candidate_generator {
    my ($search_text, @data) = @_;
    grep { grep { $_->text eq $search_text } $_->children } @data;
}

It consists of two nested greps and hence can be a little confusing. Depending on the way you think, you might want to consider the outer grep first and then the inner grep, or vice versa. It is only fitting that I discuss both methods of program comprehension.

Let's do top-down first. The outer grep is basically saying: take all the XML records and return only the ones which satisfy the inner search criterion. The inner search takes each individual XML record, looks at each of its children (each child is a city), and compares the child's text with the text being searched for, in this case menlo park.
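
If the nested greps still look opaque, the top-down reading can be spelled out with explicit loops. The following is a sketch of an equivalent, not a replacement for the original; the name candidate_generator_loops and the inlined sample data are assumptions made for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse(<<'XML');
<border_list>
  <pair><city>menlo park</city><city>palo alto</city></pair>
  <pair><city>atherton</city><city>menlo park</city></pair>
  <pair><city>atherton</city><city>redwood city</city></pair>
</border_list>
XML

# Loop-based equivalent of the nested greps (hypothetical name).
sub candidate_generator_loops {
    my ($search_text, @data) = @_;
    my @matches;
    RECORD: for my $record (@data) {           # plays the role of the outer grep
        for my $child ($record->children) {    # plays the role of the inner grep
            if ($child->text eq $search_text) {
                push @matches, $record;        # keep the whole record
                next RECORD;                   # one matching child is enough
            }
        }
    }
    return @matches;
}

my @adj = candidate_generator_loops('menlo park', $t->root->children('pair'));
print scalar(@adj), " matching records\n";
# prints: 2 matching records
```

The one-line grep version does the same selection; the loops just make the "for each record, for each child" order visible.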

Ok, now bottom up. The innermost expression is $_->text eq $search_text, which takes an XML element, gets its text, and compares it to a normal Perl string. So if $elt was an XML::Twig::Elt representing

<city>boise</city>
then $elt->text would be boise. Now we work outward a bit. One level out is grep { YADAYADA } $_->children. Here we take advantage of the fact that the XML is structured so that neighboring cities are both children of the pair element, e.g.:
<pair><city>mountain view</city><city>sunnyvale</city></pair>
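and we are just checking to see whether the text of either child is the text we are looking for. Used as the outer grep's condition, the inner grep yields the number of matching children, so it is true whenever at least one child matches. And now we finally make it to the outer grep, and the first sentence in the top-down description says what that is doing.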
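
To poke at the bottom-up steps one at a time, a small sketch like this may help. It parses a single pair from a string (an assumption made to keep the example self-contained) and evaluates each layer on its own:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<pair><city>mountain view</city><city>sunnyvale</city></pair>');
my $pair = $t->root;

# Step 1: the text of a single element.
my ($first, $second) = $pair->children;
print $first->text, "\n";    # prints: mountain view

# Step 2: the inner grep. Assigned to a scalar, grep returns the
# number of children whose text matches.
my $hits   = grep { $_->text eq 'sunnyvale' } $pair->children;
my $misses = grep { $_->text eq 'boise' }     $pair->children;
print "sunnyvale: $hits, boise: $misses\n";
# prints: sunnyvale: 1, boise: 0
```

A count of 1 is true and a count of 0 is false, which is all the outer grep needs in order to decide whether to keep the pair.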

th-th-th-that's the first post, folks

Anyway, that was the first in a series of 3 posts. The next two will do slightly more advanced searching and in the process introduce a call or two more from the XML::Twig API.

In reply to Adjacency List Processing in XML::Twig by princepawn
