Parsing two XML at the same time and align them

by corfuitl (Sexton)
on May 17, 2018 at 10:20 UTC

corfuitl has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks

I have the following problem to solve and need your help!

I have two XML files (which should be identical, but are not always) and I would like to extract some values and align them.

My XML files look like this:

<file original="File_1.xml">
  <body>
    <unit id="id1">
      <title>Part 1_file1</title>
    </unit>
    <unit id="id2">
      <title>Part 2</title>
    </unit>
  </body>
</file>
<file original="File_2.xml">
  <body>
    <unit id="id1">
      <title>Part 1</title>
    </unit>
  </body>
</file>

I would like to align them in this way:

File_1.xml id1 title_value_from_first_xml title_value_from_second_xml
File_1.xml id2 title_value_from_first_xml title_value_from_second_xml
File_2.xml id1 title_value_from_first_xml title_value_from_second_xml

Any suggestions?

Replies are listed 'Best First'.
Re: Parsing two XML at the same time and align them
on May 17, 2018 at 12:00 UTC
    Note that the XML chunk you posted is not well-formed XML, as it lacks a root node. I wrapped it in
    <root> ... </root>

    and used XML::LibXML to get the desired output:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use XML::LibXML;

    my @files = @ARGV[0, 1];

    my %extracted;
    for my $xml_file (@files) {
        my $dom = 'XML::LibXML'->load_xml(location => $xml_file);
        for my $file ($dom->findnodes('/root/file')) {
            my $original = $file->{original};
            for my $unit ($file->findnodes('body/unit')) {
                my $id = $unit->{id};
                my $title = $unit->findvalue('title');
                $extracted{$original}{$id}{$xml_file} = $title;
            }
        }
    }

    for my $file (keys %extracted) {
        for my $id (keys %{ $extracted{$file} }) {
            say join "\t", $file, $id, @{ $extracted{$file}{$id} }{@files};
        }
    }

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Thank you so much for your help, and I apologize for my late response.

      I tested it and it works! However, my XML uses some namespaces, so it is not possible to parse the file as is; I need to replace them first. In the header, it has

      <... xmlns:oka="ok-fram:xml-extensions" ...>

      and some elements start with oka, e.g. (oka:inputEncoding="US-ASCII")
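
One way to handle the namespaces without replacing them might be XML::LibXML::XPathContext, which lets you register a prefix for each namespace URI and use it in the XPath expressions. The sketch below is only an illustration under assumptions: the inline sample document and the default namespace urn:example:doc (with its prefix d) are hypothetical, since the real header was not posted in full. Only the oka prefix and its URI come from the post. In XPath 1.0 an unprefixed name matches only elements in no namespace, so if the file also declares a default namespace, the element names need a registered prefix too.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample: the default namespace urn:example:doc is an
# assumption; only xmlns:oka="ok-fram:xml-extensions" is from the post.
my $xml = <<'XML';
<root xmlns="urn:example:doc" xmlns:oka="ok-fram:xml-extensions">
  <file original="File_1.xml" oka:inputEncoding="US-ASCII">
    <body>
      <unit id="id1"><title>Part 1</title></unit>
    </body>
  </file>
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Bind prefixes to the namespace URIs. The prefixes used here do not
# have to match the ones used in the document, only the URIs matter.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(oka => 'ok-fram:xml-extensions');
$xpc->registerNs(d   => 'urn:example:doc');    # hypothetical default namespace

my @rows;
for my $file ($xpc->findnodes('/d:root/d:file')) {
    my $original = $file->getAttribute('original');
    my $encoding = $xpc->findvalue('@oka:inputEncoding', $file);
    for my $unit ($xpc->findnodes('d:body/d:unit', $file)) {
        my $title = $xpc->findvalue('d:title', $unit);
        push @rows, join "\t", $original, $encoding,
                               $unit->getAttribute('id'), $title;
    }
}
say for @rows;
```

If the file has no default namespace, the d prefix can be dropped and the original paths (/root/file, body/unit, title) should keep working as they are.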

Re: Parsing two XML at the same time and align them
on May 17, 2018 at 11:49 UTC
    Hello corfuitl,

    > Any suggestions?

    Yes! Avoid XML::Simple.

    I'm used to XML::Twig, and you can take advantage of twig_handlers to catch your id.

    You can use a hash to store the results, as in $res{ $filename."\t".$id } = [], so that while parsing the two files you can push the result from file1 there first and then the result from file2.
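
The idea above could be sketched roughly as follows. This is a minimal illustration, not a tested solution for the real data: the two inline documents are hypothetical stand-ins for the two input files (wrapped in a root element so they parse), and the original attribute is assumed to play the role of $filename. Instead of a plain push, the sketch stores each title at the index of the document it came from, so a unit missing from one file leaves an empty column rather than shifting the other value.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::Twig;

# Hypothetical inline documents standing in for the two input files.
my @docs = (
    '<root><file original="File_1.xml"><body>'
  . '<unit id="id1"><title>Part 1_file1</title></unit>'
  . '<unit id="id2"><title>Part 2</title></unit>'
  . '</body></file></root>',
    '<root><file original="File_1.xml"><body>'
  . '<unit id="id1"><title>Part 1</title></unit>'
  . '</body></file></root>',
);

my %res;    # "original\tid" => [ title from doc 0, title from doc 1 ]

for my $i (0 .. $#docs) {
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once for every completely parsed <unit> element.
            unit => sub {
                my ($t, $unit) = @_;
                my $original = $unit->parent('file')->att('original');
                my $key      = join "\t", $original, $unit->att('id');
                $res{$key}[$i] = $unit->first_child_text('title');
            },
        },
    );
    $twig->parse($docs[$i]);    # use parsefile($path) for real files
}

my @lines;
for my $key (sort keys %res) {
    # Fill missing slots with an empty string to keep the columns aligned.
    my @titles = map { $res{$key}[$_] // '' } 0 .. $#docs;
    push @lines, join "\t", $key, @titles;
}
say for @lines;
```

With the sample data this yields one line per original/id pair, and the id2 line has an empty last column because that unit does not appear in the second document.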


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Parsing two XML at the same time and align them
on May 17, 2018 at 19:11 UTC
    I am trying to put together something that might work for you, but I am unsure about the input data (there is only one "title value from second xml"... does it need to be printed every time? Am I missing a key piece of evidence here? xD). If you could kindly post some better examples, we would all surely appreciate it. Otherwise, I am not too sure about the input data. Also, are these the only elements that will be in the input data? Please elaborate on the question and include better examples. ;)
