PerlMonks  

How to compare two large XML files with minimum runtime?

by pinkesh_like (Initiate)
on Oct 11, 2013 at 06:43 UTC ( [id://1057835]=perlquestion )

pinkesh_like has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have two large XML files (around 380MB). Can anyone suggest ways to compare these two files with minimum runtime? I need to compare them by checking the values of certain keys in both hashes, such as 'message', 'ID' or 'variable_name'. The two files count as the same only if these values are equal. Thanks in advance.

Replies are listed 'Best First'.
Re: How to compare two large XML files with minimum runtime?
by Discipulus (Canon) on Oct 11, 2013 at 07:00 UTC
    Short question, shorter answer:
    use XML::Twig with handlers and flush to populate a %hash, do the same for the second file, then compare the two hashes.

    hth L*
    there are no rules, there are no thumbs..
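
The XML::Twig suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the element name record and the child layout (ID, message, variable_name as child elements) are hypothetical, since the question does not show the actual XML, and the helper names load_records and diff_records are made up for this sketch. It calls purge rather than flush because nothing needs to be written back out; flush would also print the parsed XML.

```perl
use strict;
use warnings;
use XML::Twig;

# ASSUMPTION: each <record> element has <ID>, <message> and
# <variable_name> children. Adjust these names to the real files.
sub load_records {
    my ($file) = @_;
    my %seen;
    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $rec) = @_;
                my $id = $rec->first_child_text('ID');
                # Store only the values that matter for the comparison.
                $seen{$id} = join "\0",
                    $rec->first_child_text('message'),
                    $rec->first_child_text('variable_name');
                $t->purge;    # free the part of the tree parsed so far
            },
        },
    );
    $twig->parsefile($file);
    return \%seen;
}

# Return the IDs that are missing on one side or whose values differ.
sub diff_records {
    my ($left, $right) = @_;
    my @diff;
    for my $id (sort keys %$left) {
        push @diff, $id
            if !exists $right->{$id} || $right->{$id} ne $left->{$id};
    }
    push @diff, grep { !exists $left->{$_} } sort keys %$right;
    return @diff;
}

if (@ARGV == 2) {
    my @diff = diff_records(map { load_records($_) } @ARGV);
    print @diff
        ? "Differing IDs: @diff\n"
        : "Files match on the chosen keys\n";
}
```

Because each record is discarded once its values are copied into the hash, memory use should depend on the number of records kept rather than on the full size of the file.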
Re: How to compare two large XML files with minimum runtime?
by Anonymous Monk on Oct 11, 2013 at 07:36 UTC
Re: How to compare two large XML files with minimum runtime?
by RMGir (Prior) on Oct 11, 2013 at 12:48 UTC
    There may be opportunities to "cheat", if you know something about the processes generating the XML. For instance, are the sections always in the same order? Are the message IDs on the same lines in each section? etc...

    If you have to handle arbitrary XML, or you don't have control of the sources so you can't guarantee that any cheats will remain valid, I'd say find the fastest XML parser you can and go with that...


    Mike
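
One form the "cheat" above might take, assuming both files really are written in the same order with the same line layout, is to skip XML parsing and read the two files in lockstep. This is a minimal sketch using core Perl only; the function name first_difference is hypothetical, and the approach stops being valid as soon as the generators reorder or reformat their output.

```perl
use strict;
use warnings;

# Return the 1-based number of the first line that differs,
# or 0 if the two files are identical line for line.
# ASSUMPTION: both files use the same ordering and formatting.
sub first_difference {
    my ($file_a, $file_b) = @_;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";

    my $line_no = 0;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        $line_no++;
        # Both files ended at the same point: no difference found.
        return 0 if !defined $line_a && !defined $line_b;
        # One file ended early, or the lines differ.
        return $line_no
            if !defined $line_a || !defined $line_b || $line_a ne $line_b;
    }
}

if (@ARGV == 2) {
    my $where = first_difference(@ARGV);
    print $where
        ? "First difference at line $where\n"
        : "Files are identical\n";
}
```

A check like this can serve as a quick first pass, with a real XML parser as the fallback when it reports a difference.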
Re: How to compare two large XML files with minimum runtime?
by Anonymous Monk on Oct 11, 2013 at 13:33 UTC
    380MB, really, is not "large" for most computers these days, which could easily handle both data structures side-by-side in memory without serious swapping. Therefore you could simply suck the two files into memory and use something like Data::Compare. If you are looking for specific comparisons, XSLT might be useful to "drill down" to exactly what you are looking for.
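
As a small illustration of the Data::Compare idea above: once both files have been loaded into Perl data structures by whatever parser is chosen, Compare reports whether the two structures are identical. The hashes below are made-up stand-ins for the parsed files, not real data from the question. Note that a parsed XML tree usually takes noticeably more memory than the file on disk, so it is worth measuring before relying on holding both in memory.

```perl
use strict;
use warnings;
use Data::Compare;    # exports Compare() by default

# Hypothetical parsed data, keyed by ID.
my %first  = (1 => { message => 'started', variable_name => 'x' });
my %second = (1 => { message => 'started', variable_name => 'x' });
my %third  = (1 => { message => 'stopped', variable_name => 'x' });

# Compare() returns true when the two structures match recursively.
print Compare(\%first, \%second) ? "same\n" : "different\n";
print Compare(\%first, \%third)  ? "same\n" : "different\n";
```

Compare only answers whether the structures match; finding which keys differ still needs a loop over the hashes.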

Node Type: perlquestion [id://1057835]
Approved by Corion