PerlMonks  

Re: Removing duplicate subtrees from XML

by elusion (Curate)
on Dec 03, 2002 at 01:55 UTC (#217105)


in reply to Removing duplicate subtrees from XML

I can see right now a BIG problem with this line:

if ($XML_process_line =~ /^(\d{1,10})([\%|\<].{1,1000}\>)/){

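I don't think it does what you want. First of all, this: [\%|\<]. You use both [] and |, but I think you want one or the other. Inside a character class, | is a literal pipe rather than alternation, so this class also matches |. If you want to match either % or <, use [%<].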
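A minimal sketch of the character class point, using core Perl only. The variable names $with_pipe and $without_pipe are made up for illustration; the first class is the one from the line quoted above.

```perl
use strict;
use warnings;

# Inside [...], | is an ordinary character, not alternation.
my $with_pipe    = qr/^[\%|\<]$/;   # class from the original regex
my $without_pipe = qr/^[%<]$/;      # suggested replacement

for my $char ('%', '<', '|') {
    my $a = $char =~ $with_pipe    ? 'match' : 'no match';
    my $b = $char =~ $without_pipe ? 'match' : 'no match';
    print "$char  with pipe: $a, without pipe: $b\n";
}
```

Only the first class accepts a lone | character; both accept % and <.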

Second, and more important, this: .{1,1000}>. Perl's quantifiers are greedy by default. That means that if you do this: "<one></one>" =~ /<(.{1,1000})>/; print $1;, you're going to get one></one printed, because the quantifier matches as many characters as possible while still letting the rest of the pattern match.

Instead, you'd want to use [%<][^>]{1,1000}>, which uses a negated character class so the match stops at the first >.
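Here is a small sketch comparing the greedy pattern with the negated character class. The sample line '12<name>foo</name>' is hypothetical, since the real input format isn't shown in this thread.

```perl
use strict;
use warnings;

my $text = '<one></one>';

# Greedy dot: runs to the last > it can reach.
my ($greedy)  = $text =~ /<(.{1,1000})>/;
# Negated class: cannot cross a >, so it stops at the first one.
my ($negated) = $text =~ /<([^>]{1,1000})>/;

print "greedy:  $greedy\n";    # one></one
print "negated: $negated\n";   # one

# Same comparison on a made-up input line (an assumption).
my $line = '12<name>foo</name>';
my ($num_old, $tag_old) = $line =~ /^(\d{1,10})([\%|\<].{1,1000}\>)/;
my ($num_new, $tag_new) = $line =~ /^(\d{1,10})([%<][^>]{1,1000}>)/;

print "original pattern: $tag_old\n";   # <name>foo</name>
print "fixed pattern:    $tag_new\n";   # <name>
```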

That being said, this is hard to do and even harder to do right, so you should use a module. I would suggest XML::Twig, but there are plenty of others as well.
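For what it's worth, here is a rough sketch of how duplicate removal might look with XML::Twig. The element name item and the sample document are assumptions, since the real data isn't shown here, and it treats two subtrees as duplicates only when their serialized text is identical (so whitespace or attribute order differences would defeat it).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; the element name "item" is an assumption.
my $xml = '<list><item><id>1</id></item>'
        . '<item><id>2</id></item>'
        . '<item><id>1</id></item></list>';

my %seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        item => sub {
            my ($t, $elt) = @_;
            # Use the serialized subtree as the duplicate key and
            # delete any element whose text has been seen before.
            $elt->delete if $seen{ $elt->sprint }++;
        },
    },
);

$twig->parse($xml);
my $result = $twig->sprint;
print "$result\n";
```

The handler runs once per complete item element, so only the first copy of each distinct subtree should survive.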

elusion : http://matt.diephouse.com

Update: I also noticed that you use two variables for your line. You assign to $XML_line, but use your regex on $XML_process_line. Remember to use -w and strict.
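As a sketch of why strict helps here: assuming the variables are declared with my, a mismatched name is a compile-time error. The snippet below compiles a made-up fragment inside a string eval so the error can be printed instead of killing the script.

```perl
use strict;
use warnings;

# A made-up fragment with the same kind of name mismatch.
my $code = <<'END_CODE';
use strict;
my $XML_line = '12<name>';
print "matched\n" if $XML_process_line =~ /^\d/;
END_CODE

# String eval returns undef and sets $@ when compilation fails.
my $ok    = eval "$code; 1";
my $error = $@;

print "strict refused to compile:\n$error" unless $ok;
```

The message names the undeclared $XML_process_line, which points straight at the typo.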

Replies are listed 'Best First'.
Re: Re: Removing duplicate subtrees from XML
by matth (Monk) on Dec 03, 2002 at 11:27 UTC
    reply to Update. Well spotted. I hand-edited that variable in an attempt to make the variable names more meaningful, prior to pasting. Thanks for all the advice.
