PerlMonks  

delete duplicated xml lines

by cibiena (Initiate)
on Aug 08, 2012 at 17:59 UTC (id://986334)

cibiena has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

With XML::LibXML and Spreadsheet::ParseExcel, and the call $materialmapping_table_xml->createElement("item"); I generate lines from an Excel document. For example:

    <item pr_1="a" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="a" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="a" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="2" internal="co_1" pr_family="2" />

How can I delete the duplicated XML lines (the first three and the last three in the example are the same)? The output must be:

    <item pr_1="a" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="a" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="1" internal="co_1" pr_family="2" />

Thanks

Replies are listed 'Best First'.
Re: delete duplicated xml lines
by hbm (Hermit) on Aug 08, 2012 at 18:49 UTC

    I imagine you don't want to delete duplicate lines, you want to avoid creating them. As you walk through your spreadsheet, create an XML line only if the current line hasn't already been seen; and then hash the current line so it won't be duplicated. Something like:

    my %seen;
    for my $thing ($worksheet->...) {
        next if exists $seen{$thing};
        # create XML element
        $seen{$thing}++;
    }

      Thank you very much for your answer, but I am sorry, I am a novice in Perl and this is very difficult for me. This is a small part of my code:

      my $materialmapping_table_xml = XML::LibXML->createDocument("1.0", "UTF-8");
      my $materialmapping_table_xml_root = $materialmapping_table_xml->createElement("masterdata");
      $materialmapping_table_xml->setDocumentElement($materialmapping_table_xml_root);
      my $materialmapping_item3 = $materialmapping_table_xml->createElement("item3");
      $materialmapping_item3->setAttribute(decode('cp1252', $pr_cell_name3->{Val}), $pr3);
      $materialmapping_item3->setAttribute("pr_family", $family);
      $materialmapping_table_xml_root->addChild($materialmapping_item3);

      and the same for item2. For example, it generates this output:

      <item2 duble="1" pr="c" pr_width="1250" pr_family="2" />
      <item2 duble="2" pr="c" pr_width="1250" pr_family="2" />
      <item2 pr="a" duble="2" pr_width="1250" pr_family="2" />
      <item2 pr="b" duble="2" pr_width="1250" pr_family="2" />
      <item2 pr="c" duble="2" pr_width="1250" pr_family="2" />
      <item3 pr="a" duble="1" pr_width="1250" pr_family="2" />
      <item3 pr="b" duble="1" pr_width="1250" pr_family="2" />
      <item3 pr="c" duble="1" pr_width="1250" pr_family="2" />
      <item3 pr_width="1250" duble="1" pr="c" pr_family="2" />
      <item3 pr="a" duble="2" pr_width="1250" pr_family="2" />
      <item3 pr="b" duble="2" pr_width="1250" pr_family="2" />
      <item3 pr="c" duble="2" pr_width="1250" pr_family="2" />
      <item3 pr_width="1250" duble="2" pr="c" pr_family="2" />
      <item3 pr_width="1250" pr="c" duble="2" pr_family="2" />
      <item3 pr_width="1250" pr="c" duble="2" pr_family="2" />
      <item3 pr_width="1250" pr="c" duble="2" pr_family="2" />
      <item2 pr="a" duble="3" pr_width="1250" pr_family="2" />

      but the output must be:

      <item3 pr="a" duble="1" pr_width="1250" pr_family="2" />
      <item3 pr="b" duble="1" pr_width="1250" pr_family="2" />
      <item3 pr="c" duble="1" pr_width="1250" pr_family="2" />
      <item3 pr="a" duble="2" pr_width="1250" pr_family="2" />
      <item3 pr="b" duble="2" pr_width="1250" pr_family="2" />
      <item3 pr="c" duble="2" pr_width="1250" pr_family="2" />
      <item2 pr="a" duble="3" pr_width="1250" pr_family="2" />

      Where do I have to put the code? Thank you very much for your precious help.

        Please provide all of your code.
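
One possible way to apply hbm's suggestion to a fragment like the one above is to build a key from the element name and its attributes before creating the element, and to skip the row when that key has already been seen. The sketch below is only an illustration in core Perl. The row data and the names `element_key` and `unique_elements` are hypothetical and not part of the original script. Attributes are sorted inside the key on the assumption that the same values in a different order should count as one item, which is what several `item3` lines in the sample output suggest.

```perl
use strict;
use warnings;

# Build a canonical key for one would-be element: the element name plus
# its attributes sorted by attribute name, so attribute order is ignored.
sub element_key {
    my ($name, $attrs) = @_;
    return join "\x1F", $name, map { $_ . '=' . $attrs->{$_} } sort keys %$attrs;
}

# Keep only the first occurrence of each (name, attributes) combination.
sub unique_elements {
    my @rows = @_;
    my %seen;
    return grep { !$seen{ element_key(@$_) }++ } @rows;
}

# Hypothetical rows standing in for what the spreadsheet loop produces.
my @rows = (
    [ item3 => { pr => 'c', duble => 2, pr_width => 1250, pr_family => 2 } ],
    [ item3 => { pr_width => 1250, duble => 2, pr => 'c', pr_family => 2 } ],
    [ item2 => { pr => 'a', duble => 3, pr_width => 1250, pr_family => 2 } ],
);

for my $row (unique_elements(@rows)) {
    my ($name, $attrs) = @$row;
    # In the real script this is the place for createElement($name),
    # the setAttribute(...) calls, and addChild(...).
    print "$name: ",
        join(' ', map { $_ . '="' . $attrs->{$_} . '"' } sort keys %$attrs),
        "\n";
}
```

With these three rows, the second `item3` row is skipped because it differs from the first only in attribute order, so two elements would be created.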

Re: delete duplicated xml lines
by Kenosis (Priest) on Aug 09, 2012 at 02:59 UTC

    Although hbm makes a good point, in case you do want to delete duplicate xml lines, here's one way:

    use Modern::Perl;

    my %seen;
    say for grep { chomp; !$seen{$_}++ } <DATA>;

    __DATA__
    <item pr_1="a" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="a" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="a" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="2" internal="co_1" pr_family="2" />

    Output:

    <item pr_1="a" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="2" internal="co_1" pr_family="2" />
    <item pr_1="a" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="b" pr_2="1" internal="co_1" pr_family="2" />
    <item pr_1="c" pr_2="1" internal="co_1" pr_family="2" />

    Hope this helps!
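
A caveat on comparing whole lines: the second sample in this thread contains items whose attributes are identical but written in a different order, and a plain string comparison treats those as different lines. The following is a minimal sketch of one way around that, assuming simple single-line elements with double-quoted attribute values. `canonical_line` and `dedupe_lines` are hypothetical helper names, and a real XML parser would be more robust than these regular expressions.

```perl
use strict;
use warnings;

# Reduce one '<name attr="value" ... />' line to a canonical string:
# the element name followed by its attributes sorted by attribute name.
# Lines that do not look like an element are returned unchanged.
sub canonical_line {
    my ($line) = @_;
    my ($name) = $line =~ /<\s*([\w:.-]+)/ or return $line;
    my %attr = $line =~ /([\w:.-]+)\s*=\s*"([^"]*)"/g;
    return join ' ', $name, map { $_ . '="' . $attr{$_} . '"' } sort keys %attr;
}

# Keep the first line for each canonical form, in original order.
sub dedupe_lines {
    my %seen;
    return grep { !$seen{ canonical_line($_) }++ } @_;
}

my @lines = (
    '<item3 pr="c" duble="2" pr_width="1250" pr_family="2" />',
    '<item3 pr_width="1250" duble="2" pr="c" pr_family="2" />',
    '<item3 pr_width="1250" pr="c" duble="2" pr_family="2" />',
    '<item2 pr="a" duble="3" pr_width="1250" pr_family="2" />',
);

print "$_\n" for dedupe_lines(@lines);
```

Here the three `item3` lines collapse to the first one, and the `item2` line is kept.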
