PerlMonks
Pulling from a list and inserting into XML documents

by gng4life (Initiate)
on Jun 22, 2017 at 13:25 UTC ( [id://1193291]=perlquestion )

gng4life has asked for the wisdom of the Perl Monks concerning the following question:

Hello All, first post so go easy on me. I'm getting back into Perl after not touching it for years, so I'm essentially a noob. Here is what I need to do. I have a text file (File A) with three comma-separated fields per record, for example hostname, IP address, and MAC address, and there are about 200 records (I can change this to a CSV or whatever is easier). I have about 200 XML documents; in each one I need to search for three tags, enter the three fields above into that section, save it, open the next document, enter the same three fields, save, and repeat for all docs. Here is an example:
File A:

    hosta,1.1.1.1,00000C123456,
    hostb,2.2.2.2,00000C123457,
    hostc,3.3.3.3,00000C123458,
    hostd,4.4.4.4,00000C123459,
    etc... (about 200 items, will have more later)

File 1 (and likewise File 2, File 3, ...):

    .
    <HOST_NAME></HOST_NAME>
    <HOST_IP></HOST_IP>
    <HOST_MAC></HOST_MAC>
    .
    .

After all the docs are done, they will look like this...

File 1:

    .
    <HOST_NAME>hosta</HOST_NAME>
    <HOST_IP>1.1.1.1</HOST_IP>
    <HOST_MAC>00000C123456</HOST_MAC>
    .
    .

File 2:

    .
    <HOST_NAME>hostb</HOST_NAME>
    <HOST_IP>2.2.2.2</HOST_IP>
    <HOST_MAC>00000C123457</HOST_MAC>
    .
    .

etc.
I have portable Strawberry Perl on my work computer right now. If it would be easier to do this in ActivePerl or an installed version of Strawberry Perl, let me know and I can spin it up on another workstation. So what is the best way to do this? Thanks for any help!

Replies are listed 'Best First'.
Re: Pulling from a list and inserting into XML documents
by haukex (Archbishop) on Jun 22, 2017 at 14:39 UTC

    Two modules you should look at are Text::CSV and XML::Twig. There are several other good XML modules, but XML::Twig is well suited to processing XML documents as a stream. What is unclear to me is whether your XML input files are all different; if they're all the same, you're basically doing template processing, and a system like Template-Toolkit might be a better fit. It's also not clear to me how you want to match up input files with rows of CSV data. Anyway, here's a quick sample to get you started; updating it to read from and write to actual files is left as an exercise for the reader ;-)

    use warnings;
    use strict;
    use Text::CSV;
    use XML::Twig;

    my %data;
    my $twig = XML::Twig->new(
        twig_print_outside_roots => 1,
        twig_roots => {
            'HOST_NAME' => sub { $_[1]->set_text($data{host})->print },
            'HOST_IP'   => sub { $_[1]->set_text($data{ip}  )->print },
            'HOST_MAC'  => sub { $_[1]->set_text($data{mac} )->print },
        } );

    my $XML = <<'END_XML';  # just for demo
    <root>
        <HOST_NAME></HOST_NAME>
        <HOST_IP></HOST_IP>
        <HOST_MAC></HOST_MAC>
    </root>
    END_XML

    my $csv = Text::CSV->new({binary=>1,auto_diag=>2});
    while ( my $row = $csv->getline(*DATA) ) {
        @data{"host","ip","mac"} = @$row;
        # here's where you'd need to match up XML file with data row
        $twig->parse($XML);
    }
    $csv->eof or $csv->error_diag;

    __DATA__
    hosta,1.1.1.1,00000C123456,
    hostb,2.2.2.2,00000C123457,
    hostc,3.3.3.3,00000C123458,

    Output:

    <root>
        <HOST_NAME>hosta</HOST_NAME>
        <HOST_IP>1.1.1.1</HOST_IP>
        <HOST_MAC>00000C123456</HOST_MAC>
    </root>
    <root>
        <HOST_NAME>hostb</HOST_NAME>
        <HOST_IP>2.2.2.2</HOST_IP>
        <HOST_MAC>00000C123457</HOST_MAC>
    </root>
    <root>
        <HOST_NAME>hostc</HOST_NAME>
        <HOST_IP>3.3.3.3</HOST_IP>
        <HOST_MAC>00000C123458</HOST_MAC>
    </root>
Re: Pulling from a list and inserting into XML documents
by thanos1983 (Parson) on Jun 22, 2017 at 14:24 UTC

    Hello gng4life,

    Welcome back to Perl, and welcome to the monastery.

    You haven't shown us any code you've tried in order to solve your problem, and from your description I am not 100% sure what you are trying to do. So I will propose something in pseudocode, and you can correct me if I am wrong.

    From your description I understand that you have one file (File A) with all the data, about 200 lines.

    As a first step, I would open the file and parse it into an array of hashes (why an array of hashes: to keep the order of the lines; why hashes: so you can split each line into three keys, e.g. host, IP, MAC). After parsing the file you should have an array of 200 hashes. If you do not care about the order of the lines, simply use a hash of hashes (HoH) and use the line number as a key in case you need to sort later.

    Step two: get the files to be processed into an array or something else you can iterate over. Loop over each file (assuming 200 files exist) and write a very simple script that searches for the <HOST_NAME> tag and inserts the value from the hash, then searches for <HOST_IP> and inserts its value, and finally does the same for <HOST_MAC>. Once you can write a script that does that, you are done... :D
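    Since the OP's tags are empty and flat, step two can be sketched with plain substitutions, no XML module required. This is only a minimal sketch under that assumption (one empty tag of each name per document); the `fill_tags` helper and sample values are made up for illustration:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Fill the empty <HOST_NAME>, <HOST_IP>, <HOST_MAC> tags in one XML string.
    # Assumes each tag appears once and is empty, as in the OP's example files.
    sub fill_tags {
        my ($xml, $host, $ip, $mac) = @_;
        $xml =~ s{<HOST_NAME></HOST_NAME>}{<HOST_NAME>$host</HOST_NAME>};
        $xml =~ s{<HOST_IP></HOST_IP>}{<HOST_IP>$ip</HOST_IP>};
        $xml =~ s{<HOST_MAC></HOST_MAC>}{<HOST_MAC>$mac</HOST_MAC>};
        return $xml;
    }

    my $xml = "<HOST_NAME></HOST_NAME><HOST_IP></HOST_IP><HOST_MAC></HOST_MAC>";
    print fill_tags($xml, 'hosta', '1.1.1.1', '00000C123456'), "\n";
    # prints <HOST_NAME>hosta</HOST_NAME><HOST_IP>1.1.1.1</HOST_IP><HOST_MAC>00000C123456</HOST_MAC>
    ```

    A regex approach like this is fragile for real-world XML (attributes, whitespace, nesting); for anything less uniform, use a proper parser as the other replies suggest.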

    It is not that difficult; start with the basics, post us some code, and we will help wherever you get stuck. :D

    Minor note: other Monks may come up with smaller or bigger solutions to your problem; this is just one way of solving it, off the top of my head.

    Update: I created a slightly more advanced answer to your question that might confuse you a bit, but if you examine it, I think it will give you all the information you need for the first step of my answer.

    What I created assumes you have multiple files with data in the same format (e.g. file1.txt, file2.txt, etc.) and that those files contain your data. I assume file1.txt is the starting point of your data. Files passed on the command line are read in order by the diamond operator, so with perl test.pl file1.txt file2.txt the first file processed is file1.txt and the second is file2.txt.

    What the script does: first it parses each line of the files (one by one), splits the line on commas (based on the input you provided us), and creates a hash with the keys you mentioned (hostname, ip, mac) using a hash slice (Slices). Then it builds a hash of hashes (Hashes of Hashes) keyed on the current file name ($ARGV, see Variables related to filehandles), with the line number within each file as the second-level key. The script is generic and will work with as many files as you provide.

    The last part of the script uses the module File::Find::Rule to find all .xml files in the directories you provide. Then simply combine this script with the answer of fellow monk haukex and you should be just fine. ;)

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper;
    use File::Find::Rule;

    sub process_file {
        my %HoH;
        while (<>) {
            chomp;
            my %hash;
            my @data = split /,/, $_;   # split each line on comma
            my @keys = ('HOSTNAME', 'IP', 'MAC');
            @hash{@keys} = @data;       # hash slice
            $HoH{$ARGV}{$.} = \%hash;
        }
        continue {
            close ARGV if eof;          # reset $. for each file
        }
        return \%HoH;
    }

    sub get_xml_files {
        my @dirs  = ('.');
        my $level = shift // 2;
        my @files = File::Find::Rule->file()
                                    ->name('*.xml')
                                    ->maxdepth($level)
                                    ->in(@dirs);
        return \@files;
    }

    print Dumper process_file();
    print Dumper get_xml_files();

    __END__
    $ perl test.pl file2.txt file1.txt
    $VAR1 = {
        'file2.txt' => {
            '1' => { 'HOSTNAME' => 'hostd', 'IP' => '4.4.4.4', 'MAC' => '00000C123456' },
            '2' => { 'HOSTNAME' => 'hoste', 'IP' => '5.5.5.5', 'MAC' => '00000C123457' },
            '3' => { 'HOSTNAME' => 'hostf', 'IP' => '6.6.6.6', 'MAC' => '00000C123458' }
        },
        'file1.txt' => {
            '1' => { 'HOSTNAME' => 'hosta', 'IP' => '1.1.1.1', 'MAC' => '00000C123456' },
            '2' => { 'HOSTNAME' => 'hostb', 'IP' => '2.2.2.2', 'MAC' => '00000C123457' },
            '3' => { 'HOSTNAME' => 'hostc', 'IP' => '3.3.3.3', 'MAC' => '00000C123458' }
        }
    };
    $VAR1 = [ 'response.xml' ];

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Pulling from a list and inserting into XML documents
by anonymized user 468275 (Curate) on Jun 23, 2017 at 17:06 UTC
    I am biased about XML modules: XML::Twig is one of several I have tried and rejected, several times, when meeting different XML parsing challenges. The one I like best, based on programmer experience, is XML::Parser. The fact that an early version was written by Larry Wall might have something to do with it.
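    For comparison, here is a minimal sketch of the XML::Parser stream style applied to the OP's tags. The `%data` row and the input string are placeholders, and this naive version rebuilds the document as plain text (it does not preserve declarations or escape anything), so treat it as a starting point only:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Parser;

    # Placeholder data row; in practice this would come from the CSV file.
    my %data = (
        HOST_NAME => 'hosta',
        HOST_IP   => '1.1.1.1',
    );

    my $out = '';
    my $p = XML::Parser->new( Handlers => {
        # Each handler gets the Expat object first, then its own arguments.
        Start => sub {
            my (undef, $el) = @_;
            $out .= "<$el>";
            $out .= $data{$el} if exists $data{$el};  # inject our value
        },
        End  => sub { my (undef, $el)  = @_; $out .= "</$el>" },
        Char => sub { my (undef, $txt) = @_; $out .= $txt },
    } );

    $p->parse('<root><HOST_NAME></HOST_NAME><HOST_IP></HOST_IP></root>');
    print $out, "\n";
    # prints <root><HOST_NAME>hosta</HOST_NAME><HOST_IP>1.1.1.1</HOST_IP></root>
    ```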

    One world, one people

Re: Pulling from a list and inserting into XML documents
by sundialsvc4 (Abbot) on Jun 22, 2017 at 19:44 UTC

    The library that immediately comes to mind, for me, is XML::LibXML, which is a binding for the libxml2 binary library in all of its glory. I make this recommendation because you want not only to parse an XML file, but also to modify it and to create a new file containing the modified document. Your approach would be to parse the XML document into a DOM data structure, modify the data structure, then write it out as a new XML file. You may also wish to use XPath expressions and/or XSLT as a means of locating items and possibly also of making the necessary modifications. All of which this module can do. Furthermore, since libxml2 is an industry-standard heavy hitter, you can be confident that the output files will be universally accepted.
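    A rough sketch of that parse/modify/write cycle with XML::LibXML. The inline document and the `%row` values are placeholders standing in for one of the OP's files and one CSV row; the assumption that each tag appears once per document comes from the OP's example:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Placeholder input; in practice: XML::LibXML->load_xml( location => $file )
    my $doc = XML::LibXML->load_xml( string => <<'XML' );
    <root>
      <HOST_NAME></HOST_NAME>
      <HOST_IP></HOST_IP>
      <HOST_MAC></HOST_MAC>
    </root>
    XML

    # One data row (placeholder values from the OP's example).
    my %row = (
        HOST_NAME => 'hosta',
        HOST_IP   => '1.1.1.1',
        HOST_MAC  => '00000C123456',
    );

    for my $tag (keys %row) {
        # XPath lookup by element name, anywhere in the document.
        for my $node ( $doc->findnodes("//$tag") ) {
            $node->removeChildNodes;            # clear any existing text
            $node->appendText( $row{$tag} );    # insert our value
        }
    }

    print $doc->toString(0);
    # then: $doc->toFile($outfile) to save the modified document
    ```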

    Of course, XML::Twig is also a well-known power hitter in the Perl world of XML, and it has a feature called “XML filters” which might also be useful here. This approach would be more along the lines of in-line changes to the XML text, and I think its applicability to your situation depends very much on just how precisely you can identify the nodes that need to be modified. (The perldoc contains an excellent example.)

    Both of these modules are “go to” modules for XML processing in Perl. (I frankly don't use XML::Simple very often, but I will mention it here as a third baseman.)

    As for parsing the input file, I agree that Text::CSV is a solid power tool for this sort of thing, and I have nothing further to add to the prior comments.

    Oh yes, one more thought: you list three XML nodes which I presume occur in sequence. I would also presume that all three of them occur as children of some single node type which you did not mention in your OP description. An XPath- or XSLT-oriented approach would zero in on this parent node type and then locate and modify all three of the nodes underneath it. (I am not entirely sure how a Twig filter would do the same thing.)

    And do be sure to carefully examine how XSLT technology might be applicable to this situation: it may well prove to be the case that you can bypass a lot of otherwise messy Perl programming. This Interactive Periodic Table of the Elements web page was constructed entirely(!!) using XSLT!

Node Type: perlquestion [id://1193291]
Approved by stevieb
Front-paged by Corion