Hi,
I'm trying to write a script that reconstructs the directory structure and file names described by an XML file, but I'm meeting with mixed success. As far as Perl scripting goes, I'm still on training wheels.
At work we have an application that archives directories and files by renaming every file and directory to an MD5 hash name and then tossing the lot into a single directory. It records which file goes where in an XML document.
Unfortunately, it also writes about 10 attributes for every item, and I only need two of them: the original name and the MD5 name. I found an example script that does something similar and was able to modify it for my needs, but it doesn't seem to cope with anything as complex as my XML document. It prints one long unbroken string of MD5 names, followed by another unbroken string of file names.
It gets the directory structure right, but it mashes the contents of each directory together. I just don't understand how to make the script pick out the attributes of each XML element individually. Here is an example of the XML data structure:
<ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="$dir1" flags="" lm="129232888600305382" cr="129232888600305382" >
  <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="CutePDFWriter" flags="" lm="129232886309260678" cr="129232711066448490" >
    <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="cpwmon2k.dll" length="87552" md5="27A8QATED9I2Ox8F65OGEPPDCIV" flags="a" lm="129018983800000000" cr="129232711245774126" gac_register_op="SAME" register="false" />
    <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="converter" flags="" lm="129232881029776793" cr="129232870612045954" >
      <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="GPLGS" flags="" lm="129232870625951047" cr="129232870612202191" >
        <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="gsdll32.dll" length="2768896" md5="5F7UGLCH9K3GKxBNML1LM0G3RNL" flags="a" lm="127407045220000000" cr="129232870614545746" gac_register_op="SAME" register="false" />
        <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="a010013l.pfb" length="69958" md5="7EDJ7V7QHMBQ1x6HLC54FG0OP6T" flags="a" lm="126854965940000000" cr="129232870612202191" gac_register_op="SAME" />
        <!--- truncated for brevity's sake-->
        <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="z003034l.pfb" length="113405" md5="D6I2GGENUCQLEx6FMO1IPG1E8F7" flags="a" lm="126854124840000000" cr="129232870625951047" gac_register_op="SAME" />
        <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="zeroline.ps" length="2567" md5="FETPJPBOOF039xCCTQFGII9DNN0" flags="a" lm="126588917680000000" cr="129232870625951047" gac_register_op="SAME" />
      </ncp_directory>
      <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="GSSetup.exe" length="122880" md5="E61K8P45E8D81x3T3E47C8QIP0U" flags="a" lm="127751787000000000" cr="129232870612045954" gac_register_op="SAME" />
    </ncp_directory>
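As you can see, it goes:
opening directory tag, with the directory name
the files it contains, each with its attributes
closing directory tag, with subdirectories nested the same way, and so on.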
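Because the nesting of ncp_directory elements mirrors the original folder tree, one way to recover each file's original path is to walk the tree recursively, carrying the parent path down one level at a time. This is a minimal sketch, assuming XML::XPath (the module the script below already uses) and a trimmed copy of the sample above that keeps only the name and md5 attributes; walk_dir and %where are hypothetical names. It only builds the MD5-to-path map and does not touch any files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Trimmed copy of the sample above: only name and md5 are kept.
my $xp = XML::XPath->new(xml => <<'XML');
<ncp_directory name="CutePDFWriter">
  <ncp_file name="cpwmon2k.dll" md5="27A8QATED9I2Ox8F65OGEPPDCIV"/>
  <ncp_directory name="converter">
    <ncp_directory name="GPLGS">
      <ncp_file name="gsdll32.dll" md5="5F7UGLCH9K3GKxBNML1LM0G3RNL"/>
    </ncp_directory>
    <ncp_file name="GSSetup.exe" md5="E61K8P45E8D81x3T3E47C8QIP0U"/>
  </ncp_directory>
</ncp_directory>
XML

# Hypothetical helper: fill %$map with MD5 name => original relative path.
# $parent is the path of the enclosing directory ('' at the top level).
sub walk_dir {
    my ($xp, $dir, $parent, $map) = @_;
    my $name = $xp->findvalue('@name', $dir);
    my $path = $parent eq '' ? "$name" : "$parent/$name";

    # Only the files that are direct children of this directory.
    for my $file ($xp->findnodes('ncp_file', $dir)) {
        my $md5   = $xp->findvalue('@md5',  $file);
        my $fname = $xp->findvalue('@name', $file);
        $map->{"$md5"} = "$path/$fname";
    }

    # Then descend into each immediate subdirectory, passing our path down.
    walk_dir($xp, $_, $path, $map) for $xp->findnodes('ncp_directory', $dir);
    return $map;
}

my %where;
walk_dir($xp, $_, '', \%where) for $xp->findnodes('/ncp_directory');
print "$_ => $where{$_}\n" for sort keys %where;
```

With a map like this, actually recreating the tree could be done with core modules such as File::Path (make_path) and File::Copy (copy), though that step is left out here.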
Here is my script:
use XML::XPath;
my $file = 'ncpobjs.xml';
my $xp = XML::XPath->new(filename => $file);
foreach my $ncptype ($xp->find('//ncp_directory')->get_nodelist) {
    print $ncptype->find('ncp_file')->string_value;
    print ' (' . $ncptype->find('@name') . ') ';
    print $ncptype->find('ncp_file/@md5'), " ", $ncptype->find('ncp_file/@name'), "\n";
    print "\n";
}
And here is an example of the quasi-gibberish that I'm getting as output for each directory level:
(x64) ALSOO431VHGO2x825OF80GN8RNM605U9UMHOR3M1xEQIPMMRKFK3F0 PSCRIPT.HLPPSCRIPT.NTF
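The run-together output appears to come from how XML::XPath stringifies a node set: find('ncp_file/@md5') returns a single XML::XPath::NodeSet holding every md5 attribute in that directory, and printing it joins all of their string values with nothing in between. A small sketch of that behaviour, assuming XML::XPath and a trimmed copy of the GPLGS directory from the sample (name and md5 only):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Trimmed copy of the GPLGS directory from the sample (name and md5 only).
my $xp = XML::XPath->new(xml => <<'XML');
<ncp_directory name="GPLGS">
  <ncp_file name="gsdll32.dll" md5="5F7UGLCH9K3GKxBNML1LM0G3RNL"/>
  <ncp_file name="zeroline.ps" md5="FETPJPBOOF039xCCTQFGII9DNN0"/>
</ncp_directory>
XML

my ($dir) = $xp->findnodes('/ncp_directory');

# One XML::XPath::NodeSet holding BOTH md5 attributes. Interpolating it
# stringifies the whole set, gluing the values together.
my $set    = $xp->find('ncp_file/@md5', $dir);
my $joined = "$set";
print "joined:   $joined\n";

# get_nodelist returns the attribute nodes one at a time instead.
my @separate = map { $_->string_value } $set->get_nodelist;
print "separate: @separate\n";
```

Since the name attributes are gathered into a second, separate node set, there is nothing left to tie each MD5 value to its file name once both sets are printed.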
So it boils down to this: how do I turn this odd output into something like "MD5 name = file name" for each file element? I have a feeling I need another for loop inside the first one to deal with the files; I just can't figure out where to put it. Any insight would be very much appreciated!
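A second loop nested inside the directory loop is one way to do this: iterate over the ncp_file children of each directory and read both attributes from the same file element, so each md5 stays paired with its name. A minimal sketch, assuming XML::XPath and an inline, trimmed stand-in for ncpobjs.xml; md5_lines is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Inline stand-in for ncpobjs.xml (trimmed to name and md5). In the real
# script this would be XML::XPath->new(filename => 'ncpobjs.xml').
my $xp = XML::XPath->new(xml => <<'XML');
<ncp_directory name="converter">
  <ncp_directory name="GPLGS">
    <ncp_file name="gsdll32.dll" md5="5F7UGLCH9K3GKxBNML1LM0G3RNL"/>
    <ncp_file name="zeroline.ps" md5="FETPJPBOOF039xCCTQFGII9DNN0"/>
  </ncp_directory>
  <ncp_file name="GSSetup.exe" md5="E61K8P45E8D81x3T3E47C8QIP0U"/>
</ncp_directory>
XML

# Hypothetical helper: one "(dir)" heading per directory, followed by
# one "MD5 = name" line per file directly inside it.
sub md5_lines {
    my ($xp) = @_;
    my @lines;
    # Outer loop: every directory, at any depth.
    for my $dir ($xp->findnodes('//ncp_directory')) {
        my $dname = $xp->findvalue('@name', $dir);
        push @lines, "($dname)";
        # Inner loop: each ncp_file child on its own, so its md5 and
        # name are read from the same element and stay paired.
        for my $file ($xp->findnodes('ncp_file', $dir)) {
            my $md5  = $xp->findvalue('@md5',  $file);
            my $name = $xp->findvalue('@name', $file);
            push @lines, "$md5 = $name";
        }
    }
    return @lines;
}

print "$_\n" for md5_lines($xp);
```

Because the inner path is ncp_file rather than //ncp_file, each directory lists only its own files; files in subdirectories show up under their own directory heading.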