Oh wise and benevolent Monks, I beseech thee to help me with this Perl conundrum:
I need to retire a 3rd party online document warehouse application, and I'm getting stuck on a specific problem. The data is stored in a hierarchy resembling a Windows file tree, like:
folder
|--folder2
|--folder3
| |--folderX
| |--folderY
|--folder4
I need to recreate that hierarchy in a UNIX path structure. There are tools provided by the application to extract the hierarchy, but their information is very segmented. The 2 tools I have are:
Folder info: will give me the folder id, name, and number of subfolders under it, like so:
Folder 4464 - foldername_1.
count subfolders 0.
Folder 4465 - foldername_2.
count subfolders 0.
Folder 4466 - foldername_3.
count subfolders 4.
...
Folder/Folder info: will give me the folder IDs of each subfolder under each folder, like so:
Folder 1298 - foldername_ten.
subfolder 1299.
subfolder 1300.
Folder 1299 - foldername_eleven.
No sub folders.
Folder 1300 - foldername_twelve.
No sub folders.
Folder 1311 - foldername_thirteen.
subfolder 1317.
subfolder 1318.
subfolder 1958.
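For reference, here is a minimal sketch of how the Folder/Folder output could be read into a lookup table from each child ID to every parent that lists it. The helper name parse_parents and the hash %parents_of are hypothetical, and the regexes assume the lines look exactly like the sample above:

```perl
use strict;
use warnings;

# Sample lines in the Folder/Folder format shown above.
my @subfolders = (
    'Folder 1298 - foldername_ten.',
    'subfolder 1299.',
    'subfolder 1300.',
    'Folder 1299 - foldername_eleven.',
    'No sub folders.',
    'Folder 1300 - foldername_twelve.',
    'No sub folders.',
);

# parse_parents (hypothetical helper) returns a hash that maps each child
# folder ID to an array ref holding every parent ID that lists it.
sub parse_parents {
    my @lines = @_;
    my (%parents_of, $current);
    for my $line (@lines) {
        if ($line =~ /^\s*Folder\s+(\d+)\s+-\s+/) {
            $current = $1;                          # start of a new parent block
        }
        elsif (defined $current && $line =~ /^\s*subfolder\s+(\d+)/) {
            push @{ $parents_of{$1} }, $current;    # a child may collect several parents
        }
    }
    return %parents_of;
}

my %parents_of = parse_parents(@subfolders);
print "$_ <= @{ $parents_of{$_} }\n" for sort { $a <=> $b } keys %parents_of;
```

Because the dump is read only once, a folder that appears under several parents simply ends up with more than one entry in its list.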
Based on this data, I wrote a script that first collects the folder IDs and names. For each folder ID, it then builds a UNIX path by searching for the folder's parent, then that parent's parent, recursing back to the root folder, as shown here:
# %folders has each folder ID as a key and the name as the value
# @subfolders is the Folder/Folder data as shown above, line-for-line
foreach my $k (sort (keys (%folders))) {
    $folderpaths{$k} = &build_path($k,@subfolders);
    print "$k => $folderpaths{$k}\n";
}
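sub build_path {
    my ($folderid, @dumpff) = @_;
    my $path     = $folderid;
    my $parentid = "";
    foreach my $line (@dumpff) {
        # A "Folder" line starts a new parent block; remember its ID.
        if ($line =~ /^Folder (\d{3,5})\s+-\s+.*\./) {
            $parentid = $1;
        }
        # A "subfolder" line naming this folder means $parentid is a parent.
        # \b anchors the ID so that 130 does not also match 1300.
        elsif ($line =~ /^\s*subfolder\s+$folderid\b/) {
            # Each match overwrites $path, so only the last parent found survives.
            $path = join('/', build_path($parentid, @dumpff), $folderid);
        }
    }
    return $path;
}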
The above code looked like it was working perfectly, but then I saw that a lot of data was missing. I later discovered that a few folders appear as children under multiple parent folders. My script does match every parent of the node it's looking at, but each match overwrites $path, so it only ever returns one of them.
My question is, how can I account for the multiple paths, and how can I identify those multiple paths so I know to make each duplicate path a symlink to the original when I actually build the UNIX filesystem?
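To make the goal concrete, here is a rough sketch of the kind of result being asked for, not a definitive solution. It assumes a hash %parents_of that maps each child ID to an array ref of all its parent IDs, and it arbitrarily treats the first listed parent as the "original" location. The data and the names canonical_path, %canonical and %symlinks are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample data: folder 30 is listed under both 10 and 20.
my %folders    = (1 => 'root', 10 => 'a', 20 => 'b', 30 => 'shared');
my %parents_of = (10 => [1], 20 => [1], 30 => [10, 20]);

# canonical_path (hypothetical helper) follows only the FIRST parent at
# each level, so every folder gets exactly one "original" path.
sub canonical_path {
    my ($id, $seen) = @_;
    $seen ||= {};
    die "cycle detected at folder $id\n" if $seen->{$id}++;
    my $parents = $parents_of{$id} || [];
    return $folders{$id} unless @$parents;      # no parent: this is a root
    return canonical_path($parents->[0], $seen) . "/$folders{$id}";
}

my (%canonical, %symlinks);
for my $id (sort { $a <=> $b } keys %folders) {
    $canonical{$id} = canonical_path($id);
    # Every parent after the first gets a symlink entry:
    # link location => original location.
    my (undef, @extra) = @{ $parents_of{$id} || [] };
    for my $p (@extra) {
        $symlinks{ canonical_path($p) . "/$folders{$id}" } = $canonical{$id};
    }
}

print "dir  $canonical{$_}\n"          for sort { $a <=> $b } keys %canonical;
print "link $_ -> $symlinks{$_}\n"     for sort keys %symlinks;
```

Only folders with more than one direct parent get a symlink entry; anything below a shared folder is reachable through the link and needs no entry of its own.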