http://www.perlmonks.org?node_id=175760


in reply to Complex Data Structures

I sooo agree with everyone who recommended Data::Dumper. Don't leave vim without it!

For me? Well, complex data structures are something I've gotten to know intimately on the project I've been working on lately. Mine probably aren't nearly as daunting as others' here, but they have been challenging.

I've been developing a script whose purpose is to stage literally hundreds of thousands of files to a temporary location to be archived off to CD. The tough part was that these files are stored in locations that describe where we got them (I work for a telco, so by this I mean which city they came from, what kind of carrier they are, what kind of billing we did on the file, etc.). Additionally, those directories also hold files that were used to process the files we really need to archive off. Now, the most important part was that these files usually have datestamps embedded in the file names. Oh, but we can't have a universal naming scheme... nooo. Thus the date formats differ from file to file, and indeed there are even some files that have no datestamps in them at all. We stage the files based on where they come from (or file type) and what the date of the file is.

Alright, so my complex data structures come from two things. The first structure comes from the config file I invented for the script to read in order to know where to find the files, what kind of files they are, and where the files should go. That structure is a hash ref of a hash of hashes of arrays. The next structure I used is the actual file data, such as filename, source, dest, date, and type, which is only a hash ref of a hash of arrays.
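As a rough illustration of those two shapes, here is a minimal sketch. The top-level key names, the nesting order, and every sample value (including the file name) are guesses made up for the example, not the real script's data.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical config shape: section type -> block name -> option -> [values].
# That is a hash ref of a hash of hashes of arrays.
my $config = {
    macro => {
        arbor_ama => {
            regex => [ 'F.*?-P.*?\.(\d+)\.ama:::$1:::mmdd' ],
        },
    },
    rule => {
        5 => {
            port  => [ 150, 152 ],
            macro => [ 'arbor_ama', 'usl1' ],
        },
    },
};

# Hypothetical file data: file name -> field -> [values].
# That is a hash ref of a hash of arrays. The file name is invented.
my $files = {
    'F01-P150.0612.ama' => {
        source => [ '/u20/home/gvc/gvc_dtfr' ],
        dest   => [ '/u90/gvc_archive/new' ],
        date   => [ '0612' ],
        type   => [ 'arbor_ama' ],
    },
};

# Reaching into the nested levels.
my @ports = @{ $config->{rule}{5}{port} };
for my $name ( sort keys %$files ) {
    my $rec = $files->{$name};
    print "$name: $rec->{source}[0] -> $rec->{dest}[0]\n";
}

# Data::Dumper shows the whole thing at a glance while debugging.
$Data::Dumper::Sortkeys = 1;
print Dumper( $config, $files );
```

Dumping the structure like this after each parsing step is usually the quickest way to confirm that the nesting is what you think it is.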

The data structure was not nearly as daunting as the code I had to write to keep the date formats for the destinations uniform. Since the dates in the filenames were not uniform, I had to devise a way to read the dates in and match them up to a date format specified in the config file on a per-regex basis. I then took that format and that date and reformatted the date to conform to the date mask we use when we create the directories for archiving. Geez, that took me a long time to work out. It works great in the context of the script but is fragile (easy to break) if used from the command line.
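One way the date handling described above could look is sketched below. This is not the original code: the function names are made up, only the mask tokens mm, dd and yyyy are handled, and the fallback year for masks that carry no year (such as mmdd) is an assumption supplied by the caller. The spec string follows the "regex:::$N:::mask" layout used in the config.

```perl
use strict;
use warnings;

# Split a mask such as "mmddyyyy" into its tokens ("mm", "dd", "yyyy").
# Only mm, dd and yyyy are supported in this sketch.
sub mask_tokens {
    my ($mask) = @_;
    my @tokens = $mask =~ /(yyyy|mm|dd)/g;
    die "unsupported date mask '$mask'\n"
        unless @tokens && join( '', @tokens ) eq $mask;
    return @tokens;
}

# Read a date string according to its mask. Returns a hash ref such as
# { mm => '06', dd => '12' }, or nothing if the string does not fit.
sub parse_date {
    my ( $text, $mask ) = @_;
    my @tokens  = mask_tokens($mask);
    my $pattern = join '', map { '(\d{' . length($_) . '})' } @tokens;
    my @values  = $text =~ /\A$pattern\z/ or return;
    my %part;
    @part{@tokens} = @values;
    return \%part;
}

# Re-emit a date in another mask. $default_year (an assumption of this
# sketch) fills in the year when the source mask has none.
sub reformat_date {
    my ( $text, $from_mask, $to_mask, $default_year ) = @_;
    my $part = parse_date( $text, $from_mask ) or return;
    $part->{yyyy} = $default_year unless defined $part->{yyyy};
    my $out = '';
    for my $token ( mask_tokens($to_mask) ) {
        return unless defined $part->{$token};
        $out .= $part->{$token};
    }
    return $out;
}

# Pull the date out of a file name using a "regex:::$N:::mask" spec.
sub date_from_filename {
    my ( $filename, $spec, $to_mask, $default_year ) = @_;
    my ( $regex, $capture, $from_mask ) = split /:::/, $spec;
    my ($group)  = $capture  =~ /\A\$(\d+)\z/ or return;
    my @captures = $filename =~ /$regex/      or return;
    my $raw      = $captures[ $group - 1 ];
    return unless defined $raw;
    return reformat_date( $raw, $from_mask, $to_mask, $default_year );
}

# Invented file name that fits the arbor_ama pattern from the config.
my $spec = 'F.*?-P.*?\.(\d+)\.ama:::$1:::mmdd';
print date_from_filename( 'F01-P150.0612.ama', $spec, 'mmddyyyy', 2002 ), "\n";
# prints 06122002
```

Keeping the mask parsing separate from the file name matching means each piece can be tried on its own from the command line, which is where this kind of code tends to break first.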

Here is an example of the config file:

    ## Set up default environment
    ## allowed options for define{} blocks
    # source
    # destination
    # unkdest
    # rule_once
    define {
        source      = "/u20/home/gvc/gvc_dtfr"  # Location of source ports.
        destination = "/u90/gvc_archive/new"    # Destination for all ports.
        unkdest     = "/u90/gvc_archive/tmp"    # Where should files go if
                                                # a destination cannot be
                                                # determined or created for
                                                # a port.
        default_mask= "mmddyyyy"                # Default date mask for the
                                                # dest directories.

        # If this rule is not NULL then only the rules
        # specified in this variable will be run and
        # the rest of the rules will be ignored.
        # See the conf.doc for more on this.
        # This next should be a rule or a
        # list of rules delimited commas.
        rule_once = "5"
    }

    #####*******#####
    ## default macros
    #####*******#####
    ## allowed options for rules!
    # source
    # destination
    # unkdest
    # test_only
    # regex
    # port
    # macro

    # Arbor AMA files.
    # matches F*-P*.####.ama
    # datefield is the number between P*. and .ama
    macro arbor_ama {
        regex = "F.*?-P.*?\.(\d+)\.ama:::$1:::mmdd"
    }

    rule 5 {
        port = "150,152"
        macro = "arbor_ama,usl1,usl2,uslnull,rpt,arbor1_1";
    }
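To show how a file like this can end up as a hash of hashes of arrays, here is a minimal parser sketch. It is not the script's real parser. It assumes one option per line, values in double quotes, comma-separated lists for every option except regex, and it stores the unnamed define block under the made-up name "default".

```perl
use strict;
use warnings;

# Parse config text into $conf->{section}{name}{option} = [ values ].
sub parse_config {
    my ($text) = @_;
    my ( %conf, $block );
    for my $line ( split /\n/, $text ) {
        $line =~ s/^\s+|\s+$//g;                  # trim both ends
        next if $line eq '' or $line =~ /^#/;     # skip blanks and comments
        if ( $line =~ /^(define|macro|rule)\s*(\w*)\s*\{$/ ) {
            my ( $type, $name ) = ( $1, $2 );
            $name  = 'default' unless length $name;   # assumed name
            $block = $conf{$type}{$name} = {};
        }
        elsif ( $line eq '}' ) {
            undef $block;
        }
        elsif ( $block and $line =~ /^(\w+)\s*=\s*"([^"]*)"/ ) {
            my ( $option, $value ) = ( $1, $2 );
            # A regex may contain commas, so it is kept whole (assumption).
            $block->{$option} =
                $option eq 'regex'
                ? [$value]
                : [ split /\s*,\s*/, $value ];
        }
        else {
            die "cannot parse config line: $line\n";
        }
    }
    return \%conf;
}

my $conf = parse_config(<<'END');
define {
    destination = "/u90/gvc_archive/new"   # Destination for all ports.
    default_mask= "mmddyyyy"
}
macro arbor_ama {
    regex = "F.*?-P.*?\.(\d+)\.ama:::$1:::mmdd"
}
rule 5 {
    port = "150,152"
    macro = "arbor_ama,usl1";
}
END

print join( ',', @{ $conf->{rule}{5}{port} } ), "\n";
```

Storing every option as an array, even the single-valued ones, keeps the later traversal code uniform: it never has to ask whether a value is a scalar or a list.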
Then part of the code where I am working through one of these structures is:

    # Walk the hash of complex data structures called $macros
    # and $rules.
    # starting the traversal of %$rules
    while ( my ($rule_key, $rule_val) = each(%$rules) ) {
        # Now, breaking out the hash references from $rule_val.
        # keys() gives a snapshot of the keys, so adding "regex" and
        # deleting "macro" below cannot confuse the iteration.
        for my $nkey ( keys %$rule_val ) {
            my $nval = $rule_val->{$nkey};
            # Simple enough, if the key is "macro" then we have found
            # our macros in the complex data structure.
            if ( $nkey eq "macro" ) {
                # Now, we need to start to traverse the %$macros structure
                # and we will do the merge.
                while ( my ($macro_key, $macro_val) = each(%$macros) ) {
                    # Now, walk the array reference that was contained in the
                    # reference $nval (which is a reference to an array)
                    for ( @$nval ) {
                        # Now, if the key from the macro hash matches the
                        # rule that is referencing a macro then...
                        if ( $macro_key eq $_ ) {
                            # Replace the macro name with the actual regex from the
                            # macro.
                            push @{ $rules->{$rule_key}{regex} }, @{ $macro_val->{regex} };
                            # Now that we have the macros mapped to the rules
                            # we can drop the macros from the rules hashes
                            # since they are dead weight now anyway..
                            delete $rule_val->{macro};
                        }
                    }
                }
            }
        }
    }
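For comparison, the same merge can be written more compactly by looking up each macro name directly instead of scanning every macro for every rule. This is only a sketch with made-up sample data (the rpt patterns are placeholders, and expand_macros is a hypothetical name). It also differs slightly in behavior: regexes land in the order the rule lists its macros, and the macro key is removed even when none of the names are known.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample data in the shape the traversal above expects.
my $macros = {
    arbor_ama => { regex => [ 'F.*?-P.*?\.(\d+)\.ama:::$1:::mmdd' ] },
    rpt       => { regex => [ 'rpt-one', 'rpt-two' ] },   # placeholder patterns
};
my $rules = {
    5 => {
        port  => [ 150, 152 ],
        macro => [ 'arbor_ama', 'rpt', 'no_such_macro' ],
    },
    6 => { port => [ 160 ] },
};

# Replace each rule's list of macro names with the macros' regexes.
sub expand_macros {
    my ( $rules, $macros ) = @_;
    for my $rule ( values %$rules ) {
        # Take the macro list out of the rule; rules without one are skipped.
        my $names = delete $rule->{macro} or next;
        for my $name (@$names) {
            my $macro = $macros->{$name} or next;   # unknown names are ignored
            push @{ $rule->{regex} }, @{ $macro->{regex} || [] };
        }
    }
    return $rules;
}

expand_macros( $rules, $macros );

$Data::Dumper::Sortkeys = 1;
print Dumper($rules);
```

The direct hash lookup is the main point: once the macros are keyed by name, there is no need to loop over them to find a match.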
Well, this is probably way more than you asked for but this project is way fresh on my mind and so I couldn't stop myself. :)

To sum up I have come to really respect complex data structures. I have found that they can seriously shorten a task if they are used properly. I really can't come up with words to express how much I appreciate complex data structures. They are great!

_ _ _ _ _ _ _ _ _ _
- Jim
Insert clever comment here...