
Data Structures

by YYCseismic (Beadle)
on May 01, 2008 at 22:51 UTC
YYCseismic has asked for the wisdom of the Perl Monks concerning the following question:

I'm still rather new to Perl, and have been learning it as a requirement for maintaining a program at work. The program is used to load physical survey information (x,y,z coordinates) for seismic lines, allowing the option to write to a file afterwards. Each survey file (SEG-P1 format, for those who might know what this is) may contain data for multiple seismic lines.

My current plan is to store (x,y,z) information in a hash of hashes, and I'm curious if anyone has a better idea for storing these data.

The data model and implementation as I see it look something like this:

    Line --+
           |
           +-- Station
           |     |
           |     +---- Easting (x)
           |     +---- Northing (y)
           |     +---- Elevation (z)
           |
           +-- Length
           |
           +-- Group (Stn) Interval

    %Coord = (
        $Line_1 => {
            stn_1 => coord_1,
            stn_2 => coord_2,
            stn_3 => coord_3,
            ...
            stn_n => coord_n,
        },
        $Line_2 => {
            stn_1 => coord_1,
            stn_2 => coord_2,
            stn_3 => coord_3,
            ...
            stn_n => coord_n,
        },
        # And so on to $Line_n
    );

From what I understand, the coordinates would then be accessed as (for example)
$Easting{$Line}{$Station}

Does this look like a reasonable model/implementation?

Is there an easier, more efficient way to do this?

I got the idea for the hash of hashes by googling Perl Data Structures, and found the data structures cookbook by Tom Christiansen, which discusses just these topics.
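
The plan described above (one hash of hashes per coordinate) can be sketched roughly as follows. This is only an illustration: the `add_station` helper, the line name and all the coordinate values are made up, not taken from the real program.

```perl
use strict;
use warnings;

# One hash of hashes per coordinate, keyed by line name, then station number.
# All names and values below are hypothetical sample data.
my ( %Easting, %Northing, %Elevation );

# Hypothetical helper: store one station's coordinates in all three hashes.
sub add_station {
    my ( $line, $station, $x, $y, $z ) = @_;
    $Easting{$line}{$station}   = $x;
    $Northing{$line}{$station}  = $y;
    $Elevation{$line}{$station} = $z;
    return;
}

add_station( '32A-5', 101, 1000.0, 2000.0, 650.0 );
add_station( '32A-5', 102, 1025.0, 2000.5, 651.5 );

# The inner hashes are created automatically on first assignment
# (autovivification), so no explicit setup per line is needed.
my ( $line, $station ) = ( '32A-5', 102 );
print "E=$Easting{$line}{$station} N=$Northing{$line}{$station} Z=$Elevation{$line}{$station}\n";

# Hash keys are unordered, so sort the station numbers when order matters.
my @stations = sort { $a <=> $b } keys %{ $Easting{$line} };
print "Stations on $line: @stations\n";
```

Note that every station has to be written to three separate hashes, which is the part that can drift out of step if one assignment is missed.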

Re: Data Structures
by ikegami (Pope) on May 01, 2008 at 23:06 UTC
    A hash with keys stn_1 .. stn_n? That should be an array! Or is that an example?

    Is there an easier, more efficient way to do this?

    What operation on the structure is too slow? Or is it that it's taking up too much memory?

Re: Data Structures
by BrowserUk (Pope) on May 01, 2008 at 23:43 UTC
    $Easting{$Line}{$Station}

    That implies you intend to have 3 hashes, one for each of x, y & z. Parallel data structures are not (generally) a good idea as they can get out of sync and you're stuffed. Going by the line diagram, you'd be better off with something that allowed you to do:

    my %line = (
        lineNameA => {
            StationA => [ xxx.xx, yyy.yy, zzz.zz ],
            StationB => [ xxx.xx, yyy.yy, zzz.zz ],
            ...
        },
        lineNameB => {
            ...
        },
    );

    my( $x, $y, $z ) = $line{ $lineName }{ $stationName };

    # Or
    use constant { X => 0, Y => 1, Z => 2 };
    my $y = $line{ $lineName }{ $stationName }[ Y ];

    That's assuming the identifiers are names, not numbers.

    If they are numbers, or names of the form "line_003" and "station_12" (low, mostly sequential numeric suffixes), then you'd save some memory and be a tad quicker using arrays instead of hashes:

    my @line = (
        [                                   ## $line[ 0 ]
            [ xxx.xxx, yyy.yyy, zzz.zzz ],  ## $line[0][0] (station 0)
            [ xxx.xxx, yyy.yyy, zzz.zzz ],  ## (station 1)
            [ xxx.xxx, yyy.yyy, zzz.zzz ],
            ...
        ],
        [                                   ## $line[ 1 ]
            [ xxx.xxx, yyy.yyy, zzz.zzz ],  ## $line[ 1 ][ 0 ]
            [ xxx.xxx, yyy.yyy, zzz.zzz ],
            [ xxx.xxx, yyy.yyy, zzz.zzz ],
            ...
        ],
        ...
    );

    my( $x, $y, $z ) = $line[ $lineNo ][ $stationNo ];

    # Or
    use constant { X => 0, Y => 1, Z => 2 };
    my $y = $line[ $lineNo ][ $stationNo ][ Y ];
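
For anyone who wants to try the single-structure idea, here is a small runnable sketch of the hash variant. The coordinate values are placeholders chosen for illustration. Each station holds an array reference, so it needs `@{ ... }` around it to get the three values back as a list.

```perl
use strict;
use warnings;
use constant { X => 0, Y => 1, Z => 2 };

# One structure: line name -> station name -> [ x, y, z ].
# The numbers are made-up placeholders.
my %line = (
    lineNameA => {
        StationA => [ 100.25, 200.50, 30.75 ],
        StationB => [ 110.25, 210.50, 31.75 ],
    },
);

# All three coordinates at once: dereference the station's array ref.
my ( $x, $y, $z ) = @{ $line{lineNameA}{StationA} };

# A single coordinate, using the named index constant.
my $northing = $line{lineNameA}{StationB}[Y];

print "StationA: $x, $y, $z\n";
print "StationB northing: $northing\n";
```

Because x, y and z for a station live in one array, they cannot get out of sync the way three parallel hashes can.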

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      my( $x, $y, $z ) = $line{ $lineName }{ $stationName };
      should be
      my( $x, $y, $z ) = @{ $line{ $lineName }{ $stationName } };
      and
      my( $x, $y, $z ) = $line[ $lineNo ][ $stationNo ];
      should be
      my( $x, $y, $z ) = @{ $line[ $lineNo ][ $stationNo ] };

      As I'm rather new to this kind of thing, I didn't realize that parallel data structures might not be a good idea. What you say makes sense, though.

      I think I'm more inclined to go with your second option:

      my @line = (
          [                                   ## $line[ 0 ]
              [ xxx.xxx, yyy.yyy, zzz.zzz ],  ## $line[0][0] (station 0)
              [ xxx.xxx, yyy.yyy, zzz.zzz ],  ## (station 1)
              [ xxx.xxx, yyy.yyy, zzz.zzz ],
              ...
          ],
          [                                   ## $line[ 1 ]
              [ xxx.xxx, yyy.yyy, zzz.zzz ],  ## $line[ 1 ][ 0 ]
              [ xxx.xxx, yyy.yyy, zzz.zzz ],
              [ xxx.xxx, yyy.yyy, zzz.zzz ],
              ...
          ],
          ...
      );

      my( $x, $y, $z ) = $line[ $lineNo ][ $stationNo ];

      My plan, if you can call it that, was to hold the line names (identifiers) in an array, since they may or may not start with a numeric. An annotated example of a portion of a SEG-P1 file is given below.

      lineName stn. east...north...elev.
      000301038 1260 52205121N109153806W 618485158009020 6626
      000301038 1261 52205121N109153674W 618510158009027 6623
      000301038 1262 52205120N109153542W 618535158009029 6621
      000301016 400 52153542N109482654W 581401057903909 6738
      000301016 401 52153542N109482522W 581426057903913 6738
      000301016 402 52153542N109482390W 581451057903918 6738

      The stn, east, north, and elev indicate the columns in which those values (station number, easting, northing, and elevation) are found throughout the file. (Note the change in line number/identifier part way through.) Station identifiers are always numeric, and, while they are arbitrarily assigned, I would like for any access to be according to these numbers. For example,

      my( $x, $y, $z ) = $line[000301038][1261];
      ...
      print "$x, $y, $z"; # Result: 6185101, 58009027, 6623

      One reason I thought about using hashes here is that they are essentially associative arrays, so I can have a hash with the line name as the key instead of some arbitrary number. So instead of accessing $line[0][0] for the first station of the first line, I would prefer to say something like $line{'32A-5'}[101] for the first station of line 32A-5. This way there is no "first" or "last" line, only first and last stations (which makes sense, seeing as the survey is generally linear).
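
As a rough sketch of how the sample records above could be loaded into a structure keyed by line name and station number: the field layout here is an assumption read off the sample rows only (whitespace-separated fields, with a 15-digit field holding a 7-digit easting followed by an 8-digit northing). A real SEG-P1 reader would more likely work from the fixed column positions in the format definition.

```perl
use strict;
use warnings;

# The six sample records quoted above.
my @records = (
    '000301038 1260 52205121N109153806W 618485158009020 6626',
    '000301038 1261 52205121N109153674W 618510158009027 6623',
    '000301038 1262 52205120N109153542W 618535158009029 6621',
    '000301016 400 52153542N109482654W 581401057903909 6738',
    '000301016 401 52153542N109482522W 581426057903913 6738',
    '000301016 402 52153542N109482390W 581451057903918 6738',
);

my %line;    # line name -> station number -> [ easting, northing, elevation ]

for my $record (@records) {
    # Assumption: fields are separated by whitespace, as in the sample.
    my ( $name, $stn, $latlon, $east_north, $elev ) = split ' ', $record;

    # Assumption: first 7 digits are the easting, the rest the northing.
    my $easting  = substr $east_north, 0, 7;
    my $northing = substr $east_north, 7;

    $line{$name}{$stn} = [ $easting, $northing, $elev ];
}

# The line name is kept as a quoted string so its leading zeros survive.
my ( $x, $y, $z ) = @{ $line{'000301038'}{1261} };
print "$x, $y, $z\n";    # prints: 6185101, 58009027, 6623
```

Station numbers stay exactly as they appear in the file, so access is by survey station number rather than by position.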

        The only problem with using arrays rather than hashes is that if, for example, all your line identifiers start with '0030nnnn', then with an array you would have space allocated to 300,000 elements (000000000 .. 000299999) which would never be used, but would take up space. (This is what I meant above by "if your numbers are low and mostly sequential".)

        In this case, you would be much better off using hashes as a "sparse array". The same is true for your station numbers. With just three stations numbered 1260..1262 on the line 000301038, using hashes will definitely save you a lot of memory.
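
The difference is easy to see with a tiny experiment. This sketch stores a single entry under a large index; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Array: assigning to a high index extends the array to that index,
# so every lower slot now exists even though it is unused.
my @by_index;
$by_index[301038] = 'line data';
my $slots = scalar @by_index;

# Hash: only the keys actually stored take up an entry.
my %by_name;
$by_name{'000301038'} = 'line data';
my $keys = scalar keys %by_name;

print "array slots: $slots, hash keys: $keys\n";
```

The array ends up with 301039 slots for one line, while the hash holds exactly one key.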

        Note also that I made an error (pointed out by alexm in the post following mine) when I typed:

        my( $x, $y, $z ) = $line[000301038][1261];

        It should be

        my( $x, $y, $z ) = @{ $line[000301038][1261] };

        Or, if you go with hashes as I think you probably should having seen the real data:

        ## Quote the key: a bare 000301038 would be parsed as an octal literal
        my( $x, $y, $z ) = @{ $line{ '000301038' }{ 1261 } };

        ## and
        ## Assumes use constant { X => 0, Y => 1, Z => 2 }
        my $thisX = $line{ $line }{ $stn }[ X ];

        Given your example, what you want is a hash of arrays of hashes:
        my %data;
        push @{ $data{$linename} }, {
            station   => $station,
            coords    => $coords,
            easting   => $easting,
            northing  => $northing,
            elevation => $elevation,
        };

        for my $linename ( keys %data ){
            for my $entry ( @{ $data{$linename} } ){
                print "$linename: @{$entry}{qw(station easting northing elevation)}\n";
            }
        }
        At least, that's what it looks like from your example, but you really haven't given enough information about the structure of those fields.
Re: Data Structures
by ashis (Initiate) on May 02, 2008 at 09:27 UTC
    Can you please try like this:
    $line = {
        'Station1' => { 'x' => '', 'y' => '', 'z' => '' },
        'Station2' => { 'x' => '', 'y' => '', 'z' => '' },
    };
    similarly if you start from line:
    $line = {
        'line1' => {
            'Station1' => { 'x' => '', 'y' => '', 'z' => '' },
            'Station2' => { 'x' => '', 'y' => '', 'z' => '' },
        },
        'line2' => {
            'Station1' => { 'x' => '', 'y' => '', 'z' => '' },
            'Station2' => { 'x' => '', 'y' => '', 'z' => '' },
        },
    };
Re: Data Structures
by roboticus (Canon) on May 02, 2008 at 11:08 UTC
    YYCseismic:
    Is there an easier, more efficient way to do this?

    Looking at the data structure alone isn't going to give you the answer--You need to look at how you use it. An efficient data structure for one algorithm might be terribly inefficient for another. The convenience of notation may similarly change.

    ...roboticus

      Agreed. But, consider also how this may be extended with new data elements in the future. Compare
      $Easting{$Line}{$Station}
      vs
      $Data{$Line}{$Station}{Easting}
      For example, if you were to add lat/long or one of the misc items to Station, the second structure would more easily accommodate that new data.
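
A minimal sketch of that point, with a made-up line name, station number and values. `Latitude` stands in for whatever extra item might be added later.

```perl
use strict;
use warnings;

# Line -> station -> named fields. All values are hypothetical.
my %Data;
$Data{'32A-5'}{101} = {
    Easting   => 1000.0,
    Northing  => 2000.0,
    Elevation => 650.0,
};

# Adding a new item later is one more key on the station;
# no new top-level hash is needed and existing code keeps working.
$Data{'32A-5'}{101}{Latitude} = 52.35;

my @fields = sort keys %{ $Data{'32A-5'}{101} };
print "Fields for station 101: @fields\n";
```

With the `$Easting{$Line}{$Station}` layout, the same change would mean introducing and maintaining a fourth parallel hash.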

        Interesting. I had thought of using a triply-nested hash, but thought it might be too complex. I guess if the code is written properly, though, it shouldn't be a problem, right?

Re: Data Structures
by leocharre (Priest) on May 02, 2008 at 15:05 UTC

    I am going to approach the question from a different angle.
    I am going to assume that by "would then be accessed as" and by "Is there an easier, more efficient way to do this?" you may be talking about interface more than performance.

    "accessed as" reeks of API, as does 'easier'. I'm going to imagine that efficient means you want to be able to remain efficient in your job, as in you don't want to go insane in 6 months and create something important that only you know how to use...

    I smell poop here.
    Perl Object Oriented Programming.

    I am trying to suggest that.. let your code be all nuts and creepy.. but.. provide an object oriented interface. Then later on you can worry about the innards and not change the interface.

    Wouldn't it be nicer to interact with ...

    use Seismic;

    my $s = new Seismic;

    my $lines = $s->lines;
    my $line0 = $s->line(0);
    my $line1 = $lines->[1];

    my $stations = $line0->stations;
    my $station0 = $stations->[0];
    my $station1 = $s->line(0)->station(1);

    my $x = $station0->easting;
    my $y = $station0->northing;
    my $z = $station0->elevation;

    my ($x,$y,$z) = $station1->xyz;

    You can then rearrange your code, change your hash hierarchy, and it won't change your scripts using it, etc!

    I've done some crazy hash stuff like this before- and looking back, this is stuff that's a nightmare to maintain interaction with. The interface really should be separated from the innards.

    This is what OO is for, this is what computers are for, not people, imho.

    Putting the extra week or two to learn some OO and implement it here- will save you countless hours later.
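
To give an idea of what could sit behind such an interface, here is one possible minimal sketch using plain blessed hashes. The package layout and the `add_line` and `add_station` methods are assumptions made for this example; the accessor names follow the interface shown above.

```perl
use strict;
use warnings;

package Seismic::Station;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub easting   { $_[0]{easting} }
sub northing  { $_[0]{northing} }
sub elevation { $_[0]{elevation} }
sub xyz {
    my $self = shift;
    return @{$self}{qw(easting northing elevation)};
}

package Seismic::Line;

sub new {
    my ( $class, $name ) = @_;
    my $self = { name => $name, stations => [] };
    return bless $self, $class;
}
sub name     { $_[0]{name} }
sub stations { $_[0]{stations} }             # array ref of station objects
sub station  { $_[0]{stations}[ $_[1] ] }    # station by position
sub add_station {                            # hypothetical helper
    my ( $self, %args ) = @_;
    push @{ $self->{stations} }, Seismic::Station->new(%args);
    return $self;
}

package Seismic;

sub new {
    my ($class) = @_;
    my $self = { lines => [] };
    return bless $self, $class;
}
sub lines { $_[0]{lines} }                   # array ref of line objects
sub line  { $_[0]{lines}[ $_[1] ] }          # line by position
sub add_line {                               # hypothetical helper
    my ( $self, $name ) = @_;
    my $line = Seismic::Line->new($name);
    push @{ $self->{lines} }, $line;
    return $line;
}

package main;

my $s    = Seismic->new;
my $line = $s->add_line('000301038');
$line->add_station( easting => 6185101, northing => 58009027, elevation => 6623 );

my ( $x, $y, $z ) = $s->line(0)->station(0)->xyz;
print $s->line(0)->name, ": $x, $y, $z\n";
```

The storage inside each class could later be swapped for hashes keyed by name or station number without the calling scripts noticing, as long as the method names stay the same.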

      Yes, I suppose I am talking more about interface here, rather than performance. I do want the program to be fast and efficient at what it does, but I also want it to be an easy-to-use and efficient tool for loading surveys. The previous version of the program looks to me more like a kludge than something that was well thought out with some semblance of design taken into account.

      The possibility of using OO code had crossed my mind, but I was not sure I'd know how to do it properly. (I certainly do like the sample you have shown!) Fortunately, this project is not a priority, so I'm basically working on it during "down-time", which allows me to learn a lot in the process. Can you (or anyone) recommend any specific books for helping to learn poop?

        Very good attitude.

        Once you go over the learning curve, this is all going to make so much sense to you that you'll be jumping up and down throwing confetti at random strangers and nobody will understand... but that's ok.. we understand.

        It will not take long to get it, and seriously- it will be a blast. You can still have very intricate ways of associating and storing data but you can have an incredibly simple way to get to it.

        With OO, you think of the modules (classes) as blue prints, as instructions on how to build something. And objects are the thing made, the houses built from the blue print, alive- and this only happens when you 'instance' (new) from a module(class).

        You have your architect's plan, and you build from it 5 houses, they all have different addresses and people inside them, but you only needed one plan.
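
The blueprint-and-houses picture can be tried out in a few lines. The `House` class, its methods and the addresses are all invented for this example.

```perl
use strict;
use warnings;

# The class is the blueprint: it says what every house has and can do.
package House;

sub new {
    my ( $class, %args ) = @_;
    my $self = { address => $args{address}, residents => [] };
    return bless $self, $class;
}
sub address   { $_[0]{address} }
sub residents { @{ $_[0]{residents} } }
sub move_in {
    my ( $self, $name ) = @_;
    push @{ $self->{residents} }, $name;
    return $self;
}

package main;

# Five objects built from the one blueprint, each with its own data.
my @houses;
push @houses, House->new( address => "$_ Elm Street" ) for 1 .. 5;

$houses[0]->move_in('Alice');
$houses[1]->move_in('Bob');

for my $house (@houses) {
    my @people = $house->residents;
    print $house->address, ': ', ( @people ? "@people" : 'empty' ), "\n";
}
```

Moving someone into one house leaves the other four untouched, because each object carries its own copy of the data.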

        Anywho, forget everything I said and try it out. You can get an example working and understood in a few hours. After that you will think you 'got it'- and then a few days later you'll start to more honestly get it. This is what I went through.

        Do you have the O'Reilly Programming Perl book? You should have a copy; you can get it off Amazon for 10 bucks. You'll need it, it's very handy.

        This is (un?)fortunately going to be an adventure, learning OO. There is no one place to 'learn' it. You will have to scavenge various sources, from each place you will attain a new facet of understanding. And you will have to experiment endlessly- which you will probably do out of sheer curiosity if not out of need.

        Search the Tutorials section; there's really some very good stuff in there.

        Most important, do not get discouraged if it's weird. Don't give up on the concepts until you can tell yourself you really understand what the idea of oo is.

        I love good ol bash and straight empirical(is it?) functional programming - but for complicated stuff with thousands of lines of code...screw that! I need pOOp. (sorry... it's Friday...)

        (Please excuse the very loose non technical language in this post).

Re: Data Structures
by CountZero (Bishop) on May 02, 2008 at 15:50 UTC
    This is indeed a prime example of where one can put Object Oriented Programming to good use (leocharre++).

    Of course OOP is not for the faint hearted and has a lot of do's and don'ts.

    CPAN to the rescue!

    There are tons of modules dealing with Object Oriented Programming, but why not reach for the most advanced? Try Moose! At first it will seem like total overkill for your application, but you don't have to use all of its features (yet). It is well laid out, has good documentation, simply works, and lets you bring in its more advanced features as and when you need them.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Where are you going to store your 1000 objects?
        In an object?

        And how are you going to access the individual instances you need?
        Isn't that what accessors are made for?

        Of course in Perl, objects are just eye-candy around a data-structure (hashes, arrays, or any combination) but if I follow your reasoning, I should program in Assembler as in the end it all is machine-code.

        CountZero


Node Type: perlquestion [id://684046]
Approved by almut
Front-paged by McDarren