http://www.perlmonks.org?node_id=109438

brassmon_k has asked for the wisdom of the Perl Monks concerning the following question:

I'm stuck, fellow monks. This is the first time I've run into non-fixed data. By that I mean the data comes in blocks, but a given field is not on the same line from one block to the next.

Normally I'd use $/ to break the blocks into paragraphs and print out $line2 or whichever line I need. I can't do that now: say $line2 was dog in one block, but in the next block dog is on line 5, so if I said:
print "$line2";
I'd have it printing cat.

So I was wondering how, with data structured as below, I could print out particular lines without knowing which line they'll be on. I was reading through the Camel book and it sounds like I need to make a hash of arrays, but the example wasn't what I wanted: it prints out the ARRAY value, something like ARRAY(0xcf02). I want it to print out the value I typed into it. Do I have the right idea, making a hash of arrays for the data below in order to print a line that isn't always in the same location? For example, say the three lines I want to print are "interruptionTime", "chargedParty" and "disconnectingParty". In one record block "interruptionTime" could be on line 2 and in the next block on line 18. Here is an example of the data:
PCSPLMNCallDataRecord
   mSTerminating
      callIdentificationNumber 2487067'D
      relatedCallNumber 2487060'D
      recordSequenceNumber 7410342'D
      exchangeIdentity "MAPP01E 0117802"'S
      mSCIdentification 1119207079800F'TBCD
      cellIDForFirstCellCalled 13F046044D55B5'H
      incomingRoute "BAPL01I"'S
      outgoingRoute "BAPL01O"'S
      callingPartyNumber 146084466642'TBCD
      calledPartyNumber 1116084466643F'TBCD
      iMSICalled 13600410005231F1'H
      mobileStationRoamingNumber 1116084469970F'TBCD
      redirectionCounter 0'D
      dateForStartOfCharge 1 8 28'BCD
      timeForTCSeizureCalled 170F25'H
      timeForStartOfCharge 23 15 44'BCD
      chargeableDuration 0 0 11'BCD
      trafficActivityCode 11 2 14'BCD
      teleServiceCode 11'H
      internalCauseAndLoc 0 3'BCD
      timeFromRegisterSeizureToStartOfCharging 0 0 9'BCD
      timeForStopOfCharge 23 15 55'BCD
      interruptionTime 0 0 0'BCD
      typeOfCallingSubscriber 1'D
      disconnectingParty 0'D
      chargedParty 1'D
      eosInfo 00'H
      callPosition 3'D
      originForCharging 1'D
      cellIDForLastCellCalled 13F046044D55B5'H
      locationNumberTerminating 114141079808F0'H
      tariffClass 10'D
      tariffSwitchInd 0'D
      firstAssignedSpeechCoderVersion 01'H
      speechCoderPreferenceList 0100'H
      firstRadioChannelUsed 00'H
      radioChannelProperty 01'H
      presentationAndScreeningIndicator 30'H
So any ideas on how I can achieve this, and is my idea headed in the right direction with a hash of arrays?
The Brassmon_k

Re: Non-fixed data in record
by dragonchild (Archbishop) on Aug 31, 2001 at 20:57 UTC
    Uhh... why not just make a hash of hashes of hashes. The outer hash's key would be 'PCSPLMNCallDataRecord'. The middle hash's key would be 'mSTerminating'. The inner hash's keys would be, for example, 'radioChannelProperty' and its value would be "00'H".

    To load it, you'd have:

    my %data;
    $data{PCSPLMNCallDataRecord}{mSTerminating}{radioChannelProperty} = "00'H";
    To access it, you'd have:
    my $value = $data{PCSPLMNCallDataRecord}{mSTerminating}{radioChannelProperty};

    ------
    We are the carpenters and bricklayers of the Information Age.

    Vote paco for President!

      Yes,

      That's good, but here's why I can't give each of the keys after the "mSTerminating" line a single value: the line I want to print can have any of 3 different values.

      So let's say the line you used, "radioChannelProperty", can have 3 different values. Would I then construct it like this:
      my %data; $data{PCSPLMNCallDataRecord}{mSTerminating} {radioChannelProperty} = "00'H"; {radioChannelProperty} = "01'H"; {radioChannelProperty} = "02'H";
      The Brassmon_k
        Nope. You would do something like the following:
        my %data; @{$data{PCSPLMNCallDataRecord}{mSTerminating}{radioChannelProperty}} = ("00'H", "01'H", "02'H");
        And access it just like you would any array. Just the name is a little longer. :)
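
        A minimal sketch of that access pattern, assuming the three values from the assignment above. It also shows where output like ARRAY(0x...) comes from: the innermost hash slot holds a reference to the array, and the reference has to be dereferenced to reach the values. The variable names $aref and @values are only for illustration.

```perl
use strict;
use warnings;

my %data;

# Assigning through @{ ... } creates the intermediate hashes and the array.
@{ $data{PCSPLMNCallDataRecord}{mSTerminating}{radioChannelProperty} }
    = ("00'H", "01'H", "02'H");

# The innermost slot holds a reference to an array, not the array itself.
my $aref = $data{PCSPLMNCallDataRecord}{mSTerminating}{radioChannelProperty};
print "stored: $aref\n";          # prints something like ARRAY(0x...)

# Dereference the whole array with @{ ... }, or one element with ->[ ].
my @values = @{$aref};
print "all:    @values\n";
print "second: $aref->[1]\n";
print "count:  ", scalar(@values), "\n";
```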

        ------
        We are the carpenters and bricklayers of the Information Age.

        Vote paco for President!

Re: Non-fixed data in record
by perrin (Chancellor) on Aug 31, 2001 at 20:59 UTC
    It looks like a plain old hash to me. The left column is all the keys and the right is their corresponding values. If there are lots of records like this one in one file, you might use an array of hashes, or you could just read them in and deal with them one at a time. It sounds like you need to read up on references a bit more to understand how to do nested data structures like an array of hashes.
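
    A rough sketch of the array-of-hashes idea, under a few assumptions: every record starts with a PCSPLMNCallDataRecord line, the subheading is a line holding a single word, and every other line is a field name followed by its value. The sample input is shortened and made up, and the "_type" key used to store the subheading is a hypothetical choice, not something from the data.

```perl
use strict;
use warnings;

# Made-up sample: two shortened records with the fields in different order.
my $input = <<'END';
PCSPLMNCallDataRecord
   mSTerminating
      interruptionTime 0 0 0'BCD
      disconnectingParty 0'D
      chargedParty 1'D
PCSPLMNCallDataRecord
   mSOriginating
      chargedParty 0'D
      tariffClass 10'D
      interruptionTime 0 0 2'BCD
      disconnectingParty 1'D
END

my @records;    # array of hashes, one hash per record
my $current;    # the record currently being filled

for my $line (split /\n/, $input) {
    if ($line =~ /^PCSPLMNCallDataRecord/) {
        $current = {};                  # start a new record
        push @records, $current;
    }
    elsif ($current && $line =~ /^\s+(\S+)\s+(.*\S)\s*$/) {
        $current->{$1} = $2;            # field name => rest of the line
    }
    elsif ($current && $line =~ /^\s+(\S+)\s*$/) {
        $current->{_type} = $1;         # lone word: the subheading
    }
}

# Look fields up by name; their position in the record no longer matters.
my @wanted = qw(interruptionTime chargedParty disconnectingParty);
for my $rec (@records) {
    print "$rec->{_type}\n";
    for my $field (@wanted) {
        my $value = exists $rec->{$field} ? $rec->{$field} : '(missing)';
        print "  $field $value\n";
    }
}
```

    For a large file the same loop body could print each record as soon as the next one starts instead of keeping them all in @records.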
      That is very true,

      Everything else about Perl I pretty much get (at least what I've seen so far). I realize hashes and arrays are vital, but I'm kind of sketchy on them, maybe really lacking the knowledge of how to construct a more complicated hash or array. I can do the basics, but it gets hard for me later on. Are there any good sources where I can read up? The Camel book's examples just don't hit home.

      The Brassmon_k

        Advanced Perl Programming (the panther book) has an excellent chapter on implementing complex data structures. Learning when to use which data structure will mostly come from experience, though.

        anders pearson

        The Perl man pages are a good source, and there's lots of info on this site as well. Try a search for "references."
Re: Non-fixed data in record
by demerphq (Chancellor) on Aug 31, 2001 at 22:58 UTC
    Hi Brassmon_k. So, parsing cell phone CDRs, are we? Hmmm... I bet I have a module you would like. Funny, though: in Germany you could go to jail for posting that. (In fact, I hope you jinked the data...)

    To answer your question about what you should read, I would suggest perldata, perldsc (the Perl data structures cookbook) and perllol. Those explain data structures and the like. perlreftut might also be a good idea, as understanding references is an essential part of data structures in Perl.
    The solutions provided by other posters are pretty good (hash of hash of array).

    HTH

    Yves
    --
    You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
      Goutentac heir demerphq,

      Or is it goutentag? I always forget the spelling in Deutsch. Anyway, yes, I jinked the data by changing numbers, and I would very much, very MUCH, like that module, please. I was always looking for a CDR module but didn't know where to get one.

      By occupation I'm a UNIX sysadmin, but I was forced into writing a bunch of DB scripts and so forced into Perl (I like it though, I like it a lot). Our programmer got fired for harassment... long story. Anyway, my usual job is maintaining systems: setting up disk mirrors, installing OSes, building mainframe systems, and keeping up the networks, users, logins and the whole bunch of odd jobs sysadmins have. Scripting is a part of it, but not like this, which is why I'm kind of fresh to all of this.

      Anyway, I would love that module. PS: You know the "deasn9" program for translating a full CDR? I wouldn't have to suffer all of this if there were an HP-UNIX version. Ericsson only gave me one for Solaris UNIX and said they had no version for HP-UNIX. You wouldn't by chance have that also? Please give me the module... That is candy: a Perl module for CDRs! Please!

      Even though I've got the idea now, it was through everybody's help. Thanks, PerlMonks, and especially the people who helped me. What I don't know is why, when you can write something so simple, most of you give me this highly complex code (at least from my point of view; you guys are used to it). Then again, I'm not too sharp at explaining code needs in words. The guy who stated exactly what I wanted wrote a whole page of code. Granted, it probably worked and was very efficient (I will play with it to try to learn what he did), but it was too complex for me, so I made my own simple-simon version. I tested it already by delimiting its results by msisdn and subheading, and it works great.

      The Brassmonk

      I really want that module!
        Guten Tag

        Actually I am not Deutsche, just living here.

        Sorry, I shouldn't have raised your hopes so high. There isn't really a CDR module (that I know of), basically because there are a lot of different kinds of CDR, for different switches and the like. For instance, I know of at least three or four kinds just where I work.
        No, the module I was referring to is called Tie::Hash::Trie and will be uploaded to CPAN by the end of the weekend. It has some properties that you might find useful if you are dealing with CDRs. I will let you know when it has been uploaded. (Sorry for the lack of details, but I'm in the middle of writing the pod and don't want to have to do it twice. :-)

        What I don't know is why when you can write something so simple that most of you give me this highly complex code, at least from my point of view; you guys are used to it.

        I can sympathize, but I won't agree. You see, a 'simple' solution usually provides simple results: it may work for a limited set of cases but melts down when the going gets tough. (A good example is Godzilla!'s posts on CLPM; usually they work for the problem at hand but are not really viable for long-term scenarios.)
        The more complex solutions, while a little harder to understand, are usually much more robust, scalable and, frankly, more usable.

        I can understand that you don't want to use code that you don't understand, but I strongly suggest that you take the time to figure out the solutions that were posted, and that you read those docs. If you have access to an NT or W9x box, perhaps have a look at the ActiveState distro? They have nice HTML pages of all the docs laid out in a very convenient form (similar to cpan.org etc.). I have spent hours and hours and hours on those pages and still have more to learn.

        Anyway, good luck, and sorry for the confusion, yves

Re: Non-fixed data in record
by dmmiller2k (Chaplain) on Aug 31, 2001 at 22:13 UTC
    Calling this entire fragment of data a 'block,' just to put a name to it, and presuming for the moment that the string 'PCSPLMNCallDataRecord' identifies the beginning of every such block of data, and further that 'mSTerminating' is a label (say, like its type) that could be different in each such block, I might write something like this to read it into a perl data structure:
    use strict;
    use warnings;

    my %PCSPLMNCallDataRecords;
    my $type_name;    # type of the block we're reading
    my $block;        # place for block values

    while ( <> ) {    # get each line somehow
        chomp;

        # no leading spaces means new block
        if ( /^PCSPLMNCallDataRecord/ ) {
            $type_name = undef;    # next we expect a type name
            $block     = undef;
        }
        # 3 leading spaces signals 'type' of block follows
        elsif ( /^\s{3}(\S.*?)\s*$/ ) {
            next if ( $type_name || $block );    # before we're ready
            $type_name = $1;    # (e.g., 'mSTerminating')
            $block     = {};    # next we expect data
        }
        # six leading spaces indicates data line
        elsif ( /^\s{6}(\S+)\s+(.*\S)\s*$/ ) {
            next if ( !$type_name || !$block );    # before we're ready

            # lefthand column is the 'property' name. after first whitespace,
            # rest is data with trailing whitespace trimmed
            my ($key, $val) = ($1, $2);

            # first line in this block
            if ( scalar keys %$block == 0 ) {
                # if we don't already have an array for this type, create one now
                $PCSPLMNCallDataRecords{$type_name} = []
                    if not exists $PCSPLMNCallDataRecords{$type_name};
                # no values added yet, add hashref to array for this type
                # only for first line
                push @{$PCSPLMNCallDataRecords{$type_name}}, $block;
            }
            # add data to block (e.g., 'callIdentificationNumber' => "2487067'D")
            $block->{$key} = $val;
        }
    }

    # now you have the following structure in %PCSPLMNCallDataRecords:
    #
    # %PCSPLMNCallDataRecords = (
    #     'mSTerminating' => [    # array of 'mSTerminating' records
    #         {
    #             callIdentificationNumber => '2487067\'D',
    #             relatedCallNumber        => '2487060\'D',
    #             recordSequenceNumber     => '7410342\'D',
    #             exchangeIdentity         => '"MAPP01E 0117802"\'S',
    #             mSCIdentification        => '1119207079800F\'TBCD',
    #             cellIDForFirstCellCalled => # etc ...
    #         },
    #         {
    #             # ... next mSTerminating record ...
    #         }
    #     ],
    #     'mSOtherType' => [
    #         # array of 'mSOtherType' records, ...
    #     ]
    # );

    # an array of hashes with all of the 'mSTerminating' records
    # (empty if the input contained none):
    my @mSTerminating = @{ $PCSPLMNCallDataRecords{mSTerminating} || [] };

    # the first 'mSTerminating' record in the array
    my %mSTerminating = %{ $mSTerminating[0] || {} };
    # OR, in one step:
    # my %mSTerminating = %{ $PCSPLMNCallDataRecords{mSTerminating}[0] };
    Hope this helps ...

    dmm

    Just call me the Anti-Gates ...
    
      Wow,

      You're good! I mean, I understand it, sort of. OK, you interpreted what I said exactly the right way. "PCSPLMNCallDataRecord" is at the top of every record block, and it is immediately followed by a subheading such as mSTerminating, mSOriginating, transit, or one of about 3 others. Everything below that is data lines, which can shift position by 3 to 4 lines or more.

      I know you understand the question and posted a good answer but I have to understand the answer. What ever happened to 2+2=4. Wow, see that's why I have a hard time with hashes and arrays.
      The Brassmon_k
        brassmon_k ... Instead of starting with something as complex as your problem seems to be, why not start with something really simple and actually understand how Perl does multi-level data structures? You seem to want to eat your cake and have it, too.

        Perl understanding does not come very quickly, nor very easily. You have to be willing to work at it, just like anything else. Perl has a deceptively easy learning curve, but it still has all the complexities of a C or a LISP.

        Another thing, and this is much more nitpicky - I would never classify someone who didn't understand multi-level data structures as intermediate in any language, especially not Perl. Look at it as your graduation-from-beginner test. You don't even need to understand how references work (though it helps, in the long run). Look back at my answer and try to understand it.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Vote paco for President!

        Thanks for the compliment. I see what you mean, though. What seemed to me to be a fairly simple deconstruction of the data turns into a fairly complex multilevel perl data structure.

        The trick to understanding these (at least what worked for me) is to realize that arrays (and hashes) can only store scalars, not other arrays (or hashes).

        Conveniently enough, references to arrays (or hashes) ARE scalars. So rather than an array of hashes (or hash of arrays, or whatever), you can have an array of hashrefs (or a hash of arrayrefs, etc.).
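
        A small sketch of that point, using made-up field values. The hashes %first and %second and the other variable names are only for illustration.

```perl
use strict;
use warnings;

my %first  = (chargedParty => "1'D", disconnectingParty => "0'D");
my %second = (chargedParty => "0'D", disconnectingParty => "1'D");

# \%hash yields a reference, which is an ordinary scalar value.
my $ref = \%first;
print ref($ref), "\n";                      # HASH

# So an array can hold several of them: an "array of hashes" is really
# an array of hash references.
my @records = (\%first, \%second);

# Follow a reference with the arrow operator ...
print $records[1]->{chargedParty}, "\n";    # 0'D
# ... which may be left out between adjacent subscripts.
print $records[1]{chargedParty}, "\n";      # 0'D

# Dereference a whole hash with %{ ... } to get at its keys.
my @keys = sort keys %{ $records[0] };
print "@keys\n";                            # chargedParty disconnectingParty
```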

        Looking at your data, the first thing that struck me (as it apparently did to several other respondents) was that the lines below 'mSTerminating' appeared to suggest a hash (key/value pairs).

        Surmising (apparently correctly) that 'mSTerminating' was a type or descriptor name (i.e., that there might be more than one instance of it), I figured that there should be an array of references to the above hashes.

        Finally, we need a hash to associate a reference to this array with the string 'mSTerminating'.

        So the data itself suggests the structure: a hash which associates types (e.g., 'mSTerminating') with (a reference to) an array of instances, themselves represented by hashrefs.
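
        To make the three levels concrete, here is a sketch that walks such a structure. It uses a small hand-built stand-in rather than parsed input: the name %records_by_type and all the field values are made up for illustration.

```perl
use strict;
use warnings;

# Hash of type name => reference to an array of hash references.
my %records_by_type = (
    mSTerminating => [
        { interruptionTime => "0 0 0'BCD", chargedParty => "1'D" },
        { chargedParty => "0'D", interruptionTime => "0 0 4'BCD" },
    ],
    transit => [
        { interruptionTime => "0 0 1'BCD" },
    ],
);

my @lines;
for my $type (sort keys %records_by_type) {
    my $instances = $records_by_type{$type};        # array reference
    for my $i (0 .. $#{$instances}) {
        my $record = $instances->[$i];              # hash reference
        for my $field (sort keys %{$record}) {
            push @lines, "$type\[$i] $field = $record->{$field}";
        }
    }
}
print "$_\n" for @lines;
```

        Each loop peels off one level: the outer hash gives an array reference per type, the array gives one hash reference per record, and the inner hash gives the field values by name.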

        Hope this helps.

        dmm

        Just call me the Anti-Gates ...