
Parse tags into an array

by dragotony (Initiate)
on Jan 15, 2013 at 15:59 UTC
dragotony has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am writing some Perl code. The code is given a file that contains data in the following format:

<Tag Name> <Tag value>

e.g. :


Now, the file to be processed by the Perl code has data something like this:

<some other tags>
<some other tags>
ACCOUNT 1000901
<some other tags>
<some other tags>
<some other tags>
<some other tags>

The entire set of tags from TAGSTART through TAGEND is repeated multiple times in a file. The number of repetitions is not fixed.

The Perl code needs to process the file so that we end up with one array of the following pairs:

"ACCOUNT" tag value - "BILLAMOUNT" tag value

Could you please point out the best approach to achieve this?


Replies are listed 'Best First'.
Re: Parse tags into an array
by BrowserUk (Pope) on Jan 15, 2013 at 16:33 UTC

    Here's a simple method that doesn't require you to know what tags are in each section, or even what order the tags you are interested in appear. It is also easily extended to handle more tags of interest. The output is an array of hashes:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my @accts;
    {
        local $/ = 'TAGEND';
        while( <DATA> ) {
            m[
                (?=.* FIRSTNAME  \s+ ( \S+ ) )
                (?=.* LASTNAME   \s+ ( \S+ ) )
                (?=.* ACCOUNT    \s+ ( \S+ ) )
                (?=.* BILLAMOUNT \s+ ( \S+ ) )
            ]xsm and push @accts, {
                firstname  => $1,
                lastname   => $2,
                account    => $3,
                billamount => $4,
            };
        }
    }

    pp \@accts;

    __DATA__

    And the output:

    C:\test>
    [
      {
        account => 1000901,
        billamount => 4200,
        firstname => "DAVID",
        lastname => "RHODES",
      },
      {
        account => 1000902,
        billamount => 10000,
        firstname => "MARY",
        lastname => "RHODES",
      },
      {
        account => 1000903,
        billamount => 1200,
        firstname => "BILL",
        lastname => "HICKOK",
      },
      {
        account => 1000909,
        billamount => -1,
        firstname => "FRED",
        lastname => "BLOGGS",
      },
    ]

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Parse tags into an array
by blue_cowdawg (Monsignor) on Jan 15, 2013 at 16:08 UTC
        Could you please point out the best approach to achieve this?

    What have you tried? What was the result?

    The approach I'd take would be to create a pseudo state machine that changes state based on input. If I were able to attach graphics to a PM post I'd give you a nice state diagram for that, but here goes with a text-only version:

            |                                   yes
            v                                    |
         start----->"TAGSTART" ---> intag ---> "TAGEND?" <------+
                                                 |              |
                                               process input ---+

    In the "process input" step you essentially read in a line and, depending on the tag, add it to a hash as appropriate.
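As a rough illustration of the state machine described above, here is a minimal sketch. It assumes that TAGSTART and TAGEND each appear alone on their own line and that every other line is a tag name followed by whitespace and a value. The sub name `parse_records` and the sample data are made up for the example; they are not from the original question.

```perl
use strict;
use warnings;

# Collect one hash per TAGSTART..TAGEND record (hypothetical helper).
sub parse_records {
    my ($fh) = @_;
    my @records;
    my $state = 'start';    # 'start' = outside a record, 'intag' = inside one
    my %field;

    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $tag, $value ) = split ' ', $line, 2;
        next unless defined $tag;    # skip blank lines

        if ( $state eq 'start' ) {
            # Ignore everything until a record opens.
            if ( $tag eq 'TAGSTART' ) {
                %field = ();
                $state = 'intag';
            }
        }
        elsif ( $tag eq 'TAGEND' ) {
            # Record complete: keep a copy and go back to 'start'.
            push @records, { %field };
            $state = 'start';
        }
        else {
            # "process input": remember the tag's value.
            $field{$tag} = $value;
        }
    }
    return @records;
}

# Made-up sample data in the assumed format.
my $sample = <<'END';
TAGSTART
FIRSTNAME DAVID
ACCOUNT 1000901
BILLAMOUNT 4200
TAGEND
TAGSTART
ACCOUNT 1000902
BILLAMOUNT 10000
TAGEND
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

# Reduce each record to the requested ACCOUNT/BILLAMOUNT pair.
my @pairs = map { [ $_->{ACCOUNT}, $_->{BILLAMOUNT} ] } parse_records($fh);
print "$_->[0] - $_->[1]\n" for @pairs;
```

Keeping the whole record in a hash and only extracting the pair at the end means other tags can be picked up later without touching the parsing loop.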

    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Parse tags into an array
by LanX (Chancellor) on Jan 15, 2013 at 19:55 UTC

    I suppose you mean "associative" array, that is a "hash" in Perl.

    Instead of relying on nifty regexes to process whole records in one string, I'd prefer using a flip-flop, as demonstrated here: Re: Parsing file and joining content into string, to read all tags from TAGSTART through TAGEND.

    In the if part, do something like

    @tag = split ' ', $line, 2;
    $field{ $tag[0] } = $tag[1];

    and in the else part

    $bill{ $field{ACCOUNT} } = $field{BILLAMOUNT} if %field;
    %field = ();

    IMHO this is a more robust and more maintainable approach.


    For instance, if you ever need to retrieve all data per account, just do

    $bill{ $field{ACCOUNT} } = { %field } if %field;
    %field = ();
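Put together, the flip-flop idea might look like the sketch below. It is a variant rather than a literal transcription of the fragments above: instead of an else branch it uses the range operator's return value (a sequence number, with "E0" appended on the closing line) to store the record on the TAGEND line itself, so that records which follow each other with no lines in between are not merged. The sub name `bill_by_account`, the anchored patterns, and the sample data are assumptions made for the example.

```perl
use strict;
use warnings;

# Map each ACCOUNT value to its BILLAMOUNT (hypothetical helper).
sub bill_by_account {
    my ($fh) = @_;
    my ( %bill, %field );

    while ( my $line = <$fh> ) {
        chomp $line;

        # Scalar-context range operator (flip-flop): false outside a record,
        # 1 on the TAGSTART line, 2, 3, ... inside, and "nE0" on TAGEND.
        my $seq = ( $line =~ /^TAGSTART\b/ ) .. ( $line =~ /^TAGEND\b/ );
        next unless $seq;    # line is outside any record

        if ( $seq =~ /E0$/ ) {
            # Closing line: store the pair and reset for the next record.
            $bill{ $field{ACCOUNT} } = $field{BILLAMOUNT}
                if defined $field{ACCOUNT};
            %field = ();
        }
        elsif ( $seq == 1 ) {
            %field = ();     # opening line: start with a clean slate
        }
        else {
            my @tag = split ' ', $line, 2;
            $field{ $tag[0] } = $tag[1] if @tag;
        }
    }
    return %bill;
}

# Made-up sample data: two records back to back.
my $sample = <<'END';
TAGSTART
ACCOUNT 1000901
BILLAMOUNT 4200
TAGEND
TAGSTART
LASTNAME RHODES
ACCOUNT 1000902
BILLAMOUNT 10000
TAGEND
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %bill = bill_by_account($fh);
print "$_ - $bill{$_}\n" for sort keys %bill;
```

Storing `{ %field }` instead of `$field{BILLAMOUNT}` in the closing branch would keep every tag per account, as suggested above.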

    Cheers Rolf

Re: Parse tags into an array
by Anonymous Monk on Jan 16, 2013 at 09:08 UTC
