PerlMonks  

Parse tags into an array

by dragotony (Initiate)
on Jan 15, 2013 at 15:59 UTC ( [id://1013418]=perlquestion )

dragotony has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am writing a Perl script. The script is given a file that has data in the following format:

<Tag Name> <Tag value>

e.g. :

FIRSTNAME DAVID

The file to be processed by the script contains data like this:

TAGSTART
FIRSTNAME DAVID
LASTNAME RHODES
<some other tags>
<some other tags>
ACCOUNT 1000901
<some other tags>
<some other tags>
BILLAMOUNT 4200
<some other tags>
<some other tags>
TAGEND


The entire set of tags from TAGSTART through TAGEND is repeated multiple times in a file. The number of repetitions is not fixed.

The script needs to process the file so that we end up with one array of the following pairs:

"ACCOUNT" tag value - "BILLAMOUNT" tag value

Could you please point out the best approach to achieve this?

Thanks.

Replies are listed 'Best First'.
Re: Parse tags into an array
by BrowserUk (Patriarch) on Jan 15, 2013 at 16:33 UTC

    Here's a simple method that doesn't require you to know what tags are in each section, or even what order the tags you are interested in appear. It is also easily extended to handle more tags of interest. The output is an array of hashes:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my @accts;
    {
        local $/ = 'TAGEND';
        while( <DATA> ) {
            m[
                (?=.* FIRSTNAME \s+ ( \S+ ) )
                (?=.* LASTNAME \s+ ( \S+ ) )
                (?=.* ACCOUNT \s+ ( \S+ ) )
                (?=.* BILLAMOUNT \s+ ( \S+ ) )
            ]xsm and push @accts, {
                firstname => $1, lastname => $2, account => $3, billamount => $4,
            };
        }
    }
    pp \@accts;

    __DATA__
    TAGSTART
    FIRSTNAME DAVID
    LASTNAME RHODES
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    ACCOUNT 1000901
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    BILLAMOUNT 4200
    UNKNOWNTAG SOMEVALUE
    TAGEND
    TAGSTART
    LASTNAME RHODES
    UNKNOWNTAG SOMEVALUE
    ACCOUNT 1000902
    BILLAMOUNT 10000
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    FIRSTNAME MARY
    UNKNOWNTAG SOMEVALUE
    TAGEND
    TAGSTART
    FIRSTNAME BILL
    LASTNAME HICKOK
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    ACCOUNT 1000903
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    UNKNOWNTAG SOMEVALUE
    BILLAMOUNT 1200
    TAGEND
    TAGSTART
    FIRSTNAME FRED
    UNKNOWNTAG SOMEVALUE
    LASTNAME BLOGGS
    UNKNOWNTAG SOMEVALUE
    ACCOUNT 1000909
    UNKNOWNTAG SOMEVALUE
    BILLAMOUNT -1
    UNKNOWNTAG SOMEVALUE
    TAGEND

    And the output:

    C:\test>junk47.pl
    [
      {
        account => 1000901,
        billamount => 4200,
        firstname => "DAVID",
        lastname => "RHODES",
      },
      {
        account => 1000902,
        billamount => 10000,
        firstname => "MARY",
        lastname => "RHODES",
      },
      {
        account => 1000903,
        billamount => 1200,
        firstname => "BILL",
        lastname => "HICKOK",
      },
      {
        account => 1000909,
        billamount => -1,
        firstname => "FRED",
        lastname => "BLOGGS",
      },
    ]
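
The original question asked for a single array of ACCOUNT / BILLAMOUNT pairs, and an array of hashes like the one above can be reduced to that shape with a map. This is only a minimal sketch: the sample records are copied from the output above, and the name @pairs is a hypothetical choice, not something from the original post.

```perl
use strict;
use warnings;

# Two sample records in the same shape as the array of hashes shown above.
my @accts = (
    { account => 1000901, billamount => 4200,  firstname => 'DAVID', lastname => 'RHODES' },
    { account => 1000902, billamount => 10000, firstname => 'MARY',  lastname => 'RHODES' },
);

# Keep only the two fields of interest: one [ account, billamount ] pair per record.
# (@pairs is a hypothetical name chosen for this illustration.)
my @pairs = map { [ $_->{account}, $_->{billamount} ] } @accts;

print "$_->[0] - $_->[1]\n" for @pairs;
```

Each element of @pairs is a two-element array reference, so the pairs keep the order in which the records appeared in the file.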

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Parse tags into an array
by blue_cowdawg (Monsignor) on Jan 15, 2013 at 16:08 UTC
        Could please point out any approach that would be the best to achieve this ?

    What have you tried? What was the result?

    The approach I'd take would be to create a pseudo state machine that changes state based on the input. If I were able to attach graphics to a PM post I'd give you a nice state diagram for that, but here goes with a text-only version:

            +------------------------------------+
            |                                   yes
            v                                    |
         start----->"TAGSTART" ---> intag ---> "TAGEND?" <------+
                                                 |              |
                                               process input ---+
    

    In the "process input" step you essentially read in a line and, depending on the tag, add it to a hash as appropriate.
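
One way the state machine described above might look in code is sketched here. It is an illustration under assumptions, not the poster's own implementation: the helper name parse_records, the $in_tag state flag, and the in-memory sample data are all hypothetical, and the sketch assumes every record contains both an ACCOUNT and a BILLAMOUNT line.

```perl
use strict;
use warnings;

# Two-state machine: either waiting for TAGSTART or inside a record.
# parse_records() is a hypothetical helper; it takes an open filehandle
# and returns a list of [ ACCOUNT, BILLAMOUNT ] pairs.
sub parse_records {
    my ($fh) = @_;
    my @records;
    my %field;
    my $in_tag = 0;    # 0 = waiting for TAGSTART, 1 = inside a record

    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines

        if ( !$in_tag ) {
            # "start" state: ignore everything until TAGSTART.
            if ( $line =~ /^\s*TAGSTART\s*$/ ) {
                $in_tag = 1;
                %field  = ();
            }
            next;
        }

        if ( $line =~ /^\s*TAGEND\s*$/ ) {
            # End of record: keep the pair of interest, go back to "start".
            push @records, [ $field{ACCOUNT}, $field{BILLAMOUNT} ];
            $in_tag = 0;
            next;
        }

        # "process input": split "TAG value" and remember it in the hash.
        my ( $tag, $value ) = split ' ', $line, 2;
        $field{$tag} = $value;
    }
    return @records;
}

# Sample data (an assumption, modelled on the question) read via an
# in-memory filehandle so the sketch is self-contained.
my $sample = <<'END';
TAGSTART
FIRSTNAME DAVID
ACCOUNT 1000901
BILLAMOUNT 4200
TAGEND
TAGSTART
ACCOUNT 1000902
BILLAMOUNT 10000
TAGEND
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @records = parse_records($fh);
close $fh;

print "$_->[0] - $_->[1]\n" for @records;
```

For a real file, the in-memory filehandle would be replaced by an ordinary open on the file's path.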


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Parse tags into an array
by LanX (Saint) on Jan 15, 2013 at 19:55 UTC
    Hi

    I suppose you mean "associative" array, that is a "hash" in Perl.

    Instead of relying on nifty regexes to process whole records in one string, I'd prefer using a flip-flop, as demonstrated here: Re: Parsing file and joining content into string, to read all tags from TAGSTART through TAGEND.

    in the if part do something like

    @tag = split ' ', $line, 2;
    $field{ $tag[0] } = $tag[1];

    and in the else part

    $bill{ $field{ACCOUNT} } = $field{BILLAMOUNT} if %field;
    %field = ();

    IMHO this is a more robust and more maintainable approach.

    UPDATE:

    For instance if you ever need to retrieve all data per account just do

    $bill{ $field{ACCOUNT} } = { %field } if %field;
    %field = ();
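
Put together, a flip-flop based loop might look roughly like the sketch below. It is a variation on the if/else layout described above rather than the linked code: instead of an else branch, it uses the flip-flop's return value (a sequence number with "E0" appended on the closing line) to detect TAGEND, so that records which follow each other with no lines in between are still stored. The sample data and the in-memory filehandle are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Sample data (an assumption, modelled on the question).
my $sample = <<'END';
TAGSTART
FIRSTNAME DAVID
ACCOUNT 1000901
BILLAMOUNT 4200
TAGEND
TAGSTART
ACCOUNT 1000902
BILLAMOUNT 10000
TAGEND
END

my ( %field, %bill );

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines

    # In scalar context `..` is the flip-flop operator: true from a
    # TAGSTART line through the next TAGEND line. It returns a sequence
    # number, with "E0" appended on the line that closes the range.
    my $seq = ( $line =~ /^TAGSTART\b/ .. $line =~ /^TAGEND\b/ );
    next unless $seq;             # line is outside any record

    if ( $seq =~ /E0$/ ) {
        # TAGEND line: store the finished record and reset.
        $bill{ $field{ACCOUNT} } = $field{BILLAMOUNT} if %field;
        %field = ();
    }
    elsif ( $seq > 1 ) {
        # Ordinary "TAG value" line ($seq == 1 is the TAGSTART line itself).
        my @tag = split ' ', $line, 2;
        $field{ $tag[0] } = $tag[1];
    }
}
close $fh;

print "$_ - $bill{$_}\n" for sort keys %bill;
```

Because %bill is keyed by account number, a later record with the same ACCOUNT value overwrites an earlier one; if duplicates matter, pushing pairs onto an array would preserve them.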

    Cheers Rolf

Re: Parse tags into an array
by Anonymous Monk on Jan 16, 2013 at 09:08 UTC
