 
PerlMonks  

Parsing a Tagged File Format

by arunhorne (Pilgrim)
on Apr 30, 2003 at 10:28 UTC
arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have a text file in the format below:

T1  Line1
T2  Line2
    Line3
    Line4
T3  Line5
    Line6
ER

Basically, it is a tagged file and I need to load it into a hash. As you can see, a tag can have more than one line. I want to create a hash that maps each tag (tags are unique) to an array of the lines associated with that tag, e.g.

T2 => (Line2,Line3,Line4)

Tag names are always two characters long but are otherwise arbitrary, so they cannot be hard-coded; however, the file is always terminated by a tag named ER.

Can anyone provide code to translate this text into a hash that meets these requirements? I just can't seem to handle the multi-line case.

Thanks in advance

____________
Arun

Re: Parsing a Tagged File Format
by merlyn (Sage) on Apr 30, 2003 at 10:38 UTC
    Untested, but I usually get this stuff right... {grin}
    my $tag = "Invalid";
    my %results;
    while (<>) {
        if (/^ER/) {
            last;
        } elsif (/^([A-Z0-9]{2})\s+(.*)/) { # might need adjustment
            $tag = $1;
            push @{$results{$tag}}, $2;
        } elsif (/^\s+(.*)/) { # might need adjustment
            push @{$results{$tag}}, $1;
        } else {
            die "I don't understand $_";
        }
    }
    and now @{$results{T2}} is qw(Line2 Line3 Line4).

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
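A minimal sketch, assuming you want merlyn's loop packaged as a reusable subroutine that can be exercised without a real file. The subroutine name `parse_tagged` is hypothetical; the in-memory filehandle (`open` on a scalar reference, supported since Perl 5.8) only keeps the example self-contained; and the tag pattern is assumed to be `[A-Z0-9]{2}` so that tags containing digits, such as T1, are recognised.

```perl
use strict;
use warnings;

# Hypothetical helper: read the tagged format from a filehandle and return
# a hash ref mapping each tag to an array ref of its lines.
sub parse_tagged {
    my ($fh) = @_;
    my %results;
    my $tag;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ER\b/) {
            last;                                  # ER terminates the data
        } elsif ($line =~ /^([A-Z0-9]{2})\s+(.*)/) {
            $tag = $1;                             # a new tag starts here
            push @{ $results{$tag} }, $2;
        } elsif ($line =~ /^\s+(.*)/) {
            die "Continuation line before any tag: $line\n" unless defined $tag;
            push @{ $results{$tag} }, $1;          # belongs to the current tag
        } else {
            die "I don't understand: $line\n";
        }
    }
    return \%results;
}

my $text = <<'TXT';
T1  Line1
T2  Line2
    Line3
    Line4
T3  Line5
    Line6
ER
TXT

# Read from the string through an in-memory filehandle.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $parsed = parse_tagged($fh);
close $fh;

print "$_: @{ $parsed->{$_} }\n" for sort keys %$parsed;
```

One design difference, chosen here rather than taken from the reply: instead of filing stray continuation lines under a placeholder tag, the sketch dies if an indented line appears before any tag has been seen.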

Re: Parsing a Tagged File Format
by broquaint (Abbot) on Apr 30, 2003 at 10:40 UTC
    Maybe not exactly what you want, but this should get you started:
    use Data::Dumper;

    my $data = <<TXT;
    T1  Line1
    T2  Line2
        Line3
        Line4
    T3  Line5
        Line6
    ER
    TXT

    my(%h, $k);
    for(split /\n/, $data) {
        last if /^ER\z/;
        /^(T\d+)/ and $k = $1;
        push @{ $h{$k} }, m< (?: T\d+ )? \s+ (.*) >x;
    }
    print Dumper(\%h);

    __output__
    $VAR1 = {
              'T1' => [
                        'Line1'
                      ],
              'T2' => [
                        'Line2',
                        'Line3',
                        'Line4'
                      ],
              'T3' => [
                        'Line5',
                        'Line6'
                      ]
            };
    You could probably compact that process into a regex and a map(), but the for loop and split() should be easier to maintain :)
    HTH

    _________
    broquaint
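broquaint's remark about compacting the loop into a regex and a map() can be sketched as follows, assuming tags are exactly two word characters at the start of a line and continuation lines always begin with whitespace. The variable names are illustrative, and this is one possible reading of the suggestion rather than broquaint's own code.

```perl
use strict;
use warnings;

# Sample input in the format from the question (continuation lines indented).
my $data = <<'TXT';
T1  Line1
T2  Line2
    Line3
    Line4
T3  Line5
    Line6
ER
TXT

# Discard the ER terminator line and anything after it.
$data =~ s/^ER\b.*//ms;

# Splitting on tag lines, with the tag captured, yields
# ('', 'T1', "Line1\n", 'T2', "Line2\n    Line3\n    Line4\n", ...);
# the leading empty field is thrown away and the rest become key/value pairs.
my (undef, %blocks) = split /^(\w{2})[ \t]+/m, $data;

# Break each block into its lines; splitting on a newline plus any following
# whitespace also strips the indentation of continuation lines.
my %h = map { ($_ => [ split /\n\s*/, $blocks{$_} ]) } keys %blocks;

print "$_ => [@{ $h{$_} }]\n" for sort keys %h;
```

The trade-off is the one broquaint mentions: this is shorter, but the loop-based version is easier to extend with per-line error checks.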

Re: Parsing a Tagged File Format
by Anonymous Monk on Apr 30, 2003 at 10:41 UTC
    #! perl -slw
    use strict;
    use Data::Dumper;

    my %data;
    my $current_key;
    while( <DATA> ) {
        my ($key, $value) = m[(^[\w]{2})?\s+(.*$)];
        if( defined $key ) {
            last if $key eq 'ER';
            push @{ $data{$key} }, $value;
            $current_key = $key;
        }
        else {
            push @{ $data{$current_key} }, $value;
        }
    }
    print Dumper \%data
    __DATA__
    T1  Line1
    T2  Line2
        Line3
        Line4
    T3  Line5
        Line6
    ER
      Basically the same answer, but a little more golfed:
      use strict;
      use Data::Dumper;

      my %data;
      my $current_key;
      while( <DATA> ) {
          my ($key, $value) = m[^((?:\w{2})?)\s+(.*)] or die "Bad line: $_";
          last if $key eq 'ER';
          $current_key = $key || $current_key;
          die "No key defined" unless $current_key;
          push @{ $data{$current_key} }, $value;
      }
      print Dumper \%data
      __DATA__
      T1  Line1
      T2  Line2
          Line3
          Line4
      T3  Line5
          Line6
      ER
      Updated. I should test these things first :-) (And it's silly that (\w{2}?) doesn't DWIM.) See Aristotle's answer further down for a very similar approach. Oh well.
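The `(\w{2}?)` aside comes down to what the `?` binds to. This small demonstration, using an illustrative continuation line, compares the three spellings on a line that has no tag:

```perl
use strict;
use warnings;

my $cont = "    Line3";    # a continuation line: no tag at the start

# (\w{2})? : the whole group is optional, so the match succeeds and the
# tag capture is undef.
my @optional_group = $cont =~ /^(\w{2})?\s+(.*)/;

# ((?:\w{2})?) : the capture group always takes part in the match, so
# the tag is the empty string rather than undef.
my @optional_inside = $cont =~ /^((?:\w{2})?)\s+(.*)/;

# (\w{2}?) : here '?' only makes {2} non-greedy, which changes nothing for
# an exact count. Two word characters are still required, so this fails.
my @lazy_count = $cont =~ /^(\w{2}?)\s+(.*)/;

# Show undef explicitly so the difference is visible.
sub describe {
    return '(' . join(', ', map { defined $_ ? "'$_'" : 'undef' } @_) . ')';
}

print '(\w{2})?     => ', describe(@optional_group), "\n";
print '((?:\w{2})?) => ', describe(@optional_inside), "\n";
print '(\w{2}?)     => ', describe(@lazy_count), "\n";
```

So `((?:\w{2})?)`, as used in the updated code, is the spelling that makes the tag optional while still yielding a defined (possibly empty) value.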
Re: Parsing a Tagged File Format
by Aristotle (Chancellor) on Apr 30, 2003 at 14:23 UTC
    my $cur_tag;
    my %lines_for;
    while(<>) {
        chomp;
        my ($tag, $line) = split /\s+/, $_, 2;
        $cur_tag = $tag || $cur_tag or die "First input line has no tag";
        last if $cur_tag eq 'ER';
        push @{$lines_for{$cur_tag}}, $line;
    }

    Makeshifts last the longest.
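Aristotle's loop relies on how `split` treats leading whitespace when given a limit of 2. A minimal illustration, assuming the same two-column layout as the question:

```perl
use strict;
use warnings;

# With a limit of 2, split returns at most two fields: the tag and the rest.
my @tagged = split /\s+/, "T2  Line2", 2;

# A line that starts with whitespace: the leading separator produces an
# empty first field, which is false, so "$tag || $cur_tag" keeps the old tag.
my @continuation = split /\s+/, "    Line3", 2;

# The special pattern ' ' (awk-style split) skips leading whitespace, so it
# would lose the empty first field that the loop relies on.
my @awk_style = split ' ', "    Line3", 2;

for my $case (['tagged', \@tagged],
              ['continuation', \@continuation],
              ["split ' '", \@awk_style]) {
    my ($label, $fields) = @$case;
    print "$label: (", join(', ', map { "'$_'" } @$fields), ")\n";
}
```

The `chomp` in the loop matters too: without it, the text field of every line would keep its trailing newline.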

Re: Parsing a Tagged File Format
by hossman (Prior) on May 01, 2003 at 01:00 UTC
    If you want a compact (and illegible) solution...
    #!/usr/local/bin/perl -w
    use strict;
    use Data::Dumper;

    undef $/;
    my %t = <DATA> =~ /(\w{2})\s+((?:.+\n)(?:\s{2}.+\n)*)/g;
    grep { $t{$_} = [split /\n\s*/, $t{$_}]; } keys %t;
    print Dumper(\%t);

    __DATA__
    T1  Line1
    T2  Line2
        Line3
        Line4
    T3  Line5
        Line6
    ER
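hossman's approach can be kept almost unchanged while swapping two idioms for more conventional ones: a localised slurp instead of a global `undef $/`, and a `for` modifier instead of `grep` in void context. A sketch under those assumptions, reading from an in-memory filehandle only to stay self-contained:

```perl
use strict;
use warnings;

my $input = <<'TXT';
T1  Line1
T2  Line2
    Line3
    Line4
T3  Line5
    Line6
ER
TXT

# Read everything at once; $/ is localised so the change cannot leak into
# other code the way a bare "undef $/" would.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $all = do { local $/; <$fh> };
close $fh;

# In list context m//g returns every (tag, block) capture pair, so the
# result can be assigned straight into a hash. In this sample ER is not
# captured because no text follows it.
my %t = $all =~ /(\w{2})\s+((?:.+\n)(?:\s{2}.+\n)*)/g;

# Turn each block into an array ref of lines; a for modifier is the usual
# idiom for side effects, rather than grep in void context.
$t{$_} = [ split /\n\s*/, $t{$_} ] for keys %t;

print "$_: @{ $t{$_} }\n" for sort keys %t;
```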
Re: Parsing a Tagged File Format
by Anonymous Monk on Apr 30, 2003 at 11:00 UTC
    What is
    T2 => (Line2,Line3,Line4)
    It's certainly not perl :)
      Pseudo-code that quite clearly represents a mapping between a key and a list of values. Hope this is clear.
      ____________
      Arun
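For reference, one way to write that pseudo-code as actual Perl is a hash whose values are array references; the variable names are illustrative. The second assignment shows why a parenthesised list on its own does not work: lists flatten inside a hash assignment.

```perl
use strict;
use warnings;

# The pseudo-code "T2 => (Line2,Line3,Line4)" written as real Perl:
# a hash whose values are references to arrays.
my %lines_for = (
    T1 => [ 'Line1' ],
    T2 => [ 'Line2', 'Line3', 'Line4' ],
    T3 => [ 'Line5', 'Line6' ],
);

# A plain list in parentheses is flattened instead:
# (T2 => ('Line2', 'Line3', 'Line4')) is the same as
# (T2 => 'Line2', 'Line3' => 'Line4').
my %flattened = (T2 => ('Line2', 'Line3', 'Line4'));

# Dereference to get the lines for one tag, or walk all tags in order.
my @t2 = @{ $lines_for{T2} };
for my $tag (sort keys %lines_for) {
    print "$tag: ", join(', ', @{ $lines_for{$tag} }), "\n";
}
```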

Node Type: perlquestion [id://254232]
Approved by broquaint