Parsing a Tagged File Format

by arunhorne (Pilgrim)
on Apr 30, 2003 at 10:28 UTC ( [id://254232]=perlquestion )

arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have a text file in the format below:

T1 Line1
T2 Line2
   Line3
   Line4
T3 Line5
   Line6
ER

Basically it is a tagged file and I need to load it into a hash. As you can see, it potentially has more than one line per tag. I want to create a hash that maps each tag (tags are unique) to an array of the lines associated with that tag, i.e.

T2 => (Line2,Line3,Line4)

Tag names are always two letters, but they may be arbitrary and therefore cannot be hardcoded. However, the file is always terminated by a tag named ER.

Can anyone provide me with code to translate this text into a hash that meets my requirements? I just can't seem to handle the multiline case.

Thanks in advance

____________
Arun

Replies are listed 'Best First'.
Re: Parsing a Tagged File Format
by merlyn (Sage) on Apr 30, 2003 at 10:38 UTC
    Untested, but I usually get this stuff right... {grin}

    my $tag = "Invalid";
    my %results;
    while (<>) {
      if (/^ER/) {
        last;
      } elsif (/^([A-Z]{2})\s+(.*)/) { # might need adjustment
        $tag = $1;
        push @{$results{$tag}}, $2;
      } elsif (/^\s+(.*)/) { # might need adjustment
        push @{$results{$tag}}, $1;
      } else {
        die "I don't understand $_";
      }
    }

    and now @{$results{T2}} is qw(Line2 Line3 Line4).

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
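
A minimal, self-contained sketch of the same loop, for anyone who wants to run it against the sample data. The sub name `parse_tagged` and the in-memory filehandle are illustrative assumptions, not part of the reply above. The tag pattern is relaxed to `\w{2}` here, because the sample tags (T1, T2, T3) contain a digit and would not match `[A-Z]{2}`; that is one possible reading of "might need adjustment".

```perl
use strict;
use warnings;

# Sample input from the question. Continuation lines are assumed
# to start with whitespace.
my $text = <<'TXT';
T1 Line1
T2 Line2
   Line3
   Line4
T3 Line5
   Line6
ER
TXT

# parse_tagged is a hypothetical helper name, not from the thread.
sub parse_tagged {
    my ($fh) = @_;
    my (%results, $tag);
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^ER\s*$/;            # terminator tag ends the file
        if ($line =~ /^(\w{2})\s+(.*)/) {      # new tag plus its first line
            $tag = $1;
            push @{ $results{$tag} }, $2;
        }
        elsif ($line =~ /^\s+(.*)/) {          # continuation of the current tag
            die "Continuation line before any tag: $line\n" unless defined $tag;
            push @{ $results{$tag} }, $1;
        }
        else {
            die "I don't understand: $line\n";
        }
    }
    return \%results;
}

# Read from the string as if it were a file.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $results = parse_tagged($fh);
close $fh;

for my $t (sort keys %$results) {
    print "$t => (", join(',', @{ $results->{$t} }), ")\n";
}
```

With the sample data this prints one line per tag, including `T2 => (Line2,Line3,Line4)`.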

Re: Parsing a Tagged File Format
by broquaint (Abbot) on Apr 30, 2003 at 10:40 UTC
    Maybe not exactly what you want, but this should get you started

    use Data::Dumper;

    my $data = <<TXT;
    T1 Line1
    T2 Line2
       Line3
       Line4
    T3 Line5
       Line6
    ER
    TXT

    my(%h, $k);
    for(split /\n/, $data) {
      last if /^ER\z/;
      /^(T\d+)/ and $k = $1;
      push @{ $h{$k} }, m< (?: T\d+ )? \s+ (.*) >x;
    }

    print Dumper(\%h);

    __output__

    $VAR1 = {
              'T1' => [
                        'Line1'
                      ],
              'T2' => [
                        'Line2',
                        'Line3',
                        'Line4'
                      ],
              'T3' => [
                        'Line5',
                        'Line6'
                      ]
            };

    You could probably compact that process into a regex and a map(), but that for and split() should be easier to maintain :)
    HTH

    _________
    broquaint
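
For comparison, here is one way the "regex and a map()" variant mentioned above might look. This is only a sketch: it assumes that continuation lines begin with a space or tab, that tags are two word characters, and that the ER line has nothing after the tag. The variable names are illustrative.

```perl
use strict;
use warnings;

my $data = <<'TXT';
T1 Line1
T2 Line2
   Line3
   Line4
T3 Line5
   Line6
ER
TXT

# The outer match returns one string per record: a tag line followed by
# any indented continuation lines. The bare ER line has no text after the
# tag, so it never matches and is skipped.
my %h = map {
    # Split the record into its tag and the remaining text (/s lets .* span lines).
    my ($tag, $body) = /^(\w{2})[ \t]+(.*)/s;
    # Break the body on newline plus indentation; split drops the trailing empty field.
    ( $tag => [ split /\n\s*/, $body ] );
} $data =~ /^(\w{2}[ \t]+.*\n(?:[ \t]+.*\n)*)/mg;

print "$_ => (@{ $h{$_} })\n" for sort keys %h;
```

It is more compact than the for/split loop, but a malformed line is silently ignored rather than reported, which is one reason the explicit loop may be easier to maintain.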

Re: Parsing a Tagged File Format
by Anonymous Monk on Apr 30, 2003 at 10:41 UTC
    #! perl -slw
    use strict;
    use Data::Dumper;

    my %data;
    my $current_key;

    while( <DATA> ) {
        my ($key, $value) = m[(^[\w]{2})?\s+(.*$)];
        if( defined $key ) {
            last if $key eq 'ER';
            push @{ $data{$key} }, $value;
            $current_key = $key;
        }
        else {
            push @{ $data{$current_key} }, $value;
        }
    }

    print Dumper \%data

    __DATA__
    T1 Line1
    T2 Line2
       Line3
       Line4
    T3 Line5
       Line6
    ER

      Basically the same answer, but a little more golfed:

      use strict;
      use Data::Dumper;

      my %data;
      my $current_key;

      while( <DATA> ) {
          my ($key, $value) = m[^((?:\w{2})?)\s+(.*)]
              or die "Bad line: $_";
          last if $key eq 'ER';
          $current_key = $key || $current_key;
          die "No key defined" unless $current_key;
          push @{ $data{$current_key} }, $value;
      }

      print Dumper \%data

      __DATA__
      T1 Line1
      T2 Line2
         Line3
         Line4
      T3 Line5
         Line6
      ER

      Updated. I should test these things first :-) (It's silly that (\w{2}?) doesn't DWIM, and see Aristotle's answer further down for a very similar approach. Oh well.)

Re: Parsing a Tagged File Format
by Aristotle (Chancellor) on Apr 30, 2003 at 14:23 UTC
    my $cur_tag;
    my %lines_for;

    while(<>) {
        chomp;
        my ($tag, $line) = split /\s+/, $_, 2;
        $cur_tag = $tag || $cur_tag
            or die "First input line has no tag";
        last if $cur_tag eq 'ER';
        push @{$lines_for{$cur_tag}}, $line;
    }

    Makeshifts last the longest.
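
The loop above relies on how `split` treats leading whitespace when given a `/\s+/` pattern and a limit of 2. A small sketch of that behaviour on three representative lines (the variable names are illustrative only):

```perl
use strict;
use warnings;

my @tagged = split /\s+/, "T2 Line2", 2;   # tag line
my @cont   = split /\s+/, "   Line3", 2;   # continuation line with leading whitespace
my @end    = split /\s+/, "ER",       2;   # terminator line, nothing after the tag

print "tagged: [", join('|', @tagged), "]\n";   # tagged: [T2|Line2]
print "cont:   [", join('|', @cont),   "]\n";   # cont:   [|Line3]
print "end:    [", join('|', @end),    "]\n";   # end:    [ER]
```

The continuation line yields an empty (false) first field, so `$tag || $cur_tag` falls back to the previous tag. Note that the special pattern `split ' '` would discard the leading whitespace instead and break this trick.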

Re: Parsing a Tagged File Format
by hossman (Prior) on May 01, 2003 at 01:00 UTC
    If you want a compact (and illegible) solution...

    #!/usr/local/bin/perl -w

    use strict;
    use Data::Dumper;

    undef $/;
    my %t = <DATA> =~ /(\w{2})\s+((?:.+\n)(?:\s{2}.+\n)*)/g;
    grep { $t{$_} = [split /\n\s*/, $t{$_}]; } keys %t;
    print Dumper(\%t);

    __DATA__
    T1 Line1
    T2 Line2
       Line3
       Line4
    T3 Line5
       Line6
    ER

Re: Parsing a Tagged File Format
by Anonymous Monk on Apr 30, 2003 at 11:00 UTC
    What is
    T2 => (Line2,Line3,Line5)
    It's certainly not perl :)
      Pseudo-code that quite clearly represents a mapping between a key and a list of values. Hope this is clear.

      ____________
      Arun
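
For reference, the pseudo-code mapping could be written in actual Perl roughly as below. Hash values must be scalars, so each list is stored as an array reference. The name `%lines_for` is borrowed from Aristotle's reply; the literal data is just the sample from the question.

```perl
use strict;
use warnings;

# A hash of array references: each tag maps to its list of lines.
my %lines_for = (
    T1 => ['Line1'],
    T2 => ['Line2', 'Line3', 'Line4'],
    T3 => ['Line5', 'Line6'],
);

# Dereference to get the whole list for one tag.
my @t2 = @{ $lines_for{T2} };
print scalar(@t2), " lines for T2: @t2\n";

# Or index straight into the referenced array.
print "Second line of T2: $lines_for{T2}[1]\n";
```

This is the structure that the parsing loops in the replies build up with `push @{ $hash{$tag} }, $line`.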

Node Type: perlquestion [id://254232]
Approved by broquaint