http://www.perlmonks.org?node_id=626045


in reply to Re: Looking for Perl Elegance!
in thread Looking for Perl Elegance!

Well the input will stream in line by line and will look something like this:


asdf
asdfg=4eafvasdfadsf
ashfasdf
asdf qer qwer
asd as dsasdi weeiwer dfhjTITLE#How are you#asdfads
asdfa

asdg
rt
wqrqw
re

DATA-A#item1#asdfdasfdasdasDATA-B#item2#
asdfda
dasfa


asdfdas
DATA-C#item3#
aasdfDATA-A#item1a#DATA-B#item2b#

asd
asdf
asdDATA-C#item3c#
asdf asdf3132 adsf TITLE#I am fine#ads fadsfdasfdas

They do come in order, but they may not have a title. I need to store each item as it comes, and I won't know when I am done until I get to the end of the stream. There will always be a DATA-A, a DATA-B and a DATA-C, which will be associated with the last TITLE, so I can do this:
if( $line =~ /TITLE/) {
    $line =~ s/TITLE#(.*)#/$1/;
    chomp($1);
} else if ( $line =~ /DATA-A/) {
    $line =~ s/DATA-A#(.*)#/$1/;
    chomp($1);
}
etc., but that doesn't strike me as an elegant way to do this. The output would just go to the screen with a print statement. I was looking at using perlform for the output. To do that, I will need to get all the data first, then figure out the max length of each item, like:
max length of 'how are you', 'i am fine'
max length of 'item1', 'item1a', 'item4'
max length of 'item2', 'item2b', 'item5'
max length of 'item3', 'item3c', 'item6'
then use that to format the output. I can't get a good example on the forum here without using html which might be part of the confusion. No CSV, no sorting...just display it on the screen using a print statement but make it pretty like a table.
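
The max-length idea above could be sketched roughly as follows. This is only an illustration using printf rather than perlform; the @rows layout (one array ref per record, in TITLE, DATA-A, DATA-B, DATA-C order) is an assumption about how the parsed data might be stored, not something fixed by the post.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical rows, as they might look once the whole stream is parsed:
# one array ref per record, in the order TITLE, DATA-A, DATA-B, DATA-C.
my @headers = ( 'TITLE', 'DATA-A', 'DATA-B', 'DATA-C' );
my @rows    = (
    [ 'How are you', 'item1',  'item2',  'item3'  ],
    [ 'I am fine',   'item1a', 'item2b', 'item3c' ],
);

# Width of each column = longest cell in that column, header included.
my @widths = map {
    my $col = $_;
    max map { length $_->[$col] } \@headers, @rows;
} 0 .. $#headers;

# Build one left-justified printf format from the widths.
my $format = join( '  ', map { sprintf '%%-%ds', $_ } @widths ) . "\n";

printf $format, @headers;
printf $format, @$_ for @rows;
```

Because the widths are only known once every row has been seen, the printing has to wait until the end of the stream, which matches the constraint described above.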
I hope that answers the questions.
THANKS for the help on this.

Re^3: Looking for Perl Elegance!
by TGI (Parson) on Jul 11, 2007 at 18:32 UTC

    For your table formatting, take a look at Text::Table.

    Here's a quick take at some code to extract the data from your files. Take a close look at your regexes: they don't need to be substitutions, and the greedy '.*' will give you wrong values on lines with multiple tags.

    # Assume a FIELD#...# string can't be split across lines.
    use strict;
    use warnings;
    use Data::Dumper;

    my @DATA_FIELDS = (
        'TITLE',
        'DATA-A',
        'DATA-B',
        'DATA-C',
    );

    # Build a regex that matches all fields and extracts a value.
    my $all_fields = join '|', @DATA_FIELDS;
    my $DATA_REGEX = qr/($all_fields)#([^#]*)#/;

    my @data;               # store all tag data
    my $title_data = {};    # reference to current title's data store.

    while (<DATA>) {
        # the /g modifier lets the inner loop visit every tag on the line
        while ( /$DATA_REGEX/g ) {
            my $field = $1;
            my $value = $2;

            print "$_ -> $field, $value\n";

            if( $field eq 'TITLE' ) {
                # store previous title data set if not empty.
                push @data, $title_data if %$title_data;

                # start new title data set
                $title_data = { TITLE => $value };
            }
            else {
                # store field data in current title data set.
                push @{ $title_data->{$field} }, $value;
            }
        }
    }

    # store final title data set
    push @data, $title_data if %$title_data;

    print Dumper \@data;

    __DATA__
    asdf
    asdfg=4eafvasdfadsf
    ashfasdf
    asdf qer qwer
    asd as dsasdi weeiwer dfhjTITLE#How are you#asdfads
    asdfa
    asdg
    rt
    wqrqw
    re
    DATA-A#item1#asdfdasfdasdasDATA-B#item2#
    asdfda
    dasfa
    asdfdas
    DATA-C#item3#
    aasdfDATA-A#item1a#DATA-B#item2b#
    asd
    asdf
    asdDATA-C#item3c#
    asdf asdf3132 adsf TITLE#I am fine#ads fadsfdasfdas


    TGI says moo

Re^3: Looking for Perl Elegance!
by dsheroh (Monsignor) on Jul 11, 2007 at 17:59 UTC
    I'm not sure what to say to answer your actual questions, but I suspect that
    $line =~ s/DATA-A#(.*)#/$1/;
    may not do what you actually want. It removes "DATA-A#" and the final "#" from the line. So, for instance, given the line
    aasdfDATA-A#item1a#DATA-B#item2b#
    
    from your sample data, it would produce
    aasdfitem1a#DATA-B#item2b
    
    Assuming you're trying to extract the text "item1a" from the line, the regex you want is
    $line =~ /DATA-A#([^#]*)#/;
    which extracts "item1a" into $1 (without destroying the rest of the line, so the DATA-B will still be there to collect later). Using [^#]* instead of .* will cause it to stop capturing at the first # it sees instead of continuing to the last one.
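
    A small side-by-side check of the two patterns, run against the same sample line, may make the difference concrete. The variable names $greedy and $bounded are just for illustration.

```perl
use strict;
use warnings;

my $line = 'aasdfDATA-A#item1a#DATA-B#item2b#';

# Greedy: .* runs as far as it can, so the capture ends at the last '#'.
my ($greedy) = $line =~ /DATA-A#(.*)#/;

# Negated class: [^#]* cannot cross a '#', so the capture ends at the first one.
my ($bounded) = $line =~ /DATA-A#([^#]*)#/;

print "greedy:  $greedy\n";     # item1a#DATA-B#item2b
print "bounded: $bounded\n";    # item1a
```

    Neither match modifies $line, so the DATA-B tag is still available for a later match.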

    I suppose something like

    my %data = ();
    while ($line =~ /(TITLE|DATA-[ABC])#([^#]*)#/g) {
        $data{$1} = $2;
        # Keys containing a hyphen must be quoted; a bare DATA-A parses as DATA minus A.
        handle_data($data{TITLE}, $data{'DATA-A'}, $data{'DATA-B'}, $data{'DATA-C'})
            if $1 eq 'DATA-C';
    }
    might be what you're looking for here, but I'm not entirely sure. Note the assumption that you identify the end of a data set by the appearance of a DATA-C element. handle_data would then either print the data, store it for later formatting, or do whatever else may need to be done with it. Other initialization and/or sanity checking is probably needed unless your input stream is known to be perfect and will never, say, send a DATA-C before all of the other elements have appeared.

    update: Added a forgotten ) in the last code fragment. This code is (obviously) untested. If it breaks, you get to keep both pieces.
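
    For anyone who wants to try the idea end to end, here is one way the fragment might be fleshed out into a runnable script. It is a sketch under assumptions: the @lines array stands in for the real input stream, and this handle_data is a hypothetical handler that simply stores each record for later formatting.

```perl
use strict;
use warnings;

# A few lines from the sample data; a real program would read the stream instead.
my @lines = (
    'asd as dsasdi weeiwer dfhjTITLE#How are you#asdfads',
    'DATA-A#item1#asdfdasfdasdasDATA-B#item2#',
    'DATA-C#item3#',
    'aasdfDATA-A#item1a#DATA-B#item2b#',
    'asdDATA-C#item3c#',
);

my @records;    # completed data sets, in arrival order
my %data;       # declared outside the loop so fields survive across lines

# Hypothetical handler: stores a copy of the record for later formatting.
sub handle_data {
    my ( $title, $data_a, $data_b, $data_c ) = @_;
    $title = '' unless defined $title;    # records may arrive before any TITLE
    push @records, [ $title, $data_a, $data_b, $data_c ];
}

for my $line (@lines) {
    while ( $line =~ /(TITLE|DATA-[ABC])#([^#]*)#/g ) {
        $data{$1} = $2;

        # A DATA-C element is assumed to close the current data set.
        handle_data( @data{ 'TITLE', 'DATA-A', 'DATA-B', 'DATA-C' } )
            if $1 eq 'DATA-C';
    }
}

print join( ' | ', @$_ ), "\n" for @records;
```

    Since %data is never cleared, each record keeps the most recent TITLE, which matches the requirement that items belong to the last title seen. It also means a stale DATA-A or DATA-B would be reused if the stream ever skipped one.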

Re^3: Looking for Perl Elegance!
by Ploux (Acolyte) on Jul 11, 2007 at 19:45 UTC
    You should use 'elsif' and not 'else if'.