http://www.perlmonks.org?node_id=1058795

joeymac has asked for the wisdom of the Perl Monks concerning the following question:

Hello Magnanimous Monks,

I am attempting to rewrite some Bash scripts that use a lot of awk pipes into Perl. I am looking for the best way to read in information from a repetitive text file. The text file looks something like this:

    Some Field : some value
    Another Field: 1234
    Different One: 5678
    Yet Another : foo
    .
    .
    .

etc...

And then the "pattern" repeats (many times) in the text file. The end result is to print it all out in neat columns with headers at the top, something like:

    something    number1  number2  word
    some value   1234     5678     foo
    some else    4321     8765     bar

I started out putting the whole text into an array and grepping out what I want, but that was quickly becoming very messy. My second attempt is to extract multiple lines using this method:

    while (<>) {
        if ( /Some Field/ ... /Yet Another/ ) {
            push(@array, $_);
            # do something with the info...
        }
    }

I found myself wanting to then put it into an array as above, but that will most likely end up in the same messy situation.

I would greatly appreciate any ideas or suggestions about an easy way to extract the bits of info that I want and print them out in the desired column format (using sprintf, I reckon). Also, as a side note, the Perl will need to be somewhat portable (i.e. it'll run under Windows, Linux, and OS X).

Thanks in advance!

Re: Extracting and manipulating a range of lines
by GrandFather (Saint) on Oct 18, 2013 at 22:49 UTC

    If you just want to print the table you don't need to store anything. If you need to manipulate the data (including using string lengths of the data to format the table nicely) then use an array of hashes.
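A minimal sketch of the array-of-hashes route, assuming each record begins with a "Some Field" line and every data line looks like "label : value"; parse_records and format_table are hypothetical helper names, not code from this thread. Each column is sized to its longest header or value, which is the string-length-based formatting that storing the data makes possible.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # for the // operator

# Hypothetical helper: parse "label : value" lines into an array of hashes,
# starting a new hash at every "Some Field" line.
sub parse_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # strip the line ending (LF or CRLF) and trailing blanks
        my ($label, $value) = $line =~ /^([^:]+?)\s*:\s*(.*)$/
            or next;           # skip lines without a colon
        push @records, {} if $label eq 'Some Field';
        next unless @records;  # ignore anything before the first record
        $records[-1]{$label} = $value;
    }
    return @records;
}

# Hypothetical helper: render a table whose column widths come from the
# longest header or value in each column (all columns left-aligned).
sub format_table {
    my ($cols, $heads, $records) = @_;
    my @width = map { length } @$heads;
    for my $rec (@$records) {
        for my $i (0 .. $#$cols) {
            my $len = length($rec->{ $cols->[$i] } // '');
            $width[$i] = $len if $len > $width[$i];
        }
    }
    my $fmt = join(' ', map { "%-${_}s" } @width) . "\n";
    my $out = sprintf $fmt, @$heads;
    for my $rec (@$records) {
        $out .= sprintf $fmt, map { $_ // '' } @{$rec}{@$cols};
    }
    return $out;
}

my $text = <<'END';
Some Field : some value
Another Field: 1234
Different One: 5678
Yet Another : foo
line of uninteresting stuff
Some Field : some else
Another Field: 4321
Different One: 8765
Yet Another : bar
END

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @records = parse_records($fh);
close $fh;

my $table = format_table(
    ['Some Field', 'Another Field', 'Different One', 'Yet Another'],
    ['something', 'number1', 'number2', 'word'],
    \@records,
);
print $table;
```

Stripping trailing whitespace instead of plain chomp is one small concession to the portability requirement: a file with Windows line endings read on Linux or OS X still parses cleanly.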

    In any case, a neat way to manage the file is to use the first field name as the line end string so you can read an entire record at a hit:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use 5.010;

        my $startField = "Some Field :";
        my $format = "%-15s %8s %8s %6s\n";

        printf $format, 'something', 'number1', 'number2', 'word';

        local $/ = $startField;
        while (defined (my $rec = <DATA>)) {
            next if $rec eq $startField;    # Skip empty first record

            my @fields = split "\n", $rec;
            my @values;

            $fields[0] = $startField . $fields[0];    # Restore field label
            for my $field (@fields[0 .. 3]) {
                my (undef, $value) = split ': ', $field, 2;
                push @values, $value;
            }

            printf $format, @values;
        }

        __DATA__
        Some Field : some value
        Another Field: 1234
        Different One: 5678
        Yet Another : foo
        line of uninteresting stuff
        Some Field : some else
        Another Field: 4321
        Different One: 8765
        Yet Another : bar
        more junk
        Some Field : another value
        Another Field: 1122
        Different One: 5566
        Yet Another : baz

    Prints:

        something        number1  number2   word
        some value          1234     5678    foo
        some else           4321     8765    bar
        another value       1122     5566    baz

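    Note the local $/ = $startField; line: it makes the first field label the input record separator, so each <DATA> read returns a whole record instead of a single line.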
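To see what setting $/ does in isolation, here is a small sketch, assuming a short in-memory string stands in for the file; read_records is a hypothetical helper name. With $/ set to a label, each readline returns everything up to and including the next occurrence of that label.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read all records from $fh using $sep as the
# input record separator instead of "\n".
sub read_records {
    my ($fh, $sep) = @_;
    local $/ = $sep;    # the previous $/ comes back when the sub returns
    my @recs;
    while (defined(my $rec = <$fh>)) {
        push @recs, $rec;
    }
    return @recs;
}

my $text = "Some Field : a\nX: 1\nSome Field : b\nX: 2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @recs = read_records($fh, 'Some Field :');
close $fh;

for my $i (0 .. $#recs) {
    (my $shown = $recs[$i]) =~ s/\n/\\n/g;    # make newlines visible
    print "record $i: [$shown]\n";
}
```

The first record is only the separator itself (the data starts with it), which is why the loop above skips a record equal to $startField; every later record ends with the label that begins the next one and starts just after its own label, hence the "Restore field label" step.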

    True laziness is hard work
      Elegant. Simply elegant.
Re: Extracting and manipulating a range of lines
by LanX (Saint) on Oct 18, 2013 at 19:11 UTC
    I think the data structure you want is an array of hashes.

    Parse every record into a hash {Some Field => some value, Another Field => 1234, ...} and push that onto an array.

    Your approach is already good; you just need to parse each line into the record's hash inside the flip-flop and push the hash onto the array outside the flip-flop (in the else branch).

    After parsing you just need to iterate over the array and to print the data you wanted.¹
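A minimal sketch of the array-of-hashes idea built on the flip-flop from the question, assuming each record runs from a "Some Field" line to a "Yet Another" line; parse_with_flipflop is a hypothetical name. One deliberate change from the else-branch suggestion: the record is pushed when the range operator reports its last line (perlop documents that the final sequence number has "E0" appended), so records that follow each other with no junk line in between are still kept apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: gather each "Some Field" ... "Yet Another" range into a
# hash and return the hashes in file order (an array of hashes).
sub parse_with_flipflop {
    my ($fh) = @_;
    my (@records, %current);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # strip the line ending (LF or CRLF)
        # In scalar context ... yields "" outside the range, a sequence number
        # inside it, and a number ending in "E0" on the range's last line.
        my $seq = ($line =~ /^Some Field/ ... $line =~ /^Yet Another/);
        next unless $seq;      # not inside a record
        my ($label, $value) = split /\s*:\s*/, $line, 2;
        $current{$label} = $value;
        if ($seq =~ /E0\z/) {  # "Yet Another" seen: the record is complete
            push @records, {%current};
            %current = ();
        }
    }
    return @records;
}

my $text = <<'END';
Some Field : some value
Another Field: 1234
Different One: 5678
Yet Another : foo
Some Field : some else
Another Field: 4321
Different One: 8765
Yet Another : bar
more junk
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @records = parse_with_flipflop($fh);
close $fh;

my $fmt = "%-15s %8s %8s %6s\n";
printf $fmt, qw(something number1 number2 word);
printf $fmt, @{$_}{'Some Field', 'Another Field', 'Different One', 'Yet Another'}
    for @records;
```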

    HTH! =)

    Cheers Rolf

    (addicted to the Perl Programming Language)

    ¹) Please note: you can also use an AoA (array of arrays) if your field info is very regular.
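A short sketch of the AoA variant, assuming every record lists the same fields in the same order, so position alone identifies the column; parse_aoa is a hypothetical name. Lines without a colon, such as the junk lines in GrandFather's sample, are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one inner array per record, holding the values in the
# order the fields appear. Assumes a fixed field order in every record.
sub parse_aoa {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # strip the line ending (LF or CRLF)
        my ($label, $value) = $line =~ /^([^:]+?)\s*:\s*(.*)$/
            or next;           # skip lines without a colon
        push @rows, [] if $label eq 'Some Field';
        push @{ $rows[-1] }, $value if @rows;
    }
    return @rows;
}

my $text = <<'END';
Some Field : some value
Another Field: 1234
Different One: 5678
Yet Another : foo
more junk
Some Field : some else
Another Field: 4321
Different One: 8765
Yet Another : bar
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @rows = parse_aoa($fh);
close $fh;

# Header row first, then one printed row per record.
printf "%-15s %8s %8s %6s\n", @$_ for [qw(something number1 number2 word)], @rows;
```

The trade-off against a hash per record is that nothing checks the labels after the first one, so a record with a missing or reordered field silently shifts its values into the wrong columns.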

Re: Extracting and manipulating a range of lines
by marinersk (Priest) on Oct 18, 2013 at 21:11 UTC
    At first glance, I'm with LanX on this one (more or less).

    If you wish to preserve the order found in the source file, put the hashes inside an array. If you don't care about order, or want them sorted, I'd put the arrays inside a hash (a hash of arrays keyed by field name).

        my %inpinf = ();
        while (my $inpbuf = <INPFIL>) {
            chomp $inpbuf;
            my ($inpkey, $inpval) = split /\s*\:\s*/, $inpbuf, 2;
            push @{$inpinf{$inpkey}}, $inpval;
        }

        C:\Steve\Dev\PerlMonks\P-2013-10-18@1437-DataMerge>type test2.dat
        Some Field : some value
        Another Field: 1234
        Different One: 5678
        Yet Another : foo
        Another Field: 9012
        Different One: 3456
        Yet Another : bar

        C:\Steve\Dev\PerlMonks\P-2013-10-18@1437-DataMerge>datamerge.pl test2.dat
        Processing "test2.dat"
        %inpinf (C:\Steve\Perl/debug.pm:887(990)):
            [Another Field] => [ARRAY(0x4a8498)]
                [1234]
                [9012]
            [Different One] => [ARRAY(0x4bd6a8)]
                [5678]
                [3456]
            [Some Field] => [ARRAY(0x4a83f0)]
                [some value]
            [Yet Another] => [ARRAY(0x4bd768)]
                [foo]
                [bar]

    Full code (apologies for using my home-spun debug module rather than Data::Dumper, but I haven't switched over to Data::Dumper yet and wanted a fast way to show you the resulting data structure.)
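One way to get from that hash of arrays to the column layout the question asks for is to walk the arrays in parallel by index. This is a sketch, not marinersk's full program: parse_hoa mirrors the loop above with a lexical filehandle instead of INPFIL, hoa_rows is a hypothetical helper, and the sample text is the test2.dat shown above. It also exposes a caveat: index i in each array is the i-th occurrence of that label, not the i-th record, so rows only line up when every record has every field. Here the missing second "Some Field" shows up as '-' only because it belongs to the last record.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # for the // operator

# Same idea as marinersk's loop, with a lexical filehandle: label => [values].
sub parse_hoa {
    my ($fh) = @_;
    my %inpinf;
    while (my $inpbuf = <$fh>) {
        $inpbuf =~ s/\s+\z//;    # like chomp, but also removes a trailing \r
        my ($inpkey, $inpval) = split /\s*:\s*/, $inpbuf, 2;
        next unless defined $inpval;    # skip blank lines and lines without a colon
        push @{ $inpinf{$inpkey} }, $inpval;
    }
    return \%inpinf;
}

# Hypothetical helper: read the arrays in parallel, one output row per index.
# A column that has run out of values is shown as '-'.
sub hoa_rows {
    my ($hoa, @cols) = @_;
    my $max = 0;
    for my $col (@cols) {
        my $n = @{ $hoa->{$col} || [] };
        $max = $n if $n > $max;
    }
    my @rows;
    for my $i (0 .. $max - 1) {
        push @rows, [ map { $hoa->{$_}[$i] // '-' } @cols ];
    }
    return @rows;
}

my $text = <<'END';
Some Field : some value
Another Field: 1234
Different One: 5678
Yet Another : foo
Another Field: 9012
Different One: 3456
Yet Another : bar
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $hoa = parse_hoa($fh);
close $fh;

my @cols = ('Some Field', 'Another Field', 'Different One', 'Yet Another');
printf "%-15s %8s %8s %6s\n", @$_
    for [qw(something number1 number2 word)], hoa_rows($hoa, @cols);
```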

Re: Extracting and manipulating a range of lines
by Lennotoecom (Pilgrim) on Oct 20, 2013 at 21:17 UTC
    Another solution:

        while ($l = <DATA>) {
            $_++ if $l =~ m/Some Field/;    # $_ counts records: each "Some Field" starts a new one
            $a[$_]{$1} = $2 if $l =~ m/^(\w+\s\w+)\s*\:\s(\w+\s\w+|\w+)$/;
            $a[0]{$1} = $1;                 # caption row: each label maps to itself
        }
        foreach $a (@a) {
            # the sort is meant to move "Some Field" to the front;
            # the remaining columns are not explicitly ordered
            printf "%-15s %-15s %-15s %-15s\n",
                map { ${$a}{$_} } sort { $b =~ m/S/ } keys %$a;
        }
        __DATA__
        Some Field : some value
        Another Field: 1234
        Different One: 5678
        Yet Another : foo
        line of uninteresting stuff
        Some Field : some else
        Another Field: 4321
        Different One: 8765
        Yet Another : bar
        more junk
        Some Field : another value
        Another Field: 1122
        Different One: 5566
        Yet Another : baz

    It would've been easier not to print the caption line.