Extracting and manipulating a range of lines

joeymac has asked for the wisdom of the Perl Monks concerning the following question:

Hello Magnanimous Monks,

I am attempting to rewrite some Bash scripts that use a lot of awk pipes into Perl. I am looking for the best way to read in information from a repetitive text file. The text file looks something like this:

Some Field   : some value
Another Field: 1234
Different One: 5678
Yet Another  : foo
.
.
.

etc...

And then the "pattern" repeats (many times) in the text file. The end result is to print it all out in neat columns with headers at the top, something like:

something   number1   number2   word
some value    1234    5678       foo
some else     4321    8765       bar

I started out putting the whole text into an array and grepping out what I wanted, but that was quickly becoming very messy. My second attempt tries to extract multiple lines using this method:

while(<>){
    if ( /Some Field/ ... /Yet Another/ ) {
        push (@array, $_);
        #do something with the info...
    }
}

I found myself wanting to then put it into an array as above, but that will most likely end up being the same messy situation.

I would greatly appreciate any ideas or suggestions about an easy way to extract the bits of info that I want and print them out in the desired column format (using sprintf, I reckon). Also, as a side note, the Perl will need to be somewhat portable (i.e. it'll run under Windows, Linux, and OS X).

Thanks in advance!


Replies are listed 'Best First'.
Re: Extracting and manipulating a range of lines by GrandFather (Saint) on Oct 18, 2013 at 22:49 UTC
If you just want to print the table you don't need to store anything. If you need to manipulate the data (including using string lengths of the data to format the table nicely) then use an array of hashes. In any case, a neat way to manage the file is to use the first field name as the line end string so you can read an entire record at a hit:

#!/usr/bin/perl
use warnings;
use strict;
use 5.010;

my $startField = "Some Field   :";
my $format     = "%-15s %8s %8s %6s\n";

printf $format, 'something', 'number1', 'number2', 'word';
local $/ = $startField;

while (defined (my $rec = <DATA>)) {
    next if $rec eq $startField;    # Skip empty first record

    my @fields = split "\n", $rec;
    my @values;

    $fields[0] = $startField . $fields[0];    # Restore field label

    for my $field (@fields[0 .. 3]) {
        my (undef, $value) = split ': ', $field, 2;
        push @values, $value;
    }

    printf $format, @values;
}

__DATA__
Some Field   : some value
Another Field: 1234
Different One: 5678
Yet Another  : foo
line of uninteresting stuff
Some Field   : some else
Another Field: 4321
Different One: 8765
Yet Another  : bar
more junk
Some Field   : another value
Another Field: 1122
Different One: 5566
Yet Another  : baz

Prints:

something        number1  number2   word
some value          1234     5678    foo
some else           4321     8765    bar
another value       1122     5566    baz

Note the `local $/ = $startField;` line.

True laziness is hard work
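For the other case mentioned in this reply (storing the records so the table can be sized from the data), here is a minimal sketch of the array-of-hashes idea. It is an illustration, not GrandFather's code: the sample text, the `parse_records` and `column_widths` helpers, and the two-space column gap are all assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample input; the labels follow the layout in the question.
my $text = <<'END';
Some Field   : some value
Another Field: 1234
Different One: 5678
Yet Another  : foo
line of uninteresting stuff
Some Field   : some else
Another Field: 4321
Different One: 8765
Yet Another  : bar
END

my @labels  = ('Some Field', 'Another Field', 'Different One', 'Yet Another');
my @headers = ('something', 'number1', 'number2', 'word');

# Build one hash per record; a new record starts at each "Some Field" line.
sub parse_records {
    my ($input) = @_;
    my @records;
    for my $line (split /\n/, $input) {
        # Lines without a "label : value" shape are skipped.
        my ($label, $value) = $line =~ /^([^:]+?)\s*:\s*(.*?)\s*$/ or next;
        push @records, {} if $label eq $labels[0];
        $records[-1]{$label} = $value if @records;
    }
    return @records;
}

# Width of each column: the longest of its header and every value in it.
sub column_widths {
    my @records = @_;
    return map {
        my $i = $_;
        max(length($headers[$i]), map { length $_->{ $labels[$i] } } @records);
    } 0 .. $#labels;
}

my @records = parse_records($text);
my @widths  = column_widths(@records);
my $format  = join('  ', map { "%-${_}s" } @widths) . "\n";

printf $format, @headers;
printf $format, @{$_}{@labels} for @records;
```

Because the records are kept in an array, the source order is preserved, and the widths can be computed before anything is printed. The sketch assumes every record contains all four fields.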
Re^2: Extracting and manipulating a range of lines by marinersk (Priest) on Oct 19, 2013 at 07:50 UTC
Elegant. Simply elegant.
Re: Extracting and manipulating a range of lines by LanX (Saint) on Oct 18, 2013 at 19:11 UTC
I think the data structure you want is an array of hashes.

Parse every record into a hash `{ 'Some Field' => 'some value', 'Another Field' => 1234, ... }` and push that into an array.

Your approach is already good; you just need to parse every line into a hash within the flip-flop and push the hashes into an array outside the flip-flop (in an else-branch).

After parsing, you just need to iterate over the array and print the data you wanted.¹

HTH! =)

Cheers Rolf

( addicted to the Perl Programming Language)

¹) Please note: you can also use an AoA (array of arrays) if your field info is very regular.
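A minimal sketch of the flip-flop idea follows. It is a variation rather than exactly what is described above: instead of pushing in an else-branch, it checks the "E0" suffix that Perl appends to the flip-flop's last sequence number, so that records which follow each other with no junk line in between are still separated. The sample lines and variable names are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the second and third records are back to back.
my @lines = split /\n/, <<'END';
Some Field   : some value
Another Field: 1234
Different One: 5678
Yet Another  : foo
line of uninteresting stuff
Some Field   : some else
Another Field: 4321
Different One: 8765
Yet Another  : bar
Some Field   : another value
Another Field: 1122
Different One: 5566
Yet Another  : baz
END

my (@records, %record);

for my $line (@lines) {
    # In scalar context "..." is the flip-flop operator: true from a
    # "Some Field" line through the next "Yet Another" line.
    my $seq = ($line =~ /^Some Field/ ... $line =~ /^Yet Another/);
    next unless $seq;    # outside a record: skip junk lines

    my ($label, $value) = split /\s*:\s*/, $line, 2;
    $record{$label} = $value;

    # The final sequence number of a range carries an "E0" suffix (see perlop).
    if ($seq =~ /E0$/) {
        push @records, {%record};    # store a copy of the finished record
        %record = ();
    }
}

my @labels = ('Some Field', 'Another Field', 'Different One', 'Yet Another');
my $format = "%-15s %8s %8s %6s\n";

printf $format, 'something', 'number1', 'number2', 'word';
printf $format, @{$_}{@labels} for @records;
```

The copy in `{%record}` matters: pushing `\%record` instead would store the same reference three times, and every row would show the last record.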
Re: Extracting and manipulating a range of lines by marinersk (Priest) on Oct 18, 2013 at 21:11 UTC
At first glance, I'm with LanX on this one (more or less).

If you wish to preserve the order found in the source file, put the hashes inside an array.

If you don't care about order, or want them sorted, I'd put the arrays inside the hashes.

my %inpinf = ();
while (my $inpbuf = <INPFIL>)
{
    chomp $inpbuf;
    # Allow any amount of whitespace (including none) around the colon
    my ($inpkey, $inpval) = split /\s*:\s*/, $inpbuf, 2;
    push @{$inpinf{$inpkey}}, $inpval;
}

C:\Steve\Dev\PerlMonks\P-2013-10-18@1437-DataMerge>type test2.dat
Some Field   : some value
Another Field: 1234
Different One: 5678
Yet Another  : foo
Another Field: 9012
Different One: 3456
Yet Another  : bar

C:\Steve\Dev\PerlMonks\P-2013-10-18@1437-DataMerge>datamerge.pl test2.dat
Processing "test2.dat"
%inpinf (C:\Steve\Perl/debug.pm:887(990)):
  [Another Field] => [ARRAY(0x4a8498)]
    [1234]
    [9012]
  [Different One] => [ARRAY(0x4bd6a8)]
    [5678]
    [3456]
  [Some Field] => [ARRAY(0x4a83f0)]
    [some value]
  [Yet Another] => [ARRAY(0x4bd768)]
    [foo]
    [bar]

Apologies for using my home-spun `debug` module rather than `Data::Dumper` in the full code, but I haven't switched over to `Data::Dumper` yet and wanted a fast way to show you the resulting data structure.
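For readers without the home-spun `debug` module, a rough equivalent using the core `Data::Dumper` module might look like the sketch below. The input lines mirror test2.dat above; reading them from an in-memory list instead of a file, and the loop variable names, are assumptions made to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Same content as test2.dat above, held in memory for the example.
my @lines = (
    'Some Field   : some value',
    'Another Field: 1234',
    'Different One: 5678',
    'Yet Another  : foo',
    'Another Field: 9012',
    'Different One: 3456',
    'Yet Another  : bar',
);

# Hash of arrays: each field label maps to the list of values seen for it.
my %inpinf;
for my $inpbuf (@lines) {
    my ($inpkey, $inpval) = split /\s*:\s*/, $inpbuf, 2;
    next unless defined $inpval;    # skip lines that have no colon
    push @{ $inpinf{$inpkey} }, $inpval;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
$Data::Dumper::Indent   = 1;    # compact indentation
print Dumper(\%inpinf);
```

One thing the dump makes visible: `Some Field` holds one value while the other keys hold two, so with a hash of arrays the position in each array only identifies a record if every record contains every field.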
Re: Extracting and manipulating a range of lines by Lennotoecom (Pilgrim) on Oct 20, 2013 at 21:17 UTC
Another solution:

my (@rows, $n);
my @cols = ('Some Field', 'Another Field', 'Different One', 'Yet Another');

while (my $line = <DATA>) {
    $n++ if $line =~ /Some Field/;    # a new record starts here
    next unless $line =~ /^(\w+\s\w+)\s*:\s(\w+\s\w+|\w+)$/;
    $rows[$n]{$1} = $2;    # store the value under its label
    $rows[0]{$1}  = $1;    # row 0 holds the caption line
}

# Print the caption row first, then each record, in a fixed column order
printf "%-15s %-15s %-15s %-15s\n", @{$_}{@cols} for @rows;

__DATA__
Some Field   : some value
Another Field: 1234
Different One: 5678
Yet Another  : foo
line of uninteresting stuff
Some Field   : some else
Another Field: 4321
Different One: 8765
Yet Another  : bar
more junk
Some Field   : another value
Another Field: 1122
Different One: 5566
Yet Another  : baz

It would've been easier not to print the caption line :)
