http://www.perlmonks.org?node_id=684868


in reply to Re: Fixed Position Column Records
in thread Fixed Position Column Records

You don't need the x in the template. As you note applying a heuristic requires another pass through the data. Here is a simple append to LHS column one.... As you note there are failure cases whatever you do. Just leaving it simple and doing the column merge in Excel probably makes a lot of sense.

#! perl -w use strict; my (@templ, $templ); my $TEMPL = 'a'; my @lines = grep{! m/^\s*$/ }<DATA>; my $mask = ' ' x length $lines[0]; $mask |= $_ for @lines; push @templ, length($1) while $mask =~ m/(\S+(\s+|$))/g; $templ = $TEMPL. join $TEMPL, @templ; print "Naive $templ\n"; print join '|', unpack $templ, $_ for @lines; # heuristic to detect and remove column breaks giving null fields # this effectively assumes left justification and appends left # but you could make it trickier for my $line (@lines) { my @data = unpack $templ, $line; for my $i (1..$#data) { next unless $data[$i] =~ m/^\s*$/; $templ[$i-1] += $templ[$i]; # add to LHS column $templ[$i] = 0; # unset this column in template } } $templ = $TEMPL. join $TEMPL, grep{$_}@templ; # need grep to skip 0's print "\nMunged $templ\n"; print join '|', unpack $templ, $_ for @lines; __DATA__ The First One Here Is Longer. Collie SN 2 62287630 77312 9387 +1 MVP A A Second (PART) here First In 20 MT 69287655 506666 6106 +6 RTD 3rd Person "Something" X&Y No SH 64287705 45423 5244 +3 RTE The Fourth Person 20 MLP 4000 60505504 3530 7220 +1 VRE The Fifth Name OR Something Twin 200 SH 69505179 3530 7220 +1 VRE B The Sixth Person OR Item MLP 60505174 3,530 72,20 +1 VRE 70 The Seventh Record MLP 64205122 3530 7220 +1 VRE The Eighth Person MLP MLP 60545154 3530 722 +0 VRE

Output

Naive a30a12a3a2a10a8a8a4a2 The First One Here Is Longer. |Collie SN | |2 |62287630 |77312 +| 93871 |MVP |A A Second (PART) here |First In 20 |MT | |69287655 |506666 +| 61066 |RTD | 3rd Person "Something" |X&Y No SH | | |64287705 |45423 +| 52443 |RTE | The Fourth Person 20 |MLP 4000 | | |60505504 |3530 +| 72201 |VRE | The Fifth Name OR Something |Twin 200 SH | | |69505179 |3530 +| 72201 |VRE |B The Sixth Person OR Item |MLP | | |60505174 |3,530 +|72,201 |VRE | 70 The Seventh Record |MLP | | |64205122 |3530 +| 72201 |VRE | The Eighth Person MLP |MLP | | |60545154 |3530 +| 7220 |VRE | Munged a30a17a10a8a8a6 The First One Here Is Longer. |Collie SN 2 |62287630 |77312 | +93871 |MVP A A Second (PART) here |First In 20 MT |69287655 |506666 | +61066 |RTD 3rd Person "Something" |X&Y No SH |64287705 |45423 | +52443 |RTE The Fourth Person 20 |MLP 4000 |60505504 |3530 | +72201 |VRE The Fifth Name OR Something |Twin 200 SH |69505179 |3530 | +72201 |VRE B The Sixth Person OR Item |MLP |60505174 |3,530 |7 +2,201 |VRE 70 The Seventh Record |MLP |64205122 |3530 | +72201 |VRE The Eighth Person MLP |MLP |60545154 |3530 | + 7220 |VRE