So if the data really _is_ in fixed-width columns, then something like dragonchild's code would work, using
my @column_widths = (57, 17, '*');
for the widths (check these against the real column widths). To remove the leading _and_ trailing spaces from each piece, though, I'd do something like:
my ($desc, $code, $other_thingy) = unpack $unpack_spec, $_;
foreach my $piece ($desc, $code, $other_thingy) {
    $piece =~ s/^\s+//;
    $piece =~ s/\s+$//;
}
(I think that's right. Hmmm, testing with dragonchild's modified code ...)
# Change these to the actual column widths. Use a star at the end to get the rest.
my @column_widths = (57, 17, '*');
my $unpack_spec = join ' ', map { "A$_" } @column_widths;
my %codes;
while (<DATA>)
{
    chomp;
    my ($desc, $code, $other_thingy) = unpack $unpack_spec, $_;
    foreach my $piece ($desc, $code, $other_thingy) {
        $piece =~ s/^\s+//;
        $piece =~ s/\s+$//;
    }
    $codes{$code} = {
        Description => $desc,
        Other_Thing => $other_thingy,
    };
}
my $choice = 'GMF';
print "$choice: $codes{$choice}{Description}\n";
$choice = 'G3311A2';
print "$choice: $codes{$choice}{Description}\n";
__DATA__
Total index                                              B50001
Crude processing (capacity)                              B5610C
Primary & semifinished processing (capacity)             B562A3C
Finished processing (capacity)                           B5640C
Manufacturing ("SIC")                                    B00004
Manufacturing (NAICS)                                    GMF
Durable manufacturing (NAICS)                            GMFD
Wood product                                             G321             321
Nonmetallic mineral product                              G327             327
Primary metal                                            G331             331
Iron and steel products                                  G3311A2          3311,2
Fabricated metal product                                 G332             332
Machinery                                                G333             333
__OUTPUT__
GMF: Manufacturing (NAICS)
G3311A2: Iron and steel products
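One more note on the trimming: when unpacking, the A format already strips trailing whitespace from each field, so only the leading spaces really need removing by hand (the second substitution above is harmless, just redundant). A minimal sketch of that idea follows. The helper name parse_fixed_width, the indented sample rows, and the use of sprintf to build them are my own assumptions for illustration, not part of dragonchild's code or the real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width line into trimmed fields.
# @widths uses the same convention as above: numbers, with '*' for "the rest".
sub parse_fixed_width {
    my ($line, @widths) = @_;
    my $spec = join ' ', map { "A$_" } @widths;   # e.g. "A57 A17 A*"
    my @fields = unpack $spec, $line;
    # unpack's A format has already dropped trailing whitespace,
    # so only leading whitespace is left to remove.
    s/^\s+// for @fields;
    return @fields;
}

# Made-up sample rows, padded with sprintf so the column widths are exact.
my @widths = (57, 17, '*');
my @rows = (
    sprintf('%-57s%-17s%s', '  Manufacturing (NAICS)',     'GMF',     ''),
    sprintf('%-57s%-17s%s', '    Iron and steel products', 'G3311A2', '3311,2'),
);

my %codes;
for my $row (@rows) {
    my ($desc, $code, $other) = parse_fixed_width($row, @widths);
    $codes{$code} = { Description => $desc, Other_Thing => $other };
}

print "$_: $codes{$_}{Description}\n" for sort keys %codes;
```

Wrapping the unpack in a sub keeps the trimming in one place, so changing the widths later only touches the list passed in.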