PerlMonks  

Re: Help me write a good reg-exp for this text

by shenme (Priest)
on Sep 05, 2003 at 16:44 UTC


in reply to Help me write a good reg-exp for this text

The example key you give ('GMF') may be too simple to show what you want to key on. What would you use for one of your more complicated lines, such as this one (some spaces removed):
    Iron and steel products       G3311A2     3311,2

What would you want to use as the key? I can imagine at least three possibilities:

  • everything in those fixed columns, thus 'G3311A2     3311,2' including the spaces between the strings,
  • just one part of all that, such as just 'G3311A2',
  • both parts individually as alternate keys 'G3311A2' and '3311,2'

dragonchild uses $other_thingy to capture the '3311,2' separately. But what really should be done with that part?
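The three possibilities above can be sketched side by side. This is only an illustration, not code from the thread: the helper name key_candidates, the hash keys, and the 57-column description width are assumptions to be checked against the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width line into the description
# and the "key area" that follows it, then derive the three candidate keys.
# The width 57 is an assumption; adjust it to the real data.
sub key_candidates {
    my ($line) = @_;
    my ($desc, $key_area) = unpack 'A57 A*', $line;
    my @parts = split ' ', $key_area;    # split on runs of whitespace
    return {
        whole => $key_area,    # option 1: everything, inner spaces included
        first => $parts[0],    # option 2: just the first string
        parts => [@parts],     # option 3: each string as an alternate key
    };
}

# Build a sample line at the assumed widths.
my $line = sprintf '%-57s%-17s%s',
    'Iron and steel products', 'G3311A2', '3311,2';

my $c = key_candidates($line);
print "whole: '$c->{whole}'\n";
print "first: '$c->{first}'\n";
print "parts: '$_'\n" for @{ $c->{parts} };
```

Which of the three is right depends on what the lookups will use, which is the question put to the original poster.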

Re: Re: Help me write a good reg-exp for this text
by waxmop (Beadle) on Sep 05, 2003 at 17:06 UTC

    You're right; I was ambiguous. For this line:

    Iron and steel products G3311A2 3311,2
    The key should be 'G3311A2' and the value should be 'Iron and steel products'. The '3311,2' information is not needed by me.
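If the columns turn out not to be fixed-width, the same code-to-description mapping could be built with a regex instead. This is a minimal sketch, not anything posted in the thread: it assumes the description and the code are separated by at least two spaces and that a description never contains two spaces in a row. The names parse_line and %desc_for are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return (code, description) for one line,
# or an empty list if the line does not look like "text  CODE ...".
# Anything after the code (such as '3311,2') is ignored.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\S.*?)\s{2,}(\S+)/;
    return ($2, $1);
}

my %desc_for;
for my $line (
    'Iron and steel products       G3311A2     3311,2',
    'Manufacturing (NAICS)         GMF',
) {
    my ($code, $desc) = parse_line($line) or next;
    $desc_for{$code} = $desc;
}

print "$_ => $desc_for{$_}\n" for sort keys %desc_for;
```

The two-space assumption is fragile, so this only makes sense if the fixed-width reading of the data proves wrong.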
      So if the data format really _is_ fixed-width columns then something like dragonchild's code would work, using
      my @column_widths = (57, 17, '*');
      for the widths (check against the real column widths).   Although to remove the leading _and_ trailing spaces from each piece I'd do something like:
      my ($desc, $code, $other_thingy) = unpack $unpack_spec, $_;
      foreach my $piece ($desc, $code, $other_thingy) {
          $piece =~ s/^\s+//;
          $piece =~ s/\s+$//;
      }
      (I think that's right, hmmm, testing with dragonchild's modified code ....)
      # Change these to the actual column widths. Use a star at the end to get the rest.
      my @column_widths = ( 57, 17, '*');
      my $unpack_spec = join ' ', map { "A$_" } @column_widths;

      my %codes;
      while (<DATA>) {
          chomp;
          my ($desc, $code, $other_thingy) = unpack $unpack_spec, $_;
          foreach my $piece ($desc, $code, $other_thingy) {
              $piece =~ s/^\s+//;
              $piece =~ s/\s+$//;
          }
          $codes{$code} = {
              Description => $desc,
              Other_Thing => $other_thingy,
          };
      }

      my $choice = 'GMF';
      print "$choice: $codes{$choice}{Description}\n";
      $choice = 'G3311A2';
      print "$choice: $codes{$choice}{Description}\n";

      __DATA__
      Total index                                              B50001
      Crude processing (capacity)                              B5610C
      Primary & semifinished processing (capacity)             B562A3C
      Finished processing (capacity)                           B5640C
      Manufacturing ("SIC")                                    B00004
      Manufacturing (NAICS)                                    GMF
      Durable manufacturing (NAICS)                            GMFD
      Wood product                                             G321             321
      Nonmetallic mineral product                              G327             327
      Primary metal                                            G331             331
      Iron and steel products                                  G3311A2          3311,2
      Fabricated metal product                                 G332             332
      Machinery                                                G333             333

      __OUTPUT__
      GMF: Manufacturing (NAICS)
      G3311A2: Iron and steel products
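To see what the spec and the trimming loop do in isolation, here is a small sketch on a single line. The helper name split_fixed and the two-space leading indent on the sample description are assumptions for illustration. It relies on two documented Perl behaviours: unpack's 'A' format strips trailing spaces but keeps leading ones, and a foreach loop variable aliases the list elements, so the substitution changes the original values in place.

```perl
use strict;
use warnings;

my @column_widths = (57, 17, '*');
my $unpack_spec   = join ' ', map { "A$_" } @column_widths;
print "$unpack_spec\n";    # prints: A57 A17 A*

# Hypothetical helper: unpack one line and trim every field.
sub split_fixed {
    my ($spec, $line) = @_;
    my @fields = unpack $spec, $line;
    # 'A' has already dropped trailing spaces; leading ones remain.
    # $piece aliases each element, so @fields itself is modified.
    foreach my $piece (@fields) {
        $piece =~ s/^\s+//;
        $piece =~ s/\s+$//;    # harmless extra pass for trailing spaces
    }
    return @fields;
}

# Sample line built at the assumed widths, with a leading indent.
my $line = sprintf '%-57s%-17s%s',
    '  Iron and steel products', 'G3311A2', '3311,2';

my ($desc, $code, $other) = split_fixed($unpack_spec, $line);
print "[$desc] [$code] [$other]\n";
```

With the 'A' format the trailing-space substitution is mostly a safety net; the leading-space one is what matters for indented descriptions.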
