PerlMonks  

raw data formatting

by teamassociated (Acolyte)
on Nov 16, 2012 at 00:06 UTC (#1004082=perlquestion)
teamassociated has asked for the wisdom of the Perl Monks concerning the following question:

Will you offer some help? Thanks very much! For every 5 lines of data in the raw file (which together make up one record), I need to write 6 lines to the output file. Every record should start with line 'A' and end with line 'F'. Line F has to be created for each record: it should contain the text 'acctnum=' followed by the account number, which is built from the first two pieces of data in line A. This is the output I am trying to get to:

A 5##### 8 Z Axxxx xxxx, ANN 19793.43 02/02/2012 06/08/2012 32989.05
B 03###### S JOxxx xxxx, ADxxx .00 xx/xx/xxxx 0
C 0 7 14## D xxxx ST #101 .00 02/08/2012 06/15/2012 NOV
D 0 U MILWAUKEE, WI ##### 32989.05 02/08/2012 3
E 3##### MILWAUKEE WI .00 53214 12
F acctnum=508####8
A 50#### 8 Z ALLEN ,Nxxxxx 75.00 05/27/2012 06/05/2012 1845.00
B 037###### E Nxxxx ALLEN .00 09/xx/xxx 06/05/2012 MYT#######
C 734####### S 15659 XXXX xxxxx .00 06/06/2012 06/14/2012 MYT9##### NOV
D 0 C TAYLOR,MI ##### 1845.00 05/27/2012 3 3
E 37##### S TAYLOR MI .00 48180 3 12
F acctnum=50#####
A 50##### 9 Z Axxxx xxxxxx ,SUSETH 75.00 05/05/2012 05/15/2012 2045.00
B 0000000000 E SUSE xxxxxxx RIxxx .00 03/18/xxxx 05/15/2012 AJT9######
C 313###### 440 xxx xxxx .00 05/16/2012 06/15/2012 AJT9#2###### NOV
D 0 C LINCOLN PARK,XX xx### 2045.00 05/05/2012 1 3
E 0 S LINCOLN PARK XX .00 48146 1 12
F acctnum=50#####

From this: $ cat tstin

0 S LINCOLN PARK MI .00 48146 1 12

My Perl code so far. I have been trying and trying; this is about my fifth revision.

use strict;
use warnings;

my $vers   = q|1.5|;
my $letter = 'A';
#my $file = qq(/var/adm/scripts/20121106_C1SA_174204.tmp);
my $file = qq(/var/adm/scripts/tstin);
my ($acctnum, @acctnum);

open (my $f, "+<", $file) or die $!;
while (my $ln = (<$f>)) {
    #next if ($. == 1..88);
    next if $ln !~ /\s\s\s\s\d+.*/;
    next if $ln =~ /\A\w+|\cM\s\d+\.\d+.*/;
    substr($ln, 1,2) = "";
    substr($ln, 0,1) = $letter++;
    if ( $. % 5 == 1 ) { ###-- every 5th line --###
        if ( substr($ln, 10, 4) =~ /\s+/ ) {
            substr($ln, 10, 4, "");
            $acctnum = substr($ln, 2, 10);
            push @acctnum, "F acctnum=$acctnum";
        }
    }
    ###-- 79/80 are the 2 spaces not needed --###
    if ( length(substr($ln, 79, 2)) > 1 || substr($ln, 79, 2) =~ /\s*/ ) {
        substr($ln, 79, 2, "");
        print $ln;
    }
    else {
        next;
    }
}

Re: raw data formatting
by roboticus (Canon) on Nov 16, 2012 at 00:24 UTC

    teamassociated:

    Part of the difficulty appears to be that the file may have some lines you want to discard. Since the $. variable keeps track of the lines read and not the lines wanted, using the modulo operator on it is going to fail. However, the modulo operator idea is a *good* one; you just need to keep track of the lines you want to keep yourself:

    my $cnt_keepers = 0;
    while (my $ln = <$f>) {
        next if ....
        # Now we've got a line we want to keep
        ++$cnt_keepers;
        if ($cnt_keepers % 5 == 0) {
            ... every fifth line processing ...
        }
        ...
    }
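
The counting idea above can be tried out in a small, self-contained sketch. The filter rule in `is_keeper` is only an assumption, since the real `next if ...` condition is elided in the post, and the sub names and sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical filter: the real "next if ..." rule is elided in the post,
# so this sketch keeps lines that start with optional whitespace and a digit.
sub is_keeper {
    my ($line) = @_;
    return $line =~ /^\s*\d/ ? 1 : 0;
}

# Returns the number of kept lines, followed by the 1-based input line
# numbers at which a five-line record is complete.
sub find_record_ends {
    my @lines = @_;
    my $cnt_keepers = 0;
    my @record_ends;
    my $line_no = 0;                  # plays the role of $.
    for my $line (@lines) {
        ++$line_no;
        next unless is_keeper($line);
        ++$cnt_keepers;
        push @record_ends, $line_no if $cnt_keepers % 5 == 0;
    }
    return ($cnt_keepers, @record_ends);
}

# Made-up input: five data lines mixed with a header, a blank line and a page break.
my @demo = (
    'REPORT HEADER',
    '   1001 first',
    '   1002 second',
    '',
    '   1003 third',
    '   1004 fourth',
    'PAGE 2',
    '   1005 fifth',
);
my ($kept, @ends) = find_record_ends(@demo);
print "kept $kept lines; record complete at input line(s): @ends\n";
```

With this input the record completes at input line 8, where `$. % 5` would be 3, which is why a test on `$.` misses the record boundary once lines are skipped.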

    Another way you could do it is to simply accumulate the lines you want, and when you have five, process the lot of 'em:

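    my @records;
    while (my $ln = <$f>) {
        next if ...
        chomp $ln;    # the prints below add their own newline
        push @records, $ln;
        if (@records == 5) {
            print "A $records[0]\n";
            print "B $records[1]\n";
            print "C $records[2]\n";
            print "D $records[3]\n";
            print "E $records[4]\n";
            # Only five lines are stored (indexes 0..4), so line F is built
            # from the first two fields of line A rather than read from $records[5]
            my ($first, $second) = split ' ', $records[0];
            print "F acctnum=$first$second\n";
            @records = ();
        }
    }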
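
As a rough, runnable illustration of the accumulate-five approach, here is a sketch that works on a list of lines instead of a file handle. The sub name `format_records` and the sample record are invented for the example, and it assumes the account number is simply the first two whitespace-separated fields of line A joined together.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines that have already passed the filter and
# returns them lettered A..E, with a generated F line after every fifth one.
sub format_records {
    my @kept = @_;
    my (@out, @group);
    for my $line (@kept) {
        $line =~ s/^\s+|\s+$//g;      # assumes surrounding whitespace is not significant
        push @group, $line;
        next if @group < 5;

        my $letter = 'A';
        for my $member (@group) {
            push @out, "$letter $member";
            $letter++;                # 'A' -> 'B' -> ... via Perl's string increment
        }
        # Assumption: the account number is the first two whitespace-separated
        # fields of line A joined together.
        my ($first, $second) = split ' ', $group[0];
        push @out, "F acctnum=$first$second";
        @group = ();
    }
    return @out;
}

# Made-up sample record (not taken from the real file).
my @sample = (
    '   5012345    8 Z DOE ,JANE        75.00',
    '   037000000 S JOHN DOE             .00',
    '   7340000000 S 1 MAIN ST           .00',
    '   0 C ANYTOWN,MI 00000           75.00',
    '   3700000 S ANYTOWN MI             .00',
);
print "$_\n" for format_records(@sample);
```

Note that a trailing group of fewer than five lines is silently dropped in this sketch; whether that should be an error depends on the real data.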

    It may be coincidence, but it seems like the first record of each group always has a 'Z' (?record type indicator?) around column 20. If it's not a coincidence, then another way to handle it would be to accumulate records until you find a line with a 'Z' marker, then process all the records you have, and then put the new record into the accumulator.
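
If the 'Z' really is a record-start marker, the accumulate-until-marker idea might look like the sketch below. It assumes the marker is the third whitespace-separated field, which is only what the posted sample suggests; the sub names and sample lines are made up.

```perl
use strict;
use warnings;

# Assumption: a record starts on a line whose third whitespace-separated
# field is a lone 'Z'. The thread only observes this in the sample data.
sub starts_record {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return @fields >= 3 && $fields[2] eq 'Z' ? 1 : 0;
}

# Groups lines into records, starting a new record at every marker line.
sub group_by_marker {
    my @lines = @_;
    my (@groups, @current);
    for my $line (@lines) {
        if (starts_record($line) && @current) {
            push @groups, [@current];      # flush the finished record
            @current = ();
        }
        push @current, $line;
    }
    push @groups, [@current] if @current;  # the last record has no following marker
    return @groups;
}

# Made-up lines: a two-line record followed by a three-line record.
my @demo_lines = (
    ' 5012345 8 Z DOE ,JANE 75.00',
    ' 037000000 S JOHN DOE .00',
    ' 5012346 9 Z ROE ,RICHARD 20.00',
    ' 0000000000 E RICHARD ROE .00',
    ' 3130000000 1 MAIN ST .00',
);
my @groups = group_by_marker(@demo_lines);
printf "record %d has %d line(s)\n", $_ + 1, scalar @{ $groups[$_] } for 0 .. $#groups;
```

Unlike the count-to-five versions, this one tolerates records of varying length, but it breaks if any other line happens to carry a 'Z' in the same position.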

    Update: Fixed a code tag.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: raw data formatting
by Kenosis (Priest) on Nov 16, 2012 at 01:06 UTC

    Perhaps the following, which uses List::MoreUtils's natatime to grab five elements at a time from a list of file lines, will be helpful:

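    use strict;
    use warnings;
    use List::MoreUtils qw/natatime/;

    # Iterator that hands back the file's lines five at a time
    my $it = natatime 5, <DATA>;

    while ( chomp( my @lines = $it->() ) ) {
        my $letter = 'A';
        # Account number: two whitespace-separated runs of digits matched on the first line, joined
        my $acctNum = do { $lines[0] =~ /\s+(\d+)\s+(\d+)/; $1 . $2 };
        push @lines, " acctnum=$acctNum";
        # Replace the first run of whitespace on each line with the next letter and a space
        print for map { s/\s+/$letter++ . ' '/e; "$_\n" } @lines;
    }

    __DATA__
    Place your data here...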

    Update: Added a chomp; removed the non-destructive substitution modifier (/r) and a print "\n"; line.

      Looks awesome, but I get errors: Perl does not like the /r modifier. It compiles fine with only /e, except the output is all 11111.
      # perl -c foobar
      Bareword found where operator expected at foobar line 12, near "s/\s+/$letter++ . ' '/er"
      syntax error at foobar line 12, near "s/\s+/$letter++ . ' '/er "
      foobar had compilation errors.
      root@facs04ap [/var/adm/scripts] # perl -v
      This is perl, v5.8.8 built for aix-thread-multi

      The code from you, as I ran it:

      use strict;
      use warnings;
      use List::MoreUtils qw/natatime/;

      my $it = natatime 5, <DATA>;

      while ( my @lines = $it->() ) {
          my $letter = 'A';
          my $acctNum = do { $lines[0] =~ /\s+(\d+)\s+(\d+)/; $1 . $2 };
          push @lines, " acctnum=$acctNum";
          print for map { s/\s+/$letter++ . ' '/er } @lines;
          print "\n";
      }

      __DATA__
      50###### 8 Z Axxx ,xxxxxxx 19793.43 02/02/2012 06/08/2012 32989.05
      037###### S JOYCE xxxxxxS .00 06/06/xxxx 0
      0 7 1471 xxxx xxxx x #101 .00 02/08/2012 06/15/2012 NOV
      0 U MILWAUKEE, WI xxxxx 32989.05 02/08/2012 3
      3##### MILWAUKEE WI .00 53214 12
      50##### 8 Z Axxx ,Nxxxxx 75.00 05/27/2012 06/05/2012 1845.00
      03######3 E Nxxxx Axxxx .00 09/xx/xxxx 06/05/2012 MYT#####
      7####### S 156xxx xxxxx BLVD .00 06/06/2012 06/14/2012 MYT######## NOV
      0 C TAYLOR,MI 48180 1845.00 05/27/2012 3 3
      37###### S TAYLOR MI .00 48180 3 12
      50##### 9 Z ALxxxx Rxxxx ,SUSxxx 75.00 05/05/2012 05/15/2012 2045.00
      0000000000 E SUSE Axxxxx Rxxxx .00 03/xx/xxxx 05/15/2012 AJT9#######
      31####### 44xxx xxx AVE .00 05/16/2012 06/15/2012 AJT92###### NOV
      0 C LINCOLN PARK,MI xxxxx 2045.00 05/05/2012 1 3
      0 S LINCOLN PARK MI .00 48146 1 12

        Your version of Perl doesn't support the non-destructive substitution modifier (/r). Use the following:

        print for map { s/\s+/$letter++ . ' '/e; $_ } @lines;

        Will make that change in the original posting.
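
For Perls older than 5.14, where /r was introduced, the usual way to substitute without touching the original is to copy first and then substitute on the copy. The sketch below shows that idiom, and also why the output turned into a run of 1s: s/// returns the number of substitutions made, so a map block that ends in s///e hands back that count instead of the changed line. The sub name `letter_lines` and the sample lines are made up, and the pattern `^\s*` is a deliberate change from the posted `\s+` so that lines without leading whitespace are lettered too.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each line with a letter, replacing any leading
# whitespace, without modifying the caller's array and without /r.
sub letter_lines {
    my @lines  = @_;
    my $letter = 'A';
    my @lettered;
    for my $line (@lines) {
        (my $copy = $line) =~ s/^\s*/$letter++ . ' '/e;   # copy first, then substitute
        push @lettered, $copy;
    }
    return @lettered;
}

my @input = ('   first line', '   second line');

# s/// returns the number of substitutions made, not the changed string.
my $probe = $input[0];
my $count = ($probe =~ s/^\s*/A /);
print "s/// returned: $count\n";

print "$_\n" for letter_lines(@input);
```

The same copy-then-substitute expression also works inside a map block, as long as the block's last expression is the copy and not the s/// itself.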

Node Type: perlquestion [id://1004082]
Approved by tobyink