PerlMonks  

raw data formatting

by teamassociated (Novice)
on Nov 16, 2012 at 00:06 UTC (#1004082=perlquestion)
teamassociated has asked for the wisdom of the Perl Monks concerning the following question:

Will you offer some help? Thanks much! Here is what I am trying to do: for every 5 lines of data in the raw file (which make up one record), I need to write 6 lines to the output file. Every record should start with line 'A' and end with line 'F'. Line F does not exist in the raw data; it has to be created for each record and should contain the text 'acctnum=' followed by the account number, which is built from the first two pieces of data in line A. I am trying to get to this:

A 5##### 8 Z Axxxx xxxx, ANN 19793.43 02/02/2012 06/08/2012 32989.05
B 03###### S JOxxx xxxx, ADxxx .00 xx/xx/xxxx 0
C 0 7 14## D xxxx ST #101 .00 02/08/2012 06/15/2012 NOV
D 0 U MILWAUKEE, WI ##### 32989.05 02/08/2012 3
E 3##### MILWAUKEE WI .00 53214 12
F acctnum=508####8
A 50#### 8 Z ALLEN ,Nxxxxx 75.00 05/27/2012 06/05/2012 1845.00
B 037###### E Nxxxx ALLEN .00 09/xx/xxx 06/05/2012 MYT#######
C 734####### S 15659 XXXX xxxxx .00 06/06/2012 06/14/2012 MYT9##### NOV
D 0 C TAYLOR,MI ##### 1845.00 05/27/2012 3 3
E 37##### S TAYLOR MI .00 48180 3 12
F acctnum=50#####
A 50##### 9 Z Axxxx xxxxxx ,SUSETH 75.00 05/05/2012 05/15/2012 2045.00
B 0000000000 E SUSE xxxxxxx RIxxx .00 03/18/xxxx 05/15/2012 AJT9######
C 313###### 440 xxx xxxx .00 05/16/2012 06/15/2012 AJT9#2###### NOV
D 0 C LINCOLN PARK,XX xx### 2045.00 05/05/2012 1 3
E 0 S LINCOLN PARK XX .00 48146 1 12
F acctnum=50#####

From this:

$ cat tstin
0 S LINCOLN PARK MI .00 48146 1 12

Here is my Perl code. I have been trying and trying; this is about my fifth revision.

use strict;
use warnings;

my $vers   = q|1.5|;
my $letter = 'A';
#my $file = qq(/var/adm/scripts/20121106_C1SA_174204.tmp);
my $file = qq(/var/adm/scripts/tstin);
my ($acctnum,@acctnum);

open (my $f, "+<", $file) or die $!;
while (my $ln = (<$f>)) {
    #next if ($. == 1..88);
    next if $ln !~ /\s\s\s\s\d+.*/;
    next if $ln =~ /\A\w+|\cM\s\d+\.\d+.*/;
    substr($ln, 1,2) = "";
    substr($ln, 0,1) = $letter++;
    if ( $. % 5 == 1 ) {    ###-- every 5th line --###
        if ( substr($ln, 10, 4) =~ /\s+/ ) {
            substr($ln, 10, 4, "");
            $acctnum = substr($ln, 2, 10);
            push @acctnum, "F acctnum=$acctnum";
        }
    }
    ###-- 79/80 are the 2 spaces not needed --###
    if ( length(substr($ln, 79, 2)) > 1 || substr($ln, 79, 2) =~ /\s*/ ) {
        substr($ln, 79, 2, "");
        print $ln;
    }
    else {
        next;
    }
}

Re: raw data formatting
by roboticus (Canon) on Nov 16, 2012 at 00:24 UTC

    teamassociated:

    Part of the difficulty appears to be that the file may have some lines you want to discard. Since the $. variable counts every line read, not just the lines you want, using the modulo operator on it falls out of step as soon as a line is skipped. However, the modulo operator idea is a *good* one; you just need to keep track of the lines you want to keep yourself:

    my $cnt_keepers=0;
    while (my $ln = <$f>) {
        next if ....

        # Now we've got a line we want to keep
        ++$cnt_keepers;

        if ($cnt_keepers % 5 == 0) {
            ... every fifth line processing ...
        }
        ...
    }
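A minimal, runnable sketch of the counter idea above. The discard rule (lines starting with '#'), the sample lines, and the sub name count_keepers are assumptions made up for illustration; they are not from the original file. It only shows how $. and a private counter drift apart once lines are skipped.

```perl
use strict;
use warnings;

# Returns one [ $., kept-count ] pair for every line that survives the filter.
sub count_keepers {
    my ($text) = @_;
    # Read the string through an in-memory filehandle so $. behaves as with a file
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $cnt_keepers = 0;
    my @pairs;
    while ( my $ln = <$fh> ) {
        next if $ln =~ /^#/;    # assumed discard rule, for illustration only
        ++$cnt_keepers;         # counts wanted lines; $. counts every line read
        push @pairs, [ $., $cnt_keepers ];
    }
    close $fh;
    return @pairs;
}

# Hypothetical sample: two of the five physical lines are discarded
my $sample = "# header\nline one\n# noise\nline two\nline three\n";
printf "physical line %d is kept line %d\n", @$_ for count_keepers($sample);
```

Any test such as `$. % 5` looks at the first number of each pair, while the record position is the second.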

    Another way you could do it is to simply accumulate the lines you want, and when you have five, process the lot of 'em:

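    my @records;
    while (my $ln = <$f>) {
        next if ...
        chomp $ln;    # the prints below add their own newline
        push @records, $ln;
        if (@records==5) {
            print "A $records[0]\n";
            print "B $records[1]\n";
            print "C $records[2]\n";
            print "D $records[3]\n";
            print "E $records[4]\n";
            # Line F is not in the input (@records only holds indices 0..4),
            # so build it from the first two fields of line A
            my ($part1, $part2) = split ' ', $records[0];
            print "F acctnum=$part1$part2\n";
            @records = ();
        }
    }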

    It may be coincidence, but it seems like the first record of each group always has a 'Z' (?record type indicator?) around column 20. If it's not a coincidence, then another way to handle it would be to accumulate records until you find a line with a 'Z' marker, then process all the records you have, and then put the new record into the accumulator.
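A rough sketch of the marker-based approach described above, not a tested solution for the real file. It assumes that the 'Z' is always the third whitespace-separated field of the first line of a record and never the third field of any other line; the sub names and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Letter the lines of one record A, B, C, ... and append the F line, whose
# account number joins the first two fields of the first line.
sub format_record {
    my @lines = @_;
    my ( $part1, $part2 ) = split ' ', $lines[0];
    my $letter = 'A';
    my @out;
    for my $line (@lines) {
        push @out, $letter++ . " $line";
    }
    push @out, "F acctnum=$part1$part2";
    return @out;
}

# Accumulate lines until the next line carrying the assumed 'Z' marker,
# then flush the finished record and start a new one.
sub group_by_marker {
    my @input = @_;
    my ( @record, @out );
    for my $ln (@input) {
        $ln =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace and newline
        next unless length $ln;
        my @f = split ' ', $ln;
        if ( @record && defined $f[2] && $f[2] eq 'Z' ) {
            push @out, format_record(@record);
            @record = ();
        }
        push @record, $ln;
    }
    push @out, format_record(@record) if @record;    # last record has no successor
    return @out;
}

# Hypothetical sample data: two records of five lines each
my @sample = (
    "    5012345 8 Z DOE ,JANE 75.00",    "    0371234 E JANE DOE .00",
    "    7341234 S 1 MAIN ST .00",        "    0 C ANYTOWN,MI 1845.00",
    "    3712345 S ANYTOWN MI .00",       "    5054321 9 Z ROE ,RICHARD 75.00",
    "    0000000 E RICHARD ROE .00",      "    3131234 440 ELM AVE .00",
    "    0 C OTHERTOWN,MI 2045.00",       "    0 S OTHERTOWN MI .00",
);
print "$_\n" for group_by_marker(@sample);
```

Unlike counting to five, this keeps working if a record is a line short or long, but it depends entirely on the marker assumption holding for the whole file.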

    Update: Fixed a code tag.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: raw data formatting
by Kenosis (Priest) on Nov 16, 2012 at 01:06 UTC

    Perhaps the following, which uses List::MoreUtils's natatime to grab five elements at a time from a list of file lines, will be helpful:

    use strict;
    use warnings;
    use List::MoreUtils qw/natatime/;

    my $it = natatime 5, <DATA>;

    while ( chomp( my @lines = $it->() ) ) {
        my $letter  = 'A';
        my $acctNum = do { $lines[0] =~ /\s+(\d+)\s+(\d+)/; $1 . $2 };
        push @lines, " acctnum=$acctNum";
        print for map { s/\s+/$letter++ . ' '/e; "$_\n" } @lines;
    }

    __DATA__
    Place your data here...

    Update: Added a chomp; removed the non-destructive substitution modifier (/r) and a print "\n"; line.

      Looks awesome, but I get errors: it does not like the /r. It does fine with /e alone, except that the output is all 11111.

      # perl -c foobar
      Bareword found where operator expected at foobar line 12, near "s/\s+/$letter++ . ' '/er"
      syntax error at foobar line 12, near "s/\s+/$letter++ . ' '/er "
      foobar had compilation errors.
      root@facs04ap [/var/adm/scripts] # perl -v
      This is perl, v5.8.8 built for aix-thread-multi

      The code from you:

      use strict;
      use warnings;
      use List::MoreUtils qw/natatime/;

      my $it = natatime 5, <DATA>;

      while ( my @lines = $it->() ) {
          my $letter = 'A';
          my $acctNum = do { $lines[0] =~ /\s+(\d+)\s+(\d+)/; $1 . $2 };
          push @lines, " acctnum=$acctNum";
          print for map { s/\s+/$letter++ . ' '/er } @lines;
          print "\n";
      }

      __DATA__
      50###### 8 Z Axxx ,xxxxxxx 19793.43 02/02/2012 06/08/2012 32989.05
      037###### S JOYCE xxxxxxS .00 06/06/xxxx 0
      0 7 1471 xxxx xxxx x #101 .00 02/08/2012 06/15/2012 NOV
      0 U MILWAUKEE, WI xxxxx 32989.05 02/08/2012 3
      3##### MILWAUKEE WI .00 53214 12
      50##### 8 Z Axxx ,Nxxxxx 75.00 05/27/2012 06/05/2012 1845.00
      03######3 E Nxxxx Axxxx .00 09/xx/xxxx 06/05/2012 MYT#####
      7####### S 156xxx xxxxx BLVD .00 06/06/2012 06/14/2012 MYT######## NOV
      0 C TAYLOR,MI 48180 1845.00 05/27/2012 3 3
      37###### S TAYLOR MI .00 48180 3 12
      50##### 9 Z ALxxxx Rxxxx ,SUSxxx 75.00 05/05/2012 05/15/2012 2045.00
      0000000000 E SUSE Axxxxx Rxxxx .00 03/xx/xxxx 05/15/2012 AJT9#######
      31####### 44xxx xxx AVE .00 05/16/2012 06/15/2012 AJT92###### NOV
      0 C LINCOLN PARK,MI xxxxx 2045.00 05/05/2012 1 3
      0 S LINCOLN PARK MI .00 48146 1 12

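        Your version of Perl doesn't support the non-destructive substitution modifier (/r), which was added in Perl 5.14. Without /r, s/// changes $_ in place and returns the number of substitutions made, which is why map handed print a string of 1s. Use the following, which returns $_ explicitly: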

        print for map { s/\s+/$letter++ . ' '/e; $_ } @lines;

        Will make that change in the original posting.
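A small illustration of the /r workaround discussed above, written to run on older Perls such as 5.8.8. The sub names and the two sample strings are hypothetical; the substitution itself is the one from the thread.

```perl
use strict;
use warnings;

# Letter each line by replacing its leading whitespace, without /r.
sub letter_lines {
    my @lines  = @_;    # work on a copy so the caller's data is untouched
    my $letter = 'A';
    # s///e edits $_ in place; the trailing $_ is the value map emits
    return map { s/\s+/$letter++ . ' '/e; $_ } @lines;
}

# For contrast: without the trailing $_, map collects the return value of
# s///, which is the number of substitutions made.
sub letter_lines_buggy {
    my @lines  = @_;
    my $letter = 'A';
    return map { s/\s+/$letter++ . ' '/e } @lines;
}

print "$_\n" for letter_lines( "  first", "  second" );
print join( '', letter_lines_buggy( "  first", "  second" ) ), "\n";
```

The second print shows the same symptom as the "all 11111" output reported above: one 1 per line, because each line had exactly one substitution.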
