PerlMonks  

Converting fixed record length files to pipe delimited

by akm2 (Scribe)
on Feb 19, 2001 at 15:07 UTC


akm2 has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to fix up the web integration of an OLD MS-DOS system that uses fixed-record-length ASCII text files to store data.

The file I'm trying to convert is located at http://www.mscorp.org/bksic.txt

My desired output is to take every field defined, except those labeled as trash in my Perl code, cut the spaces off the front and end, then put a pipe between each field and save the output to a file.

Any advice would help. My Perl code is below.

#LOAD REQUIRED PACKAGES & SET GLOBAL VARIABLES
$file = "bksic.txt";
$outfile = "bksic.out";

#READ & PARSE STOCK INVENTORY CONTROL
#DEFINE FIELDS
@BKSIC_MFG = ();
@BKSIC_MODEL = ();
@BKSIC_SPEC1 = ();
@BKSIC_SPEC2 = ();
@BKSIC_SPEC3 = ();
@BKSIC_SPEC4 = ();
@BKSIC_OPT1 = ();
@BKSIC_OPT2 = ();
@BKSIC_OPT3 = ();
@BKSIC_OPT4 = ();
@BKSIC_DESC = ();
@BKSIC_QOH = ();
@BKSIC_TRASH1 = ();
@BKSIC_MULT = ();
@BKSIC_TRASH2 = ();
@BKSIC_LIST = ();
@BKSIC_TRASH3 = ();

#OPEN, READ & PARSE DATAFILE
open(DATAFILE,"<$file") || die "I can NOT open $file please fix the problem!\n";
@products = ();
while ($line=<DATAFILE>) {
    if ((substr($line, 3, 50)) !~ /COMPONENT PARTS/) {
        push (@products, $line);
    }
}
close (DATAFILE);

$i = 0;
foreach $line (@products){
    ($BKSIC_MFG[$i]) = substr ($line, 0, 3) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_MODEL[$i]) = substr ($line, 3, 50) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_SPEC1[$i]) = substr ($line, 53, 7) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_SPEC2[$i]) = substr ($line, 60, 7) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_SPEC3[$i]) = substr ($line, 67, 7) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_SPEC4[$i]) = substr ($line, 74, 7) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_OPT1[$i]) = substr ($line, 81, 6) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_OPT2[$i]) = substr ($line, 87, 6) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_OPT3[$i]) = substr ($line, 93, 6) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_OPT4[$i]) = substr ($line, 99, 6) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_DESC[$i]) = substr ($line, 105, 55) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_QOH[$i]) = substr ($line, 160, 4) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_TRASH1[$i]) = substr ($line, 164, 41) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_MULT[$i]) = substr ($line, 205, 6) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_TRASH2[$i]) = substr ($line, 211, 23) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_LIST[$i]) = substr ($line, 234, 7) =~ m/^\s*(.*)\s*$/;
    ($BKSIC_TRASH3[$i]) = substr ($line, 241, 169) =~ m/^\s*(.*)\s*$/;
    $i++;
}

open(DATAFILE,">>$outfile") || die "I can NOT open $outfile please fix the problem!\n";
$i = 0;
$x = @products;
while ($i <= $x) {
    print DATAFILE "$BKSIC_MFG[$i]|$BKSIC_MODEL[$i]|$BKSIC_SPEC1[$i]|$BKSIC_SPEC2[$i]|$BKSIC_SPEC3[$i]|$BKSIC_SPEC4[$i]|$BKSIC_OPT1[$i]|$BKSIC_OPT2[$i]|$BKSIC_OPT3[$i]|$BKSIC_OPT4[$i]|$BKSIC_DESC[$i]|$BKSIC_QOH[$i]|$BKSIC_TRASH1[$i]|$BKSIC_MULT[$i]|$BKSIC_TRASH2[$i]|$BKSIC_LIST[$i]|$BKSIC_TRASH3[$i]|\n";
    $i++;
}
close (DATAFILE);

Replies are listed 'Best First'.
OffTopic: Formatting Tags
by stefan k (Curate) on Feb 19, 2001 at 15:19 UTC
    PLEASE,
    be careful with those <pre> tags and use <code> tags to format your code.

    It would be useful if you told us what exactly is not working. A quick glance at your code shows that your style is somewhat un-Perlish ;-)

    Maybe you should make yourself comfortable with the split() function. You could do something like

    ($field1, $field2, undef, undef ...) = split /\s+/, $line;

    which discards the undef parts and lets you use the parts you wanted to have....
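
As a minimal sketch of the undef-in-a-list idea, the snippet here uses a made-up whitespace-separated record (the sample data and variable names are assumptions, not taken from bksic.txt). One caveat worth stating: splitting on whitespace only lines up with the fields when no field contains an embedded space and no field is blank, which fixed-width data does not guarantee.

```perl
use strict;
use warnings;

# Hypothetical whitespace-separated record (not from bksic.txt)
my $line = "ACM  WIDGET-9   12   0.75   blue";

# undef in the list throws away the value at that position;
# values beyond the last variable are simply dropped
my ($mfg, $model, undef, $price) = split /\s+/, $line;

print join('|', $mfg, $model, $price), "\n";
```

For records where a field such as a 50-character model name may itself contain spaces, position-based extraction with substr or unpack is the safer route.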

    Regards Stefan K

Re: Converting fixed record length files to pipe delimited
by agoth (Chaplain) on Feb 19, 2001 at 15:26 UTC
    Ditto the above comment: use code tags!

    One solution to your problem:

    • use unpack to get your data out into an array
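    • slice the array to discard the values you don't want
    • join the array with pipes
    # The widths in the unpack template are placeholders; substitute your real layout
    open(FILE, "<$file") or die "Can't open $file: $!\n";
    open(OUTFILE, ">$outfile") or die "Can't open $outfile: $!\n";
    while (<FILE>) {
        my @ary = unpack('A35 A30 A15', $_);   # cut the line into fixed-width fields
        my @tmp = @ary[0, 2];                  # slice: keep only the fields you want
        print OUTFILE join('|', @tmp), "\n";
    }
    close FILE;
    close OUTFILE;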
Re: Converting fixed record length files to pipe delimited
by davorg (Chancellor) on Feb 19, 2001 at 15:30 UTC

    I'd do it something like this (trying to reconstruct the spec by reverse engineering your script):

    open(INFILE, $file) or die "Can't open $file: $!\n";
    open(OUTFILE, ">$outfile") or die "Can't open $outfile: $!\n";

    # widths of the cols
    my @cols = qw(3 50 7 7 7 7 6 6 6 6 55 4 41 6 23 7 169);

    # build unpack format
    my $fmt = join '', map { "A$_" } @cols;

    # column names
    my @col_names = qw(mfg model spec1 spec2 spec3 spec4
                       opt1 opt2 opt3 opt4 desc qoh
                       trash1 mult trash2 list trash3);

    while (<INFILE>) {
        my %rec;
        # hash slice: assign every column in one go
        @rec{@col_names} = unpack $fmt, $_;

        # skip the header records
        next if $rec{model} eq 'COMPONENT PARTS';

        print OUTFILE join('|', @rec{@col_names}), "\n";
    }

    Which looks a bit simpler than your version :)
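
The question asked for the trash fields to be left out of the output. One way to get that with the hash-slice approach, sketched here under the assumption that the unwanted columns are exactly those whose names start with "trash", is to filter the name list with grep and slice with the filtered list. The sample record is built with pack purely for demonstration; its values are invented.

```perl
use strict;
use warnings;

# Widths and names as reverse-engineered from the script in the question
my @cols      = qw(3 50 7 7 7 7 6 6 6 6 55 4 41 6 23 7 169);
my @col_names = qw(mfg model spec1 spec2 spec3 spec4
                   opt1 opt2 opt3 opt4 desc qoh
                   trash1 mult trash2 list trash3);

my $fmt  = join '', map { "A$_" } @cols;
my @keep = grep { !/^trash/ } @col_names;   # every column except trashN

# Hypothetical record, space-padded to the full width by pack
my %in = (mfg => 'ACM', model => 'WIDGET 9', desc => 'Blue widget',
          qoh => '12', list => '19.99');
my $line = pack $fmt, map { exists $in{$_} ? $in{$_} : '' } @col_names;

my %rec;
@rec{@col_names} = unpack $fmt, $line;      # 'A' strips trailing spaces
print join('|', @rec{@keep}), "\n";
```

Note that the 'A' format only removes trailing padding; a field that is right-justified in its column would still need its leading spaces trimmed.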

    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

Re: Converting fixed record length files to pipe delimited
by arturo (Vicar) on Feb 19, 2001 at 15:33 UTC

    Basic technique: read the records in, trim them, then use join to generate the output form. Probably your best bet is to read in each line, put all the 'keeper' fields into an array, trim each element, then print the joined array out to a file. Here's one relatively easily grokkable way to do it:

    # for each line,
    # get fields you're keeping, put them into @fields
    # in the proper order
    foreach my $field (@fields) {
        $field =~ s/^\s*(.*?)\s*$/$1/;  # trim whitespace -- but beware!
        # two-command version (see the FAQ in perlfaq)
        # $field =~ s/^\s+//;
        # $field =~ s/\s+$//;
    }
    print OUTPUTFILEHANDLE join("|", @fields), "\n";
    # now process the next line
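
To illustrate why the trimming pattern deserves care, this small sketch compares the greedy capture used in the question with the non-greedy form and with the two-step version. The padded sample value is invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical padded field: 3 leading and 5 trailing spaces
my $raw = (' ' x 3) . 'WIDGET 9' . (' ' x 5);

# Greedy capture, as in the question: (.*) swallows the trailing spaces,
# so only the leading ones are removed
my ($greedy) = $raw =~ m/^\s*(.*)\s*$/;

# Non-greedy capture: the trailing spaces are left for the final \s*
my ($lazy) = $raw =~ m/^\s*(.*?)\s*$/;

# Two-step version: strip the front, then the back
(my $two_step = $raw) =~ s/^\s+//;
$two_step =~ s/\s+$//;

print "[$greedy]\n[$lazy]\n[$two_step]\n";
```

Only the last two forms remove the padding on both sides; the greedy pattern leaves the trailing spaces in place.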

    HTH

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

(boo) Re: Converting fixed record length files to pipe delimited
by boo_radley (Parson) on Feb 19, 2001 at 15:45 UTC
    This is the part unpack was born to play. It'll let you define field lengths and hands back the fields, which you can stuff into an array. I'm taking a guess as to what the field names might be, and what the field lengths would be; it'd have been more useful to know those than to actually see the text ;)
    If you wanted just the first 5 fields, and to output the first, third and second:
    ($part_code, $part_size, $part_size2, $quantity, $cryptic_field) =
        unpack("A54 A7 A7 A10 A5", $line_from_file);
    print OUT join("|", ($part_code, $part_size2, $part_size)), "\n";
    Make sense?
Re: Converting fixed record length files to pipe delimited
by tadman (Prior) on Feb 19, 2001 at 15:45 UTC
    Yikes! That code didn't get wrapped for some reason, and it's throwing the navigation table way off kilter. Anyway.

    The code you posted is really Perl 4 style, with a whole whack of arrays instead of the Perl 5 style Array of Arrays (or AoA as you will hear more often). AoA is a much easier way to implement what you have done. Easier is better, no?

    I would define your input file format, first, in a structure, and then write a loop to use this information to re-parse the file. Consider making an array that has only the start positions of each of the fields:
    # Define the format of the file
    my (@file_format) = (
        0, 3, 53, 60, 67, 74, # etc.
        241+169, # Last position, presumably
    );
    Now the length of each field $n, for substr() purposes, at least, is simply $file_format[$n+1] - $file_format[$n]. Note that the last entry in the table shouldn't be used, that is, $n should only go as high as $#file_format-1.
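
As a worked example of that arithmetic, the sketch here builds the full table of start positions from the field widths used in the substr() calls in the question, then recovers each length as the difference between neighbouring entries. The variable @widths is introduced only for this illustration.

```perl
use strict;
use warnings;

# Field widths taken from the substr() calls in the question
my @widths = qw(3 50 7 7 7 7 6 6 6 6 55 4 41 6 23 7 169);

# A running total gives the start position of each field,
# plus one final entry marking the end of the record
my @file_format = (0);
push @file_format, $file_format[-1] + $_ for @widths;

# $n stops one short of the last entry, which is only an end marker
for my $n (0 .. $#file_format - 1) {
    my $len = $file_format[$n + 1] - $file_format[$n];
    printf "field %2d: start %3d, length %3d\n", $n, $file_format[$n], $len;
}
```

Keeping the widths as the single source of truth means the start positions cannot drift out of step when a field size changes.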

    Now you can put each line into an array as you read it in, and then write it to a file straight away. Just open both files at the same time using two different filehandles, such as IN_FILE and OUT_FILE. You are putting your data into temporary arrays, but since the data is only used exactly once, there is no need to keep it around.
    my (@field_data);
    for (my $i = 0; $i < $#file_format; $i++) {
        $field_data[$i] = substr($_, $file_format[$i],
                                 $file_format[$i+1] - $file_format[$i]);
        # Clean up as required, by trimming
        $field_data[$i] =~ s/\s+$//;
    }
    print OUT_FILE join('|', @field_data), "\n";
    If you want, you can use unpack instead, but apart from stylistic differences, there is no real point unless you need maximum speed (i.e. for 5 million line files, or what have you).
Re: Converting fixed record length files to pipe delimited
by unixwzrd (Beadle) on Feb 20, 2001 at 01:32 UTC
    I had a similar situation where I had a fixed length file generated on a mainframe. I actually used this to do some edits and inserted rows into an Oracle database, but I've shortened it a bit here and used joining the record with a "pipe":
    #!/usr/bin/perl
    use strict;

    # field order within each record
    my @record_layout = qw(
        state_code place_code state_alpha_code class_code
        place_name county_code county_name zip_code
    );

    # unpack type and width for each field
    my %field_types = (
        state_code       => 'A2',
        place_code       => 'A5',
        state_alpha_code => 'A2',
        class_code       => 'A2',
        place_name       => 'A52',
        county_code      => 'A3',
        county_name      => 'A22',
        zip_code         => 'A5'
    );

    my %fips_data;
    my $fips_template = join(" ", @field_types{@record_layout});

    while (my $fips_line = <>) {
        @fips_data{@record_layout} = unpack($fips_template, $fips_line);
        next if $fips_data{'state_code'} == 52;
        print STDOUT join('|', @fips_data{@record_layout}), "\n";
    }
    Update: This post and its follow-up keep getting panned. It would be nice to get some constructive criticism rather than watching the numbers continue to fall on this, after all I would like to know what's wrong or could be done better so I can grow as a Perl programmer.

    Thanks,
    Mike

    "The two most common elements in the universe are hydrogen... and stupidity."
    Harlan Ellison

      Oh, one other thing I forgot to mention: I was only using "A" data types, but this method would work for any type of fixed-length record with binary or other embedded data types in it. Simply change the field types for the record layout...
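
A minimal sketch of that idea, assuming a made-up 10-byte record with a 4-character code followed by a 16-bit and a 32-bit big-endian unsigned integer (the layout, field names and values are all hypothetical). Only the entries in the type table change; the template building and hash slice stay the same.

```perl
use strict;
use warnings;

# Hypothetical layout mixing text and binary fields:
# 'A4' = 4 space-padded characters, 'n' = 16-bit big-endian, 'N' = 32-bit big-endian
my @record_layout = qw(code qty serial);
my %field_types   = (code => 'A4', qty => 'n', serial => 'N');
my $template      = join ' ', @field_types{@record_layout};

# Build a sample record with pack, then read it back with the same template
my $record = pack $template, 'AB', 300, 70000;

my %data;
@data{@record_layout} = unpack $template, $record;
print join('|', @data{@record_layout}), "\n";
```

Binary records are not line-oriented, so a real reader would fetch them with read() and a fixed byte count rather than with the line-input operator.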

      Mike

