
Re: Reading File and Seperating into columns

by boftx (Deacon)
on Sep 25, 2013 at 22:58 UTC (#1055756)

in reply to Reading File and Seperating into columns

For starters, I noticed that 99HEADER appears twice in the data before 99TERMIN does (the second time is 13 lines before the 99TERMIN.)

This is one of the strangest data files I have seen, but that aside, I suspect there is a problem lurking in the wings if you simply treat every delimited record as having the same data structure when they clearly don't. It seems to me that the first field is a record type code and that each type has its own definition that should be honored. This is quite common in data transfer files, and I think it would be obvious if this were in XML instead of a bastardized fixed-length record structure. (Leave it to the insurance industry to screw this up.)

Thankfully you apparently don't need to break apart the data before the 99HEADER, but those records you are concerned with are in fact easy to work with as pipe (|) delimited and I wouldn't bother converting to CSV for all the reasons given above.

What I would be concerned with are those type codes in the first field, especially since there are repeating field types, including the 99HEADER type.
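To make the type-code idea concrete, here is a minimal sketch of splitting pipe-delimited records and tallying them by the code in the first field. The sample lines and their field contents are invented for illustration (the real layout is not shown in this thread); only the record type names 99HEADER, 99INSFAC and 99TERMIN come from the discussion.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real field layout is an assumption.
my @lines = (
    '99HEADER|COVER-A|2013-09-25',
    '99INSFAC|AGE|2013-09-25|1.25',
    '99INSFAC|AREA||0.90',
    '99TERMIN|',
);

my %count_by_type;
for my $line (@lines) {
    # Escape the pipe: a bare | in a regex means alternation, so split(/|/)
    # would split between every character. The -1 limit keeps trailing
    # empty fields instead of silently dropping them.
    my ($rec_type, @fields) = split /\|/, $line, -1;
    $count_by_type{$rec_type}++;
    printf "%-8s has %d field(s)\n", $rec_type, scalar @fields;
}

# Summary of how often each record type occurred.
print "$_: $count_by_type{$_}\n" for sort keys %count_by_type;
```

Seeing which type codes occur, and how many fields each carries, is a cheap first check before writing a per-type definition.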

If I had to guess it looks like you are dealing with a single account record that has coverage information on multiple vehicles (with 99HEADER indicating the start of a new vehicle.)

On a side note, Perl is ideal for processing this kind of data if you have the full API spec for the data file handy.

On time, cheap, compliant with final specs. Pick two.

Replies are listed 'Best First'.
Re^2: Reading File and Seperating into columns
by Jalcock501 (Sexton) on Sep 26, 2013 at 08:03 UTC
    Haha, you've almost nailed it, but it's not different vehicles but different types of cover.

    I'm just stuck and don't know how I am going to separate the fields into a readable format. Thanks for pointing out that 99HEADER appeared twice; that slipped by me.

    So this is what I have so far:
    #!/usr/bin/perl -w
    use strict;

    my @files = <*.in>;
    for my $file (@files) {
        open my $handle, '<', $file or die "Cannot read $file: $!";
        chomp(my @lines = <$handle>);
        close $handle;

        open my $write, '>', "$file.sep" or die "Cannot write $file.sep: $!";
        my @enr_data = grep {/^99/} @lines;
        s/99/\n99/g for (@enr_data);
        print {$write} "$_\n" for @enr_data;    # write the selected lines out
        close($write);
    }

    This separates the lines I need from the file. After more data analysis I realised that there are more areas where 99 factors appear. I basically just need to cut the fields up so that they can be read by your standard user.

      This is just a crude outline, but given the large number of different record types (judging by the values in field 1) I would use a hash whose keys are the various types you are interested in and whose values are hashrefs that include format strings for sprintf. Something like this:

      # This is NOT real code, but just a concept
      use feature 'say';

      # Keys need quotes: 99HEADER is not a valid bareword before =>
      my %record_types = (
          '99HEADER' => {
              format => "%s %s",
              code   => undef,
          },
          '99INSFAC' => {
              format => "%s %s %s %04.2f",
              code   => \&process_99insfac,
          },
      );

      # @input_lines is assumed to hold the lines already read from the file
      for my $line ( @input_lines ) {
          # The pipe must be escaped, otherwise it is regex alternation
          my ($rec_type, @rec_data) = split(/\|/, $line);
          next unless exists $record_types{$rec_type};

          # Call a pre-processor if present, maybe skip empty records.
          # You will probably need to play with this a bit, but it is
          # nifty when it works.
          my $code = $record_types{$rec_type}{code};
          next if defined($code) && !$code->( data => \@rec_data );

          say sprintf($record_types{$rec_type}{format}, @rec_data);
      }

      exit;

      sub process_99insfac {
          my %args = @_;
          my $rec_data = $args{data};     # keep the reference so changes reach the caller

          return unless $rec_data->[2];   # no date? nothing to do
          # $rec_data->[3] = some calculation;   # do something nifty here

          return 1;
      }

      This is a very crude presentation, but I think you can get the idea and can see that you can take advantage of the type hash by adding more info such as code references to sub-routines to do any special processing if needed. You would need to track entering and leaving each new record structure, but I doubt you would have much trouble with that logic. This approach should give you a lot of flexibility for layout and dealing with the different sub-record types.
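The "entering and leaving" bookkeeping mentioned above can be sketched roughly as follows. This assumes, as discussed earlier in the thread, that each 99HEADER opens a new block (one type of cover) and that 99TERMIN closes the last one. The sample records, the field contents, and the names @groups and $current are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample records; field contents are invented.
my @lines = (
    '99HEADER|COVER-A',
    '99INSFAC|AGE|1.25',
    '99HEADER|COVER-B',
    '99INSFAC|AREA|0.90',
    '99TERMIN|',
);

my @groups;     # one entry per 99HEADER block
my $current;    # the block currently being filled, if any

for my $line (@lines) {
    my ($rec_type, @fields) = split /\|/, $line, -1;

    if ($rec_type eq '99HEADER') {
        # A new header implicitly closes the previous block and opens another.
        $current = { header => \@fields, records => [] };
        push @groups, $current;
    }
    elsif ($rec_type eq '99TERMIN') {
        $current = undef;    # leave the last block
    }
    elsif ($current) {
        # Any other record belongs to the block we are currently inside.
        push @{ $current->{records} }, { type => $rec_type, fields => \@fields };
    }
}

for my $group (@groups) {
    print "Block: @{ $group->{header} }\n";
    print "  $_->{type}: @{ $_->{fields} }\n" for @{ $group->{records} };
}
```

Once the records are grouped like this, each block can be handed to the per-type formatting from the dispatch hash without the loop needing to know where one cover ends and the next begins.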

      Update: added example for a preprocessor code ref.

      On time, cheap, compliant with final specs. Pick two.
