PerlMonks  

Re: COBOL Layout parsing

by mAsterdam (Sexton)
on Mar 29, 2002 at 09:39 UTC (id://155220)


in reply to COBOL Layout parsing

Broomberg:

     ...write a generic layout parser which includes data conversion.

There is an awful lot of data out there in files stored with COBOL FD/WS layouts, waiting to be extracted and reported: an awful lot of pathological rubbish waiting to be eclectically listed.

In other words: this sounds like exactly the kind of problem Perl was made for. There are little languages that deal with such data efficiently, but they are quite expensive (Easytrieve Plus, for instance). Somehow the efforts to make Perl deal with this data have so far been either non-existent, unsuccessful, or not widely accepted (and thus not on CPAN). These days even Linux is swimming into the pool of zSeries mainframes, so now might be a good time to get this going.

I for one would be very interested.

What modules would be necessary in a good COBOL-data toolkit?

Modules?
by broomberg (Initiate) on Mar 29, 2002 at 14:12 UTC
    Hmm.

    I guess I'd start the discussion by showing a COBOL layout, and then pointing to the pieces that would need to be dealt with.

    We'd need an object that took the layout as a create arg. It would parse out all the fields, setting up an array of field objects. Each field object would have enough info to read/understand that field type.
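A minimal sketch of that first step, assuming a much simplified copybook grammar: comment lines start with `*`, every entry ends with a period, and only level, name, PIC, COMP-3, OCCURS n and REDEFINES are recognised. The function names and the hash layout of a field descriptor are hypothetical, and plain hashes stand in for field objects:

```perl
use strict;
use warnings;

# Storage size in bytes of an elementary item, from its PIC string and usage.
# Simplified: only X, A, 9, S and V symbols, with optional repeat counts such
# as X(10) or 9(12). S (without SIGN SEPARATE) and V take no storage.
sub pic_length {
    my ($pic, $usage) = @_;
    my $digits = 0;
    my $s = uc $pic;    # keep in a variable so //g can track its position
    while ($s =~ /([XA9SV])(?:\((\d+)\))?/g) {
        my ($sym, $n) = ($1, $2 // 1);
        $digits += $n unless $sym eq 'S' or $sym eq 'V';
    }
    # COMP-3 packs two digits per byte plus a sign nibble;
    # DISPLAY uses one byte per character position.
    return $usage eq 'COMP-3' ? int($digits / 2) + 1 : $digits;
}

# Turn copybook text into a list of field descriptors (hash refs).
sub parse_layout {
    my ($text) = @_;
    # Drop comment lines, then treat "period + blank/end" as entry separator.
    my $src = join ' ', grep { !/^\s*\*/ } split /\n/, $text;
    my @fields;
    for my $entry (split /\.(?:\s+|\z)/, $src) {
        next unless $entry =~ /^\s*(\d\d?)\s+([\w-]+)(.*)$/;
        my ($level, $name, $rest) = ($1, $2, $3);
        my %f = (level => $level + 0, name => $name);
        $f{usage}     = $rest =~ /\bCOMP(?:UTATIONAL)?-3\b/i ? 'COMP-3' : 'DISPLAY';
        $f{pic}       = $1 if $rest =~ /\bPIC(?:TURE)?\s+(?:IS\s+)?(\S+)/i;
        $f{occurs}    = $1 if $rest =~ /\bOCCURS\s+(\d+)/i;
        $f{redefines} = $1 if $rest =~ /\bREDEFINES\s+([\w-]+)/i;
        $f{length}    = pic_length($f{pic}, $f{usage}) if defined $f{pic};
        push @fields, \%f;
    }
    return \@fields;
}
```

Group items (no PIC) get no length here; a real parser would compute it from their children.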

    I'd have a higher level structure that would take into account the physical read/unpack of the data, which would then be mapped back to the individual fields.
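A minimal sketch of that read/unpack layer, assuming the parser has already flattened the layout into elementary fields with a byte length each (the field list and helper names here are hypothetical): one `unpack` per record with an `a<len>` item per field, then the values are mapped back onto the field names.

```perl
use strict;
use warnings;

# Hypothetical flat field list (elementary items only, in storage order);
# in practice this would come from the layout parser.
my @fields = (
    { name => 'Rec-id',        length => 6  },
    { name => 'Date',          length => 10 },
    { name => 'Currency-code', length => 3  },
);

# One "a<len>" per field: raw bytes, no trimming or conversion yet.
sub unpack_template { return join ' ', map { "a$_->{length}" } @_ }

sub record_length { my $n = 0; $n += $_->{length} for @_; return $n }

# Unpack the physical record once, then map the values back onto the fields.
sub unpack_record {
    my ($record, @flds) = @_;
    my @values = unpack unpack_template(@flds), $record;
    my %rec;
    @rec{ map { $_->{name} } @flds } = @values;
    return \%rec;
}
```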

    In the case of 'redefines', I'd need a higher level parent object, then children that refer back to it, into pieces of it. We can't just chunk out the data since either object modifying it should be modifying the same data.
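One way to get that sharing, sketched with hypothetical class names: the parent owns the record bytes, and each child is only an (offset, length) window onto them, read with `substr` and written back with 4-argument `substr`. Because a REDEFINES is just a second window over the same bytes, a write through one view is immediately visible through the other.

```perl
use strict;
use warnings;

# Hypothetical parent object: owns the bytes of one whole record.
package COBOLRecord;
sub new { my ($class, $bytes) = @_; return bless { data => $bytes }, $class }

# Hypothetical child object: a named window (offset, length) onto a parent's
# bytes. Several views may overlap, which is what REDEFINES means.
package COBOLView;
sub new {
    my ($class, $record, $offset, $length) = @_;
    return bless { rec => $record, off => $offset, len => $length }, $class;
}

# Read the bytes this view covers, straight from the shared record.
sub get {
    my ($self) = @_;
    return substr $self->{rec}{data}, $self->{off}, $self->{len};
}

# Write in place: pad/truncate to the field length (space padding assumes
# ASCII data), then splice into the shared record with 4-argument substr.
sub set {
    my ($self, $value) = @_;
    my $v = substr $value . (' ' x $self->{len}), 0, $self->{len};
    substr $self->{rec}{data}, $self->{off}, $self->{len}, $v;
    return $self;
}

package main;
```

For the `Payment` layout, `Date` would be a view at offset 6, length 10, and `MM` (from `EEEE-MM-DD Redefines Date`) a view at offset 11, length 2; setting `MM` to `04` changes what `Date` returns.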

    In the case of 'occurs', we'd need arrays of objects.
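For OCCURS, Perl's `unpack` already supports repeated groups: `(A50)4` returns four values. A minimal sketch with hypothetical helper names, covering an OCCURS on an elementary item (an array of strings) and on a group item (an array of hashes), assuming fixed-size occurrences (no OCCURS DEPENDING ON):

```perl
use strict;
use warnings;

# OCCURS on an elementary item, e.g. 05 Desc-line occurs 4 pic X(50):
# "(A50)4" repeats the A50 item four times; "A" strips trailing blanks.
sub unpack_occurs {
    my ($bytes, $count, $len) = @_;
    return [ unpack "(A$len)$count", $bytes ];
}

# OCCURS on a group item: every occurrence is cut out of the buffer and
# unpacked with the group's own template into a hash keyed by item name.
sub unpack_occurs_group {
    my ($bytes, $count, $elem_len, $template, @names) = @_;
    my @occurrences;
    for my $i (0 .. $count - 1) {
        my %item;
        @item{@names} = unpack $template, substr($bytes, $i * $elem_len, $elem_len);
        push @occurrences, \%item;
    }
    return \@occurrences;
}
```

Note that COBOL subscripts start at 1, so `Desc-line(1)` would be element 0 of the returned array.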

    If the data has no 'comp' fields, I can do an EBCDIC-to-ASCII conversion of the entire record up front. If it does, we've got to convert only the ranges of bytes between the comp fields.
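A minimal sketch of that per-range conversion, assuming IBM code page 37 (Encode's `cp37`) and only DISPLAY and COMP-3 fields; the field-descriptor shape and helper names are hypothetical. Text fields are decoded through the code page, while packed-decimal fields are unpacked from their raw bytes and never translated:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Packed decimal (COMP-3): two digits per byte, low nibble of the last byte is
# the sign (D = negative, C or F = positive). $scale = digits after the V.
# Returned as a string so amounts are not rounded by floating point.
sub unpack_comp3 {
    my ($bytes, $scale) = @_;
    my $digits = unpack 'H*', $bytes;    # lowercase hex, one char per nibble
    my $sign   = chop $digits;
    die "not packed decimal: $digits$sign\n" if $digits =~ /[^0-9]/;
    $digits =~ s/^0+(?=\d)//;            # drop leading zeros, keep one digit
    if ($scale) {
        $digits = ('0' x ($scale + 1 - length($digits))) . $digits
            if length($digits) <= $scale;
        substr($digits, -$scale, 0) = '.';
    }
    return ($sign eq 'd' ? '-' : '') . $digits;
}

# Convert a whole record field by field: DISPLAY ranges go through an EBCDIC
# code page, COMP-3 ranges are unpacked from the raw bytes.
sub decode_record {
    my ($record, $fields, $codepage) = @_;
    my %out;
    my $pos = 0;
    for my $f (@$fields) {
        my $bytes = substr $record, $pos, $f->{length};
        $pos += $f->{length};
        $out{ $f->{name} } = ($f->{usage} // 'DISPLAY') eq 'COMP-3'
            ? unpack_comp3($bytes, $f->{scale} // 0)
            : decode($codepage, $bytes);
    }
    return \%out;
}
```

Returning the amount as a string is a deliberate choice: a 9(12)V99 value would otherwise be subject to floating-point rounding once the implied decimal point is applied.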

      What are we talking about? Show us the code. Well, ok.

      Is this what usage could look like?

      DISCLAIMER
      The COBOL might be a bit rusty (1987) and the perl is definitely a little speculative :-) so if I made mistaeks please correct them, but do not bother to check matching ({[< and the like - I didn't - the code is not meant to be run yet. It is meant to illustrate the questions and comments below the code.

#!/usr/bin/perl -w
# ModuleWithAcceptableNameThatReflectsThePurpose     Q1
use COBOLstorage;

$datafile = "somepathtothedata.coboldata";

$COBOLrecord = <<'FD1';                              # C1
      * paymenttrack-file
       01  Payment.
           03 Rec-id               pic X(6)  value "PAY001".
           03 Date                 pic X(10) value spaces.
           03 EEEE-MM-DD Redefines Date.
              05 EEEE              pic 9999.
              05 FILLER            pic X.
              05 MM                pic 99.
              05 FILLER            pic X.
              05 DD                pic 99.
           03 Amount               pic 9(12)V99 comp-3 value zeroes.
           03 Currency-code        pic XXX   value spaces.
           03 Description.
              05 Desc-line occurs 4 pic X(50) value spaces.
           03 Originator.
              05 Ident-type-code   pic x     value spaces.
      *          'P' Natural person
      *          'B' Business
      *          'N' National/Local Gov Agency
      *          'S' Supranational Agency
              05 Ident.
                 07 Accountnumber  pic 9(10) comp-3 value zeroes.
                 07 N-Name.
                    09 Familyname  pic X(36) value spaces.
                    09 Title       pic X(6)  value spaces.
                    09 Initials    pic X(6)  value spaces.
                    09 FILLER      pic X(10) value spaces.
                 07 O-Name redefines N-Name.
                    09 OrganisationName pic X(48).
                    09 Contact-nick     pic X(10).
                 07 Address.
                    09 Street       pic X(48) value spaces.
                    09 Nr           pic X(8)  value spaces.
                    09 Postal-code  pic X(7)  value spaces.
                    09 Country-code pic XX    value spaces.
           03 Beneficiary.
              05 Ident-type-code   pic x     value spaces.
      *          'P' Natural person
      *          'B' Business
      *          'N' National/Local Gov Agency
      *          'S' Supranational Agency
              05 Ident.
                 07 Accountnumber  pic 9(10) comp-3 value zeroes.
                 07 N-Name.
                    09 Familyname  pic X(36) value spaces.
                    09 Title       pic X(6)  value spaces.
                    09 Initials    pic X(6)  value spaces.
                    09 FILLER      pic X(10) value spaces.
                 07 O-Name redefines N-Name.
                    09 OrganisationName pic X(48).
                    09 Contact-nick     pic X(10).
                 07 Address.
                    09 Street       pic X(48) value spaces.
                    09 Nr           pic X(8)  value spaces.
                    09 Postal-code  pic X(7)  value spaces.
                    09 Country-code pic XX    value spaces.
FD1

open(IN, $datafile) or die "Cannot open $datafile: $!";
associate($blurk, $COBOLrecord);                     # Q2
my %total;
while (<IN>) {
    $blurk = $_;
    #========== this is where the module magic should take effect:
    # we can now use the COBOL data-item in perl.
    if ( $Description =~ /(?:salary|wage)/i ) {
        if ($Beneficiary.ident-type-code ne 'P') {   # Q3
            print << "WARNING";                      # C2
Check this! Salary paid to organisation?:
On $Date $Originator.Accountnumber paid $Beneficiary.Accountnumber
   $Currency-code $Amount
BENY: $Beneficiary.OrganisationName $Beneficiary.Contact-nick
"$Description"
WARNING
        }
        else {
            $total{$Currency-code} += $Amount;
        }
    }
}
print "Total salaries / wages in USD $total{USD}\n";

      So here is the preliminary list of questions:

      • Q1: What should the name be? Preferably something more fitting than the "COBOLstorage" used here.
      • C1: In actual practice this would come from a copy library or data dictionary. I put it here in the code to open it up for critique and amendments.
      • Q2: What should be the ways to associate data (both lvalues and rvalues, where appropriate) with a specific COBOL FD or WS entry, the COBOL way of structuring? I used "associate" here, but I think that may be too generic. "tie" comes to mind, but "tie" is richer than what I imagine this module should do, because "tie" also takes care of actually reading and storing the data if necessary. The module, however, should (IMO) limit itself to parsing and composing COBOL data.
      • Q3: Is this going to be the Perl 6 way of qualifying data, $Bigstructure.partofbigstructure.iteminpart? If so, let's go for that.
      • C2: This syntax is hopefully going to be OK in Perl 6.

      The example leaves a lot of questions unanswered:

      • What about arrays?
      • OCCURS DEPENDING ON: how? (Should it be @Description in the example print statement?)
      • 66 and 88 levels
      • Initialising and writing COBOL data
      • Machine dependencies
      • Locale dependencies
      ... we can only start at one place at a time. Maybe you could reflect on this, or on your vision of how you would name / use this module.

      HTH,

            Danny

        This would be useful, the idea being "I've got a copylib member, I've got a data file, I just want to process it in Perl". I have done this quite a bit in the past, but I always make up my own data storage, parsing code, etc.

        Q1: Maybe IO::COBOL?

        Q2: I'm not sure I agree with your assumptions here. My own take on such a module would be the ability to do something like

        IO::COBOL::ReadData($FILEHANDL, $COBOLRecord);
        IO::COBOL::WriteData($OUTFILEHANDL, $COBOLRecord);

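One detail such ReadData/WriteData calls would have to handle: COBOL data files usually hold fixed-length records with no newline, so a `while(<IN>)` loop like the one in the example above would split records in the wrong places. A minimal sketch, assuming fixed-length records and hypothetical helper names `read_record`/`write_record` (not an existing IO::COBOL API), uses Perl's record-read mode: setting `$/` to a reference to an integer makes `readline` return that many bytes at a time.

```perl
use strict;
use warnings;

# Read one fixed-length record; the caller should binmode the handle so the
# bytes (EBCDIC, COMP-3, ...) arrive untouched. Returns undef at end of file.
sub read_record {
    my ($fh, $reclen) = @_;
    local $/ = \$reclen;          # readline now returns $reclen-byte records
    my $rec = <$fh>;
    return unless defined $rec;
    die "short record (" . length($rec) . " of $reclen bytes)\n"
        if length($rec) != $reclen;
    return $rec;
}

# Write one fixed-length record, refusing anything of the wrong size.
sub write_record {
    my ($fh, $rec, $reclen) = @_;
    die "record is " . length($rec) . " bytes, expected $reclen\n"
        if length($rec) != $reclen;
    print {$fh} $rec or die "write failed: $!\n";
    return 1;
}
```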
        Q3: Omygod, I'm not ready for Perl 6.

        Other: Why not just add subscripts to the (target) data variables for an OCCURS clause? 88 levels are just static symbols. I would assume that all data has already been translated to ASCII before using this.

        Are you working on something? I can work on it if you don't have anything yet (although I've never written a CPAN module before)... HH
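Treating 88 levels as static symbols could look like this minimal sketch: each condition name maps to the list of values that make it true, and a helper answers the COBOL `IF condition-name` test. The condition names below are hypothetical 88 levels for `Ident-type-code`, based on the codes listed in the example layout:

```perl
use strict;
use warnings;

# Hypothetical 88-level condition names for Ident-type-code, e.g.
#   05 Ident-type-code pic x.
#      88 Natural-person  value 'P'.
#      88 Business        value 'B'.
#      88 Gov-agency      value 'N'.
#      88 Supranational   value 'S'.
#      88 Organisation    values 'B' 'N' 'S'.
my %CONDITION = (
    'Natural-person' => ['P'],
    'Business'       => ['B'],
    'Gov-agency'     => ['N'],
    'Supranational'  => ['S'],
    'Organisation'   => ['B', 'N', 'S'],
);

# Equivalent of COBOL "IF Organisation": true when the field's current
# value is one of the values listed for that condition name.
sub is_condition {
    my ($value, $name) = @_;
    my $values = $CONDITION{$name} or die "unknown 88 level: $name\n";
    return scalar grep { $_ eq $value } @$values;
}
```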
Re: Re: COBOL Layout parsing
by torsore (Initiate) on Mar 29, 2002 at 21:42 UTC
    I've been working/struggling with COBOL and Perl for the last year, so I would be extremely interested in seeing something of this nature on CPAN. The struggling comes from COBOL, I assure you. Everyone knows there's nothing wrong with Perl! ;)

    My needs are generally not (data|data layout) conversion oriented, but I could certainly use a module that parsed COBOL data layouts and plopped them into Perl data structures.

    I would like to see a way to deal with the myriad of COBOL dialects out there. It seems I am continuously reworking my scripts to deal with yet another COBOL dialect that my benevolent and sage employers have seen fit to jump right into. Different dialects of COBOL (Unisys, Micro Focus, IBM, Tandem, VAX, Wang, etc.) often handle data in different ways. The differences are usually subtle but very important, especially when dealing with comp fields. Would it make sense to have a base module implemented under ASCII COBOL (or some other standard) guidelines, with dialect-specific modules available that override the appropriate portions of the base?
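A minimal sketch of that base/override split, using binary (COMP) integer fields as the example of a dialect difference. The class names are hypothetical; the base class assumes big-endian storage (as on IBM mainframes), and the subclass stands for a dialect whose native binary fields are little-endian, overriding only the unpack template (2- and 4-byte fields only):

```perl
use strict;
use warnings;

# Hypothetical base dialect: binary fields are big-endian two's complement.
package COBOL::Dialect::Base;
my %BIG_ENDIAN = (2 => 's>', 4 => 'l>');   # COMP length in bytes => template
sub new { my ($class) = @_; return bless {}, $class }
sub comp_template {
    my ($self, $len) = @_;
    return $BIG_ENDIAN{$len} // die "unsupported COMP length: $len\n";
}
sub decode_comp {
    my ($self, $bytes) = @_;
    my ($n) = unpack $self->comp_template(length $bytes), $bytes;
    return $n;
}

# Hypothetical little-endian dialect: inherits everything, overrides only
# the byte order by flipping the ">" modifier to "<".
package COBOL::Dialect::LittleEndian;
our @ISA = ('COBOL::Dialect::Base');
sub comp_template {
    my ($self, $len) = @_;
    (my $t = $self->SUPER::comp_template($len)) =~ tr/>/</;
    return $t;
}

package main;
```

The same pattern would apply to any other dialect-specific piece (sign conventions, COMP-5, code pages): the base module implements one convention and a dialect module overrides only the methods that differ.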

    I know we are discussing a COBOL-data toolkit, but how about parsing COBOL code in general? I am just now starting to really research what is already out there so please be gentle on a newbie to the monastery! The same concerns over dialect apply here as well.
      I have started work on a module to do some COBOL data division parsing, you might want to check it out here.

      It doesn't address all your needs (I only have IBM COBOL II to work with), but it's a start.

      What more would you like to see? Thx... HH

      torsore:

           My needs are generally not (data|data layout) conversion oriented, ...

      Could you state more specifically what functionality you would like to see available?

      Thx,

          Danny
