http://www.perlmonks.org?node_id=155189

broomberg has asked for the wisdom of the Perl Monks concerning the following question:

O glorious ones.

Ok, enough ass kissing.

I'm not a newbie, but sure feel like it. I'd like to know if there is any Perl package that deals with COBOL layout conversions.

I've written a few attempts over the years, but every time I have one working for one project, it seems that the next project's layout is different enough to trigger a rewrite. I obviously don't know enough about COBOL (and would like to keep wallowing in my ignorance if possible) to write a generic layout parser which includes data conversion.

Here are the issues I am trying to deal with:

EBCDIC -> ASCII - Can't use 'dd' since I only want to convert non comp fields.

Comp fields, depending on which machine (endian wise) the data comes from.

Repeating fields.

Redefined fields.

Implied decimal.

88 levels for validation.
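For the comp and implied-decimal items in the list above, here is a minimal sketch of decoding a packed-decimal (comp-3) field in Perl. It assumes the common IBM convention that the last nibble carries the sign (0xD for negative) and that the scale comes from the digits after the V in the picture. The function name `unpack_comp3` is made up for illustration; other dialects may store these fields differently.

```perl
use strict;
use warnings;

# Decode a packed-decimal (COMP-3) field into a decimal string.
# $scale is the number of digits after the implied decimal point (the V).
# Assumption: the last nibble is the sign, with 0xD meaning negative.
sub unpack_comp3 {
    my ($bytes, $scale) = @_;
    my $hex  = unpack 'H*', $bytes;    # two hex digits per byte
    my $sign = chop $hex;              # trailing nibble holds the sign
    die "not packed decimal: $hex$sign\n" unless $hex =~ /^[0-9]+$/;

    # Pad so there is at least one digit before the decimal point.
    $hex = ('0' x ($scale + 1 - length($hex))) . $hex
        if length($hex) <= $scale;
    substr($hex, length($hex) - $scale, 0, '.') if $scale;
    $hex =~ s/^0+(?=[0-9])//;          # drop leading zeros, keep "0.05"
    return ($sign eq 'd' ? '-' : '') . $hex;
}

# pic 9(12)V99 comp-3 occupies 8 bytes: 14 digits, a sign nibble,
# and one leading pad nibble.
my $amount = pack 'H*', '000000000012345c';
print unpack_comp3($amount, 2), "\n";
```

Returning a string instead of a number avoids floating-point rounding on large amounts; the caller can hand it to Math::BigFloat if arithmetic is needed.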

I've used the EBCDIC Perl libs, which seem to work (but SLOW). I've actually split out the binary vs. text data in the past, run the text through a 'dd' conversion to ASCII, joined the binary back in, and then attempted to deal with the comp fields via various unpack hacks. Twas nasty.

Redefines are another killer. I'm used to them as 'C' unions, so I understand them, but I have a tough time twisting them into Perl-style hash ref records.
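The endian side of the comp problem can be handled with unpack's byte-order modifiers (available from Perl 5.10) rather than hand-written hacks. This is a sketch that assumes binary comp fields are signed two's-complement integers of 2 or 4 bytes; `unpack_comp` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Map field length in bytes to the signed integer unpack code.
my %CODE_FOR_LENGTH = (2 => 's', 4 => 'l');

# Decode a signed binary (COMP) field.
# $order is 'big' (typical for mainframe data) or 'little' (typical for x86).
sub unpack_comp {
    my ($bytes, $order) = @_;
    my $code = $CODE_FOR_LENGTH{ length($bytes) }
        or die "unsupported COMP length\n";
    my $template = $code . ($order eq 'big' ? '>' : '<');
    return unpack($template, $bytes);
}

print unpack_comp("\x01\x02", 'big'),    "\n";   # same bytes,
print unpack_comp("\x01\x02", 'little'), "\n";   # different values
```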

So, is this here (CPAN, etc) already and I simply missed it?

Also, due to data size, is this even reasonable? I may have to convert 500 million records on an on-going monthly basis. Should I accept the fact I'll need a hardcoded solution and just go from there?

Replies are listed 'Best First'.
Re: COBOL Layout parsing
by mAsterdam (Sexton) on Mar 29, 2002 at 09:39 UTC
    Broomberg:

         ...write a generic layout parser which includes data conversion.

    There is an awful lot of data out there in files stored with COBOL-FD/WS layouts waiting to be extracted and reported, an awful lot of pathological rubbish waiting to be eclectically listed.

    In other words: this sounds like a Perl problem of the first hour. There are little languages that deal with such data efficiently, but they are quite expensive (Easytrieve Plus, for instance). Somehow the efforts put into Perl dealing with them are either non-existent, unsuccessful, or not widely accepted (and thus not on CPAN) up to now. These days even Linux is swimming into the pool of z-Series mainframes, so now might be a good time to get this going.

    I for one would be very interested.

    What modules would be necessary in a good COBOL-data toolkit?

      Hmm.

      I guess I'd start the discussion by showing a COBOL layout, and then pointing to the pieces that would need to be dealt with.

      We'd need an object that took the layout as a create arg. It would parse out all the fields, setting up an array of field objects. Each field object would have enough info to read/understand that field type.
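A rough sketch of that parsing step, using plain hashes instead of field objects. It assumes one item per line, unedited pictures (only X, 9, S and V), display or comp-3 usage, and it ignores occurs and redefines entirely. The names `expand_pic` and `parse_layout` are invented for this example.

```perl
use strict;
use warnings;

# Expand repeat counts: "9(12)V99" becomes "999999999999V99".
sub expand_pic {
    my ($pic) = @_;
    $pic =~ s/(.)\((\d+)\)/$1 x $2/ge;
    return uc $pic;
}

# Return one hash per elementary item: level, name, pic, usage, scale, bytes.
sub parse_layout {
    my ($text) = @_;
    my @fields;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*\*/;                    # comment line
        next unless $line =~ /^\s*(\d+)\s+([\w-]+)(.*)$/;
        my ($level, $name, $rest) = ($1, $2, $3);
        next unless $rest =~ /\bpic\s+(\S+)/i;        # group items carry no PIC
        (my $raw = $1) =~ s/\.$//;                    # sentence-ending period
        my $pic    = expand_pic($raw);
        my $digits = () = $pic =~ /9/g;               # count the 9s
        my $scale  = $pic =~ /V(9*)$/ ? length($1) : 0;
        my $packed = $rest =~ /\bcomp-3\b/i;
        (my $chars = $pic) =~ tr/SV//d;               # assumed to take no storage
        push @fields, {
            level => $level + 0,
            name  => $name,
            pic   => $pic,
            usage => $packed ? 'comp-3' : 'display',
            scale => $scale,
            bytes => $packed ? int($digits / 2) + 1 : length($chars),
        };
    }
    return @fields;
}

my @fields = parse_layout(<<'FD');
 01 Payment.
    03 Rec-id        pic X(6) value "PAY001".
    03 Amount        pic 9(12)V99 comp-3 value zeroes.
    03 Currency-code pic XXX value spaces.
FD
printf "%-14s %-8s %2d bytes\n", @{$_}{qw(name usage bytes)} for @fields;
```

The byte count for comp-3 is the digit count plus one sign nibble, rounded up to whole bytes, which is why the 14-digit Amount needs 8 bytes.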

      I'd have a higher level structure that would take into account the physical read/unpack of the data, which would then be mapped back to the individual fields.

      In the case of 'redefines', I'd need a higher level parent object, then children that refer back to it, into pieces of it. We can't just chunk out the data since either object modifying it should be modifying the same data.
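One way to get that shared-data behaviour is to make every field a view (offset plus length) onto a single record buffer, so a redefining item and the item it redefines simply cover the same bytes. This is only a sketch; the `FieldView` class is hypothetical, and the offsets are worked out by hand for a record holding Rec-id followed by Date.

```perl
use strict;
use warnings;

# A field is a view (offset, length) onto a shared record buffer.
package FieldView;

sub new {
    my ($class, $bufref, $offset, $length) = @_;
    return bless { buf => $bufref, off => $offset, len => $length }, $class;
}

sub get {
    my ($self) = @_;
    return substr ${ $self->{buf} }, $self->{off}, $self->{len};
}

sub set {
    my ($self, $value) = @_;
    # pack 'A<n>' pads with spaces or truncates, so the record length is stable.
    my $fixed = pack "A$self->{len}", $value;
    substr ${ $self->{buf} }, $self->{off}, $self->{len}, $fixed;
    return;
}

package main;

my $record = 'PAY001' . '2002-03-29';
my $date  = FieldView->new(\$record, 6, 10);   # 03 Date pic X(10)
my $year  = FieldView->new(\$record, 6, 4);    # 05 EEEE, via REDEFINES
my $month = FieldView->new(\$record, 11, 2);   # 05 MM,   via REDEFINES

$year->set('2003');            # write through the redefinition...
print $date->get, "\n";        # ...and read it back through Date
```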

      In the case of 'occurs', we'd need arrays of objects.
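For a fixed occurs count on a display field, unpack's group syntax already yields a Perl array. A small sketch, assuming the data has already been converted to ASCII and that the item is `occurs 4 pic X(50)`:

```perl
use strict;
use warnings;

# Build a 200-byte Description group: four space-padded 50-byte slots.
my $description = pack '(A50)4', 'Salary March', 'Ref 12345', '', '';

# '(A50)4' repeats the A50 field four times; A strips trailing spaces.
my @desc_lines = unpack '(A50)4', $description;

print scalar(@desc_lines), " lines, first: '$desc_lines[0]'\n";
```

An array of field objects could be layered on top of the same template; `occurs depending on` would need the count read from another field first.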

      If the data has no 'comp' fields, I can do an EBCDIC to ASCII conversion up front on the entire record. If it does, we've got to apply the conversion only to the ranges of bytes that lie between the comp fields.
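The range-by-range conversion might look like the sketch below. The tr/// table is deliberately partial (space, period, hyphen, letters and digits, as found in the common EBCDIC code pages such as CP037); bytes outside it pass through unchanged, so a real conversion would need a complete table for the code page in use. `ebcdic_text_to_ascii` is a made-up name.

```perl
use strict;
use warnings;

# Translate only the listed [offset, length] ranges from EBCDIC to ASCII,
# leaving the comp fields between them untouched.
# Assumption: partial mapping only; unmapped bytes are left as they are.
sub ebcdic_text_to_ascii {
    my ($record, @ranges) = @_;
    for my $range (@ranges) {
        my ($off, $len) = @$range;
        (my $text = substr $record, $off, $len) =~
            tr/\x40\x4B\x81-\x89\x91-\x99\xA2-\xA9\xC1-\xC9\xD1-\xD9\xE2-\xE9\xF0-\xF9\x60/ .a-ij-rs-zA-IJ-RS-Z0-9\-/;
        substr $record, $off, $len, $text;
    }
    return $record;
}

# "PAY001" in EBCDIC, a 3-byte comp-3 value, then "USD" in EBCDIC.
my $ebcdic = "\xD7\xC1\xE8\xF0\xF0\xF1" . "\x40\x12\x3C" . "\xE4\xE2\xC4";
my $ascii  = ebcdic_text_to_ascii($ebcdic, [0, 6], [9, 3]);
print substr($ascii, 0, 6), ' ', substr($ascii, 9, 3), "\n";
```

Note that the packed field here contains 0x40, which is an EBCDIC space; a whole-record conversion would have silently turned it into 0x20 and corrupted the number.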

        What are we talking about? Show us the code. Well, ok.

        Is this what usage could look like?

        DISCLAIMER
        The COBOL might be a bit rusty (1987) and the perl is definitely a little speculative :-) so if I made mistaeks please correct them, but do not bother to check matching ({[< and the like - I didn't - the code is not meant to be run yet. It is meant to illustrate the questions and comments below the code.

        #!/usr/bin/perl -w
        # ModuleWithAcceptableNameThatReflectsThePurpose Q1
        use COBOLstorage;

        $datafile="somepathtothedata.coboldata";
        $COBOLrecord = <<'FD1';   # C1
        * paymenttrack-file
         01 Payment.
            03 Rec-id                   pic X(6) value "PAY001".
            03 Date                     pic X(10) value spaces.
            03 EEEE-MM-DD Redefines Date.
               05 EEEE                  pic 9999.
               05 FILLER                pic X.
               05 MM                    pic 99.
               05 FILLER                pic X.
               05 DD                    pic 99.
            03 Amount                   pic 9(12)V99 comp-3 value zeroes.
            03 Currency-code            pic XXX value spaces.
            03 Description.
               05 Desc-line occurs 4    pic X(50) value spaces.
            03 Originator.
               05 Ident-type-code       pic x value spaces.
        *         'P' Natural person
        *         'B' Business
        *         'N' National/Local Gov Agency
        *         'S' Supranational Agency
               05 Ident.
                  07 Accountnumber      pic 9(10) comp-3 value zeroes.
                  07 N-Name.
                     09 Familyname      pic X(36) value spaces.
                     09 Title           pic X(6) value spaces.
                     09 Initials        pic X(6) value spaces.
                     09 FILLER          pic X(10) value spaces.
                  07 O-Name redefines N-Name.
                     09 OrganisationName pic X(48).
                     09 Contact-nick    pic X(10).
                  07 Address.
                     09 Street          pic X(48) value spaces.
                     09 Nr              pic X(8) value spaces.
                     09 Postal-code     pic X(7) value spaces.
                     09 Country-code    pic XX value spaces.
            03 Beneficiary.
               05 Ident-type-code       pic x value spaces.
        *         'P' Natural person
        *         'B' Business
        *         'N' National/Local Gov Agency
        *         'S' Supranational Agency
               05 Ident.
                  07 Accountnumber      pic 9(10) comp-3 value zeroes.
                  07 N-Name.
                     09 Familyname      pic X(36) value spaces.
                     09 Title           pic X(6) value spaces.
                     09 Initials        pic X(6) value spaces.
                     09 FILLER          pic X(10) value spaces.
                  07 O-Name redefines N-Name.
                     09 OrganisationName pic X(48).
                     09 Contact-nick    pic X(10).
                  07 Address.
                     09 Street          pic X(48) value spaces.
                     09 Nr              pic X(8) value spaces.
                     09 Postal-code     pic X(7) value spaces.
                     09 Country-code    pic XX value spaces.
        FD1

        open(IN,$datafile);
        associate($blurk,$COBOLrecord); # Q2;
        my $total = 0;
        while(<IN>) {
            $blurk = $_;
            #========== this is where the module magic should take effect:
            # we can now use the COBOL data-item in perl.
            if ( $Description =~ /(?:salary|gage)/i ) {
                if ($Beneficiary.ident-type-code ne 'P') { # Q3
                    print << "WARNING"; # C2
        Check this! Salary paid to organisation?:
        On $Date $Originator.Accountnumber paid $Beneficiary.Accountnumber $Currency-code $Amount
        BENY: $Beneficiary.OrganisationName $Beneficiary.Contact-nick
        \"$Description\"\n"
        WARNING
                } else {
                    $total{$Currency-code} += $Amount;
                }
            }
        }
        print "Total salaries / gages in USD $total{"USD"} \n";

        So here is the preliminary list of questions:

        • Q1: What should the name be? Preferably something more fitting than the "COBOLstorage" used here.
        • C1: In actual practice this would come from a copy library or dictionary. I put it here in the code to open it up for critique / amendments.
        • Q2: What should be the ways to associate (both lvalues and rvalues, where appropriate) data with a specific COBOL-FD or WS entry, the COBOL way of structuring? I used "associate" here, but I think that may be too generic. "tie" comes to mind, but "tie" is richer than what I imagine this module should do, because "tie" also takes care of really reading and storing the data if necessary. The module however should (IMO) limit itself to parsing and composing COBOL data.
        • Q3: Is this going to be the perl 6 way of qualifying data? $Bigstructure.partofbigstructure.iteminpart If so, let's go for that.
        • C2: This syntax is hopefully going to be ok in perl 6.

        The example leaves a lot of questions unanswered:

        • - What about arrays?
        • - occurs depending on, how? (should it be @Description in the example print-statement?)
        • - 66, 88 levels
        • - initialising, writing COBOL-data
        • - machine dependencies
        • - locale dependencies
        ... we can only start at one place at a time. Maybe you could reflect on this, or on your vision of how you would name / use this module.

        HTH,

              Danny

      I've been working/struggling with COBOL and Perl for the last year, so I would be extremely interested in seeing something of this nature on CPAN. The struggling comes from COBOL, I assure you. Everyone knows there's nothing wrong with Perl! ;)

      My needs are generally not (data|data layout) conversion oriented, but I could certainly use a module that parsed COBOL data layouts and plopped them into Perl data structures.

      I would like to see a way to deal with the myriad of different COBOL dialects out there. It seems I am continuously reworking my scripts in order to deal with yet another COBOL dialect that my benevolent and sage employers have seen fit to jump right into. Different dialects of COBOL (Unisys, Microfocus, IBM, Tandem, VAX, Wang, etc.) often handle data in different ways. The differences are usually subtle, but very important, especially when dealing with comp fields. Would it make sense to have a base module implemented under ANSI COBOL (or some other standard) guidelines, with other dialect-specific modules available that would override the appropriate portions of the base?
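A base module with dialect overrides could be as simple as ordinary Perl inheritance. The sketch below uses invented package names (`CobolData::Base`, `CobolData::LittleEndian`) and reduces the "dialect difference" to the byte order of a 2-byte comp field, which is an assumption made purely for illustration.

```perl
use strict;
use warnings;

package CobolData::Base;            # hypothetical base module

sub new { my ($class) = @_; return bless {}, $class }

# Default assumption: 2-byte binary COMP stored big-endian.
sub comp_template { return 's>' }

sub decode_comp {
    my ($self, $bytes) = @_;
    return unpack($self->comp_template, $bytes);
}

package CobolData::LittleEndian;    # hypothetical dialect module

our @ISA = ('CobolData::Base');

# Override only the part that differs from the base.
sub comp_template { return 's<' }

package main;

for my $class (qw(CobolData::Base CobolData::LittleEndian)) {
    printf "%-24s %d\n", $class, $class->new->decode_comp("\x01\x02");
}
```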

      I know we are discussing a COBOL-data toolkit, but how about parsing COBOL code in general? I am just now starting to really research what is already out there so please be gentle on a newbie to the monastery! The same concerns over dialect apply here as well.
        I have started work on a module to do some COBOL data division parsing, you might want to check it out here.

        It doesn't address all your needs (I only have IBM COBOL II to work with), but it's a start.

        What more would you like to see? Thx... HH

        torsore:

             My needs are generally not (data|data layout) conversion oriented, ...

        Could you state more specifically what functionality you would like to see available?

        Thx,

            Danny

Re: COBOL Layout parsing
by PrakashK (Pilgrim) on Mar 29, 2002 at 04:38 UTC
    A quick CPAN search revealed the following modules: The first one is probably useful for part of what you are looking for.

    /prakash

      Yes, I've used it, as mentioned (poorly) in my post. But it was a bit too slow.
      I'll probably end up using a translation array or a series of tr// statements for that part of the problem.

Re: COBOL Layout parsing
by converter (Priest) on Mar 31, 2002 at 14:53 UTC

    When you mention "redefined fields," are you referring to variant records, where different record types are stored in the same data set and an indicator field is used to distinguish record type?

    If so, grab each record and use the type indicator to key into a hash of record layout definitions.
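A sketch of that dispatch, assuming the record has already been converted to ASCII and the first byte is the type indicator. The layouts, field names and the `parse_record` helper are hypothetical, loosely modelled on the N-Name / O-Name pair from the example layout earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical layouts keyed by a 1-byte record-type indicator.
my %LAYOUT = (
    P => {
        template => 'A1 A36 A6 A6',
        names    => [qw(type familyname title initials)],
    },
    B => {
        template => 'A1 A48 A10',
        names    => [qw(type organisation contact)],
    },
);

sub parse_record {
    my ($record) = @_;
    my $type   = substr $record, 0, 1;
    my $layout = $LAYOUT{$type} or die "unknown record type '$type'\n";
    my %row;
    # Hash slice: pair each field name with the matching unpacked value.
    @row{ @{ $layout->{names} } } = unpack $layout->{template}, $record;
    return \%row;
}

my $row = parse_record(pack 'A1 A48 A10', 'B', 'Acme Ltd', 'jdoe');
print "$row->{organisation} ($row->{contact})\n";
```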

    Also, I have a nice EBCDIC ascii/hex dump module here. If you'd like to give it a try I'd be happy to ship you a copy. It dumps ASCII with non-printables masked with dots followed by a two-row hex dump of the source data (unconverted), and can print a ruler line above each record. I use it to dump EBCDIC data sets to files for reference while I'm working on conversions and I've found it quite handy. I plan to submit the module to the CPAN one of these days and I'd love to have some folks test it a bit and suggest improvements.