Re: COBOL Layout parsing
by mAsterdam (Sexton) on Mar 29, 2002 at 09:39 UTC
|
Broomberg:
...write a generic layout parser which includes data conversion.
There is an awful lot of data out there in files stored with COBOL-FD/WS layouts waiting to be extracted and reported, an awful lot of pathological rubbish waiting to be eclectically listed.
In other words: this sounds like a Perl problem of the first hour. There are little languages that deal with such data efficiently, but they are quite expensive (Easytrieve Plus, for instance). Somehow the efforts put into Perl dealing with them are either non-existent, unsuccessful, or not widely accepted (and thus not on CPAN) up to now. These days even Linux is swimming into the pool of z-Series mainframes, so now might be a good time to get this going.
I for one would be very interested.
What modules would be necessary in a good COBOL-data toolkit?
|
Hmm.
I guess I'd start the discussion by showing a COBOL
layout, and then pointing to the pieces that would need
to be dealt with.
We'd need an object that took the layout as a create
arg. It would parse out all the fields, setting up
an array of field objects. Each field object would
have enough info to read/understand that field type.
I'd have a higher level structure that would take
into account the physical read/unpack of the data,
which would then be mapped back to the individual fields.
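
To make the idea concrete, here is a minimal sketch of the parsing step. The names pic_length and parse_layout are made up for illustration, and the sketch assumes IBM-style storage (one byte per DISPLAY position, packed COMP-3), that a redefining item is the same size as the item it redefines, and that OCCURS only appears on elementary items. It returns plain hashes rather than real field objects.

```perl
use strict;
use warnings;

# Hypothetical helper: storage length in bytes of one PIC clause.
sub pic_length {
    my ($pic, $usage) = @_;
    my $expanded = uc $pic;
    # Expand repeat counts: X(6) -> XXXXXX, 9(12)V99 -> fourteen 9s plus V.
    $expanded =~ s/(.)\((\d+)\)/$1 x $2/ge;
    # V (implied decimal point) and S (sign) take no storage in this sketch.
    my $positions = () = $expanded =~ /[X9A]/g;
    return $usage eq 'comp-3' ? int($positions / 2) + 1 : $positions;
}

# Hypothetical parser: returns an array ref of field descriptors
# (name, pic, usage, offset, length, occurs) for the elementary items.
sub parse_layout {
    my ($layout) = @_;
    my (@fields, %start_of);
    my $offset = 0;
    for my $line (split /\n/, $layout) {
        next if $line =~ /^\s*\*/;                       # comment line
        next unless $line =~ /^\s*(\d\d)\s+([\w-]+)(.*?)\.\s*$/;
        my ($level, $name, $rest) = ($1, $2, $3);
        # A REDEFINES item starts again where the redefined item started.
        if ($rest =~ /\bredefines\s+([\w-]+)/i) {
            $offset = $start_of{lc $1} // $offset;
        }
        $start_of{lc $name} = $offset;
        next unless $rest =~ /\bpic\s+(\S+)/i;           # group item
        my $pic    = $1;
        my $usage  = $rest =~ /\bcomp-3\b/i ? 'comp-3' : 'display';
        my $occurs = $rest =~ /\boccurs\s+(\d+)/i ? $1 : 1;
        my $length = pic_length($pic, $usage);
        push @fields, {
            level => $level, name   => $name,   pic    => $pic,
            usage => $usage, offset => $offset, length => $length,
            occurs => $occurs,
        };
        $offset += $length * $occurs;
    }
    return \@fields;
}

my $fields = parse_layout(<<'FD');
01 Payment.
   03 Rec-id pic X(6) value "PAY001".
   03 Date pic X(10) value spaces.
   03 Amount pic 9(12)V99 comp-3 value zeroes.
FD
printf "%-8s offset %2d length %2d\n", @{$_}{qw(name offset length)} for @$fields;
```

The higher-level structure would then use the offsets and lengths to cut each field out of the physical record.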
In the case of 'redefines', I'd need a higher level parent
object, then children that refer back to it, into pieces
of it. We can't just chunk out the data since either
object modifying it should be modifying the same
data.
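
One way to get that sharing, sketched under the assumption that every field is just an (offset, length) window onto a single record buffer. FieldView is a made-up class name, not an existing module; both views hold a reference to the same scalar, so a write through either one is visible through the other.

```perl
use strict;
use warnings;

# Hypothetical view object: a window (offset, length) onto a shared buffer.
package FieldView;

sub new {
    my ($class, $buffer_ref, $offset, $length) = @_;
    return bless { buf => $buffer_ref, offset => $offset, length => $length }, $class;
}

sub get {
    my ($self) = @_;
    return substr ${ $self->{buf} }, $self->{offset}, $self->{length};
}

sub set {
    my ($self, $value) = @_;
    # Pad or truncate to the field width so the record length never changes.
    my $fixed = sprintf '%-*.*s', $self->{length}, $self->{length}, $value;
    substr(${ $self->{buf} }, $self->{offset}, $self->{length}) = $fixed;
    return $self;
}

package main;

my $record = 'PAY001' . '2002-03-29';
my $date   = FieldView->new(\$record, 6, 10);   # 03 Date pic X(10)
my $mm     = FieldView->new(\$record, 11, 2);   # 05 MM, inside the redefinition
$mm->set('04');
print $date->get, "\n";   # the Date view sees the change made through MM
```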
In the case of 'occurs', we'd need arrays of objects.
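
For a simple fixed 'occurs' on an elementary item, unpack with a repeated group already gives the array. A small sketch, using the Desc-line layout from further down this thread and made-up sample data:

```perl
use strict;
use warnings;

# Four 50-byte lines, as in "Desc-line occurs 4 pic X(50)".
my $description = join '', map { sprintf '%-50s', $_ } 'March salary', 'ref 4711', '', '';

# (A50)4 repeats the A50 item four times; A strips the trailing spaces.
my @desc_lines = unpack '(A50)4', $description;
print scalar(@desc_lines), " lines, first: '$desc_lines[0]'\n";
```

An 'occurs' on a group item would need one sub-record per occurrence instead of one scalar.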
If the data has no 'comp' fields, I can do an EBCDIC
to ASCII conversion up front on the entire record.
Otherwise, we've got to apply the conversion only to the
ranges of bytes between the comp fields.
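
A rough sketch of the mixed case, assuming IBM packed decimal for comp-3 (two digits per byte, sign in the last nibble) and field descriptors carrying usage, offset and length. The function names are invented here, and the character conversion is passed in as a code ref because the table itself is a separate problem. Note that the division makes the value a floating-point number, so very long fields could lose precision.

```perl
use strict;
use warnings;

# Decode a packed-decimal (COMP-3) field. $scale is the number of digits
# after the implied decimal point (the V in the PIC clause).
sub unpack_comp3 {
    my ($bytes, $scale) = @_;
    my $hex  = unpack 'H*', $bytes;     # e.g. "12345c"
    my $sign = chop $hex;               # last nibble is the sign
    die "not packed decimal: $hex$sign"
        unless $hex =~ /^\d+$/ && $sign =~ /^[a-f]$/;
    my $value = $hex / 10 ** ($scale || 0);
    return $sign =~ /[bd]/ ? -$value : $value;   # B and D mean negative
}

# Convert only the character fields; COMP-3 bytes are left untouched.
sub convert_text_ranges {
    my ($record, $fields, $to_ascii) = @_;
    for my $f (grep { $_->{usage} eq 'display' } @$fields) {
        substr($record, $f->{offset}, $f->{length})
            = $to_ascii->(substr $record, $f->{offset}, $f->{length});
    }
    return $record;
}

# Made-up sample: +123.45 packed into 3 bytes, then "USD" in EBCDIC.
my $raw    = "\x12\x34\x5C" . "\xE4\xE2\xC4";
my @layout = (
    { name => 'Amount',        usage => 'comp-3',  offset => 0, length => 3 },
    { name => 'Currency-code', usage => 'display', offset => 3, length => 3 },
);
my $letters = sub {
    (my $s = shift) =~ tr/\xC1-\xC9\xD1-\xD9\xE2-\xE9/A-IJ-RS-Z/;
    return $s;
};

my $converted = convert_text_ranges($raw, \@layout, $letters);
print unpack_comp3(substr($converted, 0, 3), 2), ' ', substr($converted, 3, 3), "\n";
```

The byte 0x5C in the packed amount is a printable character in EBCDIC, so a whole-record conversion would have rewritten it and corrupted the amount.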
|
What are we talking about?
Show us the code. Well, ok.
Is this what usage could look like?
DISCLAIMER
The COBOL might be a bit rusty (1987) and the perl
is definitely a little speculative :-) so if I made mistaeks please correct them, but do not bother to check matching ({[< and the like - I didn't - the code is not meant to be run yet. It is meant to illustrate the questions and comments below the code.
#!/usr/bin/perl -w
# ModuleWithAcceptableNameThatReflectsThePurpose Q1
use COBOLstorage;
$datafile="somepathtothedata.coboldata";
$COBOLrecord = <<'FD1'; # C1
* paymenttrack-file
01 Payment.
03 Rec-id pic X(6) value "PAY001".
03 Date pic X(10) value spaces.
03 EEEE-MM-DD Redefines Date.
05 EEEE pic 9999.
05 FILLER pic X.
05 MM pic 99.
05 FILLER pic X.
05 DD pic 99.
03 Amount pic 9(12)V99 comp-3 value zeroes.
03 Currency-code pic XXX value spaces.
03 Description.
05 Desc-line occurs 4 pic X(50) value spaces.
03 Originator.
05 Ident-type-code pic x value spaces.
* 'P' Natural person
* 'B' Business
* 'N' National/Local Gov Agency
* 'S' Supranational Agency
05 Ident.
07 Accountnumber pic 9(10) comp-3 value zeroes.
07 N-Name.
09 Familyname pic X(36) value spaces.
09 Title pic X(6) value spaces.
09 Initials pic X(6) value spaces.
09 FILLER pic X(10) value spaces.
07 O-Name redefines N-Name.
09 OrganisationName pic X(48).
09 Contact-nick pic X(10).
07 Address.
09 Street pic X(48) value spaces.
09 Nr pic X(8) value spaces.
09 Postal-code pic X(7) value spaces.
09 Country-code pic XX value spaces.
03 Beneficiary.
05 Ident-type-code pic x value spaces.
* 'P' Natural person
* 'B' Business
* 'N' National/Local Gov Agency
* 'S' Supranational Agency
05 Ident.
07 Accountnumber pic 9(10) comp-3 value zeroes.
07 N-Name.
09 Familyname pic X(36) value spaces.
09 Title pic X(6) value spaces.
09 Initials pic X(6) value spaces.
09 FILLER pic X(10) value spaces.
07 O-Name redefines N-Name.
09 OrganisationName pic X(48).
09 Contact-nick pic X(10).
07 Address.
09 Street pic X(48) value spaces.
09 Nr pic X(8) value spaces.
09 Postal-code pic X(7) value spaces.
09 Country-code pic XX value spaces.
FD1
open(IN,$datafile);
associate($blurk,$COBOLrecord); # Q2;
my %total;
while(<IN>) {
$blurk = $_;
#========== this is where the module magic should take effect:
# we can now use the COBOL data-item in perl.
if ( $Description =~ /(?:salary|gage)/i ) {
if ($Beneficiary.ident-type-code ne 'P') { # Q3
print << "WARNING"; # C2
Check this! Salary paid to organisation?:
On $Date $Originator.Accountnumber paid $Beneficiary.Accountnumber
$Currency-code $Amount BENY: $Beneficiary.OrganisationName $Beneficiary.Contact-nick
\"$Description\"\n"
WARNING
}
else {
$total{$Currency-code} += $Amount;
}
}
}
print "Total salaries / gages in USD $total{USD} \n";
So here is the preliminary list of questions:
- Q1: What should the name be - preferably something
more fitting than "COBOLstorage" used here.
- C1: In actual practice this would come from a copylibrary or dictionary.
I put it here in the code to open it up for critique / amendments.
- Q2: What should be the ways to associate (both lvalues and rvalues, where appropriate) data with a specific COBOL-FD or WS entry, the COBOL way of structuring? I used "associate" here, but I think that may be too generic.
"tie" comes to mind, but "tie" is richer than what I imagine this
module should do, because "tie" also takes care of really
reading and storing the data if necessary. The module however should (IMO) limit itself to parsing and composing COBOLdata.
- Q3: Is this going to be the perl 6 way of qualifying data?
$Bigstructure.partofbigstructure.iteminpart
If so, let's go for that.
- C2: This syntax is hopefully going to be ok in perl 6.
The example leaves a lot of questions unanswered:
- - What about arrays?
- - occurs depending on, how? (should it be @Description in the example print-statement?)
- - 66, 88 levels
- - initialising, writing COBOL-data
- - machine dependencies
- - locale dependencies
... we can only start at one place at a time. Maybe you could reflect on this, or on your vision of how you would name / use this module.
HTH,
Danny
|
|
I've been working/struggling with COBOL and Perl for the last year, so I would be extremely interested in seeing something of this nature on CPAN. The struggling comes from COBOL, I assure you. Everyone knows there's nothing wrong with Perl! ;)
My needs are generally not (data|data layout) conversion oriented, but I could certainly use a module that parsed COBOL data layouts and plopped them into Perl data structures.
I would like to see a way to deal with the myriad of different COBOL dialects out there. It seems I am continuously reworking my scripts in order to deal with yet another COBOL dialect that my benevolent and sage employers have seen fit to jump right into. Different dialects of COBOL (Unisys, Microfocus, IBM, Tandem, VAX, Wang, etc.) often handle data in different ways. The differences are usually subtle but very important, especially when dealing with comp fields. Would it make sense to have a base module implemented under ASCII COBOL (or some other standard) guidelines, with other dialect-specific modules available that would override the appropriate portions of the base?
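
Something along these lines is what I have in mind, as a sketch only. The package names are invented, and the byte orders are illustrative assumptions rather than statements about any particular vendor: the base class decodes a 2-byte comp field one way and a dialect class overrides just that method.

```perl
use strict;
use warnings;

package CobolData::Base;            # hypothetical base module

sub new { my ($class) = @_; return bless {}, $class }

# Assumption for this sketch: a 2-byte COMP field is a big-endian signed integer.
sub decode_comp_halfword {
    my ($self, $bytes) = @_;
    return unpack 's>', $bytes;
}

sub describe {
    my ($self) = @_;
    return ref($self) . ' -> ' . $self->decode_comp_halfword("\x01\x02");
}

package CobolData::LittleEndian;    # hypothetical dialect module
our @ISA = ('CobolData::Base');

# Only the part that differs is overridden; everything else is inherited.
sub decode_comp_halfword {
    my ($self, $bytes) = @_;
    return unpack 's<', $bytes;
}

package main;

for my $class (qw(CobolData::Base CobolData::LittleEndian)) {
    print $class->new->describe, "\n";
}
```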
I know we are discussing a COBOL-data toolkit, but how about parsing COBOL code in general? I am just now starting to really research what is already out there so please be gentle on a newbie to the monastery! The same concerns over dialect apply here as well.
|
I have started work on a module to do some COBOL data division parsing, you might want to check it out here.
It doesn't address all your needs (I only have IBM COBOL II to work with), but it's a start.
What more would you like to see?
Thx... HH
|
torsore:
My needs are generally not (data|data layout) conversion oriented, ...
Could you state more specifically what functionality you would like to see available?
Thx,
Danny
Re: COBOL Layout parsing
by PrakashK (Pilgrim) on Mar 29, 2002 at 04:38 UTC
|
A quick CPAN search revealed the following modules:
The first one is probably useful for part of what you are looking for.
/prakash
|
Yes, I've used it, as mentioned (poorly) in my post.
But it was a bit too slow.
I'll probably end up using a translation array or
a series of tr// statements for that part of the problem.
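
For the record, the tr// approach could look roughly like this. It is a deliberately partial table (space, period, hyphen, letters and digits, using the code points these characters have in the common EBCDIC code pages); ebcdic_to_ascii is a made-up name, and a real version would need the full 256-entry table for the code page in use. Bytes outside the list pass through unchanged.

```perl
use strict;
use warnings;

# Partial EBCDIC -> ASCII translation with a single tr///.
# The leading "-" in the replacement list is a literal hyphen (EBCDIC 0x60).
sub ebcdic_to_ascii {
    my ($bytes) = @_;
    (my $text = $bytes) =~
        tr/\x60\x40\x4B\x81-\x89\x91-\x99\xA2-\xA9\xC1-\xC9\xD1-\xD9\xE2-\xE9\xF0-\xF9/- .a-ij-rs-zA-IJ-RS-Z0-9/;
    return $text;
}

print ebcdic_to_ascii("\xD7\xC1\xE8\xF0\xF0\xF1"), "\n";
```

Since tr/// builds its table at compile time, this should avoid the per-character overhead of a Perl-level loop.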
Re: COBOL Layout parsing
by converter (Priest) on Mar 31, 2002 at 14:53 UTC
|
When you mention "redefined fields," are you referring to variant records, where different record types are stored in the same data set and an indicator field is used to distinguish record type?
If so, grab each record and use the type indicator to key into a hash of record layout definitions.
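
A small sketch of that dispatch, assuming the type indicator is the first six bytes of every record and the data is already in ASCII. The record types, unpack templates and field names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical record types, keyed by the 6-byte type indicator.
my %layout_for = (
    PAY001 => { template => 'A6 A10 A3', names => [qw(rec_id date currency)] },
    HDR001 => { template => 'A6 A8',     names => [qw(rec_id run_date)] },
);

sub parse_record {
    my ($record) = @_;
    my $type   = substr $record, 0, 6;
    my $layout = $layout_for{$type} or die "unknown record type '$type'";
    my %field;
    # Hash slice: pair each field name with the matching unpacked value.
    @field{ @{ $layout->{names} } } = unpack $layout->{template}, $record;
    return \%field;
}

my $rec = parse_record('PAY0012002-03-29USD');
print "$rec->{date} $rec->{currency}\n";
```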
Also, I have a nice EBCDIC ascii/hex dump module here. If you'd like to give it a try I'd be happy to ship you a copy. It dumps ASCII with non-printables masked with dots followed by a two-row hex dump of the source data (unconverted), and can print a ruler line above each record. I use it to dump EBCDIC data sets to files for reference while I'm working on conversions and I've found it quite handy. I plan to submit the module to the CPAN one of these days and I'd love to have some folks test it a bit and suggest improvements.