plegall has asked for the wisdom of the Perl Monks concerning the following question:

My EBCDIC and COBOL skills are very low, I'm new to this old technology.

I have some COBOL records EBCDIC encoded files. Each file is a list of records. I want to convert it into a Perl data structure, perform some transformations on fields, convert back to COBOL data structure and EBCDIC encoding.

I've found the Convert::EBCDIC CPAN module of course, but the COBOL record definition is not obvious for me : I don't understand how to jump from the COBOL definition to the unpackeb pattern :-/ Prior to this field extraction, I don't understand how to extract lines from the file !

My COBOL definition is:


what should the pattern be to unpack fields?

Convert::EBCDIC module on CPAN

Replies are listed 'Best First'.
Re: EBCDIC and COBOL records
by Corion (Patriarch) on Jul 09, 2008 at 13:54 UTC

    I use the following module to parse COBOL copy structures and to create the appropriate decoders:

    package PIC; use strict; use Encode qw(decode encode); use POSIX qw(ceil); use Carp qw(croak); use vars qw(%parse_PIC); =head2 C<decode_COMP3> Decodes a BCD number with trailing sign. =cut sub decode_COMP3 { # 0C -> + # 0D -> - # 0F -> (unsigned) my $digits = unpack "H*", $_[0]; my $sign = chop $digits; #print "$digits\n"; if ($sign eq 'D' or $sign eq 'd') { $sign = '-' } elsif ($sign eq 'C' or $sign eq 'c') { $sign = '+' } elsif ($sign eq 'F' or $sign eq 'f') { $sign = ' ' } else { $digits .= $sign; $sign = '?' }; "$sign$digits" }; =head2 C<make_COMP3_decoder PRE, POST, SEP> Returns a subroutine that parses the digits into PRE places before and POST places after the decimal delimiter and returns a string with SEP as the decimal delimiter =cut sub make_COMP3_decoder { my ($pre,$post,$sep) = @_; $sep ||= ","; return sub { my $res = decode_COMP3($_[0]); substr($res,$pre+2,0) = $sep; $res }; }; =head2 C<decode_EBCDIC> Decodes an EBCDIC string to a Perl internal string Shorthand for decode( 'cp1047', $_[0] ) =cut sub decode_EBCDIC { decode( 'cp1047', $_[0] )}; # The patterns for things I recognize in the COBOL copy structures I e +ncounter %parse_PIC = ( qr/^PIC\s+S?9\s*\((\d+)\)$/ => sub { return 0+$1, \&decode_EBCDIC +}, qr/^PIC\s+X\s*\((\d+)\)$/ => sub { return 0+$1, \&decode_EBCDIC }, qr/^PIC\s+S?9\s*\((\d+)\)\s*COMP-3$/ => sub { return ceil(($1+1) / +2), \&decode_COMP3 }, qr/^PIC\s+S?9\s*\((\d+)\)V9\((\d+)\)\s*(?:USAGE\s+)?COMP-3$/ => su +b { return ceil(($1+$2+1)/2), make_COMP3_decoder($1,$2,$_[0]) }, qr/^PIC\s+S?9\s*\((\d+)\)V\s*COMP-3$/ => sub { return ceil(($1+1) +/2), \&decode_COMP3 }, qr/^PIC\s+S?9\s*\((\d+)\)\s+OCCURS\s+(\d+)$/ => sub { return $1*$2 +, \&decode_EBCDIC }, ); =head2 C<decode_PIC> Returns two values: =over 4 =item * Length in bytes of the PIC expression =item * Code reference that decodes the passed value according to the +copy structure =back =cut sub decode_PIC { my ($pic,$sep) = @_; for my $re (keys %parse_PIC) { if ($pic =~ /$re/) { return ($parse_PIC{$re}->($sep)); }; }; croak "Couldn't decode '$pic'"; }; =head2 C<PIC_length> Returns the length of the expression in bytes =cut sub PIC_length { my ($pic) = @_; my ($len,$decoder) = decode_PIC($pic); $len }; 1;
      Your package Pic seems to match one of my need, but can you give a example of how to use it ? I've tried to execute decode_PIC on each line of my COBOL definition or on the whole file, none work :-/

        Wow - more than 10 years and somehow I missed your reply.

        Here is a usage, roughly transcribed from what would happen in the Real World:

        my $struct = <<'DEF'; * Some comment 05 AMOUNT VALUE PIC S9(5)V99. 05 VALUE-DATE VALUE PIC X(8). DEF my @lines = split /\n/, $struct; my $buffer = ''; for my $line (@lines) { next if $line =~ /^ \*/; # comment $buffer .= $line; next unless $buffer =~ /\./; # the line did not end, we need more $buffer =~ /^\s+(\d+)\s+(.+)/; or croak "Weirdo input found: $buffer"; my( $level, $info) = ($1,$2); if( $level == 1 ) { # A record or group, ignore } elsif ($info !~ /^(\S+)\s*(REDEFINES\s+(\S+))?\s*(PIC\s+[9XS].*? +\.$/) { croak "Weirdo info found: $info"; }; my $name = $1; my $redefine = $2; my $pic = $3; my $size = 0; my $decoder = sub { $_[0] }; if( $pic ) { ($size, $decoder) = PIC::decode_pic($pic,$decimal_separator,$e +ncoding); }; # ... use $size and $decoder to read and decode a number of bytes
Re: EBCDIC and COBOL records
by NetWallah (Canon) on Jul 09, 2008 at 17:47 UTC
    The CPAN module will help with the (poorly named) FILLER-ASCII field. (There is a chance that the data is already 7-bit ASCII stored in 8-bit chunks)

    The other 2 "COMP" fields will need to be better defined:

    Comp (Computational)
    Comp (with no suffix) leaves the choice of the data type to the compiler writer. The intent of this data type is to make it the most efficient format on any given machine, which is usually some binary format. Because of this, comp varies greatly between platforms, more than most other types.
    This means that the COMP field, which is probably binary-numeric has a size that varies with the hardware and compiler.

    Your best bet is to use a system utility on the original harware to get a hex dump of a few records, then write code to deal with it. The last 2 words(likely to be 32-bit chunks) of the record probably correspond to the last 2 fields.

         Have you been high today? I see the nuns are gay! My brother yelled to me...I love you inside Ed - Benny Lava, by Buffalax

Re: EBCDIC and COBOL records
by Aim9b (Monk) on Jul 10, 2008 at 19:10 UTC
    It's been a while, but in the IBM BigIron world, the x(292) is character data, and the two 9(8) COMP fields are 4 bytes of up to 8 digits zero-filled. Page 217 of the zOS COBOL Language Ref. offers the following...

    Computational items
    A computational item is a value used in arithmetic operations. It must be numeric. If the USAGE of a group item is described with any of these items, the elementary items within the group have this usage. The maximum length of a computational item is 18 decimal digits, except for a PACKED-DECIMAL item. If the ARITH(COMPAT) compiler option is in effect, then the maximum length of a PACKED-DECIMAL item is 18 decimal digits. If the ARITH(EXTEND) compiler option is in effect, then the maximum length of a PACKED-DECIMAL item is 31 decimal digits. The PICTURE of a computational item can contain only:
    9 One or more numeric character positions
    S One operational sign
    V One implied decimal point
    P One or more decimal scaling positions

    COMPUTATIONAL-1 and COMPUTATIONAL-2 items (internal floating-point) cannot have PICTURE strings.

    BINARY Specified for binary data items.
    Such items have a decimal equivalent consisting of the decimal digits 0 through 9, plus a sign. Negative numbers are represented as the two’s complement of the positive number with the same absolute value. The amount of storage occupied by a binary item depends on the number of decimal digits defined in its PICTURE clause:
    Digits in PICTURE clause = Storage occupied
    1 through 4 = 2 bytes (halfword)
    5 through 9 = 4 bytes (fullword)
    10 through 18 = 8 bytes (doubleword)

    Binary data is big-endian: the operational sign is contained in the leftmost bit.
    BINARY, COMPUTATIONAL, and COMPUTATIONAL-4 data items can be affected by the BINARY and TRUNC compiler options. For information about the effect of these compiler options, see the Enterprise COBOL Programming Guide.

    COMPUTATIONAL or COMP (binary) This is the equivalent of BINARY. The COMPUTATIONAL phrase is synonymous with BINARY.

    Hope this helps. If you could post a sample record or two, I might be able to offer a bit more help. Good Luck.
      Here comes a set of files : definition1.txt comes with sample1.txt, definition2.txt comes with sample2.txt That's very kind of you if you can help :-)
        plegall, I'm sorry, but all I can do is rule OUT some things...

        First, Even though the name in your Data Division implies otherwise, your sample data is deffinitely not ASCII, it truely is EBCDIC.

        Second, the 9(8) Comp fields do not look like anything I've seen before in a COMP field. The comp's I'm familiar with contain only 0-9, with one C/D for the + or - Sign. (ie. like the folowing, for $123.45...
        0024 Zone portion
        0135 numeric portion

        or in the case of a signed value of -3697...

        So, my questions would be, what sort of data is this, and are you sure these are COMP fields? Sorry, I couldn't be more help. Good Luck. Aim9b.
Re: EBCDIC and COBOL records
by Anonymous Monk on Oct 14, 2011 at 15:47 UTC
    How did you get the job if you are new to the subject? Oh, I see. I hadn't read the name yet. I understand.