PerlMonks  

Check21/X9.37 text extractor

by delirium (Chaplain)
on Jul 31, 2009 at 19:53 UTC ( [id://784982]=sourcecode )
Category: Text Processing
Author/Contact Info Curtis Autery
Description: This is a simple script to extract the EBCDIC text from an X9.37-formatted file. For those unfamiliar, this is a file format used in banking that contains scanned check images mixed with flat-file data describing the account numbers, dollar amounts, etc. The file format is obfuscated, but straightforward. Each record is a 4-byte length field (read here as a big-endian integer) followed by that many bytes of EBCDIC text, with one exception: the "52 Record". The first two bytes of each record's data are the record number. Record 52 has 117 bytes of EBCDIC, and the remainder is binary TIFF data. This script has a flag that determines whether to ignore the binary TIFF data or export it to files.
#!/usr/bin/perl -w
use strict;
use Encode;

my $tiff_flag = 0;
my $count = 0;

open(FILE,'<',$ARGV[0]) or die 'Error opening input file';
binmode(FILE) or die 'Error setting binary mode on input file';

while (read (FILE,$_,4)) {
  my $rec_len = unpack("N",$_);
  die "Bad record length: $rec_len" unless ($rec_len > 0);
  read (FILE,$_,$rec_len);
  if (substr($_,0,2) eq "\xF5\xF2") {
    if ($tiff_flag) {
      $count++;
      open (TIFF, '>', $ARGV[0] . '_img' . sprintf("%04d",$count) . '.tiff')
        or die "Can't create image file";
      binmode(TIFF) or die 'Error setting binary mode on image file';
      print TIFF substr($_,117);
      close TIFF;
    }
    $_ = substr($_,0,117);
  }
  print decode ('cp1047', $_) . "\n";
}
close FILE;
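
As a small illustration of why the script compares against "\xF5\xF2": in cp1047 the digits 0-9 are encoded as 0xF0-0xF9, so the EBCDIC bytes F5 F2 decode to the text "52". The sketch below uses a made-up record body and a hypothetical helper named record_type; neither comes from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return the two-character record type of one record body.
sub record_type {
    my ($record) = @_;
    my $prefix = substr($record, 0, 2);    # first two EBCDIC bytes
    return decode('cp1047', $prefix);
}

# Build a tiny synthetic record (length prefix + EBCDIC text), for illustration only.
my $body   = encode('cp1047', '52 sample');
my $record = pack('N', length $body) . $body;

# The length prefix is a 4-byte big-endian integer, matching unpack("N", ...) above.
my $rec_len = unpack('N', substr($record, 0, 4));

printf "type=%s length=%d\n", record_type($body), $rec_len;
```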
Replies are listed 'Best First'.
Re: Check21/X9.37 text extractor
by jwkrahn (Abbot) on Jul 31, 2009 at 23:28 UTC

    open(FILE,'<',$ARGV[0]) or die 'Error opening input file';

    You should include the $! variable in the error message so you know why it failed to open:

    open FILE, '<:raw', $ARGV[0] or die "Error opening '$ARGV[0]' $!";


    while (read (FILE,$_,4)) {

    Just because you ask for 4 bytes does not mean the operating system will give you 4 bytes. You should verify that you actually received 4 bytes.


      read (FILE,$_,$rec_len);

    Same thing here.
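
One way to act on both remarks is to wrap read in a helper that keeps reading until it has the requested number of bytes and complains about a truncated file. This is only a sketch: read_exact is a hypothetical name, and the choice to die on a short read (rather than return what was read) is an assumption about what the script should do.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $len bytes from $fh.
# Returns nothing (undef in scalar context) at a clean end of file,
# and dies on a read error or a truncated record.
sub read_exact {
    my ($fh, $len) = @_;
    my $buf = '';
    while (length($buf) < $len) {
        # Append to $buf at its current end; read may return fewer bytes than asked.
        my $got = read($fh, $buf, $len - length($buf), length($buf));
        die "Read error: $!" unless defined $got;
        last if $got == 0;    # end of file
    }
    return if length($buf) == 0;
    die sprintf("Short read: wanted %d bytes, got %d\n", $len, length($buf))
        if length($buf) < $len;
    return $buf;
}
```

In the main loop this would replace both reads, along the lines of: while (defined(my $hdr = read_exact(\*FILE, 4))) { ... }.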


        if ($tiff_flag) {

    The only place that you assign any value to $tiff_flag is at the beginning where you set it to 0 so it will always be false.
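
If the flag is meant to be switchable without editing the script, one option is a command-line switch via the core Getopt::Long module. The sketch below is an assumption about how that could look; the --tiff option name and the parse_args wrapper are hypothetical and not part of the original script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: parse a --tiff / --no-tiff switch plus one input file name.
sub parse_args {
    my @args = @_;
    my $tiff_flag = 0;    # default: ignore the binary TIFF data
    GetOptionsFromArray(\@args, 'tiff!' => \$tiff_flag)
        or die "Usage: $0 [--tiff] file\n";
    die "Usage: $0 [--tiff] file\n" unless @args == 1;
    return ($tiff_flag, $args[0]);
}

# In the script this would be called as:
#   my ($tiff_flag, $file) = parse_args(@ARGV);
```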

      or die "Can't create image file";

    You should include the $! variable in the error message so you know why it failed to open.

      You should include the $! variable in the error message so you know why it failed to open:

      There's no need to add that to every statement in a quick script unless you expect to hit that error often. You can work out the error after the fact, because perl's die sets the exit code from $! when it is nonzero (falling back to $? >> 8, then 255). For example, if you print the exit code from the shell after you get this error message and find that it's 2, you can be sure the error was ENOENT (No such file or directory). (The higher error codes are platform-dependent, but you can still look up the strerror message for them with e.g. perl -we 'die($!=2)'.)
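
The exit-code behaviour described here can be checked with a short experiment. The sketch below runs a child perl that sets $! and dies, then reads the exit status back. The helper names exit_code_of_die and strerror are hypothetical, and the message text printed for a given code depends on the platform and locale.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Run a child perl that sets $! and dies, then return the child's exit status.
sub exit_code_of_die {
    my ($errno) = @_;
    # List-form system avoids the shell; the child's message still goes to stderr.
    system($^X, '-e', "\$! = $errno; die qq{child failed\\n}");
    return $? >> 8;
}

# Map an errno value back to its message, much like perl -we 'die($!=2)' does.
sub strerror {
    my ($errno) = @_;
    local $! = $errno;
    return "$!";
}

my $code = exit_code_of_die(ENOENT);
printf "exit code %d means: %s\n", $code, strerror($code);
```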

Re: Check21/X9.37 text extractor
by Anonymous Monk on Nov 19, 2009 at 21:57 UTC
    This is awesome. Thanks.
Re: Check21/X9.37 text extractor
by Anonymous Monk on Jun 16, 2015 at 20:08 UTC
    The text part displays fine, but the TIFF output cannot be read by any viewer (Paint, Fax reader, MS Image, etc.). Running on Red Hat Linux.
    #!/usr/bin/perl -w
    use strict;
    use Encode;

    my $tiff_flag = 0;
    my $count = 0;

    open(FILE,'<',$ARGV[0]) or die 'Error opening input file';
    binmode(FILE) or die 'Error setting binary mode on input file';

    while (read (FILE,$_,4)) {
      my $rec_len = unpack("N",$_);
      die "Bad record length: $rec_len" unless ($rec_len > 0);
      read (FILE,$_,$rec_len);
      if (substr($_,0,2) eq "\xF5\xF2") {
        if ($tiff_flag) {
          $count++;
          open (TIFF, '>', $ARGV[0] . '_img' . sprintf("%04d",$count) . '.tiff')
            or die "Can't create image file";
          binmode(TIFF) or die 'Error setting binary mode on image file';
          print TIFF substr($_,117);
          close TIFF;
        }
        $_ = substr($_,0,117);
      }
      print decode ('cp1047', $_) . "\n";
    }
    close FILE;

    Code tags added by GrandFather
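
The thread does not establish why the extracted images were unreadable. One way to narrow it down is to check whether the bytes written actually begin with a TIFF header: a TIFF file starts with "II" followed by 0x2A 0x00 (little-endian) or "MM" followed by 0x00 0x2A (big-endian). The sketch below is a diagnostic only. Both helper names are hypothetical, and a byte-pattern search can produce false positives, so treat the reported offset as a hint rather than a fix.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: true if $bytes starts with a TIFF header.
sub looks_like_tiff {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:II\x2A\x00|MM\x00\x2A)/ ? 1 : 0;
}

# Hypothetical diagnostic: offset of the first TIFF header in a record, or -1 if none.
sub find_tiff_offset {
    my ($record) = @_;
    return $record =~ /II\x2A\x00|MM\x00\x2A/ ? $-[0] : -1;
}
```

Calling looks_like_tiff(substr($_,117)) inside the record 52 branch would show whether offset 117 is the right split point for a particular file, and find_tiff_offset($_) would show where a header does appear.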
