PerlMonks  

X12Splitter: A Tool For Splitting X12-Formatted .dat Files

by bpoag (Monk)
on Dec 31, 2017 at 02:30 UTC ( #1206465=CUFP )

Now, if you're like most people, most of your day is spent wandering around aimlessly and asking yourself, "Man, I wish I had a Perl script that would take like a huuuuge X12-formatted file, and split it up into input files, each one no greater than 1500KB, or 2500 claims, whichever comes first. Wow, if I had that... Man, I'd even be willing to edit the hardcoded output path in that script to suit where I wanted those chunks to go!" Well, look no further:
#!/usr/bin/perl
##
## X12Splitter written 043013 by Bowie J. Poag
##
## X12Splitter takes an X12-formatted .dat file, and splits it
## up into inputFiles no greater than 1500KB or 2500 claims,
## whichever comes first.
##
## Usage:
##
##   x12splitter <filename>
##
## Example:
##
##   x12splitter foo.dat
##

$|=1;
$numRecords=0;
$numBytes=0;
$fileName=$ARGV[0];

errorCheckAndPrep();
dumpChunks();

sub errorCheckAndPrep {
    print "\n\nX12Splitter: Checking $fileName for any structural problems..";

    @inputFile=`cat $fileName`;
    @temp=`ls -l $fileName`;
    @fileDetails=split(" ",$temp[0]);
    $fileSize=$fileDetails[4]+0;
    $numElements=scalar(@inputFile);
    $numTotalBytes=length($inputFile[0]);

    if ($numElements > 1) {
        print "X12Splitter: Input file is malformed. Exiting..\n";
        exit();
    }
    else {
        print "..";
    }

    if ($fileSize!=$numTotalBytes) {
        print "X12Splitter: Payload size and stated file size mismatch. Exiting.\n";
        exit();
    }
    else {
        print "..";
    }

    if ($inputFile[0]=~/^ISA/) {
        print "Done.\n";
    }

    print "X12Splitter: Check complete. Parsing file..\n";

    @payload=split("~ST",$inputFile[0]);

    $envelopeOpen=$payload[0];
    $envelopeClose=$payload[-1];
    $envelopeClose=~/~GE/;
    $envelopeClose="~GE$'";
    $payload[-1]=$`;

    if ($envelopeOpen=~/^ISA/ && $envelopeClose=~/~GE/) {
        print "X12Splitter: Envelope open and close chunks found successfully.\n";
    }
    else {
        print "X12Splitter: Unexpected problem with envelope open. Opening ISA header or ~GE close not found.\n";
        exit();
    }

    shift (@payload); ## Don't bother processing the envelope..

    foreach $item (@payload) {
        $recordCount++;
        $openRecordText=substr($item,0,15);
        $closeRecordText=substr($item,length($item)-40,40);
        printf ("\rX12Splitter: Record %6d: [%15s.....%-40s] \r", $recordCount, $openRecordText, $closeRecordText);
    }

    print "\nX12Splitter: $recordCount total records found. Splitting..\n";
}

sub dumpChunks {
    $chunkPayload="";
    $chunkNum=0;
    $numBytesInThisChunk=0;
    $numRecordsInThisChunk=0;

    foreach $item (@payload) {
        $numBytesInThisChunk=length($chunkPayload);
        $numRecordsInThisChunk++;
        $chunkPayload.="~ST$item";

        if ($numRecordsInThisChunk>2000 || $numBytesInThisChunk>1000000) {
            $chunkPayload="$envelopeOpen"."$chunkPayload"."$envelopeClose";
            open ($fh,'>',"/demo/fin/healthport/$fileName.part.$chunkNum");
            print $fh "$chunkPayload";
            close ($fh);
            print "X12Splitter: $numRecordsInThisChunk records saved to /demo/fin/healthport/$fileName.part.$chunkNum\n";
            $numBytesInThisChunk=0;
            $numRecordsInThisChunk=0;
            $chunkNum++;
            $chunkPayload="";
        }
    }

    ## Clean up the last of it..
    $chunkPayload="$envelopeOpen"."$chunkPayload"."$envelopeClose";
    open ($fh,'>',"/demo/fin/healthport/$fileName.part.$chunkNum");
    print $fh "$chunkPayload";
    close ($fh);
    print "X12Splitter: $numRecordsInThisChunk records saved to /demo/fin/healthport/$fileName.part.$chunkNum\n";
}

print "\n\n\n";
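For readers who have not met the layout the script relies on, here is a minimal sketch of its envelope handling on toy data. It assumes what the script itself assumes: the whole interchange sits on one line, the header starts with ISA, each record starts with "~ST", and the trailer starts at "~GE". The sample string is invented for illustration and is not a valid interchange; the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy data, NOT real claims: a header, three ST..SE records and a
# GE/IEA trailer on a single line, with "~" ending each segment.
my $data = 'ISA*00*demo~GS*HC*demo'
         . '~ST*837*0001~CLM*A~SE*3*0001'
         . '~ST*837*0002~CLM*B~SE*3*0002'
         . '~ST*837*0003~CLM*C~SE*3*0003'
         . '~GE*3*1~IEA*1*000000001~';

# Everything before the first "~ST" is the opening envelope.
my ($envelope_open, @records) = split /~ST/, $data;

# The last piece still carries the trailer; cut it off at "~GE".
my $ge_pos = index($records[-1], '~GE');
die "no ~GE trailer found\n" if $ge_pos < 0;
my $envelope_close = substr($records[-1], $ge_pos);
$records[-1] = substr($records[-1], 0, $ge_pos);

# Re-wrap each record in the envelope (one record per chunk here,
# where the script groups many records into each chunk).
my @chunks = map { $envelope_open . '~ST' . $_ . $envelope_close } @records;

print scalar(@records), " records\n";
print "$_\n" for @chunks;
```

Like the script, this copies the trailer into every chunk unchanged. The GE segment carries a count of the transaction sets in its group, so it may be worth checking whether the receiving system expects that count to match each chunk.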

Replies are listed 'Best First'.
Re: X12Splitter: A Tool For Splitting X12-Formatted .dat Files
by afoken (Abbot) on Dec 31, 2017 at 11:38 UTC

    I have no idea what X12 files are, where they come from, and how they are structured. The posting does not help in this respect. And unfortunately, the code has tons of problems, too:

    • no strict
    • no warnings
    • tons of global variables that should be local (use my)
    • abusing cat to read a file (use open and readline)
    • possible shell injection while abusing cat
    • assuming every platform has a cat tool that behaves like Unix cat (hint: not available everywhere)
    • abusing ls to get file information (use lstat, stat or the -X functions)
    • possible shell injection while abusing ls
    • assuming every platform has a ls tool that behaves like Unix ls (hint: not available everywhere, behaviour depends on platform and environment)
    • confusing indent (use perltidy)
    • useless quotes and string interpolation (just concatenate the variables)
    • hardcoded output filenames (use Getopt::Long or similar)
    • lack of error checks (use or die "... $!" or autodie)

    ... and that's just the first ~20 and the last ~10 lines of code.
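A few of the points above could be sketched roughly as follows: strict and warnings with lexical variables, open and readline instead of cat, the -s file test instead of parsing ls -l, Getopt::Long for the output directory, and an error check on every I/O call. This is only an illustration, not a drop-in replacement for the script. The sub names and the --outdir option are hypothetical, and the demo works on a throwaway temporary file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Temp qw(tempdir);
use File::Spec;

# Read a whole file with open/readline instead of `cat`; no shell involved.
sub slurp_file {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open '$path': $!";
    local $/;                      # undef record separator: read everything
    my $content = <$in>;
    close($in) or die "Cannot close '$path': $!";
    return $content;
}

# Write one chunk, checking every step.
sub write_chunk {
    my ($path, $content) = @_;
    open(my $out, '>:raw', $path) or die "Cannot open '$path' for writing: $!";
    print {$out} $content or die "Cannot write '$path': $!";
    close($out) or die "Cannot close '$path': $!";
    return;
}

# Parse "[--outdir DIR] FILE" from a list (the option name is hypothetical).
sub parse_args {
    my @args   = @_;
    my $outdir = '.';
    GetOptionsFromArray(\@args, 'outdir=s' => \$outdir)
        or die "Usage: $0 [--outdir DIR] FILE\n";
    @args == 1 or die "Usage: $0 [--outdir DIR] FILE\n";
    return ($outdir, $args[0]);
}

# Demo in a temporary directory so the sketch runs without real input.
my $tmp = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($tmp, 'foo.dat');
write_chunk($src, 'ISA*demo~ST*1~SE*1~GE*1~');

my ($outdir, $file) = parse_args('--outdir', $tmp, $src);
my $content = slurp_file($file);
my $size    = -s $file;            # file size without parsing `ls -l`

print "read ", length($content), " bytes, file size is $size\n";
```

In a real script the demo lines would give way to parse_args(@ARGV), and the output paths would be built from $outdir rather than a hardcoded directory.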

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
