http://www.perlmonks.org?node_id=853215

The question came up on a list dedicated to EDI (electronic data interchange) about how to recognize EDI data in a file. Interchanges formatted in accordance with ANSI X12 standards are made up of labeled, delimited records called segments.

A pair of segments, the ISA and the IEA, forms the outermost "envelope" around the data. The ISA segment is 106 characters long, including the segment terminator character. The terminator tells you how to break the file into segments, and another character, the element delimiter, tells you how to break the segments into elements. The element delimiter is the 4th character of the ISA segment, immediately after the "ISA" label; the terminator is the last. The standards do not fix either character, so companies and industries use a variety of choices.
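As a rough sketch of the layout described above, the following builds a made-up ISA segment from the usual fixed field widths and reads the two characters from their fixed positions. The sender and receiver IDs, date, time and control number are invented for illustration, and '*' and '~' are just one common choice of delimiter and terminator.

```perl
use strict;
use warnings;

# Field widths of ISA01 through ISA16 (86 characters of data in total).
my @widths = (2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5, 9, 1, 1, 1);

# Made-up sample values; the IDs, date and control number are hypothetical.
my @values = ('00', '', '00', '', 'ZZ', 'SENDERID', 'ZZ', 'RECEIVERID',
              '100805', '1200', 'U', '00401', '000000001', '0', 'P', '>');

# Pad each value to its fixed width, join with '*', and end with '~'.
my $isa = join('*', 'ISA',
               map { sprintf('%-*s', $widths[$_], $values[$_]) } 0 .. $#widths) . '~';

# 3 (label) + 16 (delimiters) + 86 (data) + 1 (terminator) = 106
my $elemdelim = substr($isa, 3, 1);     # 4th character
my $segterm   = substr($isa, 105, 1);   # 106th character

print "length=", length($isa), " delimiter=[$elemdelim] terminator=[$segterm]\n";
```

Reading fixed offsets like this only works once you already know where the ISA starts and that it really is an ISA, which is the gap the regexp approach fills.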

Ever since I first learned about Perl I figured it would be ideal for dealing with EDI files. I worked out a regexp for identifying an EDI file by the presence of this fixed length ISA segment. The result is the ability to recognize an X12 file, without knowing ahead of time which delimiter and terminator your trading partner chose, and it delivers them up to you as a side effect of recognition.

($elemdelim,$segterm) = ( $contents =~ /ISA(.)..\1.{10}\1..\1.{10}\1..\1.{15}\1..\1.{15}\1.{6}\1.{4}\1.\1.{5}\1.?.{9}\1.\1.\1.(.)/);
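Here is a minimal sketch of how the two captured characters might be used afterwards to break the data into segments and elements. The sample interchange is invented and stripped down to a bare ISA/IEA envelope, so it is not a complete, valid interchange; the variable @segments is also just a name chosen for this example.

```perl
use strict;
use warnings;

# Made-up sample: an ISA/IEA envelope with nothing inside it.
my $contents = join('*', 'ISA', '00', ' ' x 10, '00', ' ' x 10,
                    'ZZ', sprintf('%-15s', 'SENDERID'),
                    'ZZ', sprintf('%-15s', 'RECEIVERID'),
                    '100805', '1200', 'U', '00401', '000000001', '0', 'P', '>')
             . '~' . 'IEA*0*000000001~';

my ($elemdelim, $segterm) = ( $contents =~ /ISA(.)..\1.{10}\1..\1.{10}\1..\1.{15}\1..\1.{15}\1.{6}\1.{4}\1.\1.{5}\1.?.{9}\1.\1.\1.(.)/ );
die "no ISA segment found\n" unless defined $segterm;

# \Q...\E quotes the captured characters, since '*' and friends are
# regexp metacharacters. The -1 limit keeps trailing empty elements.
my @segments = map { [ split /\Q$elemdelim\E/, $_, -1 ] }
               split /\Q$segterm\E/, $contents;

for my $seg (@segments) {
    print join(' | ', @$seg), "\n";
}
```

One caveat: without the /s modifier, "." does not match a newline, so a file whose segment terminator is a newline would presumably need /s added to the match.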

It is said among X12 experts that the ISA is not strictly fixed length. Element 13 (ISA13) is a 9-digit control number, which could be negative (according to the data type definition of the element, but in complete contravention of the practical use of control numbers), in which case the ISA would be 107 characters long. This is silly, in my opinion, but it is also easily dealt with by the ".?" in front of the ".{9}".
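To illustrate the point about the optional sign, this sketch runs the same regexp against a 106-character ISA and against a 107-character one with a negative control number. The helper build_isa and all of the field values are hypothetical, made up only for this test.

```perl
use strict;
use warnings;

my $isa_re = qr/ISA(.)..\1.{10}\1..\1.{10}\1..\1.{15}\1..\1.{15}\1.{6}\1.{4}\1.\1.{5}\1.?.{9}\1.\1.\1.(.)/;

# Hypothetical helper: builds a sample ISA around the given control number.
sub build_isa {
    my ($control) = @_;
    return join('*', 'ISA', '00', ' ' x 10, '00', ' ' x 10,
                'ZZ', sprintf('%-15s', 'SENDERID'),
                'ZZ', sprintf('%-15s', 'RECEIVERID'),
                '100805', '1200', 'U', '00401', $control, '0', 'P', '>') . '~';
}

for my $control ('000000001', '-000000001') {
    my $isa = build_isa($control);
    my ($elemdelim, $segterm) = ( $isa =~ $isa_re );
    printf "%d characters: %s\n", length($isa),
        defined $segterm ? "matched, delimiter=[$elemdelim] terminator=[$segterm]"
                         : "no match";
}
```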

I have also opted to use the . instead of \d or \w. That's just because I can't be bothered to type two characters when one will do.