http://www.perlmonks.org?node_id=1057557

Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks

I come before you today with a strange question!

Here is the situation: the file I am comparing holds a record known as a G record (it begins with G), for example G3301R7435459:LNI10708. This record is made up of snippets from other E records:
E99POLCOM|3||CAP01|66|3301R7435459|||||
E99INSFAC2|MSRA01_1||||||"LNI10708"|
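A G record is sometimes built from parts of several E records and sometimes from just one part. In the example given, the G record is the 6th pipe-delimited field of the first line (3301R7435459) and the field after the 7th pipe on the second line ("LNI10708", which appears in the G record without its double quotes), with a colon in between.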

Here is the question: the E segments vary in size, up to as much as 256 bits. How can I check that the G record reflects the E segments correctly when I don't know what size they are? Is there a way to do something like this:

(pseudocode)
load the pipe-delimited E records into a hash, with the first field as the key
look at E99POLCOM[6]
look at E99INSFAC2[7]
if G eq "E99POLCOM[6]:E99INSFAC2[7]" { PASS; }
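
A minimal Perl sketch of that pseudocode follows. It assumes the G record and its E records are already in memory as the sample lines from the question; a real script would read them from the file. Counting from zero, with the record name as field [0], the two values sit at index 5 of E99POLCOM and index 7 of E99INSFAC2. Those indexes are read off the sample data, not taken from any file specification. Because each field is located by splitting on the pipe, its length never matters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample records from the question (assumption: already read from the file).
my @lines = (
    'G3301R7435459:LNI10708',
    'E99POLCOM|3||CAP01|66|3301R7435459|||||',
    'E99INSFAC2|MSRA01_1||||||"LNI10708"|',
);

# Load each E record into a hash keyed on its first field (the record
# name); the value is a reference to the array of all its fields.
my (%e, $g);
for my $line (@lines) {
    if ($line =~ /^G/) {
        ($g = $line) =~ s/^G//;      # keep the G record minus its leading G
        next;
    }
    # A limit of -1 keeps trailing empty fields, so positions never shift.
    my @fields = split /\|/, $line, -1;
    s/^"(.*)"$/$1/ for @fields;      # drop surrounding double quotes
    $e{ $fields[0] } = \@fields;
}

# Zero-based indexes taken from the sample data (field [0] is the name).
my $expected = join ':', $e{E99POLCOM}[5], $e{E99INSFAC2}[7];
my $pass = $g eq $expected;
print $pass ? "PASS\n" : "FAIL: got '$g', expected '$expected'\n";
```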

Please help! If this doesn't make sense, or if you think you might need more information, just let me know.

Thanks Jim

Re: Substrings of unusual size!
by RichardK (Parson) on Oct 09, 2013 at 15:16 UTC

    You could try using split: split G records on : and E records on | (escaped as \| in the pattern, since split takes a regular expression).

    Then you don't care what length things are, only which fields you're interested in.
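
    For example, here is a short sketch using the sample records from the question. An unescaped /|/ matches the empty string, so it would split the record into single characters. A LIMIT of -1 keeps trailing empty fields. Without that limit, split silently drops the five empty fields at the end of the E99POLCOM record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $g_record = 'G3301R7435459:LNI10708';
my $e_record = 'E99POLCOM|3||CAP01|66|3301R7435459|||||';

my @g_parts   = split /:/, substr($g_record, 1);   # drop the leading 'G'
my @e_fields  = split /\|/, $e_record, -1;        # -1 keeps trailing empties
my @no_limit  = split /\|/, $e_record;            # trailing empties dropped

# Field lengths never matter here; only field positions do.
printf "G part %d: %s\n",    $_, $g_parts[$_]  for 0 .. $#g_parts;
printf "E field %d: '%s'\n", $_, $e_fields[$_] for 0 .. $#e_fields;
```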

Re: Substrings of unusual size!
by Anonymous Monk on Oct 09, 2013 at 15:26 UTC

    Could you possibly change the title to be more descriptive? (I don't care to spend time working out what that would be.)

    Pseudocode ...

    open file
    while reading file line by line,
        in @data, save lines from /^G/ (inclusive) to the next /^G/ (exclusive)
        call parse_check on @data
    close file

    in parse_check,
        $data[0] =~ /^G/ or early return
        @find = split $data[0] on ":"
        remove /^G/ from $find[0]
        @field = parsed CSV output of @data[1..$#data]
        remove /^E/ from @field
        for space in @field
            match space with each of @find
            if all succeed, well, you succeeded and return true
        outside of loop, return false
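
    Below is a runnable Perl rendering of that pseudocode, offered as a sketch only. Two substitutions are assumptions. First, a plain split on | with surrounding double quotes stripped stands in for the "parsed CSV output"; a CSV module such as Text::CSV configured with a pipe separator would be the more robust choice. Second, the file is simulated with an in-memory filehandle holding the sample records. Like the pseudocode, it assumes each G record precedes the E records it is built from. It checks only that every G part appears somewhere among those E fields, not at which position. The s///r modifier needs Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if every colon-separated part of the G record ($data[0]) equals
# some field of the E records that follow it.
sub parse_check {
    my @data = @_;
    return 0 unless @data and $data[0] =~ /^G/;   # early return
    (my $g = $data[0]) =~ s/^G//;
    my @find = split /:/, $g;

    # Every field of every E record, minus the leading 'E' and any
    # surrounding double quotes (plain split stands in for a CSV parser).
    my @field;
    for my $line (@data[1 .. $#data]) {
        (my $e = $line) =~ s/^E//;
        push @field, map { s/^"(.*)"$/$1/r } split /\|/, $e;
    }

    for my $want (@find) {
        return 0 unless grep { $_ eq $want } @field;
    }
    return 1;
}

# Hypothetical file contents: one G record followed by its E records.
my $text = <<'END';
G3301R7435459:LNI10708
E99POLCOM|3||CAP01|66|3301R7435459|||||
E99INSFAC2|MSRA01_1||||||"LNI10708"|
END

my (@data, @results);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^G/ and @data) {   # a new G record ends the previous group
        push @results, parse_check(@data);
        @data = ();
    }
    push @data, $line;
}
push @results, parse_check(@data) if @data;   # last group
close $fh;
print $_ ? "PASS\n" : "FAIL\n" for @results;
```
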
Re: Substrings of unusual size!
by keszler (Priest) on Oct 09, 2013 at 15:18 UTC
    If I'm understanding you correctly, you need the =~ operator but haven't done much with regular expressions yet. perlretut should help.
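
    One way the =~ operator could be applied here, as a hedged sketch: for each colon-separated part of the G record, test whether it occurs as a complete pipe-delimited field (optionally in double quotes) of an E record. \Q...\E quotes any regex metacharacters in the value. The pairing of the Nth G part with the Nth E record is an assumption that fits the sample, not a rule stated in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if $value is one complete pipe-delimited field of $record,
# optionally wrapped in double quotes. The field-boundary checks on both
# sides stop partial matches such as "LNI1070" inside "LNI10708".
sub has_field {
    my ($record, $value) = @_;
    return $record =~ /(?:^|\|)"?\Q$value\E"?(?=\||$)/ ? 1 : 0;
}

my $g = 'G3301R7435459:LNI10708';
my @e = (
    'E99POLCOM|3||CAP01|66|3301R7435459|||||',
    'E99INSFAC2|MSRA01_1||||||"LNI10708"|',
);

(my $body = $g) =~ s/^G//;
my @parts = split /:/, $body;

# Assumption: part N of the G record comes from E record N.
my $ok = 1;
for my $n (0 .. $#parts) {
    $ok = 0 unless has_field($e[$n], $parts[$n]);
}
print $ok ? "PASS\n" : "FAIL\n";
```
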
Re: Substrings of unusual size!
by ww (Archbishop) on Oct 10, 2013 at 01:19 UTC

    It's probably possible to do what you ask with regexen and length (to establish a $len for use as a quantifier). OTOH, I ran out of time to play with that, so here's (clumsy, verbose, but functional) code, inspired -- in part -- by RichardK's reply, above.

    #!/usr/bin/perl -w
    use 5.016;
    use Data::Dumper;

    # 1057557a1
    my @grec = ('G3301R7435459:LNI10708',
                'GA4:99TGSFAZ3',
                'GINSFAC2:"A_1"',
               );
    my @erec = ('E99POLCOM|3||CAP01|66|3301R7435459|||||',
                'E99INSFAC2|MSRA01_1||||||"LNI10708"|',
                'E99TGSFAZ3|A4|||743|||"A_1"|||',
               );
    my ($grec, $erec, @Grec_split, @Erec_split, $Grec_split, $Erec_split);

    for $grec (@grec) {
        $grec =~ s/^G//;
        my @Grec = split /:/, $grec;
        push @Grec_split, @Grec;
    }
    for $erec (@erec) {
        $erec =~ s/^E//;
        my @Erec = split /\|/, $erec;
        push @Erec_split, @Erec;   # Better to eliminate empty fields before push
    }

    my $i = 0;
    my $j = 0;
    if ( $i < ( $#Grec_split +2 ) ) {   # +\d: 1 if arrs are equal len; 2 if unequal
        no warnings 'uninitialized';    # no warn for arrays w/different nums of elements
        while ( $Grec_split[$i] ) {
            if ( $Erec_split[$j] eq '' ) {
                say "\t DEBUG Skipping empty \$Erec_split[$j]: $Erec_split[$j]";
                if ( $j < ($#Erec_split +1) ) {
                    $j++;
                    say "\t DEBUG: \$j at Ln 41 after increment: $j ";
                }
            }
            if ( $Grec_split[$i] eq $Erec_split[$j] ) {
                say "\n --> \$Erec_split[$j]: $Erec_split[$j] MATCHES \$Grec_split[$i]: $Grec_split[$i]\n";
                $i++;
            }
            else {
                # say "\t Could NOT MATCH \$Erec_split[$j] (|$Erec_split[$j]|) IN \$Grec_split[$i] ( $Grec_split[$i] )"; # DEBUG
                if ( $j < ($#Erec_split +1) ) {
                    $j++;
                }
                else {
                    $j = 0;
                    $i++;
                }
            }
        }
        $i++;
    }

    =head1 Execution

    1057557a1.pl:
         DEBUG Skipping empty $Erec_split[2]:
         DEBUG: $j at Ln 41 after increment: 3

     --> $Erec_split[5]: 3301R7435459 MATCHES $Grec_split[0]: 3301R7435459

         DEBUG Skipping empty $Erec_split[8]:
         DEBUG: $j at Ln 41 after increment: 9
         DEBUG Skipping empty $Erec_split[10]:
         DEBUG: $j at Ln 41 after increment: 11
         DEBUG Skipping empty $Erec_split[12]:
         DEBUG: $j at Ln 41 after increment: 13
         DEBUG Skipping empty $Erec_split[16]:
         DEBUG: $j at Ln 41 after increment: 17
         DEBUG Skipping empty $Erec_split[19]:
         DEBUG: $j at Ln 41 after increment: 20
         DEBUG Skipping empty $Erec_split[22]:
         DEBUG Skipping empty $Erec_split[2]:
         DEBUG: $j at Ln 41 after increment: 3
         DEBUG Skipping empty $Erec_split[8]:
         DEBUG: $j at Ln 41 after increment: 9
         DEBUG Skipping empty $Erec_split[10]:
         DEBUG: $j at Ln 41 after increment: 11
         DEBUG Skipping empty $Erec_split[12]:
         DEBUG: $j at Ln 41 after increment: 13

     --> $Erec_split[15]: A4 MATCHES $Grec_split[2]: A4

         DEBUG Skipping empty $Erec_split[16]:
         DEBUG: $j at Ln 41 after increment: 17
         DEBUG Skipping empty $Erec_split[19]:
         DEBUG: $j at Ln 41 after increment: 20
         DEBUG Skipping empty $Erec_split[22]:
         DEBUG Skipping empty $Erec_split[2]:
         DEBUG: $j at Ln 41 after increment: 3
         DEBUG Skipping empty $Erec_split[8]:
         DEBUG: $j at Ln 41 after increment: 9
         DEBUG Skipping empty $Erec_split[10]:
         DEBUG: $j at Ln 41 after increment: 11
         DEBUG Skipping empty $Erec_split[12]:
         DEBUG: $j at Ln 41 after increment: 13
         DEBUG Skipping empty $Erec_split[16]:
         DEBUG: $j at Ln 41 after increment: 17
         DEBUG Skipping empty $Erec_split[19]:
         DEBUG: $j at Ln 41 after increment: 20
         DEBUG Skipping empty $Erec_split[22]:
         DEBUG Skipping empty $Erec_split[2]:
         DEBUG: $j at Ln 41 after increment: 3
         DEBUG Skipping empty $Erec_split[8]:
         DEBUG: $j at Ln 41 after increment: 9
         DEBUG Skipping empty $Erec_split[10]:
         DEBUG: $j at Ln 41 after increment: 11
         DEBUG Skipping empty $Erec_split[12]:
         DEBUG: $j at Ln 41 after increment: 13
         DEBUG Skipping empty $Erec_split[16]:
         DEBUG: $j at Ln 41 after increment: 17
         DEBUG Skipping empty $Erec_split[19]:
         DEBUG: $j at Ln 41 after increment: 20

     --> $Erec_split[21]: "A_1" MATCHES $Grec_split[5]: "A_1"

    =cut

    Obviously, this can be greatly improved; some possibilities can be found in the thread beginning at "Parallel processing two arrays with different numbers of elements" and in the Q&A section under "QandASection: arrays".
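
    One possible simplification is sketched below; it uses a different technique from the threads ww points to. It puts every non-empty E field into a hash once, then looks each G part up directly, so no parallel index juggling is needed. Unlike the loop above, it also strips the double quotes, so LNI10708 matches "LNI10708". It only checks that each part exists somewhere, not which record or position it came from. With ww's sample data, every part is found except INSFAC2, because the E record name is 99INSFAC2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ww's sample data.
my @grec = ('G3301R7435459:LNI10708', 'GA4:99TGSFAZ3', 'GINSFAC2:"A_1"');
my @erec = ('E99POLCOM|3||CAP01|66|3301R7435459|||||',
            'E99INSFAC2|MSRA01_1||||||"LNI10708"|',
            'E99TGSFAZ3|A4|||743|||"A_1"|||');

# Strip one pair of surrounding double quotes, if present.
sub unquote { my ($s) = @_; $s =~ s/^"(.*)"$/$1/; return $s }

# Index every non-empty E field once, minus the record's leading 'E'.
my %seen;
for my $e (@erec) {
    (my $body = $e) =~ s/^E//;
    $seen{ unquote($_) } = 1 for grep { length } split /\|/, $body;
}

# Each G part now needs one hash lookup instead of a walk through an array.
my (@found, @missing);
for my $g (@grec) {
    (my $body = $g) =~ s/^G//;
    for my $part (map { unquote($_) } split /:/, $body) {
        if   ($seen{$part}) { push @found,   $part }
        else                { push @missing, $part }
    }
}
print "found:   @found\n";
print "missing: @missing\n";
```
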

    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
Re: Substrings of unusual size!
by boftx (Deacon) on Oct 09, 2013 at 22:54 UTC

    Primary question: do you have the rules that determine how each 'G' record is composed?

    Secondary question: Is the 'G' record in the same file as the 'E' records?

    If the answer to both of those questions, or even just the primary question, is "yes", then this becomes relatively simple with split. I recall you asking about working with the source files in an earlier question, so I presume you have already figured a bit of this out. The same thing applies here: make a hash that defines which field from a given type of 'E' record is used in constructing the 'G' record, and the approach you are taking is quite workable.
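
    Here is a sketch of that rules hash. It assumes the only rules are the ones visible in the question's example. %g_field_for maps each E record type to the zero-based field it contributes. A separate list fixes the order of the parts, since a Perl hash has no order. Both names are hypothetical; real rules would come from the file specification. A G record built from just one part would simply have one entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rules for the sample data: which zero-based field of each
# E record type goes into the G record, plus the order of the G parts.
my %g_field_for = ( E99POLCOM => 5, E99INSFAC2 => 7 );
my @g_order     = qw(E99POLCOM E99INSFAC2);

my @lines = (
    'G3301R7435459:LNI10708',
    'E99POLCOM|3||CAP01|66|3301R7435459|||||',
    'E99INSFAC2|MSRA01_1||||||"LNI10708"|',
);

my ($g, %e);
for my $line (@lines) {
    if ($line =~ /^G/) { $g = $line; next }
    my @f = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    s/^"(.*)"$/$1/ for @f;           # strip surrounding double quotes
    $e{ $f[0] } = \@f;               # key on the record name
}

# Rebuild the G record the rules predict; a missing E record or field
# becomes an empty string, so the comparison simply fails.
my $expected = 'G' . join(':',
    map { $e{$_} ? $e{$_}[ $g_field_for{$_} ] // '' : '' } @g_order);

print $g eq $expected ? "PASS\n" : "FAIL: got $g, expected $expected\n";
```
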

    On time, cheap, compliant with final specs. Pick two.
Re: Substrings of unusual size! (variable length substrings)
by Anonymous Monk on Oct 10, 2013 at 00:08 UTC
    So the length of the substrings varies: they're variable-length substrings. This is pretty common in everything: real life, TV, books, the internet ... :)

    Now you know :P