PerlMonks  

Re: Calling all REGEX Gurus - nasty problem involving regular expressions combined with hash keys - I need ideas as to how to even approach the problem

by davido (Archbishop)
on Feb 08, 2013 at 22:50 UTC


in reply to Calling all REGEX Gurus - nasty problem involving regular expressions combined with hash keys - I need ideas as to how to even approach the problem

I didn't see in your problem explanation a description of how many "standardized responses" there could be. Are we talking about thousands? Hundreds? Tens?

It would also be useful to know whether the incomplete versions of the standardized responses are at least predictable and unique. I understand that the full response text might differ from transaction to transaction, but does a "100 - Bad Transaction" message always get abbreviated as "Bad T" before being embedded in the response text? And is that abbreviation unique, so that no two standardized response codes could share the same abbreviation?

Let's say you've got a total of 100 possible standardized responses/codes. Start by building a cross-reference table that maps each abbreviation to its full version:

my %xref = (
    'abbrev1' => '100 - Non-Abbreviated 1',
    'abbrev2' => '200 - Non-Abbreviated 2',
    # ... one entry per standardized response
);
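
A Perl hash silently keeps only the last value assigned to a repeated key, so if two codes turned out to share an abbreviation, the table would just lose one of them. One way to surface that is to build the table from a list of pairs and refuse conflicting duplicates. This is a minimal sketch: the build_xref helper and the abbreviations other than "Bad T" are hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical [abbreviation, full standardized response] pairs; the real
# list would come from responses actually observed from the server.
my @pairs = (
    [ 'Bad T',  '100 - Bad Transaction' ],
    [ 'Bad D',  '200 - Bad Date' ],
    [ 'Dup Tx', '300 - Duplicate Transaction' ],
);

# Hypothetical helper: build the cross-reference, but die instead of
# silently overwriting when one abbreviation maps to two different codes.
sub build_xref {
    my @pairs = @_;
    my %xref;
    for my $pair (@pairs) {
        my ( $abbrev, $full ) = @$pair;
        if ( exists $xref{$abbrev} && $xref{$abbrev} ne $full ) {
            die "Ambiguous abbreviation '$abbrev': '$xref{$abbrev}' vs '$full'\n";
        }
        $xref{$abbrev} = $full;
    }
    return %xref;
}

my %xref = build_xref(@pairs);
print scalar( keys %xref ), " abbreviations loaded\n";
```

Repeating an identical pair is allowed; only a genuine conflict stops the load.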

Next, build one big regex out of an alternation of all the abbreviations:

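# quotemeta makes characters such as '.' match literally; sorting longest
# first stops a short abbreviation from matching in place of a longer one
# that starts with it.
my $alternations = join '|', map { quotemeta } sort { length $b <=> length $a } keys %xref;
my $regexp = qr/\b($alternations)\b/;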
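
To see why the quotemeta and the longest-first sort matter, here is a small self-contained sketch with hypothetical abbreviations: "Amt." contains a regex metacharacter, and "Bad T" is the start of "Bad T Date". It also swaps \b for lookarounds. That part is an assumption worth making only if some abbreviations end in punctuation, because \b after a "." only matches when a word character follows.

```perl
use strict;
use warnings;

# Hypothetical abbreviations: 'Bad T' is the start of 'Bad T Date', and
# 'Amt.' ends in '.', a regex metacharacter.
my %xref = (
    'Bad T'      => '100 - Bad Transaction',
    'Bad T Date' => '250 - Bad Transaction Date',
    'Amt.'       => '400 - Invalid Amount',
);

sub build_regexp {
    my ($xref) = @_;
    # Longest first: at a given position Perl uses the first alternative that
    # lets the whole pattern match, so 'Bad T Date' must precede 'Bad T'.
    # quotemeta makes '.' (and any other metacharacter) match literally.
    my $alternations = join '|',
        map { quotemeta } sort { length $b <=> length $a } keys %$xref;
    # Lookarounds instead of \b: "no word character on either side" still
    # works when an abbreviation starts or ends with punctuation.
    return qr/(?<!\w)($alternations)(?!\w)/;
}

my $regexp = build_regexp( \%xref );
for my $text ( 'Error: Bad T Date in field 3', 'Err: Amt. too high' ) {
    my ($abbrev) = $text =~ $regexp;
    print "$abbrev => $xref{$abbrev}\n";
}
```

The escaped alternatives are still literal strings, so they remain eligible for the trie optimization.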

Next, scan your response text and look up the match in the cross-reference:

my ($abbrev) = $raw_response =~ m/$regexp/;
my $std_response;
if ( defined $abbrev && exists $xref{$abbrev} ) {
    $std_response = $xref{$abbrev};
}
else {
    # Report the text that was searched; $std_response is still undef here
    die "No valid response found in <<$raw_response>>";
}
print "$std_response\n";
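
The die above stops at the first response that contains no known abbreviation. While the abbreviations are still being discovered, it may be more practical to return undef and log the unknown text for later review. A minimal sketch, assuming a hypothetical standardize_response sub and placeholder data:

```perl
use strict;
use warnings;

# Hypothetical cross-reference; the real one would hold ~100 entries.
my %xref = (
    'Bad T' => '100 - Bad Transaction',
    'Bad D' => '200 - Bad Date',
);

my $alternations = join '|',
    map { quotemeta } sort { length $b <=> length $a } keys %xref;
my $regexp = qr/\b($alternations)\b/;

# Return the standardized response for a raw response, or undef if no
# known abbreviation appears in it, leaving the decision to the caller.
sub standardize_response {
    my ($raw_response) = @_;
    my ($abbrev) = $raw_response =~ $regexp;
    return defined $abbrev ? $xref{$abbrev} : undef;
}

for my $raw ( 'ERR: Bad D in field exp_date', 'ERR: something new' ) {
    my $std = standardize_response($raw);
    if ( defined $std ) {
        print "$std\n";
    }
    else {
        # Unknown abbreviation: keep it for manual review instead of dying.
        warn "Unrecognized response: <<$raw>>\n";
    }
}
```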

Perl's regular expression engine (as of 5.10) applies a "trie" optimization to alternations of literal strings, so even a hundred-way alternation should match quickly. Hash keys are always plain strings, so they can't hold Regexp objects, but they can hold the text you will use as components of a regexp pattern.
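
If the abbreviations turn out to vary a little (say "Bad T", "Bad Trans", "bad transaction"), the keys can hold pattern text rather than literal strings. This is a hypothetical sketch: the patterns and the classify helper are made up for illustration. Because the matched text no longer equals any key, it tries each compiled pattern in turn instead of doing one hash lookup. That gives up the single trie-optimized alternation, but is still cheap for about 100 patterns. It assumes the patterns don't overlap, since the first match wins.

```perl
use strict;
use warnings;

# Hypothetical: keys are regex source text, values are the standardized
# responses. Hash keys are plain strings, so patterns are stored as text
# and compiled below.
my %pattern_for = (
    'Bad\s*T(?:rans(?:action)?)?' => '100 - Bad Transaction',
    'Bad\s*D(?:ate)?'             => '200 - Bad Date',
);

# Compile each key once; sorted only so the matching order is deterministic.
my @compiled =
    map { [ qr/\b(?:$_)\b/i, $pattern_for{$_} ] } sort keys %pattern_for;

# Return the standardized response for the first pattern that matches,
# or undef (in scalar context) if none does.
sub classify {
    my ($raw_response) = @_;
    for my $entry (@compiled) {
        my ( $re, $std_response ) = @$entry;
        return $std_response if $raw_response =~ $re;
    }
    return;
}

for my $raw ( 'ERR Bad Trans in row 2', 'bad date on card' ) {
    my $std = classify($raw);
    print defined $std ? "$std\n" : "no match: $raw\n";
}
```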

It's possible that this approach won't work for you if the possible abbreviations aren't unique, or if one abbreviation could be truncated in such a way as to produce another valid abbreviation. It also won't work if you can't count on the abbreviations being predictable. If those sorts of issues exist, you may need to explain to us how you, as a human, look at the response text and visually/mentally detect a standardized response abbreviation. The problem then becomes turning that process into a set of rules that can be implemented programmatically.
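
One of those failure cases, an abbreviation that is a truncated form of another, can at least be checked mechanically once the known abbreviations are collected. This sketch only catches truncation at the end (one abbreviation being a prefix of another); it can't detect abbreviations that are simply unpredictable. The prefix_conflicts name and the sample abbreviations are hypothetical.

```perl
use strict;
use warnings;

# Return [short, long] pairs where 'short' is a proper prefix of 'long'.
# Any such pair means a truncated form of 'long' could be indistinguishable
# from 'short'. O(n^2), which is fine for ~100 abbreviations.
sub prefix_conflicts {
    my @abbrevs = @_;
    my @conflicts;
    for my $short (@abbrevs) {
        for my $long (@abbrevs) {
            next unless length $long > length $short;
            push @conflicts, [ $short, $long ] if index( $long, $short ) == 0;
        }
    }
    return @conflicts;
}

# Hypothetical abbreviations.
for my $c ( prefix_conflicts( 'Bad T', 'Bad T Date', 'Bad D' ) ) {
    print "'$c->[0]' is a prefix of '$c->[1]'\n";
}
```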


Dave


Re^2: Calling all REGEX Gurus - nasty problem involving regular expressions combined with hash keys - I need ideas as to how to even approach the problem
by ted.byers (Scribe) on Feb 08, 2013 at 23:25 UTC

    Thanks

    There are around 100 possible standard response texts.

    I have not seen a consistent pattern in the abbreviations, although I hope against hope that they at least abbreviate any given response text consistently.

    Part of the challenge is to deliberately trigger each response code, so that we can see how each is abbreviated at least once. It is easy to generate an error related to a bad date, but some of the other errors are quite hard to trigger deliberately, especially when the server in question has to be treated as a black box (we know what the response codes are, but not the validation rules that produce them, and their documentation leaves everything to be desired).

    That said, I will make a test script based on the code you show, just in case the assumptions that code makes are satisfied by the data they send.
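
A test script along those lines might look like the following sketch, using the core Test::More module. The abbreviations and sample responses are hypothetical; the real cases would be raw responses captured from the server for each code that can be triggered.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical data: replace with real abbreviations and captured responses.
my %xref = (
    'Bad T' => '100 - Bad Transaction',
    'Bad D' => '200 - Bad Date',
);
my $alternations = join '|',
    map { quotemeta } sort { length $b <=> length $a } keys %xref;
my $regexp = qr/\b($alternations)\b/;

# Each case: [ raw response text, expected standardized response ].
my @cases = (
    [ 'Declined: Bad D (exp 13/2013)', '200 - Bad Date' ],
    [ 'Bad T - see ref 12345',         '100 - Bad Transaction' ],
);

for my $case (@cases) {
    my ( $raw, $expected ) = @$case;
    my ($abbrev) = $raw =~ $regexp;
    is( defined $abbrev ? $xref{$abbrev} : undef, $expected, "maps: $raw" );
}

done_testing();
```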

    Thanks

    Ted
