http://www.perlmonks.org?node_id=11123471

ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

In a previous semester there was a riddle: write a regex that matches valid strings, where a valid string meets the following conditions:

1. The length of the string can be zero or more.
2. It can contain any printable ASCII char (see the definition below), except the following chars:
- Backslash: \
- Double quote: "
- LF char: \n (when it appears as a single raw char)
- CR char: \r (when it appears as a single raw char)
These are valid only as part of a valid escape sequence.
3. Valid escape sequences: \\, \", \n, \t, \r, \0, \xdd (where each d is a hexadecimal digit)
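
The three conditions above can be sketched as a single regex. This is only one reading of the riddle: it assumes the surrounding double quotes are part of the string, that a raw tab is allowed, and that escape sequences appear as a literal backslash followed by the listed character. The names $ordinary, $escape, $rx_valid and is_valid_string are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: an "ordinary" char is a tab or printable ASCII (0x20-0x7E),
# minus the double quote (0x22) and the backslash (0x5C).
my $ordinary = qr{ [\x09\x20\x21\x23-\x5B\x5D-\x7E] }xms;

# The escape sequences from condition 3: \\ \" \n \t \r \0 \xdd
my $escape = qr{ \\ (?: [\\"ntr0] | x [0-9A-Fa-f]{2} ) }xms;

# Opening quote, any mix of ordinary chars and escapes, closing quote.
my $rx_valid = qr{ \A " (?: $ordinary | $escape )* " \z }xms;

# Hypothetical helper: returns 1 for a valid string, 0 otherwise.
sub is_valid_string {
    my ($string) = @_;
    return $string =~ $rx_valid ? 1 : 0;
}

# q{} keeps the backslashes literal, so \n below is two chars, not a newline.
for my $candidate (q{"hello"}, q{"Hey there\n"}, q{"hi2 \x3A"}, q{"bad}, q{"inner-"-bad"}) {
    printf "%-16s %s\n", $candidate, is_valid_string($candidate) ? 'valid' : 'not valid';
}
```

Because every character between the quotes must be either an ordinary char or a complete escape, a raw newline, a bare backslash or an inner double quote makes the match fail.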

Examples of valid strings:
"hello"
"hi 'Hello'"
"Hey there\n"
"hi1 \x10"
"hi2 \x3A"
"hi\thow\tare\tyou\tdoing"

Examples of invalid strings:
'bad"
"bad
"multi-line bad
string"
"inner-"-bad"
"bad escape \"

The code to fill:

    if (<REGEX1>) { print("Valid String"); }
    else if (<REGEX2>) { print("Invalid char"); }
    else if (<REGEX3>) { print("Close the string!"); }
    else if (<REGEX4>) { print("Invalid escape"); }

Valid ASCII values: any value between 0x20 and 0x7E, plus the whitespace chars 0x09, 0x0A and 0x0D.
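
A minimal sketch of the "Invalid char" check, assuming it only has to find one character outside the valid ASCII values just listed. A negated character class does the "everything but" part. The names $rx_invalid_char and has_invalid_char are hypothetical.

```perl
use strict;
use warnings;

# Matches any single char that is NOT 0x20-0x7E, tab (0x09), LF (0x0A) or CR (0x0D).
my $rx_invalid_char = qr{ [^\x09\x0A\x0D\x20-\x7E] }xms;

# Hypothetical helper: returns 1 if the string holds at least one invalid char.
sub has_invalid_char {
    my ($string) = @_;
    return $string =~ $rx_invalid_char ? 1 : 0;
}

# qq{} interpolates \x10 into a real control char; q{} keeps it as four chars.
printf "%s\n", has_invalid_char(qq{"hi1 \x10"}) ? 'invalid char' : 'ok';
printf "%s\n", has_invalid_char( q{"hi1 \x10"}) ? 'invalid char' : 'ok';
```

Note that this check alone does not reject a raw LF or CR, because the definition above lists them as valid values; whether such a string should be reported as "Invalid char" or something else is left open by the riddle.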
You can change the order of the if-else statements, but you can't use a bare else (without an if), meaning you have to write a regex for each statement. The riddle didn't come with a solution, but I'm interested in seeing one.

The first statement should check whether a string is valid (meets all the conditions described above).
The second statement should check whether it contains an invalid char (see Valid ASCII values).
The third statement should check whether it is an unclosed string.
The fourth statement should check whether it contains an invalid escape (not one of \\, \", \n, \t, \r, \0, \xdd).
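
For the fourth statement, one possible approach is to walk the string token by token so that the second half of a \\ pair is never mistaken for the start of a new escape. The sketch below assumes escapes are written as a literal backslash plus the listed characters; $escape, $rx_invalid_escape and has_invalid_escape are illustrative names only.

```perl
use strict;
use warnings;

# The valid escape sequences: \\ \" \n \t \r \0 \xdd
my $escape = qr{ \\ (?: [\\"ntr0] | x [0-9A-Fa-f]{2} ) }xms;

# Consume non-backslash chars and complete valid escapes possessively (*+),
# so the engine cannot backtrack into the middle of an escape. Any backslash
# still standing after that cannot begin a valid escape sequence.
my $rx_invalid_escape = qr{ \A (?: [^\\] | $escape )*+ \\ }xms;

# Hypothetical helper: returns 1 if the string holds an invalid escape.
sub has_invalid_escape {
    my ($string) = @_;
    return $string =~ $rx_invalid_escape ? 1 : 0;
}

# In q{}, \\\\ yields two backslash chars, i.e. one escaped backslash.
for my $candidate (q{"Hey there\n"}, q{"bad \q"}, q{"a\\\\q"}, q{"hi \x3"}) {
    printf "%-16s %s\n", $candidate, has_invalid_escape($candidate) ? 'invalid escape' : 'ok';
}
```

Under this reading the example "bad escape \" contains only the valid escape \", so it would not be flagged here; it looks more like an unclosed string, and the riddle's intent for that case is unclear.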

How would you do it?

EDIT: As I understand it, I could use the regex ([\x00-\x09\xB-\xC\xE-\x21\x23-\x5B\x5D-\xFF]) to match the valid chars and (\\x[0-9A-Fa-f]{2}) for the hex escapes. What I can't work out is how to solve the rest, because every case seems to need something like "match everything but ...". Checking for an unclosed string looks easy (I think) because it's just \".*, but that also matches valid strings like "aaa".
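
One common way to express "match everything but ..." is a negative lookahead anchored at the start of the string. The sketch below applies it to the unclosed-string check, under the assumption that "closed" means: an opening double quote, then characters or backslash pairs with no unescaped double quote among them, then a closing double quote at the very end. The names $rx_closed, $rx_not_closed and is_unclosed are hypothetical.

```perl
use strict;
use warnings;

# A properly delimited string: quote, non-quote chars or backslash pairs, quote, end.
my $rx_closed = qr{ " (?: [^"\\] | \\. )* " \z }xms;

# "Everything but": succeed at the start only if the closed form does NOT match.
my $rx_not_closed = qr{ \A (?! $rx_closed ) }xms;

# Hypothetical helper: returns 1 if the input is not a properly closed string.
sub is_unclosed {
    my ($string) = @_;
    return $string =~ $rx_not_closed ? 1 : 0;
}

for my $candidate (q{"aaa"}, q{"bad}, q{'bad"}, q{"bad escape \"}) {
    printf "%-16s %s\n", $candidate, is_unclosed($candidate) ? 'Close the string!' : 'closed';
}
```

Unlike \".*, this does not fire on "aaa", because the lookahead sees a complete closed string there. It does fire on anything with a stray inner quote, so it would normally be tried only after the more specific checks.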

Replies are listed 'Best First'.
Re: Riddle with regex (updated)
by AnomalousMonk (Bishop) on Nov 07, 2020 at 17:51 UTC

    This does smell a bit homeworky. I also have some questions about details of the requirement specification. However, if I were to approach this problem, I would likely first set up a testing environment with placeholder regexes, then begin refining my understanding of the requirement and the definitions of the regexes. As the requirement clarifies, the regexes will sharpen and the number of test cases will grow.

    An initial testing framework (which at least compiles):

    use strict;
    use warnings;

    use Test::More;
    use Test::NoWarnings;

    use Data::Dump qw(pp);

    my @Tests = (
        'all valid strings',
        [ qq{"hello"},                        'Valid String' ],
        [ qq{"hi 'Hello'"},                   'Valid String' ],
        [ qq{"Hey there\n"},                  'Valid String' ],
        [ qq{"hi1 \x10"},                     'Valid String' ],
        [ qq{"hi2 \x3A"},                     'Valid String' ],
        [ qq{"hi\thow\tare\tyou\tdoing"},     'Valid String' ],

        'various invalid strings',
        [ qq{'bad"},                          'Close the string!' ],
        [ qq{"bad},                           'Close the string!' ],
        [ qq{"multi-line bad\nstring"},       'Invalid char'      ],
        [ qq{"inner-"-bad"},                  'Invalid char'      ],
        [ qq{"bad escape \\"},                'Invalid escape'    ],
        );  # end array @Tests

    my @additional = qw(Test::NoWarnings);  # each of these adds 1 test

    plan 'tests' => (scalar grep { ref eq 'ARRAY' } @Tests)
                  + @additional
                  ;

    VECTOR:
    for my $ar_vector (@Tests) {
        if (not ref $ar_vector) {
            note $ar_vector;
            next VECTOR;
            }

        my ($string, $expected) = @$ar_vector;

        my $got = classify($string);
        is $got, $expected, sprintf "'%s' -> $expected", pp $string;
        }  # end for VECTOR

    exit;

    # function(s) under test ###########################################

    sub classify {
        my ($string, ) = @_;

        # placeholder regexes.
        my $rx_valid            = qr{ \A " [^"\\]* (?: \\. [^"\\]* )* " \z }xms;
        my $rx_close_the_string = qr{ .* }xms;  # always true for development
        my $rx_invalid_char     = qr{ .* }xms;
        my $rx_invalid_scape    = qr{ .* }xms;

        return $string =~ $rx_valid            ? 'Valid String'
             : $string =~ $rx_close_the_string ? 'Close the string!'
             : $string =~ $rx_invalid_char     ? 'Invalid char'
             : $string =~ $rx_invalid_scape    ? 'Invalid escape'
             : die "unclassifyable string ", pp $string
             ;
        }  # end sub classify()

    Update: Also see How to ask better questions using Test::More and sample data.


    Give a man a fish:  <%-{-{-{-<

Re: Riddle with regex
by stevieb (Canon) on Nov 07, 2020 at 14:14 UTC

    Since this was from last semester, I'm sure you'll be happy to post the code you submitted before anyone here posts how they'd do it, right?

    Otherwise, this just sounds like someone wanting their homework done for them ;)

Re: Riddle with regex
by BillKSmith (Prior) on Nov 07, 2020 at 18:14 UTC
    I do not understand your examples! If I interpret your first one as my $valid0=q("hello"); it is invalid because it contains double quotes. If I ignore the surrounding quotes, there is nothing wrong with my $invalid0 = q('bad);. I cannot interpret them as valid Perl code because "bad is not. If there is any consistent way to read your examples, I cannot find it. You want a Perl solution. Please post your test cases as valid Perl strings.
    Bill
Re: Riddle with regex
by AnomalousMonk (Bishop) on Nov 08, 2020 at 00:13 UTC

    As an example of my confusion about requirements (as I and others have already noted):

    • The OPed example strings 'bad" and "bad can only be defined (in Perl) by some statement or expression like q{'bad"} and q{"bad} respectively. This implies that the double-quotes at the start and end of "hello" and the various other examples of valid strings are required parts of the string, which could be defined as q{"hello"} or similar. Is this true?
    • What is the classification of '' (or q{}, the empty string)? How about things like q{""} and q{"''"}? Edge cases are important.
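
To make the edge-case question concrete, here is a small sketch that feeds the three strings from the bullet above to the placeholder "valid" regex from the testing framework earlier in the thread. The helper name placeholder_accepts is made up; the regex is copied as-is and is not a finished solution.

```perl
use strict;
use warnings;

# Placeholder "valid" regex copied from the testing framework in this thread.
my $rx_valid = qr{ \A " [^"\\]* (?: \\. [^"\\]* )* " \z }xms;

# Hypothetical helper: returns 1 if the placeholder regex accepts the string.
sub placeholder_accepts {
    my ($string) = @_;
    return $string =~ $rx_valid ? 1 : 0;
}

# The empty string, an empty quoted string, and quoted single quotes.
for my $candidate (q{}, q{""}, q{"''"}) {
    printf "%-6s %s\n", "[$candidate]", placeholder_accepts($candidate) ? 'accepted' : 'rejected';
}
```

The placeholder rejects the empty string (it requires the surrounding double quotes) and accepts q{""} and q{"''"}. Whether that matches the intended requirement is exactly what the questions above ask.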


    Give a man a fish:  <%-{-{-{-<

Re: Riddle with regex
by karlgoethebier (Abbot) on Nov 07, 2020 at 18:32 UTC

    Great post. See also

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'