
Re^4: Unescaped left brace in regex is passed through in regex

by Aldebaran (Curate)
on Jun 07, 2022 at 20:25 UTC


in reply to Re^3: Unescaped left brace in regex is passed through in regex
in thread Unescaped left brace in regex is passed through in regex

I try to follow what LanX does with the REPL. I'm confused right now.

    $ perl -de1

    Loading DB routines from perl5db.pl version 1.53
    Editor support available.

    Enter h or 'h h' for help, or 'man perldebug' for more help.

    main::(-e:1):   1
      DB<1> $str = '\\x{A3f4}'
      DB<2> p $str
    \x{A3f4}

Ok, so we begin the game with a string that has 2 backslashes before an x and then curly braces containing a value. We ask the debugger to print the value, and the output differs from the original string by having one fewer backslash.

Q1) What makes you think this is a left-curly brace? I don't doubt you; I just can't get there:

      DB<2> print $str
    \x{A3f4}
      DB<3> print "$str"
    \x{A3f4}

Q2) What is \x{value} called? I'm going up a wall trying to disambiguate it. I think it's completely different from my experience with \x in a regex: perlre#/x-and-/xx. Continuing:

      DB<3> if ( $str =~ /(\\x{[A-F\d]+})/i ) { print $1 }
    Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/(\\x{ <-- HERE [A-F\d]+})/ at (eval 13)[/usr/share/perl/5.28/perl5db.pl:738] line 2.

Ok, let's stop here. When I'm debugging my stuff, I typically think that the first error message is the one I need to attend to. What follows was the stack of things that bombed out because of the first error. This is as clear as an error message is, except that I'm not quite sure where it stops and the next one begins. Let's take a look at the rest of the message:

     at (eval 13)[/usr/share/perl/5.28/perl5db.pl:738] line 2.
            eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop; if ( $str =~ /(\\\\x{[A-F\\d]+})/i ) { print $1 }; ' called at /usr/share/perl/5.28/perl5db.pl line 738
            DB::eval called at /usr/share/perl/5.28/perl5db.pl line 3138
            DB::DB called at -e line 1
    Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/(\\x{ <-- HERE [A-F\d]+})/ at (eval 13)[/usr/share/perl/5.28/perl5db.pl:738] line 2.
     at (eval 13)[/usr/share/perl/5.28/perl5db.pl:738] line 2.
            eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop; if ( $str =~ /(\\\\x{[A-F\\d]+})/i ) { print $1 }; ' called at /usr/share/perl/5.28/perl5db.pl line 738
            DB::eval called at /usr/share/perl/5.28/perl5db.pl line 3138
            DB::DB called at -e line 1
    Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/(\\x{ <-- HERE [A-F\d]+})/ at (eval 13)[/usr/share/perl/5.28/perl5db.pl:738] line 2.
     at (eval 13)[/usr/share/perl/5.28/perl5db.pl:738] line 2.
            eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop; if ( $str =~ /(\\\\x{[A-F\\d]+})/i ) { print $1 }; ' called at /usr/share/perl/5.28/perl5db.pl line 738
            DB::eval called at /usr/share/perl/5.28/perl5db.pl line 3138
            DB::DB called at -e line 1
    \x{A3f4}

Can you "see" the layering of this, because I cannot. Q3) Is there a setting for the debugger that allows for more human readable error output, maybe a newline between layers?

Q4) Why is any part of perl5db.pl asking itself this question:

    eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop; if ( $str =~ /(\\\\x{[A-F\\d]+})/i ) { print $1 };

There are 4 backslashes before that x. I know this is a situation of garbage in, but how is this the garbage out?

      DB<5> $str = '123X12'
      DB<6> p $str
    123X12
      DB<7> if ( $str =~ /^(\d{2,4})[^\d](\d{2})/) { print "$1;$2" }
    123;12
      DB<8> $str = '123{12'
      DB<9> if ( $str =~ /^(\d{2,4})[^\d](\d{2})/) { print "$1;$2" }
    123;12

Well, the rest of this makes sense to me, and it shows a use of the curly braces that I might know. I wouldn't understand it without working through it with the REPL. One final question:

Q5) Is it correct to say that Perl is written in C, but the Perl debugger is written in Perl?

Greetings from Amiland

Replies are listed 'Best First'.
Re^5: Unescaped left brace in regex is passed through in regex
by AnomalousMonk (Archbishop) on Jun 07, 2022 at 21:24 UTC

    I'm not the best one to answer questions about the Perl debugger, but I can answer some more general questions.

    Q0) ... we begin the game with a string that has 2 backslashes ... the output differs from the original string by having one fewer backslash.
    Due to the way the single-quote string constructor handles backslashes (escapes), the \\ will in this case compile to a single literal backslash. See Quote and Quote-like Operators and the discussion of q/STRING/ in Quote-Like Operators.
    Q1) What makes you think this is a left-curly brace?
    I'm not quite sure what "this" refers to, but do you dispute that there is a left-curly (and a right-curly) in the \x{A3f4} string? What else would you call it/them?
    Q2: What is \x{value} called? ...
    I would call it (or in this case \x{A3f4}) "the string compiled from '\\x{A3f4}'". The \x part has nothing to do with the /x or /xx regex modifiers.
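The quoting behaviour described in these answers can be checked with a short script. This is only a minimal sketch; the variable names are made up for illustration, and the wide character is reported by code point rather than printed, so the terminal encoding does not matter.

```perl
use strict;
use warnings;

# Single quotes: only \\ and \' are treated as escapes.
my $two_backslashes_in_source = '\\x{A3f4}';
my $one_backslash_in_source   = '\x{A3f4}';

# Double quotes: \x{...} is an escape sequence for a single character.
my $interpolated = "\x{A3f4}";

print "both single-quoted forms are the same string\n"
    if $two_backslashes_in_source eq $one_backslash_in_source;
printf "single-quoted length: %d\n", length $two_backslashes_in_source;
printf "double-quoted length: %d\n", length $interpolated;
printf "double-quoted code point: 0x%X\n", ord $interpolated;
```

The single-quoted string is eight ordinary characters (backslash, x, two braces and four hex digits), while the double-quoted one is a single character.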


    Give a man a fish:  <%-{-{-{-<

      Due to the way the single-quote string constructor handles backslashes (escapes), the \\ will in this case compile to a single literal backslash. See Quote and Quote-like Operators and the discussion of q/STRING/ in Quote-Like Operators.

      Thanks for your comment, AM. Your link is a good read and worth reposting. I thought that the collapsing of backslashes was done by the OS in resolving paths. I was unaware that perl did it.

      do you dispute that there is a left-curly (and a right-curly) in the \x{A3f4} string? What else would you call it/them?

      I do not dispute that, so this string itself never represents a left curly brace, rather it has a left curly brace in it.

      I would call it (or in this case \x{A3f4}) "the string compiled from '\\x{A3f4}'"

      Ok. From the above source we have:

          \x{263A}     [1,8]  hex char (example shown: SMILEY)
          \x{ 263A }          Same, but shows optional blanks inside and adjoining the braces
          \x1b         [2,8]  restricted range hex char (example: ESC)

      So, I think "aha, it's a hex representation", but then I can't get there with the REPL:

          DB<1> $str2='\\x{263}'
          DB<2> p $str2
          \x{263}
          DB<3> p hex $str2
          0
          DB<4> print hex $str2
          0

      I would expect to see a smiley face rather than zero. This is a head-scratcher:

          DB<6> $str3='\\\\\\\x{aF}'
          DB<7> p $str3
          \\\\x{aF}
          DB<8> p hex $str3
          0
          DB<9> print hex $str3
          0

      $str3 goes from 7 to 4 backslashes when compiled(?). But I get zero for a hex value no matter what I try:

          DB<10> $str4='\x{aF}'
          DB<11> p $str4
          \x{aF}
          DB<12> print hex $str4
          0
          DB<13> print hex 'aF'
          175

      How do I tease 175 out of $str4?

      The \x part has nothing to do with the /x or /xx regex modifiers.

      That part is clearer now. I have that backslash/forward-slash dysphoria going on now where I can hardly see the difference and it looks like a toothpick war. I get the occasional billiken that I read or write the wrong way.

        > So, I think "aha, it's a hex representation", but then I can't get there with the REPL:

        you are still confusing interpolation (double-quotes) with literal strings (single-quotes)

            DB<28> p $str1 = "\x{41}"    # interpolation
            A
            DB<29> p $str2 = '\x{41}'    # literal
            \x{41}
            DB<30> p $str2 = '\\x{41}'   # literal but escaping escaping \
            \x{41}

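        now, the double escape in line 30 is playing it safe, because there is a difference between \\' and \' in a single-quoted string: the first is a literal backslash followed by the closing quote, the second is an escaped quote that does not end the string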
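A small sketch of that distinction, using made-up example strings (none of them come from the thread):

```perl
use strict;
use warnings;

# \\ followed by the closing quote: the string ends with one backslash.
my $ends_with_backslash = 'C:\\';

# \' is an escaped quote and does not terminate the string.
my $contains_quote = 'it\'s';

# Before any other character, one or two backslashes give the same result.
my $single_escape = 'a\b';
my $double_escape = 'a\\b';

print "$ends_with_backslash\n";
print "$contains_quote\n";
print "same\n" if $single_escape eq $double_escape;
```

Writing the backslash doubled everywhere means the string never changes meaning if a quote character later ends up right after it.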

        BUT this

        \x{ 263A } Same, but shows optional blanks inside and adjoining the braces

        doesn't work for me! (oO ???)

            DB<31> p " \x{ 41 } "
             ^@
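One possible explanation, hedged: as far as I recall from perl5340delta, blanks inside and adjoining the braces of \x{...} are only accepted from Perl 5.34 on, so an older perl would see a non-hex character and produce chr(0), which matches the ^@ above. The sketch below is an assumption-laden check, not a statement about any particular installation; it uses a string eval so the literal is compiled at run time by whichever perl executes it.

```perl
use strict;
use warnings;

# Compile the literal at run time; warnings are silenced because perls
# that do not accept the blanks complain about a non-hex character.
my $with_blanks = do { no warnings; eval q{ "\x{ 41 }" } };
my $code_point  = ord $with_blanks;

printf "perl %vd: \\x{ 41 } compiles to code point %d\n", $^V, $code_point;
```

On a perl that accepts the blanks the reported code point should be 65 (the letter A).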

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        Some random responses...

        do you dispute that there is a left-curly (and a right-curly) in the \x{A3f4} string? What else would you call it/them?
        I do not dispute that, so this string itself never represents a left curly brace, rather it has a left curly brace in it.

        Oh, so you were thinking that "\x{A3f4}" when compiled double-quotishly into a string and then printed should print a left-curly! I follow you a little better now. My terminal is not configured for Unicode (as I assume this character to be) right now, so I cannot confirm what it will print, and I'm reluctant to launch myself into Unicode-land on-line to find out. However, I agree that the escape sequence \x{A3f4} when compiled double-quotishly (e.g., "ab\x{A3f4}cd") will compile to some character. But the single-quote-compiled string '\x{A3f4}' will always be literally \x{A3f4} and nothing else.

        It's important to understand how backslashes (escapes) are compiled in single- and double-quoted strings. Consider the following:

            Win8 Strawberry 5.8.9.5 (32) Tue 06/07/2022 12:17:53
            C:\@Work\Perl\monks
            >perl
            use strict;
            use warnings;

            print '-\-\\-\\\-\\\\-\\\\\-\\\\\\-\\\\\\\-\\\\\\\\-';
            ^Z
            -\-\-\\-\\-\\\-\\\-\\\\-\\\\-
        Why do '\\\\\\\' and '\\\\\\\\' (7 and 8 backslashes, respectively) both compile to and print as four backslashes? How would this be different if compiled as a double-quoted string?
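One way to explore both questions is to compile the same source text under both quoting styles and count what survives. This is only a sketch: the helper count_backslashes and the use of string eval are illustrative choices, not something from the thread.

```perl
use strict;
use warnings;

# Count the literal backslashes in a string.
sub count_backslashes {
    my ($text) = @_;
    my $count = () = $text =~ /\\/g;
    return $count;
}

my (@single_counts, @double_counts);
for my $n (1 .. 8) {
    my $run = '\\' x $n;    # $n backslashes of source text

    # Compile -<run>- once as a single-quoted and once as a double-quoted
    # literal. Warnings are off because "\-" is not a recognized escape.
    my ($single, $double) = do {
        no warnings;
        ( eval("'-$run-'"), eval(qq{"-$run-"}) );
    };

    push @single_counts, count_backslashes($single);
    push @double_counts, count_backslashes($double);
    printf "%d in source: %d single-quoted, %d double-quoted\n",
        $n, $single_counts[-1], $double_counts[-1];
}
```

In single quotes a leftover odd backslash is kept as a literal backslash, so 7 and 8 both end up as 4. In double quotes the leftover backslash combines with the following - and disappears, so an odd count yields one backslash fewer.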

        DB<1> $str2='\\x{263}'

        This compiles to (and prints) the literal string \x{263} or literal-backslash, literal-lowercase-x, literal-left-curly, literal-2, literal-6, literal-3, literal-right-curly. The hex built-in cannot interpret a string in this format (and so returns zero (update: and a warning)), but can in "proper" format:

            Win8 Strawberry 5.8.9.5 (32) Tue 06/07/2022 22:09:02
            C:\@Work\Perl\monks
            >perl
            use strict;
            use warnings;

            my $h1 = 'A3f4';
            my $h2 = 'xA3f4';

            print hex 'A3f4', "\n";
            print hex $h1, "\n";

            print hex 'xA3f4', "\n";
            print hex $h2, "\n";

            print hex '\xA3f4', "\n";
            print hex '\x{A3f4}', "\n";
            ^Z
            41972
            41972
            41972
            41972
            Illegal hexadecimal digit '\' ignored at - line 13.
            0
            Illegal hexadecimal digit '\' ignored at - line 14.
            0

            DB<10> $str4='\x{aF}'
        ...
        How do I tease 175 out of $str4?

        We know that \x{aF} will not be interpreted by hex as a hex number. One way to extract the hex substring:

            Win8 Strawberry 5.8.9.5 (32) Tue 06/07/2022 22:25:09
            C:\@Work\Perl\monks
            >perl
            use strict;
            use warnings;

            my $str = '\x{aF}';

            $str =~ m{ \A \\ x \{ ([[:xdigit:]]+) \} \z }xms;
            my $hex_digits = $1;
            print ">$hex_digits< \n";

            my $hex_number_in_decimal = hex $hex_digits;
            print "$hex_number_in_decimal \n";
            ^Z
            >aF<
            175

        Update: Another approach:

            Win8 Strawberry 5.8.9.5 (32) Sat 06/11/2022 15:18:47
            C:\@Work\Perl\monks
            >perl
            use strict;
            use warnings;

            my $str = '\x{aF}';

            my ($hex_digits) = $str =~ m{ [[:xdigit:]]+ }xmsg;

            my $hex_number_in_decimal = hex $hex_digits;
            print "'$hex_digits' == $hex_number_in_decimal decimal \n";
            ^Z
            'aF' == 175 decimal
        This approach can be useful when a string or record has been "validated" as to its structure and you know that certain substrings or fields are unambiguously present: these substrings/fields can then be easily and quickly extracted. Note the /g modifier on the m// match.
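Building on the two extractions above, here is a sketch that goes one step further and turns every literal \x{HEX} in a string into the character it names. The helper decode_hex_escapes and the sample string are hypothetical. Note that both braces are escaped in the pattern, which also avoids the "Unescaped left brace in regex" warning this thread started with.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each literal \x{HEX} in $text with the
# character that has that code point.
sub decode_hex_escapes {
    my ($text) = @_;
    $text =~ s{ \\ x \{ ( [[:xdigit:]]+ ) \} }{ chr hex $1 }xmsge;
    return $text;
}

my $raw     = 'ab\x{41}cd\x{aF}';    # single quotes: the escapes stay literal
my $decoded = decode_hex_escapes($raw);

printf "%d characters before, %d after\n", length $raw, length $decoded;
printf "code points: %s\n", join ' ', map { ord } split //, $decoded;
```

The /e modifier evaluates the replacement as code, so hex converts the captured digits to a number and chr converts that number to a character.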


        Give a man a fish:  <%-{-{-{-<

Re^5: Unescaped left brace in regex is passed through in regex
by LanX (Saint) on Jun 07, 2022 at 23:35 UTC
    >  $str =  '\\x{A3f4}'

    please note the single quotes, it creates just the string \x{A3f4}

    only two characters need to be escaped inside single-quotes ... the single-quote and consequently the escape-character, which is the backslash.

    "\x{HEX}" is a notation for hex-character only inside double-quotes, so it seems someone is trying to parse code similar to Perl.

    I ran the debugger with 5.32 without warnings pre-activated, I'm surprised about your errors. Looks like you ran something like perl -dwe0

    And you are seeing the internal stack-trace of the debugger, I don't wanna comment on the perldebguts

    The debugger is mostly written in Perl, but using hooks written in C.

    I.O.W. perl5db.pl is written in Perl, but only possible because of the debugging API written in C, expecting to find functions like &DB::DB

    And there are some old debugging flags available written in C, things I've never used.

    Greetings from Krautland ;-)

  • Rolf
