PerlMonks  

Problem with regex is a bug? or my regex

by hanspr (Sexton)
on Nov 20, 2021 at 02:48 UTC ( #11138966=perlquestion )

hanspr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, dear monks,

I'm using Expect to detect login sequences, and I'm facing this inexplicable behaviour of my regex.

This is the regex
(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|(\w[@\/]\w|sftp).*?[#%>~\$\]]|^[#%\$>\:]~] *$

The text it fails against is shown below.

The regex works correctly in Perl 5.30 and fails in Perl 5.34.

It works as expected on regex testing websites such as https://regex101.com/

I expected it to match this sequence --> : ~#

But instead it is matching --> \s\s#

as seen in the "Match string" message below:

Match string: `  #'


I would like to confirm with your wisdom before opening a bug report.
##############################################################################
# This system is a restricted access system.                                 #
# If collected security information reveals possible criminal activity that  #
# exceeds privileges, evidence of such activity may be provided to the rele- #
# vant authorities for further action. By continuing past this point, you    #
# expressly consent to this security monitoring.                             #
##############################################################################

hostname: ~#

spawn id(5): Does `##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vant authorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname: ~# '
match:
  pattern #2: -eof `'? No.
  pattern #3: -re `\\[__PAC__SUDO__PROMPT__\\]'? No.
  pattern #4: -re `^.+ontinue connecting \\(([^/]+)\\/([^/]+)(?:[^)]+)?\\)\\?\\s*$'? No.
  pattern #5: -re `.*(any key to continue|tecla para continuar).*'? No.
  pattern #6: -re `.*ffending .*key in (.+?)\\:(\\d+).*'? No.
  pattern #7: -re `([lL]ogin|[uU]suario|([uU]ser-?)*[nN]ame.*|[uU]ser)\\s*:\\s*$'? No.
  pattern #8: -re `([pP]ass|[pP]ass[wW]or[dt](\\s+for\\s+|\\w+@[\\w\\-\\.]+)*|[cC]ontrase.a|Enter passphrase for key \'.+\')\\s*:\\s*$'? No.
  pattern #9: -re `(([#%:>~\\$\\] ])(?!\\2)){3,4}|([\\w\\-\\.]*)\\$ *$|\\w[@\\/]\\w.*?[#%>~\\$\\]]|^[#%\\$>\\:]~] *$'? YES!!
    Before match string: `##############################################################################\r\n# This system is a restricted access system. '
    Match string: `  #'
    After match string: `\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vant authorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\n\r\nhostname: ~# '
    Matchlist: (`#', `#', `')
Calling hook CODE(0x55f2fb33fe20)...

Replies are listed 'Best First'.
Re: Problem with regex is a bug? or my regex (updated)
by haukex (Bishop) on Nov 20, 2021 at 08:59 UTC

    <update2> Just posting this here for visibility: The solution is further down in the thread. </update2>

    Both when filing a bug report and when asking a question here, you'll need to provide a Short, Self-Contained, Correct Example that reproduces the issue. That is, runnable code that includes sample input and expected output. Note that here, the regex you showed and the output you provided do not match (and it doesn't appear you're using a common module such as Data::Dumper to output your strings?). Your regexes would also benefit greatly from the /x modifier. The following code runs fine on Perl 5.18 through 5.34 on my system.

    use warnings;
    use strict;
    use Test::More tests=>4;

    my $str =
        "##############################################################################\r\n"
      . "# This system is a restricted access system.                                 #\r\n"
      . "# If collected security information reveals possible criminal activity that  #\r\n"
      . "# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n"
      . "# vant authorities for further action. By continuing past this point, you    #\r\n"
      . "# expressly consent to this security monitoring.                             #\r\n"
      . "##############################################################################\r\n"
      . "\r\nhostname: ~# ";

    my $re1 = qr{(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|\w[@\/]\w.*?[#%>~\$\]]|^[#%\$>\:]~] *$};
    my $re2 = qr{(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|(\w[@\/]\w|sftp).*?[#%>~\$\]]|^[#%\$>\:]~] *$};

    ok $str =~ $re1;
    is $&, ": ~#";
    ok $str =~ $re2;
    is $&, ": ~#";
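    As an illustration of the /x suggestion, here is a minimal sketch of how $re2 from the test above might be laid out with one alternative per line. The sample string and the names $re_x and $matched are made up for this example, and the comments reflect my reading of each branch rather than the original author's intent.

```perl
use strict;
use warnings;

# Hypothetical sample input: the tail of the banner plus the prompt.
my $str = "# expressly consent to this security monitoring.   #\r\n\r\nhostname: ~# ";

# Same alternation as $re2, spread out with /x. Literal spaces outside a
# bracketed class are written as [ ] because /x would otherwise ignore them;
# spaces inside a bracketed class stay literal.
my $re_x = qr{
      (([#%:>~\$\] ])(?!\2)){3,4}        # 3 to 4 prompt-like characters, none directly repeated
    | ([\w\-\.]*) \$ [ ]* $              # optional name, a dollar sign, trailing spaces, end of string
    | (\w[@\/]\w|sftp) .*? [#%>~\$\]]    # user-at-host or path fragment, or sftp, up to a prompt character
    | ^ [#%\$>\:] ~ \] [ ]* $            # prompt character, tilde, closing bracket, trailing spaces
}x;

# $& is used instead of wrapping the pattern in another capture group,
# because an extra group would renumber the \2 backreference.
my $matched = $str =~ $re_x ? $& : undef;
print defined $matched ? "matched: [$matched]\n" : "no match\n";
```

    The capture groups keep their original numbering, so \2 still refers to the character class in the first branch.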

    Update: After looking at those regexes a little closer, I fail to see how either of them could match "  #" at all: in the first branch, every time a space matches it has to be followed by something that isn't a space or #, and in the second through fourth branches, each potential match of spaces has to be preceded by something that isn't a space, and in the second and fourth branches, the spaces need to be at the end of the line. Perhaps you made a mistake when editing \w[@\/]\w to (\w[@\/]\w|sftp), which could maybe explain the match you observed. Again, please use Data::Dumper with $Data::Dumper::Useqq=1; or Data::Dump to output strings and regexen in a representative manner. Also tweaked test code a tiny bit.
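    A minimal sketch of the Data::Dumper suggestion, assuming the interesting values are plain strings such as those returned by the match and after accessors. The variable names and sample values are invented for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq makes Data::Dumper emit double-quoted strings with escapes such as \r and \n,
# so whitespace and control characters are unambiguous in the output.
$Data::Dumper::Useqq = 1;
# Terse drops the variable-name prefix from the output; purely cosmetic here.
$Data::Dumper::Terse = 1;

# Hypothetical values standing in for a matched string and the text after it.
my $match = "  #";
my $after = "\r\nhostname: ~# ";

my $dumped_match = Dumper($match);
my $dumped_after = Dumper($after);
print "match: $dumped_match";
print "after: $dumped_after";
```

    Printed this way, the two leading spaces in the match and the carriage return and newline in the trailing text stay visible even if a web page later collapses whitespace.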

    (Update 2: Why is whitespace being compacted in these <code> tags?? "    #" - hm, probably a stylesheet issue) Also clarified wording in the above update.

      Thank you for your guidance,

      I managed to create a self-contained example that reproduces the problem.

      I ran your test and it works on both Perl versions, as you said.

      But running the regex inside Expect.pm fails.

      So is the bug in the Expect package?

      chain.txt

      ##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname: ~#

      test.pl

      use strict;
      use Expect;

      my $re1 = '(([#%:>~\$\] ])(?!\2)){3,4}|([\w\-\.]*)\$ *$|(\w[@\/]\w|sftp).*?[#%>~\$\]]|^[#%\$>\:]~] *$';
      my $test;
      open($test,"<","chain.txt");
      my $exp = Expect->exp_init($test);
      $exp->expect(1,
          [ $re1 => sub {
              my $exp = shift;
              print "Match before : ",$exp->before(),"\n";
              print "Match : ",$exp->match(),"\n";
              print "Match after : ",$exp->after(),"\n";
          }]
      );
      close $test;




      hans@hans-desktop ~ perl -v
      This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi
      hans@hans-desktop ~ perl -MExpect -e 'print $Expect::VERSION ."\n";'
      1.21
      perl test.pl
      Match before : ##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname
      Match : : ~#
      Match after :



      [hans@fedora ~]$ perl -v
      This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-thread-multi
      [hans@fedora ~]$ perl -MExpect -e 'print $Expect::VERSION ."\n";'
      1.35
      [hans@fedora ~]$ perl test.pl
      Match before : ##############################################################################\r\n# This system is a restricted access system.
      Match :   #
      Match after : \r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname: ~#

        Thanks for posting the details. Expect 1.21 is about ten years older than 1.35 (2007 vs 2017). Since I'm unable to reproduce your issue with Expect 1.35 on both versions of Perl, I am guessing that the issue lies with one of the bugs that was fixed in Expect over those 10 years. I'd say your best course of action is to upgrade the module.

        Update: Sorry, I see now that you're getting your expected behavior on the older version of the module instead of the newer version. The Changelog does mention "Eliminate $` and $' from the code. part of (RT #61395) This fix might break some existing code n some extreme cases when the regex being matched has a lookbehind or a lookahead at the edges." which could potentially be a hint, but finding out if this actually is the issue will take a bit more digging.

        For reference:

        $ perl -v
        This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi
        $ perl -MExpect -e 'print $Expect::VERSION ."\n";'
        1.21
        $ perl 11138979.pl
        Match before : ##############################################################################\r\n# This system is a restricted access system. #\r\n# If collected security information reveals possible criminal activity that #\r\n# exceeds privileges, evidence of such activity may be provided to the rele- #\r\n# vantauthorities for further action. By continuing past this point, you #\r\n# expressly consent to this security monitoring. #\r\n##############################################################################\r\n\r\nhostname
        Match : : ~#
        Match after :
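        One way the changelog hint quoted above could play out, offered purely as an assumption since nothing in this part of the thread confirms what Expect does internally: if a caller places the pattern inside one additional capture group, the absolute backreference \2 no longer points at the character class. The sketch below uses only core Perl; the shortened input, the wrapper, and all names are hypothetical. It compares the first branch of the regex as written, wrapped in an extra group, and rewritten with the relative backreference \g{-1}.

```perl
use strict;
use warnings;

# Hypothetical, shortened input: one padded banner line followed by the prompt.
my $str = "# This system is a restricted access system.     #\r\n\r\nhostname: ~# ";

# First branch of the prompt regex, with its absolute backreference \2.
my $abs = '(([#%:>~\$\] ])(?!\2)){3,4}';
# Same branch using a relative backreference to the nearest preceding group.
my $rel = '(([#%:>~\$\] ])(?!\g{-1})){3,4}';

# Return the text matched by a pattern string, or undef if there is no match.
sub first_match {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/ ? $& : undef;
}

my %result = (
    abs_plain   => first_match($abs,     $str),
    abs_wrapped => first_match("($abs)", $str),   # assumed wrapper: one extra capture group
    rel_plain   => first_match($rel,     $str),
    rel_wrapped => first_match("($rel)", $str),
);

for my $name (sort keys %result) {
    my $shown = defined $result{$name} ? "[$result{$name}]" : 'no match';
    print "$name: $shown\n";
}
```

        In this sketch the wrapped absolute form stops at the end of the padded banner line and reports two spaces followed by #, while the other three variants report : ~#. If that is what happens inside the newer module, a relative backreference would be one possible workaround, but that remains to be checked against the Expect source.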

        Good Day,
            Dean

Re: Problem with regex is a bug? or my regex
by dave_the_m (Monsignor) on Nov 20, 2021 at 08:21 UTC
    It's not clear from your post exactly which string is being matched that works on 5.30 and fails on 5.34.

    Dave.
