Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Greets,

In the Perl core, there's a function called Perl_sv_utf8_upgrade_flags_grow. Many paths ultimately lead to this function; I got there by way of the XS macro SvPVutf8, but the easiest way invoke it from Perl-space is via utf8::upgrade.

It turns out that this function doesn't play nice with capture variables like $1. After it is invoked on them, they no longer capture properly:

marvin@smokie:~/perltest $ bleadperl dollar_one_utf8_upgrade.pl Without utf8::upgrade... a b c With utf8::upgrade... a a a Problem persists... a a a marvin@smokie:~/perltest $

Here's the test script:

use strict; use warnings; my $text = "a b c"; print "Without utf8::upgrade...\n"; while ( $text =~ /(\S)/g ) { print "$1\n"; } print "With utf8::upgrade...\n"; while ( $text =~ /(\S)/g ) { print "$1\n"; utf8::upgrade($1); } print "Problem persists...\n"; my $more_text = "d e f"; while ( $more_text =~ /(\S)/g ) { print "$1\n"; }

The problem appears to be that Perl_sv_utf8_upgrade_flags_grow turns on the SVf_POK flag by way of SvPV_force. It doesn't seem to have anything to do with whether the SVf_UTF8 flag is on in either the string being regexed or the capture variable itself.

Applying the following patch to sv.c in blead appears to kill the bug at the source. All of Perl's test cases still pass after it is applied.

marvin@smokie:~/projects/perl-git $ git diff diff --git a/sv.c b/sv.c index a53669a..280f064 100644 --- a/sv.c +++ b/sv.c @@ -3231,6 +3231,8 @@ Perl_sv_utf8_upgrade_flags_grow(pTHX_ register S +V *const sv, const I32 flags, if (extra) SvGROW(sv, SvCUR(sv) + extra); return len; } + } else if (SvGMAGICAL(sv) && mg_find(sv, PERL_MAGIC_sv)) { + ; } else { (void) SvPV_force(sv,len); }

I'd like to solve this issue and supply a working patch via perlbug, so I can say that I solved a UTF-8 bug in the Perl core. :) However, I'm not sure that this patch is legit, because I don't understand exactly what PERL_MAGIC_sv is all about.

I think what's going on is that $1 and friends are magical variables that should never have the SVf_POK flag on, since that indicates that they contain real strings. The regex engine probably doesn't use the standard string assignment interface and goes through the magic interface instead, hence its work is no longer visible once the standard channel is open. But is it safe to have the SVf_POK flag off for the remainder of Perl_sv_utf8_upgrade_flags_grow? SvPV_force was probably called for a reason, after all.


In reply to utf8::upgrade and $1 by creamygoodness

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others lurking in the Monastery: (10)
    As of 2014-12-22 18:57 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Is guessing a good strategy for surviving in the IT business?





      Results (126 votes), past polls