PerlMonks  

Text::ExtractWords exhibits incomprehensible behavior?

by ibm1620 (Hermit)
on Jun 15, 2024 at 15:42 UTC

ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Monks,

I'm looking for a module to extract words from a stream of text. I installed Text::ExtractWords and tried it:

    #!/usr/bin/env perl
    use v5.38;
    use Text::ExtractWords qw(words_list);
    say $^V;
    my $text = "12/21/84 Bob's 21st b'day was a wine-and-dine.";
    say $text;
    my @list;
    words_list(\@list, $text, {minwordlen => 2, maxwordlen => 26 });
    say "Found words: " . join ' ', map {"[$_]"} @list;
    say $text;

    v5.38.2
    12/21/84 Bob's 21st b'day was a wine-and-dine.
    Found words: [12] [21] [84] [bob's] [21st] [b'day] [was] [a] [wine-and-dine]
    122184bob's21stb'daywasawine-and-dine

Note the value of $text after the call. How is it possible for a subroutine to modify a parameter when I don't pass in a reference?

Moreover, I looked at Text::ExtractWords.pm and it's all basically boilerplate except maybe for one line, which I don't understand:

bootstrap Text::ExtractWords $VERSION;
I feel like I'm missing something obvious here...

Update: I'm not able to step into words_list() using the debugger! It just proceeds to my next source line.

Update2: I even tried making a copy of $text (my $copy = $text) and passing $copy to words_list(), and afterwards *both* $copy and $text were munged. As Danny mentions below, this is XS code.

Replies are listed 'Best First'.
Re: Text::ExtractWords exhibits incomprehensible behavior?
by Danny (Friar) on Jun 15, 2024 at 16:14 UTC
    I think the XS interface to the C function must pass a pointer to $text. In ExtractWords.xs you can see that the second argument to the function ew_words_list is declared as 'char *', a pointer to a character.
      Yes, I didn't realize this was XS.
Re: Text::ExtractWords exhibits incomprehensible behavior?
by GrandFather (Saint) on Jun 16, 2024 at 21:19 UTC

    Consider:

        use strict;
        use warnings;

        my @powers = map {undef} 0 .. 3;
        my $x = 2;

        powers ($x, @powers);
        print("$x @powers\n");

        sub powers {
            $_[$_] = $_[$_ - 1] * $_[0] for 1 .. @_ - 1;
        }

    Prints:

    2 4 8 16 32

    Perl effectively passes parameters by reference: the elements of @_ are aliases for the caller's arguments. Arrays and hashes get flattened so each element becomes a separate parameter.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      Or perhaps more clearly:

          my $me = "old me";
          s1($me);
          print "$me\n";

          sub s1 {
              $_[0] = "new me";
          }

      But it's customary to make local copies of the parameters like 'my $var = shift' so they don't get changed.
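The difference between writing through @_ and working on a local copy can be sketched as follows. The sub names shout_in_place and shout_copy are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: assigns through $_[0], which aliases the caller's variable.
sub shout_in_place {
    $_[0] = uc $_[0];
    return;
}

# Hypothetical helper: copies the argument first, so the caller's variable is untouched.
sub shout_copy {
    my ($s) = @_;
    $s = uc $s;
    return $s;
}

my $greeting = 'hello';
shout_in_place($greeting);          # $greeting itself is changed

my $other  = 'world';
my $result = shout_copy($other);    # only the private copy is changed

print "$greeting $other $result\n";
```

The customary 'my ($s) = @_' or 'my $s = shift' idiom is what keeps the second sub from touching the caller's data.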
        I had no idea this was possible.

        So, I could create a version of `trim` that was more `chomp`-like?

            sub mytrim {
                $_[0] =~ s/^\s+|\s+$//g;
            }

            my $text = ' this could use a trim. ';
            mytrim $text;
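A small sketch to check that idea, with a copy-returning variant beside it for comparison. The bare return in mytrim and the second sub, trimmed, are additions for this illustration and not part of the post above; the /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# chomp-like: edits the caller's variable through the @_ alias.
sub mytrim {
    $_[0] =~ s/^\s+|\s+$//g;
    return;
}

# Hypothetical non-destructive variant: s///r returns a modified copy
# and leaves the original string alone.
sub trimmed {
    return $_[0] =~ s/^\s+|\s+$//gr;
}

my $text = '  this could use a trim.  ';
mytrim($text);
print "[$text]\n";

my $padded = '  keep me  ';
my $clean  = trimmed($padded);
print "[$padded] [$clean]\n";
```

Which form to prefer is a design choice: the in-place version mirrors chomp, while the copy-returning version avoids surprising the caller.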
Re: Text::ExtractWords exhibits incomprehensible behavior?
by Danny (Friar) on Jun 16, 2024 at 14:57 UTC
      ibm1620 wrote: I even tried making a copy of $text (my $copy = $text) and passing $copy to words_list(), and afterwards *both* $copy and $text were munged.

    That I don't understand. Were $text and $copy munged in the same way after the call or different? What does printf("%s %s\n", \$text, \$copy) give before and after the call?

    In general, it seems like bad practice for an XS function to modify its arguments unless that is its specific purpose.

      Yes, exact same way, despite being at different addresses:

          #!/usr/bin/env perl
          use v5.38;
          use Text::ExtractWords qw(words_list);
          say $^V;
          my $text = "12/21/84 Bob's 21st b'day was a wine-and-dine.";
          my $copy = $text;
          say "Text: $text";
          say "Copy: $copy";
          printf("%s %s\n", \$text, \$copy);
          my @list;
          words_list(\@list, $copy, {minwordlen => 2, maxwordlen => 26 });
          say "Found words: " . join ' ', map {"[$_]"} @list;
          say "Text: $text";
          say "Copy: $copy";
          printf("%s %s\n", \$text, \$copy);
          __END__
          v5.40.0
          Text: 12/21/84 Bob's 21st b'day was a wine-and-dine.
          Copy: 12/21/84 Bob's 21st b'day was a wine-and-dine.
          SCALAR(0x140829a68) SCALAR(0x1408299d8)
          Found words: [12] [21] [84] [bob's] [21st] [b'day] [was] [a] [wine-and-dine]
          Text: 122184bob's21stb'daywasawine-and-dine
          Copy: 122184bob's21stb'daywasawine-and-dine
          SCALAR(0x140829a68) SCALAR(0x1408299d8)

        Those are the SV addresses, and they are different. Try Devel::Peek's Dump function to see what the PV is. I suspect this is supposed to be copy-on-write (COW), but that the XS code is just overwriting the (still-shared) data area without doing the correct COW API stuff.
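A minimal sketch of that check, leaving the XS module out entirely. Devel::Peek is a core module and Dump writes to STDERR. On a Perl with copy-on-write strings one would expect both dumps to show the same PV address and an IsCOW flag, but that depends on the Perl build, so treat it as something to verify rather than a confirmed diagnosis.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $text = "12/21/84 Bob's 21st b'day was a wine-and-dine.";
my $copy = $text;

# Dump prints the internals of each scalar to STDERR.
# Compare the "PV = 0x..." lines and the FLAGS lines of the two dumps:
# a shared PV address would mean both scalars still point at one string buffer.
Dump($text);
Dump($copy);
```

If the two PV addresses match, code that writes through the raw char * without first giving the scalar its own buffer would change both variables at once, which fits the output shown above.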
Re: Text::ExtractWords exhibits incomprehensible behavior?
by cavac (Parson) on Jun 19, 2024 at 07:45 UTC
