PerlMonks  

Re: Parsing Problems

by haukex (Bishop)
on May 13, 2019 at 16:08 UTC (#1233709)


in reply to Parsing Problems

I'd suggest creating a Short, Self-Contained, Correct Example - i.e. reducing both the input file and the code down to the bare minimum needed to reproduce the problem. That should help you narrow down the problem, and it also gives you something to post here. To remove all ambiguity, also use hexdump or od to show the input files, e.g. hexdump -C input.txt or od -tx1c input.txt, and use Devel::Peek to show the data once it is read into Perl. For example:

    $ hexdump -C test.txt
    00000000  48 e2 82 ac 6c 6c 6f 2c  20 57 c3 b6 72 6c 64 21  |H...llo, W..rld!|
    00000010  0a                                                |.|
    00000011
    $ cat test.pl
    use warnings;
    use strict;
    use open qw/:std :encoding(UTF-8)/;
    use Devel::Peek;
    while (<>) {
        chomp;
        Dump($_);
    }
    $ perl test.pl test.txt
    SV = PV(0x55b66e50c080) at 0x55b66e547398
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x55b662e533d0 "H\342\202\254llo, W\303\266rld!"\0 [UTF8 "H\x{20ac}llo, W\x{f6}rld!"]
      CUR = 16
      LEN = 81

Update: To look at the input file, you might also be interested in my script enctool.

Replies are listed 'Best First'.
Re^2: Parsing Problems
by ftherese (Novice) on May 13, 2019 at 17:46 UTC

    Thank you for the suggestion. I have now ruled out the thought that this is a parsing error. I now believe it has something to do with how a function is being called:

        $temp = eval qq~modes::~.$ARGV[2].qq~::flex(/$line);~;

    It's supposed to run a subroutine from a Perl module file in a directory (lib, opendir, readdir). Has something changed with package, lib, eval, or qq~? I'm trying to figure out how to distill my example.
      Has something changed with package, lib, eval, or qq~?

      I'm not certain. There may have been small changes to how eval handles Unicode, but evalbytes was introduced in 5.16, i.e. before the 5.18 that you're migrating away from. Other than that, based on this little piece of code, I don't see anything that might be different between the two versions. As for whether something has changed with Text::Unaccent::PurePerl, you'd have to check which version you have installed in both environments (perl -MText::Unaccent::PurePerl -le 'print $Text::Unaccent::PurePerl::VERSION').

      If you think you're having trouble with eval, then it's best to build its argument first, store it in a variable, use Data::Dumper or Data::Dump to show it, and also check eval for errors using the pattern eval "...; 1" or die "eval failed: $@" (see Bug in eval in pre-5.14). (Update before posting: I see haj made a similar point.)
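A minimal sketch of that debugging pattern, for illustration only. The package modes::three and its flex sub are stand-ins defined inline (the real modules are not shown in this thread), and $mode stands in for $ARGV[2]:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for one of the real mode modules.
package modes::three {
    sub flex { my ($line) = @_; return uc $line }
}

package main;

my $mode = 'three';        # stands in for $ARGV[2]
my $line = 'some input';
my $temp;

# Build the code string first, so that it can be inspected before it is run.
my $code = 'modes::' . $mode . '::flex($line)';
{
    local $Data::Dumper::Useqq = 1;   # show non-printable characters as escapes
    print STDERR Dumper($code);
}

# The trailing "; 1" makes eval return true on success,
# so the "or die" branch only fires when the eval itself failed.
eval "\$temp = $code; 1" or die "eval failed: $@";
print "$temp\n";
```

Dumping the string before running it shows exactly what Perl is asked to compile, which is usually the quickest way to spot a stray character in generated code.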

      However, I would strongly recommend against using eval in the first place, building Perl code from strings and trying to run it can be quite brittle, and in many cases even a major security risk. If you were to show more context (SSCCE), we could most likely suggest an alternative without eval (Update: Yep!).

      Two things stand out here:

      • $ARGV[2] might contain un-decoded UTF-8 characters in "modern" terminals. File systems with UTF-8 characters behave differently, depending on the platform (Windows/Unix).
      • (/$line) looks weird. That needs really special values for $line to produce valid Perl!
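To illustrate the first point, here is a small sketch of decoding a command-line argument by hand. It assumes the terminal passes arguments as UTF-8 encoded bytes, and @raw is a made-up stand-in for @ARGV:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for @ARGV: the bytes a UTF-8 terminal would pass for "wörld".
my @raw = ("w\xc3\xb6rld");

# Decode a copy of each argument from UTF-8 bytes into Perl characters.
my @args = map { my $bytes = $_; decode('UTF-8', $bytes) } @raw;

printf "%d bytes -> %d characters\n", length $raw[0], length $args[0];
```

Whether this matters here depends on whether the mode names ever contain non-ASCII characters.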

      An idea is to print the string you're evaling, and of course checking whether eval produced an error in $@:

          my $code = qq~modes::~ . $ARGV[2] . qq~::flex(/$line);~;
          warn $code;
          eval $code;
          warn "Eval failed: '$@'" if $@;
        Okay! Now we're getting somewhere. In the old environment, no error is reported. In the new environment: Undefined subroutine &modes::three::a called at (eval 32) line 1. ...

        Also, I mistyped because I'm on my phone; it is \$line.

      I would really, really suggest replacing the eval() with something like

          my $name_space = "modes::$ARGV[2]";
          my $code = $name_space->can( 'flex' )
              or die "$name_space does not implement flex()";
          $temp = $code->( $line );

      This rests on the hypothesis that, in fat-fingering your example, you reversed the slope of a backslash.

      When obscure code fails it is really hard to debug. I infer from your eval() example that what you are trying to do is this:

      1. Run a script whose second argument is the name of a processor for your input.
      2. Process each line of the file using a subroutine named flex() in a Perl module named "modes::$ARGV[2]"
      3. This module has already been loaded.

      If these are correct, the above code should implement it in an easier-to-read-and-debug manner. The three statements compute the name of the module, get the address of the subroutine in that module (failing if it can not be found), and call the subroutine, passing it the line from the file, and storing the result in $temp.
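The same idea as a self-contained sketch that can be run on its own. Everything here is illustrative: modes::three is a stand-in defined inline, flex_for is a hypothetical helper name, and the check on the mode name is an extra precaution that is not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an already-loaded mode module.
package modes::three {
    sub flex { my ($line) = @_; return scalar reverse $line }
}

package main;

# Look up flex() in modes::<mode> without any string eval.
sub flex_for {
    my ($mode) = @_;
    # Accept only plain identifiers, so user input cannot name arbitrary packages.
    die "invalid mode name '$mode'\n" unless $mode =~ /\A\w+\z/;
    my $name_space = "modes::$mode";
    return $name_space->can('flex')
        || die "$name_space does not implement flex()\n";
}

my $flex = flex_for('three');      # 'three' stands in for $ARGV[2]
my $temp = $flex->('abc');
print "$temp\n";

# An unknown mode fails with a clear message instead of an obscure eval error.
my $ok = eval { flex_for('nine'); 1 };
print "lookup failed: $@" unless $ok;
```

Because can() simply returns nothing for a package that was never loaded, a missing module shows up as "does not implement flex()", which points straight at the loading step.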

      This is off-topic for your question, but all your Perl should have use strict; and use warnings; near the top of the file. If your script does not do this, all sorts of bugs could lurk. Of course, if you just slam these into a legacy script it may find all sorts of things -- more than you are able to fix in one sitting. But it's worth a try -- one of the warnings/errors may tell you what your problem is.

        Thank you for your reply. This is helpful as I've traced the error to how Perl is handling the modules, @INC, and referencing subroutines.

        What my code is supposed to do is apply different sets of transformations to a text file based on options specified from the command-line:

         modes.pl [rawtext] [style] [mode](one, two, etc.) [1st variation](a, a_prime, b, etc.) [2nd variation]

        "style" is one of several directories containing nine .pm files. Each .pm file has at least the subroutines "first" and "flex." The other subroutines are variations.

        Here is the code that I'm using to load the modules:

            package modes;
            my $lP;
            my $modCount = 0;
            BEGIN{ $lP = "$ARGV[1]"; }
            use lib $lP;
            if(opendir(LIB, $lP)){
                foreach my $l (readdir(LIB)){
                    unless ($l !~ /^(.*)\.pm$/){
                        eval qq~require ~ . $ARGV[1] . qq~::~ . $1 . qq~;~;
                        print $@;
                    }
                }
            }

        The error that code produces (given: ./modes.pl "filename" english three a b) in Perl 5.26.1 (error does not occur in 5.18.2) is the following:

            Can't locate english/eight.pm in @INC (you may need to install the english::eight module) (@INC contains: english /home/******/perl5/lib/perl5/5.26.1/x86_64-linux-gnu-thread-multi /home/*********/perl5/lib/perl5/5.26.1 /home/***/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/******/perl5/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /home/****/perl5/lib/perl5/5.26.0 /home/*****/perl5/lib/perl5/5.26.0/x86_64-linux-gnu-thread-multi /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 5) line 1.

        Setting $lP to the fully qualified PWD (or simply "." since the "english" directory is located here) fixes that error, but the subroutines are still undefined:

        Undefined subroutine &modes::three::flex

        I'm not sure why this is. I will change my code based on your recommendations, but I don't understand it well enough myself to do so yet. I borrowed the module loading code from another programmer years ago simply because it did what I needed - I never fully understood it.
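One documented difference between the two versions is that Perl 5.26 no longer includes the current directory (".") in @INC by default, which would be consistent with the "Can't locate english/eight.pm" error above. The following is only a sketch of loading every .pm file in a style directory by absolute path, without string eval. It builds a throwaway directory so it can run on its own, and it assumes that each file declares a package named modes::<file name>, which this thread does not confirm:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Demo setup only: a temporary "english" directory containing one module.
# Assumption: the file declares package modes::three.
my $base  = tempdir(CLEANUP => 1);
my $style = File::Spec->catdir($base, 'english');
mkdir $style or die "mkdir $style: $!";
my $file = File::Spec->catfile($style, 'three.pm');
open my $out, '>', $file or die "open $file: $!";
print {$out} "package modes::three;\nsub flex { return 'flexed: ' . \$_[0] }\n1;\n";
close $out or die "close $file: $!";

# Collect the .pm files in the style directory.
opendir my $dh, $style or die "Cannot open $style: $!";
my @modules = sort grep { /\.pm\z/ } readdir $dh;
closedir $dh;

# Load each file by absolute path, so nothing depends on "." being in @INC.
my $modCount = 0;
for my $pm (@modules) {
    my $path = File::Spec->catfile($style, $pm);
    if (eval { require $path; 1 }) { $modCount++ }
    else                           { warn "Could not load $path: $@" }
}
print "loaded $modCount module(s)\n";

# After loading, the sub can be looked up by package name.
my $flex = 'modes::three'->can('flex')
    or die "modes::three does not implement flex()";
print $flex->('abc'), "\n";
```

If the real files declare a different package name than the one used in the call, the load will succeed but the lookup will still fail, so it is worth checking the package line at the top of each .pm file.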
