Non asci character and system call

by Anonymous Monk
on Nov 08, 2017 at 17:59 UTC (#1202968=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks

What is wrong with this code? My file is: $DocumentPath="сервис.pdf" (Windows 10, ActiveState 5.16)

#!/usr/local/bin/perl -w
use strict;
use utf8; # no change if I use this or not
my $commandline = qq{start "$DocumentPath"};
system($commandline) == 0
    or die qq{Couldn't launch '$commandline': $!/$?};

If I use Latin letters instead of Russian ones, I do not experience any problems.

Replies are listed 'Best First'.
Re: Non asci character and system call
by kcott (Chancellor) on Nov 08, 2017 at 19:27 UTC
    use utf8;#no change if I use this or not

    From the documentation for the utf8 pragma (original emphasis retained):

    Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.

    So, if your script includes the UTF-8 string "сервис.pdf", you should use it; if it contains "xyz.pdf" (or whatever), you don't need it.

    The thing to check here would be whether you've actually included "сервис.pdf" as a UTF-8 string. If you've used some other encoding, that may be the cause of whatever issues you're encountering.

    You haven't shown the assignment to $DocumentPath in your code; nor have you actually said what problems you're experiencing: "How do I post a question effectively?" provides guidelines on what information to include with your question such that we can better help you (as an example, whatever value $! holds may be useful).
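As a rough illustration of what $! and $? can tell you after system, here is a minimal sketch. The helper name describe_system is made up for this example; it assumes the command is passed in list form and uses $^X (the running perl) as a stand-in for the real program.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and describe the outcome.
sub describe_system {
    my @command = @_;
    my $status = system @command;
    if ($status == -1) {
        # $! is only meaningful when the command could not be started at all.
        return "failed to start: $!";
    }
    if ($? & 127) {
        # The low 7 bits of $? hold the signal number, if any.
        return sprintf 'died with signal %d', $? & 127;
    }
    # The high byte of $? holds the exit code of the child.
    return sprintf 'exited with code %d', $? >> 8;
}

# $^X is the path of the running perl, used here only as a portable test command.
print describe_system($^X, '-e', 'exit 3'), "\n";
```

Reporting the decoded exit code rather than the raw "$!/$?" pair usually makes it easier to tell "could not start" apart from "started and failed".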

    I'm wondering if whatever encoding your MSWin setup uses is behind the problem: you're passing cyrillic from Perl; MSWin sees that as something completely different. I don't have that available to test — I have:

    $ perl -v | head -2 | tail -1
    This is perl 5, version 26, subversion 0 (v5.26.0) built for darwin-thread-multi-2level

    However, I can confirm that what you're attempting has no basic flaws with respect to Cyrillic. Here's a little test one-liner (which I've split over several lines for ease of viewing):

    $ perl -E '
        use utf8;
        my $f = "сервис.pdf";
        my @c = (ls => $f);
        if (-e $f) {
            system @c
        }
        else {
            say 0
        }
    '
    

    Here are some test runs, with that one-liner reduced to just "perl -E '...'".

    $ perl -E '...'
    0
    $ > сервис.pdf
    $ perl -E '...'
    сервис.pdf
    $ rm сервис.pdf
    $ perl -E '...'
    0
    

    [In case you're unfamiliar with those commands, ">" is just creating an empty file with the name given and "rm" removes (deletes) it.]

    Posting non-ASCII code and data can be an issue. I try to keep it to a minimum. When it is needed, I use <pre>...</pre> instead of <code>...</code> for blocks; and <tt>...</tt> instead of <c>...</c> within paragraphs. You also need to replace special characters with entities (e.g. s/&/&amp;/, s/>/&gt;/, etc.), which is another reason to keep it minimal: there's a list of those entities right after the textarea where you compose your node.

    One final point: system takes a list of arguments. Of course, that list could consist of a single string; however, there's no need to programmatically concatenate the arguments into a single string. See @c in my code above.
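To make the list-form point concrete, here is a small sketch. The helper count_child_args is hypothetical, and $^X (the running perl) stands in for whatever program you actually launch. On Unix-like systems the list form bypasses the shell, so an argument containing a space arrives as one argument; behaviour on Windows, where arguments are eventually joined into one command line, may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: returns how many arguments the child process received.
sub count_child_args {
    my @args = @_;
    # The child exits with the number of arguments it saw in @ARGV.
    system $^X, '-e', 'exit scalar @ARGV', @args;
    return $? >> 8;
}

my $file = 'my report.pdf';   # contains a space, yet no manual quoting is done
print count_child_args($file), "\n";
```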

    Update: Oops! Changed "replace entities with special characters" to "replace special characters with entities".

    — Ken

Re: Non asci character and system call
by thanos1983 (Priest) on Nov 08, 2017 at 18:48 UTC

    Hello Anonymous Monk,

    I do not have a Windows machine to test your code, but I tried it on my Linux box and it seems to work with my PDF viewer (okular). Try something like this:

    #!/usr/bin/perl
    use utf8;
    use strict;
    use warnings;
    use open ':std', ':encoding(UTF-8)';
    
    my $commandline = qq{start "" /max "c:\\сервис.pdf"}; # /max starts the window maximized; the backslash must be doubled inside qq{}
    system($commandline) == 0
        or die qq{Couldn't launch '$commandline': $!/$?};
    

    I think you have not defined the path correctly. Let us know if it worked.

    BR / Thanos

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Hi Thanos, nope, your code always fires "Couldn't launch". PS: Windows opens an error message saying that it was not possible to find the file. Interestingly enough, the file name in the message is a word salad. It must have to do with the encoding, but I have no clue.

        Hello again Anonymous Monk,

        Try to run the command manually from the command line:

        start "" /max "c:\сервис.pdf"
        

        Experiment until the command itself works. After that you should be able to add it to your script and have it work out of the box.

        Let us know if you manage to make it work, so others can benefit in similar cases.

        BR / Thanos

        Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Non asci character and system call
by ikegami (Pope) on Nov 08, 2017 at 20:54 UTC

    The command surely needs to be encoded using your machine's ANSI encoding (obtainable from "cp".Win32::GetACP()).
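Before encoding, it may be worth checking whether the code page can represent the file name at all. The following is a minimal sketch: encode_for_codepage is a hypothetical helper, the two code pages are hard-coded for illustration, and on a real machine the name would come from "cp".Win32::GetACP() as described above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the encoded bytes, or undef when the text
# contains characters that the given code page cannot represent.
sub encode_for_codepage {
    my ($codepage, $text) = @_;
    # FB_CROAK makes encode() die on unmappable characters;
    # LEAVE_SRC keeps it from modifying $text.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $bytes = eval { encode($codepage, $text, $check) };
    return $bytes;
}

# "сервис.pdf" written with escapes so the source file encoding does not matter.
my $name = "\x{441}\x{435}\x{440}\x{432}\x{438}\x{441}.pdf";

for my $codepage (qw(cp1251 cp1252)) {
    my $bytes = encode_for_codepage($codepage, $name);
    printf "%s: %s\n", $codepage,
        defined $bytes ? 'representable' : 'not representable';
}
```

CP1251 is a Cyrillic code page, while CP1252 is Western European and has no Cyrillic letters, so a command line encoded for CP1252 cannot carry this file name intact.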

Re: Non asci character and system call
by vr (Friar) on Nov 08, 2017 at 20:07 UTC
    use strict;
    use utf8;
    use Encode 'encode';
    my $commandline = qq{start "" "$DocumentPath.pdf"};
    system( encode 'CP1251', $commandline ) == 0
        or die encode 'CP866', qq{Couldn't launch '$commandline': $!/$?};

      Thank you. My machine uses CP1251 (just tested), but encoding the string as you suggested still fires "Couldn't launch '$commandline': $!/$?".

        Works for me. Is your $commandline as simple as shown? It may have characters outside of the ANSI code page, or characters that need escaping (quoting). Try pasting its content directly into the command prompt and see if it works. Or maybe the file is not in the current directory.

        use strict;
        use warnings;
        use feature 'say';
        use utf8;
        use Encode 'encode';
        use Win32;
        
        say Win32::GetACP;      # 'ANSI Code Page'
        say Win32::GetOEMCP;    # 'OEM Code Page'
        
        my $commandline = qq{start "" "сервис.pdf"};
        system( encode 'CP'. Win32::GetACP, $commandline ) == 0
        or die encode 'CP'. Win32::GetOEMCP, qq{Couldn't launch '$commandline': $!/$?};
        
Re: Non asci character and system call
by Anonymous Monk on Nov 09, 2017 at 07:39 UTC

    After testing all proposed approaches listed here (thanks), the only one that seems to work is the following

    use strict;
    use utf8;
    use Win32::Unicode::Process;
    my $DocumentPath="&#1074;&#1085;&#1077;&#1089;.pdf";
    my $commandline = qq{start "$DocumentPath" "$DocumentPath"};
    systemW($commandline ) == 0
        or die qq{Couldn't launch '$commandline': $!/$?};

    With "cp".Win32::GetACP() I could detect that my machine (Windows 10) is using CP1252; cmd is able to handle Cyrillic (I can just paste any Russian word I want into it); systemW passes the filename to cmd and it is displayed correctly; all the others produced a word salad.

    What I DON'T like is the necessity to rely on an external module for such a (for sure complex) but also basic operation, at least in my eyes. The fact that the module's author states "THIS MODULE IS ALPHA LEVEL AND MANY BUGS." doesn't make me feel much better. Furthermore, I want my script to be "machine independent", i.e. able to run on any Windows with different locales/encodings without having to adapt it manually, and this gives me the feeling of "let's just hope that it will run". Not a good feeling.

      my $DocumentPath="внес.pdf"; is what I was testing. Perlmonks converted it while publishing my comment.

Re: Non asci character and system call
by Anonymous Monk on Nov 08, 2017 at 21:37 UTC
      I was going to say

      Combine with Win32::ShellQuote

      since, to handle Unicode, Win32::Unicode's systemW needs to go through the shell. But that just reveals you can't mix the two willy-nilly; it is probably better to just use system_detached

      *groan*

      Thank you. Using this module was the only solution that worked. However, the module's author stating "THIS MODULE IS ALPHA LEVEL AND MANY BUGS." makes me feel not very good.

        However, the module's author stating "THIS MODULE IS ALPHA LEVEL AND MANY BUGS." makes me feel not very good.

        Well, feel better, no need to worry :)

        Here is a level upgrade for rt://Win32-Unicode that makes it use the shell the official safe way, with Win32::ShellQuote

        You use mySystemW and myExecW just like you would systemW and execW

        sub myExecW {
            use Win32::Unicode::Process qw//;
            my $pi = My_create_process( @_ ) or return 1;
            Win32::Unicode::Process::close_handle($pi->{thread_handle});
            Win32::Unicode::Process::close_handle($pi->{process_handle});
            return 0;
        } ## end sub myExecW

        sub mySystemW {
            use Win32::Unicode::Process qw//;
            my $pi = My_create_process( @_ ) or return 1;
            Win32::Unicode::Process::close_handle( $pi->{thread_handle} );
            Win32::Unicode::Process::wait_for_input_idle( $pi->{process_handle} );
            Win32::Unicode::Process::wait_for_single_object( $pi->{process_handle} );
            my $exit_code = Win32::Unicode::Process::get_exit_code( $pi->{process_handle} );
            Win32::Unicode::Process::close_handle( $pi->{process_handle} );
            return defined $exit_code ? $exit_code : 1;
        } ## end sub mySystemW

        sub My_create_process {
            use Win32::Unicode::Process qw//;
            use Win32::ShellQuote qw//;
            @_ or return;
            my $program = "";
            my $cmdline = "";
            if( @_ > 1 ) {
                ( $program ) = @_;
                $cmdline = Win32::ShellQuote::quote_system_string( @_ );
            }
            else {
                ( $cmdline ) = @_;
            }
            die "Bad program name ( $program )" if $program =~ m/[<>"?*|]/;
            $program = Win32::Unicode::Process::utf8_to_utf16( $program ) . Win32::Unicode::Process::NULL();
            $cmdline = Win32::Unicode::Process::utf8_to_utf16( $cmdline ) . Win32::Unicode::Process::NULL();
            return Win32::Unicode::Process::create_process( $program, $cmdline );
        } ## end sub My_create_process
Re: Non asci character and system call
by Anonymous Monk on Nov 08, 2017 at 22:29 UTC
    It can be helpful to examine the source-code using a tool such as hexdump to actually look at the byte-sequences that make up the encoding. I seem to remember also that Windows uses UTF-16.
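If no hexdump tool is at hand, the same inspection can be sketched in Perl itself. The helper hex_bytes below is hypothetical, and the three encodings are just the ones mentioned in this thread; it shows how a single Cyrillic letter turns into different byte sequences depending on the encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: space-separated hex dump of a string in a given encoding.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# U+0441 CYRILLIC SMALL LETTER ES, the first letter of "сервис",
# written as an escape so the source file encoding does not matter.
my $letter = "\x{441}";

for my $encoding (qw(UTF-8 UTF-16LE cp1251)) {
    printf "%-8s %s\n", $encoding, hex_bytes($encoding, $letter);
}
```

If the bytes that reach cmd are the UTF-8 pair but are then read as a single-byte code page, each letter shows up as two unrelated characters, which would look much like the word salad described earlier in the thread.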
