http://www.perlmonks.org?node_id=1225256


in reply to using online translation engines with perl

you can have exotic utf8 values without use utf8;, but not variable names.
Note that the "exotic" (non-ASCII) values differ between the two cases. Without use utf8;, a string literal is a scalar consisting of exactly the bytes that happened to be in the file you saved. (Those bytes may encode text in UTF-8, KOI8-R, Shift JIS, or nothing meaningful at all.) With use utf8;, Perl decodes the literal you typed from UTF-8 into Unicode characters. You can see the difference if you use Dumper or length on the strings:
# Dumper prints character codes for wide characters
$ perl -MData::Dumper -Mutf8 -E'print Dumper "привет"'
$VAR1 = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";
# Dumper prints bytes, and I get exactly what I'd typed
$ perl -MData::Dumper -E'print Dumper "привет"'
$VAR1 = 'привет';
# byte-string consists of 12 bytes of UTF-8
$ perl -MData::Dumper -E'say length "привет"'
12
# character-string consists of 6 wide characters
$ perl -MData::Dumper -Mutf8 -E'say length "привет"'
6
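The same difference can be reproduced at run time with the core Encode module: decoding the 12 UTF-8 bytes yields the 6-character string that use utf8; would have produced at compile time. This is a small sketch; the bytes are written as escapes so the script itself stays ASCII.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes of "привет", written as escapes; without "use utf8"
# the literal in the examples above is exactly this byte string.
my $bytes = "\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82";

# Decode at run time: the same character string "use utf8" gives at compile time.
my $chars = decode('UTF-8', $bytes);

printf "byte string: %d bytes, character string: %d characters\n",
    length $bytes, length $chars;

# Encoding the characters back yields the original bytes.
my $round_trip = encode('UTF-8', $chars);
print $round_trip eq $bytes ? "round trip ok\n" : "round trip mismatch\n";
```

To print the decoded string itself without "Wide character" warnings, set an output layer first, e.g. binmode(STDOUT, ':encoding(UTF-8)').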
Every subsequent time, it fails, and $! is stone silent.
Documentation examples show that you are supposed to look for the error in $sftp->error, not $!.
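A minimal sketch of that advice, assuming the module in question is Net::SFTP::Foreign (the module whose documentation uses $sftp->error). put_or_report is a hypothetical helper name, and the host, user and file names in the commented usage are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes $sftp is a connected Net::SFTP::Foreign object.
sub put_or_report {
    my ($sftp, $local, $remote) = @_;
    return 1 if $sftp->put($local, $remote);
    # $! only reflects the last failed system call in this process;
    # SFTP-level failures are recorded on the object instead.
    warn "put $local -> $remote failed: " . $sftp->error . "\n";
    return 0;
}

# Usage (needs a reachable server, so it is left commented out):
# use Net::SFTP::Foreign;
# my $sftp = Net::SFTP::Foreign->new('example.com', user => 'me');
# $sftp->die_on_error("Unable to connect");
# put_or_report($sftp, 'local.txt', 'remote.txt');
```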
The functionality I would like to add are conditions such that if a translate call hangs, I can get to the next one, so what I'm fishing for is code that would go the next in the for loop if it lasts for, say, a minute.
For a really primitive timeout implementation, see alarm. If you need more fine-grained control, threads with Thread::Queue might be a good solution.
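A minimal sketch of the alarm approach for the "skip to the next translator after a minute" request, assuming a Unix-like system. run_with_timeout is a hypothetical helper: it runs the command in a child, uses the eval / local $SIG{ALRM} / die idiom from perlfunc's alarm entry so the timer interrupts the parent's waitpid, and on timeout signals the child's whole process group.

```perl
use strict;
use warnings;
use 5.010;
use POSIX ();

# Hypothetical helper: run @cmd in a child process and give up after
# $timeout seconds. Returns the wait status ($?), or undef on timeout.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        # Child: start a new process group so helpers the command
        # spawns can be signalled together with it.
        setpgrp(0, 0);
        exec(@cmd) or do { warn "exec @cmd failed: $!\n"; POSIX::_exit(127) };
    }
    my $status = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);    # SIGALRM interrupts this wait via the handler
        alarm 0;
        $?;
    };
    alarm 0;                               # disarm if the eval died early
    return $status if defined $status;
    die $@ unless $@ eq "timeout\n";       # re-throw unexpected errors
    kill '-TERM', $pid;    # negative signal name: the whole process group
    waitpid($pid, 0);      # reap it (a command ignoring TERM would need KILL)
    return undef;
}

# Usage sketch for the translator loop (needs translate-shell installed):
# for my $remote (qw/yandex bing google/) {
#     my $st = run_with_timeout(60, 'trans', ':ru', '-e', $remote, 'hello');
#     say defined $st ? "$remote finished (status $st)" : "$remote timed out, skipping";
# }
```

Keeping the timer in the parent puts the decision about a timeout (skip, retry, log) in one place, at the cost of one fork per command.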

Re^2: using online translation engines with perl
by Aldebaran (Curate) on Nov 07, 2018 at 00:56 UTC

    Thanks for elaborating on utf8 and suggesting a way forward with my questions. As is frequently the case, the monastery has another thread going on the same issue (Can we write multi-threaded scripts ?). I looked at the reference that localshop posted and found code that uses Perl's built-in fork to run tasks in parallel. I would prefer not to use a module just for this little task. I must say, however, that my first attempt falls short in some ways. Output, then source:

    $ ./2.fork.pl 
    this is task 1 with pid 0
    Waiting for child processes..
    this is task 2 with pid 0
    this is task 3 with pid 0
    44
    bing's translation is 
    unctuous hypocrisy flowing from the tube
    
    елейным лицемерие течет из трубки
    
     English -> Русский
    Child with PID=23085 finished..
    43
    42
    41
    slept 5
    40
    39
    38
    ERROR Oops! Something went wrong and I can't translate it for you :(
    yandex's translation is 
    Child with PID=23086 finished..
    37
    36
    slept 10
    35
    google's translation is 
    unctuous hypocrisy flowing from the tube
    
    неприступное лицемерие, вытекающее из трубки
    (nepristupnoye litsemeriye, vytekayushcheye iz trubki)
    
    Translations of unctuous hypocrisy flowing from the tube
     English -> Русский
    
    unctuous hypocrisy flowing from the tube
        неприступное лицемерие, вытекающее из трубки, елейное лицемерие течет из трубки
    Child with PID=23087 finished..
    Done.
    $ cat 2.fork.pl 
    #!/usr/bin/perl -w
    use 5.011;
    
    my $string = "unctuous hypocrisy flowing from the tube";
    
    my $pid1 = fork();
    if ( $pid1 == 0 ) {    # Task 1
        say "this is task 1 with pid $pid1";
        my $command = "trans :ru -e bing \"$string\" >1.bing.txt"; 
        system("$command");
        say "bing's translation is ";
        system("cat 1.bing.txt");
        exit 0;
    }
    
    my $pid2 = fork();
    if ( $pid2 == 0 ) {    # Task 2
        say "this is task 2 with pid $pid2";
        sleep 5;
        say "slept 5";
        system("trans :ru -e yandex \"$string\" >1.yandex.txt");
        say "yandex's translation is ";
        system("cat 1.yandex.txt");
        exit 0;
    }
    
    my $pid3 = fork();
    if ( $pid3 == 0 ) {    # Task 3
        say "this is task 3 with pid $pid3";
        sleep 10;
        say "slept 10";
        system("trans :ru -e google \"$string\" >1.google.txt");
        say "google's translation is ";
        system("cat 1.google.txt");
        exit 0;
    }
    
    say "Waiting for child processes..";
    my $counter = 45;
    local $SIG{ALRM} = sub {
        say --$counter;
        alarm 1;
    };
    alarm 1;
    
    while ((my $pid = wait) != -1) {
        say "Child with PID=$pid finished..";
    }
    
    alarm 0;
    say "Done.";
    __END__
    

    Even with all the say statements dropped in, I find execution hard to follow. I'm baffled that pid's are zero within a block, but not so when they finish. What alarm truly does here is unclear. Finally, there is no code to kill pid's when the timer reaches zero.

    I'm also fishing for code that would do this within the following loop:

    print "Get other translations(y/n)?: ";
    my $prompt = <STDIN>;
    chomp $prompt;
    # ("y" | "Y") is a bitwise string OR that evaluates to just "y"
    if ( lc $prompt eq "y" ) {
        my @translators = qw /yandex bing/;
        for my $remote (@translators) {
            my $trans_munge = path( $vars{translations}, "$remote." . $munge );
            ## use trans shell
            say "getting translation from $remote";
            system("trans :$lang -e $remote file://$in_path >$trans_munge");
        }
    }

    Again, thanks for the very helpful comments.

      I'm baffled that pid's are zero within a block, but not so when they finish.

      fork splits the running process into two. The new ("child") process starts with a copy of the parent's memory, but changes made to variables afterwards are not shared between them. To let the program tell which of the two it has become, fork() returns 0 in the child and the PID of the new process in the parent (or undef in the parent if the fork failed). This is why you had to put an exit 0; at the end of each if ($pid == 0) block: otherwise both the parent and the child would continue executing the rest of the program, causing a lot of confusion.

      wait returns the real PID of a child process that has just terminated (one child per call), and -1 once there are no children left. If the child needed to know its own PID, it could have used the $$ variable.
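A minimal sketch of those return values: the child sees 0 from fork() and finds its own PID in $$, while the parent gets the child's PID from fork() and the same number back from wait(). The order of the two processes' lines is not guaranteed.

```perl
use strict;
use warnings;
use 5.010;

$| = 1;    # unbuffered output, so the two processes print promptly
my $pid = fork() // die "fork failed: $!";    # undef means fork itself failed
if ($pid == 0) {
    # Child: fork() returned 0; its own PID is in $$
    say "child:  fork() returned 0, \$\$ is $$, parent is ", getppid();
    exit 0;    # without this the child would also run the parent's code below
}
# Parent: fork() returned the child's PID
say "parent: fork() returned $pid, \$\$ is $$";
my $reaped = wait();    # PID of the child that just terminated
say "wait() returned $reaped";
```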

      This example might help:

      use strict;
      use warnings;
      use 5.010;    # for say

      if (fork() == 0) {
          system("sh", "-c", "sleep 3 # getting yandex translation");
          exit(0);
      }
      if (fork() == 0) {
          system("sh", "-c", "sleep 2; # getting google translation");
          exit(0);
      }
      system("pstree -Apal $$"); # pstree gets the PID of the parent process as its argument
      while ((my $pid = wait()) != -1) { say "$pid terminated" }
      __END__
      perl,29173 test.pl
        |-perl,29174 test.pl
        |   `-sh,29177 -c sleep 3 # getting yandex translation
        |       `-sleep,29179 3
        |-perl,29175 test.pl
        |   `-sh,29178 -c sleep 2; # getting google translation
        `-pstree,29176 -Apal 29173
      29175 terminated
      29174 terminated
      The pstree command draws the main Perl process and all of its descendants (including pstree itself) with their command-line arguments. The Perl processes with the same command line but different PIDs are the copies created by fork().
      What alarm truly does here is unclear. Finally, there is no code to kill pid's when the timer reaches zero.
      alarm arranges for a SIGALRM signal to be delivered to the calling process after the number of seconds given as its argument. If the process neither traps the signal nor disarms the alarm (with alarm 0), the default action terminates it, as if something else had killed it. The good part is that you can use $SIG{ALRM} to trap the signal and do something meaningful instead of dying, but in your case the work is already being done in a child process, so we can simply let that child be killed:
      use strict;
      use warnings;
      use 5.010;    # for say

      my $start = time;
      if (fork() == 0) {
          # arm the alarm clock
          alarm(10);
          # create a child process that sleeps
          system("sleep 365d");
          exit(0);
      }
      system("pstree -Apal $$");
      while ((my $pid = wait()) != -1) { say "$pid terminated" }
      say time - $start, " seconds elapsed instead of a year";
      __END__
      perl,29620
        |-perl,29621
        |   `-sleep,29623 365d
        `-pstree,29622 -Apal 29620
      29621 terminated
      10 seconds elapsed instead of a year
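In the example above, SIGALRM kills only the forked Perl child. system() has already started sleep in a grandchild, which gets reparented and keeps sleeping after the parent reports 10 seconds. With translate-shell, that would leave trans processes behind. One possible fix, sketched here on a Unix-like system, is to replace system() with exec(). POSIX keeps a pending alarm across exec(), so the command itself receives SIGALRM and, unless it handles the signal, dies. The timer is shortened to 2 seconds and sleep 3600 is used instead of 365d (a GNU extension) so the check runs quickly. Commands that spawn their own helpers (translate-shell's trans is a shell script) may additionally need their whole process group killed.

```perl
use strict;
use warnings;
use 5.010;
use Config;

my @sig_name = split ' ', $Config{sig_name};    # signal number -> name
my $start    = time;
my $pid      = fork() // die "fork failed: $!";
if ($pid == 0) {
    alarm(2);    # the pending alarm is inherited across exec()
    exec('sleep', '3600') or do { warn "exec failed: $!\n"; exit 127 };
}
waitpid($pid, 0);
my $sig     = $? & 127;    # signal that killed the child, 0 if none
my $elapsed = time - $start;
say "child $pid ", $sig ? "killed by SIG$sig_name[$sig]" : "exited normally";
say "$elapsed seconds elapsed instead of an hour";
```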