Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

The Monastery Gates

( #131=superdoc: print w/replies, xml ) Need Help??

Donations gladly accepted

If you're new here please read PerlMonks FAQ
and Create a new user.

poll ideas quest 2020
Starts at: Jan 01, 2020 at 00:00
Ends at: Dec 31, 2020 at 23:59
Current Status: Active
3 replies by pollsters
    First, read How do I create a Poll?. Then suggest your poll here. Complete ideas are more likely to be used.

    Note that links may be used in choices but not in the title.

Using Parallel::ForkManager on multiple files using backtick operators for multiple files being processed simulnaneously
2 direct replies — Read more / Contribute
by symgryph
on Feb 19, 2020 at 13:30
    I have some code that essentailly runs a bash script sequentially, and was trying to multiprocess the managed children of the program. Aka, I use perl to run program x on y # of files. I use it as an orchestrator. When I try to run on multiple files, I get two running processes with the same filename, instead of two processes running on two different filenames. I am not sure on how to make my code multi-process aware, and need some help. Here is my code.
    #!/usr/bin/env perl -w use Parallel::ForkManager; my $filename = 'all.txt'; my $failuresfilename="failed.tsv"; open (my $target, "<", $filename) or die "Cannot open < $filename: $!" +; open (my $failures, ">", $failuresfilename) or die "Cannot open > $fai +luresfilename: $!"; sub readinFile { @lines = <$target>; } sub execute { $multiprocess = Parallel::ForkManager->new(2); TARGETS: foreach $processme (@lines) { $multiprocess->start and next TARGETS; chomp $processme; $command="cfn_nag_scan -o json --input-path $processme > $processm +e_.cfnag.json"; `$command`; $multiprocess->finish; } } sub findFailures { @files=`find ./ -iname "*cfnag*"`; $jqcommand='jq --raw-output \'.[] | select (.file_results.failure_co +unt > 0) |[.filename, .file_results.failure_count] |@tsv\''; foreach (@files) { chomp; s/\/\//\//g; @a=`cat $_ |$jqcommand`; print $failures @a; } } readinFile(); execute(); #findFailures(); close $failures; close $failuresfilename;
    The subroutine in question is 'execute'. Any help would be appreciated. My input is a bunch of filenames that come from the 'find' command (in this case things I want to scan with cfn_nag). The system sub-executes cfn_nag_scan from the filenames array, which in turn system's the cfn_nag which outputs a bunch of 'scan' result files. Perl is more of a dispatcher than a processor of data.
    "Two Wheels good, Four wheels bad."
Optimizing a CHI-based data throttler
3 direct replies — Read more / Contribute
by perlancar
on Feb 19, 2020 at 00:53

    I'm experimenting on using CHI as the backend of a Data::Throttler-like module. The speed is not great: my module is becoming linearly slower as the max_items parameter is increased: it's about 3 times slower than Data::Throttler with max_items=100, and 20 times slower with max_items=1000. Any idea on how to close the gap, or is my endeavor with CHI in this case a lost cause?

    package Data::Throttler_CHI; use strict; use warnings; sub new { my ($package, %args) = @_; bless \%args, $package; } my $counter = 0; sub try_push { my $self = shift; my $now = time(); $counter++; $counter = 0 if $counter == 2e31; # wraparound 32bit int $self->{cache}->set("$now|$counter", 1, $self->{interval}); # Y228 +6! my @keys0 = $self->{cache}->get_keys; my @keys; for my $key (@keys0) { my ($key_time, $key_serial) = split /\|/, $key, 2; if ($key_time >= $now - $self->{interval}) { push @keys, $key; } } # these drivers return expired keys: Memory. so we need to purge t +hese keys my $do_purge = rand() < 0.05; # probabilistic $self->{cache}->purge if $do_purge && @keys < @keys0; return @keys <= $self->{max_items} ? 1:0; } 1;

    More complete code with documentation and tests is on CPAN: Data::Throttler_CHI.

Percentage of MS Windows Perl Users
2 direct replies — Read more / Contribute
by thechartist
on Feb 18, 2020 at 21:24

    I'm interested in improving the reliability of CPAN modules on systems that do not get all that much attention, and MSWin32 seems to be one of them.

    I have been studying the various CPAN testing documents, as well as checking out the CPAN testers log file, that gives the 1000 most recent reports.

    This is off-the-cuff and not rigorous, but a quick check for "Linux" gives me around 600 of test results, "BSD" around 360, and "MSWin32" around 40. This suggests to me (making certain assumptions that the testing population represents the broad user population) -- that only about 4% of Perl users are on Windows.

    Do those estimates seem at all accurate? Thanks for the guidance.

"print" of nonexistent element is actually altering a hash
5 direct replies — Read more / Contribute
by larrymenard
on Feb 17, 2020 at 13:11
    Monks, your responses to others have been very helpful to me for many years. Now however it is time to post my own question.

    I am creating a multi-dimensional hash and then printing a non-existent key in that hash. Curiously (at least to me), that "print" is actually altering the hash, adding an invalid (for lack of a better word) key.

    #!/usr/bin/perl use strict; use Data::Dumper; my %hash; $hash{'key1'}{'key2'} = 'value'; print "\nDump of \%hash (1):\n"; print Dumper \%hash; # This print statement is actually altering the hash print "\n\"$hash{'key0'}{'key1'}{'key2'}\"\n"; print "\nDump of \%hash (2):\n"; print Dumper \%hash;
    The result is:
    Dump of %dtoHash (1): $VAR1 = { 'key1' => { 'key2' => 'value' } }; "" Dump of %dtoHash (2): $VAR1 = { 'key1' => { 'key2' => 'value' }, 'key0' => { 'key1' => {} } };

    The "print" statement is the only thing that can possibly be altering the hash. Indeed, comment it out and the 2nd dump is normal.

    I have reproduced this on multiple versions of perl 5, up to and including 5.26.3 (on CentOS 8).

    Why is the "print" statement altering the hash?

    Any explanation (or even better, advice on how to avoid it) would be much appreciated.

    Thanks in advance.

IO::Socket tutorial
1 direct reply — Read more / Contribute
by BernieC
on Feb 17, 2020 at 10:55
    I'm converting a program that uses sockets like file descriptors {it does a <$ssh> to read a line. UGH} to using sockets properly. And I can't find a clear tutorial. I know the command to send data in my new regime is send($socket, <stuff>, flags). But I've tried to chase down what the flags are. perldoc unhelpfully just says "Takes the same flags as the system call of the same name.". First off, that seems to presume that you have a unix handy to check on the system call. Second, what I found was
    $ man 2 send No manual entry for send in section 2
    so I have no clue what the flags do, but the tutorials seem to all give a value of 0. ??

    But my real interest in the switchover is to get reads to time out. I have a pokey host I connect to; sometimes {annoyingly regularly} the server seems to balk and my program just hangs, dead in the water, on the <$server>. So what I'd like to do is use the socket timeout mechanism to let my program continue. I see that there's a Timeout parameter that unhelpfully just says "Timeout value for various operations". ?? Units?? {I'd have guessed milliseconds, but IO::Socket says it is in seconds, which makes sense} which operations?? what happens when the timeout value is reached??

    Also, my incoming data is line-at-a-time and so the <$ssh> is perfect for what I need. Is there an equivalent way to do that with IO::Socket? I guess I could recv a character at a time until I got a newline.

error in POE::Component::Client::Telnet manpage?
2 direct replies — Read more / Contribute
by Anonymous Monk
on Feb 16, 2020 at 20:01
    In the POE::Component::Client::Telnet manpage it has a "sub result" with a typo in it.
    print STDERR join(' ', @{ $ref->{error} ) . "\n";
    should that be:
    print STDERR join(' ', @{ $ref->{error} } ) . "\n";
Find element in array
8 direct replies — Read more / Contribute
by Sofie
on Feb 16, 2020 at 07:36
    Hi I am very new to perl and struggling with simple things... I am trying to check if an input DNA sequence only contains nucleotides. And if it doesn't I want to print out the position in the sequence where an invalid character was entered. This is as far as I have come:
    #!/usr/bin/perl -w $DNA = <STDIN>; chomp ($DNA); @DNA = split ("", $DNA); $lengthseq = scalar @DNA; print "The length of the sequence is:\n", $lengthseq, "\n"; @nucleotideDNA = ""; #check if each element in array is nucleotide foreach $nucleotide (@DNA){ if ($nucleotide =~ /^[ATCG]+$/){ push @nucleotideDNA, $nucleotide; } else { push @nonvalid, $nucleotide; } }
    But how can I print the position of the non valid character? Not sure if this makes any sense.. Thanks
Match something that does not match
1 direct reply — Read more / Contribute
by jo37
on Feb 15, 2020 at 12:42

    every now and then I stumble upon the question of how to match something that doesn't match something else. I.e. some expression in the sense of [^...]* for a general given regex $match. The best I got so far is:

    my $does_not_match = qr{((?:.*?(?=$match))|(?:(?:.(?!$match))*))};

    The first branch matches a substring up to the given regex $match if there is a match and the second branch matches the whole string if there is no match. Both fail in the opposite case. (The second branch by missing the last character.)

    Does anybody know something simpler? Or do you see any issues with the given regex?

    Here is a small example:

    #!/usr/bin/perl use Test2::V0; sub do_not_match { my $pat = shift; return qr{(?:.*?(?=$pat))|(?:(?:.(?!$pat))*)}; } my $re = do_not_match(qr{\b[aeiou][a-z]*ion\b}); is [/($re)/], ['stimulated '], 'matches prefix' for 'stimulated emission of radiation'; is [/($re)/], ['electron transition'],'no match' for 'electron transition'; is [/($re)/], [''], 'matches empty prefix' for 'absorbtion of photons'; is [/($re)/], ['light '], 'matches not greedy' for 'light amplification by stimulated emission of radiation'; is [/($re)\bimpact/], ['electron '], 'gives characters back' for 'electron impact ionization'; done_testing;

    I might put this into an extension module for Regexp::Common, but I'm not sure if this makes sense at all. And what would a proper naming be? Maybe something like:

    use Regexp::Common 'do_not_match'; my $re = $RE{do_not}{-match => 'something'}

    Your opinions?


wiki regex reprocessing replacement
4 direct replies — Read more / Contribute
by LanX
on Feb 15, 2020 at 09:39


    I need a regex to transform wiki markup surrounding words to html, * to <b> etc.

    my problem is that */_ could be combined at word boundaries, see the following example

    DB<66> $_=$wiki; tf();tf();tf() ; print "'$wiki' \n=>\n'$_'" '_*one /two/*_ _*three /four/*_ _*five /six/*_' => '<u><b>one <i>two</i></b></u> <u><b>three /four/</b></u> <u><b>five <i +>six</i></b></u>' DB<67>
    '_*one /two/*_ _*three /four/*_ _*five /six/*_'
    'one two three /four/ five six'

    as you can see I have to run the tf() transformation thrice

    DB<40> %h = ( '*'=>'b', '/' => 'i' , '_' => 'u' ) DB<59> sub tf { s{ $pre ([_*/]) (.*?) \2 $post}{$1<$h{$2}>$3</$h{$2} +>$4}xg } DB<62> $pre = qr/(^|\s|>)/ DB<63> $post = qr/($|\s|<)/ DB<65> $wiki='_*one /two/*_ _*three /four/*_ _*five /six/*_'


    Is there a way to make it a one-run transformation?

    Trouble is that /g continues after the inserted replacement, here underline

    I was experimenting with lookaround-assertions and \G and couldn't get it done.


    The only ways I can (theoretically) think of so far are

    • to loop over /g in scalar context while (s///g) { ... } and to manipulate pos
    • or to manipulate pos in an embedded Perl code (?{...})
    • to call tf() recursively in the /e evaled replacement part
    NB: It's a more theoretical question because running tf() three times doesn't pose problems.


    I just noticed a bug, since four wasn't expanded.

    &tf has to be better written with a lookbehind which doesn't consume the next whitespace

    DB<90> sub tf { s{ $pre ([_*/]) (.*?) \2 (?=$post)}{$1<$h{$2}>$3</$h +{$2}>}xg }

    I'll update an SSCCE soon.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

RESOLVED - DBI, DBD::Oracle, Inconsistent Fetch Failures
2 direct replies — Read more / Contribute
by perldigious
on Feb 14, 2020 at 17:54

    Hi Monks,

    So I'm still having major issues with inconsistent/intermittent data fetch failures from my company's Oracle DB. I've been digging, a lot, and I may have narrowed down the issue further based on some earlier suggestions from the Monastery... but I still don't have a solid solution. I think this is an Oracle DB datatype issue, where there is a mismatch between what the DB has and DBD::Oracle is expecting, but being a newbie I'm not sure that's correct. Here is the relevant portion of the script I'm currently trying and using tracing options on to try and debug further.

    use strict; use warnings; use DBI; use DBD::Oracle qw(:ora_types); print "Establishing DWASAS connection...\n"; # establish database connection and enable tracing option with dump to + 'tracelog.txt'. my $dsn = 'dbi:Oracle:DWASAS'; my @connection = ($dsn, $user, $pass, {InactiveDestroy => 1, PrintErro +r => 0, RaiseError => 1}); my $dbh = DBI->connect(@connection) or die; open(my $tracelog_fh, '>', 'tracelog.txt') or die "Cannot open \"trace +log.txt\": $!."; $dbh->trace(4, $tracelog_fh); # pepare and execute 'prepack_candidates.sql' print "Preparing 'prepack_candidates.sql'...\n"; my $sth1 = $dbh->prepare($candidates_sql) or die; print "Executing 'prepack_candidates.sql'...\n"; $sth1->execute() or die; print "Fetching 'prepack_candidates.sql' returned data...\n"; print "\n----------\n"; my $print_format = '%7s '. '%-50s '. '%11s '. '%12s' ; printf("$print_format\n", 'ROW_NBR', 'REPORT_PART_NBR', 'SHIPPED_QYT', 'COUNT_AT_QTY' ); my $row_counter = 1; while(my $row = $sth1->fetchrow_hashref) { foreach (keys(%$row)) {$row->{$_} = '' if (!defined($row->{$_}))}; + # change all NULLs to empty strings printf("$print_format\n", $row_counter, $row->{'REPORT_PART_NBR'}, $row->{'SHIPPED_QTY'}, $row->{'COUNT_AT_QTY'} ); $row_counter++; } print "----------\n";

    And here is the basic SQL query it's running. And yes, it is successfully running, I can run this same query in 3 other tools: SQL Developer, Alation, and SAS (SAS modified for proc sql))

    The failure happens during the $sth1->fetchrow_hashref loop. If I try various CASTs in the SQL it will tend to get hung up on different rows of data, but at this point rarely finishes successfully. But here's where things get interesting. The trace file, when it fails the fetch, just looks like this.

    But, way back up at the top, when it's doing the prepare statement, the datatypes don't match what I would expect. When I look at them in the DB tables shown in Alation, Alation says one is a VARCHAR2(50) and the other is a NUMBER(10). I'm not sure what the COUNT I'm doing would become, but I was guessing NUMBER of some size as well.

    So, for those Monks that know DBI and especially DBD::Oracle well, or any Monk who has an opinion really... thoughts that may help me?


    UPDATE - Resolved:

    Okay, so after wasting a huge amount of time over the past few weeks trying to debug all my issues from a Perl client and/or Oracle DB side, I have this resolved thanks to one of my DBAs and on of my Network Engineers at my employer. The issue had nothing to do with Perl, my script, or the Oracle DB.

    The DBA, after doing some tracing on my client side and the DB side and speaking with Oracle support, was able to narrow down something in the trace log that led him to believe a firewall or something similar was killing the connection during the data fetching process. So he referred me to the Network Engineers. The Network Engineer had seen similar issues before with other things besides Perl scripts (our Software Engineers do a lot of Python scripting), so he had me try simply physically plugging in via an Ethernet cable instead of going through the wireless network. That immediately fixed ALL the issues I was seeing and all my scripts run perfectly and seamlessly over and over without fail... so far anyway (see, if nothing else I've learned to qualify). :-)

    The Network Engineer explained to me that he isn't exactly sure what kills things like this since it's outside his area, but he knows there's something in my employer's wireless network chain where the connections are routed through that has extra layers of security that can be bypassed with a physically wired connection since it doesn't route the connection through the same switches (if I'm reiterating what he said accurately).

    Thanks to all the Monks who tried to help me with this despite the actual problem being something completely unrelated (sorry, I don't know what I don't know). I'm going to twist this experience in my mind and try and think of it as me spending the better part of three weeks "learning new things" rather than just having dumped all that time unnecessarily in to a big black hole... it's working now in any case, so I'm happy. :-)

    Just another Perl hooker - My clients appreciate that I keep my code clean but my comments dirty.
How to export hash references
1 direct reply — Read more / Contribute
by Anonymous Monk
on Feb 14, 2020 at 06:03
    I'm trying to export a reference to a hash but can only access it with a fully-qualified name. What am I doing wrong? Thanks

    Package Foo; use strict; use warnings; use Exporter 'import'; our @ISA = qw[Exporter]; our @EXPORT_OK = qw[%env $env]; our %env = %ENV; our $env = \%ENV; 1;
    perl -I. -MFoo=%env -le 'print scalar keys %env'
    perl -I. -MFoo=$env -le 'print scalar keys %$env'
    perl -I. -MFoo=$env -le 'print scalar keys %$Foo::env'
Perl in data science: could a grant from Perl foundation be useful?
1 direct reply — Read more / Contribute
by zubenel0
on Feb 18, 2020 at 14:15

    Recently I was thinking about if it is possible to make Perl a more attractive option for data science. I know that some great initiatives exist like RFC: 101 Perl PDL Exercises for Data Analysis or RFC: 100 PDL Exercises (ported from numpy). On my part, I will try to write a blog post with a particular machine learning task I have chosen. Nevertheless, as Ovid wrote falling short in data science field is a significant drawback of Perl. How to fix this?

    What I thought about as a way to to proceed could be a grant from Perl foundation. It could work only if it would be possible to find someone interested in a project related to Perl and data science and capable to do it. IMO one of the solutions that could help would be to write a book on How to use Perl in Data Science. Again, this idea is not mine as it was mentioned in perlblogs as a desire to have a new PDL book. Maybe with a help from Perl foundation such a project could encompass even more than PDL and include several other modules suited for data science.

    Another interesting idea that I have encountered was to create Perl/XS graphics backend as there is a need to have graphic library which can create 2D/3D chart easily - see the comments on perlblogs. Unfortunately, I know very little about this but I guess that it might be a very hard task... So these are just a couple of examples but actually the main issue is if it is feasible in general - to have a grant for data science using Perl? What do you think? Do you know someone that could be interested in it? Or do you think that this approach is flawed and have some other suggestions?

PerlMonks Discussions
Chatterbox: NEW link to sent messages
No replies — Read more | Post response
by LanX
on Feb 16, 2020 at 07:11

    Private /msg [messages] are sometimes hard to understand if they are a reply to something you wrote a day ago.

    Actually they are still available in the "Delete" folder but only for a limited time (36h ?)

    click here to check

    The chatterbox nodelet used to have various links to the message box in the bottom row

       And 69 more, 90 archived

    It now contains a link to recently sent messages too

       And 69 more, 90 archived, recently sent

    The "recently" was necessary to avoid confusion about older messages missing. You need to archive them explicitly if you want keep them longer.

    Thanks to the gods for accepting my patch. :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    ) The link is a bit long for my taste, probably "just sent" or only "sent" with alt-text would have been better

Log In?

What's my password?
Create A New User
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (7)
As of 2020-02-19 20:23 GMT
Find Nodes?
    Voting Booth?
    What numbers are you going to focus on primarily in 2020?

    Results (84 votes). Check out past polls.