Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

The Monastery Gates

( #131=superdoc: print w/ replies, xml ) Need Help??

Donations gladly accepted

If you're new here please read PerlMonks FAQ
and Create a new user.

New Questions
Forking Perl5 with Microperl++?
1 direct reply — Read more / Contribute
by rje
on Oct 21, 2014 at 10:27

    "If I were to implement a language now, I'd write a very minimal core suitable for bootstrapping. ... Think of a handful of ops. Think very low level. (Think something a little higher than the universal Turing machine and the lambda calculus and maybe a little bit more VMmy than a good Forth implementation, and you have it.) If you've come up with something that can replace XS, stop. You're there. Do not continue. That's what you need." (Chromatic, January 2013)

    This is in Chromatic's response to Moe, which sounds -Ofun and seems like a good idea.

    And now that I've crawled out from under my rock, I see microperl, an ancient tool included with 5.8. Alright, I don't recall ever hearing about microperl. It's not in my copy of the Programming Perl book, and that sucker is a thousand pages.

    And lately, haven't I seen grumblings about the insularity and stubborn bikeshedding nature of the Perl community?

    So how about this. Microperl isn't the "handful of ops" that Chromatic is suggesting, and it ain't XS, but it's tiny and, well, it's Perl. Therefore, what are the issues around forking and modifying microperl (oh so gently) to bootstrap a variant of Perl5? I mean, it's already written, and it's tiny -- minimal, if Simon is to be believed. I mean, the thing built in a 300k footprint when compiled with a proper optimizer. So what's not to like?

    So I re-ask my question. Doesn't microperl seem to be at least halfway to a bootstrap up to a useful and somewhat regularized, normalized, unpackable, what-have-you-able variant of Perl5?

    Just for reference, here's a sample microperl script written in zentara's discussion. It seems pretty handily powerful for being a tiny subset of perl. Is there documentation for microperl?

Recursively executed function in regex RHS, AKA "wait, why does this work?"
5 direct replies — Read more / Contribute
by Cody Fendant
on Oct 21, 2014 at 07:21

    I was playing around with some code to render a kind of templated HTML, and I came up with the code below.

    The HTML contains special markup, <include id="n"> where n is an integer. These are looked up in a db but here they're just returned by a sub to make this question simple and self-contained. In the example below, there are nested includes where include 3 contains a reference to include 4 and 4 contains a reference to 5.

    I did the rendering by re-calling the rendering function, inside the rendering function as the right-hand side of a regular expression.

    My question is—why does this work? How come it renders all the way down instead of stopping after the first level? I thought it would need some kind of while loop to keep it rendering until it had processed all possible nested includes.

    #!/usr/local/bin/perl print render(1); sub render { my $id = shift; $HTML = return_html($id); $HTML =~ s/<include id="(\d+)">/render($1)/eg; return $HTML; } ### this sub is only here to mimic database lookups sub return_html { my $id = shift; if ( $id == 1 ) { return '<html> <head> <include id="2"> <title>hello world</title> </head> <body> <div class="container"> <h1>Hello world</h1> <include id="3"> </div> </body> </html>'; } if ( $id == 2 ) { return '<link rel="stylesheet" href="foo.css">'; } if ( $id == 3 ) { return '<div id="footer">copyright <include id="4"></div>'; } if ( $id == 4 ) { return '2014 <include id="5">'; } if ( $id == 5 ) { return 'and 2015'; } }
No need to escape right brace in regex
2 direct replies — Read more / Contribute
by syphilis
on Oct 21, 2014 at 00:13
    In recent versions of blead (including current 5.21.5) I get:
    C:\>perl -le "$_ =~ /sub mymy{ }/;" Unescaped left brace in regex is deprecated, passed through in regex; +marked by <-- HERE in m/sub mymy{ <-- HERE }/ at -e line 1. C:\>
    That's easily fixed:
    C:\>perl -le "$_ =~ /sub mymy\{ }/;" C:\>
    But why not also (or instead) deprecate the unescaping of the *right* brace ?

    Cheers,
    Rob
Human-readable date format strings
3 direct replies — Read more / Contribute
by thewebsi
on Oct 20, 2014 at 22:28

    I'd like to work with human-readable date format strings - ie, allow the user to select the date format for input and output using a human-readable string such as 'yyyy-mm-dd'.

    Unfortunately, the popular date manipulation strftime()/strptime() functions from POSIX and Time::Piece use a non-human-friendly '%X' format. In my search, I found Time::Format::time_format(), and this does exactly what I want for output (strftime), but there is no input (strptime) equivalent.

    Barring any better options, I'll probably write a regex block to convert format strings from a human-readable format to the POSIX format so I can use the standard functions. Before I embark on that however, does anyone have other suggestions? Thanks.

    Edit: To clarify, instead of POSIX::strftime ( '%Y %m %d', localtime() ), I'd like to use SomeModule::SomeFunction ( 'yyyy-mm-dd', localtime() ), as Time::Format::time_format() does, but also need support for strptime().

Problem with monitoring running tasks
2 direct replies — Read more / Contribute
by cariddiEMC
on Oct 20, 2014 at 13:04

    Hi, I am pretty new to perl, so please be understanding. Below is a perl subroutine, that I inherited, that is used to run a set of "X" tests simultaneously. This subroutine is called by a script that I will call runtasks.pl that takes a file containing a list of tests and runs them. Periodically I am finding that for some reason or another 1 or more of the tests we run do not complete. Each test has a timeout that will terminate the test if exceeds the time limit.

    When I use system monitor to see what is going on, none of my tests are running anymore, but I see 2 or more instances of runtasks.pl running. Every instance of runtasks.pl, except 1, that are running, have a pid that relates to the test or tests that have not completed. I am assuming that pid of the process that I cannot match up is the pid of the terminal window where I originally ran runtasks.pl from.

    Can somebody help me find what is wrong? I have done a lot of reading on waitpid and on the other commands used in the runall subroutine but I have not found any smoking gun.

    Thanks. Mark

    sub runAll { my $self = shift; my $doneThreadsLock : shared = 1; my $testQueueLock : shared = 1; my @workers; for (1..$self->{'maxtasks'}) { push @workers, threads::async { $SIG{'INT'} = sub { my $tid = threads->tid(); # Tell user we've been terminated if (exists $self->{'runningTasks'}->{$tid}) { my $worker_task = $self->{'runningTasks'}->{$t +id}; my $exec_name = getExecName(join(" ",@{$worker +_task->{'command'}})); my $pid = $worker_task->{'pid'}; kill(-12, $pid); # print "Worker $tid >> Waiting$exec_name pid: +$pid\n"; waitpid($pid, 0); # print "Worker $tid >> Done $exec_name pid: $p +id\n"; $worker_task->{'endTime'} = time(); } { lock $doneThreadsLock; $self->{'doneThreads'}->enqueue($tid); } print "INT: Worker $tid exiting\n"; threads->exit(); }; my $tid = threads->tid(); my $rc; my $continueProcessing = 1; my $worker_task; while ($continueProcessing) { { lock $testQueueLock; $worker_task = $self->{'testQueue'}->dequeue(); } if(!$worker_task) { $continueProcessing = 0; next; } my $exec_name = getExecName(join(" ",@{$worker_task->{ +'command'}})); chdir ($worker_task->{'workdir'}); print "Running: $exec_name\n"; $worker_task->{'startTime'} = time(); $self->{'runningTasks'}->{$tid} = $worker_task; my $cmd = join(" ", @{$worker_task->{'command'}}); my $pid = fork; # if we are child process, pid will be 0 otherwise we +are the master if ($pid == 0) { # print "pid = 0, $exec_name running as child proce +ss.\n"; my $logfile = FileHandle->new; $logfile->open("> $worker_task->{'log'}"); $logfile->autoflush(1); open(STDOUT, '>&=' . $logfile->fileno); open(STDERR, '>&=' . $logfile->fileno); select STDERR; $| = 1; # make unbuffered select STDOUT; $| = 1; # make unbuffered $logfile->close; # print "Execing child process $exec_name pid($pid) +.\n"; { lock $doneThreadsLock; $self->{'doneThreads'}->enqueue($tid); } setpgrp; exec("$cmd"); exit(1); } $worker_task->{'pid'} = $pid; my $child_status; # print "Waiting for $exec_name $pid to exit\n"; while (waitpid($pid, POSIX::WNOHANG) != -1) { $child_status = $?; sleep(1); } # print "Done waiting for $exec_name $pid to exit\n"; $worker_task->{'endTime'} = time(); $worker_task->{'status'} = $child_status; # print "Done waiting for $exec_name $pid to exit statu +s ($child_status)\n"; } { lock $doneThreadsLock; $self->{'doneThreads'}->enqueue($tid); } threads->exit(); }; } print "all tasks queued, Now waiting for tasks to exit before queu +ing more.\n"; { lock $testQueueLock; $self->{'testQueue'}->enqueue(undef) for @workers; } my $threads_exited = 0; my $stop_done = 0; while ($threads_exited < $self->{'maxtasks'}) { my $tid; { lock $doneThreadsLock; $tid = $self->{'doneThreads'}->dequeue_nb(); } if (defined($tid)) { threads->object($tid)->join(); $threads_exited++; print "threads_exited($threads_exited) maxtasks:($self->{' +maxtasks'}).\n"; } else { if ($self->{stop} && !$stop_done) { $stop_done = 1; foreach my $thr (threads->list()) { $thr->kill('INT'); } } else { sleep(1); } } } }
combining emacs' cperl-mode formatting with Perltidy
1 direct reply — Read more / Contribute
by LanX
on Oct 20, 2014 at 08:33
    Hi

    I love the instant formatting features emacs gives me when typing Perl, but they can't cope with perltidy's detailed control.

    So I would like to run perltidy from time to time to update my text buffer (anything less frequent than with every return hit)

    Problem is to have a compatible and "tolerant" config, where both formatter don't start messing the result of the other one.

    Did anyone already try to achieve this somehow?

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)

Using Shift with Parameters in a subroutine
6 direct replies — Read more / Contribute
by ron_abraham
on Oct 20, 2014 at 05:00
    Hi ,

    Is there any way to compress both these statements to one line ? It would only seem rational that we have a way to do so, especially if i am passing 10 references to my subroutine.

    sub foo ( my $query_ref = shift(@_); my $session_ref = shift(@_); #do something return \$someref; }

    Not directly related to the above, but going through the man pages , i see there is no formal parameter support , at least in perl 5.10.1 (which is my customer version ) . I heard we have subroutine signature support in perl5.20. if so can you point me to the man pages for that ?

Generate list with array elements repeated x times
2 direct replies — Read more / Contribute
by Anonymous Monk
on Oct 16, 2014 at 14:03

    Simple enough; I need to repeat a date from an array of dates x number of times and list it to a text file, repeat the next date and list them under the first date, etc. So:
    2014-01-01
    2014-01-01
    2014-01-01
    2014-02-02
    2014-02-02
    2014-02-02
    ...repeat for more. Le Code:

    #!/usr/bin/perl -w use Term::ANSIColor; use Date::Calc qw(Add_Delta_Days); use strict; use warnings; my $START_DATE = &prompt("Please enter the start date (format yyyy-mm- +dd):"); my $END_DATE = &prompt("Please enter the ending date (format yyyy-mm-d +d):"); open LABFILE, '>', "labels.txt"; print LABFILE join("\n", date_list($START_DATE, $END_DATE)); close LABFILE; sub date_list { my ($from, $to) = @_; my @dates = $from; my $intermediate = $from; while ($intermediate ne $to) { $intermediate = sprintf "%04d-%02d-%02d", Add +_Delta_Days(split(/-/, $intermediate), 1); push @dates, $intermediate; } my $first = shift(@dates); my $last = pop(@dates); my $dates = join("\n", @dates); my @multiple = $dates; my @mults = map { "$dates" } @multiple; foreach my $date(@mults) { #return $date; return (($date) x 10); } } sub prompt { my($prompt, $default) = @_; my $defaultValue = $default ? "[$default]" : ""; print color("yellow"), "$prompt $defaultValue", color ("reset"); chomp(my $input = <STDIN>); return $input ? $input : $default; }

    ...which if you use dates 2014-01-01 to start and 2014-01-10 to end, returning $date (commented out) by itself gives you:
    2014-01-02
    2014-01-03
    ..
    2014-01-09
    ..and if you run the uncommented line I get:
    02
    ..through..
    09
    repeated 10 times. I don't want to repeat the whole array, just repeat each element 10 times in succession. I am prostrate before your light, an empty vessel, yet wide open.

how to read binary file and give a binary output?
3 direct replies — Read more / Contribute
by perllearn
on Oct 16, 2014 at 08:44
    hi all, i want to read a binary file, do some operations in it (i want to ZERO all the value before 0x4fff and then preserve all the values after it ) and then output a modified binary file with ". bin" extension. i am a beginner in perl, could anyone of you help me do this as soon as possible?
Problem of context?
2 direct replies — Read more / Contribute
by pabla23
on Oct 16, 2014 at 04:20
    Good Morning! I have this problem, i hope you can help me:

    use strict; use warnings; use Data::Dumper; open (FILE, "/Users/Pabli/Desktop/gene_association.goa_human"); open (FILE2, "/Users/Pabli/Desktop/go3.obo"); while(<FILE>) { my @array_with_all_fields=split(/\t/); if ($array_with_all_fields[2] eq "TP53"){ my $mio=$array_with_all_fields[4]; my %go_accession_hash=($mio,''); my @test2= keys %go_accession_hash; } } while(<FILE2>){ print "".$test2[0]."\n"; } close FILE; close FILE2;

    When the compiler enter into the second while the variable "test2" is not seen! I try to "play"with the declaration global/local...but the result is the same! Can someone help me?

    Thanks Paola

Controlling MinGW GDB using Perl
1 direct reply — Read more / Contribute
by perl_sck
on Oct 16, 2014 at 01:55
    Hi I am trying to control MinGW's GDB using Devel::GDB perl module in windows, I am using cygwin environment for the same, the problem I'm facing is that the function/method 'get', though is executing the command but is not returning any result/output. It works fine if I communicate with gdb of cygwin. But I need to communicate with MinGW GDB. So what changes do I need to do to get the results from the 'get' command.
Where to put an application database?
8 direct replies — Read more / Contribute
by boftx
on Oct 16, 2014 at 00:43

    Given that you want to have your CPAN application use an SQLite database, what directory would you place the database files in?

    One might think that having it under the module directory would be a good choice, but that would almost certainly require root access unless one is using cpanm or something similar to install the module, not to mention it would most likely require a mode of 666.

    Another choice would be under /var but that has the same disadvantages as the first choice.

    /tmp is always good for not needing root, but is subject to being blown away on reboots (though that is not the case so much anymore.)

    The problem of having mode 666 on the database files is almost always present, especially in the case of a command line application. It is simple enough to mark the app as setuid, but that introduces a set of headaches.

    What is your preferred way to deal with this?

    You must always remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
New Meditations
Refactoring Perl5 with Lua
No replies — Read more | Post response
by rje
on Oct 21, 2014 at 14:31

    If you've read my previous post on microperl, then you're sufficiently prepared to take this post with a grain of salt. As a brief summary, I'll re-quote something Chromatic wrote to start me thinking about this problem in general:

    "If I were to implement a language now, I'd write a very minimal core suitable for bootstrapping. ... Think of a handful of ops. Think very low level. (Think something a little higher than the universal Turing machine and the lambda calculus and maybe a little bit more VMmy than a good Forth implementation, and you have it.) If you've come up with something that can replace XS, stop. You're there. Do not continue. That's what you need." (Chromatic, January 2013)

    Warning: I've never written a VM or a bytecode interpreter. I have written interpreters and worked with bytecodes before (okay, a 6502 emulator, but that's basically a bytecode interpreter, right?) Just remember that I'm not posting from a position of strength.

    So I found the Lua opcode set, and it seems a good starting point for talking about a small, though perhaps not minimal, Turing machine that seems to do much of what Chromatic was thinking about... except for XS, which I still haven't wrapped my head around.

    Lua has a register-based 35 opcode VM with flat closures, threads, coroutines, incremental garbage collection... and manages to shoehorn in a tail call, a "for" loop, and a CLOSURE for goodness' sake. And some of those opcodes could be "macros" built on top of other opcodes, rather than atomic opcodes (only if speed were unimportant): SUB, MUL, DIV, POW, LE.

    Again, a disclaimer: I haven't been in a compiler construction class for 25 years, and my career has typically been enterprise coding, data analysis, and tool scripting. Regardless, a small opcode set seems to me to be important for portability. And... 35 codes... well, that's dinky.

    I don't assume that Lua's codes are sufficient for Perl... things are likely missing or just not quite right for Perl. But I have to start somewhere, right? And I figure some of you have the right Domain Knowledge to shed some light on the subject. Right?

    There's lots of neat notes in the aforementioned Lua design doc, written in a clear and concise manner. And now for a brief glance at Lua's opcodes:

On optimizing nested loops
2 direct replies — Read more / Contribute
by FloydATC
on Oct 19, 2014 at 06:05

    While working on a complex script doing lookups and searches on a dozen arrays of hashes (each array representing a relational database table) I stumbled across an extremely simple improvement that instantly gave almost twice the performance.

    The original loop looked like this:

    sub filter { my $where = shift; my @in = @_; # This class method is used to filter an array of hashrefs against a + set of criteria defined in $where. # Example: # @matching_hosts = filter( { site => 56, type => 4 }, @all_hosts) +; # In this example, @matching_hosts will only contain those hashrefs +that would return TRUE for the following code: # ($_->{'site'} eq '56' && $_->{'type'} eq '4') # Note that the "eq" and "&&" are implied; no other operators are su +pported. # The order of the array is not affected. my @out = (); foreach my $record (@in) { my $keep = 1; foreach my $field (keys %{$where}) { unless ($record->{$field} eq $where->{$field}) { $keep = 0; last; } push @out, $record if $keep; } } return @out; }

    The rewritten loop looks like this:

    sub filter { my $where = shift; my @in = @_; # This class method is used to filter an array of hashrefs against a + set of criteria defined in $where. # Example: # @matching_hosts = filter( { site => 56, type => 4 }, @all_hosts) +; # In this example, @matching_hosts will only contain those hashrefs +that would return TRUE for the following code: # ($_->{'site'} eq '56' && $_->{'type'} eq '4') # Note that the "eq" and "&&" are implied; no other operators are su +pported. # The order of the array is not affected. my @out = (); # Make one pass per match term foreach my $field (keys %{$where}) { my $value = $where->{$field}; @out = grep { $_->{$field} eq $value } @in; @in = @out; # Prepare for next pass (if any) } return @out; }

    The running times of actual reports dropped from over 4 seconds to less than 2 seconds. Some of that improvement obviously came from using the built-in grep{} function instead of manually checking each value and push()'ing hashrefs to the @out array, but I didn't expect that much of an improvement.

    There had to be a different explanation, and that got me thinking about the cost of setting up and executing a foreach() loop:

    $ cat foreach_inner #!/usr/bin/perl use strict; use warnings; foreach my $foo (1 .. 3) { foreach my $bar (1 .. 10000000) { my $pointless = "$foo.$bar"; } }
    $ time ./foreach_inner real 0m8.975s user 0m8.954s sys 0m0.013s
    $ cat foreach_outer #!/usr/bin/perl use strict; use warnings; foreach my $foo (1 .. 10000000) { foreach my $bar (1 .. 3) { my $pointless = "$foo.$bar"; } }
    $ time ./foreach_outer real 0m14.106s user 0m14.092s sys 0m0.003s

    Both test scripts do the exact same amount of (pointless) work, the difference between the two scripts is that 'foreach_inner' has to execute 9999997 more foreach() loops than 'foreach_outer'.

    Sometimes, even a seemingly pointless improvement can make a significant difference if made in the right place.

    Now, the way filters are specified in $where is pretty much nailed down because that hashref is built and used in a lot of different contexts. I am still looking for a way to express the whole thing as a single grep{} block to eliminate the looping altogether. Maybe tomorrow.

    -- FloydATC

    Time flies when you don't know what you're doing

Default Dropdown Value
3 direct replies — Read more / Contribute
by choroba
on Oct 17, 2014 at 03:35
    Recently, I was refactoring a CGI script at work. It contained a subroutine used to determine the default value for a dropdown list:
    sub DefaultHashValue { my %h = @_; my %r = reverse %h; my @k = sort values %h; return $r{ $k[0] } }

    Neat and short, I thought. But wait, what exactly does it do? We pick up the asciibetically first value and find the corresponding key. It took me some time to understand it (yes, I'm tough). Could this code be written in a more speaking way?

    I'd probably write it differently:

    sub sort_keys { my %h = @_; my @s = sort { $h{$a} cmp $h{$b} } keys %h; return $s[0] }

    Our dropdowns vary in size from 2 elements to several hundreds. For pure curiosity (there were no speed problems), I benchmarked the solutions (see below). Interestingly, for lists over 50 elements, the original solution was faster.

    It wasn't so hard to come with a winner. It's still readable, too:

    sub min { my %h = @_; my $min = (keys %h)[0]; $h{$_} lt $h{$min} and $min = $_ for keys %h; return $min }

    Which solution would you use and why? Or, would you use something else? Why? (I stayed with the original).

    For the interested, the full testing and benchmarking code:

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Log In?
Username:
Password:

What's my password?
Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2014-10-22 05:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (113 votes), past polls