Beefy Boxes and Bandwidth Generously Provided by pair Networks
The Monastery Gates

New Questions
Replacing builtin macro languages with perl -- how difficult?
2 direct replies
by Anonymous Monk
on Nov 23, 2014 at 00:22
    I keep getting involved in projects where a big software package has a hand-rolled macro language that is pathologically deficient. I'd love to replace such languages with Perl and expose the underlying data without needing to copy it or pipe into a separate perl process. Can anyone comment on how difficult this might be? Are there any examples of this being done?
Verify syntax of large JSON files
1 direct reply
by ron800
on Nov 22, 2014 at 22:00
    Hello, I've searched through this group, but no one seems to have the same issue. I want to parse large JSON files to see if the syntax is correct. I found examples but they all seem to want to parse from a string. I want to do something like this:
    use JSON::Parse qw(valid_json assert_valid_json);

    unless (valid_json($json)) {
        # do something
    }

    eval {
        assert_valid_json('["xyz":"b"]');
    };
    if ($@) {
        print "Your JSON was invalid: $@\n";
    }
    My files are multiple megabytes and reading them into a string for parsing is not realistic. Any suggestions would be appreciated.
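    One option, if the files do fit in memory, is to read the file and hand its contents to a parser inside eval. The sketch below swaps in the core JSON::PP module instead of JSON::Parse; the helper name json_file_error is hypothetical, and it assumes the file is UTF-8 and that holding the whole file in memory is acceptable. It is not a streaming validator.

```perl
use strict;
use warnings;
use JSON::PP ();

# Return '' if $path contains valid JSON, otherwise the parser's error text.
# Assumes UTF-8 input and that the whole file fits in memory.
sub json_file_error {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };    # slurp the raw bytes
    close $fh;

    # utf8: expect UTF-8 encoded bytes; allow_nonref: accept any top-level value
    my $parser = JSON::PP->new->utf8->allow_nonref;
    return eval { $parser->decode($text); 1 } ? '' : ($@ || 'unknown error');
}
```

    Usage would be along the lines of: my $err = json_file_error($file); print "Your JSON was invalid: $err" if $err;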
Getopt::Long defaults for options with multiple values
3 direct replies
by PetaMem
on Nov 21, 2014 at 07:23

    Reviewing some middle-aged code, I stumbled across this topic...

    Unfortunately, the G:L documentation is silent about default values for options with multiple values: Getopt::Long#Options-with-multiple-values. Even more unfortunate is what looks like an inconsistency with the defaults for single-value options, and maybe a semantic inconsistency overall. For a single-value option, you can define a default like this:

    my $tag = 'foo';    # option variable with default value
    GetOptions('tag=s' => \$tag);

    This works as expected. Good. You can, of course, do a similar thing for options that take multiple values:

    my $listref = ['a', 'b', 'c'];    # option variable with default values
    GetOptions('list=s{,}' => $listref);

    If you omit the -list option, the program keeps the default value, which is good. If you do give a -list option, however, G:L seems to push the given values onto the list that already holds the defaults. That may have its applications, but it is not great as default behavior. If you actually want to replace the defaults, you have to set them the ugly, backward, do-it-yourself way:

    my $listref = [];    # no default
    GetOptions('list=s{,}' => $listref);
    $listref = @{$listref} ? $listref : ['a', 'b', 'c'];    # DIY default

    That is actually the code I'm looking at right now. Yuck! That can't be right, can it?
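    For comparison, a slightly tidier variant of the DIY approach keeps the defaults in their own array and applies them only when the option produced no values, so user-supplied values replace the defaults instead of being appended to them. This is a sketch: @DEFAULT_LIST and parse_list are hypothetical names, and GetOptionsFromArray is used only so the parsing can be exercised without touching @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical defaults for the -list option.
my @DEFAULT_LIST = qw(a b c);

# Parse -list from @args. The destination starts empty, so nothing is
# appended to the defaults; they are applied only if no values were given.
sub parse_list {
    my @args = @_;
    my @list;
    GetOptionsFromArray(\@args, 'list=s{,}' => \@list)
        or die "Bad command line options\n";
    @list = @DEFAULT_LIST unless @list;
    return @list;
}
```

    It is still DIY, but the default lives in one named place rather than in a ternary after the GetOptions call.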

        All Perl:   MT, NLP, NLU

Runaway CGI script
3 direct replies
by Pascal666
on Nov 19, 2014 at 11:15
    tl;dr: Somehow a CGI script that doesn't write to disk kept running for about 16 hours after the client disconnected, filled up the disk about 10 hours in, and then freed the space when Apache was killed. Contents of script unknown.

    Fully patched CentOS 7. Woke up this morning to "Disk quota exceeded" errors and:

    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/simfs       25G   25G     0 100% /
    # du -sh
    3.9G    .
    Top indicated that I had plenty of ram left and a CGI script I wrote yesterday was the likely culprit:
    KiB Mem:   1048576 total,   380264 used,   668312 free,        0 buffers
    KiB Swap:   262144 total,    81204 used,   180940 free.    33856 cached Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     5140 apache    20   0  239888   2348   1756 R  17.3  0.2 144:42.67 httpd
    14980 apache    20   0   30840   1884   1228 S  15.6  0.2 153:43.94 bounced.cgi
    I killed Apache and now my disk and cpu utilization are normal. I didn't have lsof installed so I couldn't see what file was causing the problem.

    Access_log shows only me accessing the script, and error_log shows nothing since I wrote it.

    I wrote this quickly yesterday with no error handling, but the worst I expected to happen on an error was for the script to die. I can't understand how the following could possibly fill up my disk. It appears to work as intended.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;
    use CGI::Carp qw(fatalsToBrowser);

    my $q = new CGI;
    print $q->header;
    print $q->start_html('Bounce summary');

    my @files = </home/user/Maildir/cur/*>;
    for (@files) {
        open(IN, $_);
        # skip ahead to the bounce explanation
        while (<IN>) {
            last if /The mail system/;
        }
        # print the explanation, escaping '<', up to a line ending in '/'
        while (<IN>) {
            if (m#/$#) { print '<p>'; last; }
            s/</&lt;/g;
            print;
        }
        close IN;
    }
    print $q->end_html;
    Edited to add:
    Pulling the CGI components out gives nearly identical output to what the web browser tab displays, with no errors showing. The directories I was/will run this against never have subdirectories.

    Having thought about it today, I believe one of my initial assumptions when opening this thread was probably incorrect. As a CGI script, it only runs when I access it. I only ran it a couple of times in its final state (above). It is probable that the stuck version was different and I simply didn't notice it. It could have run for many hours before crippling the server. I don't make a habit of confirming that scripts end when they stop loading or when I hit X in my web browser; I just assume Apache will kill them.

    I just really don't understand how a CGI script could stay running without a client attached. I just created one with an intentional infinite loop, and as soon as I hit X, Apache killed it.

    From /var/log/messages after I ran "service httpd stop" this morning:

    Nov 19 10:38:44 systemd: httpd.service stopping timed out. Killing.
    Nov 19 10:38:44 systemd: httpd.service: main process exited, code=killed, status=9/KILL
    Nov 19 10:38:44 systemd: Unit httpd.service entered failed state.
    "kill -9 14980" probably would have fixed the problem without killing Apache, but I didn't think of it at the time.

    Update 2:
    It is actually trivial to create a CGI script that won't die when the client disconnects. My test above contained a "print" inside the loop. It looks like Apache disconnects STDOUT when the client disconnects, which causes the next print to kill the script. For example, a CGI script containing just:

    #!/usr/bin/perl
    sleep 1 while 1;
    will keep running after the client disconnects, and a "service httpd stop" will yield the same errors as above; however, Apache will kill it after the CGI timeout. So apparently one of my interim scripts entered an infinite loop without a print, but with something that prevented Apache's timeout from killing it. I still have no idea how that could use up all the free disk space and then free it immediately when killed.

    I just tried writing to STDERR in the loop, both with "print STDERR" and by trying to read a closed filehandle. In both cases error_log recorded the errors immediately and continued to grow in size. When I experienced the disk full error yesterday one of the first things I checked was the log files. error_log was only 7728 bytes.
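    Whatever the original cause turns out to be, one defensive measure is for the script to enforce its own time limit with alarm, so that a loop which never prints cannot outlive the request by hours. A minimal sketch, assuming a Unix-like system; the helper name finished_within and any limit passed to it are hypothetical:

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds. Returns 1 if $code finished in time,
# 0 if the time limit was hit. Any other exception is re-thrown.
sub finished_within {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;    # schedule SIGALRM
        $code->();
        alarm 0;           # finished in time: cancel the alarm
        1;
    };
    alarm 0;               # also cancel it if $code died early
    return 1 if $ok;
    return 0 if $@ eq "timeout\n";
    die $@;                # some other error: propagate it
}
```

    In the CGI, the main loop would be wrapped as finished_within(60, sub { ... }), 60 seconds being an arbitrary example. This only bounds the run time; it does not explain where the disk space went.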

Basic Apache2::REST implementation
1 direct reply
by Anonymous Monk
on Nov 19, 2014 at 10:33

    I'm new to Perl and I'm trying to implement a REST API using Perl. I installed Apache2::REST, and I couldn't find a detailed enough example to implement it (I'm poor at Perl).

    OK, I see the following details in the documentation that comes with the module:

    <Location />
    SetHandler perl-script
    PerlSetVar Apache2RESTHandlerRootClass "MyApp::REST::API"
    PerlResponseHandler Apache2::REST
    </Location>

    I use Apache2 with mod_perl, and my web root path is /usr/local/apache2/htdocs.

    I also see the sample program/package named MyApp::REST::API in the documentation.

    My question is: where should I place this package/program? Under ..../htdocs/MyApp/REST/? Or ..../htdocs/? I tried both, but no luck.

    Could anyone please guide me in implementing this API?
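    As far as I know, mod_perl does not load handler classes from the document root. Like any Perl module, MyApp::REST::API is found through @INC, so it has to live at MyApp/REST/API.pm beneath some directory on @INC, for example one added with mod_perl's PerlSwitches -I/some/lib directive or a use lib line in a startup file (/some/lib is a placeholder). The sketch below only illustrates that name-to-file mapping; the helper names are hypothetical and it says nothing about Apache2::REST's own API.

```perl
use strict;
use warnings;

# Convert a package name to the relative file name that "require" looks for,
# e.g. MyApp::REST::API -> MyApp/REST/API.pm
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Every location perl would try for the module, in @INC order.
sub candidate_paths {
    my ($module) = @_;
    my $rel = module_to_path($module);
    return map { "$_/$rel" } grep { !ref } @INC;    # skip code-ref hooks
}

print "$_\n" for candidate_paths('MyApp::REST::API');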

loading libxml2 as a prerequisite
3 direct replies
by jandrew
on Nov 18, 2014 at 19:55


    I recently uploaded a package to CPAN, Spreadsheet::XLSX::Reader::LibXML. The package is built on XML::LibXML, which requires the libxml2 library in order to build successfully. Strangely, this seems to come packaged with the OS on Windows but not on Unix or Linux. I'm wondering if there is a good way to require libxml2 and its dev package on non-Windows systems through Makefile.PL magic, with a Dist::Zilla plugin, ExtUtils::MakeMaker, or Module::Build, so that this package auto-builds from CPAN(M|P). I first noticed this issue when I wasn't getting many test reports from Linux / Unix systems on the CPAN Testers page.

    I am not a strong ExtUtils::MakeMaker or Module::Build user, so I have relied heavily on Dist::Zilla++ and friends to get my modules out in the past. I will knuckle down and read what I need to, but I'm not quite sure where to start. If it were as simple as requiring Alien::LibXML, I might have tried to muddle through, but I think I'm out of my depth here. Pointers on where to start would be greatly appreciated!
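    One low-tech approach is to probe for libxml2 at the top of Makefile.PL. The sketch below assumes that a libxml2 development install provides the xml2-config script on PATH; the helper names are hypothetical. The Devel::CheckLib module does a more thorough job by actually compiling a test program against the library.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of executable $prog if it is on PATH, else undef.
sub find_in_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Heuristic: a libxml2 development install normally ships xml2-config.
# Windows is skipped because the post reports libxml2 is available there.
sub libxml2_available {
    return 1 if $^O eq 'MSWin32';
    return defined find_in_path('xml2-config') ? 1 : 0;
}
```

    In Makefile.PL this would run before WriteMakefile. If libxml2_available() returns false, printing a message containing "OS unsupported" and calling exit 0 is, as I understand the CPAN Testers conventions, graded NA rather than FAIL. It does not install libxml2 for the user; that still has to come from the system package manager or an Alien:: distribution.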

Watching File Changes Under OSX
1 direct reply
by iaw4
on Nov 18, 2014 at 13:45

    There are a number of nice packages on CPAN for monitoring files portably across operating systems, such as File::Monitor, File::ChangeNotify, or AnyEvent::Filesys::Notify. I believe that, for quick real-time notifications on OS X, they rely on Mac::FSEvents; otherwise, they fall back to very slow scanning.

    The problem is that Mac::FSEvents no longer compiles on OS X Yosemite. There are now errors in FSEvents.xs, probably due to a change in OS X.

    Is there a different CPAN module you would recommend that provides immediate notification of file changes? I just need a simple blocking call on a few files with a callback.
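    If nothing event-based compiles, plain polling of a handful of files is cheap; the slow scanning mostly hurts when watching large directory trees. Below is a polling sketch using only core modules, not a replacement for FSEvents. The function names and the choice of mtime plus size as the change signature are assumptions of mine.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Record "mtime:size" for each file; a missing file is recorded as ''.
sub snapshot {
    my %state;
    for my $file (@_) {
        my @st = stat $file;
        $state{$file} = @st ? "$st[9]:$st[7]" : '';
    }
    return \%state;
}

# Names whose recorded state differs between two snapshots.
sub changed_files {
    my ($old, $new) = @_;
    return grep { ($old->{$_} // '') ne $new->{$_} } sort keys %$new;
}

# Block until at least one of @files changes, then call $callback with the
# changed names and return them. $interval is in seconds (fractions allowed).
sub watch_files {
    my ($callback, $interval, @files) = @_;
    my $before = snapshot(@files);
    while (1) {
        Time::HiRes::sleep($interval);
        my @hits = changed_files($before, snapshot(@files));
        if (@hits) {
            $callback->(@hits);
            return @hits;
        }
    }
}
```

    Usage would be something like watch_files(sub { print "changed: @_\n" }, 0.5, @files). Since mtime often has one-second resolution, the size is included to catch quick appends; a rewrite that keeps the same size within the same second would be missed.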

Archive::Extract alternatives?
1 direct reply
by beginner1010
on Nov 18, 2014 at 10:32
    Hello everybody,

    I have a problem with the Archive::Extract module and .tar archives. It seems to be able to handle files only up to a certain size relative to the system's RAM. We currently have only 2GB in the virtual server, which means that any .tar file above about 1.2GB freezes the script.

    Simply upgrading the system's memory is unfortunately not an option. The system does not have direct internet access, so installing any further packages/extensions is a real pain, which I would like to avoid if possible.

    I already tried Archive::Tar, which has the same restrictions. I assume both modules load the archive into RAM to be able to manipulate it before the actual extraction.
    $Archive::Extract::PREFER_BIN is set to true, which does no good because the script is running on a Windows system. I also found Archive::Extract::Libarchive, which needs libarchive installed; I have not looked further into this yet because it needs further programs to be compiled on the system first.

    Is there a way to install a command-line tool on Windows so that $Archive::Extract::PREFER_BIN will work? Are there any other alternatives to the module? We do not need anything besides unpacking the files into a specific folder.

    Any help is highly appreciated!

    I tried to reply to your comments, but I only get a "Permission Denied" page :(

    Anyway, thank you for your quick reply!
    I looked into Archive::Tar::Wrapper, and so far it looks very promising. How can I install tar on a Windows system? Or can I use another archiver (WinRAR/7-Zip...) too?

    I totally agree; usually it is just a few clicks to add more memory to a VM, but we have quite strict processes here. Adding more memory has to be requested through the proper channels and can take up to several months.
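    Archive::Tar also has an iter class method that reads the archive one entry at a time instead of loading the whole tarball into memory. A sketch, assuming a reasonably recent Archive::Tar (it ships with Perl, so nothing new needs installing) and an uncompressed .tar; extract_tar_streaming is a hypothetical helper name. As far as I know, each individual member is still read into memory, so a single member larger than the available RAM could still be a problem.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Spec;

# Extract every entry of $tarball below $dest, one entry at a time.
# Returns the number of entries extracted.
sub extract_tar_streaming {
    my ($tarball, $dest) = @_;
    my $next = Archive::Tar->iter($tarball)
        or die "Cannot read $tarball: " . Archive::Tar->error;
    my $count = 0;
    while (my $entry = $next->()) {
        # Place the entry under $dest; missing directories are created.
        my $target = File::Spec->catfile($dest, $entry->full_path);
        $entry->extract($target)
            or die "Failed to extract " . $entry->full_path . "\n";
        $count++;
    }
    return $count;
}
```
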
utf8 "\xD0" does not map to Unicode at /path/ line line_number, <STDIN> line line_number
3 direct replies
by igoryonya
on Nov 18, 2014 at 08:55

    Also, I get:
    utf8 "\xD1" does not map to Unicode at /path/ line line_number, <STDIN> line line_number.

    It happens with some file names piped from the find program. It started recently, and only with some file names, for the first time in the few years that I've been using and developing this program.

    It seems that some of the file names are corrupt.

    When I print out such file names with my program, I get something like:


    Ф\xD1%80\xD1%8Dнк \xD0%9F\xD1%8C\xD1%8E\xD1%81елик. \xD0%9D\xD0%9B\xD0%9F. \xD0%9C\xD0%95Т\xD0%90 \xD0%9Cодел\xD1%8C.webm

    The same file names, as displayed on the terminal by find before being piped to my program, look like this:


    Ф?%80?%8Dнк ?%9F?%8C?%8E?%81елик. ?%9D?%9B?%9F. ?%9C?%95Т?%90 ?%9Cодел?%8C.webm

    As I said, this is the first time I've encountered such a problem in a few years of daily use of this program.

    Here is a sample piped invocation of the program from the Linux terminal:
    find /some/path -type f| /some/path/ /path/to_folder/with_similar_dir_tree/ -parameters


    I've just noticed that the file names get truncated when I tried: find /some/path -type f -exec /path/ {} /path/to_folder/with_similar_dir_tree/ -parameters \;
    The path provided by {} is truncated significantly; maybe this is the same problem that happens with stdout|stdin.
    It seems like there is a very small limit on how many characters can be piped or passed by {}, or maybe the file names are being truncated because of invalid characters.
    I guess I have to resort to using Perl's internal find.
    I don't see anything wrong with that approach; I just wanted my program to be flexible, so it could be used either way: with its internal directory traversal or with paths piped from some other program.

    Update 2

    Thank you all who participated in solving my problem. To be honest, since I started converting my programs to Unicode, my understanding of the topic has been pretty vague. Many things got clarified while solving this problem, but there is still a lot to understand about utf8 and Unicode in general. When I look at the amount of Perl's Unicode documentation, it's pretty daunting to realize that I need to thoroughly read and digest all of it. Until now, I thought that Unicode was the answer to all textual problems and that everything should be in utf8, until I stumbled on this particular problem. Now I realize that there are exceptions.

    At first, I didn't even have a clue where to start solving my problem. After talking to you, I understood what needed to be done, but not how. That frustrated me, because I felt that Unicode should stay behind the curtains, and I didn't want to dilute the fun of programming, which I love, with daunting Unicode "bookkeeping". Also, I keep confusing the encode and decode functions. Then I calmed down, skimmed the unicode, utf8 and Encode documentation for the parts I needed, and started experimenting.

    Along the code path, I set up a check on every variable involved in path/file name processing: if it has the utf8 flag set (utf8::is_utf8), I turn the flag off (Encode::_utf8_off). With that, the final paths started resolving for existence (-e). I realize that if a portion of a path was already corrupt before it became utf8, turning the flag off may still leave a final path that doesn't resolve (-e). But I don't know how to process certain strings without them being converted to character mode (regex substitution, for example, always returns a value with the utf8 flag set), so for now I will leave it as it is, and I'll work on a proper fix and read more of the utf8 and Unicode docs when I run into such a problem again.
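    On Linux, file names are just bytes, and nothing guarantees that they are valid UTF-8. One common pattern, sketched below under that assumption, is to keep the raw bytes for filesystem operations such as -e and open, and to decode only a copy for display or text processing. The helper names are hypothetical; decode, FB_CROAK and LEAVE_SRC come from the core Encode module.

```perl
use strict;
use warnings;
use Encode ();

# Strictly decode $bytes as UTF-8. Returns the character string, or undef
# if the bytes are not valid UTF-8. LEAVE_SRC keeps $bytes unmodified.
sub utf8_or_undef {
    my ($bytes) = @_;
    my $chars = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars;
}

# Lenient decode for display only: malformed bytes become U+FFFD.
sub display_name {
    my ($bytes) = @_;
    return Encode::decode('UTF-8', $bytes);
}

# Read newline-separated paths as raw bytes, test them as bytes,
# and decode only when printing.
sub report_paths {
    my ($in, $out) = @_;
    while (my $path = <$in>) {
        chomp $path;
        my $exists = -e $path ? 'exists' : 'missing';    # bytes go to the OS
        my $note = defined utf8_or_undef($path) ? '' : ' (not valid UTF-8)';
        print {$out} display_name($path), ": $exists$note\n";
    }
}
```

    For the piped case, the caller would set binmode STDIN, ':raw' and binmode STDOUT, ':encoding(UTF-8)', then call report_paths(\*STDIN, \*STDOUT). The byte strings are never flag-flipped with _utf8_off; they simply stay bytes.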

Trying to Understand the Discouragement of Threads
7 direct replies
by benwills
on Nov 18, 2014 at 01:18

    In the most sincere and blunt sense of the word, I've been a hack programmer since I started about 20 years ago. I never went deep into learning the art of programming, but could usually Frankenstein together what I needed. I'm stating this so you may consider the source (that would be me) of this question about why the use of threads is discouraged...

    I've spent the last two+ weeks learning(ish) Perl to write the leanest and fastest threaded/asynchronous/parallel/forked code to perform a pretty basic task millions of times a day (downloading web pages). In the process, I tried every forking/parallel/asynchronous/threaded solution I could hack together. I tried every HTTP client I could find. I tested all of them for speed, accuracy, and resource usage. (If it matters: in the end, I went with a pure-Perl socket connection (not IO::Socket, but Socket) with some fine-tuning of my own.)

    But, more to the point of the question, I found that absolutely no solution competed with threads in any way, shape, or form. Every non-thread solution was much heavier than threads, functioned much slower, and, for whatever reason (additional layers of code?), produced less accurate results and required more "management" in the code.

    Yes, figuring out the right threads solution took longer. But for the best solution (if threads is the best solution), I'll spend an extra few days on it to get it right.

    I've seen the heated debates about thread usage. I've read just about every single piece of threaded code BrowserUK has posted (and couldn't have written what I wrote without his help in the forums). And I've tested it all for my own use. And the answer is clear: threads wins, hands down.

    So: why such severe discouragement? Because it's a little more confusing and not as straightforward? Is there something I'm missing in terms of performance? Is my code situation unique to where threads are outperforming the alternatives, and this is uncommon?

    I found absolutely zero public data on performance comparisons, but lots of assertions about performance that contradicted my own tests.

    So, I'm just confused and, if I'm missing something, would love to know how to look at this differently.

    But, if I'm not confused, then why are threads so actively and severely discouraged? I'm really just trying to understand this.

    And if this isn't the place for this question, let me know where a more appropriate forum would be.

    Thanks for any help/pointers/thoughts you have that could help me understand this better.


New Meditations
Sub signatures, and a vexing parse
2 direct replies
by davido
on Nov 18, 2014 at 16:53

    I was experimenting with the experimental subroutine signatures feature of Perl 5.20 today along with the much maligned prototypes feature of old, and encountered a most vexing parse that interested me. So I wanted to mention it here.

    First, something that is not a problem:

    *mysub = sub : prototype(\@\@) ($left,$right) { ... };

    This parses correctly, and will generate a subroutine named mysub with a prototype of \@\@, and with named parameters of $left and $right, which when called will contain array refs. But this doesn't do much. My real goal was generating several similar subroutines, and called upon map in a BEGIN{ ... } block to do the heavy lifting.

    Here is a contrived example that isn't terribly useful, but that works, and demonstrates the issue:

    use strict;
    use warnings;
    no warnings 'experimental::signatures';
    use feature qw/say signatures/;
    use List::Util qw(all);

    BEGIN {
        ( *array_numeq, *array_streq ) = map {
            my $compare = $_;
            sub :prototype(\@\@) ($l,$r) {
                @$l == @$r
                    && all { $compare->($l->[$_],$r->[$_]) } 0 .. $#$l
            }
        } sub { shift == shift }, sub { shift eq shift }
    }

    my @left  = ( 1, 2, 3 );
    my @right = ( 1, 2, 3 );

    {
        local $" = ',';
        say "(@left) ",
            ( array_numeq @left, @right ) ? "matches" : "doesn't match",
            " (@right)";
    }

    Do you see what the problem is? The compiler doesn't care for this at all, and throws a pretty useless compile-time error:

    Array found where operator expected at line 14, at end of line
            (Missing operator before ?)
    syntax error at line 14, near "@\@) "
    Global symbol "$l" requires explicit package name at line 14.
    Global symbol "$r" requires explicit package name at line 14.
    Global symbol "$l" requires explicit package name at line 15.
    Global symbol "$r" requires explicit package name at line 15.
    Global symbol "$l" requires explicit package name at line 16.
    Global symbol "$r" requires explicit package name at line 16.
    Global symbol "$l" requires explicit package name at line 17.
    BEGIN not safe after errors--compilation aborted at line 17.

    Q: So what changed between the first example, that works, and the second example, that doesn't?

    A: Lacking other cues, the compiler parses sub : as a label named sub, thinks that I'm trying to call a subroutine named prototype, and from that point on things are totally out of whack.

    Solution: +. Anything that reminds the parser that it's not looking at a label will do the trick. Parentheses around the sub : ... construct work, but + is easier, and probably more familiar to programmers who use + to get {....} treated as an anonymous hash ref constructor rather than as a lexical block.

    With that in mind, here's code that works:

    use strict;
    use warnings;
    no warnings 'experimental::signatures';
    use feature qw/say signatures/;
    use List::Util qw(all);

    BEGIN {
        ( *array_numeq, *array_streq ) = map {
            my $compare = $_;
            + sub :prototype(\@\@) ($l,$r) {
                @$l == @$r
                    && all { $compare->($l->[$_],$r->[$_]) } 0 .. $#$l
            }
        } sub { shift == shift }, sub { shift eq shift }
    }

    my @left  = ( 1, 2, 3 );
    my @right = ( 1, 2, 3 );

    {
        local $" = ',';
        say "(@left) ",
            ( array_numeq @left, @right ) ? "matches" : "doesn't match",
            " (@right)";
    }

    ...or how a single keystroke de-vexed the parse.

    A really simple example that breaks is this:

    my $subref = do {
        sub : prototype($) ($s) { return $s; };    # Perl thinks sub: is a label here.
    };
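    Applying the parenthesized form mentioned above to this minimal example gives the following sketch. It assumes a perl that accepts attributes before the signature, as the post's own working examples do (5.28 and later qualify); I have not checked it against every release.

```perl
use strict;
use warnings;
no warnings 'experimental::signatures';
use feature qw(signatures);

# Wrapping the anonymous sub in parentheses puts it in term position, so
# "sub :" can no longer be read as a label and the attribute and signature
# parse as intended.
my $subref = do {
    ( sub :prototype($) ($s) { return $s; } );
};
```
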

    I don't really see any way around the parsing confusion in the original version that doesn't work. That perl considers sub : to be a label in the absence of other cues is probably not something that can be fixed without making sub an illegal label. But if I were to file a bug report (which I haven't done yet), it would probably be related to the useless error message.

    This example is fairly contrived, but it's not impossible to think that subs with signatures and prototypes might be generated in some similar way as to fall prey to this mis-parse.

    Credit to mst and mauke on for deciphering why the compiler fails to DWIW.


As of 2014-11-24 16:20 GMT