The Monastery Gates

New Questions
DBD::SQLite queries slow - and gives wrong results
5 direct replies — Read more / Contribute
by astroboy
on Dec 11, 2017 at 14:42

    OK, this is pretty weird, IMHO. I have some Perl code written to create and populate a SQLite database. It's very simple: a list of users and their AD groups in two different tables. It's been running OK for at least 18 months. I have some other Perl code that queries the database. This querying code has started running slowly, and in some cases returns no results where I know there are records. I can test it by pasting the SQL into the SQLiteStudio v2.1.5 editor, which returns the results instantly.

    Here's an example of some Perl query code:

    #!/usr/bin/perl -w
    use strict;
    use DBI;

    my $dbfile = 'C:/db/employee.db';
    my $dbh = DBI->connect(
        "dbi:SQLite:dbname=$dbfile", "", "",
        { RaiseError => 1, AutoCommit => 0, }
    );
    my $sql = q{
        select e.*
          from employees e, groups g
         where e.sam_account_name = g.sam_account_name
           and g.group_name = 'Group Name'
         order by last_name, first_name
    };
    foreach my $emp (@{$dbh->selectall_arrayref($sql, {Slice => {}})}) {
        printf(
            "\t\t%-15s %-15s: (%s)\n",
            $emp->{first_name}, $emp->{last_name}, $emp->{sam_account_name},
        );
    }
    $dbh->rollback;

    The query in $sql is copied and pasted into the SQLiteStudio editor. Regardless of the group name I choose, the editor returns rows in approximately 0.001 seconds. Perl takes several seconds and may return no rows, even where there are matching candidates. If I change the group name, it may return the same rows as the editor, but it can take 10+ seconds. The result set is always small (2-20 rows, depending on the group). The database is 520MB.

    Note the code above was written to simplify my problem. The actual code has the group name as a placeholder, and I simply fetch each row rather than returning everything into an array as I do above. Regardless, the results are the same.

    This is running on Windows 7. I was using DBD::SQLite 1.54; as a test this morning I upgraded to 1.55_04 to see if there were any fixes in the developer version. I recreated the database, but the results are still the same.
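    One diagnostic that stays entirely inside Perl (a sketch; the table and column names come from the post, and the file path is the one above): check which SQLite library DBD::SQLite actually links against, and ask it for the query plan. SQLiteStudio bundles its own copy of SQLite, so a different library version choosing a different plan would explain an editor-vs-DBI speed gap:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=C:/db/employee.db", "", "",
                       { RaiseError => 1 });

# Which SQLite is DBD::SQLite actually using?
print "sqlite version: $dbh->{sqlite_version}\n";

# Ask for the plan of the problem query (group name as a placeholder)
my $plan = $dbh->selectall_arrayref(q{
    EXPLAIN QUERY PLAN
    select e.* from employees e, groups g
     where e.sam_account_name = g.sam_account_name
       and g.group_name = ?
}, undef, 'Group Name');

print join("|", @$_), "\n" for @$plan;
```

    Comparing that plan with the output of the same EXPLAIN QUERY PLAN in SQLiteStudio would show whether the two SQLite builds disagree about index use.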

unique sequences
7 direct replies — Read more / Contribute
by Anonymous Monk
on Dec 10, 2017 at 18:31

    I am really new to Perl and am taking a course on it. I wrote the following program for an assignment and am getting incorrect output: I'm getting over a million lines, while the expected output is closer to 250,000. The last 12 nts (nucleotides) need to be unique to the genome. I have a feeling it's due to my regex. Any advice would be greatly appreciated. Thank you.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %windowSeqScore = ();
    my $input_file  = '/scratch/Drosophila/dmel-all-chromosome-r6.02.fasta';
    my $sequenceRef = loadSequence($input_file);
    my $output_file = 'unique12KmersEndingGG.fasta';
    open (KMERS, ">", $output_file) or die $!;

    my $windowSize = 21;
    my $stepSize   = 1;
    for (
        my $windowStart = 0 ;
        $windowStart <= ( length($$sequenceRef) - $windowSize ) ;
        $windowStart += $stepSize
        )
    {
        my $windowSeq = substr( $$sequenceRef, $windowStart, $windowSize );
        if ( $windowSeq =~ /([ATCG]{10}GG$)/ ) {
            $windowSeqScore{$windowSeq}++;
        }
    }
    my $count = 0;
    for ( keys %windowSeqScore ) {
        $count++;
        if ( $windowSeqScore{$_} == 1 ) {
            print KMERS ">crispr_$count", "\n", $_, "\n";
        }
    }

    sub loadSequence {
        my ($sequenceFile) = @_;
        my $sequence = "";
        unless ( open( FASTA, "<", $sequenceFile ) ) {
            die $!;
        }
        while (<FASTA>) {
            my $line = $_;
            chomp($line);
            if ( $line !~ /^>/ ) {
                $sequence .= $line;
            }
        }
        return \$sequence;
    }

    This is some of the output I'm getting


    This is some of the expected output I should be getting
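    A guess, hedged since the assignment text isn't shown: the count above is keyed on the whole 21-nt window, but the requirement is that the last 12 nts be unique, so two windows that differ only in their first 9 bases each look "unique" and both get printed, inflating the line count. Keying the count on the captured 12-mer ($1) instead would look like this sketch (a tiny inline sequence stands in for the genome file):

```perl
use strict;
use warnings;

# Sketch: count uniqueness on the captured 12-mer ($1), not the whole
# 21-nt window. A short made-up sequence replaces the FASTA input.
my $sequence   = 'AAAATTTTCCCCGGGGATCGGGACGT';
my $windowSize = 21;
my %kmerCount;

for my $start (0 .. length($sequence) - $windowSize) {
    my $window = substr($sequence, $start, $windowSize);
    if ($window =~ /([ATCG]{10}GG)$/) {
        $kmerCount{$1}++;               # key on the last 12 nts only
    }
}

for my $kmer (sort keys %kmerCount) {
    print "$kmer\n" if $kmerCount{$kmer} == 1;   # unique 12-mers
}
```

    The printing loop then emits only 12-mers seen exactly once anywhere in the genome, which is what "unique to the genome" seems to ask for.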

Rename files in gzip tarball: No such file in archive: '/path/to/file1.txt'
2 direct replies — Read more / Contribute
by Bowlslaw
on Dec 08, 2017 at 19:03

    This program reads a list of files from the directory specified on the command line and creates an array of hashes, where each file has the keys path, size, and id (a SHA-256 checksum).

    I am trying to create a gzipped tarball of the files, where each file is named after its checksum with the file's original extension appended. I create the gzipped tarball of the files successfully. However, when I try to use Archive::Tar's rename method, I am met with this error: No such file in archive: '/path/to/file1.txt' at ./ line 62. This error repeats for each file in the archive.

    Is it because the archive is just a flat list of files? If so, how does one use the rename method?

    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);
    use File::Spec qw(catfile rel2abs);
    use Digest::SHA qw(sha256_hex);
    use Archive::Tar;
    use Archive::Tar::File;

    my $dir = $ARGV[0];
    my $url = $ARGV[1];
    my @AoH;
    my @checksumfiles;
    my $tar     = Archive::Tar->new;
    my $archive = "archive.tar.gz";

    opendir DIR, $dir or die "cannot open dir $dir: $!\n";
    chdir $dir        or die "cannot navigate to dir $dir: $!\n";

    while (my $file = readdir DIR) {
        next unless (-f File::Spec->catfile($dir, $file));
        next if ($file =~ m/^\./);

        my $fullpath = File::Spec->rel2abs($file);
        my $fullsize = -s File::Spec->catfile($dir, $file);
        my $fullid   = sha256_hex($fullpath);

        my %hash = (
            path => $fullpath,
            size => $fullsize,
            id   => $fullid,
        );
        push(@AoH, \%hash);
    }

    my @array;
    for my $i (0 .. $#AoH) {
        no warnings 'uninitialized';
        my ($ext) = $AoH[$i]{path} =~ (/(\.[^.]+)$/);
        my $idext = $AoH[$i]{id} . $ext;
        push(@checksumfiles, $idext);
        push(@array, $AoH[$i]{path});
    }

    Archive::Tar->create_archive($archive, COMPRESS_GZIP, @array);
    #print Archive::Tar->list_archive($archive, COMPRESS_GZIP), "\n";

    for my $i (0 .. $#array) {
        $tar->rename($array[$i], $checksumfiles[$i]);
    }

    print Dumper sort \@array;
    print Dumper sort \@checksumfiles;
    #print Dumper sort \@AoH;
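    A likely explanation, offered as a sketch rather than a definitive diagnosis: create_archive is a class method that writes straight to disk, while $tar is a separate, still-empty in-memory object, so rename has nothing to find. One way around that, assuming lists shaped like @array and @checksumfiles above (the file name below is a stand-in), is to populate the object, rename inside it, and write once at the end:

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical stand-ins for the lists built in the post
my @array         = ('/path/to/file1.txt');
my @checksumfiles = ('deadbeef.txt');

my $tar = Archive::Tar->new;
$tar->add_files(@array);    # the files now live inside the object

for my $i (0 .. $#array) {
    # Archive::Tar stores absolute paths without the leading '/',
    # so strip it before looking the entry up for rename
    (my $inside = $array[$i]) =~ s{^/}{};
    $tar->rename($inside, $checksumfiles[$i])
        or warn $tar->error;
}

$tar->write('archive.tar.gz', COMPRESS_GZIP);   # write once, compressed
```

    The leading-slash stripping is the other half of the "flat list of files" suspicion: the names stored in the archive are relative, so rename must be given the stored name, not the original absolute path.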
Tidying and simplifying a regular expression
4 direct replies — Read more / Contribute
by Dallaylaen
on Dec 08, 2017 at 12:00

    Hello monks and nuns,

    I'm just wondering if there is a module or recipe to strip a regular expression of meaningless grouping. Consider the following code:

    bash$ perl -wle 'my $rex = qr/./; $rex = qr/$rex./ for 1..10; print $rex;'
    (?^:(?^:(?^:(?^:(?^:(?^:(?^:(?^:(?^:(?^:(?^:.).).).).).).).).).).)

    It's relatively easy to spot that it's just a (?:..........); however, the expression is not stringified exactly like that. Is there a way to tidy it up automatically?

    Inspired by this node, but I think it would be nice to have a simplifier anyway...
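    I don't know of a ready-made simplifier module, but as a crude sketch, the no-op groups can be peeled off textually: repeatedly unwrap any innermost (?^:...) whose body contains no parentheses. This is only safe here because none of the unwrapped groups carries an alternation or a quantifier that would rebind once the group is gone; it is NOT a general regex simplifier:

```perl
use strict;
use warnings;

# Build the same nested pattern as in the one-liner above
my $rex = qr/./;
$rex = qr/$rex./ for 1 .. 10;
my $str = "$rex";

# Repeatedly unwrap innermost paren-free (?^:...) groups.
# Only valid when no alternation/quantifier depends on the grouping.
1 while $str =~ s/\(\?\^:((?:[^()\\]|\\.)*)\)/$1/;

my $tidy = qr/$str/;    # re-wrap once: a single flat group of 11 dots
print "$tidy\n";
```

    The result stringifies as one (?^:...) around eleven dots, matching the same strings as the deeply nested original.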

Guidelines for listing functions/methods in POD?
7 direct replies — Read more / Contribute
by Dallaylaen
on Dec 07, 2017 at 15:34

    Hello dear esteemed monks,

    Having published some modules, I finally started wondering about how to format function names.

    For some reason, I cannot even recall why, I began documenting my modules' functions using a header with a usage example:

    =head2 frobnicate( $foo, $bar )

    Now it looks rather cumbersome to me, so I'm leaning towards

    =head2 frobnicate

    =over

    =item frobnicate( $foo, $bar )

    =item frobnicate( \%baz )

    =back

    But I see that many CPAN authors go even further and remove functions/methods from index altogether, leaving only

    =item frobnicate()

    I for one prefer more structured documentation. Where can I find guidelines for doing it properly? What are the reasons for and against each practice? At least Test::Pod::Coverage permits all three...

    Oh, it looks like I'm sold on the second variant: after re-reading perldoc perlpod, it turns out that sections are linkable via L<Foo::Bar/frobnicate>. Still posting this, as there surely is something to add to my thoughts!

Single sign on with AD
2 direct replies — Read more / Contribute
by newbie200
on Dec 07, 2017 at 10:45

    Hello, I am new to Perl. I am trying to implement SSO on a Perl web app but can't seem to get my head around it. Here are the technical details.

    On Apache, I downloaded, installed and configured the module. This allowed me to detect a user logged on to a computer; I was able to tell whether the user was in a local domain or a global domain. Now comes the tricky part: I have to program into my web app an SSO that sees the person logged on via Apache. I also have LDAP configured. It just seems so confusing to me.

    I would be glad if someone could explain more about this. Do I need an SSO server? How do I connect my Perl web app so it can read from Apache and get the required information?
parsing whois data
2 direct replies — Read more / Contribute
by Discipulus
on Dec 06, 2017 at 15:11
    Hello monks and nuns,

    I'm in a situation where I want to extract some whois data, live. I tested Net::Whois::Raw and Net::Whois::Parser: while the first returns all the results in a single big string (as "Raw" in the namespace suggests), the latter parses the results and outputs a hash of scalars/arrays.

    The problem is that Net::Whois::Parser does not return all information for all TLDs: for example, .it domains return no nameservers field, because it happens to be a multiline record for every domain I tested.

    Net::Whois::Parser, on the other hand, provides a way to specify a custom parser for specific whois servers.

    Let's say I need the Domain status and Nameserver fields (but maybe more): is there a better, more universal way to get them parsed for every top-level domain?

    Due to my ignorance, I don't know whether for each TLD there is only one whois server, or one format for the data, or whether I can get multiple formats for different .it domains, for example (that would be a pain...).
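    One fallback, offered as a sketch rather than a universal answer: take the raw text from Net::Whois::Raw and pull the multiline fields out yourself. The field labels and indentation below are assumptions about the .it registry's layout; every registry can differ, so each TLD would need its own patterns:

```perl
use strict;
use warnings;

# Sketch: extract "Status:" and an indented multiline "Nameservers"
# block from raw .it-style whois text. The labels and layout are
# assumptions about one registry's format, not a universal parser.
sub parse_it_whois {
    my ($raw) = @_;
    my %info;

    ($info{status}) = $raw =~ /^Status:\s*(.+)$/m;

    # .it-style output lists nameservers as an indented block
    if ($raw =~ /^Nameservers\n((?:^\s+\S+\n?)+)/m) {
        $info{nameservers} = [ $1 =~ /(\S+)/g ];
    }
    return \%info;
}

my $sample = <<'RAW';
Domain:      example.it
Status:      ok
Nameservers
    ns1.example.it
    ns2.example.it
RAW

my $info = parse_it_whois($sample);
print "status: $info->{status}\n";
print "ns: @{ $info->{nameservers} }\n";
```

    A hash of such per-TLD subs, keyed by whois server, is essentially what Net::Whois::Parser's custom-parser hook expects you to supply.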


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
sending mail
2 direct replies — Read more / Contribute
by dangiles
on Dec 06, 2017 at 09:47

    When I use mailx or sendmail (on RHEL 6.9, Perl v5.10.1) from within Perl, such as

    qx( qq\cat $log | mailx -r $EMAIL_ADMIN -s "Exchange Backup Status $rptdate" $SEND_TO\ );
    I receive the email in outlook with the following in the body of the message:
    Message-ID: <>
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    If I run the command directly from the o/s, or have the perl code create a shell script with this command that then gets executed, this preamble does not appear.

    Anyone know what may be going on here? I really don't want to use packages for such a simple operation.
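    One thing worth trying (a sketch, assuming the same mailx flags as above; the values below are hypothetical stand-ins for the variables in the post): skip cat and the nested qq\...\ quoting entirely by opening a list-form pipe to mailx and printing the log yourself. That removes one layer of shell interpretation, which is the usual suspect when headers leak into the body:

```perl
use strict;
use warnings;

# Hypothetical values standing in for the variables in the post
my $EMAIL_ADMIN = 'admin@example.com';
my $SEND_TO     = 'ops@example.com';
my $rptdate     = '2017-12-06';
my $log         = '/var/log/exchange_backup.log';

# List-form pipe: no shell, no nested quoting
open my $mail, '|-', 'mailx',
    '-r', $EMAIL_ADMIN,
    '-s', "Exchange Backup Status $rptdate",
    $SEND_TO
    or die "Can't run mailx: $!";

open my $fh, '<', $log or die "Can't read $log: $!";
print {$mail} $_ while <$fh>;
close $fh;
close $mail or warn "mailx exited with status $?";
```

    If the preamble still appears with this form, the problem is in mailx's own configuration rather than in how Perl invokes it.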

    Dan Giles, simple novice
Compiling Perl using Win10 Subsystem for Linux (WSL)
3 direct replies — Read more / Contribute
by holli
on Dec 06, 2017 at 04:39
    I am trying to compile Perl using the new Linux on Windows 10 feature in the new shiny bash. So I installed perlbrew and gave it a go. Here is the output of the build process which failed in the test phase. (I removed lines with passing tests) Looks like there is a problem with io sockets. How do I fix that?
Problems with values expressed in scientific notation not being returned through DBI
2 direct replies — Read more / Contribute
by DazedConfuzed
on Dec 05, 2017 at 16:47

    I've been using Perl scripts as an interface to my MS Access (Office 2007) databases for quite some time but recently ran into what seems to be an obscure issue. I say 'obscure' because internet searches with appropriate keyword terms usually land me on a PerlMonks page where I usually quickly find my answer, since others have experienced similar issues before me. This time I can't seem to dig up a pre-existing explanation/solution, and so I present a distilled synopsis of the original, more complicated problem.

    The Setup:

    Basically, I've distilled this down to a simple example using a simplified MS Access DB. The database (Play.mdb) is sitting in a folder and a symbolic data source (System DSN) has been created called "PLAY". I have to use the 32 bit version of the ODBC Data Source Administrator as I'm connecting to an older Office 2007 database which is only available in 32 bit architecture (though my Win 7 is 64-bit). Yup. I'm a dinosaur using an old OS and older (purchased) copy of MS Office and I don't feel like upgrading to new software if that is the solution.

    The simplified database contains a single table called Test which has but a single row and contains 4 columns generically labeled (A-D). These columns all have the Data Type=Number and columns A-C are defined as Single while I've changed column D to Double.

    The Code:

    No laughing at inefficient or naïve Perl code (I learn just enough Perl to get by).

    use DBI;

    my ($dbh, $sth, $a, $b, $c, $d);

    $dbh = DBI->connect('DBI:ODBC:PLAY', '', '', {RaiseError => 1, AutoCommit => 1});

    $sth = $dbh->prepare("SELECT A, B, C, D FROM Test");
    $sth->execute();
    ($a, $b, $c, $d) = $sth->fetchrow;
    print "A=$a\n";
    print "B=$b\n";
    print "C=$c\n";
    print "D=$d\n";
    print "\n";

    $sth = $dbh->prepare("SELECT A+0, B+0, C+0, D+0 FROM Test");
    $sth->execute();
    ($a, $b, $c, $d) = $sth->fetchrow;
    print "A=$a\n";
    print "B=$b\n";
    print "C=$c\n";
    print "D=$d\n";
    The Output:
    A=6.02518
    B=
    C=
    D=8.99280607700348E-2

    A=6.02517986297607
    B=3.50719429552555E-2
    C=8.99280607700348E-2
    D=8.99280607700348E-2

    The Problem:

    Basically, in a nutshell, I noticed this when certain calculated values in a database table were being returned as empty (but not 'undef') values from my query. You'll see in the first query that the values for B and C are just plain missing. The second query only varies by adding zero to the values for the columns (which I assume may be holding the computed amount as a Double value rather than Single). This allows you to get an idea of what the missing numbers are in the first query. To test this precision theory I changed column D in the table to a Double instead of a Single and then it started showing up as expected in the first query. It should be noted that this problem only seems to occur when the (single precision) value ends up being expressed as a number in scientific notation (e.g. "8.992806E-02").

    As I mentioned earlier, this problem is really as simple of a distillation as I can seem to make from what was a much larger program that generates many MS Access tables and fetches the entire contents of the table with a ($sth->fetchall_arrayref) before pasting it into a programmatically constructed spreadsheet representation of the database table. I've tried to strip-away all that clutter to show a simpler example of scientific notation numbers not being returned as expected (or, at least, as I expected) from a simple query.

    As my actual query involves a "SELECT * FROM $table_name" type of a query, the "+0" hack will not work as a solution (would be too ugly to live with anyway). My other option is to go back through all of my code that generates these tables and only use DOUBLE rather than SINGLE in my CREATE TABLE statements. This may provide an appropriate workaround (though I do not need that much precision on these values). The other (more tedious) change might be to find the parts of the code that are calculating these values and specifically round the results to just a handful of digits which should keep some of them from having to be expressed by scientific notation (which seems to be the root of this problem).

    As I need to progress on with this, I'll likely try a SINGLE => DOUBLE type solution to see if that will get me past this roadblock but I'd really like to know why it seems that single precision numbers expressed in scientific notation cannot be retrieved through the DBI interface. I'm open to all the enlightenment I can get on this quirky little issue.
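    For what it's worth, a middle ground between the "+0" hack and rebuilding every table as DOUBLE might be casting at query time: Access SQL has a CDbl() function. A sketch of how a query generator could wrap the single-precision columns (the column list is hard-coded here; in real code it would come from the schema):

```perl
use strict;
use warnings;

# Sketch: instead of "SELECT * FROM $table", build a column list that
# forces single-precision columns through Access's CDbl(). Which
# columns need the cast would come from your schema; hard-coded here.
my $table     = 'Test';
my @columns   = qw(A B C D);
my %is_single = map { $_ => 1 } qw(A B C);   # assumed SINGLE columns

my @select = map {
    $is_single{$_} ? "CDbl($_) AS $_" : $_
} @columns;

my $sql = sprintf 'SELECT %s FROM %s', join(', ', @select), $table;
print "$sql\n";
```

    That keeps the schema untouched and confines the workaround to the one place that builds queries, at the cost of needing the column list instead of a bare "*".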


    Been dazed and confused for so long...

usage of '+' sign in say statement
5 direct replies — Read more / Contribute
by seki
on Dec 05, 2017 at 11:19

    Hi Monks,

    While looking for a setlocale + strftime example, I stumbled on this gist that makes a weird (to me) use of the '+' sign with 'say':

    use strict;
    use warnings;
    use utf8;
    use 5.10.0;
    use POSIX ();

    sub localize_strftime {
        my $locale  = shift;
        my $default = POSIX::setlocale(POSIX::LC_TIME);
        POSIX::setlocale(POSIX::LC_TIME, $locale);
        my $retval = POSIX::strftime(@_);
        POSIX::setlocale(POSIX::LC_TIME, $default);
        return $retval;
    }

    say+localize_strftime('en_US', '%a, %d %b %Y %T %z', localtime(time));

    Does it have a specific meaning there, or is it just a habit with a precise goal in everyday code?
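    For context, with a tiny illustrative example: a leading + is Perl's no-op unary plus, and its only job in front of a call is to steer the parser. Written as say (...), the parentheses would be taken as say's complete argument list; say+foo(...) (or say +foo(...)) forces the whole expression to be the argument. Some people also use it defensively so a bareword can't be mistaken for a filehandle. The classic demonstration uses parentheses right after say:

```perl
use strict;
use warnings;
use feature 'say';

sub double { return 2 * $_[0] }

# Without +, the parens bind to say itself: say(double(3)) prints 6,
# and the "* 10" applies to say's return value and is discarded.
# (Perl even warns: "say (...) interpreted as function".)
say (double(3)) * 10;    # prints 6

# With +, the parenthesized expression is just say's first argument:
say +(double(3)) * 10;   # prints 60
```

    In the gist there is no space before the parentheses, so the + is pure habit there, but it is a habit with exactly this trap in mind.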

    The best programs are the ones written when the programmer is supposed to be working on something else. - Melinda Varian
Monitor new file
4 direct replies — Read more / Contribute
by colox
on Dec 05, 2017 at 07:16

    Dear Monks, from my other posts/inquiries: I am using File::Find to search through a directory for files newer than a timestamp I saved into a hash file. However, I realized this is very inefficient, as it has to comb through the whole directory and check each and every file in it to determine whether the file's timestamp is newer than my reference (hash). Any recommendation (module) that would do the task more efficiently? Thank you as usual for your valuable input and advice.
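    One direction worth a look (a sketch; File::ChangeNotify is a CPAN module that picks an OS-native watcher such as inotify where available and falls back to polling): instead of rescanning and comparing timestamps, subscribe to change events for the directory. The path and filter below are examples, not from the post:

```perl
use strict;
use warnings;
use File::ChangeNotify;

# Sketch: watch a directory for new/changed files instead of
# re-scanning it. Path and filename filter are example values.
my $watcher = File::ChangeNotify->instantiate_watcher(
    directories => ['/data/incoming'],
    filter      => qr/\.log\z/,
);

while (my @events = $watcher->wait_for_events) {
    for my $event (@events) {
        next unless $event->type eq 'create' or $event->type eq 'modify';
        printf "%s: %s\n", $event->type, $event->path;
    }
}
```

    The trade-off: this only sees changes while the watcher is running, so the saved-timestamp scan is still needed once at startup to catch anything that arrived in between.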

New Meditations
The problem of "the" default shell
4 direct replies — Read more / Contribute
by afoken
on Dec 09, 2017 at 08:17

    I've got a little bit tired of searching my "avoid the default shell" postings over and over again, so I wrote this meditation to sum it up.

    What is wrong with the default shell?

    In an ideal world, nothing. The default shell /bin/sh would have a consistent, well-defined behaviour across all platforms, including quoting and escaping rules. It would be quite easy and unproblematic to use.

    But this is the real world. Different platforms have different default shells, and they change the default shell over time. Also, shell behaviour changed over time. Remember that the Unix family of operating systems has evolved since the 1970s, and of course, this includes the shells. Have a look at "Various system shells" to get a first impression. Don't even assume that operating systems keep using the same shell as default shell.

    And yes, there is more than just the huge Unix family. MS-DOS copied concepts from CP/M and also a very little bit from Unix. OS/2 and the Windows NT family (including 2000, XP, Vista, 7, 10) copied from MS-DOS. Windows 1-3, 9x, and ME still ran on top of DOS. From this tree of operating systems, we got COMMAND.COM and cmd.exe.

    By the way: Modern MacOS variants (since MacOS X) are part of the Unix family, and so is Android (after all, it's just a heavily customized Linux).

    Some ugly details:

    And when it comes to Windows (and DOS, OS/2), legacy becomes really ugly.

    So, to sum it up, there is no such thing as "the" default shell. There are a lot of default shells, all with more or less different behaviour. You can't even hope that the default shell resembles a well-known family of shells, like the Bourne shell. So there is much potential for nasty surprises.

    Why and how does that affect Perl?

    Perl has several ways to execute external commands, some more obvious, some less so. In the very basic form, you pass a string to perl that roughly resembles what you would type into your favorite shell:

    • system('echo hello');
    • exec('echo hello');
    • open my $pipe,'echo hello |' or die "Can't open pipe: $!"; my $hello=do { local $/; <$pipe> }; close $pipe;
    • my $hello=qx(echo hello);
    • my $hello=`echo hello`;

    Looks pretty innocent, doesn't it? And it is, until you start doing real-world things, like passing arguments containing quotes, dollar signs, or backslashes to an external program. Then you need to know the quoting rules of whatever shell happens to be the default shell.

    For those cases, perl is expected to pass the string to /bin/sh for execution. Except that in this innocent case, and several other cases, perl does not invoke the default shell at all. Buried deep in the perl sources, there are some heuristics at work. If perl thinks that it can start the executable on its own, because the command does not contain what is documented as "shell metacharacters", perl splits the command on its own and avoids invoking the default shell.

    Why? Because perl can easily figure out what the shell would do, and do it by itself instead. This avoids a lot of overhead and so is faster and does not use as much memory as invoking the shell would.

    Unfortunately, the documentation is a little bit short on details. See "Perl guessing" in Re^2: Improve pipe open? (redirect hook): From the code of Perl_do_exec3() in doio.c (perl 5.24.1), it seems that the word "exec" inside the command string triggers a different handling, and some of the logic also depends on how perl was compiled (preprocessor symbol CSH).

    If you don't need support from the default shell, you can help perl by passing system, exec, and open a list of arguments instead of a string. This "multi-argument" or "list form" of the commands always avoids the shell, and it completely avoids any need to quote.

    (Well, at least on Unix. Windows is a completely different beast. See Re^3: Perl Rename and Re^3: Having to manually escape quote character in args to "system"?. It should be safe to pretend that you are on Unix even if you are on Windows. Perl should do the right thing with the "list form".)

    So our examples now look like this:

    • system('echo','hello','here','is','a','dollar:','$');
    • exec('echo','hello','here','is','a','dollar:','$');
    • open my $pipe,'-|','echo','hello','here','is','a','dollar:','$' or die "Can't open pipe: $!"; my $hello=do { local $/; <$pipe> }; close $pipe;

    Did you notice that qx() and its shorter alias `` don't support a list form? That sucks, but we can work around that by using open instead. Writing a small function that wraps open is quite easy. See "Safe pipe opens" in perlipc.
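    A wrapper along those lines (a sketch: list-form pipe open, slurp, return) might look like this:

```perl
use strict;
use warnings;

# Sketch of a list-form qx() replacement: runs the command without
# any shell and returns its output.
# Note: with a single-element @command, perl falls back to its
# shell heuristics again -- see the edge cases below... er, in the
# "Edge cases" discussion of this meditation.
sub qx_list {
    my @command = @_;
    open my $pipe, '-|', @command
        or die "Can't run @command: $!";
    my $output = do { local $/; <$pipe> };
    close $pipe or warn "@command exited with status $?";
    return $output;
}

my $hello = qx_list('echo', 'hello, here is a dollar: $');
print $hello;   # the $ arrives literally; no shell ever saw it
```

    Because the arguments never pass through a shell, there is nothing to quote: dollar signs, spaces, and backslashes all arrive at the program exactly as written.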

    Edge cases

    OK, let's assume I've convinced you to use the list forms of system, exec, and open. You want to start a program named "foo bar", and it needs an argument "baz". Yes, the program has a space in its name. This is unusual but legal in the Unix family, and quite common on Windows.

    • system('foo bar','baz');
    • exec('foo bar','baz');
    • open my $pipe,'-|','foo bar','baz' or die ...

    or even:

    my @command=('foo bar','baz');

    and one of:

    • system @command;
    • exec @command;
    • open my $pipe,'-|',@command or die ...

    All is well. Perl does what you expect, no default shell is ever involved.

    Now, "foo bar" gets an update, and you no longer have to pass the "baz" argument. In fact, you must not pass the "baz" argument at all. Should be easy, right?

    • system 'foo bar';
    • exec 'foo bar';
    • open my $pipe,'-|','foo bar' or die ...


    my @command=('foo bar');

    and one of:

    • system @command;
    • exec @command;
    • open my $pipe,'-|',@command or die ...

    Wrong! system, exec, and even open in the three-argument form now see a single scalar value as the command, and start once again guessing what you want. And they will wrongly guess that you want to start "foo" with an argument of "bar".

    The solution for system and exec is hidden in the documentation of exec: pass the executable name using indirect object syntax to system or exec, and perl will treat the single-argument list as a list, not as a single command string.

    • system { 'foo bar' } 'foo bar';
    • exec { 'foo bar' } 'foo bar';


    my @command=('foo bar');

    and one of:

    • system { $command[0] } @command;
    • exec { $command[0] } @command;

    If the command list is not guaranteed to contain at least two arguments (e.g. because arguments come from the user or the network), you should always use the indirect object notation to avoid this trap.

    Did you notice that we lost another way of invoking external commands here? There is (currently) no way in perl to use pipe open with a single-element command list without triggering the default shell heuristics. That's why I wrote Improve pipe open?. Yes, you can work around by using the code shown in "Safe pipe opens" in perlipc and using exec with indirect object notation in the child process. But that takes 10 to 20 lines of code just because perl tries to be smart instead of being secure.
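    The perlipc pattern referred to, adapted to the single-element case, looks roughly like this sketch (Unix only; the child execs with indirect object syntax, so no shell heuristics ever run, even for a one-element command list or a program with a space in its name):

```perl
use strict;
use warnings;

# Sketch: a read-pipe open that never invokes a shell, even for a
# one-element command list. Unix only (relies on fork + exec).
sub safe_pipe_from {
    my @command = @_;
    my $pid = open(my $pipe, '-|');     # fork; child's STDOUT -> pipe
    defined $pid or die "Can't fork: $!";
    if ($pid == 0) {                    # child: become the command
        exec { $command[0] } @command   # works for 'foo bar' too
            or die "Can't exec $command[0]: $!";
    }
    return $pipe;                       # parent: read end
}

# Demo with a real one-element command; 'echo' alone prints a newline
my $pipe = safe_pipe_from('echo');
my $out  = do { local $/; <$pipe> };
close $pipe;
```

    This is exactly the 10-to-20-line workaround mentioned above, packed into a reusable sub.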

    Avoiding external programs

    Why do you want to run external programs? Perl can easily replace most of the basic Unix utilities, by using internal functions or existing modules. And as an additional extra, you don't depend on the external programs. This makes your code more portable. For example, Windows does not have ls, grep, awk, sed, test, cat, head, or tail out of the box, and find is not find, but a poor excuse for grep. If you use perl functions and modules, that does not matter at all. Likewise, not all members of the Unix family have the GNU variant of those utilities. Again, if you use perl functions and modules, it does not matter.

    Tool             Perl replacement
    ----             ----------------
    echo             print, say
    rm -r            File::Path
    mkdir -p         File::Path
    grep             grep (note: you need to open and read files manually)
    ls, find         File::Find, glob, stat, lstat, opendir, readdir, closedir
    test, [, [[      stat, lstat, -X, File::stat
    cat, head, tail  open, readline, print, say, close, seek, tell
    ln               link, symlink
    curl, wget, ftp  LWP::UserAgent and friends
    ssh              Net::SSH2, Net::OpenSSH

    Note: The table above is far from being complete.
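    As a small illustration of the table: the shell pipeline cat app.log | grep ERROR | head -n 5 collapses into one loop of built-ins, and then runs unchanged on Windows too. A sketch (the log content is made up so the snippet is self-contained):

```perl
use strict;
use warnings;

# Sketch: cat app.log | grep ERROR | head -n 5, in pure Perl.
# An in-memory "file" stands in for app.log to keep this runnable.
my $log = "ok 1\nERROR one\nok 2\nERROR two\nERROR three\n";
open my $fh, '<', \$log or die "Can't open in-memory log: $!";

my $shown = 0;
while (my $line = <$fh>) {
    next unless $line =~ /ERROR/;   # grep ERROR
    print $line;                    # cat
    last if ++$shown >= 5;          # head -n 5
}
close $fh;
```

    No external programs, no shell, and therefore no quoting rules to get wrong.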


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
As of 2017-12-12 09:04 GMT