PerlMonks  

The Monastery Gates



New Questions
Wait for individual sub processes
4 direct replies
by crackerjack.tej
on Apr 25, 2015 at 03:01

    Dear monks,

    I am essentially writing a Perl script that divides a large input file for a text processing tool, so that I can process the files faster. I am working on a CentOS 6 based cluster, where each CPU has 16 cores. My idea is to split the input file into 16 parts, and run 16 instances of the text processing tool, and once all of them are done, I parse the output and merge it into a single file. In addition, the script will continue to process the next input file in a similar way. I have achieved that using fork(), wait() and exec() as follows (Omitting code that is not relevant):

    use strict;
    use warnings;
    use POSIX ":sys_wait_h";

    # Split input files into parts and store the filenames in array @parts
    ...

    my %children;
    foreach my $part (@parts) {
        my $pid = fork();
        die "Cannot fork for $part\n" unless defined $pid;
        if ($pid == 0) {
            exec("sh text_tool $part > $part.out") or die "Cannot exec $part\n";
        }
        print STDERR "Started processing $part with $pid at ".localtime."\n";
        $children{$pid} = $part;
    }
    while (%children) {
        my $pid = wait();
        die "$!\n" if $pid < 1;
        my $part = delete $children{$pid};
        print STDERR "Finished processing $part at ".localtime."\n";
    }

    While I got what I wanted, there is a small problem. Due to the nature of the text processing tool, some parts get completed much before others, in no specific order. The difference is in hours, which means that many cores of the CPU are idle for a long time, just waiting for few parts to finish.

    This is where I need help. I want to keep checking which part (or corresponding process) has exited successfully, so that I can start the processing of the same part of the next input file. I need your wisdom on how I can achieve this. I tried searching a lot on various forums, but did not understand correctly how this can be done.

    Thanks.

    ------UPDATE---------

    Using a hash, I can now find out which process is exiting when. But I fail to understand how to use this code in an if block, so that I can start the next process. Can someone help me with that? I have updated the code accordingly.

    ----------------UPDATE 2--------------

    I guess it's working now. Using Parallel::ForkManager and a hash of arrays that stores the pids for each input file, I am able to track the subprocesses of each file separately. By keeping a count of the subprocesses that have exited, I can call the output-parsing sub as soon as the count reaches 16 for an input file. I will come back if I run into any other problem.
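    The same keep-the-cores-busy idea can also be sketched with just core fork and wait: keep a queue of parts and start the next one the moment any child exits. A simplified sketch (dummy children stand in for the text tool, and the file and part names are made up):

    ```perl
    use strict;
    use warnings;

    # A queue of work items: each part of each input file, in order.
    # Part counts and names here are invented for the sketch.
    my @queue = map { my $f = $_; map { "$f.part$_" } 1 .. 4 } qw(input1 input2);

    my $max_workers = 4;    # one per core in the real script (16)
    my %children;           # pid => part name
    my @finished;

    sub spawn {
        my $part = shift;
        my $pid  = fork();
        die "Cannot fork for $part\n" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: the real script would exec the text tool here, e.g.
            #   exec("sh text_tool $part > $part.out");
            exit 0;
        }
        $children{$pid} = $part;
    }

    # Fill all worker slots, then refill a slot the moment a child exits.
    spawn( shift @queue ) while @queue && keys(%children) < $max_workers;
    while (%children) {
        my $pid = wait();
        last if $pid < 1;
        push @finished, delete $children{$pid};
        spawn( shift @queue ) if @queue;    # start the next part immediately
    }

    print scalar(@finished), " parts processed\n";
    ```

    Parallel::ForkManager wraps essentially this loop, with the per-file bookkeeping handled by its run_on_finish callback.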

    Thanks a lot for all the help :)

    P.S. Is there any flag that I have to set that this thread is answered/solved?

Refactoring Perl5 with XS++
2 direct replies
by rje
on Apr 25, 2015 at 01:06

    Last time I mused aloud about "refactoring" Perl, I referenced Chromatic's statement/challenge:

    "If I were to implement a language now, I'd write a very minimal core suitable for bootstrapping. ... Think of a handful of ops. Think very low level. (Think something a little higher than the universal Turing machine and the lambda calculus and maybe a little bit more VMmy than a good Forth implementation, and you have it.) If you've come up with something that can replace XS, stop. You're there. Do not continue. That's what you need." (Chromatic, January 2013)

    I know next to nothing about XS, so I started reading perldoc.

    It seems to me that XS is a low-level language that provides the structures that Perl uses, but without the syntactic sugar. Yes, it primarily maps types between C and Perl. But it does more than that.

    Now this low-level language may be very nearly self-extending. I'll have to read more about XS.

Out of Memory Error : V-Lookup on Large Sized TEXT File
7 direct replies
by TheFarsicle
on Apr 24, 2015 at 09:14
    Hello perlmonks,

    I am a newbie to Perl, working on a Perl script that performs an action similar to a V-Lookup.

    So,

    As input I have some large text files, around 200 MB each. These text files are to be searched for all the records present in another file, say Reference.txt (this file is normally not more than one MB).

    I have written a script to find all the lines in these large files that contain the text (string values) in the Reference.txt file. All the found records are then written to a new file, one per large file.

    The script works fine for normal sizes like 30-40 MB, but when a file goes beyond 100 MB or so, it throws an out-of-memory error.

    I have designed these operations as subroutine and calling them.

    The code goes something like this...

    open (FILE, $ReferenceFilePath) or die "Can't open file";
    chomp (@REFFILELIST = (<FILE>));
    open OUTFILE, ">$OUTPUTFILE" or die $!;
    foreach my $line (@REFFILELIST) {
        open (LARGEFILE, $LARGESIZEDFILE) or die "Can't open File";
        while (<LARGEFILE>) {
            my $Result = index($_, $line);
            if ($Result > 0) {
                open(my $FDH, ">>$OUTPUTFILE");
                print $FDH $_;
            }
        }
        close(LARGEFILE);
    }
    close(OUTFILE);
    close(FILE);

    Can you please guide me on where I am going wrong and what would be the best way to address this issue?
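    For what it's worth, the usual way to keep memory flat here is to invert the loops: read the small Reference.txt into memory once and stream each large file line by line, so only one line of the big file is ever held. A sketch (it writes tiny demo files so it can run standalone; real paths would replace them):

    ```perl
    use strict;
    use warnings;

    # Demo data so this sketch runs standalone; in practice these would
    # be the real Reference.txt and a 200 MB input file.
    open my $d, '>', 'Reference.txt' or die $!;
    print $d "foo\nbar\n";
    close $d;
    open $d, '>', 'large_input.txt' or die $!;
    print $d "foo at start\nno match here\ntail has bar\n";
    close $d;

    open my $rfh, '<', 'Reference.txt' or die "Can't open reference: $!";
    chomp( my @ref = <$rfh> );
    close $rfh;

    open my $lfh, '<', 'large_input.txt' or die "Can't open input: $!";
    open my $ofh, '>', 'matches.txt'     or die "Can't open output: $!";

    # One pass over the large file. Note index() >= 0: a match at
    # column 0 is still a match, which the original "> 0" test skips.
    my $matched = 0;
    while ( my $line = <$lfh> ) {
        for my $ref (@ref) {
            if ( index( $line, $ref ) >= 0 ) {
                print $ofh $line;
                ++$matched;
                last;    # one match per line is enough
            }
        }
    }
    close $ofh;
    close $lfh;
    print "$matched matching lines\n";
    ```

    Opening the output file once, outside the loops, also avoids the repeated open() of the original.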

    Thanks in advance.

    FR

DESTROY and AUTOLOAD in 5.20.1
4 direct replies
by szabgab
on Apr 24, 2015 at 05:36
    Given this script:
    use strict;
    use warnings;
    use 5.010;
    use Greeting;

    say 'Hi';
    {
        my $g = Greeting->new;
    }
    say 'Bye';
    and this module:
    package Greeting;
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    sub new {
        my ($class) = @_;
        return bless {}, $class;
    }

    sub AUTOLOAD {
        our $AUTOLOAD;
        say $AUTOLOAD;
    }

    sub DESTROY {
        say 'destroy';
    }

    1;
    I can see the word "destroy" printed, as I would expect. However, if I remove the DESTROY from the module, I don't see AUTOLOAD being called in place of the missing DESTROY. I have only checked this with 5.20.1; what am I missing here?

    Update

    Reported with perlbug as RT #124387
Getting an unknown error
5 direct replies
by andybshaker
on Apr 23, 2015 at 10:02

    Basically, different arrays have different pieces of information in them, and I have to go from one to another to another from that. In this case, I have to go through each element in @Genes and extract its corresponding element from a long file, which I read in as @lines. I keep getting a strange error that reads: syntax error at findscaffold.pl line 38, near "$N (". Does anyone know what this is? Here is the code.

    my @Genes = qw(A B C D)
    my @ptt = ("19384..003059 0 - - A","203581..39502 0 + - B)
    my @contig = ();
    my @Coordinates;
    my @Number;
    my $R;
    foreach my $G (@Genes){
        for my $x (0..$#ptt){
            if($ptt[$x] =~ /$G/){
                push(@Coordinates,"$ptt[$x]");
                print "$ptt[$x]\n";
            }
        }
    }
    foreach my $C (@Coordinates){
        push (@Number, split(" ", $C));
    }
    my %hash = ();
    my $file = "scaffold_contig.txt";
    open(IN, "<$file") or die "Cannot open file $file\n";
    my @lines = <IN>;
    foreach $1 (@lines){
        chomp($1);
        my %columns = split(">", $1);
    }
    close(IN);
    print "$lines[1];\n"
    foreach my $N (@Number){
        for $R (0..$#lines){
            if($lines[$R] =~ /$N/){
                print "lines[$R]\n"
            }
        }
    }

    Here is line 38: foreach my $N (@Number){

Retrieving content from couchdb using CouchDB::Client
3 direct replies
by shivam99aa
on Apr 23, 2015 at 09:36

    I am able to create new documents using CouchDB::Client, as well as verify that a doc is present. What I am not able to do is retrieve the contents of a doc, because I cannot work out the correct syntax for it. I am not a Perl genius, so taking a look at the source code did not help me either.

    use warnings;
    use CouchDB::Client;

    my $c = CouchDB::Client->new(uri => 'http://127.0.0.1:5984/');
    my $db = $c->newDB('test');
    my $doc = $db->newDoc('12345', undef, {'foo'=>'bar'})->create;
    if ($db->docExists('12345')) {
        print "hello\n";
    }
    #my $doc = CouchDB::Client::Doc->new($db);
    print $doc->retrieve('12345');

    I am able to create the document, but then I need to comment out that line on the next run, as it would otherwise give a storage error. After commenting it out, though, I have no way to retrieve the doc, since I no longer have a Doc object. That should not be a constraint: there should be a way to retrieve a doc through the db object by giving it an id.

Regex for files
5 direct replies
by bmcquill
on Apr 22, 2015 at 22:01
    I'm trying to get better at regex, and I'm starting with Perl. I want to go through a directory and find all the files whose names begin with "messages" and MAY have a "." and a digit after it, but not match names ending in, say, .txt or .pl. In other words, I want to find messages, messages., messages.1, etc., but NOT messages.txt or messages.pl. Any assistance is greatly appreciated.
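    To make the goal concrete, a pattern along these lines seems to express it: anchored at both ends, so nothing may follow the optional dot-and-digits (treat it as a sketch to be corrected):

    ```perl
    use strict;
    use warnings;

    # "messages", optionally followed by a dot and (zero or more) digits;
    # the anchors mean "messages.txt" and "messages.pl" cannot match.
    my $re = qr/^messages(?:\.\d*)?$/;

    for my $name (qw(messages messages. messages.1 messages.10 messages.txt messages.pl)) {
        printf "%-12s %s\n", $name, ( $name =~ $re ? 'match' : 'no match' );
    }
    ```

    Applied to a directory, that would be something like: opendir my $dh, $dir; my @hits = grep { /$re/ } readdir $dh;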
Best practice for sending results to a user via email
4 direct replies
by Anonymous Monk
on Apr 22, 2015 at 16:30
    Dear Monks,
    I hereby ask your wisdom on the following problem:
    I have set up a simple web server in PHP with a submission form (textarea). When the user submits the form, the contents are put into a file and a Perl script is run on that file. At the end, the script's output is written to a text file, and an image is also produced.
    My question has two parts:
    1) Because this script takes some time to execute, I think it is not good practice to just let it run during the web request, since there is a good chance it will hang and then never output anything. So I thought it best to take the input from the user and later send him an email saying "your work is completed", with a link to an HTML page with the results.
    Does this sound reasonable practice to you?
    2) Can you give me some hints on how such a thing is accomplished? What steps should I follow, and can you point me to some examples of sending emails that direct the user to an HTML page with the final results?
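    One piece of this that is easy to sketch with core Perl alone is giving each job a unique token, so the results page gets an unguessable URL to put in the email (the host name and paths below are placeholders):

    ```perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # A unique-ish job token from the time, the pid, and a random number.
    my $token = md5_hex( time() . $$ . rand() );

    # Where the worker would write the results page, and the link to email.
    my $results_page = "results/$token.html";
    my $link = "http://example.com/results/$token.html";    # placeholder host

    print "Results page: $results_page\n";
    print "Email link:   $link\n";
    ```

    The email itself can then be sent by the long-running job when it finishes, for example with the core Net::SMTP module or a CPAN module such as MIME::Lite; the key point is that the web request only queues the job and returns immediately.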
    Thank you in advance!
Reading Text File into Hash to Parse and Sort
8 direct replies
by Perl_Derek
on Apr 21, 2015 at 14:49
    Hello, I am just getting started with Perl and I have not used any programming languages in the past. I hope you can help me with this.

    I am not able to effectively use hashes to parse, sort or replace data from a text file. I have read through a couple of books (Learning Perl, Perl by Example) and browsed Perl websites, but I am still having difficulty grasping the concepts. For example, I am trying to take a text file and sort the results in any way I choose. I can't seem to find a real-world example where I read a file into a hash and parse/sort and/or replace data. Here is my code for parsing data using arrays which works:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open (FILE, '<', 'FB_LPWAP.txt') or die ("ERROR: Could not read file");

    print <FILE>; # Prints entire file

    while (<FILE>) {
        my @array = split /\t/, $_;
        print "$array[0]\t"; # Date
        print "$array[1]\t"; # Closing Price
        print "$array[2]\n"; # Weighted Average Price
    }

    close (FILE);


    I still have a lot to learn, but I am hoping someone could point me in the right direction, as I have not been able to use hashes on a text file with multiple columns of data without receiving a number of syntax/compilation errors. The text file I have has multiple columns, and I would like to parse certain columns using hashes. I would also like to sort the data, but I don't know how to reference my columns. Here is the sample data in my test file:

    TICKER| CO. NAME| PRICE| MARKET CAP| INDUSTRY
    ABC | ABC Co.| 15.5| 5000| Industrials
    AB | Alpha Beta| 12| 2500| Materials
    DOZ | ZZZZZ| 5.05| 2800| Telecom
    DX | DX Co.| 77.2| 12000| Industrials
    DXX | DXX Co.| 50.25| 9000| Utilities
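    For concreteness, one shape this could take is a hash of hashes keyed on ticker, which then sorts on any column. This is only a sketch, using inline sample rows in place of the real file:

    ```perl
    use strict;
    use warnings;

    # Inline sample rows standing in for the real pipe-delimited file.
    my @rows = (
        "ABC | ABC Co.| 15.5| 5000| Industrials",
        "AB | Alpha Beta| 12| 2500| Materials",
        "DX | DX Co.| 77.2| 12000| Industrials",
    );

    my %by_ticker;
    for my $row (@rows) {
        # Split on the pipe and trim surrounding whitespace from each field.
        my ( $ticker, $name, $price, $cap, $industry ) =
            map { s/^\s+|\s+$//gr } split /\|/, $row;
        $by_ticker{$ticker} = {
            name     => $name,
            price    => $price,
            cap      => $cap,
            industry => $industry,
        };
    }

    # Print tickers sorted by price, highest first.
    for my $t ( sort { $by_ticker{$b}{price} <=> $by_ticker{$a}{price} }
                keys %by_ticker ) {
        print "$t\t$by_ticker{$t}{price}\t$by_ticker{$t}{industry}\n";
    }
    ```

    With a real file, the @rows array would be replaced by a while (<$fh>) loop, skipping the header line.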



    Thank you.
Bizarre and provocatively irritating cpantester reports
2 direct replies
by syphilis
on Apr 21, 2015 at 10:20
    Hi,

    I'm looking at this fail report for a module I've written and I'm seeing (in both Ubuntu's Firefox browser and Microsoft's IE8 browser):
    GMPz.xs: In function âRmpz_cdiv_q_2expâ:
    GMPz.xs:489: error: âmp_bitcnt_tâ undeclared (first use in this function)
    The first thing that strikes me is that, in GMPz.xs, there is no such function as "âRmpz_cdiv_q_2expâ".
    The second thing that strikes me is that, in GMPz.xs, there is no declaration of "âmp_bitcnt_tâ".

    Sure, there's a function called "Rmpz_cdiv_q_2exp", and there's an "mp_bitcnt_t" data type declared - and, if the complaint was about the declaration of the "mp_bitcnt_t" data type then I would simply assume that Bingos had just set up yet another smoker that contained some antiquated build of gmp whose headers did not define the "mp_bitcnt_t" data type.
    And I'd probably then go ahead and fix that shortcoming in Math::GMPz.

    But ... no, the complaint is clearly and definitively about a non-existent "âmp_bitcnt_tâ" data type.

    I'm not the first person to have been subjected to such fuckbrained garbage, and I'm sure that lots of monks will have seen similar before.
    But why does this happen ?

    Cheers,
    Rob
YAML Alias
1 direct reply
by ianwinter
on Apr 21, 2015 at 09:43
    Hi, I'm trying to set up a YAML file using YAML::XS. For the most part it's fine if I use a standard YAML format, but I'd like to include defaults and aliases; whenever I load a "defaulted" block, though, the merged keys end up under a "<<" hash first.
    #!/usr/bin/env perl
    use Data::Dumper;
    use YAML::XS;

    my $config = LoadFile('config.yml')->{'production'};
    print Dumper($config);
    defaults: &defaults
      src:
        host: localhost
        database: db
        username: root
        password:
      nfs: /tmp
    development:
      <<: *defaults
    test:
      <<: *defaults
    production:
      <<: *defaults
      nfs: ian
    I'd also ideally like to be able to override the defaults. If my block looks like this:
    production:
      src:
        host: localhost
        database: db
        username: root
        password:
      nfs: /tmp
    All is well and fine. Any tips gratefully received, the alias stuff comes from a ruby background.
How do I go from procedural to object oriented programming?
11 direct replies
by Lady_Aleena
on Apr 20, 2015 at 16:43

    (Jenda corrected me on what my subroutines are. They are procedures, not functions. The title of this node has been updated, and wherever the word function, or any derivative of it, appeared, it has been changed to procedure.)

    Going from procedural programming to object oriented programming has been on my mind a lot lately. I have been told by a couple of people my code is very close to being OO, however when I gave OO a try the first time, I was told I was doing it all wrong. I would like to see how close I am, but I am having a hard time learning objects because the tutorials I have found start writing code right away. I have yet to find a tutorial which starts with the objective of the objects being written. For example I want...

    Criminal Minds is a 2005 television series which is still running.

    Iron Man is a 2008 film based on comics by Marvel Comics.

    The tutorials also do not show the data being munged up front like...

    They all start right in on the objects, leaving me completely in the dark about what the end goal for the objects is. From above you know my objective and have the data to reference while reading the procedures I wrote to get to my objective. The whole module is here if you would like to see the bigger picture.

    Now putting it all together to get my objective.

    I would like to know how close I am to having objects, what they would look like if I am close, and if there is anything which does not fit into OO. Is there anywhere I need to change my thinking (which will be hard since I have been doing things as above for a long while now)?

    If any of the OO tutorials were written with the objective, data, code, and wrap-up in that order; I might get them.

    Another reason I am doing this is to get my mind off of my pain and impending surgery. I am doing everything I can think of to get my mind off of them and stave off panic. Would you please help me?

    I hope I am not asking too much. Please sit back and enjoy a cookie.

    No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
    Lady Aleena
New Meditations
Want to make an Everything site of your own?
1 direct reply
by thomas895
on Apr 20, 2015 at 03:48

    Ever wanted to experiment with the engine PerlMonks is built on? I did, but it's rather difficult to install, so I thought I'd write this for anyone who wanted to give it a go themselves.

    My Perl is v5.16.2. YMMV for others.

    Requirements:

    • A MySQL/MariaDB server

    • GNU patch

    • C compiler

    • Some knowledge of how Apache works

    Estimated time: a quiet evening or so

    1. Download Apache 1.3.9 and mod_perl 1.14 from your nearest mirror, then unpack them. You may use other versions, but this guide won't apply to them.

    2. I wanted to install it all to my home directory. I ran mod_perl's Makefile.PL like so:

      perl Makefile.PL APACHE_SRC=../apache_1.3.9/src APACHE_PREFIX=$HOME/opt/apache1.3.9 DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 PREFIX=/home/thomas/perl5
      Adjust as needed.
    3. If you have a relatively new version of gcc and a Perl v5.14 or newer, you will need to make some changes to the source. Apply this patch file to the mod_perl directory, and this one to the apache directory. It does the following (you can skip these details if you want):

      • In v5.16, $< and friends are no longer cached. I just tried removing the problematic section that used these variables, and that seemed to work. You might not be able to run your server as root (which requires it to be able to change its own permissions), but I haven't checked.

      • For some reason, it was a trend to supply your own version of getline (maybe the libc one was broken, haven't looked it up) in those days. In any case, gcc complains about it, so I updated all of the code to use the Apache one. (it only affects the password utility, which is not really needed in our case, but it does cause make to fail)

      • In v5.14, you can't use the result of GvCV and friends as lvalues anymore, so I replaced the places where something was assigned to the result of that function with the appropriate _set macro, as the delta advises.

    4. Run make and make install, and go make some coffee. You can make test, too, but then also grab a sandwich.

    5. Try to start Apache as make install's instructions suggest, to make sure it works. You may need to choose a different port number; do so with the Listen and Port options in httpd.conf.

      • If you installed Apache locally, you will need to modify apachectl and/or your shell startup script: make sure that the PERL5LIB environment variable is set to where mod_perl's Perl libraries are installed.
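      For example, to move a locally installed Apache 1.3 onto an unprivileged port, the relevant httpd.conf lines would look something like this (8080 is just an example):

      ```
      Port 8080
      Listen 8080
      ```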

      Now for Everything else...

    6. Download this, unpack it, and follow QUICKINSTALL up to (but not including) the install_esite step.

      • When running Makefile.PL, if you want to install locally, don't forget to set PREFIX accordingly.

      • It is not necessary to let it append things to your httpd.conf, in a later step I'll show you why and what to do instead.

    7. If you have a modern mysql/mariadb, some of the SQL scripts won't work. Here is another patch to fix them.

      • It mostly has to do with the default values of the integer columns: by getting rid of the default value of a quoted zero, mysql accepts it.

      • There is also a timestamp column that has a size in the script, but mysql doesn't like that, so by getting rid of it, it works again.

    8. Now run install_esite, as QUICKINSTALL says.

    9. For some reason, index.pl only showed up as text, perhaps due to the other mod_perl settings I'd been playing with, or perhaps it was something else. I added this to httpd.conf, and then it worked:

      PerlModule Apache::Registry
      PerlModule Apache::DBI
      PerlModule CGI
      <Files *.pl>
          SetHandler perl-script
          PerlHandler Apache::Registry
          Options +ExecCGI
          PerlSendHeader On
          PerlSetupEnv On
      </Files>
    10. (Re)start Apache, visit /index.pl, and have lots of fun!

    If something doesn't work for you, post it below.

    Happy hacking!


    Edit: forgot the third patch

    -Thomas
    "Excuse me for butting in, but I'm interrupt-driven..."
    Did you know this software was released when I was only 3 years old? Still works, too -- I find that amazing.
Data-driven Programming: fun with Perl, JSON, YAML, XML...
6 direct replies
by eyepopslikeamosquito
on Apr 19, 2015 at 04:41

    The programmer at wit's end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming.

    -- from The Mythical Man Month by Fred Brooks

    Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

    -- Rob Pike

    As part of our build and test automation, I recently wrote a short Perl script for our team to automatically build and test specified projects before checkin.

    Lamentably, another team had already written a truly horrible Windows .BAT script to do just this. Since I find it intolerable to maintain code in a language lacking subroutines, local variables, and data structures, I naturally started by re-writing it in Perl.

    Focusing on data rather than code, it seemed natural to start by defining a table of properties describing what I wanted the script to do. Here is a cut-down version of the data structure I came up with:

    # Action functions (return zero on success).
    sub find_in_file {
        my $fname = shift;
        my $str   = shift;
        my $nfound = 0;
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
        while ( my $line = <$fh> ) {
            if ( $line =~ /$str/ ) {
                print $line;
                ++$nfound;
            }
        }
        close $fh;
        return $nfound;
    }

    # ...

    # ------------------------------------------------------------------------
    # Globals (mostly set by command line arguments)
    my $bldtype = 'rel';

    # ------------------------------------------------------------------------
    # The action table @action_tab below defines the commands/functions
    # to be run by this program and the order of running them.
    my @action_tab = (
        {
            id      => 'svninfo',
            desc    => 'svn working copy information',
            cmdline => 'svn info',
            workdir => '',
            logfile => 'minbld_svninfo.log',
            tee     => 1,
            prompt  => 0,
            run     => 1,
        },
        {
            id      => 'svnup',
            desc    => 'Run full svn update',
            cmdline => 'svn update',
            workdir => '',
            logfile => 'minbld_svnupdate.log',
            tee     => 1,
            prompt  => 0,
            run     => 1,
        },
        # ...
        {
            id      => "bld",
            desc    => "Build unit tests ${bldtype}",
            cmdline => qq{bldnt ${bldtype}dll UnitTests.sln},
            workdir => '',
            logfile => "minbld_${bldtype}bldunit.log",
            tee     => 0,
            prompt  => 0,
            run     => 1,
        },
        {
            id      => "findbld",
            desc    => 'Call find_strs_in_file',
            fn      => \&find_in_file,
            fnargs  => [ "minbld_${bldtype}bldunit.log", '[1-9][0-9]* errors' ],
            workdir => '',
            logfile => '',
            tee     => 1,
            prompt  => 0,
            run     => 1,
        },
        # ...
    );

    Generally, I enjoy using property tables like this in Perl. I find them easy to understand, maintain and extend. Plus, a la Pike above, focusing on the data first usually makes the coding a snap.

    Basically, the program runs a specified series of "actions" (either commands or functions) in the order specified by the action table. In the normal case, all actions in the table are run. Command line arguments can further be added to specify which parts of the table you want to run. For convenience, I added a -D (dry run) option to simply print the action table, with indexes listed, and a -i option to allow a specific range of action table indices to be run. A number of further command line options were added over time as we needed them.

    Initially, I started with just commands (returning zero on success, non-zero on failure). Later "action functions" were added (again returning zero on success and non-zero on failure).

    As the table grew over time, it became tedious and error-prone to copy and paste table entries. For example, if there are four different directories to be built, rather than having four entries in the action table that are identical except for the directory name, I wrote a function that took a list of directories and returned an action table. None of this was planned, the script just evolved naturally over time.

    Now it is time to take stock, hence this meditation.

    Coincidentally, around the same time as I wrote my little script, we inherited an elaborate testing framework that specified tests via XML files. To give you a feel for these, here is a short excerpt:

    <Test>
      <Node>Muss</Node>
      <Query>Execute some-command</Query>
      <Valid>True</Valid>
      <MinimumRows>1</MinimumRows>
      <TestColumn>
        <ColumnName>CommandResponse</ColumnName>
        <MatchesRegex row="0">THRESHOLD STARTED.*Taffy</MatchesRegex>
      </TestColumn>
      <TestColumn>
        <ColumnName>CommandExitCode</ColumnName>
        <Compare function="Equal" row="0">0</Compare>
      </TestColumn>
    </Test>

    Now, while I personally detest using XML for these sorts of files, I felt the intent was good, namely to clearly separate the code from the data, thus allowing non-programmers to add new tests.

    Seeing all that XML at first made me feel disgusted ... then uneasy because my action table was embedded in the script rather than more cleanly represented as data in a separate file.

    To allow my script to be used by other teams, and by non-programmers, I need to make it easier to specify different action tables without touching the code. So I seek your advice on how to proceed:

    • Encode the action table as an XML file.
    • Encode the action table as a YAML file.
    • Encode the action table as a JSON (JavaScript Object Notation) file.
    • Encode the action table as a "Perl Object Notation" file (and read/parse via string eval).
    • Turn the script and action table/s into Perl module/s.
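    For a sense of what the JSON option involves, the round trip is only a few lines with the core JSON::PP module. A sketch (entries holding code refs, such as fn, cannot be serialized directly; they would be stored as function names and looked up in a dispatch table after loading):

    ```perl
    use strict;
    use warnings;
    use JSON::PP;

    # A command-only slice of the action table; fn => \&find_in_file
    # style entries would need name-based dispatch instead.
    my @action_tab = (
        {
            id      => 'svninfo',
            desc    => 'svn working copy information',
            cmdline => 'svn info',
            logfile => 'minbld_svninfo.log',
            tee     => 1,
            prompt  => 0,
            run     => 1,
        },
    );

    my $json = JSON::PP->new->canonical->pretty;

    # Write the table out as JSON text ...
    my $text = $json->encode( \@action_tab );

    # ... and read it back into the same structure.
    my $loaded = $json->decode($text);
    print "first action: $loaded->[0]{id}\n";
    ```

    YAML would look much the same via YAML::XS's Dump/Load, with the added option of anchors and aliases for the repetition problem below.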

    Another concern is that when you have thousands of actions, or thousands of tests, a lot of repetition creeps into the data files. Now dealing with repetition (DRY) in a programming language is trivial -- just use a function or a variable, say -- but what is the best way of dealing with unwanted repetition in XML, JSON and YAML data files? Suggestions welcome.

