
buffering issue?

by dannyjmh (Initiate)
on Mar 31, 2013 at 13:09 UTC (#1026353=perlquestion)
dannyjmh has asked for the wisdom of the Perl Monks concerning the following question:

Hey there, monks. I think I'm having a buffering issue. I need to read and parse big text files (created by the same script in earlier lines of the code) and finally print the results to another file. At some point, after reading a file with 90855 lines, the script does not read a line of the next file completely. I counted the number of characters read until this happens (233467), and therefore tried to flush the buffer and sleep before reading the next line of the file. It doesn't work. Any suggestions, please? Thanks a lot. The relevant part of the code follows:

for my $o (0..1){
    if ($o==0){
        @files = reverse <*_SITES_3utr>;
    }else{
        @files = reverse <*_SITES_cds>;
    }
    undef(%pita_sites_nu);undef(%pita_tot_score);my($comp_p);undef(%allowed_wobbles);#undef(%site_nu);
    foreach $i(@files){
        my $buff=0;
        print "Analyzing $i\n";sleep(1);
        $program= $1 if $i=~ /(\w+)_SITES/;
        open(FIL, $i) or die "$!: $i\n";
        while(<FIL>){
            $buff += length($_);
            if ($buff >= 230000){$buff=0;sleep(1);select((select(FIL), $|=1)[0]);} #FLUSH THE BUFFER, NOT WORKING!!!
            undef($a);
            unless($.== 1){
                if ($o==0){
                    if (/^\d+\t(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S+)\t(\S+)\t(.*)/){
                        $mirna= $1; $target= $2; $start= $3; $end= $4; $site= $5; $comp_p= $6;$a= $7;
                        $j= "${mirna}_${target}_${start}_$end";
                        $site_nu{$j}= "$mirna\t$target\t$start\t$end\t$site\t$comp_p";#Store each site in a hash
                    }else{die "$buff characters, in line $.:$_\n"} #DIES HERE!!!
                }else{
                    if (/^\d+\t(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S+)\t(.*)/){
                        $mirna= $1; $target= $2; $start= $3; $end= $4; $site= $5;$a= $6;
                        $j= "${mirna}_${target}_${start}_$end";
                        $site_nu{$j}= "$mirna\t$target\t$start\t$end\t$site";#Store each site in a hash
                    }
                }

It dies at the "DIES HERE!!!" die, after reading 3413 characters of the second file. This happens because the regex fails to match, since only half of the line is in $_. Help please! Thanks again.

Re: buffering issue?
by moritz (Cardinal) on Mar 31, 2013 at 13:42 UTC
    At some point, after reading a file with 90855 lines, the script is not reading a line of the next file completely.

    I'm still trying to understand what you actually observe. Does $_ not end in a newline (without being the last line in the file)? Have you verified that the correct data is in the file? What character encoding does the input file use?

    if ($buff >= 230000){$buff=0;sleep(1);select((select(FIL), $|=1)[0]);} #FLUSH THE BUFFER, NOT WORKING!!!

    There's no reason to flush an input buffer.

      Hey moritz. Thanks for the reply. Yes, I have checked that all the data is in the input file, which is encoded in UTF-8. I have removed the input buffer flushing. Is there a way to flush the output buffer before starting to parse the files? Thanks in advance for any other suggestions you may have. What I see is this line:

      845667 homosapiens ENSG00000104904|ENST0000059094

      instead of:

      845667 homosapiens ENSG00000104904|ENST00000590943 92 98 7mer-m8 ? 0.017184 1
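On the question of flushing the output buffer: one way to do it is sketched below. It assumes the file is still open for writing when it is read back. The temporary file and the sample record are stand-ins for illustration, not the script's real data. The `flush` method comes from the core IO::Handle module.

```perl
use strict;
use warnings;
use IO::Handle;                  # provides flush() and autoflush() on filehandles
use File::Temp qw(tempfile);

# Hypothetical stand-in for one of the *_SITES_3utr files.
my ($out, $path) = tempfile(UNLINK => 1);

# One tab-separated record, modelled on the line shown in the thread.
my $line = join("\t", 845667, 'homosapiens', 'ENSG00000104904|ENST00000590943',
                92, 98, '7mer-m8', '?', '0.017184') . "\n";
print {$out} $line;

# Push whatever is still sitting in Perl's output buffer to the OS.
# Until this (or close) happens, a reader may see a truncated file.
$out->flush or die "flush failed: $!";

# Read the file back through a separate handle.
open(my $in, '<', $path) or die "$!: $path\n";
my $read_back = <$in>;
close($in);

print "complete line read back\n" if $read_back eq $line;
```

Calling `$out->autoflush(1)` right after opening the handle would flush after every print, at some cost in speed for large files.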

        If the file you are reading was created by the same program, make sure you closed the file before trying to read it.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
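The advice above can be sketched as follows, assuming the files are written and read by the same script. The file name, the record contents and the row count are made up for illustration. The point is only that `close` is called, and its return value checked, before the file is reopened for reading.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo_SITES_3utr";    # hypothetical file name

# Write phase: lexical handle, closed explicitly, with close() checked.
open(my $out, '>', $path) or die "$!: $path\n";
print {$out} "header\n";
print {$out} join("\t", $_, 'mirna', 'target', 92, 98, '7mer-m8', '?', '0.017184'), "\n"
    for 1 .. 1000;
close($out) or die "close failed for $path: $!";   # flushes the buffer to disk

# Read phase: every data line should now be complete and match the pattern.
open(my $in, '<', $path) or die "$!: $path\n";
my $matched = 0;
while (my $row = <$in>) {
    next if $. == 1;                  # skip the header, as the original loop does
    $matched++ if $row =~ /^\d+\t(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S+)\t(\S+)\t(.*)/;
}
close($in);

print "matched $matched data lines\n";
```

A bareword handle such as FIL stays open until it is closed or reopened, so a forgotten `close` on the writing side is easy to miss. A lexical handle is also closed automatically when it goes out of scope.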

Node Type: perlquestion [id://1026353]
Approved by Corion