
buffering issue?

by dannyjmh (Novice)
on Mar 31, 2013 at 13:09 UTC (node #1026353)
dannyjmh has asked for the wisdom of the Perl Monks concerning the following question:

Hey there, monks. I think I'm having a buffering issue. I need to read and parse big text files (created by my own script earlier in the code) and finally print results to another file. After reading a file with 90855 lines, the script does not read a line of the next file completely. I counted the number of characters read up to that point (233467), so I tried flushing the buffer and sleeping before reading the next line of the file. That doesn't work. Any suggestions, please? Thanks a lot. Here is the relevant part of the code:

for my $o (0..1){
    if ($o==0){
        @files = reverse <*_SITES_3utr>;
    }else{
        @files = reverse <*_SITES_cds>;
    }
    undef(%pita_sites_nu); undef(%pita_tot_score); my($comp_p); undef(%allowed_wobbles); #undef(%site_nu);
    foreach $i (@files){
        my $buff=0;
        print "Analyzing $i\n"; sleep(1);
        $program= $1 if $i=~ /(\w+)_SITES/;
        open(FIL, $i) or die "$!: $i\n";
        while(<FIL>){
            $buff += length($_);
            if ($buff >= 230000){$buff=0; sleep(1); select((select(FIL), $|=1)[0]);} #FLUSH THE BUFFER, NOT WORKING!!!
            undef($a);
            unless($.== 1){
                if ($o==0){
                    if (/^\d+\t(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S+)\t(\S+)\t(.*)/){
                        $mirna= $1; $target= $2; $start= $3; $end= $4; $site= $5; $comp_p= $6; $a= $7;
                        $j= "${mirna}_${target}_${start}_$end";
                        $site_nu{$j}= "$mirna\t$target\t$start\t$end\t$site\t$comp_p"; #Store each site in a hash
                    }else{die "$buff characters, in line $.:$_\n"} #DIES HERE!!!
                }else{
                    if (/^\d+\t(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S+)\t(.*)/){
                        $mirna= $1; $target= $2; $start= $3; $end= $4; $site= $5; $a= $6;
                        $j= "${mirna}_${target}_${start}_$end";
                        $site_nu{$j}= "$mirna\t$target\t$start\t$end\t$site"; #Store each site in a hash
                    }
                }
                # ... rest of the loop body not shown in the post
            }
        }
    }
}

It dies at the die marked "DIES HERE!!!", after reading 3413 characters of the second file. This happens because the regex does not match: only half of the line is in $_. Help, please! Thanks again.

Re: buffering issue?
by moritz (Cardinal) on Mar 31, 2013 at 13:42 UTC
    At some point, after reading a file with 90855 lines, the script is not reading a line of the next file completely.

    I'm still trying to understand what you actually observe. Does $_ not end in a newline, even though it is not the last line of the file? Have you verified that the correct data is in the file? What character encoding does the input file use?
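A minimal Perl sketch of the newline check asked about above; it is not code from the thread, and `lines_without_newline` is a hypothetical helper name. With the default `$/`, a line returned by `<$fh>` lacks a trailing newline only when the read hit end-of-file, so a hit on a line that is not the file's real last line suggests the data was not completely on disk when it was read.

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper: returns the line numbers ($.) of all
# lines in $path that do not end in "\n".
sub lines_without_newline {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @missing;
    while (my $line = <$fh>) {
        push @missing, $. unless $line =~ /\n\z/;
    }
    close $fh;
    return @missing;
}

# Usage: perl check_newlines.pl file1 file2 ...
for my $file (@ARGV) {
    my @bad = lines_without_newline($file);
    printf "%s: %d bytes on disk; lines without a newline: %s\n",
        $file, -s $file, (@bad ? "@bad" : 'none');
}
```

Printing the size from `-s` alongside the result makes it easy to compare what was read against what the file eventually contains.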

    if ($buff >= 230000){$buff=0;sleep(1);select((select(FIL), $|=1)[0]);} #FLUSH THE BUFFER, NOT WORKING!!!

    There's no reason to flush an input buffer.
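To illustrate: `$|`, and the `autoflush`/`flush` methods from IO::Handle, only affect handles that are written to, so setting them on the input handle FIL changes nothing. The following is a minimal sketch, not code from the thread; `write_then_peek` is a hypothetical helper. It flushes the writing handle so that a second handle opened on the same file sees every line printed so far.

```perl
use strict;
use warnings;
use IO::Handle;   # supplies the flush() method for lexical filehandles

# Hypothetical helper: write @lines to $path, flush, then read the file
# back through a separate handle while the writer is still open.
sub write_then_peek {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$_\n" for @lines;
    # Without this flush the printed lines may still sit in Perl's output
    # buffer, and the reader below could see only part of them (or none).
    $out->flush or die "Cannot flush $path: $!";
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @seen = <$in>;
    close $in;
    close $out or die "Cannot close $path: $!";
    return @seen;
}
```

Calling `$out->autoflush(1)` right after the `open` would have the same effect for every subsequent `print`, at some cost in speed for large files.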

      Hey moritz, thanks for the reply. Yes, I have checked that all the data is in the input file, which is encoded in UTF-8. I have removed the input-buffer flushing. Is there a way to flush the output buffer before I start parsing the files? Thanks in advance for any other suggestions you may have. What I see is this line:

      845667 homosapiens ENSG00000104904|ENST0000059094

      instead of:

      845667 homosapiens ENSG00000104904|ENST00000590943 92 98 7mer-m8 ? 0.017184 1

        If the file you are reading was created by the same program, make sure you closed the filehandle you wrote it with before trying to read it; until then, part of the data may still be sitting in Perl's output buffer.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
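A minimal sketch of that advice, not code from the thread. It assumes, hypothetically, that the sites file uses the tab-separated layout matched by the OP's `_SITES_cds` regex, with one header line; `write_sites` and `read_sites` are illustrative names. The writer checks the return value of `close`, which flushes Perl's output buffer, before the reader ever opens the file.

```perl
use strict;
use warnings;

# Hypothetical writer: one header line, then numbered tab-separated rows.
sub write_sites {
    my ($path, @rows) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "#\tmirna\ttarget\tstart\tend\tsite\tscore\n";
    my $n = 0;
    print {$out} join("\t", ++$n, @$_), "\n" for @rows;
    close $out or die "Cannot close $path: $!";   # flushes the buffer
}

# Hypothetical reader, parsing with the same regex as the OP's cds branch.
sub read_sites {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my %site_nu;
    while (my $line = <$in>) {
        next if $. == 1;   # skip the header line, as the original code does
        chomp $line;
        my ($mirna, $target, $start, $end, $site) =
            $line =~ /^\d+\t(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S+)\t(.*)/
            or die "Unparsable line $. in $path: $line\n";
        $site_nu{"${mirna}_${target}_${start}_$end"} =
            join "\t", $mirna, $target, $start, $end, $site;
    }
    close $in;
    return \%site_nu;
}
```

Using lexical filehandles with three-argument `open` also keeps the writing and reading parts of a long script from sharing a global bareword handle such as FIL.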

Approved by Corion