PerlMonks  

Faster and more efficient way to read a file vertically

by Anonymous Monk
on Nov 03, 2017 at 15:10 UTC ( #1202693=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I have a file with millions of lines that look like this (DNA sequences):
ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA

My question is: how can I read it vertically, e.g. extract the 10th column? All lines are the same length, but they are not tab- or space-separated, so I can't use the cut command with a delimiter. My approach would be to split each line and keep only the 10th letter, but this takes an enormous amount of time, and I was hoping there might be an easier/faster way.
Any ideas?

Replies are listed 'Best First'.
Re: Faster and more efficient way to read a file vertically
by BrowserUk (Pope) on Nov 03, 2017 at 20:12 UTC

    If you make an array of substr references to the characters in a buffer, and then overlay each line into that buffer, the cost of performing the splitting/indexing of the strings is done once:

    #! perl -slw
    use strict;

    my $c = $ARGV[ 0 ] // 25;

    my $buf = chr(0) x 62;
    my @cRefs = map \substr( $buf, $_, 1 ), 0 .. length( $buf )-1;

    until( eof( DATA ) ) {
        substr( $buf, 0 ) = <DATA>;
        print ${ $cRefs[ $c ] };
    }
    __DATA__
    ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz

    A few runs:

    C:\test>1202693 0
    A
    A
    A
    A

    C:\test>1202693 25
    Z
    Z
    Z
    Z

    C:\test>1202693 32
    6
    6
    6
    6

    C:\test>1202693 61
    z
    z
    z
    z

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
Re: Faster and more efficient way to read a file vertically
by choroba (Bishop) on Nov 03, 2017 at 15:23 UTC
    My cut (GNU 8.25) also supports the -c and -b options to only print the given character or byte range, respectively.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Great, I also saw it now!
      So basically I can say  cut -c 10 and get the 10th character. Thank you very much!
Re: Faster and more efficient way to read a file vertically
by Laurent_R (Canon) on Nov 03, 2017 at 18:27 UTC
    This is a perl one-liner doing just what you want:
    $ echo 'ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    > ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA' | perl -nE 'say substr($_, 10, 1);'
    C
    C
    C
    C
    C
    C
    C
    C
    C
    C
    C
    C
    C
    Check, though, that 10 is the right offset (the second parameter) for substr; you may have to change it depending on exactly which character you want. Offsets are zero-based, so 10 selects the 11th character and 9 selects the 10th.
      another one-liner:

      $ perl -F'' -anE 'say $F[9]'
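      The one-liners above disagree on whether "10" means offset 10 or the 10th character, so here is a minimal sketch of the substr approach as a small script that takes a 1-based column number and converts it explicitly. The function name column_chars and the in-memory sample data are made up for illustration, and skipping short lines is an assumption (the thread says all lines are the same length).

```perl
use strict;
use warnings;

# Return the character in 1-based column $col of every line read from $fh.
# Lines shorter than $col are skipped rather than treated as errors
# (an assumption; the thread says all lines have equal length).
sub column_chars {
    my ( $fh, $col ) = @_;
    my @chars;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if length($line) < $col;
        push @chars, substr( $line, $col - 1, 1 );    # substr offsets are 0-based
    }
    return @chars;
}

# Demo on an in-memory file so the sketch is self-contained.
my $data = "ACATCACCTC\nACATCACCTA\nACATCACCTG\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print join( '', column_chars( $fh, 10 ) ), "\n";    # prints "CAG"
close $fh;
```

      Because each line is handled as it is read, memory use stays flat no matter how many lines the file has.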

Re: Faster and more efficient way to read a file vertically -- updated
by Discipulus (Monsignor) on Nov 03, 2017 at 16:09 UTC
    Hello,

    Millions of lines will probably still fit in memory. Note that $#{$aoa[0]} assumes all lines are the same length, as you said.

    use strict;
    use warnings;

    my @aoa;
    while (<DATA>) {
        chomp;
        push @aoa,[split '',$_];
    }

    foreach my $col(0..$#{$aoa[0]}){
        print "Column $col: ",
              (join ' ',map { $aoa[$_][$col] } 0..$#aoa),
              "\n";
    }

    __DATA__
    ACATCACCTC
    ACATCACCTC
    ACATCACCTC
    ACATCACCTC

    # out
    Column 0: A A A A
    Column 1: C C C C
    Column 2: A A A A
    Column 3: T T T T
    Column 4: C C C C
    Column 5: A A A A
    Column 6: C C C C
    Column 7: C C C C
    Column 8: T T T T
    Column 9: C C C C

    L*

    UPDATE: if you really care about memory, you can try the following (*untested*) approach:

    # untested sketch!! assumes $fh is already open for reading
    # analyze the first line
    my $line = <$fh>;
    chomp $line;
    # compute the last index of the line (beware of possible off-by-one errors!!)
    my $last = length($line) - 1;
    # rewind the filehandle
    seek $fh, 0, 0;

    # return the character at 0-based index $col of $line
    sub get_column {
        my $col  = shift;
        my $line = shift;
        if    ( $col == 0 )     { $line =~ /^(.)/ }
        elsif ( $col == $last ) { $line =~ /(.)$/ }
        else                    { $line =~ /^.{$col}(.)/ }    # skip $col chars, capture the next
        return $1;
    }

    while (<$fh>) {
        chomp;
        print get_column( 3, $_ ), "\n";
    }
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      million of lines still probably fit in memory.
      Maybe. Or maybe not. But why take the chance? Especially with an AoA which has some extra cost. It is so easy to do everything in the first loop, when reading each line. And BTW, it is also probably faster, because using an array of arrays implies copying the data once more.
        Yes Laurent_R, you are absolutely right, and I probably gave a dumb answer. I didn't even read the other replies carefully before posting: my only excuse is that I was filling the bathtub.. ;=)

        If the data must be accessed many times, it is probably worth putting it into an SQLite db, one char per column, and accessing it via SQL queries. No big memory overhead and super speed.



        L*

        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Faster and more efficient way to read a file vertically
by thanos1983 (Priest) on Nov 03, 2017 at 19:03 UTC

    Hello Anonymous Monk,

    A similar question was asked at the Monastery before: How do I get the Nth Character of a String?

    Here is some sample code from that thread:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    use feature 'say';
    # use Benchmark qw(:all) ; # WindowsOS
    use Benchmark::Forking qw( timethese cmpthese ); # UnixOS

    sub getn_unpack {
        return unpack "x" . ($_[1]-1) . "a", $_[0];
    }

    sub getn_substr {
        return substr $_[0], $_[1]-1, 1;
    }

    sub getn_split {
        return +(split //, $_[0])[$_[1]-1];
    }

    my $strNum = "12345678910";
    my $string = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA";

    # say getn_unpack($string, 10);
    # say getn_substr($string, 10);
    # say getn_split($string, 10);

    # NOTE: as posted, each value below is the *return value* of the call
    # (the string 'C'), not a code ref, so timethese does not actually time
    # these functions. Wrap each call in sub { ... } to time them.
    my $results = timethese(1000000000, {
        'unpack' => getn_unpack($string, 10),
        'substr' => getn_substr($string, 10),
        'split'  => getn_split($string, 10),
    }, 'none');

    cmpthese( $results );

    __END__

    $ perl test.pl
                  Rate unpack substr  split
    unpack 171232877/s     --   -23%   -31%
    substr 223713647/s    31%     --   -10%
    split  248138958/s    45%    11%     --

    It looks like the more efficient choice would be to use unpack. Something like the following could do what you need: read one line at a time, extract the character you want, and push it onto an array. Sample code below:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    sub getn_unpack {
        return unpack "x" . ($_[1]-1) . "a", $_[0];
    }

    my $file = 'data.txt';
    my @array;
    if (open(my $fh, '<', $file)) {
        while (<$fh>) {
            chomp;
            push @array, getn_unpack($_, 10);
        }
    } else {
        warn "Could not open file '$file' $!\n";
    }

    print Dumper \@array;

    __END__

    $ cat data.txt
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA

    $ perl test.pl
    $VAR1 = [
              'C',
              'A'
            ];

    Update: Thanks to fellow Monk karlgoethebier for pointing out my mistake. I would suggest an alternative solution to your problem: use split instead of unpack. See the sample code below:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    sub getn_split {
        return +(split //, $_[0])[$_[1]-1];
    }

    my $file = 'data.txt';
    my @array;
    if (open(my $fh, '<', $file)) {
        while (<$fh>) {
            chomp;
            push @array, getn_split($_, 10);
        }
    } else {
        warn "Could not open file '$file' $!\n";
    }

    print Dumper \@array;

    __END__

    $ cat data.txt
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTACCACAACGAGGACTACACCATCGTGGAACA

    $ perl test.pl
    $VAR1 = [
              'C',
              'A'
            ];

    Hope this helps, BR

    Seeking for Perl wisdom...on the process of learning...not there...yet!
      "...It looks like the more efficient choice would be to use unpack..."

      I'm not so sure. As you wrote:

      $ perl test.pl
                    Rate unpack substr  split
      unpack 171232877/s     --   -23%   -31%
      substr 223713647/s    31%     --   -10%
      split  248138958/s    45%    11%     --

      Ergo:

      karls-mac-mini:monks karl$ perl -e 'printf ("%.1f\n", 248138958/171232877);'
      1.4

      As i wrote at Re^6: Question on Regex:

      "...use cmpthese, the results are sorted from slow to fast..."

      Sorry in advance if I did something wrong or missed something.

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

        Hello karlgoethebier,

        You are absolutely right. I also read the Benchmark/Optional-Exports section, where it is clearly stated:

        cmpthese ( COUNT, CODEHASHREF, [ STYLE ] )
            Optionally calls timethese(), then outputs comparison chart. This:

                cmpthese( -1, { a => "++\$i", b => "\$i *= 2" } ) ;

            outputs a chart like:

                       Rate    b    a
                b 2831802/s   -- -61%
                a 7208959/s 155%   --

        This chart is sorted from slowest to fastest, and shows the percent speed difference between each pair of tests. cmpthese can also be passed the data structure that timethese() returns:

        Thanks for correcting me, I will also update my answer. Although, to be honest, I am kind of impressed by how much slower unpack is compared to substr and split.
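        For comparison, here is a minimal sketch of how the three extractors could be timed with code refs, so that the function calls themselves are what gets measured. It uses the core Benchmark module rather than Benchmark::Forking, and the count of -1 (at least one CPU second per entry) is an arbitrary choice; no results are shown because they depend on the machine.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# 1-based "Nth character" extractors, as in the post above.
sub getn_unpack { return unpack "x" . ( $_[1] - 1 ) . "a", $_[0] }
sub getn_substr { return substr $_[0], $_[1] - 1, 1 }
sub getn_split  { return ( split //, $_[0] )[ $_[1] - 1 ] }

my $string = "ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA";

# Each value is a code ref, so the function call itself is what gets timed.
# A negative count means "run each entry for at least that many CPU seconds".
cmpthese( -1, {
    unpack => sub { getn_unpack( $string, 10 ) },
    substr => sub { getn_substr( $string, 10 ) },
    split  => sub { getn_split( $string, 10 ) },
} );
```

        The chart is sorted from slowest to fastest, so the last row is the quickest entry.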

        Thanks again for your time and effort, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Faster and more efficient way to read a file vertically
by karlgoethebier (Prior) on Nov 04, 2017 at 11:14 UTC

    Stolen, cannibalized and slightly adapted from this older thread: Threads From Hell #2: How To Search A Very Huge File [SOLVED]:

    #!/usr/bin/env perl

    # http://www.perlmonks.org/?node_id=1202693
    # $Id: loop.pl,v 1.2 2017/11/04 11:02:41 karl Exp karl $

    use strict;
    use warnings;
    use MCE::Loop;
    use Time::HiRes qw( time );
    use feature qw(say);

    my $file = q(data.txt);

    MCE::Loop::init( { max_workers => 4, use_slurpio => 1 } );

    my $start = time;

    my @result = mce_loop_f {
        my $slurp_ref = $_[1];
        my @column;
        open my $fh, '<', $slurp_ref;
        binmode $fh, ':raw';
        while (<$fh>) {
            push @column, substr( $_, 10, 1 );
        }
        close $fh;
        MCE->gather(@column);

        # sleep 2;
    } $file;

    say join( '', @result );

    printf "Took %.3f seconds\n", time - $start;

    __END__

    Thanks to marioroy.

    See also MCE.

    Update: To avoid the call to binmode please see Encoding horridness revisited: What's going on here? [SOLVED].

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Faster and more efficient way to read a file vertically
by vr (Pilgrim) on Nov 03, 2017 at 17:48 UTC

    If the lines really are the "same length", then here is a straightforward, if perhaps not very perlish, approach (the idea originated before Discipulus's answer :). I wonder how inefficient this is compared to slurping or reading in large blocks, i.e. whether read and seek 'cooperate' on the input buffer (I don't know enough about the underlying C calls).

    use strict;
    use warnings;
    use autodie;

    my $POS = 10;

    open my $fh, '<', 'dna.txt';
    my $L = length( <$fh> ) - 1;
    seek $fh, $POS - 1, 0;

    my ( $s, $i ) = ( '', 0 );
    seek $fh, $L, 1 while read $fh, $s, 1, $i++;

    print "$s\n";
      FWIW, seek was my first thought, too. (Also that I'd prototype in Perl, then write the same thing in C. I might've found my weekend project... :) I can't imagine that allocating memory is going to help (I like when my imagination is challenged, though). I think at least if we can assume the file is in filesystem cache the read will be coming from RAM already anyway.
        I think this is parallelizable, too. If you have 24 cores, you can seek to $L/24, do your thing, combine results.
Re: Faster and more efficient way to read a file vertically
by johngg (Abbot) on Nov 05, 2017 at 15:37 UTC

    I put together a benchmark for most of the suggested solutions (or adaptations of them to get consistent results) and ran tests against an inline dataset of 50 lines with Test::More then with a 50,000 line file produced by this one-liner.

    perl -E '
        my @alpha = ( qw{ A C G T } ) x 5;
        push @alpha, qw{ . . };
        say join q{}, map { $alpha[ rand @alpha ] } 1 .. 50 for 1 .. 50000;' > spw1202693.txt

    Here's the script.

    And the results.

    ok 1 - ANDmask
    ok 2 - brutish
    ok 3 - pushAoA
    ok 4 - regex
    ok 5 - rsubstr
    ok 6 - seek
    ok 7 - split
    ok 8 - substr
    ok 9 - unpack
    ok 10 - unpackM
               Rate pushAoA brutish split  seek regex unpack substr rsubstr unpackM ANDmask
    pushAoA  1.11/s      --    -35%  -61%  -62%  -91%   -97%   -98%    -98%    -98%    -99%
    brutish  1.71/s     55%      --  -39%  -41%  -86%   -95%   -96%    -96%    -97%    -98%
    split    2.82/s    155%     65%    --   -3%  -77%   -92%   -94%    -94%    -95%    -97%
    seek     2.91/s    163%     70%    3%    --  -76%   -92%   -94%    -94%    -95%    -97%
    regex    12.3/s   1010%    617%  336%  322%    --   -65%   -74%    -75%    -79%    -88%
    unpack   35.0/s   3060%   1943% 1141% 1102%  185%     --   -25%    -27%    -40%    -67%
    substr   46.9/s   4137%   2638% 1564% 1512%  282%    34%     --     -3%    -20%    -55%
    rsubstr  48.2/s   4254%   2714% 1610% 1556%  292%    38%     3%      --    -18%    -54%
    unpackM  58.7/s   5194%   3321% 1979% 1914%  377%    68%    25%     22%      --    -44%
    ANDmask   105/s   9407%   6045% 3634% 3517%  757%   201%   124%    118%     80%      --
    1..10

    The two substr solutions are neck and neck in the lead, unpack a distant third and everything else well behind. However, I have cocked up benchmarks before so take this with a pinch of salt!

    Update: Corrected attribution of the "unpack" method and incorporated the two new methods and benchmark results from this post. Working with multi-line buffers using unpack or a mask to AND with the buffer seems to be the fastest approach.
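    The benchmark script is not reproduced in this copy of the thread, so the following is only a guess at what a mask-based ("ANDmask") approach could look like, not the code that was actually timed. It assumes fixed-length records, single-byte characters and no NUL bytes in the data; the name column_by_mask and the sample buffer are made up for illustration.

```perl
use strict;
use warnings;

# Extract 1-based column $col from a buffer of fixed-length records.
# $rec_len is the record length INCLUDING the line terminator.
sub column_by_mask {
    my ( $buf, $col, $rec_len ) = @_;
    my $records = int( length($buf) / $rec_len );

    # One record's mask: 0xFF at the wanted offset, NUL everywhere else.
    my $one  = ( "\0" x ( $col - 1 ) ) . "\xFF" . ( "\0" x ( $rec_len - $col ) );
    my $mask = $one x $records;

    # String bitwise AND zeroes every byte except the wanted column ...
    my $kept = $buf & $mask;

    # ... and deleting the NULs leaves just that column.
    $kept =~ tr/\0//d;
    return $kept;
}

my $buf = "ACGT\nTGCA\nGGCC\n";
print column_by_mask( $buf, 2, 5 ), "\n";    # prints "CGG"
```

    The appeal of this shape is that the per-line work happens inside two built-in string operations instead of a Perl-level loop.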

    Cheers,

    JohnGG

      Interesting. I had a similar partial synthetic benchmark yesterday and thought about publishing it, mainly to advise against my "seek" solution as too slow, then decided not to :), because maybe it's not worth readers' effort.

      Nevertheless, I get somewhat different results for a 1-million-line file on fast NVMe SSD storage. Below is the case for returning a hash with char counts, but it's similar for returning a string.

      $ perl vert2.pl
      ok 1 - same results
      ok 2 - same results
      ok 3 - same results
                  (warning: too few iterations for a reliable count)
                  (warning: too few iterations for a reliable count)
                  (warning: too few iterations for a reliable count)
                  (warning: too few iterations for a reliable count)
                Rate   seek    buk substr  slurp
      seek   0.920/s     --   -61%   -84%   -88%
      buk     2.36/s   157%     --   -58%   -69%
      substr  5.66/s   515%   140%     --   -26%
      slurp   7.69/s   736%   226%    36%     --
      1..3

        The following provides a parallel version of the slurp routine. I'm not sure why or where to look, but running MCE via cmpthese reports inaccurately, with MCE being 300x faster, which is wrong. So I needed to benchmark another way.

        Regarding MCE, workers receive the next chunk and tally using a local hash. Then, update the shared hash.

        use strict;
        use warnings;

        use MCE;
        use MCE::Shared;
        use String::Random 'random_regex';
        use Time::HiRes 'time';

        my $fn  = 'dna.txt';
        my $POS = 10;
        my $shrcount = MCE::Shared->hash();
        my $mce;

        unless ( -e $fn ) {
            open my $fh, '>', $fn;
            print $fh random_regex( '[ACTG]{42}' ), "\n" for 1 .. 1e6;
        }

        sub slurp {
            open my $fh, '<', $fn;
            my $s = do { local $/ = undef; <$fh> };
            my $count;
            $count-> { substr $s, $POS - 1 + 43 * $_, 1 }++
                for 0 .. length( $s ) / 43 - 1;
            return $count
        }

        sub mce {
            unless ( defined $mce ) {
                $mce = MCE->new(
                    max_workers => 4,
                    chunk_size  => '300k',
                    use_slurpio => 1,
                    user_func => sub {
                        my ( $mce, $slurp_ref, $chunk_id ) = @_;
                        my ( $count, @todo );
                        $count-> { substr ${ $slurp_ref }, $POS - 1 + 43 * $_, 1 }++
                            for 0 .. length( ${ $slurp_ref } ) / 43 - 1;

                        # Each key involves one IPC trip to the shared-manager.
                        #
                        # $shrcount->incrby( $_, $count->{$_} )
                        #     for ( keys %{ $count } );

                        # The following is faster for smaller chunk size.
                        # Basically, send multiple commands at once.
                        #
                        push @todo, [ "incrby", $_, $count->{$_} ]
                            for ( keys %{ $count } );

                        $shrcount->pipeline( @todo );
                    }
                )->spawn();
            }

            $shrcount->clear();
            $mce->process($fn);

            return $shrcount->export();
        }

        for (qw/ slurp mce /) {
            no strict 'refs';
            my $start = time();
            my $func = "main::$_";
            $func->() for 1 .. 3;
            printf "%5s: %0.03f secs.\n", $_, time() - $start;
        }

        __END__

        slurp: 0.487 secs.
          mce: 0.149 secs.
      > unpack  => sub { # Suggested but not implemented by pryrt

      Actually unpack was suggested (and not implemented) by me first. ;)

      FWIW: My idea was to unpack multiple lines simultaneously instead of going line by line.

      If you are interested and all lines really have the same length (the OP never clarified)

      • read a chunk of complete lines bigger than 4 or 8 KB (depending on the block size of the OS, to optimize read operations)
      • run a repeated unpack pattern
      • get a list of 1 result for each chunk line

      Please see if substr on single lines is still faster then.

      $line_length += $newline_length;    # OS dependent
      $line_count   = int(8 * 1024 / $line_length) + 1;
      $chunk_size   = $line_count * $line_length;

      And yes I'm still reluctant to implement it, smells too much like an XY Problem :)

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      update

      In hindsight... probably having a slightly smaller chunk is more efficient :

      $line_count   = int(8 * 1024 / $line_length)
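      The chunked idea above can be sketched roughly as follows. This is not code from the thread: the function name column_by_unpack is made up, and it assumes every line (including the last) has exactly the same length and ends with the same line terminator, so that each chunk read holds only whole lines.

```perl
use strict;
use warnings;

# Read $fh in chunks of whole lines and pull 1-based column $col out of
# every line in the chunk with a single unpack call.
sub column_by_unpack {
    my ( $fh, $col, $line_length, $newline_length ) = @_;
    my $rec_len    = $line_length + $newline_length;
    my $line_count = int( 8 * 1024 / $rec_len ) || 1;    # about 8 KB per read
    my $chunk_size = $line_count * $rec_len;

    # Per record: skip to the column, take one byte, skip the rest.
    # The trailing * repeats the group until the chunk is used up.
    my $template = sprintf '(x%d a x%d)*', $col - 1, $rec_len - $col;

    my $result = '';
    while ( read( $fh, my $chunk, $chunk_size ) ) {
        $result .= join '', unpack $template, $chunk;
    }
    return $result;
}

# Demo on an in-memory file: 4-character lines, 1-byte newline.
my $data = "ACGT\nTGCA\nGGCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print column_by_unpack( $fh, 3, 4, 1 ), "\n";    # prints "GCC"
close $fh;
```

      A final line without its terminator would make the template skip past the end of the chunk and die, so real input should be checked for that first.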

        Actually unpack was suggested (and not implemented) by me first. ;)

        Ah! Sorry, I missed that :-/

        Cheers,

        JohnGG

Re: Faster and more efficient way to read a file vertically
by LanX (Bishop) on Nov 03, 2017 at 15:15 UTC
    > but this takes enormous amount of time

    what does this mean?

    Maybe it's just file access on the HD?

    Please show some reference code.

    > Any ideas?

    You can slurp the whole file and run a regex ... something like @col10 = /^.{9}(.)/g on it (with the appropriate /s or /m modifier of course)

    corrected my @col = ( $file =~ /^.{9}(.)/mg );

    Using unpack might be even faster, but I'm no expert here.
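    A minimal sketch of the slurp-and-regex suggestion, with the column passed in as a 1-based number. The function name column_by_regex and the in-memory sample are made up for illustration, and slurping assumes the whole file fits in memory.

```perl
use strict;
use warnings;

# Slurp everything from $fh and return the character in 1-based column $col
# of every line that is at least $col characters long.
sub column_by_regex {
    my ( $fh, $col ) = @_;
    my $skip = $col - 1;
    my $file = do { local $/; <$fh> };    # undef $/ reads the whole file
    return $file =~ /^.{$skip}(.)/mg;     # /m: ^ matches at each line start
}

my $data = "ACATCACCTC\nACATCACCTA\nACATCACCTG\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @col10 = column_by_regex( $fh, 10 );
close $fh;
print "@col10\n";    # prints "C A G"
```

    Without /s the dot never matches a newline, so a line shorter than the requested column simply produces no match instead of borrowing characters from the next line.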

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      So basically I have this (brute-force attack):
      while(<>)
      {
          if($_=~/^(.*?)\t(.*)/)
          {
              $read_seq=$1;
              $read_id=$2;
              @split_read=split(//, $read_seq);
              $respective_read_letter=$split_read[$i];
              if($respective_read_letter eq 'A') {$count_A++;}
              elsif($respective_read_letter eq 'T') {$count_T++;}
              elsif($respective_read_letter eq 'C') {$count_C++;}
              elsif($respective_read_letter eq 'G') {$count_G++;}
              elsif($respective_read_letter eq '.') {$count_dot++;}
              else {print "ERROR in read: $read\t$respective_read_letter\n";}
          }
      }
      $total=$count_A+$count_T+$count_C+$count_G+$count_dot;
      $fraction_A = sprintf("%.2f", 100*($count_A/$total));
      $fraction_T = sprintf("%.2f", 100*($count_T/$total));
      $fraction_C = sprintf("%.2f", 100*($count_C/$total));
      $fraction_G = sprintf("%.2f", 100*($count_G/$total));
      $fraction_dot = sprintf("%.2f", 100*($count_dot/$total));
      print $actual_pos,"\t",$expected_letter,"\t",$fraction_A,"\t",$fraction_T,"\t",$fraction_G,"\t",$fraction_C,"\t",$fraction_dot,"\n";

        If you're really only going to be doing one column, but want it to be chosen by the variable $i, I'd suggest substr: $respective_read_letter = substr $read_seq, $i, 1;. If finding an optimum solution is important to you (i.e., if you'll use this script many times for the foreseeable future, rather than just once or twice where "fast enough" is fast enough), then I'd recommend Benchmarking the substr vs unpack vs LanX's regex (and any others that are suggested). But whatever you do, make sure to use ++LanX's hash %count.

        use warnings;
        use strict;
        use Benchmark qw/cmpthese/;
        use Test::More tests => 1;

        my @dataset = ();
        push @dataset, join('', map { (qw/A C G T/)[rand 4] } 1 .. 30 ) for 1 .. 1000;

        my $i = $ARGV[0] // 10;

        sub test {
            my $fnref = shift;
            my $count;
            for my $read_seq( @dataset ) {
                my $letter = $fnref->($read_seq, $i);
                $count->{$letter}++;
            }
            return $count;
        }

        sub rfn {
            test( sub {
                my $skip = $_[1];
                $_[0] =~ /.{$skip}(.)/;
                return $1;
            });
        };

        sub sfn {
            test( sub {
                substr $_[0], $_[1], 1;
            });
        };

        sub ufn {
            test( sub {
                ...     # I'm no unpack expert
            });
        };

        cmpthese(0, {
            regex => \&rfn,
            substr => \&sfn,
            #unpack => \&ufn,
        });

        is_deeply rfn(), sfn(), 'same results';
        $i is variable in your example. Reading vertically doesn't make sense then.

        I'd suggest $count{$letter}++ with a hash %count to speed things up.
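        A minimal sketch of the hash-based tally, replacing the chain of if/elsif counters with a single %count hash and reporting percentages as the original script does. The function name count_column and the four sample reads are made up for illustration; $i is a 0-based index, as in the code above.

```perl
use strict;
use warnings;

# Tally the characters found at 0-based index $i of the given lines.
sub count_column {
    my ( $i, @lines ) = @_;
    my %count;
    $count{ substr $_, $i, 1 }++ for @lines;
    return %count;
}

my %count = count_column( 2, qw( ACGT ACCT AC.T ACGA ) );

my $total = 0;
$total += $_ for values %count;

# Report each character's share as a percentage, like the original script.
printf "%s\t%.2f\n", $_, 100 * $count{$_} / $total for sort keys %count;
```

        Unexpected characters show up as extra keys, so the error branch of the original can become a check on keys %count after the loop.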

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

Re: Faster and more efficient way to read a file vertically
by Anonymous Monk on Nov 04, 2017 at 10:03 UTC

    If speed is of high priority, one shouldn't overlook the mmap() approach using File::Map. It has its limitations (no piped data) but it allows regular files to be efficiently handled as one big string.

Re: Faster and more efficient way to read a file vertically
by wazat (Scribe) on Nov 04, 2017 at 18:56 UTC

    OOPS, I see that vr already identified this approach.

    If your lines are really all the same length, you could do the job via a seek() / read() loop. The example below needs error checking. I haven't done any speed tests.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $linesep_len = length($/);
    my $rec_len = length('ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA') + $linesep_len;
    my $read_len = 1;
    my $skip_len = $rec_len - $read_len;

    binmode(DATA);
    seek(DATA, 10, 1) or die "seek error";
    my $buf = ' ' x $read_len;

    while (read(DATA, $buf, $read_len) > 0) {
        print $buf, "\n";
        seek(DATA, $skip_len, 1) or last;
    }

    __DATA__
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCxCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCsCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCjCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCcCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
    ACATCACCTC-CACAACGAGGACTACACCATCGTGGAACA
    Output:
    C
    x
    s
    j
    c
    C
    C
    C
    C
    -
Re: Faster and more efficient way to read a file vertically
by Anonymous Monk on Nov 04, 2017 at 18:43 UTC
    Congrats on the new job!
      "...new job"

      Fake News.

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Node Type: perlquestion [id://1202693]
Front-paged by haukex