PerlMonks  

Re: upper or lower triangular matrix to full

by Anonymous Monk
on Sep 01, 2017 at 16:19 UTC ( [id://1198528] )


in reply to upper or lower triangular matrix to full

Read as many rows into memory as you can. Append each column to a separate output file (you'll have to close the file afterwards, because you can't hold 460000 file handles open). After processing the whole input file, read the column files in order to get the matrix transpose. Take the lower triangle of that and combine with the upper triangle of the original.
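A minimal sketch of the approach described above, assuming a whitespace-delimited square matrix whose missing lower triangle is filled with a placeholder such as NA. The names transpose_via_column_files, merge_triangles and the $chunk parameter are hypothetical, chosen only for illustration. This version uses one temp file per column, so it shows the mechanics rather than a tuned solution.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write the transpose of a whitespace-delimited matrix file,
# holding at most $chunk rows in memory at a time.
sub transpose_via_column_files {
    my ($in, $out, $chunk) = @_;
    my $dir = tempdir(CLEANUP => 1);
    open my $IN, '<', $in or die "$in: $!";
    my (@rows, $ncols);

    # Append the buffered rows to the per-column files.
    my $flush = sub {
        return unless @rows;
        for my $c (0 .. $ncols - 1) {
            # Open, append, close: only one column file is open at a time.
            open my $COL, '>>', "$dir/$c" or die $!;
            print {$COL} join(' ', map { $_->[$c] } @rows), ' ';
            close $COL or die $!;
        }
        @rows = ();
    };

    while (my $line = <$IN>) {
        my @fields = split ' ', $line;
        $ncols //= scalar @fields;
        push @rows, \@fields;
        $flush->() if @rows >= $chunk;
    }
    $flush->();
    close $IN;

    # Each column file, read in order, is one row of the transpose.
    open my $OUT, '>', $out or die "$out: $!";
    for my $c (0 .. ($ncols // 0) - 1) {
        open my $COL, '<', "$dir/$c" or die $!;
        my $row = <$COL>;
        close $COL;
        $row =~ s/ \z//;    # drop the trailing separator
        print {$OUT} $row, "\n";
    }
    close $OUT or die $!;
}

# Row $i of the result takes columns 0 .. $i-1 from the transpose
# (lower triangle) and columns $i .. end from the original (upper triangle).
sub merge_triangles {
    my ($orig, $transposed, $out) = @_;
    open my $UP,  '<', $orig       or die "$orig: $!";
    open my $LOW, '<', $transposed or die "$transposed: $!";
    open my $OUT, '>', $out        or die "$out: $!";
    my $i = 0;
    while (defined(my $up = <$UP>)) {
        my $low = <$LOW>;
        die "transpose has fewer rows than the original" unless defined $low;
        my @u = split ' ', $up;
        my @l = split ' ', $low;
        print {$OUT} join(' ', @l[0 .. $i - 1], @u[$i .. $#u]), "\n";
        $i++;
    }
    close $OUT or die $!;
}
```

Calling transpose_via_column_files('in.txt', 't.txt', 1000) followed by merge_triangles('in.txt', 't.txt', 'full.txt') would produce the full matrix. The file names and the chunk size of 1000 are placeholders to be tuned against available memory.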

Replies are listed 'Best First'.
Re^2: upper or lower triangular matrix to full
by choroba (Cardinal) on Sep 01, 2017 at 23:50 UTC
    The following still takes 90s for size 10_000, and 800s for size 20_000 on my machine (with some random tuning of the LOAD_AT_ONCE constant). 640_000 would still take several years. Nevertheless, I haven't been able to find a faster solution.

    I used the following code to generate the input matrix:

    my $SIZE = 1000;

    sub create_matrix {
        my ($filename) = @_;
        open my $OUT, '>', $filename or die $!;
        for my $i (1 .. $SIZE) {
            for my $j (1 .. $SIZE) {
                print {$OUT} $i <= $j ? $i * $j : 'NA';
                print {$OUT} ' ' unless $SIZE == $j;
            }
            print {$OUT} "\n";
        }
    }

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      You and others in this thread silently assume that "matrix" means a data file with some kind of delimiter. What if the file looks like this:
      0001 1202 3030 ... 8491 9382 9381 ...
      In such a fixed-length case you don't need any memory (well, kinda) and can just do the task by seek()ing to the appropriate positions on disk.
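      A rough sketch of that idea, under assumptions the thread does not confirm: every cell is exactly $width bytes, every cell is followed by exactly one separator byte (a space, or a newline at the end of a row), and the empty lower-triangle cells are padded to the same width. The name mirror_upper_in_place is hypothetical. Cell (i, j) then starts at byte offset (i * n + j) * (width + 1).

```perl
use strict;
use warnings;

# Copy every upper-triangle cell (j, i) onto its mirror (i, j), in place.
# Assumes fixed-width cells, each followed by one separator byte.
sub mirror_upper_in_place {
    my ($file, $n, $width) = @_;
    my $rec = $width + 1;    # cell plus its separator
    open my $FH, '+<:raw', $file or die "$file: $!";
    for my $i (1 .. $n - 1) {
        for my $j (0 .. $i - 1) {
            # Read the source cell from the upper triangle.
            seek $FH, ($j * $n + $i) * $rec, 0 or die "seek: $!";
            my $got = read $FH, my $cell, $width;
            die "short read at ($j, $i)" unless defined $got && $got == $width;
            # Overwrite the mirrored cell in the lower triangle.
            seek $FH, ($i * $n + $j) * $rec, 0 or die "seek: $!";
            print {$FH} $cell;
        }
    }
    close $FH or die "$file: $!";
}
```

      Only one cell is held in memory at a time, but each cell costs two seeks, so on a large file this may well be I/O bound.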

      We won't know unless the OP tells us.


      holli

      You can lead your users to water, but alas, you cannot drown them.
        I originally started with
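        sub fill_matrix {
            my ($in) = @_;
            open my $IN, '<', $in or die $!;

            # Record the byte offset at which each line starts.
            my @index = (0);
            push @index, tell $IN while <$IN>;
            pop @index;    # the last offset is end of file, not a line start

            for my $line_no (0 .. $#index) {
                print STDERR "$line_no\r";
                # Lower triangle: fetch column $line_no from each earlier row.
                for my $idx (0 .. $line_no - 1) {
                    seek $IN, $index[$idx], 0;
                    my $line = <$IN>;
                    print +(split ' ', $line, $line_no + 2)[$line_no], ' ';
                }
                # Diagonal and upper triangle: the rest of the current row,
                # including its trailing newline.
                seek $IN, $index[$line_no], 0;
                my $line = <$IN>;
                print +(split ' ', $line, $line_no + 1)[-1];
            }
        }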

        but it was much slower: 28s for SIZE 1000, 280s for SIZE 2000.

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      It helps if you combine several columns into each of the tempfiles. But yeah, slinging around this much data is... challenging.
