Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re^3: efficient perl code to count, rank (B-tree)(update)

by LanX (Sage)
on Jul 17, 2021 at 21:13 UTC ( #11135117=note: print w/replies, xml ) Need Help??


in reply to Re^2: efficient perl code to count, rank
in thread efficient perl code to count, rank

> How would a database compensate for this?

Databases use structures like B-trees for indexing and sorting and are pretty good in balancing these trees between RAM and disk.

They optimize the trade-off between memory and time.

Perl is pretty bad in sorting linear stuff which is part in RAM and part on disk.

(Well you could tie an AoA to a file representing a table. Sorting that array would go without much RAM but mean constant overhead with each access. This is certainly worth a shot, but as I said DBs have this already optimized.)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

update

) from WP B-tree

    Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as disks ...

    ... for the purpose of efficiently managing index pages for large random access files. The basic assumption was that indexes would be so voluminous that only small chunks of the tree could fit in main memory.

  • Comment on Re^3: efficient perl code to count, rank (B-tree)(update)

Replies are listed 'Best First'.
Re^4: efficient perl code to count, rank (merge sort)
by LanX (Sage) on Jul 18, 2021 at 12:36 UTC
    > (Well you could tie an AoA to a file representing a table. Sorting that array would go without much RAM but mean constant overhead with each access. This is certainly worth a shot, but as I said DBs have this already optimized.)

    I don't think this will work because of the way merge sort is implemented in Perl

    It is first comparing all pairs in a breadth first search, which would mean far too many disk operations and no sensible caching option

    DB<7> sort { print "$a,$b |"; $a <=> $b } reverse 1..16 16,15 |14,13 |12,11 |10,9 |8,7 |6,5 |4,3 |2,1 |15,13 |15,14 +|11,9 |11,10 |13,9 |13,10 |13,11 |13,12 \ |7,5 |7,6 |3,1 |3,2 |5,1 |5,2 |5,3 |5,4 |9,1 |9,2 |9,3 |9,4 + |9,5 |9,6 |9,7 |9,8 | DB<8>

    explained

    16,15 |14,13 |12,11 |10,9 |8,7 |6,5 |4,3 |2,1 | + # sort pairs 15,13 |15,14 |11,9 |11,10 + # sort 1st and 2nd 4-tuple |13,9 |13,10 |13,11 |13,12 + # sort 1st 8 tuple |7,5 |7,6 |3,1 |3,2 + # sort 3rd and 4th 4-tuple |5,1 |5,2 |5,3 |5,4 + # sort 2nd 8tuple |9,1 |9,2 |9,3 |9,4 |9,5 |9,6 |9,7 |9,8 | + # sort 16 tuple

    A caching/memoizing of data read from disk would be far more efficient if the chunks were strictly chosen depth first.

    FWIW WP lists some divide-and-conquer approaches for merge-sort

    https://en.wikipedia.org/wiki/Mergesort#Parallel_merge_sort

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11135117]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (4)
As of 2021-09-19 11:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?