PerlMonks  

Re^2: Windows 7 Remove Tabs Out of Memory

by Anonymous Monk
on Aug 01, 2012 at 08:45 UTC (#984734=note)


in reply to Re: Windows 7 Remove Tabs Out of Memory
in thread Windows 7 Remove Tabs Out of Memory

2 seconds to read it; 1/2 second to process it; 4 seconds to write it; and only 510MB memory used in the process!

That's efficient!

Not really, from a memory standpoint. You could do much better with a standard loop that reads to a small buffer and writes to the output file in a loop.

(Not to mention that you seem to have really fast disks (SSDs?). Haven't met a HDD yet that could read faster than 150 MB/s or write faster than 100 MB/s.)

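open my $in,  '<', 'input.txt'  or die "Cannot open input.txt: $!";
open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";
my $buf;
# Read 4096 bytes at a time, turn tabs into spaces, write the chunk out.
while (read $in, $buf, 4096) {
    $buf =~ tr/\t/ /;
    print $out $buf;
}
close $_ for ($in, $out);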

But this has considerable potential to slow the loop down to around 10 MB/s, both because of the properties of seeking media and because OS read-ahead and flushing algorithms never perform quite that well [1]. Still a helluva lot better than the OS swapping you out because it can't fit the 500 MB into memory.

[1] I have never seen an OS successfully avoid doing reading and writing in parallel (= sub-optimal) for cat largefile > otherfile
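The one knob in that loop is the buffer size. A minimal sketch with the size as a parameter, under the assumption that the job is the same tab-to-space replacement; the name detab_chunked and its arguments are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, replacing tabs with spaces,
# reading at most $buf_size bytes per iteration. Returns the number of chunks.
sub detab_chunked {
    my ($in_path, $out_path, $buf_size) = @_;
    $buf_size ||= 4096;    # same default as the loop above

    open my $in,  '<:raw', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>:raw', $out_path or die "Cannot open $out_path: $!";

    my ($buf, $chunks) = ('', 0);
    while (1) {
        my $got = read $in, $buf, $buf_size;
        die "Read error on $in_path: $!" unless defined $got;
        last if $got == 0;                 # end of file
        $buf =~ tr/\t/ /;                  # tabs become spaces in this chunk
        print {$out} $buf or die "Cannot write $out_path: $!";
        $chunks++;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $chunks;
}

# Usage (file names are placeholders):
# my $n = detab_chunked('input.txt', 'output.txt', 4 * 1024 * 1024);
```

With a 4096-byte buffer a 500 MB file takes on the order of 128,000 loop iterations; a 4 MB buffer brings that down to about 125, at the cost of a few MB of memory rather than 500.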

Re^3: Windows 7 Remove Tabs Out of Memory
by moritz (Cardinal) on Aug 01, 2012 at 08:55 UTC
    (Not to mention that you seem to have really fast disks (SSDs?). Haven't met a HDD yet that could read faster than 150 MB/s or write faster than 100 MB/s.)

    Or maybe the OS simply caches the reads and delays the writes, leading to faster-than-disk performance.

Re^3: Windows 7 Remove Tabs Out of Memory
by BrowserUk (Pope) on Aug 01, 2012 at 09:20 UTC
    Not really, from a memory standpoint.

    See below...

    You could do much better with a standard loop that reads to a small buffer and writes to the output file in a loop.

    Actually no. That forces the OS to keep the disk head moving back and forth between source(*) and destination.

    One read & one write will always beat 125 iddy biddy reads and 125 iddy biddy writes, with a seek across the disk between each, hands down. (Not to mention 125 invocations of s/// or tr/// instead of one.)
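For comparison, a rough sketch of the one-read, one-write approach described here. This is not the script that was actually timed in this thread; the function name, the file names and the idea of returning four timestamps are assumptions for illustration:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: slurp the whole file, run tr/// once, write once.
# Returns four timestamps: start, after read, after tr///, after write.
sub detab_in_one_pass {
    my ($in_path, $out_path) = @_;
    my @stamps = (time);

    open my $in, '<:raw', $in_path or die "Cannot open $in_path: $!";
    my $data = do { local $/; <$in> };     # undef $/ means "read everything"
    close $in;
    $data = '' unless defined $data;
    push @stamps, time;

    $data =~ tr/\t/ /;                     # one tr/// over the whole buffer
    push @stamps, time;

    open my $out, '>:raw', $out_path or die "Cannot open $out_path: $!";
    print {$out} $data or die "Cannot write $out_path: $!";
    close $out or die "Cannot close $out_path: $!";
    push @stamps, time;

    return @stamps;
}

# Usage (file names are placeholders):
# print "$_\n" for detab_in_one_pass('input.txt', 'output.txt');
```

The whole file sits in memory at once, which is exactly the trade-off being argued about in this thread.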

    (Not to mention that you seem to have really fast disks (SSDs?).

    Not yet :) I'm waiting for a PCIe flash card that presents itself as additional (slow) RAM at a reasonable price.

    Haven't met a HDD yet that could read faster than 150 MB/s or write faster than 100 MB/s.)

    As moritz points out: file system caching.

    The timings posted were not the first runs; but the same caching benefited all three versions.

    Not really, from a memory standpoint... Still a helluva lot better than the OS swapping you out because it can't fit the 500 MB into memory.

    Buy more!

    My last memory purchase:

    1 "Komputerbay 8GB (2 X 4GB) DDR3 DIMM (240 pin) 1600Mhz PC3 12800 8GB KIT (9-9-9-25) with Heatspreader for extra Cooling" 30.00 In stock Sold by: KOMPBAY

    (*Even if the input is cached from a previous read of the file, writing to disk before the entire input has been read is quite likely to cause some or all of the input file to be discarded before it has been read, to accommodate the output.)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Actually no. That forces the OS to keep the disk head moving back and forth between source(*) and destination.

      It doesn't "force", actually. But as I stated in my post, I have yet to see an OS that could handle such a simple single-threaded situation intelligently enough. (That is, delay writing until file is closed or otherwise absolutely necessary.)

      BTW, there is something odd with your numbers. The Perl timestamps only give a delta of slightly less than six seconds (~83 MB/s average => reading 228 MB/s, writing 152 MB/s), but your console says almost 22 seconds (which would put the performance at around 23 MB/s)

        BTW, there is something odd with your numbers. The Perl timestamps only give a delta of slightly less than six seconds (~83 MB/s average => reading 228 MB/s, writing 152 MB/s), but your console says almost 22 seconds (which would put the performance at around 23 MB/s)

        The difference between the timestamps and the prompt timing is because the prompt had been sat there for a few seconds before (and during) my typing the command.

        Here are two sets of prompts. The first shows the prompt had been sat idle for a while before I entered the command. The second run, initiated using command retrieval whilst the first run was in progress, executes immediately the first is complete, and the delta then reflects the timestamps output by the program:

        [14:19:53.80] C:\test>984648-3.pl 500MB.csv junk.txt
        1343827530.71078
        1343827532.87958
        1343827533.36184          ## Delta here several minutes
        1343827536.56099

        [14:25:36.61] C:\test>984648-3.pl 500MB.csv junk.txt
        1343827536.68696
        1343827538.87822
        1343827539.3639           ## delta here a gnat's under 7 seconds!
        1343827543.53711

        [14:25:43.58] C:\test>
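The per-phase times can be read straight off those timestamps. A small sketch, assuming (as the "read it / process it / write it" breakdown quoted at the top of the thread suggests) that the four values mark start, end of read, end of processing and end of write; phase_deltas is a made-up helper name:

```perl
use strict;
use warnings;

# Differences between consecutive timestamps, in seconds.
sub phase_deltas {
    my @t = @_;
    return map { $t[$_] - $t[$_ - 1] } 1 .. $#t;
}

# The four timestamps printed by the second run above.
my @second_run = (1343827536.68696, 1343827538.87822,
                  1343827539.3639,  1343827543.53711);
my ($read, $process, $write) = phase_deltas(@second_run);

printf "read %.2fs, process %.2fs, write %.2fs, total %.2fs\n",
    $read, $process, $write, $read + $process + $write;
```

For the second run this comes to roughly 2.19 s to read, 0.49 s to process and 4.17 s to write, about 6.85 s in total, which agrees with the two prompt times either side of it.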

        And as far as IO throughput, here's a snapshot of the process performance dialog from Process Explorer after the process finished.

        Each vertical gridline represents 3 seconds.

        The cyan is the read, and the throughput is shown in the small popup: 120.2 MB per 0.5 seconds, so 500 / 120.2 / 2 ≈ 2.08 seconds to read.

        The purple is the write, and the throughput is shown on the gauge at the left: 90.4 MB per 0.5 seconds, so 500 / 90.4 / 2 ≈ 2.77 seconds to write.

        Slightly optimistic figures, but the total throughput is the area, and if you look closely, you'll see the top of the graphs aren't straight and the leading and trailing edges aren't vertical.


