Re^2: In place editing without reading further

by trippledubs (Deacon)
on Jan 28, 2015 at 04:18 UTC ( [id://1114700] )


in reply to Re: In place editing without reading further
in thread In place editing without reading further

Hi SuicideJunkie, thanks for the comments.

So, if I understand correctly, this is a file of <1000 bytes of text, followed by a few gigabytes of binary data?

Yes, exactly.

However, I would not expect it to take very long to spin through even a huge file and copy it, as long as you're not trying to parse it. You're already doing sysreads and syswrites, so you're not accidentally going to try and read it a line at a time. Why not just read in reasonably sized chunks, and copy the file if you need to increase the length of the content_architecture value?

It takes too long. Making a copy of the whole thing would be a waste of time and space. We don't want to read it at all, much less write it out again. If we wanted backups, we could use copy-on-write ZFS snapshots and get them near-instantaneously.
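For illustration, taking such a snapshot before an edit is a one-liner (a sketch; the dataset name tank/data is a placeholder, not our real pool):

    # Take a near-instant copy-on-write ZFS snapshot before editing.
    # 'tank/data' is a hypothetical dataset name.
    system('zfs', 'snapshot', 'tank/data@before-edit') == 0
        or die "zfs snapshot failed: $?";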

There are many arguments against an extra five minutes. Ten people waiting an extra five minutes is fifty minutes. Five minutes lost on this server, and on every server that depends on the services this server provides. What if one of my teammates finishes five minutes faster than me? That is 300 full Mississippis of animadversions I will have to patiently bear. Not to mention the lost opportunity to taunt them. If you're not first, you're last!!

If you only have to process a few files occasionally, and it takes a minute instead of a second, then the extra safety margin is probably worth it.

Probably so, but if it doesn't work, now you have two copies that don't work and you're even further behind.


Replies are listed 'Best First'.
Re^3: In place editing without reading further
by SuicideJunkie (Vicar) on Jan 28, 2015 at 14:48 UTC

    If the use case demands it, then so be it.

    However, it sounds like you don't actually have any hard numbers. Give it a try both ways and see how long it really takes! Benchmarks are far better than random numbers pulled out of ... the air.

    You can also consider a two-pass system, where you do the in-place option first, and if $valueLength was too short, then rewrite the file after you have finished all the easy files.
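    A rough sketch of that flow (edit_in_place and rewrite_copy are hypothetical stand-ins for your own routines, not code from this thread):

    # Pass 1: attempt the fast in-place edit on every file; remember
    # the ones where the new value did not fit in the existing space.
    my @needs_rewrite;
    for my $file (@files) {
        push @needs_rewrite, $file unless edit_in_place($file);
    }

    # Pass 2: fall back to the slow copy-and-rewrite only for those.
    rewrite_copy($_) for @needs_rewrite;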

    Either way, benchmark it! Tell us how long it actually takes.

      Test box at home - Ubuntu, ext3 fs, about 2 years old:

      file size - block size: timed runs
      5 GB    -  32k: 2m24.716s  2m25.723s  2m24.012s
      5 GB    -  64k: 2m23.235s  2m25.939s
      5 GB    - 128k: 2m18.724s

      11 GB   -  32k: 5m48.613s  5m50.557s  5m55.207s
      11 GB   - 128k: 5m38.264s  5m29.513s  5m38.922s

      15.5 GB - 128k: 9m31.711s  7m45.154s  9m32.641s

      Beefy server with SAN storage:

      14 GB   -  64k: 2m16.941s  2m40.087s  2m30.454s
      14 GB   - 128k: 2m14.720s  2m22.201s  2m26.875s

      We roughly judge the penalty for failure at about 40 minutes, and discard the home server results. The safe copy script costs about two and a half minutes of extra time, and suppose its payoff is a failure rate of 0%. Loosely interpreted: if the edit-in-place script fails more often than once in every sixteen runs, it is not worth running; if it fails less often than once in sixteen, it is worth the risk of damaging the file and having to redo everything.
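      Spelled out, the break-even arithmetic behind "once in sixteen":

      expected cost of editing in place = p(fail) x 40 min
      cost of the safe copy             = 2.5 min
      break-even: p(fail) = 2.5 / 40 = 1/16 (about 6%)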

      14 GB is a good estimate of how large these files will be, but when they get smaller, the edit in place starts to look very risky: the savings shrink while the failure penalty does not decrease proportionally.

      Okay. Here is the code I came up with, attempting to implement your suggestion and anon's. I will adapt it for the real deal shortly and post soon. The test file, message.txt, looks like this:
      HEADER
      a1
      a2
      a3
      a4
      a5
      a6
      a7
      a8
      a9
      a10
      END HEADER b1 b2 c1 c2 .. z1 z2
      I chose a5 as the line to change and want to change only that line; everything else should stay exactly the same.
      #!/usr/bin/env perl
      use strict;
      use warnings;

      # Scan lines until the END HEADER marker to learn where the header ends.
      open my $fh, '<', 'message.txt' or die $!;
      LINE: while (<$fh>) {
          last LINE if /END HEADER\s\w/;
      }
      my $headerEndingPositionInBytes = tell($fh);
      print "Found header ending at $headerEndingPositionInBytes\n";

      # Rewind and re-read the header as one raw chunk.
      sysseek $fh, 0, 0;
      my $header;
      my $bytesRead = sysread $fh, $header, $headerEndingPositionInBytes;
      print "Read $bytesRead bytes into header variable\n";

      # Edit the one target line, then reassemble the header.
      my @lines = split /\n/, $header;
      s/^a5$/new magic/ for @lines;
      $header = join("\n", @lines) . "\n";   # split drops the trailing newline; put it back before the payload

      # Write the edited header, then copy the remainder in fixed-size blocks.
      open my $newFile, '>', 'message-fixed.txt' or die $!;
      syswrite $newFile, $header;
      my $blockSize = 32 * (1 << 10);   # 32k
      my $window;
      while (my $got = sysread $fh, $window, $blockSize) {
          syswrite $newFile, $window, $got;   # write only the bytes actually read
      }
      close $newFile or die $!;
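      And for comparison, a minimal sketch of the actual edit-in-place variant (my assumption here: the replacement is padded to exactly the same byte length as the original line, so the payload after the header never moves):

      #!/usr/bin/env perl
      use strict;
      use warnings;

      open my $fh, '+<', 'message.txt' or die "open: $!";
      my $start = 0;                        # byte offset where the current line begins
      while (my $line = <$fh>) {
          if ($line =~ /^a5$/) {
              my $replacement = 'b5';       # must be the same length as 'a5'
              die "length mismatch" if length($replacement) != length('a5');
              seek $fh, $start, 0 or die "seek: $!";   # seek is required between read and write
              print {$fh} $replacement;     # overwrite those bytes only
              last;
          }
          last if $line =~ /^END HEADER/;   # stop before the binary payload
          $start = tell $fh;
      }
      close $fh or die "close: $!";

      The win is that only a handful of header bytes are ever read or written, no matter how large the payload is.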
