Re^2: In place editing without reading further

by trippledubs (Deacon)
on Jan 28, 2015 at 04:18 UTC ( [id://1114700] )


in reply to Re: In place editing without reading further
in thread In place editing without reading further

Hi SuicideJunkie, thanks for the comments.

So, if I understand correctly, this is a file of <1000 bytes of text, followed by a few gigabytes of binary data?

Yes, exactly.

However, I would not expect it to take very long to spin through even a huge file and copy it, as long as you're not trying to parse it. You're already doing sysreads and syswrites, so you're not accidentally going to try and read it a line at a time. Why not just read in reasonably sized chunks, and copy the file if you need to increase the length of the content_architecture value?

It takes too long. Making a copy of the whole thing would be a waste of time and space. We don't want to read it at all, much less write it out again. If we wanted backups, we could use copy-on-write ZFS snapshots and get them near-instantaneously.
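For illustration, taking such a snapshot before an edit is a one-liner (a sketch; the dataset name tank/data is a placeholder, not our real pool):

    # Take a near-instant copy-on-write ZFS snapshot before editing.
    # 'tank/data' is a hypothetical dataset name.
    system('zfs', 'snapshot', 'tank/data@before-edit') == 0
        or die "zfs snapshot failed: $?";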

There are many arguments against an extra five minutes. Ten people waiting an extra five minutes is fifty minutes. Five minutes lost on this server, and on every server that depends on the services this server provides. What if one of my teammates finishes five minutes faster than me? That is 300 full Mississippis of animadversions I will have to patiently bear. Not to mention the lost opportunity to taunt them. If you're not first, you're last!!

If you only have to process a few files occasionally, and it takes a minute instead of a second, then the extra safety margin is probably worth it.

Probably so, but if it doesn't work, now you have two copies that don't work and you're even further behind.


Replies are listed 'Best First'.
Re^3: In place editing without reading further
by SuicideJunkie (Vicar) on Jan 28, 2015 at 14:48 UTC

    If the use case demands it, then so be it.

    However, it sounds like you don't actually have any hard numbers. Give it a try both ways and see how long it really takes! Benchmarks are far better than random numbers pulled out of ... the air.

    You can also consider a two-pass system, where you do the in-place option first, and if $valueLength was too short, then rewrite the file after you have finished all the easy files.
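    A rough sketch of that flow (edit_in_place and rewrite_copy are hypothetical stand-ins for your own routines, not code from this thread):

    # Pass 1: attempt the fast in-place edit on every file; remember
    # the ones where the new value did not fit in the existing space.
    my @needs_rewrite;
    for my $file (@files) {
        push @needs_rewrite, $file unless edit_in_place($file);
    }

    # Pass 2: fall back to the slow copy-and-rewrite only for those.
    rewrite_copy($_) for @needs_rewrite;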

    Either way, benchmark it! Tell us how long it actually takes.

      Test box at home - Ubuntu, ext3 fs, about 2 years old:

      file size - block size: timed runs
      5 GB    -  32k: 2m24.716s  2m25.723s  2m24.012s
      5 GB    -  64k: 2m23.235s  2m25.939s
      5 GB    - 128k: 2m18.724s

      11 GB   -  32k: 5m48.613s  5m50.557s  5m55.207s
      11 GB   - 128k: 5m38.264s  5m29.513s  5m38.922s

      15.5 GB - 128k: 9m31.711s  7m45.154s  9m32.641s

      Beefy server with SAN storage:

      14 GB   -  64k: 2m16.941s  2m40.087s  2m30.454s
      14 GB   - 128k: 2m14.720s  2m22.201s  2m26.875s

      We roughly judge the penalty for failure at about 40 minutes, and discard the home server results. The safe copy script costs about two and a half minutes of extra time, and suppose its payoff is a failure rate of 0%. Loosely interpreted: if the edit-in-place script fails more often than once in every sixteen runs, it is not worth running; if it fails less often than once in sixteen, it is worth the risk of damaging the file and having to redo everything.
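      Spelled out, the break-even arithmetic behind "once in sixteen":

      expected cost of editing in place = p(fail) x 40 min
      cost of the safe copy             = 2.5 min
      break-even: p(fail) = 2.5 / 40 = 1/16 (about 6%)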

      14 GB is a good estimate of how large these files will be, but when they get smaller, the edit in place starts to look very risky: the savings shrink while the failure penalty does not decrease proportionally.

      Okay. Here is the code I came up with, attempting to implement your suggestion and anon's. I will adapt it for the real deal shortly and post soon. The test file, message.txt, looks like this:
      HEADER
      a1
      a2
      a3
      a4
      a5
      a6
      a7
      a8
      a9
      a10
      END HEADER b1 b2 c1 c2 .. z1 z2
      I chose a5 as the line to change and want to change only that line; everything else should stay exactly the same.
      #!/usr/bin/env perl
      use strict;
      use warnings;

      # Scan lines until the END HEADER marker to learn where the header ends.
      open my $fh, '<', 'message.txt' or die $!;
      LINE: while (<$fh>) {
          last LINE if /END HEADER\s\w/;
      }
      my $headerEndingPositionInBytes = tell($fh);
      print "Found header ending at $headerEndingPositionInBytes\n";

      # Rewind and re-read the header as one raw chunk.
      sysseek $fh, 0, 0;
      my $header;
      my $bytesRead = sysread $fh, $header, $headerEndingPositionInBytes;
      print "Read $bytesRead bytes into header variable\n";

      # Edit the one target line, then reassemble the header.
      my @lines = split /\n/, $header;
      s/^a5$/new magic/ for @lines;
      $header = join("\n", @lines) . "\n";   # split drops the trailing newline; put it back before the payload

      # Write the edited header, then copy the remainder in fixed-size blocks.
      open my $newFile, '>', 'message-fixed.txt' or die $!;
      syswrite $newFile, $header;
      my $blockSize = 32 * (1 << 10);   # 32k
      my $window;
      while (my $got = sysread $fh, $window, $blockSize) {
          syswrite $newFile, $window, $got;   # write only the bytes actually read
      }
      close $newFile or die $!;
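      And for comparison, a minimal sketch of the actual edit-in-place variant (my assumption here: the replacement is padded to exactly the same byte length as the original line, so the payload after the header never moves):

      #!/usr/bin/env perl
      use strict;
      use warnings;

      open my $fh, '+<', 'message.txt' or die "open: $!";
      my $start = 0;                        # byte offset where the current line begins
      while (my $line = <$fh>) {
          if ($line =~ /^a5$/) {
              my $replacement = 'b5';       # must be the same length as 'a5'
              die "length mismatch" if length($replacement) != length('a5');
              seek $fh, $start, 0 or die "seek: $!";   # seek is required between read and write
              print {$fh} $replacement;     # overwrite those bytes only
              last;
          }
          last if $line =~ /^END HEADER/;   # stop before the binary payload
          $start = tell $fh;
      }
      close $fh or die "close: $!";

      The win is that only a handful of header bytes are ever read or written, no matter how large the payload is.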
