PerlMonks

In place editing without reading further

by trippledubs (Deacon)
on Jan 27, 2015 at 19:22 UTC ( [id://1114662] )

In transitioning Solaris SPARC sun4u systems to the newer sun4v architecture, we found that the image of the old server would sometimes not install onto the new server. The image file contains 20-30 text lines describing the system that was imaged, followed by the image itself. In some cases this file is quite large, takes a long time to create, and is made during an outage.

The fix, once the image has already been made, is quite janky: you need to append the string 'sun4v' to the field 'content_architectures=', which sits around the 20th line. The other constraint is that you do not want to read the rest of the file. Someone came up with this and saved the day. What do you think? Was there a better approach? Is there a way to do this using command-line arguments that makes sense?
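The snippet itself is not reproduced here, but a minimal sketch of the idea, assuming the header is plain text within the first 4 KB and that the field line carries enough trailing space padding to absorb ',sun4v' without moving any later bytes, could look like this:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);

    # Sketch only: patch the header field in place by overwriting
    # trailing space padding, so no byte after the header moves.
    my $file = shift or die "usage: $0 imagefile\n";
    open my $fh, '+<:raw', $file or die "open $file: $!";
    sysread $fh, my $header, 4096 or die "sysread: $!";

    my $start = index $header, 'content_architectures=';
    die "field not found in header\n" if $start < 0;
    my $eol = index $header, "\n", $start;
    die "unterminated header line\n" if $eol < 0;

    my $line = substr $header, $start, $eol - $start;
    exit 0 if $line =~ /sun4v/;    # already patched

    # Swallow trailing padding so ',sun4v' fits without growing the line.
    my $pad = ($line =~ s/( +)\z//) ? length $1 : 0;
    die "not enough padding to append sun4v\n" if $pad < length ',sun4v';
    $line .= ',sun4v' . ' ' x ($pad - length ',sun4v');

    sysseek $fh, $start, SEEK_SET or die "sysseek: $!";
    syswrite $fh, $line or die "syswrite: $!";
    close $fh or die "close: $!";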

Replies are listed 'Best First'.
Re: In place editing without reading further
by SuicideJunkie (Vicar) on Jan 27, 2015 at 21:47 UTC

    So, if I understand correctly, this is a file of <1000 bytes of text, followed by a few gigabytes of binary data?

    It sounds like the approach you've got works, and as long as the old string is not too short (which the code checks for) and the end usage doesn't care about trailing whitespace in the field, you should be fine.

    However, I would not expect it to take very long to spin through even a huge file and copy it, as long as you're not trying to parse it. You're already doing sysreads and syswrites, so you're not accidentally going to try and read it a line at a time. Why not just read in reasonably sized chunks, and copy the file if you need to increase the length of the content_architecture value?

    How long does it actually take if you read 32k at a time, grow or shrink the first chunk, and then write out to a new file until done? It would be nice to have a scheme that works all the time, and doesn't damage the original file. If you only have to process a few files occasionally, and it takes a minute instead of a second, then the extra safety margin is probably worth it.
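    A rough sketch of that copy approach, assuming the field fits entirely within the first 32K chunk, is about this much code:

    use strict;
    use warnings;

    my ($in, $out) = @ARGV;
    open my $src, '<:raw', $in  or die "open $in: $!";
    open my $dst, '>:raw', $out or die "open $out: $!";

    # Patch the first chunk; growing the value is fine here because
    # everything after it gets rewritten anyway.
    sysread $src, my $chunk, 32768 or die "sysread: $!";
    $chunk =~ s/^(content_architectures=[^\n]*)/$1,sun4v/m
        or die "field not found in first chunk\n";
    syswrite $dst, $chunk or die "syswrite: $!";

    # Stream the rest through unchanged, 32K at a time.
    while (sysread $src, $chunk, 32768) {
        syswrite $dst, $chunk or die "syswrite: $!";
    }
    close $dst or die "close $out: $!";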

      Hi SuicideJunkie, thanks for the comments.

      So, if I understand correctly, this is a file of <1000 bytes of text, followed by a few gigabytes of binary data?

      Yes, exactly.

      However, I would not expect it to take very long to spin through even a huge file and copy it, as long as you're not trying to parse it. You're already doing sysreads and syswrites, so you're not accidentally going to try and read it a line at a time. Why not just read in reasonably sized chunks, and copy the file if you need to increase the length of the content_architecture value?

      It takes too long. Making a copy of the whole thing would be a waste of time and space; we don't want to read it at all, much less write it out again. If we wanted backups we could use copy-on-write ZFS snapshots and get them near-instantaneously.

      There are many arguments against an extra five minutes. Ten people waiting an extra 5 minutes is 50 minutes. Five minutes are lost on this server and on every server that depends on the services this server provides. What if one of my teammates finishes 5 minutes faster than me? That is 300 full Mississippis of animadversions I will have to patiently bear, not to mention the lost opportunity to taunt them. If you're not first, you're last!!

      If you only have to process a few files occasionally, and it takes a minute instead of a second, then the extra safety margin is probably worth it.

      Probably so, but if it doesn't work you now have two copies that don't work, and you're even further behind.

        If the use case demands it, then so be it.

        However, it sounds like you don't actually have any hard numbers. Give it a try both ways and see how long it really takes! Benchmarks are far better than random numbers pulled out of ... the air.

        You can also consider a two-pass system, where you do the in-place option first, and if $valueLength was too short, then rewrite the file after you have finished all the easy files.

        Either way, benchmark it! Tell us how long it actually takes.
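        In outline, with try_in_place() and copy_with_patch() standing in for the two approaches above (hypothetical names, not anything from the original code), the two-pass scheme is just:

        # Pass 1: attempt the cheap in-place patch everywhere;
        # queue files whose field lacks room for 'sun4v'.
        my @needs_copy = grep { !try_in_place($_) } @image_files;

        # Pass 2: fall back to the full streamed copy for the rest.
        copy_with_patch($_) for @needs_copy;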

Re: In place editing without reading further
by Anonymous Monk on Jan 28, 2015 at 03:08 UTC
    If it works, it works. But all these calculations look rather error-prone to me. I would read the whole header (800 bytes?) into a string and split it on newlines.
    my @lines = split /\n/, $header, -1;
    (the -1 so that split won't drop trailing empty fields, which preserves any trailing newlines)

    Then I would find the needed line in @lines and substitute stuff like this:

    for my $magic_part ( substr $needed_line, 23 ) {
        die "Line is too short!" if length $magic_part < 5;
        $magic_part =~ tr// /c;    # double magic :)
        substr( $magic_part, 0, 5 ) = "sun4v";
    }
    I guess not many people know that substr is magic (and so is the foreach loop)... but it works. It's straight from the documentation, so it can't be too bad :) tr works with an empty searchlist and the "c"omplement option, I don't know why; it's probably better written like this: tr/\x00-\xff/ /
    my $line = "x" x 23 . "something"; for my $magic ( substr $line, 23 ) { print length $magic, "\n"; $magic =~ tr/\x00-\xff/ /; substr( $magic, 0, 5 ) = "sun4v"; } print "~~$line~~", "\n" __END__ 9 ~~xxxxxxxxxxxxxxxxxxxxxxxsun4v ~~
    Then I would join the array with newlines again and overwrite the whole header.
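    Put together, the whole thing might look like this (a sketch; $file and $header_len are hypothetical values standing in for the real image path and the known size of the text header, and the edit never changes the header's length):

    use strict;
    use warnings;

    my ($file, $header_len) = ('image.flar', 800);   # hypothetical

    open my $fh, '+<:raw', $file or die "open $file: $!";
    sysread $fh, my $header, $header_len or die "sysread: $!";

    my @lines = split /\n/, $header, -1;
    # grep and foreach both alias, so edits land back in @lines;
    # 23 is the offset used in the snippet above.
    for my $needed_line (grep /^content_architectures=/, @lines) {
        for my $magic_part ( substr $needed_line, 23 ) {
            die "Line is too short!" if length $magic_part < 5;
            $magic_part =~ tr/\x00-\xff/ /;
            substr( $magic_part, 0, 5 ) = "sun4v";
        }
    }

    # Overwrite the same bytes at the front of the file.
    sysseek $fh, 0, 0 or die "sysseek: $!";
    syswrite $fh, join("\n", @lines) or die "syswrite: $!";
    close $fh or die "close: $!";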

    Yes, pretty obscure stuff here, but I'm very bad at math. I try to avoid it as much as I can :)
