PerlMonks

In place editing without reading further

by trippledubs (Deacon)
on Jan 27, 2015 at 19:22 UTC ( [id://1114662] )

In transitioning Solaris SPARC sun4u systems to the newer sun4v architecture, we found that the image of the old server would sometimes not install onto the new server. The image file contains 20-30 text lines describing the system that was imaged, followed by the image itself. In some cases this file is quite large, takes a long time to create, and is made during an outage.

The fix, once the image has already been made, is quite janky: you need to append the string 'sun4v' to the field 'content_architectures=', which sits around the 20th line. The other constraint is that you do not want to read the rest of the file. Someone came up with this and saved the day. What do you think? Was there a better approach? Is there a way to do this using command-line arguments that makes sense?
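The snippet itself is not reproduced here, but a minimal sketch of the idea, assuming the header is plain text within the first 4 KB and that the field line carries enough trailing space padding to absorb ',sun4v' without moving any later bytes, could look like this:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);

    # Sketch only: patch the header field in place by overwriting
    # trailing space padding, so no byte after the header moves.
    my $file = shift or die "usage: $0 imagefile\n";
    open my $fh, '+<:raw', $file or die "open $file: $!";
    sysread $fh, my $header, 4096 or die "sysread: $!";

    my $start = index $header, 'content_architectures=';
    die "field not found in header\n" if $start < 0;
    my $eol = index $header, "\n", $start;
    die "unterminated header line\n" if $eol < 0;

    my $line = substr $header, $start, $eol - $start;
    exit 0 if $line =~ /sun4v/;    # already patched

    # Swallow trailing padding so ',sun4v' fits without growing the line.
    my $pad = ($line =~ s/( +)\z//) ? length $1 : 0;
    die "not enough padding to append sun4v\n" if $pad < length ',sun4v';
    $line .= ',sun4v' . ' ' x ($pad - length ',sun4v');

    sysseek $fh, $start, SEEK_SET or die "sysseek: $!";
    syswrite $fh, $line or die "syswrite: $!";
    close $fh or die "close: $!";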

Replies are listed 'Best First'.
Re: In place editing without reading further
by SuicideJunkie (Vicar) on Jan 27, 2015 at 21:47 UTC

    So, if I understand correctly, this is a file of <1000 bytes of text, followed by a few gigabytes of binary data?

    It sounds like the approach you've got works, and as long as the old string is not too short (which the code checks for) and the end usage doesn't care about trailing whitespace in the field, you should be fine.

    However, I would not expect it to take very long to spin through even a huge file and copy it, as long as you're not trying to parse it. You're already doing sysreads and syswrites, so you're not accidentally going to try and read it a line at a time. Why not just read in reasonably sized chunks, and copy the file if you need to increase the length of the content_architecture value?

    How long does it actually take if you read 32k at a time, grow or shrink the first chunk, and then write out to a new file until done? It would be nice to have a scheme that works all the time, and doesn't damage the original file. If you only have to process a few files occasionally, and it takes a minute instead of a second, then the extra safety margin is probably worth it.
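    A rough sketch of that copy approach, assuming the field fits entirely within the first 32K chunk, is about this much code:

    use strict;
    use warnings;

    my ($in, $out) = @ARGV;
    open my $src, '<:raw', $in  or die "open $in: $!";
    open my $dst, '>:raw', $out or die "open $out: $!";

    # Patch the first chunk; growing the value is fine here because
    # everything after it gets rewritten anyway.
    sysread $src, my $chunk, 32768 or die "sysread: $!";
    $chunk =~ s/^(content_architectures=[^\n]*)/$1,sun4v/m
        or die "field not found in first chunk\n";
    syswrite $dst, $chunk or die "syswrite: $!";

    # Stream the rest through unchanged, 32K at a time.
    while (sysread $src, $chunk, 32768) {
        syswrite $dst, $chunk or die "syswrite: $!";
    }
    close $dst or die "close $out: $!";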

      Hi SuicideJunkie, thanks for the comments.

      So, if I understand correctly, this is a file of <1000 bytes of text, followed by a few gigabytes of binary data?

      Yes, exactly.

      However, I would not expect it to take very long to spin through even a huge file and copy it, as long as you're not trying to parse it. You're already doing sysreads and syswrites, so you're not accidentally going to try and read it a line at a time. Why not just read in reasonably sized chunks, and copy the file if you need to increase the length of the content_architecture value?

      It takes too long. Making a copy of the whole thing would be a waste of time and space; we don't want to read it at all, much less write it out again. If we wanted backups we could use copy-on-write ZFS snapshots and get them near-instantaneously.

      There are many arguments against an extra five minutes. Ten people waiting an extra 5 minutes is 50 minutes. Five minutes are lost on this server and on every server that depends on the services this server provides. What if one of my teammates finishes 5 minutes faster than me? That is 300 full Mississippis of animadversions I will have to patiently bear, not to mention the lost opportunity to taunt them. If you're not first, you're last!!

      If you only have to process a few files occasionally, and it takes a minute instead of a second, then the extra safety margin is probably worth it.

      Probably so, but if it doesn't work you now have two copies that don't work, and you're even further behind.

        If the use case demands it, then so be it.

        However, it sounds like you don't actually have any hard numbers. Give it a try both ways and see how long it really takes! Benchmarks are far better than random numbers pulled out of ... the air.

        You can also consider a two-pass system, where you do the in-place option first, and if $valueLength was too short, then rewrite the file after you have finished all the easy files.

        Either way, benchmark it! Tell us how long it actually takes.
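        In outline, with try_in_place() and copy_with_patch() standing in for the two approaches above (hypothetical names, not anything from the original code), the two-pass scheme is just:

        # Pass 1: attempt the cheap in-place patch everywhere;
        # queue files whose field lacks room for 'sun4v'.
        my @needs_copy = grep { !try_in_place($_) } @image_files;

        # Pass 2: fall back to the full streamed copy for the rest.
        copy_with_patch($_) for @needs_copy;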

Re: In place editing without reading further
by Anonymous Monk on Jan 28, 2015 at 03:08 UTC
    If it works, it works. But all these calculations look rather error-prone to me. I would read the whole header (800 bytes?) into a string and split it on newlines.
    my @lines = split /\n/, $header, -1;
    (the -1 so that split won't drop trailing empty fields, which preserves any trailing newlines)

    Then I would find the needed line in @lines and substitute stuff like this:

    for my $magic_part ( substr $needed_line, 23 ) {
        die "Line is too short!" if length $magic_part < 5;
        $magic_part =~ tr// /c;    # double magic :)
        substr( $magic_part, 0, 5 ) = "sun4v";
    }
    I guess not many people know that substr is magic (and so is the foreach loop)... but it works. It's straight from the documentation, so it can't be too bad :) tr works with an empty searchlist and the "c"omplement option, I don't know why; it's probably better written like this: tr/\x00-\xff/ /
    my $line = "x" x 23 . "something"; for my $magic ( substr $line, 23 ) { print length $magic, "\n"; $magic =~ tr/\x00-\xff/ /; substr( $magic, 0, 5 ) = "sun4v"; } print "~~$line~~", "\n" __END__ 9 ~~xxxxxxxxxxxxxxxxxxxxxxxsun4v ~~
    Then I would join the array with newlines again and overwrite the whole header.
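    Put together, the whole thing might look like this (a sketch; $file and $header_len are hypothetical values standing in for the real image path and the known size of the text header, and the edit never changes the header's length):

    use strict;
    use warnings;

    my ($file, $header_len) = ('image.flar', 800);   # hypothetical

    open my $fh, '+<:raw', $file or die "open $file: $!";
    sysread $fh, my $header, $header_len or die "sysread: $!";

    my @lines = split /\n/, $header, -1;
    # grep and foreach both alias, so edits land back in @lines;
    # 23 is the offset used in the snippet above.
    for my $needed_line (grep /^content_architectures=/, @lines) {
        for my $magic_part ( substr $needed_line, 23 ) {
            die "Line is too short!" if length $magic_part < 5;
            $magic_part =~ tr/\x00-\xff/ /;
            substr( $magic_part, 0, 5 ) = "sun4v";
        }
    }

    # Overwrite the same bytes at the front of the file.
    sysseek $fh, 0, 0 or die "sysseek: $!";
    syswrite $fh, join("\n", @lines) or die "syswrite: $!";
    close $fh or die "close: $!";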

    Yes, pretty obscure stuff here, but I'm very bad at math. I try to avoid it as much as I can :)
