
can I seek properly on a huge device (/dev/sdaX)

by exodist (Monk)
on May 29, 2007 at 21:57 UTC (#618056, perlquestion)
exodist has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a project that requires me to seek to any part of a drive, then read/write. I am using sysseek, syswrite, and sysread. I have not gotten far enough to test on an actual drive and am using 1 GB disk image files. It occurs to me, however, that sysseek probably keeps track of its position in the file/device as an integer, and I know that an integer can only address so many bytes before it overflows. I could potentially be using this on files/devices larger than 2 TB. Do I need to be worried, or is the position stored in such a way that it does not matter?

Before I say why I need this: I know a lot of people will have opinions on whether this project is worth doing, and whether it is worth doing in Perl. Let me just say I am doing this, and I am doing it in Perl. I respect your opinions, but I am not asking for them. I simply have a question that I need answered; the following information is so that you know the context of the question. I am sorry if this seems rude.

I am writing a distributed, database-driven filesystem, but it also has a local disk filesystem component. I need to use Perl to directly access any given portion of a disk at any given time, and I am wondering whether the sysseek function is able to seek far enough.

Will this work?
I figure I will probably have to seek several times by a given quantity before reaching the desired byte. That is, if I need to address a byte that is more than an integer's maximum value away from the current position, I may need to seek a couple of times to reach it, passing an argument less than the maximum integer value each time.

Replies are listed 'Best First'.
Re: can I seek properly on a huge device (/dev/sdaX)
by BrowserUk (Pope) on May 29, 2007 at 22:07 UTC

    If the output of perl -V (capital V, the configuration summary) includes uselargefiles=define, you should be safe up to files of 1000 TB (a petabyte), as perl will use reals (doubles) to hold the positions, and they are exact for integers up to 53 bits.
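The same build settings can also be read from inside a script through the core Config module. The following is only a sketch: the helper name has_large_files is made up for illustration, and the keys shown (uselargefiles, lseeksize, ivsize, nvsize) are the ones that appear in the perl -V summary.

```perl
use strict;
use warnings;
use Config;    # core module exposing the build configuration shown by perl -V

# Hypothetical helper: true when this perl was built with large-file support.
sub has_large_files {
    return ( $Config{uselargefiles} // '' ) eq 'define';
}

printf "uselargefiles: %s\n", has_large_files() ? 'define' : 'not defined';

# lseeksize is the size in bytes of the file offset type; ivsize and nvsize
# are the sizes of perl's native integer and floating-point values.
printf "lseeksize=%s ivsize=%s nvsize=%s\n",
    map { $Config{$_} // '?' } qw(lseeksize ivsize nvsize);
```

If lseeksize reports 8, file offsets are 64-bit at the C level, and the practical limit is how precisely a Perl scalar can hold the number.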

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thank you. One question remains: do I need to navigate to a specific position in small jumps of less than the maximum integer value, or can I put a huge value (greater than the size of an integer) into the seek() call? Of course, the only situation I can think of that would allow this is passing a math formula as the seek length; after all, if I passed a variable it would be an integer or smaller, right?
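As a rough illustration of the point in question: a Perl scalar is not limited to a 32-bit integer, so a large offset can sit in an ordinary variable and be passed to sysseek in a single call. The sketch below assumes a perl built with large-file support; seek_abs is a hypothetical wrapper, not part of any module.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical wrapper: seek to an absolute byte offset in one call.
# sysseek returns the new position ("0 but true" for offset 0) or undef on failure.
sub seek_abs {
    my ( $fh, $offset ) = @_;
    my $pos = sysseek( $fh, $offset, SEEK_SET );
    defined $pos or die "sysseek to $offset failed: $!";
    return $pos + 0;    # numify "0 but true" to a plain 0
}

# A plain scalar holds offsets well beyond 2**32 exactly
# (integers up to 2**53 are exact even when stored as a double).
my $three_tib = 3 * 2**40;
printf "3 TiB as a byte offset: %.0f\n", $three_tib;
```

With a device opened read/write, seek_abs($fh, $three_tib) would then be one absolute seek rather than a series of smaller relative ones.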
Re: can I seek properly on a huge device (/dev/sdaX)
by halley (Prior) on May 30, 2007 at 14:02 UTC
    I hadn't thought about low-level disk access from perl before. I'm familiar with the "everything is a file" nature of the /dev tree under Un*x, but haven't really done much with direct access.

    The questions that come to my mind are whether you must/should do such accesses with the partition unmounted, and how you'd go about testing such modifications to sectors of an image which will be mounted as a filesystem. (And what filesystem(s) you're dealing with, out of curiosity.)

    [ e d @ h a l l e y . c c ]

      First off, let me say this is more a learning experience than anything else. If something works and is useful, great; otherwise it is purely educational. That being said, here is what I am doing:

      This is essentially a distributed filesystem with variable redundancy and 3 node types. A database system will keep track of everything.
      Files are stored either directly on a disk, using a filesystem I am writing into the program, or in a folder on an existing filesystem, or both. Nodes can choose independently, or even mix and match.

      The 3 node types are:
      Server - Server systems (there can be more than one) will try to maintain a copy of every file in the filesystem (if space allows, otherwise the most important ones). It also keeps track of what nodes are connected currently and which files they have. A server is allowed to maintain the only copy of a file, though generally at least one redundant copy of the file will be maintained on another node, more copies for files marked high importance.

      Client - A client will be considered unreliable, that is, it is not expected to be on or connected. This means it cannot maintain the only copy of a file. When a client needs a file it will first check whether it has a copy. If it does, great; otherwise it requests a copy from one of the servers, and the server then figures out the best way to get the file to the client (of the nodes that have a copy, whichever is under the lowest load and can send it - this may be changed to a torrent-like system where multiple systems contribute). If the client loses its connection to the servers and cannot re-establish it, it will be put into read-only mode until the connection is re-established.

      Ronin - The third type is an in-between. It is not considered as reliable as a server, but needs to maintain read/write access in the event of an outage. If files need a redundant copy and a second server is unavailable for the redundancy, then ronins can take up the slack. Ronins can also act as servers, but they will only track files they have copies of; in the event another file is needed, they revert to asking the servers.

      How files are tracked/stored
      All files are identified by a number. The number is the file's ID in the database.
      The database has the following tables: Instance and Occurence. The Instance table will keep track of every instance of the file in the overall directory structure. Its columns are similar to this (my notes are not handy):
      ID(8 bytes), NameLength(1 byte), FileName(255 bytes), Directory(8 bytes - Directories are also files, this 8 byte number is the ID for the directory), Permissions(2 bytes), Owner(4 bytes), Group (4 bytes), type (1 byte), Category (4 bytes), SubCategory (4 bytes), Importance(1 byte), A couple others not worth mentioning now
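A fixed-width row like this maps naturally onto pack/unpack. The sketch below is only one possible encoding: it assumes big-endian fields in the order listed above, leaves out the unnamed extra columns, and uses made-up names (pack_instance, unpack_instance, the hash keys). The 'Q' format needs a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Assumed layout: 8+1+255+8+2+4+4+1+4+4+1 = 292 bytes, big-endian.
my $INSTANCE_FMT = 'Q> C a255 Q> n N N C N N C';
my $INSTANCE_LEN = 292;

# Hypothetical helper: build one fixed-width Instance record from named fields.
sub pack_instance {
    my (%f) = @_;
    die "file name longer than 255 bytes" if length( $f{name} ) > 255;
    return pack $INSTANCE_FMT,
        $f{id}, length( $f{name} ), $f{name}, $f{directory},
        $f{permissions}, $f{owner}, $f{group}, $f{type},
        $f{category}, $f{subcategory}, $f{importance};
}

# Hypothetical helper: decode a record back into a hash of named fields.
sub unpack_instance {
    my ($rec) = @_;
    my %f;
    @f{qw(id name_len name directory permissions owner group
          type category subcategory importance)} = unpack $INSTANCE_FMT, $rec;
    # The name field is NUL-padded to 255 bytes; trim it using the stored length.
    $f{name} = substr( $f{name}, 0, delete $f{name_len} );
    return %f;
}
```

Because every record has the same length, record number n starts at byte n * 292 of whatever file or region holds the table.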

      The Occurence table keeps track of who has the files. Basically it is ID, Node (IP address).

      When files are stored in a directory on an existing filesystem it will basically write the files with the numeric ID as the filename, then in the same directory keep a file that holds the DB information for each file.

      Now the actual disk access stuff
      Sometimes an entire disk may be used. When this is the case we do not want the overhead of an existing filesystem, so I am writing my own local filesystem for such disks.
      First the disk is divided into 4 sections. The first section is the header: it stores the basic information of how the filesystem is configured/set up. It is a static size, so the next section comes immediately after it, on the next byte.
      The allocation section
      The allocation section is a mini-db. Each db row is 8 + 8 + 8 + 8 bytes (32). The first 8 store the file ID, the second 8 are the start position of the file on the drive, and the next 8 are the end position of the file on the drive. The last 8 are used if a file has been appended to or did not fit in an available space; they point to the next record in the allocation section that contains a part of the file (fragmented files, ick, but a necessary evil). This makes it VERY easy to iterate over the allocations (32 bytes each) to find the desired file by ID, then look up where on the disk it is stored. The allocation section immediately follows the header and is written from left to right (from the beginning of the drive moving towards the end).
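The allocation lookup described here could be sketched roughly as follows. This assumes each row is four 8-byte big-endian fields (ID, start, end, next), 32 bytes in total; the names pack_alloc and find_alloc are made up for illustration, and 'Q' needs a perl with 64-bit integers.

```perl
use strict;
use warnings;

# Assumed row layout: id, start, end, next; each an 8-byte big-endian integer.
my $ALLOC_FMT = 'Q> Q> Q> Q>';
my $ALLOC_LEN = 32;

# Hypothetical helper: encode one allocation row.
sub pack_alloc {
    my ( $id, $start, $end, $next ) = @_;
    return pack $ALLOC_FMT, $id, $start, $end, $next;
}

# Hypothetical helper: linear scan over an in-memory copy of the allocation
# section. Returns a hash ref for the first row with the given ID, or nothing.
sub find_alloc {
    my ( $table, $id ) = @_;
    for ( my $off = 0 ; $off + $ALLOC_LEN <= length($table) ; $off += $ALLOC_LEN ) {
        my ( $rid, $start, $end, $next ) =
            unpack $ALLOC_FMT, substr( $table, $off, $ALLOC_LEN );
        return { start => $start, end => $end, next => $next } if $rid == $id;
    }
    return;
}
```

A fragmented file would be followed by reading the row that the next field points to until that field holds whatever value is chosen to mean "no further fragment" (0 is one candidate, but that is a guess).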
      The DB section
      The DB section will be located at the end of the drive. Records will be written starting at the end of the drive and moving towards the beginning (they will still be read in the usual direction). Each time a file is added to the drive, a record is added to the db. Once again it is a static size, so it is easy to iterate. However, it is not part of the allocations, because we do not want to waste time iterating over all this information to find a file by ID when we probably already know the ID from what is kept in memory.
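Since the records are fixed-size and grow backwards from the end of the drive, the byte offset of the n-th record is simple arithmetic. A minimal sketch, with a hypothetical function name and the record length passed in as a parameter because it is not stated above:

```perl
use strict;
use warnings;

# Hypothetical helper: start offset of DB record number $index (0 = first
# record written), for records of $record_len bytes packed against the end
# of a device of $device_size bytes.
sub db_record_offset {
    my ( $device_size, $record_len, $index ) = @_;
    return $device_size - ( $index + 1 ) * $record_len;
}
```

Each record is still read forwards from its start offset; only the order in which records are laid out runs backwards.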

      The file section
      The file section is in between the allocation section and the DB section; this is where the files are stored. The first file written will be written starting directly in the middle of the section. When another file needs to be written, it will be written to one side of the 'clump' of files, choosing the side that has the most space between the clump and the next closest section.
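The placement rule can be sketched as a small function. This is only an illustration under assumptions: all names are made up, end offsets are exclusive, and falling back to the smaller side when the larger one cannot hold the file is a guess rather than something stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the start offset for a new file of $a{size} bytes.
# Arguments: sec_start, sec_end (file section bounds), clump_start, clump_end
# (current extent of stored files; omit both when the section is empty), size.
# Returns the start offset, or nothing if the file fits on neither side.
sub place_file {
    my (%a) = @_;
    if ( !defined $a{clump_start} ) {
        # First file: start directly in the middle of the section.
        my $mid = int( ( $a{sec_start} + $a{sec_end} ) / 2 );
        return $mid + $a{size} <= $a{sec_end} ? $mid : ();
    }
    my $left  = $a{clump_start} - $a{sec_start};    # free bytes before the clump
    my $right = $a{sec_end} - $a{clump_end};        # free bytes after the clump
    my @order = $right >= $left ? qw(right left) : qw(left right);
    for my $side (@order) {
        return $a{clump_end} if $side eq 'right' && $a{size} <= $right;
        return $a{clump_start} - $a{size} if $side eq 'left' && $a{size} <= $left;
    }
    return;
}
```

For a section spanning bytes 100 to 1100, the first 50-byte file lands at 600; the next one goes to the left of it at 550, because the left gap (500 bytes) is larger than the right gap (450 bytes).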

      There will also be a pruning system, so that when a disk starts to run low on space it will prune rarely used files. The servers will try to balance free space on disks against the need for at least one copy of all files, and more copies of important files.

      There are a few other things to address, obviously. When a file is pruned from the disk it leaves a free space inside the file clump, which is unusable without extra logic. I am working out the best way to approach this, and there are other issues as well, but for now I want to get some basic functionality going. While pruning does need to free the space on the disk, deleting a file from the overall fs will not free the space: the file will remain, just without any links. This allows for easy recovery of files; however, older deleted files may be pruned as space is needed.

Node Type: perlquestion [id://618056]
Approved by BrowserUk
Front-paged by Old_Gray_Bear