PerlMonks  

What would you like to see in a Virtual Filesystem for Perl?

by NERDVANA (Curate)
on Aug 21, 2023 at 21:23 UTC ( [id://11153988]=perlquestion )

NERDVANA has asked for the wisdom of the Perl Monks concerning the following question:

I have too many projects as it is, but I keep coming back to the idea that Perl ought to have a universal "virtual filesystem" module to abstract away some details of the platforms it runs on. There are a lot of ways this could go, but I have two main itches to scratch:

  1. Seamless support for Unicode file names in a Path::Class-like API.
  2. The ability to work with filesystems that may be backed by real files or by emulated filesystems, e.g. browsing zip files, ftp, webdav, iso9660, and so on, and the ability to merge them together like mounts on Linux, but without needing elevated privileges on the host system.

As it happens, there is a great CPAN namespace "VFS" that a similar-minded person uploaded in 2004 and then never finished an implementation of. I've reached out to him and it seems he might be open to the idea of handing it off to me to finish. Negotiations are ongoing.

But, before I touch such a great namespace, I'd like to collect ideas from more minds than just my own! Here are some important points that I am considering:

Unicode Filenames

On UNIX, filenames are just bytes. Unix people added unicode support through the use of "Locale" features, so that unicode-aware programs could try decoding the filenames according to the locale, but Perl does not respect the locale and always returns bytes from readdir / glob / readlink / getcwd. Also, in Perl, if you take a filename that is bytes which happen to be valid UTF-8, and then append Unicode to that string, the resulting string will not be usable as a filename. (it will flatten to bytes with a warning, but double-encode the high bytes you read from readdir, so the directory won't exist)
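The double-encoding problem described above can be reproduced without touching the disk. This is a minimal sketch: the directory name "café" and the appended file name are made-up examples, and `encode('UTF-8', ...)` stands in for what Perl effectively hands to the OS once a string has been upgraded to characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as readdir() would return them for a directory named "café"
# on a filesystem that stores names as UTF-8 (illustrative value).
my $dir_bytes = "caf\xC3\xA9";

# Naive: append a character string directly to the raw bytes.
# The two high bytes are now treated as the characters U+00C3 and U+00A9.
my $naive         = $dir_bytes . "/\x{263A}.txt";
my $naive_on_disk = encode('UTF-8', $naive);

# Careful: decode the bytes first, do all string work on characters,
# and encode exactly once at the boundary.
my $dir_chars       = decode('UTF-8', $dir_bytes);
my $careful_on_disk = encode('UTF-8', $dir_chars . "/\x{263A}.txt");

# Print both results as hex bytes for comparison.
for my $pair ([naive => $naive_on_disk], [careful => $careful_on_disk]) {
    my ($label, $bytes) = @$pair;
    printf "%-8s %s\n", $label,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```

The naive path no longer starts with the bytes that readdir returned, so the directory it names does not exist.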

On Windows, Perl uses the legacy ANSI API rather than the wide-character API, so the bytes you get from readdir depend on the Windows code page. This can work if the program is configured to run in the UTF-8 code page, but that is almost never the default, so most people get garbage when they read Unicode filenames under Windows, and have to do a lot of studying before they can make it work. Even if you do have the UTF-8 code page, you are still left with the same mess that you would have on Unix.

There are other filesystems where path names belong to known character sets, and are not left to guessing with locales. For instance, with iso9660 you know from the metadata which character set is being used, and Locale doesn't enter into it. A module that walks an iso9660 filesystem should always be understood to return Unicode names, and not get tangled up with the program's Locale settings.

Proposal: While I might like a mode in Perl where readdir() returns Unicode, I suspect doing that on a global basis would break things too much, so I think a better solution is to have a Path::Class / Path::Tiny themed module where it is understood that all names given and returned will properly respect unicode. By using this module, authors can be assured that their code will work properly when presented with non-ascii directory and file names, and work cross-platform.

Virtual Filesystems

There are lots of great reasons for wanting virtual filesystems in the host, like FUSE modules, but why should we have them inside Perl?

  • Avoid messing with the Host:

    Let's say you want to walk a tree of a Git filesystem. You could check out a git branch, but that uses extra disk space, and if the program crashes it might leave behind files that need to be cleaned up. You could FUSE-mount the git branch as a mounted filesystem, but if the program crashes you'd leave behind a mount point, which could cause even more trouble (such as preventing unmounting of the parent volume). You could use a Git API for it, but then you have to use an unfamiliar API, and maybe it isn't as advanced as your favorite File::Find module. Having a "virtual filesystem" in perl would solve this, as long as your favorite File::Find module could be pointed at it. If "VFS::Path" happened to have your favorite API for traversing trees, that would solve the problem.

  • Abstracting the Files Being Served:

    If you write a server for e.g. WebDAV or SFTP, the first thing those modules need is a data store of files to serve. Those modules then probably also offer you back-end hooks to handle what happens when users upload a file or want a directory listing. If there was a standard VFS for this, we could seamlessly plug together the modules that serve files with the modules that provide views of filesystems without doing a bunch of messy integration. Also, if the VFS module could be trusted to not allow symlinks to escape a designated sub-tree, that would help with security when writing these sorts of modules.

  • Minting Root Filesystems:

    I often want to create root-level tarballs of things like device nodes or root-owned files. Currently, I need to run my perl scripts as root just to be able to create the tree to pass to tar. But, it should be possible to specify these details in memory and write out the tar file directly without ever touching the filesystem metadata. A VFS module in perl userspace would allow the code designed for writing the real filesystem to write to a tar file instead, and without root access.

  • Virtualizing old code that expects root access:

    If the VFS was also able to intercept core perl file operations, you could take old perl code that expects to perform operations on root-owned files, and have that code instead modify in-memory simulations of those file systems. This could be handy for unit tests, or just for adapting old code without a rewrite.
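As a point of reference for the "Minting Root Filesystems" item, part of this is already possible with the core Archive::Tar module, which accepts ownership and entry-type metadata for data that only exists in memory. This is a minimal sketch, not the proposed VFS API; the file names and contents are placeholders, and the device numbers assume the Linux convention for /dev/null.

```perl
use strict;
use warnings;
use Archive::Tar;
use Archive::Tar::Constant ();    # provides the CHARDEV entry type

my $tar = Archive::Tar->new;

# A root-owned, mode 0600 file, described purely in memory.
$tar->add_data('etc/secret.conf', "placeholder contents\n", {
    uid => 0, gid => 0, uname => 'root', gname => 'root', mode => 0600,
}) or die $tar->error;

# A character device node (major 1, minor 3 is /dev/null on Linux).
$tar->add_data('dev/null', '', {
    type     => Archive::Tar::Constant::CHARDEV(),
    devmajor => 1, devminor => 3,
    uid => 0, gid => 0, uname => 'root', gname => 'root', mode => 0666,
}) or die $tar->error;

# With no file name, write() returns the whole archive as a string,
# so nothing is created on the real filesystem and no root access is needed.
my $archive_bytes = $tar->write;
printf "archive is %d bytes\n", length $archive_bytes;
```

What a VFS would add on top of this is the ability to reuse code written against a normal filesystem API, rather than code written specifically against Archive::Tar.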

Proposal:
To deal with all of these, I think the virtual filesystem should have independent filesystem objects, so they aren't all interconnected by default, and then an optional ability to use one of them to override the core perl IO operations. Each filesystem should have the ability to mount other filesystems at arbitrary paths, and each should have the ability to derive a "chroot" filesystem from an arbitrary path.
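To make the shape of this proposal concrete, here is a minimal sketch of one independent filesystem object that keeps its own current directory, normalizes paths so that ".." can never climb above its root, and picks a mounted backend by longest matching prefix. The package name `Sketch::FS`, its methods, and the use of plain strings as "backends" are all hypothetical and are not taken from the draft POD.

```perl
use strict;
use warnings;

package Sketch::FS;    # hypothetical name, for illustration only

sub new {
    my ($class, $root_backend) = @_;
    return bless { cwd => '/', mounts => { '/' => $root_backend } }, $class;
}

# Attach another backend at an absolute path inside this filesystem.
sub mount {
    my ($self, $point, $backend) = @_;
    $self->{mounts}{$point} = $backend;
    return $self;
}

# Turn any path into a normalized absolute one, relative to this object's
# own cwd. ".." is resolved lexically and is clamped at "/"; symlinks in a
# real backend would need separate handling.
sub absolute {
    my ($self, $path) = @_;
    my @parts = ($path =~ m{^/}) ? () : grep { length } split m{/}, $self->{cwd};
    for my $seg (split m{/}, $path) {
        next if $seg eq '' or $seg eq '.';
        if ($seg eq '..') { pop @parts; next }
        push @parts, $seg;
    }
    return '/' . join('/', @parts);
}

# Find the backend responsible for a path: the longest mount point that
# matches on a path-component boundary wins. Returns (backend, inner path).
sub resolve {
    my ($self, $path) = @_;
    my $abs = $self->absolute($path);
    for my $point (sort { length($b) <=> length($a) } keys %{ $self->{mounts} }) {
        my $backend = $self->{mounts}{$point};
        return ($backend, $abs) if $point eq '/';
        return ($backend, '/')  if $abs eq $point;
        return ($backend, substr($abs, length $point))
            if index($abs, "$point/") == 0;
    }
    return;
}

package main;

my $fs = Sketch::FS->new('native')->mount('/archive', 'zip');
$fs->{cwd} = '/archive/docs';
my ($backend, $inner) = $fs->resolve('../src/./main.c');
print "$backend $inner\n";
```

Because each object owns its mount table and cwd, two such objects never see each other's mounts unless one is explicitly mounted inside the other.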

Problems

  • Windows has a concept of "volumes", and Unix does not. Should the VFS have a concept of volumes-per-filesystem? or a concept of global root volumes which virtual filesystems can be mounted on? Or skip volumes entirely and let that be a "Windows user problem"? I'm leaning toward volumes-per-filesystem where most filesystems just have a default volume of '' (empty string) and then design the API in a way that avoids referencing volume name most of the time.
  • In order to fully emulate the real filesystem using a virtual filesystem, I will need to track the "current directory" independently from the real filesystem. That way relative paths will resolve correctly. This also means I will need to fully resolve relative paths before using them (which adds overhead, but also helps with implementing chroots). I think in the case where the filesystem is the real filesystem with no mounts and no chroots, I can optimize by using the real "chdir" and passing relative paths to the OS, avoiding the overhead. Thoughts?
  • In order to override core perl file operations, I think I need XS. I can override CORE::GLOBAL::..., but if a module e.g. has a method named "open" then they will use "CORE::open" any time they need the one that isn't their own method, and that defeats the override I would make of CORE::GLOBAL::open. Even then, overriding PerlIO in XS won't help for XS modules that use other C libraries to open files. I'm not sure how successful this feature would be overall. Thoughts?
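The limitation of CORE::GLOBAL described in the last item can be shown in a few lines. This is a minimal sketch: the counter variable is made up for the demonstration, and an in-memory scalar is opened so that nothing touches the disk.

```perl
use strict;
use warnings;

our $intercepted = 0;

BEGIN {
    no warnings 'once';
    # Global override: affects every plain open() compiled after this point.
    *CORE::GLOBAL::open = sub (*;$@) {
        $intercepted++;
        # A real VFS would decide here which backend should handle the path.
        return CORE::open($_[0], $_[1], @_[2 .. $#_]);
    };
}

my $data = "hello\n";    # in-memory "file"

# A plain open() goes through the override.
open(my $via_override, '<', \$data) or die "open: $!";

# An explicit CORE::open() always calls the builtin and skips the override.
CORE::open(my $direct, '<', \$data) or die "CORE::open: $!";

print "intercepted $intercepted of 2 opens\n";
```

Any module that spells out `CORE::open`, for the reason given above or any other, is invisible to this kind of override, which is why a lower-level hook would be needed for full coverage.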

Prior Work

I'm not the first one with this idea, of course. So far, I've found:
  • Filesys::POSIX

    This module implements a full POSIX virtual filesystem, though as the name implies, it does not handle any Windows concepts like volumes or alternate path separators. It makes the odd choice to throw exceptions for failed operations, including 'stat' which many users would use to test for existence of files. Tests currently fail on BSD and Win32. Aside from these problems, it is a very complete implementation. Oddly, there don't seem to be any CPAN plugins built on it.

  • Filesys::Virtual

    This module intends to be a VFS, but lacks any specification of how the API should behave, and was last updated in 2009. It also lacks an API for file ownership (chmod etc). CPAN has implementations for SSH, DAAP, and a FUSE adapter to use it as the back-end for a real mounted filesystem.

  • VFSsimple

    Very sparse API (insufficient for most uses), and last updated 2007. CPAN has implementations for ISO, FTP, HTTP, and "rsync" (which just uses rsync to clone a remote file system locally)

  • File::Redirect

    Same idea of redirecting global PerlIO into a module, but the implementation is limited to stat / open / close, uses XS, doesn't work on perls newer than 5.20, and was last updated in 2012. It comes with support for mounting Zip files into the virtual filesystem.

What Am I Forgetting?

So, if you made it through all of that, what I'm looking for are ideas! What am I forgetting? What other features would you like to see? What do you feel are deficiencies in the current popular path modules like Path::Class or Path::Tiny? Should I just be building on some other CPAN module?

Also, I wrote a rough draft of the POD for such a module at https://github.com/nrdvana/perl-VFS/blob/main/lib/VFS.pm


Replies are listed 'Best First'.
Re: What would you like to see in a Virtual Filesystem for Perl?
by Corion (Patriarch) on Aug 22, 2023 at 07:34 UTC

    I'm toying with Filesystem-like systems and the one thing I'd really like is a better abstraction for File::Find / File::Find::Rule. Especially being able to formulate queries as SQL is a very nice thing that allows interesting queries that would have to be programmed otherwise. Having queries for the file content is another interesting thing but might go too far. Adding content-queries is just a single attribute in the query language anyway.

    Another thing is being able to nest VFSes, so that I can treat archives as directories, and I can access files over ssh / sftp, and I can also access archives as directories via ssh.

    My two attempts at this live at FFRIndexed and Filesys::DB.

      I'm all in favor of adding lots of shortcuts to other modules, like how Path::Class does for File::Temp and so on.

      The first idea I get from looking at your modules would be something like making DBIC resultsets out of file trees, like

        $fileSet= $path->find(
          name => qr/.../,
          size => '>=4000',
          dir_filter => sub($d) { ... },
        );
        # Throws an exception unless MIME::Detect is installed
        $fileSet= $fileSet->find(mime_type => 'text/plain');
        my $iter= $fileSet->iter; # depth-first, unless parameter "bfs" given
        while (&$iter) { ... }
        # if the list is known to be small
        my @files= $fileSet->all;

      Then your module could adapt those into a SQL query and cache a directory tree, and then use the same path objects to query that database.
Re: What would you like to see in a Virtual Filesystem for Perl?
by afoken (Chancellor) on Aug 22, 2023 at 20:40 UTC

    I don't think Perl should implement a VFS, simply because there are so many filesystems around, and as far as I understand your idea, that would require one module per filesystem. wc -l /proc/filesystems on a random Debian box shows 33 filesystems, including FUSE, which may be used to implement many more filesystems, perhaps even homegrown ones. Also, you need to know which filesystem is mounted in each and every directory, and you will probably also need to know the mount options (Linux can mount FAT and friends with different codepages, see mount).

    I would like to see a different approach: A (maybe highly magical) use unicodepaths; that makes all filesystem functions (limited to a scope) accept and return Unicode strings.

    As far as I understand Windows, this would essentially mean switching from the legacy ANSI API to the Wide (Unicode) API. Windows would perhaps be a good testbed for that switch, as it has an API that explicitly expects and returns Unicode.

    For Linux and other Unix systems, some more thinking is needed. You basically need to know if the filenames are just bytes or if they are encoded in UTF-8.

    Perhaps just guessing and trying to convert may work well enough for Unix:

    Any filename returned from the operating system should be treated as bytes, unless unicodepaths is active. If unicodepaths is active, try to decode the bytes as UTF-8. If that succeeds, use the result as Unicode string. If that fails, keep the bytes as-is, and don't set the UTF-8 flag on the returned filename.

    Any filename passed to the operating system should be encoded to a UTF-8 byte stream if unicodepaths is active and the filename has the UTF-8 flag set. If unicodepaths is not active and/or the filename has the UTF-8 flag cleared, no encoding should happen. If unicodepaths is not active but the filename has the UTF-8 flag set, a warning should be issued.

    (That warning does not seem to happen on my Debian box: perl -w -E '$fn="x\x{ABCD}"; open my $f,">",$fn; say $f "hi"; close $f;' does not warn at all. Perl is v5.32.1 for x86_64.)

    Both combined should allow Perl to see Unicode where Unicode happens, while not messing with the encoding for non-Unicode filenames.
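The two rules above can be sketched as a pair of helper functions. This is only an illustration of the guess-and-decode idea, assuming unicodepaths is active; the function names `name_from_os` and `name_to_os` are hypothetical, and no such pragma exists.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Filename coming from the OS: decode it if it is strictly valid UTF-8,
# otherwise hand back the original bytes untouched.
sub name_from_os {
    my ($bytes) = @_;
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $text ? $text : $bytes;
}

# Filename going to the OS: encode only strings that carry the UTF-8 flag,
# and pass everything else through as bytes.
sub name_to_os {
    my ($name) = @_;
    return utf8::is_utf8($name) ? encode('UTF-8', $name) : $name;
}

# Round trip for a UTF-8 name and for a Latin-1 name that is not valid UTF-8.
for my $bytes ("caf\xC3\xA9", "caf\xE9") {
    my $name = name_from_os($bytes);
    printf "%d bytes in, %d characters, round trip %s\n",
        length $bytes, length $name,
        name_to_os($name) eq $bytes ? 'ok' : 'BROKEN';
}
```

Note that the two example names compare equal as Perl strings after `name_from_os` and differ only in the UTF-8 flag, which shows how much this approach leans on that flag.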

    Maybe this idea needs some more relaxed encoding of UTF-8 to allow a round-trip of any random bytes in a filename.

    Maybe this idea needs to split paths and handle each element of the path separately.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      as far as I understand your idea, that would require one module per filesystem. wc -l /proc/filesystems on a random Debian box shows 33 filesystems, including FUSE

      Not quite; my idea is that this is all represented by "Unix Native Filesystem" because they all share the same API for querying the files. So, in its default state, VFS would just pass through to "Unix Native Filesystem" or "Windows Native Filesystem" and essentially provide nothing but unicode handling for them.

      The multiple-filesystem aspect comes into play when you want to do something like browse a zip file: open(my $f, "<", "~/example.zip/path/to/Foo.txt"). Currently, your only option for that is a FUSE module like fuse-zip. The downside is that you need to install a set-uid program for that, and the mounts of zip files are visible system-wide to all users, modulo permissions. I would much rather have the zipfile "mounted" exclusively inside the perl interpreter.
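For comparison, reading a single zip member inside the interpreter is already possible with the core IO::Uncompress::Unzip module, just not through a path passed to open(). This is a minimal sketch that builds a throwaway archive in memory so it is self-contained; the member name reuses the one from the example above and the content is made up.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

# Build a small zip archive in memory so the example needs no files.
my $content = "Hello from inside the archive\n";
my $zip_bytes;
zip(\$content => \$zip_bytes, Name => 'path/to/Foo.txt')
    or die "zip failed: $ZipError";

# Open one member as a read handle: no mount point, no set-uid helper,
# and nothing visible to other users of the system.
my $fh = IO::Uncompress::Unzip->new(\$zip_bytes, Name => 'path/to/Foo.txt')
    or die "unzip failed: $UnzipError";
my $first_line = $fh->getline;
$fh->close;

print $first_line;
```

A VFS layer would hide the split between the archive and the member name, so that calling code could keep using one path string.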

      For Linux and other Unix systems, some more thinking is needed. You basically need to know if the filenames are just bytes or if they are encoded in UTF-8. Perhaps just guessing and trying to convert may work well enough for Unix

      I thought I summed this up about how Unix uses 'locale', but I might be wrong! I can't find any reference to an official standard for respecting LC_ALL in path names. I've decided to make a "Meditation" about it. Coming up soon...

      Edit: Meditation complete! (wow that used up most of my evening...)

Re: What would you like to see in a Virtual Filesystem for Perl?
by zmughal (Acolyte) on Aug 23, 2023 at 13:43 UTC
    This is a great idea. I remember wanting to have VFS support for some applications before, but didn't fully investigate what the CPAN modules out there can do. I just want to point out that it might be worth looking at Tcl's VFS implementation, which is a core part of the language, meaning it can be used at the C level and at the Tcl level. Another thing to look at is the Glib VFS library, but obviously Glib is not going to be easily available everywhere.
Re: What would you like to see in a Virtual Filesystem for Perl?
by etj (Deacon) on Mar 17, 2024 at 22:25 UTC
    Things you may not have seen are:
    • a virtual filesystem in the Make tests: https://github.com/klp2/Make/blob/master/t/make.t (the Make module has an abstracted filesystem to make testing it practicable without the File::Temp torture needed in EUMM testing)
    • a virtual sort-of filesystem in App::cpanel which implements an async sort-of WebDAV-like thing to do two-way mirroring of a cPanel website
    The latter I wanted to make more general as a "mirror this into that", so the "this" and "that" could be things like a remote website, or an XML document, or a database, or... But I haven't done it yet.
Re: What would you like to see in a Virtual Filesystem for Perl?
by mwray (Initiate) on Mar 09, 2024 at 16:35 UTC
    I'd like to see this with the ability to implement userspace sshfs export on specified directories, or at least hooks to do so. I'm trying to work on a way to remote into a machine and run tools on the remote side, but have the output redirect to the calling machine, especially in an environment where the remote machine may be gapped, but it may be necessary to export logs for further analysis. In some cases logs would need to be extracted from dbs, and in some cases the logs are in text files. Many times, the analysis cannot be done on the remote host as there isn't enough space to do so without causing problems. And sometimes there's not enough space to actually collect the logs into a bundle.
      I'm not quite sure how that would work. Do you mean that you want a remote perl script to reach backward across an SSH connection to write files onto your workstation?

      I know the SSH2 protocol can do some fancy things, but I haven't studied the details. I think the closest you could get with existing tools is to connect to that remote system with a port-forward of your own host's SSH port.

      ssh -R 54321:localhost:22 user@remote_system

      Now, the remote system can ssh back to you on port 54321. Then you could run sshfs on the remote system, and have your local files appear mounted on that remote system, where your application could write its log files. However, I don't know if sshfs is fancy enough to forward the lines of logging as they get written - it might do something like cache them locally until the file is closed then xfer the whole file.

      I think your best approach would be to make the remote application have the option of logging everything to stdout, instead of files. Then you can just see it on your existing ssh connection.

Re: What would you like to see in a Virtual Filesystem for Perl?
by harangzsolt33 (Chaplain) on Mar 20, 2024 at 13:39 UTC
    First of all, what is a virtual file system? How do you define it? Is it like a ramdrive? Is it just another archive file format? Is it a file system stored in a file like iso files? Is it an alternative or advanced version of the "mount" command which allows you to connect hardware devices to your linux file system? My next question would be: why? Who needs a virtual file system and why? For what purpose?

      It's an abstraction on the filesystem layer of your perl script, so that you can plug in other behavior than actually reaching out to the OS to access real files.

      You'd want it for all the same reasons that you might use sshfs, gitfs, mount.cifs, etc. I thought I described the advantage of having it in userspace instead of kernel space fairly well in my post...?

        I see... Interesting. So, does this mean you have to overload builtin functions like open() and seek() to make this work?
      First of all, what is a virtual file system?

      You could have looked this up in less time than it took to post these questions, answering literally all of them....
        My take on their post was that they wanted to better understand what this was about.
