Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

RFC: Fuse::DBI - mount database as filesystem

by dpavlin (Friar)
on Oct 09, 2004 at 00:59 UTC ( #397814=perlmeditation: print w/ replies, xml ) Need Help??

From time to time I wanted to access data in my database as a filesystem. Wouldn't it be nice to just say

vi /db/invoice/2004/01/03/paid.txt
To edit one field instead of writing SQL? But, after installation of WebGUI I realised that I have to do something. All templates are stored in database, and edit box is so small (and without support for vi key bindings :-).

At about same time, I stumbeled upon Fuse project with it's perl bindings. I had idea, and it was simple: write a bridge between database and filesystem.

That bridge is called Fuse::DBI. I just announced it on WebGUI site, but now I have doubt: is this module right material for CPAN?

I should stress that this module isn't general purpose database to filesystem bridge. It allows user to export one part of database (usually one or more tables) as a filesystem (much like /proc filesystem on Linux).

This is done with three SQL queries: one to return filenames, lenghts and write permission, another to return content and third one to update content. It also has optional code hook to invalidate cache (on disk cache created from database for example) if needed.

Advantage of this approach is ability to use different filesystem tools to access data from database without writing specific support for each of them (like editors, ftp servers and so on).

On one hand, we have binding of Lufs (other project with similar goal) on CPAN allready. On other hand, Lufs module is binding for lufs, not module which uses another perl module which isn't on CPAN.

What is best practice for situations like this? Do you like idea to mount database as filesystems?
Please be gantle. I don't think that code is best example how to write perl module, partly because of functional interface in Fuse module itself. But, result is so interesting that I wanted to share it.

Updated 2004-10-09: added explanation about usage of this module.

2share!2flame...

Comment on RFC: Fuse::DBI - mount database as filesystem
Download Code
Re: RFC: Fuse::DBI - mount database as filesystem
by mpeppler (Vicar) on Oct 09, 2004 at 07:58 UTC
    Note: I haven't looked at the code of your module...

    Sharing is good, even if I personally don't find the idea of mounting a SQL database as a filesystem appealing or even useful - I see too many portential issues (locks, concurrent access, memory use, not to mention the whole concept of "relational" which would go out the window.)

    But I wouldn't be surprised if others found this appealing.

    Michael

      I guess that my explanation wasn't clear enough: this is not general purpose database to filesystem module: it just allows user to write one SQL query to fetch files (with optional slashes in name to create directory), one SQL query to fetch content and one to update it.

      This design doesn't allow you to mount whole database as filesystem. It allows you to export one part of data which might be easier editable using filesystem.

      Concurrent access and locking is solved using transactions. Every write commits transaction to database (which is started at file open). That should solve problems of concurrent access through application and filesystem.

      I don't have problem with sharing my code (I wouldn't be writing this if I had :-), but I wondered if uploading to CPAN is the right way to share code that depends on other perl modules not available on CPAN.


      2share!2flame...
Re: RFC: Fuse::DBI - mount database as filesystem
by cchampion (Curate) on Oct 09, 2004 at 08:44 UTC

    DBIx::FileSystem does something very similar to what you say.

    Be aware that both systems fail for the very same reason, i.e. that tables aren't files.

    Information in relational database tables often does not make sense unless it is taken together with information in other tables with complex joins and search conditions.

    Ancient hierarchical databases could have this one-to-one similarity, but they failed because searching through them was only available in predefined directions, while relational DBMS systems allow greater flexibility.

    Even if you find a way of reducing complex queries to a simple file interface, so that I can see a financial report as a file and edit it with Vi, such interface would need to have several well written queries in the background, and this leaves us with the same dilemma we had with hierarchical databases, i.e. without a pre-defined path, we can't have an answer to our questions.

    I'd rather write my own queries in Vim and use a sensible script to fetch the results.

      I also think that KISS principle is best. So, as noted before, this module doesn't try to export complex relations in database. User have to create queries to export just part of database in which he or she is interested (good example included in distribution is WebGUI script which exports just templates). It's very much like vim script that you refer to (which is very useful if you just want to edit data using vim, ++ for that).

      But, having part of database as filesystem allows you to use all filesystem utilities like grep, cp (rm and mv are not supported at the moment) as well as different editors, ftp or http servers and so on. You can even create tar of your data before you start to edit it, just as backup.


      2share!2flame...
Re: RFC: Fuse::DBI - mount database as filesystem
by zentara (Archbishop) on Oct 09, 2004 at 15:03 UTC
    People have been talking about implementing this in linux for years. Go to http://groups.google.com and enter "linux sql filesystem" into the searchbox. Right now, it's about at the same state of development as the crypto-filesystem was 3 years ago. It will be here eventually.

    I'm not really a human, but I play one on earth. flash japh
Re: RFC: Fuse::DBI - mount database as filesystem
by fergal (Chaplain) on Oct 09, 2004 at 15:27 UTC
    Update: This node seems to have been downvoted quite a few times (it's hard to tell how many) but has no replies. Anyone care to explain to me what is the fundamental flaw in the scheme below? I'm not saying it's the "one true way" to access a database or anything, just that many dbs have a hierarchical structure embedded in them and so being able to access that through a hierarchical interface (a filesystem) makes a lot of sense, to me anyway.

    I'd says stick it up there. The more the merrier. I certainly would like to see it even just as an example to base other work on.

    Some other comments have pointed out that tables do not map to a filesystem but I don't agree. A table maps perfectly to a directory full of subdirectories (one for each row in the table). Each name of each subdirectory should correspond to the primary key for that row. Each row's directory would have a file for each field which you can edit directly. Any field that is a foreign key, referencing another table could have 2 entries, one for the actual value and another that is a symlink to the row in the table that is referenced.

    For example here's the table definitions.

    CREATE TABLE persons (name CHAR(50) PRIMARY KEY, company char(50) refe +rences companies.name); CREATE TABLE companies (name CHAR(50) PRIMARY KEY, business char(50));
    now wandering around the filesystem would look like
    > cd db > ls persons companies > ls persons Fergal Daly Bill O'Brien > cd "persons/Fergal Daly" > ls name company company:row > cat company Blogtronic > ls -l company:row cmopany:row -> ../../companies/Blogtronic > ls company:row name business > cat company:row/business Maker of electric lawnmowers and dangerous toys
    tables map just fine onto a filesystem. You could even get fancy and provide automatic reverse lookups on foreign keys
    > cd companies/Blogtronic > ls name business :referers > ls :referers persons > ls referers/persons company > ls referers/persons/company Fergal Daly ... other rows of persons who's company field references Blogtronic's +primary key
    This allows very compfortable wandering around the database, editing fields etc. It's not a great interface for queries though. I think this is actually something like reiserfs's goal.

      I actually upvoted it and also saw that it had bad reputation. My academic education included RDBMS from theory to practice, and I have been working with Open Source RDBMS them since 1996 or so (does anybody remembers msql?). So, I think that my expirince is a bit towards hard-code RDBMS through SQL crowd as well.

      During design phase of Fuse::DBI , I though about RDBMS mapping to filesytem and concluded that model like yours is extremly powerful in presenting data from database. I even planned to impement it just as you describe (actually, simplier, I really like idea about how to use symlinks). But, editing problems (especially adding data) stopped me from using this approach and selecting much simplier one. Why should I browse through files if I know how to write SQL to select/update data?

      Having said all this, I must stress that I can understand why there is negative feeling about exploring database through filesystem in general sense, but having simple and defined interface to it might be of good use. It's almost like SOAP vs REST interface. SQL vs filesystem is same: one if better for some application while other is for others.


      2share!2flame...

      A databse is not strictly a heirarchal tree. This is a rather common mis-application of a tree-type datastructure.

      A tree is a conceptually simple datastructure. There's a root node with one or more children, and each child has one or more children, and so on. Strictly speaking, trees are acylic (this is where problems arise--read on). It's a good datastructure for a lot of problems, but it is not the only one, nor is it the best solution to all problems.

      Relational Theory is based on set theory. You can implement a set as a tree, and relational databases will support it (though you have to do some hoop-jumping to get SQL to handle it). However, there are many other structures you could also build into a relational database (ones which SQL is also much better equipted to handle, though this is SQL's failure, not Relational Theory).

      IMHO, people over-apply trees, and then try to introduce hacks when the structure doesn't fully map to the solution. For instance, symbolic links in a filesystem could be considered an attempt to coerce a tree structure into a general set structure. Symbolic links mean that your file tree is no longer acylic.

      What I really want to see is the opposite direction the OP was thinking of: make the filesystem a relational database. Kill off the tree structure entirely, and lump all the files together. You get specific files by running some query language against them. There are projects doing something more or less like this (Microsoft's WinFS and ReiserFS 4, to name two), but nothing usable on a production system yet.

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

        A databse is not strictly a heirarchal tree. This is a rather common mis-application of a tree-type datastructure.

        I never said it was a hierarchical tree, I said "many dbs have a hierarchical structure embedded in them" and since the filesystem is probably the most familiar heirarchical structure to most users, providing a filesystem interface is a useful thing.

        I'm going to expand my claim and say that the FS interface is good not just for databases that have trees but for databases with general graphs also.

        As I said, I'm not advocating this as a general DB access method, it's terrible for queries that involve large amounts of data. However it's great for queries that use small amounts of data but span many tables. SQL queries generally lack state/context whereas a fileystem (well actually I should say a shell) has a very useful, very intuitive piece of state - the current working directory. This allows you to wander around. Using an SQL interface, wandering around is a pain.

        Consider a database of people who may or may not be related to each other in various ways, it's schema is

        CREATE TABLE person ( name CHAR(50) PRIMARY KEY, spouse char(50) references person.name, father char(50) references person.name, mother char(50) references person.name, best_friend char(50) references person.name, boss char(50) references person.name );
        Now I want to know who is Larry Wall's spouse's best friend's mother's boss's father.

        I'm not even going to attempt to write the SQL, it's horrendous. It requires either 5 separate queries with lots of copy and pasting of names or a 5-fold join on the persons tables.

        The virtual fs just needs

        cat "persons/Larry Wall/spouse:row/best_friend:row/mother:row/boss:row +/father:row/name"
        and I get to use tab completion all the way.

        There are graphical DB explorers that allow you to do the same thing but the problem with them is that they generally do not interface to anything else, a filesystem interfaces to pretty much everything.

        Notice that all my relations were 1-1 (actuall n-1 would be ok too), this means that I can write "person/Larry Wall/spouse:row/name" and get a unique answer. I haven't thought too much about how to represent 1-n relations. It requires building a way of expressing more complex queries, which could be done - persons/Larry Wall/children:with/hair=brown - but that might be a bit crazy.

        Whether using symlinks is a good idea or not I suppose is debatable, they are not necessary. Since the filesystem is virtual, where I wrote

        > ls -l company:row cmopany:row -> ../../companies/Blogtronic > ls company:row name business
        it could easily have been
        > ls -l company:row cmopany:row # not a symlink > ls company:row name business
        that is company:row is presented as a real directory which exists below the "Fergal Daly" row and appears to exist independently of the table from which it comes. There are advantages and disadvantages to this approach. Without symlinks, our current working directory encodes the path we took to get where are however if we have done a lot of cding then our path will get ridiculously long and may cause problems. Using symlinks keeps the current directory nice and short but throws away the history of how we got here, all we know is the name of the current table and the value of the primary key, we know nothing about what relations we followed to get here.

        I am also (nervously!) looking forward to database-backed filesystems, they'll make organising your files a lot easier but they worry me, simply from a way-too-complicated point of view. I've long wanted a system where my files are stored on a "real filesystem" but I can also access them through a virtual filesystem which exploits lots of metadata to present a far more flexible view of my files than a simple tree.

        Strictly speaking, trees are acylic (this is where problems arise--read on).

        I'm not sure what problems you mean here, the only thing I can think of is your reference to symlinks turning a tree structure into a graph but I'd say that's a solution, not a problem.

        "A databse is not strictly a heirarchal tree."

        Neither is a traditional UNIX file system. A UNIX filesystem is a directed graph. It's only the "no hardlinks to directories" heresy that's keeping you from being able to descibe arbitrary networks of relations in terms of filesystems: every table a directory, every key a link.

        You can use symlinks instead of hardlinks, if you're scared of programs like "find" going all pear-shaped following hard links. Old school UNIX hackers aren't afraid of tickling the dragon's tail.

Re: RFC: Fuse::DBI - mount database as filesystem
by SpanishInquisition (Pilgrim) on Oct 11, 2004 at 17:09 UTC
    The benefit of SQL is in running complex and arbitrary queries, while a filesystem is limited to queries based on less variables.

    It sounds like you are just looking for a way to simplify database access. But what's the point if you can't find all level 3 accountants not within 3 years of retirement who live in Ohio that have not yet been to training class 256 in the last 18 months and have not had training class 214 in the last 12 months? Sounds silly? Databases can do that. Filesystems can't. I'd absolutely hate to see a filesystem implementation of that query -- why, because it would make one's brain explode. Databases involve more complex data structures than simple hierachies.

    Consider effiency lost -- if you disagree, please justify the running time of the algorithm you would use to search a filesystem dump of a database.

    Proc is one thing, but proc isn't trying to be a SQL database. They serve different ends.

    If you want to write a filesystem driver to interact with a database, that might be interesting ... kind of Plan 9-ish.

      It does not matter that you cannot have so complicated queries in the filesystem interfaces. It is not about emulating databases by filesystems it is about connecting tools working on filesystems, like a text editor, to a database.
        Well, that's the point of proc. I fail to see the need for a text editor to edit individual keys of individual tables directly, if that's the case, I'll pull up psql or equivalent...

        Single entry editing by a text editor is never used, especially when you are editing records...and what about binary data in records, etc? If you have one record per file that could get ugly...

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://397814]
Approved by blokhead
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (9)
As of 2014-08-29 20:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (289 votes), past polls