Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Re: RFC: Fuse::DBI - mount database as filesystem

by fergal (Chaplain)
on Oct 09, 2004 at 15:27 UTC ( [id://397874]=note: print w/replies, xml ) Need Help??

in reply to RFC: Fuse::DBI - mount database as filesystem

Update: This node seems to have been downvoted quite a few times (it's hard to tell how many) but has no replies. Anyone care to explain to me what is the fundamental flaw in the scheme below? I'm not saying it's the "one true way" to access a database or anything, just that many dbs have a hierarchical structure embedded in them and so being able to access that through a hierarchical interface (a filesystem) makes a lot of sense, to me anyway.

I'd says stick it up there. The more the merrier. I certainly would like to see it even just as an example to base other work on.

Some other comments have pointed out that tables do not map to a filesystem but I don't agree. A table maps perfectly to a directory full of subdirectories (one for each row in the table). Each name of each subdirectory should correspond to the primary key for that row. Each row's directory would have a file for each field which you can edit directly. Any field that is a foreign key, referencing another table could have 2 entries, one for the actual value and another that is a symlink to the row in the table that is referenced.

For example here's the table definitions.

CREATE TABLE persons (name CHAR(50) PRIMARY KEY, company char(50) refe +rences; CREATE TABLE companies (name CHAR(50) PRIMARY KEY, business char(50));
now wandering around the filesystem would look like
> cd db > ls persons companies > ls persons Fergal Daly Bill O'Brien > cd "persons/Fergal Daly" > ls name company company:row > cat company Blogtronic > ls -l company:row cmopany:row -> ../../companies/Blogtronic > ls company:row name business > cat company:row/business Maker of electric lawnmowers and dangerous toys
tables map just fine onto a filesystem. You could even get fancy and provide automatic reverse lookups on foreign keys
> cd companies/Blogtronic > ls name business :referers > ls :referers persons > ls referers/persons company > ls referers/persons/company Fergal Daly ... other rows of persons who's company field references Blogtronic's +primary key
This allows very compfortable wandering around the database, editing fields etc. It's not a great interface for queries though. I think this is actually something like reiserfs's goal.

Replies are listed 'Best First'.
Re^2: RFC: Fuse::DBI - mount database as filesystem
by dpavlin (Friar) on Oct 09, 2004 at 17:59 UTC

    I actually upvoted it and also saw that it had bad reputation. My academic education included RDBMS from theory to practice, and I have been working with Open Source RDBMS them since 1996 or so (does anybody remembers msql?). So, I think that my expirince is a bit towards hard-code RDBMS through SQL crowd as well.

    During design phase of Fuse::DBI , I though about RDBMS mapping to filesytem and concluded that model like yours is extremly powerful in presenting data from database. I even planned to impement it just as you describe (actually, simplier, I really like idea about how to use symlinks). But, editing problems (especially adding data) stopped me from using this approach and selecting much simplier one. Why should I browse through files if I know how to write SQL to select/update data?

    Having said all this, I must stress that I can understand why there is negative feeling about exploring database through filesystem in general sense, but having simple and defined interface to it might be of good use. It's almost like SOAP vs REST interface. SQL vs filesystem is same: one if better for some application while other is for others.

Re^2: RFC: Fuse::DBI - mount database as filesystem
by hardburn (Abbot) on Oct 11, 2004 at 13:53 UTC

    A databse is not strictly a heirarchal tree. This is a rather common mis-application of a tree-type datastructure.

    A tree is a conceptually simple datastructure. There's a root node with one or more children, and each child has one or more children, and so on. Strictly speaking, trees are acylic (this is where problems arise--read on). It's a good datastructure for a lot of problems, but it is not the only one, nor is it the best solution to all problems.

    Relational Theory is based on set theory. You can implement a set as a tree, and relational databases will support it (though you have to do some hoop-jumping to get SQL to handle it). However, there are many other structures you could also build into a relational database (ones which SQL is also much better equipted to handle, though this is SQL's failure, not Relational Theory).

    IMHO, people over-apply trees, and then try to introduce hacks when the structure doesn't fully map to the solution. For instance, symbolic links in a filesystem could be considered an attempt to coerce a tree structure into a general set structure. Symbolic links mean that your file tree is no longer acylic.

    What I really want to see is the opposite direction the OP was thinking of: make the filesystem a relational database. Kill off the tree structure entirely, and lump all the files together. You get specific files by running some query language against them. There are projects doing something more or less like this (Microsoft's WinFS and ReiserFS 4, to name two), but nothing usable on a production system yet.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      A databse is not strictly a heirarchal tree. This is a rather common mis-application of a tree-type datastructure.

      I never said it was a hierarchical tree, I said "many dbs have a hierarchical structure embedded in them" and since the filesystem is probably the most familiar heirarchical structure to most users, providing a filesystem interface is a useful thing.

      I'm going to expand my claim and say that the FS interface is good not just for databases that have trees but for databases with general graphs also.

      As I said, I'm not advocating this as a general DB access method, it's terrible for queries that involve large amounts of data. However it's great for queries that use small amounts of data but span many tables. SQL queries generally lack state/context whereas a fileystem (well actually I should say a shell) has a very useful, very intuitive piece of state - the current working directory. This allows you to wander around. Using an SQL interface, wandering around is a pain.

      Consider a database of people who may or may not be related to each other in various ways, it's schema is

      CREATE TABLE person ( name CHAR(50) PRIMARY KEY, spouse char(50) references, father char(50) references, mother char(50) references, best_friend char(50) references, boss char(50) references );
      Now I want to know who is Larry Wall's spouse's best friend's mother's boss's father.

      I'm not even going to attempt to write the SQL, it's horrendous. It requires either 5 separate queries with lots of copy and pasting of names or a 5-fold join on the persons tables.

      The virtual fs just needs

      cat "persons/Larry Wall/spouse:row/best_friend:row/mother:row/boss:row +/father:row/name"
      and I get to use tab completion all the way.

      There are graphical DB explorers that allow you to do the same thing but the problem with them is that they generally do not interface to anything else, a filesystem interfaces to pretty much everything.

      Notice that all my relations were 1-1 (actuall n-1 would be ok too), this means that I can write "person/Larry Wall/spouse:row/name" and get a unique answer. I haven't thought too much about how to represent 1-n relations. It requires building a way of expressing more complex queries, which could be done - persons/Larry Wall/children:with/hair=brown - but that might be a bit crazy.

      Whether using symlinks is a good idea or not I suppose is debatable, they are not necessary. Since the filesystem is virtual, where I wrote

      > ls -l company:row cmopany:row -> ../../companies/Blogtronic > ls company:row name business
      it could easily have been
      > ls -l company:row cmopany:row # not a symlink > ls company:row name business
      that is company:row is presented as a real directory which exists below the "Fergal Daly" row and appears to exist independently of the table from which it comes. There are advantages and disadvantages to this approach. Without symlinks, our current working directory encodes the path we took to get where are however if we have done a lot of cding then our path will get ridiculously long and may cause problems. Using symlinks keeps the current directory nice and short but throws away the history of how we got here, all we know is the name of the current table and the value of the primary key, we know nothing about what relations we followed to get here.

      I am also (nervously!) looking forward to database-backed filesystems, they'll make organising your files a lot easier but they worry me, simply from a way-too-complicated point of view. I've long wanted a system where my files are stored on a "real filesystem" but I can also access them through a virtual filesystem which exploits lots of metadata to present a far more flexible view of my files than a simple tree.

      Strictly speaking, trees are acylic (this is where problems arise--read on).

      I'm not sure what problems you mean here, the only thing I can think of is your reference to symlinks turning a tree structure into a graph but I'd say that's a solution, not a problem.

        I'm not sure what problems you mean here, the only thing I can think of is your reference to symlinks turning a tree structure into a graph but I'd say that's a solution, not a problem.

        Because the symlinks make your tree not really a tree anymore, but all the tools available want it to be a tree and often need modifications when symlinks enter the picture. For instance, should tar get the data from the symlinked file or make an entry for the symlink in the archive? The answer depends on various circumstances that cannot be coded into tar itself. The best tar can do is let the human operator decide.

        Symlinks are useful and often necessary to express certain relationships in the filesystem. But the reason they are there is that almost everything outside of acadamia sees the file system as a tree and won't accept much else. They are a hack for a poor datastructure. A useful and necessary hack, but a hack. We'd do much better if we had filesystems that operated as a generalized set instead of a strict tree.

        "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      "A databse is not strictly a heirarchal tree."

      Neither is a traditional UNIX file system. A UNIX filesystem is a directed graph. It's only the "no hardlinks to directories" heresy that's keeping you from being able to descibe arbitrary networks of relations in terms of filesystems: every table a directory, every key a link.

      You can use symlinks instead of hardlinks, if you're scared of programs like "find" going all pear-shaped following hard links. Old school UNIX hackers aren't afraid of tickling the dragon's tail.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://397874]
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (3)
As of 2024-07-20 12:09 GMT
Find Nodes?
    Voting Booth?

    No recent polls found

    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.