PerlMonks  

Re^3: RFC: Fuse::DBI - mount database as filesystem

by fergal (Chaplain)
on Oct 11, 2004 at 17:08 UTC ( [id://398217]=note )


in reply to Re^2: RFC: Fuse::DBI - mount database as filesystem
in thread RFC: Fuse::DBI - mount database as filesystem

A database is not strictly a hierarchical tree. This is a rather common misapplication of a tree-type data structure.

I never said it was a hierarchical tree. I said "many dbs have a hierarchical structure embedded in them", and since the filesystem is probably the most familiar hierarchical structure to most users, providing a filesystem interface is a useful thing.

I'm going to expand my claim and say that the FS interface is good not just for databases that have trees but for databases with general graphs also.

As I said, I'm not advocating this as a general DB access method; it's terrible for queries that involve large amounts of data. However, it's great for queries that use small amounts of data but span many tables. SQL queries generally lack state/context, whereas a filesystem (well, actually I should say a shell) has a very useful, very intuitive piece of state: the current working directory. This allows you to wander around. Using an SQL interface, wandering around is a pain.

Consider a database of people who may or may not be related to each other in various ways. Its schema is

CREATE TABLE persons (
    name        CHAR(50) PRIMARY KEY,
    spouse      CHAR(50) REFERENCES persons(name),
    father      CHAR(50) REFERENCES persons(name),
    mother      CHAR(50) REFERENCES persons(name),
    best_friend CHAR(50) REFERENCES persons(name),
    boss        CHAR(50) REFERENCES persons(name)
);
Now I want to know who is Larry Wall's spouse's best friend's mother's boss's father.

I'm not even going to attempt to write the SQL; it's horrendous. It requires either 5 separate queries with lots of copying and pasting of names, or a 5-fold join on the persons table.

The virtual fs just needs

cat "persons/Larry Wall/spouse:row/best_friend:row/mother:row/boss:row/father:row/name"
and I get to use tab completion all the way.
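For comparison, here is a rough sketch of what resolving such a path could look like underneath, next to the equivalent 5-fold self-join. It is not Fuse::DBI's API: `make_db`, `demo_db`, `resolve`, `RELATIONS` and the placeholder names P1..P5 are all made up for illustration, and it assumes SQLite through Python's standard `sqlite3` module.

```python
import sqlite3

# Assumption: the foreign-key columns of the persons table from the schema above.
RELATIONS = {"spouse", "father", "mother", "best_friend", "boss"}

# The "horrendous" alternative: one self-join per relation followed.
JOIN_SQL = """
SELECT p5.name
FROM persons p0
JOIN persons p1 ON p1.name = p0.spouse
JOIN persons p2 ON p2.name = p1.best_friend
JOIN persons p3 ON p3.name = p2.mother
JOIN persons p4 ON p4.name = p3.boss
JOIN persons p5 ON p5.name = p4.father
WHERE p0.name = ?
"""


def make_db():
    """Create an empty in-memory database holding the persons table."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE persons (
            name        TEXT PRIMARY KEY,
            spouse      TEXT REFERENCES persons(name),
            father      TEXT REFERENCES persons(name),
            mother      TEXT REFERENCES persons(name),
            best_friend TEXT REFERENCES persons(name),
            boss        TEXT REFERENCES persons(name)
        )""")
    return db


def demo_db():
    """Fill the table with placeholder rows (P1..P5 are invented, not real data)."""
    db = make_db()
    chain = [("Larry Wall", "spouse", "P1"), ("P1", "best_friend", "P2"),
             ("P2", "mother", "P3"), ("P3", "boss", "P4"), ("P4", "father", "P5")]
    names = {"Larry Wall"} | {target for _, _, target in chain}
    db.executemany("INSERT INTO persons (name) VALUES (?)", [(n,) for n in names])
    for name, rel, target in chain:
        # rel comes from the hard-coded chain above, so interpolating it is safe here.
        db.execute(f"UPDATE persons SET {rel} = ? WHERE name = ?", (target, name))
    return db


def resolve(db, path):
    """Resolve 'persons/<name>/<rel>:row/.../<column>' with one lookup per component."""
    table, key, *steps, column = path.split("/")
    if table != "persons":
        raise FileNotFoundError(path)
    for step in steps:
        rel, _, suffix = step.partition(":")
        if suffix != "row" or rel not in RELATIONS:
            raise FileNotFoundError(path)
        row = db.execute(f"SELECT {rel} FROM persons WHERE name = ?", (key,)).fetchone()
        if row is None or row[0] is None:
            raise FileNotFoundError(path)
        key = row[0]  # "cd" into the related row
    if column not in RELATIONS | {"name"}:
        raise FileNotFoundError(path)
    row = db.execute(f"SELECT {column} FROM persons WHERE name = ?", (key,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return row[0]
```

Each path component costs one trivial single-row lookup, which is why this style suits small queries that hop across many rows or tables and suits bulk queries badly.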

There are graphical DB explorers that allow you to do the same thing, but the problem with them is that they generally do not interface to anything else. A filesystem interfaces to pretty much everything.

Notice that all my relations were 1-1 (actually n-1 would be OK too). This means that I can write "persons/Larry Wall/spouse:row/name" and get a unique answer. I haven't thought too much about how to represent 1-n relations. It requires building a way of expressing more complex queries, which could be done (persons/Larry Wall/children:with/hair=brown), but that might be a bit crazy.

Whether using symlinks is a good idea or not is, I suppose, debatable; they are not necessary. Since the filesystem is virtual, where I wrote

> ls -l company:row
company:row -> ../../companies/Blogtronic
> ls company:row
name business
it could easily have been
> ls -l company:row
company:row # not a symlink
> ls company:row
name business
that is, company:row is presented as a real directory which exists below the "Fergal Daly" row and appears to exist independently of the table from which it comes. There are advantages and disadvantages to this approach. Without symlinks, our current working directory encodes the path we took to get where we are; however, if we have done a lot of cding then our path will get ridiculously long and may cause problems. Using symlinks keeps the current directory nice and short but throws away the history of how we got here: all we know is the name of the current table and the value of the primary key, and we know nothing about what relations we followed to get here.
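The trade-off can be made concrete with a toy model. This is only a sketch: the `PERSONS` rows and the `walk` helper are invented for illustration, and "with symlinks" here means the working directory is taken to be the resolved target of each link.

```python
# Made-up rows: each person maps a relation name to the primary key it points at.
PERSONS = {
    "A": {"boss": "B"},
    "B": {"spouse": "C"},
    "C": {"boss": "A"},
}


def walk(start, relations, symlinks):
    """Return the working directory after following each relation in turn.

    symlinks=True:  every hop lands on the canonical 'persons/<key>' directory,
                    so the path stays short but forgets how we got there.
    symlinks=False: every hop descends into a real '<relation>:row' directory,
                    so the path records the whole route and keeps growing.
    """
    key = start
    cwd = f"persons/{key}"
    for rel in relations:
        key = PERSONS[key][rel]
        cwd = f"persons/{key}" if symlinks else f"{cwd}/{rel}:row"
    return cwd
```

Following boss, spouse, boss from "A" ends on the same row either way; only the non-symlink path shows that a cycle was walked to get there.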

I am also (nervously!) looking forward to database-backed filesystems. They'll make organising your files a lot easier, but they worry me, simply from a way-too-complicated point of view. I've long wanted a system where my files are stored on a "real filesystem" but I can also access them through a virtual filesystem which exploits lots of metadata to present a far more flexible view of my files than a simple tree.

Strictly speaking, trees are acyclic (this is where problems arise--read on).

I'm not sure what problems you mean here. The only thing I can think of is your reference to symlinks turning a tree structure into a graph, but I'd say that's a solution, not a problem.

Replies are listed 'Best First'.
Re^4: RFC: Fuse::DBI - mount database as filesystem
by hardburn (Abbot) on Oct 11, 2004 at 17:49 UTC

    I'm not sure what problems you mean here. The only thing I can think of is your reference to symlinks turning a tree structure into a graph, but I'd say that's a solution, not a problem.

    Because the symlinks make your tree not really a tree anymore, but all the tools available want it to be a tree and often need modifications when symlinks enter the picture. For instance, should tar get the data from the symlinked file or make an entry for the symlink in the archive? The answer depends on various circumstances that cannot be coded into tar itself. The best tar can do is let the human operator decide.
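The tar dilemma is visible in tooling: GNU tar leaves the choice to the operator through its -h/--dereference option, and Python's standard `tarfile` module exposes the same choice as a `dereference` argument. The sketch below is illustrative only; `archive_types` and the file names are made up, and it assumes a platform where `os.symlink` is permitted.

```python
import os
import tarfile
import tempfile


def archive_types(dereference):
    """Archive a directory holding one file and one symlink to it.

    Returns {member name: kind} so the two policies can be compared.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        with open(os.path.join(src, "data.txt"), "w") as fh:
            fh.write("hello\n")
        os.symlink("data.txt", os.path.join(src, "link.txt"))

        tar_path = os.path.join(tmp, "out.tar")
        # dereference=False stores the link itself; True stores the file it points to.
        with tarfile.open(tar_path, "w", dereference=dereference) as tar:
            tar.add(src, arcname="src")

        kinds = {}
        with tarfile.open(tar_path) as tar:
            for member in tar.getmembers():
                if member.issym():
                    kinds[member.name] = "symlink"
                elif member.isreg():
                    kinds[member.name] = "file"
                elif member.isdir():
                    kinds[member.name] = "dir"
                else:
                    kinds[member.name] = "other"
        return kinds
```

Neither answer is wrong; which one is wanted depends on what the archive is for, which is exactly why the tool cannot decide on its own.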

    Symlinks are useful and often necessary to express certain relationships in the filesystem. But the reason they are there is that almost everything outside of academia sees the file system as a tree and won't accept much else. They are a hack for a poor data structure. A useful and necessary hack, but a hack. We'd do much better if we had filesystems that operated as a generalized set instead of a strict tree.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      I still don't see a problem. I'm unlikely to want to use tar on a virtual filesystem; I'd be better off grabbing a snapshot of the real data. Plus, the tar question is only relevant to files, but in the scheme above the symlinks are always to directories. But as I said, this is not an interface that is particularly suitable for mass queries or updates. The tools one might use would not need to know anything about symlinks at all, for example

      perl -pi -e '$_=capitalise($_)' persons/*/name
      as a handy way of doing
      update persons set name = CAPITALISE(name);
      on a database without stored procedures or where I'm much more familiar with perl than with stored procedures.

      Mainly though I think it works best as an exploratory tool and that just requires following the symlinks to wherever they go.

      As for the tree/graph problem: if my data forms a directed graph then I don't see any problem. If the graph has cycles, then I want to see those cycles. Symlinks make that easy, and if my data has cycles then my tools will have to know how to deal with that, whether they access it through an SQL interface or a filesystem.

      We'd do much better if we had filesystems that operated as a generalized set instead of a strict tree.

      I agree, although I'd imagine that we'll still be accessing those filesystems as trees. The difference will be that the tree structure will be dynamically created from the database.

      For a long time now, I've wanted to be able to do this

      > cd software
      > ls
      glibc.rpm DBI.rpm DBI.tgz DBD::mysql.rpm DBD::mysql.tgz author=/ format=/ language=/ ...
      > ls format=
      rpm/ tgz/ ...
      > ls format=/rpm
      glibc.rpm DBI.rpm DBD::mysql.rpm language=/ author=/ ...
      > ls format=/rpm/language=/perl
      DBI.rpm DBD::mysql.rpm ...
      > ls language=/perl
      DBI.rpm DBI.tgz DBD::mysql.rpm DBD::mysql.tgz ...
      > ls language=/perl/format=/rpm
      DBI.rpm DBD::mysql.rpm ...
      This is still a tree. The fact that the node "language=/perl/format=/rpm" has the same contents as the node "format=/rpm/language=/perl" doesn't make a difference, just as 2 nodes in a tree can contain the number 7.
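A dynamically generated tree like this can be sketched in a few lines. Everything here is hypothetical: the `FILES` catalogue, the `FACETS` tuple and the `ls` function are invented for illustration, the author facet is left out, and the language recorded for glibc.rpm is an assumption that the listing above does not state.

```python
# Made-up catalogue: file name -> metadata. Values are illustrative only.
FILES = {
    "glibc.rpm":      {"format": "rpm", "language": "c"},  # language is an assumption
    "DBI.rpm":        {"format": "rpm", "language": "perl"},
    "DBI.tgz":        {"format": "tgz", "language": "perl"},
    "DBD::mysql.rpm": {"format": "rpm", "language": "perl"},
    "DBD::mysql.tgz": {"format": "tgz", "language": "perl"},
}
FACETS = ("format", "language")


def ls(path=""):
    """List a virtual directory such as 'format=/rpm/language=/perl'.

    Path components alternate between '<facet>=' and a value for that facet.
    """
    selected = {}   # facet -> chosen value
    pending = None  # facet named by the last component, still waiting for a value
    for part in (p for p in path.split("/") if p):
        if pending is None:
            facet = part[:-1] if part.endswith("=") else None
            if facet not in FACETS or facet in selected:
                raise FileNotFoundError(path)
            pending = facet
        else:
            selected[pending] = part
            pending = None

    matches = sorted(
        name for name, meta in FILES.items()
        if all(meta.get(facet) == value for facet, value in selected.items())
    )
    if pending is not None:
        # Path ends in '<facet>=': show the values that facet takes among the matches.
        return sorted({FILES[name][pending] + "/" for name in matches})
    # Otherwise show the matching files plus the facets still available for narrowing.
    return matches + [facet + "=/" for facet in FACETS if facet not in selected]
```

The order in which facets are applied changes the path but not the resulting set of files, which matches the point that two nodes of the tree may simply hold the same contents.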

      The truth is that trees are one of the best interfaces we have. In fact, I can't think of any other good way of presenting a set of files (assuming the set of files is too large to just present as a list). It's also a pretty good way of storing the files on disk. The problem comes when you insist that the tree the user sees must be the same as the tree stored on disk.
