PerlMonks  

Re^2: Avoiding compound data in software and system design

by metaperl (Curate)
on Apr 21, 2010 at 14:13 UTC ( [id://836071]=note )


in reply to Re: Avoiding compound data in software and system design
in thread Avoiding compound data in software and system design

Sorry, but this smacks of: I just got bitten by something, so now I'm gonna demonise it
Yes, it's called evolution. Intelligence is the ability to identify, formulate and resolve problems. So this post was made to identify and formulate a problem in hopes that it is not repeated. And yes, I did get bitten by the DBI API and now I have to go redo something so it works with Rose.

Continuing, let me present the definition of compound data to you once again:

A compound datum is an apparently atomic data item that is really not atomic.
Are hashes evil? They consist of keys and values.
Evil? You brought demons into the picture, not me. The point at hand is "apparently atomic". Hashes are not apparently atomic: you dissected them into their parts yourself.

Now, if instead of this hash:

%a = (a => 1, b => 2);
you did this: my $vals = "a:1,b:2", then you would have an apparently atomic data item that is really not atomic, because you would have to do string-twiddling to extract the relevant subparts.
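As a minimal sketch of that contrast (the variable names beyond `$vals` and `%a` are assumptions made for illustration), getting at a subpart of the packed string takes two rounds of splitting, while the hash exposes each element directly:

```perl
use strict;
use warnings;

# Compound form: several conceptual elements packed into one scalar.
my $vals = "a:1,b:2";

# Reaching a subpart means string-twiddling: split into pairs, then split each pair.
my %parsed = map {
    my ($key, $value) = split /:/, $_, 2;
    ($key => $value);
} split /,/, $vals;

# Structured form: each element is directly addressable.
my %a = (a => 1, b => 2);

print "parsed b = $parsed{b}\n";
print "hash   b = $a{b}\n";
```

Both lines print 2; the difference is that the first route only works as long as no key or value ever contains a colon or a comma.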
Floats? Exponent and characteristic.
Seems atomic to me. And the subparts you mention, can they be easily accessed/used?
Integers? Magnitude and sign.
or 32 bits (grin).
my $int = Integer->new(magnitude => 12, sign => '+');
ah, perfect decomposition!

My post did not say it listed all examples of compound data. And if there are more, then fine. Besides, the focus was on software and system design, not language elements.

Bytes? Many bits.
Again, complex data is not 'compound data'. Compound is a specific term referring to a specific mistake in software and system design.
Strings?
Yes, they are complex, but only compound when mis-used.
Objects? ...
Yes, an object is atomic, not apparently atomic. It may have subparts, but each has a well-defined means of accessing/changing it.
I can't use my new vacuum cleaner in its box, but I'm glad it came in one.
You are confusing a complex of objects with compound data. The vacuum cleaner's relation to the box was meaningful and useful. Packing multiple datums into a string is counter-productive to flexible software and system design as was demonstrated.



The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

-- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"

Re^3: Avoiding compound data in software and system design
by BrowserUk (Patriarch) on Apr 21, 2010 at 20:47 UTC
    You are confusing a complex of objects with compound data.

    No I'm not. You are making an artificial separation where none exists.

    Take urls. These are both complex and compound. And simple.

    Whilst there are (many) modules like URI* that allow you to treat these as objects and access all their internal bits separately, the vast majority of modules that use urls as inputs (e.g. LWP*) take them in their simple string form. Why?

    Because they do not care what is inside, and do not want to have to deal with it. For most applications of those latter modules, the user will be supplying a 'simple string', picked out of a text file (log file; html; whatever), and all they need or want to know is, can I reach it?

    If they had to tease apart the myriad forms of url/uri/urn formats in order to populate a ur* object in order to pass it to LWP*--that would promptly just stick all the bits back together again--it would be an entirely unnecessary waste of time & resources. Complexity without merit or benefit.

    Same goes for file system entities. We pass open a string, not some kind of FileSystem::Object. Because for the most part, they are simply opaque scalar entities that we use. Not pick apart and fret over.

    And the same goes for your example of DBI data source names. At the DBI level, and below, they are simply opaque entities to be gathered and passed through uninspected. Requiring some kind of object be used for them would create unnecessary and useless complexity.

    They do not even have a consistent constitution. Your example breaks them down as

    dbi mysql database host port

    And then as

    __PACKAGE__->register_db(
        driver   => 'pg',
        database => 'my_db',
        host     => 'localhost',
        username => 'joeuser',
        password => 'mysecret',
    );

    but you've lost two parts (dbi/port) and gained two parts (user/pass).

    And then you get something like DBD::WMI, which doesn't need and cannot use most of those--either set of 5. And DBD::SQLite that also has no use for most of those fields. And these came into being long after the DBI/DBD interfaces were designed and implemented.

    Rather than something to be "avoided", DBI's use of a string for the data source name is the sign of a well-thought-through, flexible interface. One that recognises that you cannot fit the world into labelled boxes, and that in many situations, there is no purpose in trying.

    You should be celebrating the vision and skill of those authors for designing an interface so flexible it can accommodate future developments without requiring constant re-writes as time passes and uses evolve. Not decrying them.

    Consider: Will your interfaces survive so long, so well?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      You are making an artificial separation where none exists.
      We will see about that (grin).

      But the distinction is simple: conceptual elements belong in separate data elements or in a single element with straightforward access. The DBI dsn string has several conceptual elements which are not in separate data elements. And access is not straightforward: had a hash reference been used, access would be more straightforward, with no loss in API quality.

      But like I said in the opening post of this thread: typically people either know this and don't need to be told, or they don't know it and don't care :) So it's almost like screaming at a wall.

      But your comments about URLs are well-taken. I thought about that this morning when I woke up. And in a sense, you could consider DSNs as a form of URL. In fact, SQLAlchemy uses URLs instead of DSNs.

      Rather than something to be "avoided", DBI's use of a string for the data source name is the sign of a well-thought-through, flexible interface. One that recognises that you cannot fit the world into labelled boxes, and that in many situations, there is no purpose in trying.
      I don't agree: it requires more parsing to decide which DBD to dispatch to this way.
      You should be celebrating the vision and skill of those authors for designing an interface so flexible it can accommodate future developments without requiring constant re-writes as time passes and uses evolve. Not decrying them.
      $dsn as a hash reference would have been just as flexible and much finer grained. And it would not suffer from a case of compound data. And the code to decide which DBD to dispatch to would've been more succinct. And I would not have had to write DBIx::DBH in order to work with Rose::DB and DBI interchangeably.

      The Rose::DB API has finer granularity and does not suffer from the compound data issues that the DBI one does: connection info from Rose::DB can be converted into DBI connection info in a simple fashion, but the reverse is not so simple.
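The asymmetry claimed here can be illustrated with a rough sketch. Both helpers (`dsn_from_hash`, `hash_from_dsn`) are hypothetical names, not part of DBI or Rose::DB, and the code assumes the "key=value;key=value" tail seen in DSNs such as "dbi:Pg:dbname=..."; drivers with other layouts would need other code.

```perl
use strict;
use warnings;

# Hypothetical helper: join a connection-info hash into a DBI-style DSN.
# Assumes a tagged "key=value;key=value" tail, which not every driver uses.
sub dsn_from_hash {
    my (%info) = @_;
    my $driver = delete $info{driver};
    my $tail   = join ';', map { "$_=$info{$_}" } sort keys %info;
    return "dbi:$driver:$tail";
}

# Hypothetical reverse direction: needs parsing, and only handles the tagged layout.
sub hash_from_dsn {
    my ($dsn) = @_;
    my (undef, $driver, $tail) = split /:/, $dsn, 3;
    my %info = (driver => $driver);
    for my $pair (split /;/, $tail // '') {
        my ($key, $value) = split /=/, $pair, 2;
        $info{$key} = $value;
    }
    return %info;
}

my $dsn = dsn_from_hash(driver => 'Pg', dbname => 'my_db', host => 'localhost');
print "$dsn\n";    # dbi:Pg:dbname=my_db;host=localhost

my %back = hash_from_dsn($dsn);
print join(', ', map { "$_=$back{$_}" } sort keys %back), "\n";
```

Going from hash to string is a join; going back is a parser that silently assumes one layout, which is the point of contention in this thread.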




        Typically people either know this and don't need to be told, or they don't know it and don't care :) So it's almost like screaming at a wall.

        You are deluding yourself. The DBI interface has been around for 15 years. And you are the first person to see this 'need'?

        And I would not have had to write DBIx::DBH in order to work with Rose::DB and DBI interchangeably.

        Have you looked inside Rose::DB?

        Have you looked at all the code and utterly pointless machinations it goes through in dealing with that hash in order to do what? To tack all the bits together into a string and pass it on to DBI!

        And what does it achieve? Nothing! Just a couple of hundred extra lines of code that complicate the interface and slow things down for no net gain whatsoever.

        Rose::DB is essentially a wrapper over DBI. And you're writing a wrapper over that wrapper so that you can "use them interchangeably".

        Sir! Your logic is flawed. Even though you cannot see it. Your logic is flawed.



        EF Codd eh? Circa 1981, I had to do a CS project, and having read an article (in Byte I think) on Codd's paper, I wrote up the proposal for my project as: "A simple exploration of the Relational Model". To be written in BASIC Plus 2. And yes, BASIC.

        I had one term to write it.

        It took 6 weeks for the college library to obtain a photocopy of the paper--it had to come from the British Library in London, the only people in the UK who had a copy. It was a photocopy, of a photocopy, of a bound paper, with all the distortions and fuzzy greyness that entails. It took me two whole weeks to read it--I understood very little of it. So there I was with half my time gone and nothing to show for it.

        Back to the point.

        And that is, all DBI needs to know is the first two fields of the DSN. The first must match 'dbi' (+-case); the second must match a module "DBD::<2ndfield>" that is installed locally. What comes after that is none of its concern. It just gets passed through to the loaded DBD driver.
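A minimal sketch of that division of labour, in plain Perl rather than DBI's own code: `split_dsn` is a hypothetical helper, it does not check whether the driver module is actually installed, and it ignores the optional attribute list that real DSNs may carry after the driver name. DBI itself provides a `parse_dsn` class method for the real thing.

```perl
use strict;
use warnings;

# Split a DSN into the driver name, the implied DBD module name, and the
# opaque driver-specific remainder. Returns an empty list unless the scheme
# is "dbi" (matched case-insensitively) and a driver name is present.
sub split_dsn {
    my ($dsn) = @_;
    my ($scheme, $driver, $rest) = split /:/, $dsn, 3;
    return unless defined $scheme && lc($scheme) eq 'dbi';
    return unless defined $driver && length $driver;
    return ($driver, "DBD::$driver", $rest // '');
}

for my $dsn ('dbi:SQLite:dbname=test.db',
             'DBI:Solid:TCP/IP somewhere.com 1313',
             'dbi:Google:') {
    my ($driver, $module, $rest) = split_dsn($dsn);
    print "$module <- '$rest'\n";
}
```

Everything after the second colon is handed on untouched, whatever shape it has.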

        And the forms of that opaque token are myriad. A quick survey turns up:

        $dbh = DBI->connect("dbi:Informix:$database", $user, $pass, %attr);
        $dbh = DBI->connect("DBI:Unify:dbname[;options]" [, user [, auth [, attr]]]);
        $dbh = DBI->connect("dbi:Oracle:host=$host;sid=$sid", $user, $passwd);
        $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","");
        $dbh = DBI->connect("DBI:drizzle:database=test;host=localhost", "joe", "joe's password", {'RaiseError' => 1});
        $dbh = DBI->connect('dbi:ODBC:DSN', 'user', 'password');
        $dbh = DBI->connect("dbi:Pg:dbname=$dbname", '', '', {AutoCommit => 0});
        $dbh = DBI->connect('DBI:RAM:','usr','pwd',{RaiseError=>1});
        $dbh = DBI->connect("DBI:Wire10:host=$host", $user, $password, {'RaiseError' => 1, 'AutoCommit' => 1});
        $dbh = DBI->connect("DBI:CSV:f_dir=/home/joe/csvdb");
        $dbh = DBI->connect("dbi:JDBC:hostname=$hostname;port=$port;url=$url", $user, $password);
        $dbh = DBI->connect("dbi:Sqlflex:$database", $user, $pass, %attr);
        $dbh = DBI->connect("dbi:DB2:db_name", $username, $password);
        $dbh = DBI->connect("DBI:mysql:database=test");
        $dbh = DBI->connect('DBI:DBMaker:' . $database, $user, $pass);
        $dbh = DBI->connect('dbi:PgPP:dbname=$dbname', '', '');
        $dbh = DBI->connect('dbi:PgLite:dbname=file');
        $dbh = DBI->connect("dbi:ADO:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\test.mdb", $usr, $pwd, $att );
        $dbh = DBI->connect("DBI:Ingres:dbname[;options]", user [, password], \%attr);
        $dbh = DBI->connect('DBI:Solid:TCP/IP somewhere.com 1313', $user, $pass, 'Solid');
        $dbh = DBI->connect("dbi:Google:", $KEY);

        Look at the variations once you get beyond the first two fields. Yes you could keep these all separate in a hash, but to what end? You (as a DBI user) cannot do anything useful with them because there is insufficient consistency to make even validation judgements, much less anything else.

        Even where several DBDs require, for example, a "dbname", for some this will have SQL identifier limitations--though even they aren't consistent across all SQL-like DBs.

        For some it will be a filename, with local filesystem semantics: case dependence (or not); reserved characters (or not); length limitations (or not).

        For some, it's a hostname and port.

        For some--see the ADO example--it's a whole bunch of stuff entirely unique to that DBD.

        For some the subfields have to be prefixed with their tagname; others are position dependent.

        Why stick all these disparate bits into a hash and then have DBI concatenate the bits--risking getting it wrong because (for example) it adds tagnames where none are required, or the hash ordering screws up the position dependence; or ...?

        To achieve all that, you'd need more than just a hash. You'd need one flag per field to decide whether the key name should be prepended to the field's value. You'd need another value to ensure ordering. You'd need yet another flag to ensure that (for example) backslashes in pathnames got escaped for interpolation.
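To make that bookkeeping concrete, here is a rough sketch of what a hash-driven builder would have to carry around. The `%spec` table and `build_dsn` are hypothetical, the two driver entries are modelled only on the DSN shapes listed above, and the escaping flag is left out entirely.

```perl
use strict;
use warnings;

# Hypothetical per-driver spec: array order fixes field order, and "tagged"
# says whether the field name must be prepended to its value.
my %spec = (
    Pg       => [ { name => 'dbname', tagged => 1 }, { name => 'host', tagged => 1 } ],
    Informix => [ { name => 'dbname', tagged => 0 } ],
);

sub build_dsn {
    my ($driver, %fields) = @_;
    my @parts;
    for my $field (@{ $spec{$driver} || [] }) {
        my $value = $fields{ $field->{name} };
        next unless defined $value;
        push @parts, $field->{tagged} ? "$field->{name}=$value" : $value;
    }
    return "dbi:$driver:" . join ';', @parts;
}

print build_dsn('Pg', host => 'localhost', dbname => 'my_db'), "\n";
print build_dsn('Informix', dbname => 'stores'), "\n";
```

Even this toy version needs a table entry per driver, and every new DBD with a new layout would mean another entry or another flag.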

        And all of that complexity buys you what? The user can far more easily know what the requirements are for the DBD (or two; or three) he is going to use, than any programmer can try and unify into one generic interface structure that will stand the test of time.


Re^3: Avoiding compound data in software and system design
by ikegami (Patriarch) on Apr 28, 2010 at 18:23 UTC

    You did this: my $vals = "a:1,b:2" then you would have an apparently atomic data item that is really not atomic, because you would have to do string-twiddling to extract relevant subparts.

    I don't see why searching through an associative array stored as "a:1,b:2" makes the type not atomic when the example you used for an atomic type ({a=>1,b=>2}) is an associative array that requires searching through a list of buckets then through a linked list.
