PerlMonks  

Flock to Rename Directory

by virtualweb (Sexton)
on Sep 17, 2009 at 16:08 UTC (#795919=perlquestion)
virtualweb has asked for the wisdom of the Perl Monks concerning the following question:

Hi All:

I'm a novice, and I often come up with alternatives that are not always the wisest way to go about something.

I need advice on how to save some info that has to be retrieved and changed often (every 20 seconds or so), and I don't want to use MySQL or any other database requiring Structured Query Language.

If I save each record in its own flat file, it would need to be opened, read, and rewritten, which I assume is not the fastest technique. If I save all records in a single flat file (one record per line), reading the whole file into memory every 20 seconds and changing what's needed may not be the best way to go either, given that there could be 15,000 lines or more. So I thought perhaps I could store each record as the name of a separate directory.

Example:

I need to save name, username, time, and two other numerical parameters, so I would call my directory:

JohnSmith_jhon99_10:30:01_345_765

Then I would split the directory name into array values and change what's needed. Twenty seconds later, the directory name could look like this:

JohnSmith_jhon99_10:30:21_465_112
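A minimal sketch of the split-and-rename step, written in Python as a stand-in for Perl. The labels `param1`/`param2` and the helper names are hypothetical, and the sketch assumes no field ever contains an underscore. Note also that the time field contains colons, which Windows does not allow in file or directory names.

```python
import os

# Field order taken from the example name "JohnSmith_jhon99_10:30:01_345_765";
# "param1" and "param2" are hypothetical labels for the two numbers.
FIELDS = ("name", "username", "time", "param1", "param2")


def parse_dir_name(dirname):
    """Split a directory name into a dict of its five fields."""
    parts = dirname.split("_")
    if len(parts) != len(FIELDS):
        # An underscore inside any field breaks this simple scheme
        raise ValueError(f"unexpected directory name: {dirname!r}")
    return dict(zip(FIELDS, parts))


def build_dir_name(record):
    """Join the fields back into a directory name, in the same order."""
    return "_".join(str(record[field]) for field in FIELDS)


def update_record(base, old_name, **changes):
    """Rename base/old_name so its name reflects the changed fields.

    No locking is done here, so two processes updating the same record
    at the same moment can race with each other.
    """
    record = parse_dir_name(old_name)
    record.update(changes)
    new_name = build_dir_name(record)
    os.rename(os.path.join(base, old_name), os.path.join(base, new_name))
    return new_name
```

For example, `build_dir_name(dict(parse_dir_name("JohnSmith_jhon99_10:30:01_345_765"), time="10:30:21", param1=465, param2=112))` gives `JohnSmith_jhon99_10:30:21_465_112`, the second name above.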

Is the above method a feasible solution?

What would happen if more than one process needs to change the directory name at the same time? I know that with a flat file you can lock the file with flock and change the data without worries, but what is used in the case of directory names?

Thanks for your input
VirtualWeb

------------------------------------
ADDENDUM September 18th

Thanks for all the input.

The reason I was trying to come up with an alternative to a proper database is that I don't know how to install one in my Windows PC environment.

I have read query language tutorials, and SQL is, as you pointed out, very easy, but I don't know how to install even the simplest database, how to test my scripts locally, or how to move everything to the server for final installation.

Since I have dropped the idea of renaming directories, I am continuing this conversation in the node "Query Language with Flat Files":
http://perlmonks.org/?node_id=796175

Re: Flock to Rename Directory
by zwon (Monsignor) on Sep 17, 2009 at 16:23 UTC
    If I save each record in a single Flat File, it would need to be opened, read, and rewritten, which I assume is not the fastest technique.
    Well, your solution would require reading the directory every 20 seconds, and you would also need a lock file, if I understand your problem correctly. If you don't want to install and use a database server, you can use a file-based database like DBD::SQLite or BerkeleyDB.
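    The lock-file idea can be sketched as follows, in Python rather than Perl, using `fcntl.flock`, which is Unix-only. The lock file name `.records.lock` and the helper names are hypothetical. The directories themselves are never locked; every cooperating process agrees to take the same lock file before renaming anything.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive flock() on a dedicated lock file (Unix only)."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def locked_rename(base, old_name, new_name, lock_name=".records.lock"):
    """Rename base/old_name to base/new_name while holding the shared lock.

    lock_name is a hypothetical file name; all writers must use the same one.
    """
    with exclusive_lock(os.path.join(base, lock_name)):
        os.rename(os.path.join(base, old_name), os.path.join(base, new_name))
```

    Because another process may have renamed the directory while this one was waiting for the lock, a real version would have to re-read the directory listing after acquiring the lock and only then work out the current name.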
Re: Flock to Rename Directory
by ikegami (Pope) on Sep 17, 2009 at 16:24 UTC
    Sounds like an *awful* lot of trouble to avoid using a database.
Re: Flock to Rename Directory
by ELISHEVA (Prior) on Sep 17, 2009 at 16:43 UTC

    Using flock is not necessarily a portable or guaranteed solution, even from Unix box to Unix box. If the file is available to two different programs, one using flock and one not, then the program that doesn't use flock will still be able to modify the file even though the flock-using program locked it:

    From perlport:
    Not implemented (Mac OS, VMS, RISC OS, VOS).
    Available only on Windows NT (not on Windows 95). (Win32)

    From flock:
    Two potentially non-obvious but traditional flock semantics are that it waits indefinitely until the lock is granted, and that its locks are merely advisory. Such discretionary locks are more flexible, but offer fewer guarantees. This means that programs that do not also use flock may modify files locked with flock.
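    The advisory nature of flock shows up in a few lines. This Python sketch (Unix-only, with `fcntl.flock` standing in for Perl's flock) opens the same file twice in one process for brevity; a separate process that never calls flock would get through in the same way.

```python
import fcntl
import os
import tempfile


def advisory_lock_demo():
    """Show that an flock() lock does not stop a writer that never asks for it.

    Returns the file contents after the non-cooperating write (Unix only).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as locker:
            # Cooperating writer: takes an exclusive lock
            fcntl.flock(locker, fcntl.LOCK_EX)
            # Non-cooperating writer: opens the file and writes without
            # calling flock(); nothing blocks it
            with open(path, "w") as rogue:
                rogue.write("overwritten without a lock\n")
            fcntl.flock(locker, fcntl.LOCK_UN)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)
```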

    One advantage of swallowing the SQL bullet and using a proper DBMS is that you can be assured that all access paths to your data are using the same rules for locking. If you really can't use a proper DBMS, for whatever reason, then at least consider creating your files in a directory owned by a special purpose user. Only scripts running as that user will have access to the files and subdirectories and it will be easier to prevent scripts that don't use flock (or misuse it) from writing to the directory.

    Getting this to work right and testing it may take some time. If this is a homework project or a "teach myself locking project" then enjoy the learning experience. However, if the main point of this project is to accomplish a goal for a work, personal, or volunteer project, please reconsider. In particular, if your reason for avoiding SQL is that you don't know SQL and you don't want to spend the time learning it, please reconsider. The SQL commands to add, update, and remove records from a simple one table database are not all that hard to learn. Testing the non-SQL solution is likely to take a good bit of time, possibly more than learning the SQL. More importantly, you will be able to use SQL over and over. My guess is that what you learn from writing a special purpose locking solution is unlikely to be as generally useful unless you plan to make a career out of low level file system manipulations.
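    For a sense of how small those SQL commands are, here is a one-table add/update/remove cycle sketched with Python's built-in `sqlite3` module as a stand-in for Perl's DBI with DBD::SQLite. The table mirrors the five fields in the question; the column names are hypothetical.

```python
import sqlite3


def demo_one_table():
    """Add, update, and remove a record in a one-table SQLite database.

    Uses an in-memory database; pass a filename to sqlite3.connect()
    instead to keep the data between runs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE users (
               username TEXT PRIMARY KEY,  -- unique key, indexed automatically
               name     TEXT,
               time     TEXT,
               param1   INTEGER,
               param2   INTEGER
           )"""
    )
    # Add a record
    conn.execute(
        "INSERT INTO users (username, name, time, param1, param2) VALUES (?, ?, ?, ?, ?)",
        ("jhon99", "JohnSmith", "10:30:01", 345, 765),
    )
    # Update it 20 seconds later; SQLite handles the locking
    conn.execute(
        "UPDATE users SET time = ?, param1 = ?, param2 = ? WHERE username = ?",
        ("10:30:21", 465, 112, "jhon99"),
    )
    conn.commit()
    row = conn.execute(
        "SELECT name, time, param1, param2 FROM users WHERE username = ?",
        ("jhon99",),
    ).fetchall()[0]
    # Remove it
    conn.execute("DELETE FROM users WHERE username = ?", ("jhon99",))
    conn.commit()
    remaining = conn.execute("SELECT COUNT(*) FROM users").fetchall()[0][0]
    conn.close()
    return row, remaining
```

    The SQL strings are the part that carries over: Perl's DBI also uses `?` placeholders.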

    Best, beth

Re: Flock to Rename Directory
by thunders (Priest) on Sep 17, 2009 at 17:07 UTC

    I agree with everyone else that you want to use a database here.

    Additionally, the biggest problem with your scheme is that your directories appear to have no unique index, so accessing a record to modify it will take a long time.

    As far as I can tell, when you need to update a directory name, you need to scan the entire list of 15,000 items, splitting the names until you find the one you are looking for, check whether there's a lock on it, and then rename that directory. This is CONSIDERABLY slower than accessing a database table with a unique index: roughly O(n) versus O(1) per lookup (O(log n) for a B-tree index), where n is the total number of records. In your case, that means up to 15,000 name comparisons per update instead of a handful. Plus, the database solution is a lot more portable and easier to extend.
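    The difference between scanning names and using a keyed lookup can be sketched in Python, as a stand-in for Perl. The naming scheme follows the question; the helper names are hypothetical, and a Python dict plays the role of the unique index.

```python
def find_by_scan(dir_names, username):
    """O(n): split every directory name until the username field matches."""
    for dirname in dir_names:
        parts = dirname.split("_")
        if len(parts) > 1 and parts[1] == username:
            return dirname
    return None


def build_index(dir_names):
    """One O(n) pass that builds a username -> directory-name map."""
    index = {}
    for dirname in dir_names:
        parts = dirname.split("_")
        if len(parts) > 1:
            index[parts[1]] = dirname
    return index


def find_by_index(index, username):
    """Average O(1): a single hash lookup, like a unique key in a table."""
    return index.get(username)
```

    The catch is that such an index has to be updated on every rename; a database maintains its indexes for you.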

Re: Flock to Rename Directory
by MidLifeXis (Prior) on Sep 17, 2009 at 17:14 UTC

    Also consider the performance penalty of large directories: some filesystems slow down once a directory holds a large number of entries.

    I would agree with others that avoiding a database of one form or another is not necessarily the best course of action. The file system, after all, is just being used in this case as a database.

    If you do not want to use a full RDBMS, then perhaps something like DBM::Deep or DBD::SQLite might suit your needs.

    --MidLifeXis


Re: Flock to Rename Directory
by trwww (Priest) on Sep 17, 2009 at 17:41 UTC

    I ... don't want to use MySQL or any other database requiring Structured Query Language.

    okay...

    <snip a bunch of issues that using a database would solve>

    hrm...

Re: Flock to Rename Directory
by leocharre (Priest) on Sep 17, 2009 at 19:45 UTC
    Do you *know* that only one process (instance of your program, for example) will be used at a time?

    You don't want two instances to try to do the same thing.

    Two ideas come to mind here.

    1. Use a daemon (google for it)
    2. Instead of encoding the data in the directory names (in essence, using the filename as metadata, one of my pet projects, hehe), give each directory a constant name and use a config file (YAML) or Storable.
      Or, even cooler, maybe Cache::File.

      If you store the metadata in config files,
      and your dir is
      /stuff/JohnSmith_jhon99

      then your meta is
      /stuff/JohnSmith_jhon99/.meta
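      A minimal sketch of the constant-directory-plus-.meta layout, in Python as a stand-in for Perl. JSON from the standard library takes the place of YAML or Storable here, and the helper names are hypothetical.

```python
import json
import os
import tempfile


def write_meta(record_dir, meta):
    """Write meta as JSON to record_dir/.meta, replacing the old file atomically."""
    os.makedirs(record_dir, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over .meta,
    # so a reader never sees a half-written file
    fd, tmp_path = tempfile.mkstemp(dir=record_dir, prefix=".meta.")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(meta, tmp)
        os.replace(tmp_path, os.path.join(record_dir, ".meta"))
    except Exception:
        os.remove(tmp_path)
        raise


def read_meta(record_dir):
    """Load the metadata dict stored in record_dir/.meta."""
    with open(os.path.join(record_dir, ".meta")) as f:
        return json.load(f)
```

      The atomic replace only keeps readers from seeing a half-written .meta. Two processes doing read-modify-write on the same record can still lose an update, which is where a single daemon (idea 1) or a lock comes in.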

Re: Flock to Rename Directory
by merlyn (Sage) on Sep 17, 2009 at 23:15 UTC
    Take a look at DBM::Deep. It's "not a database", but it does have persistence and transactions for nearly arbitrary Perl data structures.

    -- Randal L. Schwartz, Perl hacker


Re: Flock to Rename Directory
by Marshall (Prior) on Sep 18, 2009 at 08:23 UTC
    I think that you are vastly underestimating the complexity of implementing a robust flock (file lock) mechanism by yourself.

    You say: "I'm a novice." Nothing wrong with that; everybody starts somewhere. I am saying that this flock() stuff can be a lot more complex in the "always works" details than you think.

    There is a big difference between "works all the time" and "works almost all the time". The coordination and sequencing of asynchronous events is hard, and this is NOT something that I would recommend for a self-proclaimed novice: the pitfalls are many!

    At the same time, I think that you are overestimating the effort needed for some simple SQL statements (Perl is great at SQL stuff!). I think the total amount of SQL-related code will be about half a page in your application.

    One of the issues with a DB is how to set it up, maintain it, etc.
    I recommend the most basic DB, DB::CSV. You will probably also need SQL::Statement and Text::CSV_XS. Unfortunately DB::CSV doesn't work on Windows. I hope that you have a Unix variant to test with.

    The rationale behind my recommendation of DB::CSV is to get the SQL part figured out. You will wind up with an application that "works", albeit not as fast as it could. All the SQL code will "port" to the "fancy DB" after you get the basic thing working.

    Initializing the CSV DB is simple: it is a Comma-Separated Values file, i.e. a text file that can be generated from, say, an Excel spreadsheet or in other ways. Your data has 5 fields; I think the user id is unique, although that won't matter if it is not.
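    The CSV "table" is just a header line followed by one record per line. This Python sketch (standard `csv` module, hypothetical column names) shows that file layout only; it does not reproduce DBD::CSV's SQL interface.

```python
import csv

# Hypothetical column names for the five fields in the question
COLUMNS = ["name", "username", "time", "param1", "param2"]


def write_table(path, rows):
    """Create the CSV 'table': a header line, then one record per line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def read_table(path):
    """Load every record as a dict keyed by column name (values come back as strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```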

    This may seem like a "stupid question" on my part, but I am compelled to ask it because it could result in a vast simplification of the problem. You wrote:
      I need to save name, username, time, and two other numerical parameters.

    You don't need a dynamic database update if what is currently in the database doesn't matter. In other words, supplying completely new data for a username is a very different thing from updating a user's info based on what it was in the DB. If you just want the "latest info" for each username, that is very different from a DB read/update situation. Multiple processes can open a file for append, and there is no need for flock() as long as each "record" is a single line of output, terminated by "\n" and written in one piece. If this is the case, then you generate a report or update the main DB file periodically (on demand, every 12 hours, etc.). Again, my question sounds "goofy", but it is important to make sure that it is NOT "goofy".
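    The append-then-compact idea can be sketched in Python as a stand-in for Perl. The tab-separated layout and the helper names are hypothetical, and the sketch assumes no field contains a tab or newline. Appending short lines is what makes skipping flock() workable on a local filesystem; on NFS, concurrent appends can corrupt the file.

```python
import os


def append_record(log_path, name, username, timestamp, param1, param2):
    """Append one record as a single tab-separated line ending in a newline.

    Append mode puts each write at the current end of the file, and a
    short line normally reaches the OS as one write.
    """
    line = "\t".join(map(str, (name, username, timestamp, param1, param2))) + "\n"
    with open(log_path, "a") as log:
        log.write(line)


def latest_by_username(log_path):
    """Periodic compaction: later lines win, leaving the newest record per username."""
    latest = {}
    if not os.path.exists(log_path):
        return latest
    with open(log_path) as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 5:  # skip blank or malformed lines
                latest[fields[1]] = fields
    return latest
```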

      One of the issues with a DB is how to set it up, maintain it, etc. I recommend the most basic DB, DB::CSV. You will probably also need SQL::Statement and Text::CSV_XS. Unfortunately DB::CSV doesn't work on Windows. I hope that you have a Unix variant to test with.

      DB::CSV doesn't exist. You probably meant DBD::CSV, and I'm pretty sure DBD::CSV does work on windows.

        Yes, I misspoke. DBD::CSV is right. Please forgive me for a minor typo (DB::CSV vs. DBD::CSV). I am on ActiveState 5.10, and DBD::CSV looks like it is there. On my previous ActiveState version, it wasn't there.

        GREAT! Use this to get SQL working and then upgrade DB as performance requires!

Node Type: perlquestion [id://795919]
Approved by marto