PerlMonks  

This is a very frequently asked question. It appears in perlfaq5, along with related questions "How do I delete a line from a file?" and "How do I change one line in a file?" It sounds like it should be easy, but it isn't.

The problem is that although we think of files as made of lines, the operating system usually thinks of them as made of bytes. You can overwrite a byte, but not a line. If you want to replace a line, you either have to overwrite every byte exactly, or you have to move the following part of the file up or down. There isn't even an easy way to find a line in a file; you have to read through the file counting newline characters until you get to the place you want.
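To make the "counting newline characters" point concrete, here is a minimal sketch of finding where a given line starts. The helper name line_offset is made up for this illustration, and lines are counted from zero:

```perl
use strict;
use warnings;

# Return the byte offset at which line $n (counting from 0) starts,
# or undef if the file has no such line.
# line_offset is a hypothetical helper, not part of any module.
sub line_offset {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                          # count raw bytes
    my $offset = 0;
    while ($n > 0) {
        my $line = <$fh>;                 # read one line, newline included
        return undef unless defined $line;
        $offset += length $line;
        $n--;
    }
    return eof($fh) ? undef : $offset;    # nothing left: no such line
}
```

There is no shortcut here: to find line N you have to read every line before it.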

The FAQ starts with a rather snotty remark about how "Perl is not a text editor." It follows with a 500-word article sketching several more-or-less difficult ways to do this. Most of them involve throwing away the original file and replacing it with a modified copy.
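For comparison, the "modified copy" approach looks roughly like the sketch below. This is my own illustration, not the FAQ's code; the name rewrite_via_copy and the ".tmp" suffix are assumptions made for the example.

```perl
use strict;
use warnings;

# Copy $path line by line to a temporary file, letting the callback
# edit each line in $_, then rename the copy over the original.
# rewrite_via_copy is a hypothetical helper written for this example.
sub rewrite_via_copy {
    my ($path, $edit) = @_;
    my $tmp = "$path.tmp";    # assumed name for the temporary copy

    open my $in,  '<', $path or die "Cannot read $path: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";

    local $_;
    while (<$in>) {
        $edit->();            # the callback edits the current line in $_
        print {$out} $_ or die "Cannot write to $tmp: $!";
    }

    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}
```

Note that every byte of the file gets written again, even if the callback changes only one line.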

At last, there is a better way.

The new Tie::File module makes a file look like a Perl array. Each array element is one line of the file. If you read the array, you get a line from the file. If you modify the array, the file is modified as you requested.
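A minimal sketch of what that looks like in practice. The scratch file and its contents are made up for the example; the tie and untie calls are the module's ordinary interface.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small scratch file to work on (contents are made up).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first\nsecond\nthird\n";
close $fh or die "Cannot close $path: $!";

tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

my $count  = @lines;       # number of lines in the file
my $second = $lines[1];    # reading an element reads that line, minus its newline
$lines[1]  = 'SECOND';     # assigning to an element changes that line in the file

untie @lines;              # done with the file
```

Note that the array elements do not include the trailing newline; Tie::File removes it when reading and puts it back when writing.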

It's safe. It's reliable. It's efficient.

Best of all, it's easy.

Let's take an example. Suppose you want to go through a file and replace PERL with perl everywhere. One easy way is to use Perl's -i option:

    perl -i.bak -lpe 's/PERL/perl/g' file

This is convenient, but it has the drawback that it rewrites the entire file. If you want to do this as part of a larger program, it's rather less convenient, and a lot more bizarre. The FAQ suggests:

    {
        local ($^I, @ARGV) = ('.bak', 'file');
        while (<>) {
            s/PERL/perl/g;
            print;
        }
    }

You get poor error checking if you do this: the open is implicit, so there's no way to catch the error if it fails.

Here's the Tie::File version:

    use Tie::File;

    tie @lines, 'Tie::File', 'file' or die ...;
    for (@lines) {
        s/PERL/perl/g;
    }
    untie @lines;

Not only is this simpler (what the heck is local($^I), anyway?) but it's a lot more efficient. Unlike perl -i, which promises to modify the file "in place", and then actually creates a totally new file from scratch, Tie::File really does modify the file in place. If the file is ten megabytes long and contains PERL ten times, the -i solution writes ten megabytes; Tie::File writes just the ten records that changed.

Here's another common task; people ask about this in comp.lang.perl.misc every week: I have some text, in $text, and I want to insert it into an HTML file just after the line that says <!-- insert here -->. Again, I could use -i, which rewrites the whole file. Or I can use Tie::File:

    for (@lines) {
        if (/<!-- insert here -->/) {
            $_ .= $text;
            last;
        }
    }

Instead of rewriting the entire file, this only rewrites what is necessary, the part of the file after the comment. If $text happens to be empty, it rewrites only the one line. And the code is really simple and obvious.
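If the new text should become records of its own rather than being appended to the marker line, splice is one way to do it. As I read the Tie::File documentation, storing a record that itself contains the record separator is not supported, and splice is what it recommends for inserting records. The sketch below assumes each element of @new_lines is a single line without a trailing newline; insert_after_marker is a name invented for this example.

```perl
use strict;
use warnings;
use Tie::File;

# Insert @new_lines as separate records right after the first line
# that contains the marker comment. Returns true if the marker was found.
# insert_after_marker is a hypothetical helper written for this example.
sub insert_after_marker {
    my ($path, @new_lines) = @_;
    tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";
    my $found = 0;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ /<!-- insert here -->/;
        splice @lines, $i + 1, 0, @new_lines;   # insert, removing nothing
        $found = 1;
        last;
    }
    untie @lines;
    return $found;
}
```

As with the loop above, only the part of the file after the marker has to move.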

Here's another common problem which is trivially solved by Tie::File. How do I add a new record at the beginning of a file instead of at the end?

    unshift @lines, $new;    # Or add more than one record

This does rewrite the entire file, but there's no getting around that. All you can do is make it easy to write the code, and now it is easy to write the code.

Now let's suppose you have a database with several columns, and the first column is the key. For concreteness, let's say it's the Unix password file, and the key is the username. (Or maybe it's your web server's password file, which has the same format.) Suppose you have a program that needs to look up data in this database.

One good way to do this is to read the database into a hash, and use the usernames as the hash keys, like this:

    open DB, "< $database" or die ...;
    while (<DB>) {
        chomp;
        my ($username) = split /:/;
        $db{$username} = $_;
    }

    sub lookup {
        my $username = shift;
        return $db{$username};
    }

The major drawback of this approach is that if the database is big, you will run out of memory for the hash. (That is probably not a consideration with the password file, but many other databases are bigger.) But you can use Tie::File here to get an easy and efficient solution:

    tie @DB, 'Tie::File', $database or die ...;
    for (@DB) {
        my ($username) = split /:/, $_;
        $recno{$username} = $lineno++;
    }

    sub lookup {
        my $username = shift;
        my $recno = $recno{$username};
        return unless defined $recno;    # unknown user: don't return record 0
        return $DB[$recno];
    }

We're still using a hash, and the usernames are still the keys. But instead of associating the data with the usernames (which would take a lot of space) we only associate a record number with each username. If we look up $recno{'merlyn'}, we don't get the information for merlyn directly. Instead, we get a number like 1123, which tells us that merlyn's data is in record 1123 of the data file (records are numbered from zero, like array indices). Then we look at $DB[1123] and Tie::File immediately recovers the data for us: it remembered where record 1123 was from the last time it saw it go by, and goes directly to the right place in the file to find it. We get fast access to every record without storing the entire database in memory.

Even if the database is small, you might still want to use Tie::File if you need to change the data. With Tie::File, you're not limited to only reading the database; you can modify it also:

    sub replace_data {
        my ($username, $new_data) = @_;
        my $recno = $recno{$username};
        if (defined $recno) {
            # Update existing user
            $DB[$recno] = $new_data;
        } else {
            # Add new user at the end, and remember where it went
            push @DB, $new_data;
            $recno{$username} = $#DB;
        }
    }

    sub update_password {
        my ($username, $new_password) = @_;
        my $crypted_password = crypt($new_password, random_salt());
        my @data = split /:/, lookup($username);
        $data[1] = $crypted_password;
        replace_data($username, join(':', @data));
    }

When we call replace_data, the data in the file is overwritten in place with the new data.

Tie::File arrays support all the Perl array operations, including push, pop, shift, unshift, splice, and $#a = $N. There are some other fancy features that you probably won't ever need, but if you do, they are in the manual.
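Here is a short sketch that runs through those operations on a scratch file. The file and its contents are invented for the example; the comments show the state of the file after each step.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Scratch file with three lines (contents are made up).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "one\ntwo\nthree\n";
close $fh or die "Cannot close $path: $!";

tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

push @lines, 'four';                  # one two three four
my $popped  = pop @lines;             # one two three
my $shifted = shift @lines;           # two three
unshift @lines, 'zero';               # zero two three
splice @lines, 1, 1, 'TWO', '2.5';    # zero TWO 2.5 three
$#lines = 1;                          # zero TWO (file is truncated)

untie @lines;
```

Each statement changes the file itself, not just an in-memory copy.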

Tie::File is available on CPAN and also from my website. It will be included with Perl 5.8, which will be released in April. It is distributed under the same terms as Perl.

You will like it.

--
Mark Dominus
Perl Paraphernalia


In reply to How do I insert a line into a file? by Dominus
