perlmeditation
Dominus
This is a very frequently asked question. It appears in [perlfaq5],
along with related questions "How do I delete a line from a file?"
and "How do I change one line in a file?" It sounds like it should
be easy, but it isn't.<p>
<readmore>
The problem is that although we think of files as made of lines, the
operating system usually thinks of them as made of bytes. You can
overwrite a byte, but not a line. If you want to replace a line, you
either have to overwrite every byte exactly, or you have to move
the following part of the file up or down. There isn't even an easy
way to find a line in a file; you have to read through the file
counting newline characters until you get to the place you want.<p>
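To make that concrete, here is a minimal sketch of doing it by hand. The helper name <tt>line_offset</tt> and the scratch file are made up for the illustration. It counts its way to the start of a line, then overwrites that line byte for byte, which works only because the replacement is exactly as long as the original.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the byte offset at which line $n
# (counting from 0) starts, or nothing if the file is too short.
# There is no shortcut; we have to read every earlier line.
sub line_offset {
    my ($path, $n) = @_;
    open my $in, '<', $path or die "Can't open $path: $!";
    binmode $in;
    my $offset = 0;
    while ($n > 0) {
        my $line = <$in>;
        return unless defined $line;
        $offset += length $line;
        $n--;
    }
    return if eof($in);
    return $offset;
}

# Build a small scratch file to experiment on.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "alpha\nbravo\ncharlie\n";
close $tmp or die "Can't close $path: $!";

my $offset = line_offset($path, 1);    # where "bravo" starts

# Overwrite the five bytes of "bravo" with five new bytes.
open my $fh, '+<', $path or die "Can't open $path: $!";
binmode $fh;
seek $fh, $offset, 0 or die "Can't seek in $path: $!";
print {$fh} "BRAVO";
close $fh or die "Can't close $path: $!";
```

Had the replacement been one byte longer, it would have clobbered the newline, and everything after it would have had to be moved down by hand.<p>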
The FAQ starts with a rather snotty remark about how "Perl is not a
text editor." It follows with a 500-word article sketching several
more-or-less difficult ways to do this. Most of them involve throwing
away the original file and replacing it with a modified copy. <p>
At last, there is a better way.<p>
The new <a
href="http://perl.plover.com/TieFile/"><tt>Tie::File</tt></a> module
makes a file look like a Perl array. Each array element is one line
of the file. If you read the array, you get a line from the file. If
you modify the array, the file is modified as you requested. <p>
It's safe. It's reliable. It's efficient.<p>
Best of all, it's <i>easy</i>.<p>
Let's take an example. Suppose you want to go through a file and
replace <tt>PERL</tt> with <tt>perl</tt> everywhere. One easy way is
to use Perl's <tt>-i</tt> option:
<code>
perl -i.bak -lpe 's/PERL/perl/g' file
</code>
This is convenient, but it has the drawback that it rewrites the
entire file. If you want to do this as part of a larger program, it's
rather less convenient, and a lot more bizarre. The FAQ suggests:
<code>
{
    local ($^I, @ARGV) = ('.bak', 'file');
    while (<>) {
        s/PERL/perl/g;
        print;
    }
}
</code>
You get poor error checking if you do this---the <tt>open</tt>
is implicit, so there's no way to catch the error if it fails. <p>
Here's the <tt>Tie::File</tt> version:
<code>
tie @lines, 'Tie::File', 'file' or die ...;
for (@lines) {
    s/PERL/perl/g;
}
untie @lines;
</code>
Not only is this simpler (what the heck is <tt>local($^I)</tt>,
anyway?) but it's a lot more efficient. Unlike <tt>perl -i</tt>,
which promises to modify the file "in place", and then actually
creates a totally new file from scratch, <tt>Tie::File</tt> really
<i>does</i> modify the file in place. If the file is ten megabytes
long and contains <tt>PERL</tt> ten times, the <tt>-i</tt> solution
writes ten megabytes; <tt>Tie::File</tt> writes just the ten records
that changed.<p>
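If you want to try this yourself, here is a self-contained sketch that runs the same loop against a scratch file. The file contents and the error message are made up for the illustration. I am assuming that <tt>tie</tt> returns false and leaves the reason in <tt>$!</tt> when the file cannot be opened.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small scratch file to experiment on.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "I like PERL.\nNothing to see here.\nPERL is PERL.\n";
close $tmp or die "Can't close $path: $!";

# Unlike the -i version, the open is explicit, so a failure can be caught.
tie my @lines, 'Tie::File', $path or die "Can't tie $path: $!";
for (@lines) {
    s/PERL/perl/g;    # only records that actually change get stored
}
untie @lines;
```

The second line never matches, so nothing is assigned to it.<p>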
Here's another common task; people ask about this in
<tt>comp.lang.perl.misc</tt> every week: I have some text, in
<tt>$text</tt>, and I want to insert it into an HTML file just after
the line that says <code><!-- insert here --></code>. Again, I could
use <tt>-i</tt>, which rewrites the whole file. Or I can use
<tt>Tie::File</tt>:
<code>
# @lines is tied to the HTML file, as in the previous example
for (@lines) {
    if (/<!-- insert here -->/) {
        $_ .= $text;
        last;
    }
}
</code>
Instead of rewriting the entire file, this only rewrites what is
necessary, the part of the file after the comment. If <tt>$text</tt>
happens to be empty, it rewrites only the one line. And the code is
really simple and obvious.<p>
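If <tt>$text</tt> spans several lines, a variant worth considering is to insert each line as its own record with <tt>splice</tt>. As I read the <tt>Tie::File</tt> manual, that is the recommended way to replace one record with several. The file contents and <tt>$text</tt> below are made up for the sketch.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small scratch HTML file to experiment on.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<html>\n<!-- insert here -->\n</html>\n";
close $tmp or die "Can't close $path: $!";

my $text = "<p>First new line</p>\n<p>Second new line</p>\n";

tie my @lines, 'Tie::File', $path or die "Can't tie $path: $!";
for my $i (0 .. $#lines) {
    if ($lines[$i] =~ /<!-- insert here -->/) {
        # Insert each line of $text as a separate record after the marker.
        splice @lines, $i + 1, 0, split /\n/, $text;
        last;
    }
}
untie @lines;
```

As before, only the part of the file after the marker has to be rewritten.<p>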
Here's another common problem which is trivially solved by
<tt>Tie::File</tt>. How do I add a new record at the beginning of a
file instead of at the end?<p>
<code>
unshift @lines, $new; # Or add more than one record
</code>
This does rewrite the entire file, but there's no getting
around that. All you can do is make it easy to write the code, and
now it <i>is</i> easy to write the code.<p>
Now let's suppose you have a database with several columns, and the
first column is the key. For concreteness, let's say it's the Unix
password file, and the key is the username. (Or maybe it's your web
server's password file, which has the same format.) Suppose you have
a program that needs to look up data in this database.<p>
One good way to do this is to read the database into a hash, and use
the usernames as the hash keys, like this:
<code>
open DB, "< $database" or die ...;
while (<DB>) {
    chomp;
    my ($username) = split /:/;
    $db{$username} = $_;
}
sub lookup {
    my $username = shift;
    return $db{$username};
}
</code>
The major drawback of this approach is that if the database is big,
you will run out of memory for the hash. (That is probably not a
consideration with the password file, but many other databases are
bigger.) But you can use <tt>Tie::File</tt> here to get an easy and
efficient solution:<p>
<code>
tie @DB, 'Tie::File', $database or die ...;
for (@DB) {
    my ($username) = split /:/, $_;
    $recno{$username} = $lineno++;   # record numbers start at 0
}
sub lookup {
    my $username = shift;
    return $DB[$recno{$username}];
}
</code>
We're still using a hash, and the usernames are still the keys. But
instead of associating the data with the usernames (which would take a
lot of space) we only associate a <i>record number</i> with each
username. If we look up <tt>$recno{'merlyn'}</tt>, we don't get the
information for <tt>merlyn</tt> directly. Instead, we get a number
like 1123, which tells us that <tt>merlyn</tt>'s data is on line 1123
of the data file. Then we look at <tt>$DB[1123]</tt> and
<tt>Tie::File</tt> immediately recovers the data for us---it remembered
where record 1123 was from the last time it saw it go by,
and goes directly to the right place in the file to find it. We get fast
access to every record <i>without</i> storing the entire database in
memory. <p>
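The idea of remembering where each record starts can be sketched by hand with <tt>tell</tt> and <tt>seek</tt>. This is only an illustration of the principle, not <tt>Tie::File</tt>'s actual implementation. The sample records and the <tt>%offset</tt> hash are made up for the sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small scratch database in password-file format.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "root:x:0:0\nmerlyn:x:1001:100\nmjd:x:1002:100\n";
close $tmp or die "Can't close $path: $!";

open my $fh, '<', $path or die "Can't open $path: $!";
binmode $fh;

# One pass over the file: remember the byte offset where each record starts.
my %offset;
while (1) {
    my $pos = tell $fh;
    defined(my $line = <$fh>) or last;
    my ($username) = split /:/, $line;
    $offset{$username} = $pos;
}

# Later lookups jump straight to the record instead of rereading the file.
sub lookup {
    my $username = shift;
    my $pos = $offset{$username};
    return unless defined $pos;
    seek $fh, $pos, 0 or die "Can't seek in $path: $!";
    my $line = <$fh>;
    chomp $line;
    return $line;
}

my $record = lookup('merlyn');
```

The hash holds one small number per user, however long the records are.<p>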
Even if the database is small, you might still want to
use <tt>Tie::File</tt> if you need to change the data.
With <tt>Tie::File</tt>, you're not limited to only reading
the database; you can modify it also:<p>
<code>
sub replace_data {
    my ($username, $new_data) = @_;
    my $recno = $recno{$username};
    if (defined $recno) {            # Update existing user
        $DB[$recno] = $new_data;
    } else {
        push @DB, $new_data;         # Add new user at the end
        $recno{$username} = $#DB;    # Remember where the new record is
    }
}
sub update_password {
    my ($username, $new_password) = @_;
    # random_salt() is assumed to be defined elsewhere
    my $crypted_password = crypt($new_password, random_salt());
    my @data = split /:/, lookup($username);
    $data[1] = $crypted_password;
    replace_data($username, join(':', @data));
}
</code>
When we call <tt>replace_data</tt>, the data in the file is overwritten
in place with the new data.<p>
<tt>Tie::File</tt> arrays support all the Perl array operations,
including <tt>push</tt>, <tt>pop</tt>, <tt>shift</tt>,
<tt>unshift</tt>, <tt>splice</tt>, and <tt>$#a = $N</tt>. There are
some other fancy features that you probably won't ever need, but if
you do, they are in the manual.<p>
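Here is a short sketch that runs each of those operations against a scratch file. The file contents and the array name <tt>@file</tt> are made up for the illustration. Each comment shows what the file should contain after that step, one record per word.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small scratch file to experiment on.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\nthree\nfour\n";
close $tmp or die "Can't close $path: $!";

tie my @file, 'Tie::File', $path or die "Can't tie $path: $!";

push @file, 'five';                 # one two three four five
pop @file;                          # one two three four
shift @file;                        # two three four
unshift @file, 'zero';              # zero two three four
splice @file, 1, 1, 'ONE', 'TWO';   # zero ONE TWO three four
$#file = 2;                         # zero ONE TWO

my @copy = @file;                   # reading the array reads the file
untie @file;
```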
<tt>Tie::File</tt> is <a
href="http://www.cpan.org/authors/id/MJD/">available on CPAN</a> and
also <a href="http://perl.plover.com/TieFile/">from my website</a>.
It will be included with Perl 5.8, which will be released in April.
It is distributed under the same terms as Perl.<p>
You will like it.<p>
<p>
--<br><font size="-2">
<a href="mailto:mjd-www-perlmonks+@plover.com">Mark Dominus</a><br>
<a href="http://perl.plover.com">Perl Paraphernalia</a><br></font>