Large scale search and replace with perl -i

by elbie (Curate)
on Apr 14, 2003 at 18:35 UTC ( [id://250369] )

elbie has asked for the wisdom of the Perl Monks concerning the following question:

I have a large number of files that I need to do a search and replace on. At this point, I'm resigned to using the following:

find . -name '*.html' -type f -exec perl -pi -e 's/foo/bar/g;' \{\} \;

Which has two problems that I can think of:

  1. Every file acted upon spawns a new process through find's -exec option. I was thinking instead to try:

    perl -pi -e 's/foo/bar/g;' `find . -name '*.html' -type f`

    But I've had problems with that in the past when find returns a very large list.

  2. Files that do not contain the match get operated on anyway. There's a lot of extra overhead here as well, and timestamps get changed to boot. Again, I could try:

    perl -pi -e 's/foo/bar/g;' `find . -name '*.html' -type f -exec grep -l foo \{\} \;`

    But I still have all the same problems as with the first item above.

Is there another way to use perl -i on a directory recursively so that only files matching a certain criteria are updated?

elbie

Replies are listed 'Best First'.
Re: Large scale search and replace with perl -i
by jasonk (Parson) on Apr 14, 2003 at 18:52 UTC

    You can combine the two approaches using xargs:

    find . -name '*.html' -type f -print0 | \
        xargs -0 -n 50 perl -pi -e 's/foo/bar/g'

    This will use find to list all the files you want, and xargs to pass them to your perl script. By specifying the -n 50 option to xargs, each invocation of perl will be passed a maximum of 50 filenames to process (if you still get too many arguments because your paths are really long, lower the number). I haven't benchmarked it to make sure, but I suspect that under most circumstances the overhead of using grep first to find the files that contain the thing you want to replace will actually be less efficient than just running the replacement on every file you find.
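
    A quick way to see the batching behaviour of -n, with a hypothetical toy list standing in for the real file names:

        $ printf '%s\n' a.html b.html c.html d.html e.html | xargs -n 2 echo
        a.html b.html
        c.html d.html
        e.html

    Each echo above stands in for one perl invocation; with -n 50, each perl process receives at most 50 filenames.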


    We're not surrounded, we're in a target-rich environment!
      If your xargs is any good, you don't have to use the -n option. xargs will know the limits of your OS, and create argument lists that neither have too many arguments, nor let the flattened argument list exceed your OS's limit.
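
      For the curious, on a POSIX system you can ask what that limit is (the figure below is only an example; xargs also keeps some headroom for the environment):

          $ getconf ARG_MAX
          2097152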

      Abigail

•Re: Large scale search and replace with perl -i
by merlyn (Sage) on Apr 14, 2003 at 19:22 UTC
    Semi-untested:
    use File::Find;
    @ARGV = ();
    find sub {
        push @ARGV, $File::Find::name if -f and /\.html$/;
    }, ".";
    {
        local $^I = ".bak";
        local $/;
        while (<>) {
            if (s/foo/bar/g) {    # changes?
                print;            # print the new one
            } else {              # no changes? back it out!
                close ARGVOUT;    # for windows, not needed on Unix
                rename "$ARGV$^I", $ARGV
                    or warn "Cannot rename for $ARGV$^I: $!";
            }
        }
    }

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      One small caveat with this neat technique (that I just got bitten by) is that if $^I is set to a wildcard (e.g. *.bak or orig_*) so that the filename of the backup is edited rather than simply appended to, the rename will fail.

      I'll hazard a guess as to your response to this:

      <merlyn>Don't do that then. {grin}</merlyn>

      but I thought it was worth a mention here :)
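
      For illustration, a minimal sketch of why the rename fails, assuming the wildcard behaviour of $^I documented in perlrun (the * is replaced by the current filename):

          # plain suffix: backup name is the original plus the suffix
          local $^I = '.bak';     # foo.html is backed up as foo.html.bak
                                  # "$ARGV$^I" eq 'foo.html.bak' -- rename works

          # wildcard: the * is replaced by the original filename
          local $^I = 'orig_*';   # foo.html is backed up as orig_foo.html
                                  # but "$ARGV$^I" eq 'foo.htmlorig_*' -- rename fails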


      Examine what is said, not who speaks.
      1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
      2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
      3) Any sufficiently advanced technology is indistinguishable from magic.
      Arthur C. Clarke.
        But, that star there doesn't do anything. That would create files named "foo*.bak" and "bar*.bak" from "foo" and "bar". And thus, the rename would undo it just fine.

        Unless you're talking about some local hack to your Perl to make it interpret $^I differently.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

        update Oh my goodness. A new feature was snuck into Perl in 5.6, and documented in perlrun but not perlvar.

        My apologies. Wow, I'll have to write a column about it now to remember it. {grin} And I don't recall it in the perldelta from 5.5 to 5.6, or perhaps I considered it un-noteworthy. Yeah, just checked, not in perldelta. No wonder I hadn't noticed it.

        update 2 On further research, 5.4 didn't have the feature, but 5.5 did. And yet it wasn't in 5.5's perldelta. That's why I missed it. I don't always diff the entire manpage set. {sigh} I rely on perldelta.

        update 3 See "Put your inplace-edit backup files into a subdir".
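
        For reference, perlrun shows that form too; a minimal sketch (assuming the bak directory already exists -- Perl won't create it):

            $^I = 'bak/*';              # foo.html is backed up as bak/foo.html
            @ARGV = glob '*.html';
            while (<>) { s/foo/bar/g; print }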

Re: Large scale search and replace with perl -i
by antifun (Sexton) on Apr 14, 2003 at 19:23 UTC

    First question: how many is a "large number"? If it's on the order of 10^4 or less, you will probably spend more time fiddling with a script than it would take to do with a more "brute-force" approach. (Given a reasonably fast computer, yadda yadda yadda.)

    As for the more theoretical question, you would certainly want to use the second approach (with find -exec grep -l foo) to reduce your working file set as much as possible.

    Then your next issue is avoiding the overhead of running multiple perls. The -i switch relies on the magic of <>, which reads the files named in @ARGV if there are command-line arguments, and STDIN if there are not (paraphrasing slightly). However, what you need to do in this case is use both kinds of magic, so your perl will have to be a little more creative. It's harder to do the shuffle that -i does than to read from STDIN manually, so here's one way to try it:

    find . -name "*.html" -type f -exec grep -l foo {} \; | perl -pi -e 'B +EGIN{ @ARGV = <STDIN>; chomp @ARGV }; while (<>) { s/foo/bar/g; } co +ntinue { print }'

    Notice that you can fiddle with @ARGV before the <> magic takes place. The internals of the script are basically what the -p option does.
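
    For reference, perlrun describes the -p switch as wrapping the program in roughly this loop, which is what the one-liner above spells out by hand:

        LINE:
        while (<>) {
            # your program text goes here
        } continue {
            print or die "-p destination: $!\n";
        }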


    ---
    "I hate it when I think myself into a corner."
    Matt Mitchell
      you would certainly want to use the second approach (with find -exec grep -l foo) to reduce your working file set as much as possible.

      You would certainly not, because you will have to open all the files anyway - even if just to check. The difference is that grepping for matches first makes you spawn one process per file, as well as open the matching files a second time (in Perl) to actually process them. That's a (large) net loss.

      Taking that out, and using the -print0 option to avoid some nasty surprises (but not all, unfortunately, due to the darn magic open), leaves us with the following. Note I have removed the continue {} block as it isn't necessary and just costs time. I'm also setting the record separator so that the diamond operator reads fixed-size blocks (64 kbytes in this example) rather than scanning for some end-of-line character (but see the update below).

      find . -name "*.html" -type f -print0 | \ perl -i -p0e \ 'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = "\n" }; \ while (<>) { s/foo/bar/g; print }'

      That should be about as efficient as it gets.

      If you have a lot of nonmatching files, you might save work by hooking a grep in there - but not with find's -exec. That's what xargs was invented for.

      find . -name "*.html" -type f -print0 | \ xargs -r0 grep -l0 | \ perl -i -p0e \ 'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = "\n" }; \ while (<>) { s/foo/bar/g; print }'
      Update: changed $/ = \65536 to $/ = "\n", as per runrig's observation.

      Makeshifts last the longest.

        find . -name "*.html" -type f -print0 | perl -i -p0e \ 'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = \65536 }; \ while (<>) { s/foo/bar/g; print }'
        You don't want to do that. If 'foo' spans across one of those read blocks, then you'll miss the substitution.
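
        A tiny demonstration of the miss, using a 4-byte block size so it is easy to see (toy input, not the real data):

            $ printf 'xxxfoo' | perl -pe 'BEGIN { $/ = \4 } s/foo/bar/g'
            xxxfoo

        The reads come back as "xxxf" and "oo", so /foo/ never matches either chunk and nothing is replaced.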
Re: Large scale search and replace with perl -i
by BrowserUk (Patriarch) on Apr 15, 2003 at 02:05 UTC

    Given that most html files are usually (hopefully) < 1 MB in size, it would make sense to use Aristotle's technique of changing $/, but set it to null and slurp the whole file each time.

    find . -name "*.html" -type f -print0 | \ perl -i -p0e \ 'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = '' }; \ while (<>) { s/foo/bar/g; print }'

    If the number of files produced by find is too many for your command line to handle, couldn't you produce a list of directories from find and pass that into perl and then let perl glob those? Something like (NB:completely untested code)

    find . -type d -print0 | \
        perl -i -0e \
            'BEGIN {
                 @ARGV = <STDIN>;
                 chomp @ARGV;
                 @ARGV = map { glob "$_/*.html" } @ARGV;
                 $/ = undef;
             }
             while (<>) { s/foo/bar/g; print }'

    Combining that with Merlyn's trick of backing out the -i effect if nothing is found should save more time.
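
    A rough, untested sketch of that combination, in the spirit of the thread (it inherits glob's whitespace caveat, and the file names here are only examples):

        find . -type d -print0 | \
            perl -0 -e '
                BEGIN {
                    @ARGV = <STDIN>;                        # NUL-separated dirs from find
                    chomp @ARGV;
                    @ARGV = map { glob "$_/*.html" } @ARGV; # expand to the HTML files
                }
                $^I = ".bak";             # in-place edit, keeping a backup
                $/  = undef;              # slurp: one read per file
                while (<>) {
                    if (s/foo/bar/g) {    # changes? keep the new copy
                        print;
                    }
                    else {                # no changes? back out the edit
                        close ARGVOUT;    # for Windows, not needed on Unix
                        rename "$ARGV$^I", $ARGV
                            or warn "Cannot rename for $ARGV$^I: $!";
                    }
                }'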


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: Large scale search and replace with perl -i
by neilwatson (Priest) on Apr 14, 2003 at 18:55 UTC
    Try this:

    find / -name "*.html" -exec perl -pi -e 's/find/replace/gi' {} \;

    Update: Oops, reread your question. Hmm, not sure about ignoring certain files. However, does filtering your find file list through grep really gain you any speed? You are having grep go through all your files and then having perl go through whatever files grep returns.

    Neil Watson
    watson-wilson.ca

      If there are many files that will not have a match, this might actually be faster, because you will save on IO writes. The perl -i will always write to a new (temporary) file, even if it turns out the content is the same - after all, Perl can't know there isn't a match. So, without the grep you will do more IO writes, and your OS will churn through its buffer cache twice as fast.

      It's hard to say whether a grep is worthwhile. Without knowing more about the content of the files, I won't dismiss it.

      Abigail

Re: Large scale search and replace with perl -i
by Improv (Pilgrim) on Apr 14, 2003 at 19:23 UTC
    One thing you might consider, given that you're willing to put the time into asking on Perlmonks, is using find2perl -- it should be a lot more efficient than actually using find.
      Is it? I'd like to see some benchmark. It's certainly not my impression, and I don't find it logical, find being a C program written to do exactly one task.

      Abigail

        The reason it should be, apart from it being suggested to be so in the find2perl manpage (hehe), is that process creation is a fairly expensive operation, and it usually is the case that all the spawnings of perl (or anything else that perl can easily duplicate in functionality) will slow down the entire operation enough that a single-process all-perl implementation will outpace it by a good margin. Of course, your mileage may vary.
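
        For reference, find2perl ships with Perl and accepts most of find's syntax, printing an equivalent File::Find-based script on stdout (the output file name below is only an example):

            $ find2perl . -name '*.html' -type f -print > fixhtml.pl
            $ perl fixhtml.pl

        The generated wanted() routine can then be edited to do the substitution directly instead of just printing each path.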
Re: Large scale search and replace with perl -i
by Jenda (Abbot) on Apr 15, 2003 at 11:51 UTC

    Just for reference: if you are using Windows and have G.pm installed, you can do it like this:

    perl -MG=R -pi.bak -e "s/foo/bar/g" *.html
    The =R tells G.pm to do the parameter globbing recursively.

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne

