
Multiple substitutions in large files

by mdi (Acolyte)
on May 09, 2005 at 09:41 UTC ( [id://455179]=perlquestion )

mdi has asked for the wisdom of the Perl Monks concerning the following question:

I need to do multiple substitutions in several large (1-10MB) files. I've been using this:
use strict;
use warnings;
use Tie::File;

foreach my $x (@ARGV) {
    tie my @f, 'Tie::File', $x or die "Could not tie $x: $!\n";
    for (@f) {
        s/^\|/\\N\|/;
        s/\|\s*$/\|\\N/;
        s/\|\s*\|/\|\\N\|/g;
        s/\|\.\s*\|/\|\\N\|/g;
        s/\|\s+/\|/g;
        s/\s+\|/\|/g;
        s/(\d{2}:\d{2}:\d{2})\.\d+/$1/g;
        s/(\d{5})-(?:\d{1,4}|\s+)/$1/;
    }
}
...but this is taking entirely too long, and using up too much CPU. How can I do this more efficiently?

Replies are listed 'Best First'.
Re: Multiple substitutions in large files
by dragonchild (Archbishop) on May 09, 2005 at 09:46 UTC
    #!/usr/bin/perl -p
    s/^\|/\\N\|/;
    s/\|\s*$/\|\\N/;
    s/\|\s*\|/\|\\N\|/g;
    s/\|\.\s*\|/\|\\N\|/g;
    s/\|\s+/\|/g;
    s/\s+\|/\|/g;
    s/(\d{2}:\d{2}:\d{2})\.\d+/$1/g;
    s/(\d{5})-(?:\d{1,4}|\s+)/$1/;

    Execute as so:

    my_scriptydoo.pl file1 > file2

    Update: ikegami is absolutely correct. I should be doing a redirect. The next 1st level response provides the -pi version.


    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
      Shouldn't that be -pi (or -pi.bak if a backup is desired)? With just -p, the usage would be my_scriptydoo.pl file1 > file1.new
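
For readers who would rather have a script than a one-liner, the -pi behavior discussed above can be approximated from inside Perl by setting $^I. This is only a minimal sketch, not code from the thread: fix_in_place is a hypothetical name, and the chomp / print "$_\n" pair is an assumption added here so that the substitutions ending in \s* or \s+ cannot swallow the line terminator.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: rewrite each named file in place,
# applying the original poster's substitutions one line at a time.
sub fix_in_place {
    my ($backup_ext, @files) = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN

    # $^I is the in-script form of the -i switch; '.bak' keeps a file.bak copy.
    local ($^I, @ARGV) = ($backup_ext, @files);
    local $_;

    while (<>) {
        chomp;               # keep the newline out of reach of \s* and \s+
        s/^\|/\\N\|/;
        s/\|\s*$/\|\\N/;
        s/\|\s*\|/\|\\N\|/g;
        s/\|\.\s*\|/\|\\N\|/g;
        s/\|\s+/\|/g;
        s/\s+\|/\|/g;
        s/(\d{2}:\d{2}:\d{2})\.\d+/$1/g;
        s/(\d{5})-(?:\d{1,4}|\s+)/$1/;
        print "$_\n";        # printed into the rewritten file, not the terminal
    }
    return;
}

# Files to edit are taken from the command line.
fix_in_place('.bak', @ARGV);
```

Only one line is held in memory at a time, and the untouched original stays next to each edited file with the .bak extension.
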
Re: Multiple substitutions in large files
by Joost (Canon) on May 09, 2005 at 09:48 UTC
Re: Multiple substitutions in large files
by ikegami (Patriarch) on May 09, 2005 at 10:58 UTC

    a|b||d becomes a|b|\N|d
    |b|c|d becomes \N|b|c|d
    a|b|c| becomes a|b|c|\N
    and similarly,
    a|b|.|d becomes a|b|\N|d
    but
    .|b|c|d does not become \N|b|c|d
    a|b|c|. does not become a|b|c|\N
    Is that a bug?
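
The six cases above can be reproduced with a short script. This is a sketch for checking the behavior, not part of the original post: apply_original is a hypothetical wrapper around the original poster's eight substitutions, applied in the same order to one line that has already had its newline removed.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run the original poster's substitutions on one line.
sub apply_original {
    my ($line) = @_;
    for ($line) {    # alias $_ to $line so the bare s/// operators act on it
        s/^\|/\\N\|/;
        s/\|\s*$/\|\\N/;
        s/\|\s*\|/\|\\N\|/g;
        s/\|\.\s*\|/\|\\N\|/g;
        s/\|\s+/\|/g;
        s/\s+\|/\|/g;
        s/(\d{2}:\d{2}:\d{2})\.\d+/$1/g;
        s/(\d{5})-(?:\d{1,4}|\s+)/$1/;
    }
    return $line;
}

# The six sample rows from the list above.
for my $row ('a|b||d', '|b|c|d', 'a|b|c|', 'a|b|.|d', '.|b|c|d', 'a|b|c|.') {
    print "$row becomes ", apply_original($row), "\n";
}
```

The last two rows come back unchanged, because a lone "." is only rewritten when it has a "|" on both sides.
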

    If the above is a bug, the following regexps are probably faster:

    s/\s*\|\s*/\|/g;
    s/^\.?(?=\|)/\\N/;
    s/(?<=\|)\.?(?=\||$)/\\N/g;
    s/(?<=\d{2}:\d{2}:\d{2})\.\d+//g;
    s/(?<=\d{5})-(?:\d{1,4}|\s+)//;

    If the above is not a bug, the following regexps are probably faster:

    s/\s*\|\s*/\|/g;
    s/^(?=\|)/\\N/;
    s/(?<=\|)(?=\||$)/\\N/g;
    s/(?<=\|)\.(?=\|)/\\N/g;
    s/(?<=\d{2}:\d{2}:\d{2})\.\d+//g;
    s/(?<=\d{5})-(?:\d{1,4}|\s+)//;

    I reduced the number of regexps by combining a few, I shortened the regexps by removing the spaces first (not last), and I used zero-width positive lookaheads and lookbehinds to minimize the text being captured and substituted.
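
To make the lookaround point concrete, here is a small comparison. It is a sketch with hypothetical function names, not code from the thread. The first pair shows that a lookbehind can stand in for a capture that is only copied back; the second pair shows a side effect of assertions being zero-width: the "|" delimiters are never consumed, so two empty fields in a row are both filled.

```perl
use strict;
use warnings;

# Capture the timestamp and copy it back via $1.
sub strip_fraction_capture {
    my ($s) = @_;
    $s =~ s/(\d{2}:\d{2}:\d{2})\.\d+/$1/g;
    return $s;
}

# Assert the timestamp with a lookbehind and delete only the fraction.
sub strip_fraction_lookbehind {
    my ($s) = @_;
    $s =~ s/(?<=\d{2}:\d{2}:\d{2})\.\d+//g;
    return $s;
}

# Match both delimiters, so the second "|" is used up by the first match.
sub fill_consuming {
    my ($s) = @_;
    $s =~ s/\|\s*\|/\|\\N\|/g;
    return $s;
}

# Match the empty spot between delimiters without consuming either one.
sub fill_lookaround {
    my ($s) = @_;
    $s =~ s/(?<=\|)(?=\||$)/\\N/g;
    return $s;
}

print strip_fraction_capture('09:41:07.250|x'),    "\n";
print strip_fraction_lookbehind('09:41:07.250|x'), "\n";
print fill_consuming('a|||d'),  "\n";
print fill_lookaround('a|||d'), "\n";
```

The two timestamp functions return the same string. The two fill functions differ on a|||d: the consuming version leaves the second empty field alone, while the lookaround version fills both.
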

    Use this in conjunction with the -p or -pi suggestion for better results.
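
Since the regexps above are only described as "probably faster", it may be worth measuring them on data that resembles the real files. The following is a rough sketch using the core Benchmark module. The sample line, the function names, and the default iteration count are all assumptions made for illustration, and the second function uses the "not a bug" variant. No timing result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample line; substitute a line from the real data.
my $sample = '|foo | |.|12:34:56.789|12345-6789|';

# The original poster's substitutions.
sub original_set {
    my ($s) = @_;
    for ($s) {
        s/^\|/\\N\|/;
        s/\|\s*$/\|\\N/;
        s/\|\s*\|/\|\\N\|/g;
        s/\|\.\s*\|/\|\\N\|/g;
        s/\|\s+/\|/g;
        s/\s+\|/\|/g;
        s/(\d{2}:\d{2}:\d{2})\.\d+/$1/g;
        s/(\d{5})-(?:\d{1,4}|\s+)/$1/;
    }
    return $s;
}

# The rewritten substitutions ("not a bug" variant).
sub rewritten_set {
    my ($s) = @_;
    for ($s) {
        s/\s*\|\s*/\|/g;
        s/^(?=\|)/\\N/;
        s/(?<=\|)(?=\||$)/\\N/g;
        s/(?<=\|)\.(?=\|)/\\N/g;
        s/(?<=\d{2}:\d{2}:\d{2})\.\d+//g;
        s/(?<=\d{5})-(?:\d{1,4}|\s+)//;
    }
    return $s;
}

# Iteration count from the command line, or a small default.
my $count = shift(@ARGV) // 100_000;

cmpthese($count, {
    original  => sub { original_set($sample) },
    rewritten => sub { rewritten_set($sample) },
});
```

Both functions produce the same output for this sample line, so the comparison is like for like. Results will vary with the Perl version and with how many fields per line actually need rewriting.
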
