
Regex to remove data

by Anonymous Monk
on Nov 06, 2012 at 14:04 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a large file, and I'd like to know how to create a regex that will remove lines if they consist only of uppercase alphabetic characters. Examples:
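A minimal sketch of the two ways the question can be read, assuming "uppercase alphabetic characters" means ASCII A-Z. The sub names `is_strict_caps` and `is_caps_or_space` are hypothetical, chosen only for illustration; the replies below use the second, more lenient pattern, which also allows spaces.

```perl
use strict;
use warnings;

# Strict reading: nothing but A-Z on the line. The trailing newline is
# tolerated because $ also matches just before a final "\n".
sub is_strict_caps {
    my ($line) = @_;
    return $line =~ /^[A-Z]+$/ ? 1 : 0;
}

# Lenient reading: A-Z plus any whitespace (spaces, tabs, the newline).
sub is_caps_or_space {
    my ($line) = @_;
    return $line =~ /^[A-Z\s]+$/ ? 1 : 0;
}

for my $line ("AAAAAAAA\n", "AAAA AAA\n", "Leave me intact PLEASE\n") {
    (my $shown = $line) =~ s/\n\z//;    # drop the newline for display only
    printf "%-24s strict=%d lenient=%d\n",
        $shown, is_strict_caps($line), is_caps_or_space($line);
}
```

A line such as "AAAA AAA" is where the two readings differ: the strict pattern keeps it, the lenient one removes it.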

Re: Regex to remove data
by Athanasius (Chancellor) on Nov 06, 2012 at 14:13 UTC

    Something like this should do the trick:

    #! perl
    use strict;
    use warnings;

    my @lines = <DATA>;
    s/ ^ [A-Z\s]+ $ //x for @lines;   # empty out all-caps/whitespace lines
    print for @lines;                 # emptied lines print as nothing

    __DATA__
    AAAAAAAA
    AAAA AAAAAAA
    AAAA AAA AAA
    Leave me intact PLEASE


    0:10 >perl
    Leave me intact PLEASE

    0:13 >

    Update 1: Note that this will also remove blank lines (lines that are empty or contain only whitespace).
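To see what Update 1 means in practice, here is a small sketch that applies the same substitution to a few sample lines (the sub name `strip_caps_lines` is hypothetical). Because `\s` also matches the newline, a line that is empty or whitespace-only is wiped out along with the all-caps ones. The `/x` modifier only makes the spaces inside the pattern insignificant, so `/ ^ [A-Z\s]+ $ /x` is the same pattern as `/^[A-Z\s]+$/`.

```perl
use strict;
use warnings;

# Apply the substitution to a copy of each line and return the
# concatenated result, i.e. what "print for @lines" would output.
sub strip_caps_lines {
    my @lines = @_;                   # copy, so the caller's data is untouched
    s/ ^ [A-Z\s]+ $ //x for @lines;   # matched lines become ""
    return join '', @lines;
}

my $out = strip_caps_lines(
    "AAAA AAA\n",
    "\n",                        # empty line: removed as well
    "   \t\n",                   # whitespace-only line: removed as well
    "Leave me intact PLEASE\n",
);
print $out;    # only "Leave me intact PLEASE" survives
```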

    Update 2: Changed

    print "@lines";

    to

    print for @lines;

    to address the issue of leading spaces raised by Anonymous Monk, below.
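The leading spaces come from array interpolation: inside a double-quoted string, an array's elements are joined with the value of `$"`, which is a single space by default. Since each element already ends in a newline, every line after the first starts with that space. A small demonstration with made-up sample lines:

```perl
use strict;
use warnings;

my @lines = ("first\n", "second\n", "third\n");

# "@lines" is the same as join($", @lines); $" defaults to a single space.
my $interpolated = "@lines";

# What "print for @lines" outputs: the elements back to back.
my $joined_plain = join '', @lines;

# Alternative: temporarily empty the list separator while interpolating.
my $no_sep = do { local $" = ''; "@lines" };

print $interpolated;   # "second" and "third" start with a space
print $joined_plain;   # no extra spaces
```

In the original script the problem was slightly worse, because the emptied elements are still in `@lines` and still get a separator space each.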

    Hope that helps,

    Athanasius <°(((><contra mundum

      This is good, but the OP did say he wanted to remove the lines from a "large file", and your approach reads the entire file into an in-memory array (@lines) before processing it.

      Check out my reply below for an example that holds only one line in memory at a time (the -n switch wraps the code in a while (<>) { ... } loop).
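A sketch of the same line-at-a-time approach inside a script, for cases where a one-liner is inconvenient. It assumes the input and output are already-open filehandles; the sub name `filter_caps_lines` and the file names in the usage comment are hypothetical. Only the current line is held in memory, so the size of the file does not matter.

```perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping any line made up solely of
# uppercase letters and whitespace. Reads one line per iteration and
# returns the number of lines skipped.
sub filter_caps_lines {
    my ($in, $out) = @_;
    my $removed = 0;
    while (my $line = <$in>) {
        if ($line =~ /^[A-Z\s]+$/) {
            $removed++;
            next;
        }
        print {$out} $line;
    }
    return $removed;
}

# Usage with real files (placeholder names):
#   open my $in,  '<', 'in.txt'  or die "in.txt: $!";
#   open my $out, '>', 'out.txt' or die "out.txt: $!";
#   filter_caps_lines($in, $out);
```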

      Thanks, though it adds a leading space to each line.
Re: Regex to remove data
by sundialsvc4 (Abbot) on Nov 06, 2012 at 14:37 UTC

    Also, don’t overlook the obvious grep (or egrep) commands, if you have them on your system. You might not have to “write a program” to do this at all: simply use the -v option to output all lines that don’t match the pattern.
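Inside a Perl program, the closest analogue of `grep -v` is Perl's built-in `grep` with a negated match. A minimal sketch, assuming the lines are already in an array (which is only sensible when the data fits comfortably in memory); the sample lines are made up:

```perl
use strict;
use warnings;

my @lines = ("AAAA AAA\n", "Leave me intact PLEASE\n", "BBB\n", "mixed Case\n");

# Keep the lines that do NOT match, like `grep -v PATTERN` on the shell.
my @kept = grep { !/^[A-Z\s]+$/ } @lines;

print @kept;
```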

Re: Regex to remove data
by rjt (Deacon) on Nov 06, 2012 at 17:10 UTC

    It looks like you want to remove lines that (optionally) contain spaces in addition to uppercase letters. This one-liner will do the trick:

    perl -ne 'print unless /^[A-Z\s]+$/' <in.txt >out.txt

    Of course, if you are including this in a larger Perl program, you can simply lift the regex out and use it in a loop of some kind. For example:

    while (<>) { print if !/^[A-Z\s]+$/ }
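When the pattern is used in more than one place in a larger program, one option is to compile it once with `qr//` and keep it in a variable. A minimal sketch; the variable `$caps_only`, the sub `keep_line`, and the sample lines are hypothetical names and data:

```perl
use strict;
use warnings;

# Compile the pattern once; the qr// object can be reused anywhere.
my $caps_only = qr/^[A-Z\s]+$/;

# True if the line should be kept (i.e. it is NOT all caps/whitespace).
sub keep_line {
    my ($line) = @_;
    return $line !~ $caps_only;
}

my @input = ("HEADER LINE\n", "data 1\n", "TRAILER\n");
for my $line (@input) {
    print $line if keep_line($line);
}
```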
Re: Regex to remove data
by space_monk (Chaplain) on Nov 06, 2012 at 17:10 UTC

    perl -pe 's/ ^ [A-Z\s]+ $ //x' <your_data

    Anybody?
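Under `-p`, Perl prints `$_` after each iteration whether or not the substitution matched. The unwanted lines disappear because `\s` also matches the trailing newline, so the substitution empties `$_` completely and nothing is printed for that line. A rough sketch of that loop written out by hand over an in-memory list (the sub name `emulate_p` is hypothetical, and this is a simplification of what `-p` really generates):

```perl
use strict;
use warnings;

# Roughly what  perl -pe 's/ ^ [A-Z\s]+ $ //x'  does for each input line:
# run the code on $_, then print $_ (here: append it to the result).
sub emulate_p {
    my @input = @_;
    my $result = '';
    for (@input) {
        s/ ^ [A-Z\s]+ $ //x;   # whole line, newline included, becomes ""
        $result .= $_;
    }
    return $result;
}

print emulate_p("AAA BBB\n", "Keep this\n", "CCC\n");
```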
Re: Regex to remove data
by trizen (Hermit) on Nov 07, 2012 at 07:55 UTC

    This variant also removes the all-caps lines, but keeps lines that are empty or contain only whitespace:

    perl -ne 'print if not /[A-Z]/ && /^[A-Z\s]+$/' file.txt
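The one-liner relies on `not` binding more loosely than `&&`, so the condition reads as `not (/[A-Z]/ && /^[A-Z\s]+$/)`: a line is dropped only if it contains at least one uppercase letter and nothing besides uppercase letters and whitespace. A sketch with the grouping made explicit (the sub name `keeps_blank` and the sample lines are hypothetical):

```perl
use strict;
use warnings;

# Drop a line only if it has an uppercase letter AND is all caps/whitespace.
# Blank and whitespace-only lines fail the first test, so they are kept.
sub keeps_blank {
    my ($line) = @_;
    return !($line =~ /[A-Z]/ && $line =~ /^[A-Z\s]+$/);
}

for my $line ("\n", "   \n", "ABC DEF\n", "Mixed Line\n") {
    print $line if keeps_blank($line);
}
```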

Node Type: perlquestion [id://1002478]
Approved by Athanasius