
Regex to remove data

by Anonymous Monk
on Nov 06, 2012 at 14:04 UTC (#1002478)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a large file. I'd like to know how to create a regex that will remove lines if they consist of only uppercase alphabetic characters. Examples:

Replies are listed 'Best First'.
Re: Regex to remove data
by Athanasius (Chancellor) on Nov 06, 2012 at 14:13 UTC

    Something like this should do the trick:

    #! perl
    use strict;
    use warnings;

    my @lines = <DATA>;
    s/ ^ [A-Z\s]+ $ //x for @lines;
    print for @lines;

    __DATA__
    AAAAAAAA AAAA AAAAAAA AAAA AAA AAA
    Leave me intact PLEASE


    0:10 >perl
    Leave me intact PLEASE
    0:13 >

    Update 1: Note that this will also remove blank (i.e. empty) lines.

    Update 2: Changed

    print "@lines";

    to

    print for @lines;

    to address the issue of leading spaces raised by Anonymous Monk, below. (Interpolating an array in a double-quoted string joins its elements with a space.)

    Hope that helps,

    Athanasius <°(((><contra mundum

      This is good, but the OP said he wants to remove the lines from a "large file", and your approach reads the entire file into an in-memory array (@lines) before processing it.

      Check out my reply below for an example that holds only one line in memory at a time (the -n switch wraps the code in an implicit while (<>) { ... } loop).

      Thanks, though it adds a leading space to each line.
Re: Regex to remove data
by sundialsvc4 (Abbot) on Nov 06, 2012 at 14:37 UTC

    Also, don’t overlook the obvious grep (or egrep) commands, if you have them on your system . . . You might not have to “write a program” to do this at all.   Simply use the -v option to output all lines which don’t match the pattern.
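
    The reply above does not give a pattern, so the following is only a sketch of what the grep approach might look like. The file names (sample.txt, filtered.txt) and the function name filter_upper are made up for illustration, and the pattern is an assumption modelled on the regexes used elsewhere in this thread.

```shell
# Build a small sample file (hypothetical name: sample.txt).
printf '%s\n' 'AAAAAAAA' 'AAAA AAA' 'Leave me intact PLEASE' 'lower case' > sample.txt

# -v inverts the match: print only the lines that do NOT consist entirely
# of uppercase letters and whitespace. -E enables extended regex syntax.
# LC_ALL=C keeps the [A-Z] range limited to ASCII uppercase letters.
filter_upper() {
    LC_ALL=C grep -v -E '^[A-Z[:space:]]+$' "$1"
}

filter_upper sample.txt > filtered.txt
cat filtered.txt
```

    Note that grep matches each line without its trailing newline, so with this pattern a completely empty line is kept, whereas a line containing only spaces is still removed.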

Re: Regex to remove data
by rjt (Deacon) on Nov 06, 2012 at 17:10 UTC

    It looks like you want to remove lines that (optionally) contain spaces in addition to uppercase. This one-liner will do the trick:

    perl -ne 'print unless /^[A-Z\s]+$/' <in.txt >out.txt

    Of course if you are including this in a larger Perl program, you can just nab the regex out of that, and use it in a loop construct of some kind. For example:

    while (<>) { print if !/^[A-Z\s]+$/ }
Re: Regex to remove data
by space_monk (Chaplain) on Nov 06, 2012 at 17:10 UTC
    perl -pe 's/ ^ [A-Z\s]+ $ //x' <your_data, anybody?
Re: Regex to remove data
by trizen (Hermit) on Nov 07, 2012 at 07:55 UTC
    Keeps the lines that are empty or that contain only whitespace characters:

    perl -ne 'print if not /[A-Z]/ && /^[A-Z\s]+$/' file.txt
