PerlMonks  

Regex to remove data

by Anonymous Monk
on Nov 06, 2012 at 14:04 UTC ( #1002478=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a large file and would like to know how to create a regex that removes lines consisting only of uppercase alphabetic characters. Examples:
AAAAAAAA AAAA AAAAAAA AAAA AAA AAA
Thanks

Re: Regex to remove data
by Athanasius (Prior) on Nov 06, 2012 at 14:13 UTC

    Something like this should do the trick:

    #! perl
    use strict;
    use warnings;

    my @lines = <DATA>;
    s/ ^ [A-Z\s]+ $ //x for @lines;
    print for @lines;

    __DATA__
    AAAAAAAA AAAA AAAAAAA AAAA AAA AAA
    Leave me intact PLEASE

    Output:

    0:10 >perl 369_SoPW.pl
    Leave me intact PLEASE
    0:13 >

    Update 1: Note that this will also remove blank lines (empty or whitespace-only), since \s matches them too.

    Update 2: Changed

    print "@lines";

    to

    print for @lines;

    to address the issue of leading spaces raised by Anonymous Monk, below.
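
The leading spaces come from array interpolation rather than from the regex. A minimal sketch of the difference follows; the sample data is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Sample data standing in for the lines read from <DATA>.
my @lines = ("first\n", "second\n", "third\n");

# Interpolating an array in a double-quoted string joins its elements
# with $" (the list separator), which defaults to a single space.
my $interpolated = "@lines";

# Printing the elements one at a time adds nothing between them,
# which is the same as joining on the empty string.
my $joined = join '', @lines;

print "Interpolated:\n$interpolated";
print "Joined:\n$joined";
```

Because each element already ends in a newline, the separator lands at the start of every line after the first, which is the leading space reported in the reply.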

    Hope that helps,

    Athanasius <°(((><contra mundum

      Thanks, though it adds a leading space to each line.

      This is good, but the OP said he wanted to remove the lines from a "large file", and your approach reads the entire file into an in-memory array (@lines) before processing it.

      Check out my reply below for an example that loads only one line into memory at a time (the -n switch wraps the code in an implicit while (<>) { ... } loop).

Re: Regex to remove data
by sundialsvc4 (Monsignor) on Nov 06, 2012 at 14:37 UTC

    Also, don’t overlook the obvious grep (or egrep) commands, if you have them on your system... You might not have to “write a program” to do this at all. Simply use the -v option to output all lines which don’t match the pattern.

Re: Regex to remove data
by space_monk (Chaplain) on Nov 06, 2012 at 17:10 UTC
    perl -pe 's/ ^ [A-Z\s]+ $ //x' <your_data

    Anybody?
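
With -p every line is printed, so it may help to see why the substitution leaves no blank lines behind: \s also matches the trailing newline, so a matching line is replaced by the empty string. A small sketch, where strip_upper is a hypothetical helper name and not part of the one-liner:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the same substitution to a copy of the line
# and return the result.
sub strip_upper {
    my ($line) = @_;
    (my $copy = $line) =~ s/ ^ [A-Z\s]+ $ //x;
    return $copy;
}

for my $line ("AAAA AAA\n", "\n", "Leave me intact PLEASE\n") {
    my $result = strip_upper($line);
    # A length of 0 means the whole line, newline included, was removed.
    printf "before: %2d chars, after: %2d chars\n", length $line, length $result;
}
```

Uppercase-only and blank lines shrink to zero characters, while the mixed-case line is unchanged.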
Re: Regex to remove data
by rjt (Chaplain) on Nov 06, 2012 at 17:10 UTC

    It looks like you want to remove lines that (optionally) contain spaces in addition to uppercase. This one-liner will do the trick:

    perl -ne 'print unless /^[A-Z\s]+$/' <in.txt >out.txt

    Of course if you are including this in a larger Perl program, you can just nab the regex out of that, and use it in a loop construct of some kind. For example:

    while (<>) { print if !/^[A-Z\s]+$/ }
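
As a rough sketch of how that loop might sit inside a larger program, the version below wraps it in a subroutine that takes explicit filehandles. The name filter_upper_lines and the in-memory sample data are assumptions for illustration only; with real files you would open in.txt and out.txt instead.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out one at a time,
# skipping lines made up only of uppercase ASCII letters and whitespace.
# Returns the number of lines skipped.
sub filter_upper_lines {
    my ($in, $out) = @_;
    my $removed = 0;
    while (my $line = <$in>) {
        if ($line =~ /^[A-Z\s]+$/) {
            $removed++;
            next;
        }
        print {$out} $line;
    }
    return $removed;
}

# Demonstration using in-memory filehandles (made-up sample data).
my $input  = "AAAAAAAA\nAAAA AAAAAAA\nLeave me intact PLEASE\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";

my $removed = filter_upper_lines($in, $out);

close $out;
close $in;

print $output;
print "Removed $removed line(s)\n";
```

Only one line is held in memory at a time, so the size of the input file should not matter.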
Re: Regex to remove data
by trizen (Friar) on Nov 07, 2012 at 07:55 UTC
    Keeps the lines that are empty or that contain only whitespace characters:

    perl -ne 'print if not /[A-Z]/ && /^[A-Z\s]+$/' file.txt
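
Since not binds more loosely than &&, the condition above parses as not( /[A-Z]/ && /^[A-Z\s]+$/ ): a line is dropped only if it contains at least one uppercase letter and nothing but uppercase letters and whitespace. The sketch below compares this with the simpler test; keep_simple and keep_strict are hypothetical names used only here.

```perl
use strict;
use warnings;

# Simpler test: drop any line made up solely of uppercase letters and
# whitespace, which includes blank and whitespace-only lines.
sub keep_simple {
    my ($line) = @_;
    return $line =~ /^[A-Z\s]+$/ ? 0 : 1;
}

# Stricter test: additionally require at least one uppercase letter
# before dropping, so blank and whitespace-only lines are kept.
sub keep_strict {
    my ($line) = @_;
    my $upper_only = $line =~ /[A-Z]/ && $line =~ /^[A-Z\s]+$/;
    return $upper_only ? 0 : 1;
}

for my $line ("AAAA AAA\n", "\n", "   \n", "Leave me intact PLEASE\n") {
    (my $shown = $line) =~ s/\n/\\n/;
    printf "%-28s simple=%d strict=%d\n",
        "'$shown'", keep_simple($line), keep_strict($line);
}
```

The two tests disagree only on the empty and whitespace-only lines, which the stricter one keeps.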
