Regex to remove data

by Anonymous Monk
on Nov 06, 2012 at 14:04 UTC ( #1002478=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a large file. I'd like to know how to create a regex that will remove lines if they consist of only uppercase alphabetic characters. Examples:

Replies are listed 'Best First'.
Re: Regex to remove data
by Athanasius (Chancellor) on Nov 06, 2012 at 14:13 UTC

    Something like this should do the trick:

    #! perl
    use strict;
    use warnings;

    my @lines = <DATA>;

    s/ ^ [A-Z\s]+ $ //x for @lines;

    print for @lines;

    __DATA__
    AAAAAAAA
    AAAA AAAAAAA
    AAAA AAA AAA
    Leave me intact PLEASE


    0:10 >perl
    Leave me intact PLEASE
    0:13 >

    Update 1: Note that this will also remove blank (i.e. empty) lines.

    Update 2: Changed

    print "@lines";

    to

    print for @lines;

    to address the issue of leading spaces raised by Anonymous Monk, below.

    Hope that helps,

    Athanasius <°(((><contra mundum

      This is good, but the OP did say he wanted to remove the lines from a "large file", and your approach reads the entire file into an in-memory array (@lines) and processes that.

      Check out my reply below for an example that loads only one line into memory at a time (the -n switch wraps the code in an implicit while (<>) loop).

      Thanks, though it adds a leading space to each line.
Re: Regex to remove data
by sundialsvc4 (Abbot) on Nov 06, 2012 at 14:37 UTC

    Also, don’t overlook the obvious grep (or egrep) commands, if you have them on your system. You might not have to “write a program” to do this at all. Simply use the -v option to output all lines which don’t match the pattern.
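
A minimal sketch of the grep approach, assuming a grep that supports -E and POSIX character classes. The file names in.txt and out.txt and the sample data are placeholders for illustration. LC_ALL=C is set because the meaning of the range A-Z can vary by locale.

```shell
# Hypothetical sample input, for illustration only.
printf '%s\n' 'AAAAAAAA' 'AAAA AAAAAAA' 'Leave me intact PLEASE' > in.txt

# -v inverts the match: only lines that do NOT consist solely of
# uppercase letters and whitespace are written to out.txt.
LC_ALL=C grep -Ev '^[A-Z[:space:]]+$' in.txt > out.txt

cat out.txt
```

One difference from the Perl versions in this thread: grep matches each line without its trailing newline, so a completely empty line does not match the pattern and is kept.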

Re: Regex to remove data
by rjt (Deacon) on Nov 06, 2012 at 17:10 UTC

    It looks like you want to remove lines that (optionally) contain spaces in addition to uppercase letters. This one-liner will do the trick:

    perl -ne 'print unless /^[A-Z\s]+$/' <in.txt >out.txt

    Of course if you are including this in a larger Perl program, you can just nab the regex out of that, and use it in a loop construct of some kind. For example:

    while (<>) { print if !/^[A-Z\s]+$/ }
Re: Regex to remove data
by space_monk (Chaplain) on Nov 06, 2012 at 17:10 UTC
    perl -pe 's/ ^ [A-Z\s]+ $ //x' <your_data

    Anybody?
Re: Regex to remove data
by trizen (Hermit) on Nov 07, 2012 at 07:55 UTC
    Keeps the lines that are empty or that contain only whitespace characters:

    perl -ne 'print if not /[A-Z]/ && /^[A-Z\s]+$/' file.txt

Node Type: perlquestion [id://1002478]
Approved by Athanasius