PerlMonks  

Removing users from a list

by gossamer (Novice)
on Sep 09, 2012 at 00:14 UTC (#992543=perlquestion)
gossamer has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm relatively new to Perl and am having some trouble with what I'd imagine is a classic Perl problem. I'd like to remove from one file the email addresses that are listed in another.

In other words, given file removelist.txt, which contains:

user1@example.com
user2@example.com
user3@example.com
etc...

In some cases the lines may appear like this:

user1@example.com<mailto:user1@example.com>

due to bad HTML email clients. I'd like to remove every occurrence of each line of removelist.txt from masterlist.txt, ignoring case.

I've played around with it a bit, but I'm having trouble removing them all at once, rather than iterating through the file and creating a temporary file for each email address in removelist.txt.

I've also tried a combination of sed and egrep, but haven't figured out how to add a pipe character (for the 'or') to the end of every line of removelist.txt except the last.
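In Perl the "pipe after every line except the last" problem goes away, because join only puts the separator between elements. A minimal sketch, assuming the addresses are already in a list; build_alternation is a hypothetical helper name, and quotemeta is added so that the '.' in each address matches a literal dot rather than "any character":

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of addresses into one regex alternation.
# join puts '|' only *between* elements, so there is never a trailing pipe.
# quotemeta escapes non-word characters such as '.' and '@', so each
# address is matched literally.
sub build_alternation {
    my @addresses = @_;
    return join '|', map { quotemeta } @addresses;
}

my $pattern = build_alternation( 'user1@example.com', 'user3@example.com' );
print "$pattern\n";    # prints: user1\@example\.com|user3\@example\.com
```

The resulting string can be used directly in a match such as $line =~ /$pattern/i.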

Thanks for any ideas.

Re: Removing users from a list
by 2teez (Priest) on Sep 09, 2012 at 00:41 UTC
    Hi,

    What have you tried using Perl?
    Really, there is no way one can help except by giving "assumed" examples, since no code, examples, or modified script was given.
    However, this might help:
    Get all the users from masterlist.txt into a hash, then iterate through removelist.txt, deleting from that hash every name that appears in the remove list. Finally, print out all the users remaining in the hash.

    UPDATE:
    If I may, let me illustrate the point above with an example.
    Note:
    This might not be the perfect example, but I'm sure it will give you a heads-up.
    masterlist.txt

    barak
    osama
    bush
    jonathan
    perl
    python
    java
    c++

    removelist.txt

    osama
    jonathan
    python
    java

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %new_list;

    # Load every name from the master list as a hash key
    read_n_work_file( 'masterlist.txt', sub { undef $new_list{ $_[0] } } );

    # Delete every name that also appears in the remove list
    read_n_work_file( 'removelist.txt',
        sub { delete $new_list{ $_[0] } if exists $new_list{ $_[0] } } );

    print join( "\n" => keys %new_list ), "\n";    # redirect to a new file

    # Open $file and call $workout on each chomped line
    sub read_n_work_file {
        my ( $file, $workout ) = @_;
        open my $fh, '<', $file or die "can't open file:$!";
        while (<$fh>) {
            chomp;
            $workout->($_);
        }
        close $fh;
    }

    output (the order may differ, since Perl hash keys are unordered)

    bush
    barak
    perl
    c++

Re: Removing users from a list
by pvaldes (Chaplain) on Sep 09, 2012 at 00:55 UTC

    Small suggestion/hint

    Search PerlMonks for "compare 2 files" and "compare two files".

Re: Removing users from a list
by BrowserUk (Pope) on Sep 09, 2012 at 01:19 UTC
    1. Load the smaller (to-be-removed) list into a hash, having stripped any extraneous bits (e.g. $line =~ s[<.*$][];) and having lc'd them.
    2. Read the bigger file line by line; strip any extraneous bits and lc them; if the email address read exists in the toBeRemoved hash, don't print it; otherwise, do.
    3. Redirect the output to a temp file. Delete the original master file; rename the temp file to the master.
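    The three steps above could look something like the following minimal sketch. Assumptions not in the post: the only junk on a line is a trailing <mailto:...> part, normalize and remove_listed are hypothetical names, and the script is run as perl remove_listed.pl removelist.txt masterlist.txt. Instead of deleting the master file and then renaming, it renames the temp file over the original, which replaces it in one step on POSIX systems.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Steps 1-2 "strip any extraneous bits & lc": assumes the only junk on a
# line is a trailing "<mailto:...>" (or anything else starting with '<').
sub normalize {
    my ($line) = @_;
    $line =~ s/<.*$//;          # drop everything from the first '<' onward
    $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
    return lc $line;
}

# Hypothetical helper: rewrite $master, dropping every address in $remove.
sub remove_listed {
    my ( $remove, $master ) = @_;

    # Step 1: load the smaller, to-be-removed list into a hash.
    my %to_remove;
    open my $rfh, '<', $remove or die "Can't open $remove: $!";
    while ( my $line = <$rfh> ) {
        my $addr = normalize($line);
        $to_remove{$addr} = 1 if length $addr;
    }
    close $rfh;

    # Step 2: stream the master list, printing only lines not in the hash.
    # Step 3: the output goes to a temp file in the same directory...
    my ( $tmp_fh, $tmp_name ) = tempfile( DIR => dirname($master) );
    open my $mfh, '<', $master or die "Can't open $master: $!";
    while ( my $line = <$mfh> ) {
        print {$tmp_fh} $line unless $to_remove{ normalize($line) };
    }
    close $mfh;
    close $tmp_fh or die "Can't write $tmp_name: $!";

    # ...which is then renamed over the original master file.
    rename $tmp_name, $master or die "Can't rename $tmp_name to $master: $!";
}

# Usage: perl remove_listed.pl removelist.txt masterlist.txt
remove_listed(@ARGV) if @ARGV == 2;
```

    Kept lines are written back exactly as they were read, <mailto:...> part included. Note that File::Temp creates the temp file with 0600 permissions, so the rewritten masterlist.txt ends up with them too; restore the original mode with chmod if that matters.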

Re: Removing users from a list
by Kenosis (Priest) on Sep 09, 2012 at 05:18 UTC

    If I'm understanding your issue, you have two data sets. One contained in removelist.txt:

    user1@example.com
    user3@example.com
    ...

    The other in masterlist.txt:

    user1@example.com
    user2@example.com
    user3@example.com
    USER1@EXAMPLE.COM<mailto:USER1@EXAMPLE.COM>
    user4@example.com
    user5@example.com
    user5@example.com<mailto:user5@example.com>
    ...

    You want items from the first set to be removed from the second. Excellent suggestions have been offered, and this is a "classic issue."

    See if the following will work for your situation:

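    use Modern::Perl;
    use File::Slurp qw/ read_file write_file /;

    # quotemeta escapes '.' and '@' so each address is matched literally
    my $removeList = join '|', map { chomp; quotemeta } read_file 'removelist.txt';

    # Keep only the master-list lines that match none of the addresses
    write_file 'finallist.txt', grep !/$removeList/i, read_file 'masterlist.txt';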

    With the above two data sets, here is the output to finallist.txt:

    user2@example.com
    user4@example.com
    user5@example.com
    user5@example.com<mailto:user5@example.com>

    The script does only what you described: it joins the removelist.txt items with the pipe character (the regex alternation operator) to build one "or" regex of email addresses, and grep keeps only the masterlist.txt lines that do not match it.
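    For readers without File::Slurp or Modern::Perl installed, here is a core-Perl sketch of the same single-regex idea; grep_out is a hypothetical helper name. It also skips blank lines in the remove list: a blank line would add an empty alternative (as in a||b), which matches, and therefore removes, every line of the master list.

```perl
use strict;
use warnings;

# Hypothetical helper: return the master-list lines that match none of the
# addresses in the remove list (case-insensitive, literal matching).
sub grep_out {
    my ( $remove_lines, $master_lines ) = @_;

    # Copy, chomp, and drop blank lines so no empty alternative sneaks in.
    my @addrs = grep { /\S/ } map { my $s = $_; chomp $s; $s } @$remove_lines;
    return @$master_lines unless @addrs;

    my $alt = join '|', map { quotemeta } @addrs;
    my $re  = qr/$alt/i;    # compile the alternation once
    return grep { !/$re/ } @$master_lines;
}

# Usage: perl grep_out.pl removelist.txt masterlist.txt > finallist.txt
if ( @ARGV == 2 ) {
    my ( $remove, $master ) = @ARGV;
    open my $rfh, '<', $remove or die "Can't open $remove: $!";
    open my $mfh, '<', $master or die "Can't open $master: $!";
    my @kept = grep_out( [<$rfh>], [<$mfh>] );
    print @kept;
}
```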

    Hope this helps!

      To get a more efficient regular expression, use Regexp::Assemble to make your RE.

      CountZero

        An excellent suggestion, CountZero...

Re: Removing users from a list
by BillKSmith (Chaplain) on Sep 09, 2012 at 16:04 UTC

    Refer to the FAQ: perldoc -q "difference of two arrays"
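    The FAQ answer builds a hash that counts elements across both arrays; for this problem only the one-sided difference (master minus remove) is needed. A minimal sketch in that spirit, assuming both lists are already in memory and free of any <mailto:...> suffix; difference is a hypothetical name, and lc makes the comparison case-insensitive:

```perl
use strict;
use warnings;

# Hypothetical helper: elements of @$master whose lower-cased form does not
# appear in @$remove (a one-sided difference, using a hash for fast lookups).
sub difference {
    my ( $master, $remove ) = @_;
    my %in_remove;
    $in_remove{ lc $_ } = 1 for @$remove;
    return grep { !$in_remove{ lc $_ } } @$master;
}

my @master = ( 'user1@example.com', 'user2@example.com', 'USER1@EXAMPLE.COM' );
my @remove = ('user1@example.com');
print "$_\n" for difference( \@master, \@remove );    # prints: user2@example.com
```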

    Bill
      Thanks, everyone, for your assistance. There should certainly be enough pointers here for me to figure out this problem.

Node Type: perlquestion [id://992543]
Approved by chacham