
Extracting DNC issues

by MoodyDreams999 (Beadle)
on Oct 04, 2023 at 21:04 UTC

MoodyDreams999 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract DNC numbers using Text::CSV. I have 5 columns, but only 3 of them are used in this process: "Cleaned_FDNC", "Clean" and "Cleaned_All". I've been working on this for a while and I can't seem to get it right. It should take Clean and Cleaned_FDNC and compare those numbers against Cleaned_All; whichever number is in Cleaned_All and isn't in the combination of the other 2 columns should be added to the DNC column.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => $/ });

open(my $input, '<', 'output.csv') or die "Could not open output.csv: $!";
open(my $temp_output, '>', 'temp_output.csv') or die "Could not open temp_output.csv: $!";

# This will hold all the numbers from the combined Clean and Cleaned_FDNC
my %combined_numbers;

my $header = $csv->getline($input);  # Read the header
push @$header, 'DNC';                # Add the DNC column to header
$csv->print($temp_output, $header);  # Print the header to output

# Process the header row
my ($all_col, $clean_col, $fdnc_col, $invalid_col, $cleaned_all_col, $cleaned_fdnc_col) = @$header;
my $dnc_col = '';

while (my $row = $csv->getline($input)) {
    next unless $row;  # Skip undefined rows

    # Skip any row that matches the header pattern
    next if $row->[0] =~ /^all$/i && $row->[1] =~ /^clean$/i && $row->[2] =~ /^fdnc$/i;

    my ($all, $clean, $fdnc, $invalid, $cleaned_all, $cleaned_fdnc) = @$row;

    # Record the numbers from Clean and Cleaned_FDNC columns
    $combined_numbers{$clean} = 1;
    $combined_numbers{$cleaned_fdnc} = 1;

    # If the number from Cleaned_All is not in the combined numbers of Clean and Cleaned_FDNC, then it's DNC
    if (!exists $combined_numbers{$cleaned_all}) {
        $dnc_col = $cleaned_all;
    }

    # Print the data with DNC to the new output file
    $csv->print($temp_output, [$all, $clean, $fdnc, $invalid, $cleaned_all, $cleaned_fdnc, $dnc_col]);
}

close($input);
close($temp_output);

# Rename the temporary file to output.csv
rename("temp_output.csv", "output2.csv");

DNC is basically Do Not Call numbers. For the example I'll use 3 sets of data stemming from the original.


All: 1111111111 2222222222 3333333333 1010101010 9999999999 8888888888


Clean: 9999999999 3333333333


FDNC: 1111111111 2222222222

To get the DNC we have to combine Clean and FDNC and compare the result against All, so whatever number is in All and isn't in Clean or FDNC should be DNC. Clean and Federal Do Not Call (FDNC) are taken out of All originally, so what's left should be DNC: whatever isn't accounted for within the Federal Do Not Call lists.


Output: 1010101010 8888888888

Not sure how to do columns, but these sets are supposed to be columns in a csv.
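
The rule described above is a set difference: All minus the union of Clean and FDNC. A minimal sketch of that rule, using only the sample numbers from this post (the variable names are illustrative and not taken from the original script):

```perl
use strict;
use warnings;

# Sample sets copied from the post above.
my @all   = qw(1111111111 2222222222 3333333333 1010101010 9999999999 8888888888);
my @clean = qw(9999999999 3333333333);
my @fdnc  = qw(1111111111 2222222222);

# Build the lookup of accounted-for numbers in full before testing anything.
my %accounted = map { $_ => 1 } @clean, @fdnc;

# Whatever is in All but not accounted for is DNC (the order of @all is kept).
my @dnc = grep { !$accounted{$_} } @all;

print "$_\n" for @dnc;
```

For the sample data this prints 1010101010 and 8888888888, which matches the expected Output above. The lookup hash is complete before any number from All is tested, so the result does not depend on the order in which the numbers appear.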

Replies are listed 'Best First'.
Re: Extracting DNC issues
by choroba (Cardinal) on Oct 04, 2023 at 21:25 UTC
    It would be much easier for us to help you if you included a sample of the input data as well. See SSCCE on how to ask a good question, where "good" means likely to get answers.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Fair enough, I appreciate the feedback. I will create some sample data when I can. I've picked up a pretty nasty stomach flu, so I haven't been able to stick with work these past 2 days. I will also make sure to reference that every time I post, because it seems like I definitely need some work on the SSCCE front.
Re: Extracting DNC issues
by chromatic (Archbishop) on Oct 04, 2023 at 21:41 UTC
      No, I'm trying to do this: if the number doesn't exist in combined numbers, then it should be DNC. All I'm doing is comparing phone numbers. We get one set of numbers, we run that through a blacklist, and we get FDNC and Clean from that number set. So I take Clean and FDNC against All the numbers; basically, whatever is left over in All that isn't in the other 2 is the DNC numbers.
        I'm trying to do if the number doesn't exist in combined numbers then it should be dnc

        I understand that, but $dnc_col retains the previous value you assigned every time through the loop, so I wonder if that's what you intended. If you move the declaration and assignment into the loop, $dnc_col will be an empty string unless the current number doesn't exist in combined numbers.

        I don't know if that's what you want though.
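
To make the scoping point concrete, here is a small sketch comparing the two placements of the declaration. The data is hypothetical, and the loop is reduced to the one check that matters; only the names $dnc_col and %combined_numbers come from the posted script.

```perl
use strict;
use warnings;

# Hypothetical data: numbers already known to be Clean or FDNC.
my %combined_numbers = map { $_ => 1 } qw(3333333333 1111111111);
my @cleaned_all      = qw(1010101010 3333333333 1111111111);

# Declared outside the loop: the value from an earlier row leaks into later rows.
my $dnc_col = '';
my @leaky;
for my $number (@cleaned_all) {
    $dnc_col = $number if !exists $combined_numbers{$number};
    push @leaky, $dnc_col;
}

# Declared inside the loop: every row starts from an empty string.
my @fresh;
for my $number (@cleaned_all) {
    my $dnc_col = exists $combined_numbers{$number} ? '' : $number;
    push @fresh, $dnc_col;
}

print join(',', @leaky), "\n";
print join(',', @fresh), "\n";
```

With the outer declaration, all three rows carry 1010101010 in the DNC column; with the inner one, only the first row does and the other two are empty.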

        Improve your skills with Modern Perl: the free book.

        Until this reply, I had no idea what you meant by DNC. I still don't know the significance of the other terminology, either, like "cleaned" or "combined". And without input or output I just kind of shrugged and moved on. You seem to assume we know something about these files you're processing.

        I'd suggest restructuring the post to start with what you expect to accomplish with the code, a short example of input, the incorrect output you get now, and what correct output would look like.

Re: Extracting DNC issues
by NERDVANA (Curate) on Oct 10, 2023 at 06:56 UTC

    Ok, so your situation is coming into focus finally. It sounds from your description that the three columns are ... unrelated by row? Like this? (PerlMonks just uses HTML for tables)

    So, each column is one set of numbers, unrelated by row, and your task is to read the first set, and exclude the other two sets from it?

    If so, then your program will look like this:

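    use feature 'say';  # needed for say() below

    my (%all, %clean, %fdnc);
    while (my $row = $csv->getline($input)) {
        # skip over whatever needs skipped
        ...;
        # Collect each column into its own set, ignoring empty cells
        $all{$row->[0]}= 1 if $row->[0];
        $clean{$row->[1]}= 1 if $row->[1];
        $fdnc{$row->[2]}= 1 if $row->[2];
    }
    # Print every number from the first set that is in neither of the other two
    for (sort keys %all) {
        say $_ unless $clean{$_} or $fdnc{$_};
    }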

    But let's talk about that file format some more. Most CSV files have meaning to the rows, where each row is one record, and each column is one attribute of that record. Your file above (if that's really what it looks like and I didn't misunderstand) is really just 3 separate files that happen to be stuffed into columns of one file.

    If you use this structure instead, you would have an easier time processing it:
    1111111111 1
    2222222222 1

    With a file like this, as you read each row you can immediately know which sets it was part of, and easily add an additional column. It also sets you up nicely to be able to load them into a database, which is where these things generally need to end up for use by web apps and whatever else. So, I'd recommend writing out a new file like this if your system isn't bound to the other format.
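
As a rough sketch of writing such a file, the snippet below emits one record per number with a 0/1 flag for each set. The header names and the extra dnc flag are assumptions made for illustration, and the sample numbers are the ones from the original question. A plain join is used only to keep the sketch free of dependencies; for real data Text::CSV remains the right tool.

```perl
use strict;
use warnings;

# Sample sets from the original question.
my @all   = qw(1111111111 2222222222 3333333333 1010101010 9999999999 8888888888);
my %clean = map { $_ => 1 } qw(9999999999 3333333333);
my %fdnc  = map { $_ => 1 } qw(1111111111 2222222222);

# One record per number: number, clean flag, fdnc flag, dnc flag.
my @rows = (['number', 'clean', 'fdnc', 'dnc']);   # hypothetical header names
for my $number (@all) {
    my $is_clean = $clean{$number} ? 1 : 0;
    my $is_fdnc  = $fdnc{$number}  ? 1 : 0;
    my $is_dnc   = ($is_clean || $is_fdnc) ? 0 : 1;   # in neither set
    push @rows, [$number, $is_clean, $is_fdnc, $is_dnc];
}

# Digits-only fields need no quoting, so join is enough for this sketch.
print join(',', @$_), "\n" for @rows;
```

Each row now describes one number completely, so adding another attribute later means adding a column rather than iterating several independent lists.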

    If you can't change it and really need that 4th column as an independent set, it gets a little awkward because now you need to iterate 4 sets simultaneously. The code would look like

    use List::Util 'max';

    my @all_nums= sort keys %all;
    my @clean_nums= sort keys %clean;
    my @fdnc_nums= sort keys %fdnc;
    my @dnc_nums= grep !$clean{$_} && !$fdnc{$_}, @all_nums;
    # Last index of the longest column; shorter columns yield undef (empty cells)
    my $n= max($#all_nums, $#clean_nums, $#fdnc_nums, $#dnc_nums);
    for (my $i= 0; $i <= $n; $i++) {
        $csv->print($temp_output, [ $all_nums[$i], $clean_nums[$i], $fdnc_nums[$i], $dnc_nums[$i] ]);
    }
    which seems fairly awkward, which is why I recommend changing the file format.

Node Type: perlquestion [id://11154821]
Approved by marto
Front-paged by Corion