Why is the following instance of Matching using an array element not working

by MyJeweledPerls (Initiate)
on Aug 08, 2012 at 21:33 UTC (#986366=perlquestion)
MyJeweledPerls has asked for the wisdom of the Perl Monks concerning the following question:

Hola Perl Monks, I seek your wisdom. I have an array of IDs (@sample_ids in the code below) that have been flagged as faulty for some reason. I have the CSV file that contains all the original data, and I want to print the rows whose sample IDs are in my array. So far I have written the following code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @error_array;
    my @sample_ids;
    my @merged_array;

    #Opening File Number 1
    open (ERRORFILE,"<", "Errors-email.txt") || die "File not found\n";
    @error_array = <ERRORFILE>;
    close ERRORFILE;

    #Opening File Number 2
    open (CSV,"<", "Merged_CSVfiles.csv") || die "File not found\n";
    @merged_array = <CSV>;
    close CSV;

    #Getting the sample ids
    foreach my $line(@error_array){
        if($line =~ m/^Sample_identifier/){
            $line =~ s/Sample_identifier//;
            $line =~ s/IRD.+:$//;
            $line =~ s/\///;
            $line =~ s/^\s//;
            $line =~ s/IRD.+$//;
            push(@sample_ids,$line);
            # print(@sample_ids);
        }# End of the if loop
    }#End of the for-each loop

    foreach my $records(@merged_array){
        if($records =~ /$sample_ids[3]/){
            print $records;
        }
    }

I have noted that when I put UCD67832 (the sample ID stored as the 4th element in the array) directly into the match instead of the @sample_ids array element that's there, it returns the line with no problem. I'm not sure what the problem is. I tried nested for-each loops before, but I bungled that. I have been trying to understand why this doesn't work, but so far nothing.

-Thanks for your time, in advance or retroactively
Dave-O

Update: Sorry about that, let me share my data. My CSV has 28 fields and each line represents a record. In this specific case, @sample_ids holds 438 IDs. For instance, the CSV looks like this:

    Name,Building,Processing Date,Receipt Date, Location,Patient Id,Sample ID,
    John Doe,G building,05-Aug-2012,08-Aug-2012,New York City,ABC2345,UCD23467,
    John Moe,H building,05-Aug-2012,08-Aug-2012,New York City,DEF2345,UCD80645,
    John Slo,I building,05-Aug-2012,08-Aug-2012,New York City,GHI2345,UCD76765,
    John Hor,j building,05-Aug-2012,08-Aug-2012,New York City,JKL2345,UCD87111,

What I want is, for each line (called $records in the original code), to print that line if it matches an element of my @sample_ids array. In the code I used $sample_ids[3] as a test to see whether the line/record would print, but it didn't.

    foreach my $records(@merged_array){
        if($records =~ /$sample_ids[3]/){
            print $records;
        }
    }

Update: Here is what the contents of the error file look like:

    **************************************
    Errors for file: Merged_CSVfiles_1.txt
    **************************************
    Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:
    * The following fields contain invalid values: Blah blah blah blah
    *
    Sample_identifier 11SS00336 / IRD Clinic clinic_HIV000140174:
    * The following fields contain invalid values: Yada yada yada
    *
    Sample_identifier 11SS00303 / IRD Clinic clinic_HIV000140141:
    * The following fields contain invalid values: yeah yeah yeah

Just to check that @sample_ids contains the actual IDs, I printed its contents:

    push(@sample_ids,$line);
    print(@sample_ids);

I get the following output, which is the list of IDs (truncated here to just 8 of the 438 errors):

    UCD11-02580-V
    UCD11-02581-P
    UCD11-02581-V
    UCD11-02583-P
    UCD11-02583-V
    UCD11-02584-P
    UCD11-02584-V
    UCD11-02585-P

I also did the following as a check:

            push(@sample_ids,$line);
        }# End of the if loop
    }#End of the for-each loop
    print $sample_ids[3];

and I got 11SS00304

Re: Why is the following instance of Matching using an array element not working
by johngg (Abbot) on Aug 08, 2012 at 22:35 UTC

    A bit of a stab in the dark without seeing a sample of your data files but, do you need to chomp off the line terminator from each $line? If the text you wish to match is in the middle of the $record then the line terminator at the end of the pattern means you will never succeed.
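
To illustrate the point about the line terminator, here is a minimal sketch. The ID and record below are made-up values, and clean_id is a hypothetical helper, not something from the original script. It assumes the ID still carries trailing whitespace and a newline after the substitutions, which is one plausible reason the interpolated pattern never matches.

```perl
use strict;
use warnings;

# Hypothetical data: an ID as it might look after the substitutions
# (trailing spaces and newline still attached) and one CSV record.
my $raw_id = "UCD67832  \n";
my $record = "John Doe,G building,UCD67832,New York City\n";

# Hypothetical helper: strip all trailing whitespace, including the
# line terminator, from an ID.
sub clean_id {
    my ($id) = @_;
    $id =~ s/\s+\z//;
    return $id;
}

my $id = clean_id($raw_id);

# The raw ID fails because the record has a comma, not spaces and a
# newline, after the ID. \Q...\E keeps the ID from being read as regex syntax.
print "raw match:     ", ($record =~ /$raw_id/  ? "yes" : "no"), "\n";
print "cleaned match: ", ($record =~ /\Q$id\E/ ? "yes" : "no"), "\n";
```

A plain chomp removes only the line terminator; the substitution above is used here because trailing spaces would block the match as well.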

    I hope my guess is close to the mark and that this is helpful.

    Cheers,

    JohnGG

Re: Why is the following instance of Matching using an array element not working
by kcott (Abbot) on Aug 08, 2012 at 22:58 UTC

    Without seeing any sample input, the following is just guesswork.

    You're currently matching the entire record against the fourth ID; I think you want to match the fourth element of each record against all IDs.

    I suspect

        foreach my $records(@merged_array){
            if($records =~ /$sample_ids[3]/){
                print $records;
            }
        }

    should probably be more like

        foreach my $records (@merged_array) {
            my @elements = split /,/, $records;
            for my $sample_id (@sample_ids) {
                if ($elements[3] =~ /$sample_id/) {
                    print $records;
                }
            }
        }
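
A variation on the loop above, offered as a sketch rather than a drop-in fix: when the IDs are already free of trailing whitespace, a hash lookup with exact equality avoids both the inner loop and any regex surprises. The data, the flagged_records helper, and the column index are assumptions for illustration; index 6 is chosen only because the sample ID is the seventh field in the sample rows posted in the question.

```perl
use strict;
use warnings;

# Hypothetical, already-cleaned IDs and a few sample CSV records.
my @sample_ids   = ('UCD23467', 'UCD76765');
my @merged_array = (
    "John Doe,G building,05-Aug-2012,08-Aug-2012,New York City,ABC2345,UCD23467,\n",
    "John Moe,H building,05-Aug-2012,08-Aug-2012,New York City,DEF2345,UCD80645,\n",
    "John Slo,I building,05-Aug-2012,08-Aug-2012,New York City,GHI2345,UCD76765,\n",
);

# Assumption: the sample ID is the seventh field (index 6).
my $id_column = 6;

# Hypothetical helper: return the records whose ID field is exactly one
# of the given IDs.
sub flagged_records {
    my ($ids, $records, $column) = @_;
    my %is_flagged = map { $_ => 1 } @$ids;   # one lookup per record
    my @hits;
    for my $record (@$records) {
        my @elements = split /,/, $record;
        my $field = $elements[$column];
        next unless defined $field;
        push @hits, $record if $is_flagged{$field};
    }
    return @hits;
}

print flagged_records(\@sample_ids, \@merged_array, $id_column);
```

Note that a plain split on commas only works when no field contains a comma of its own.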

    -- Ken

      You were right: I need to match the fourth element against each of the records. I added the code you suggested and nothing was returned, so I decided to do a check and did this:

          foreach my $records (@merged_array) {
              my @elements = split /,/, $records;
              print $elements[0];
          }

      The output from this was: "John Doe"John Doe"John Doe"John Doe"John Doe

      I realized that some of the fields themselves contain commas, so I couldn't reliably get the field I wanted by index (field 18 in this case). So back to the drawing board for the next few minutes.

Re: Why is the following instance of Matching using an array element not working
by Kenosis (Priest) on Aug 09, 2012 at 03:36 UTC

    It looks like you're iterating over lines from Errors-email.txt (and it would have helped to see some of these lines, since they contain the IDs):

        ...
        open (ERRORFILE,"<", "Errors-email.txt") || die "File not found\n";
        @error_array = <ERRORFILE>;
        ...
        foreach my $line(@error_array){
            #
            # Some $line substitutions here...
            ...
            push(@sample_ids,$line);
        ...

    Then you have the following:

        ...
        foreach my $records(@merged_array){
            if($records =~ /$sample_ids[3]/){
        ...

    Where @merged_array contains lines from your CSV file.

    Does @sample_ids contain only the IDs? Did you try printing $sample_ids[3] to see what you get? Since you successfully used UCD67832 in the regex, it's evident that $sample_ids[3] does not equal UCD67832. What does it equal? Is UCD67832 a different element of @sample_ids? Does @sample_ids contain whole lines instead of IDs?

    I can't see that you've actually isolated the ID (unless it's via the set of substitutions), so I'd tend to focus on grabbing the ID from $line around those substitutions, and then push(@sample_ids,$id);
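
One way to isolate the ID directly, sketched under the assumption that the error-file lines look like "Sample_identifier <ID> / IRD ..." and that the ID contains no whitespace. The extract_sample_id helper is hypothetical, and the lines in @error_array are sample values in that shape.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sample ID from a "Sample_identifier"
# line, or undef for any other line.
# Assumption: the ID is the first run of non-whitespace after the keyword.
sub extract_sample_id {
    my ($line) = @_;
    return $line =~ /^Sample_identifier\s+(\S+)/ ? $1 : undef;
}

# Sample lines, newlines included as they would be when read from a file.
my @error_array = (
    "Errors for file: Merged_CSVfiles_1.txt\n",
    "Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:\n",
    "* The following fields contain invalid values:\n",
    "Sample_identifier 11SS00336 / IRD Clinic clinic_HIV000140174:\n",
);

my @sample_ids;
for my $line (@error_array) {
    my $id = extract_sample_id($line);
    push @sample_ids, $id if defined $id;   # only the bare ID is stored
}

print "$_\n" for @sample_ids;
```

Because the capture group takes only non-whitespace characters, the stored IDs carry no trailing spaces or newline, so no separate chomp is needed.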

    Hope this helps!

      Thanks a lot for this post. I went back and checked.

      Here is what the contents of the error file look like:

          **************************************
          Errors for file: Merged_CSVfiles_1.txt
          **************************************
          Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:
          * The following fields contain invalid values: Blah blah blah blah
          *
          Sample_identifier 11SS00336 / IRD Clinic clinic_HIV000140174:
          * The following fields contain invalid values: Yada yada yada
          *
          Sample_identifier 11SS00303 / IRD Clinic clinic_HIV000140141:
          * The following fields contain invalid values: yeah yeah yeah

      Just to check that @sample_ids contains the actual IDs, I printed its contents:

          push(@sample_ids,$line);
          print(@sample_ids);

      I get the following output, which is the list of IDs (truncated here to just 8 of the 438 errors):

          UCD11-02580-V
          UCD11-02581-P
          UCD11-02581-V
          UCD11-02583-P
          UCD11-02583-V
          UCD11-02584-P
          UCD11-02584-V
          UCD11-02585-P

      I also did the following as a check:

                  push(@sample_ids,$line);
              }# End of the if loop
          }#End of the for-each loop
          print $sample_ids[3];

      and I got: 11SS00304

      What I did realize, though, came from this check:

          foreach my $records (@merged_array) {
              my @elements = split /,/, $records;
              print $elements[0];
          }

      The output from this was: "John Doe"John Doe"John Doe"John Doe"John Doe

      I realized that some of the fields themselves contain commas, so I couldn't reliably get the field I wanted by index (field 18 in this case).

        You're most welcome, MyJeweledPerls!

        If getting the field you want from the CSV lines is still presenting a problem, consider using Text::CSV to open and parse your CSV file. Its documentation shows how to do this, so you can get that field.
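
A minimal sketch of that approach, assuming Text::CSV is installed from CPAN. The data, the set of flagged IDs, and the column index are made up for illustration; a real script would open Merged_CSVfiles.csv instead of the in-memory string used here to keep the example self-contained.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical records; the first data row has a comma inside a quoted field.
my $data = <<'END';
Name,Building,Sample ID
"Doe, John",G building,UCD23467
John Moe,H building,UCD80645
END

my %is_flagged = map { $_ => 1 } ('UCD23467');   # hypothetical cleaned IDs
my $id_column  = 2;                               # assumption: ID is the third field

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Read from an in-memory filehandle instead of a file on disk.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @hits;
while (my $row = $csv->getline($fh)) {
    # $row is an array reference holding the parsed fields of one record.
    my $id = $row->[$id_column];
    push @hits, $row if defined $id && $is_flagged{$id};
}
close $fh;

print join(' | ', @$_), "\n" for @hits;
```

The quoted comma in "Doe, John" stays inside a single field, which is the case a plain split on commas gets wrong.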
