PerlMonks  

Why is the following instance of Matching using an array element not working

by MyJeweledPerls (Initiate)
on Aug 08, 2012 at 21:33 UTC
MyJeweledPerls has asked for the wisdom of the Perl Monks concerning the following question:

Hola Perl Monks, I seek your wisdom. I have an array of IDs (@sample_ids in the code below) that have been flagged as faulty for some reason. I have the CSV file that contains all the original data, and I want to print out the rows whose sample IDs are in my array. So far I have written the following code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @error_array;
    my @sample_ids;
    my @merged_array;

    #Opening File Number 1
    open (ERRORFILE,"<", "Errors-email.txt") || die "File not found\n";
    @error_array = <ERRORFILE>;
    close ERRORFILE;

    #Opening File Number 2
    open (CSV,"<", "Merged_CSVfiles.csv") || die "File not found\n";
    @merged_array = <CSV>;
    close CSV;

    #Getting the sample ids
    foreach my $line(@error_array){
        if($line =~ m/^Sample_identifier/){
            $line =~ s/Sample_identifier//;
            $line =~ s/IRD.+:$//;
            $line =~ s/\///;
            $line =~ s/^\s//;
            $line =~ s/IRD.+$//;
            push(@sample_ids,$line);
            # print(@sample_ids);
        }# End of the if loop
    }#End of the for-each loop

    foreach my $records(@merged_array){
        if($records =~ /$sample_ids[3]/){
            print $records;
        }
    }

I have noticed that when I put the literal UCD67832 (the sample ID stored as the 4th element of the array) into the match in place of the $sample_ids[3] array element, it returns the line with no problem. I'm not sure what the problem is. I tried nested foreach loops before, but I bungled that. I have been trying to understand why this doesn't work, but so far nothing.

-Thanks for your time in advance or retroactively, Dave-O

Update: sorry about that, let me share my data. My CSV has 28 fields in it and each line represents a record, for instance as shown here. In this specific case my @sample_ids holds 438 IDs.

    Name,Building,Processing Date,Receipt Date, Location,Patient Id,Sample ID,
    John Doe,G building,05-Aug-2012,08-Aug-2012,New York City,ABC2345,UCD23467,
    John Moe,H building,05-Aug-2012,08-Aug-2012,New York City,DEF2345,UCD80645,
    John Slo,I building,05-Aug-2012,08-Aug-2012,New York City,GHI2345,UCD76765,
    John Hor,j building,05-Aug-2012,08-Aug-2012,New York City,JKL2345,UCD87111,

What I want is an expression like this: for each $line (I called it $records in the original code), if there is a match to an element in my array of sample IDs, print out that line. In the code I used $sample_ids[3] as a test to see whether the line/record would print, but it didn't.

    foreach my $records(@merged_array){
        if($records =~ /$sample_ids[3]/){
            print $records;
        }
    }

Update: Here is what the contents of the error file look like:

    **************************************
    Errors for file: Merged_CSVfiles_1.txt
    **************************************
    Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:
    * The following fields contain invalid values: Blah blah blah blah
    *
    Sample_identifier 11SS00336 / IRD Clinic clinic_HIV000140174:
    * The following fields contain invalid values: Yada yada yada
    *
    Sample_identifier 11SS00303 / IRD Clinic clinic_HIV000140141:
    * The following fields contain invalid values: yeah yeah yeah

Just to check that the array @sample_ids contains the actual IDs, I printed its contents:

    push(@sample_ids,$line);
    print(@sample_ids);

I get the following output, which is the list of the IDs (this is a truncated list, just 8 of the 438 errors):

    UCD11-02580-V
    UCD11-02581-P
    UCD11-02581-V
    UCD11-02583-P
    UCD11-02583-V
    UCD11-02584-P
    UCD11-02584-V
    UCD11-02585-P

I also did the following as a check:

    push(@sample_ids,$line);
        }# End of the if loop
    }#End of the for-each loop
    print $sample_ids[3];

and I got 11SS00304

Re: Why is the following instance of Matching using an array element not working
by johngg (Abbot) on Aug 08, 2012 at 22:35 UTC

    A bit of a stab in the dark without seeing a sample of your data files, but do you need to chomp off the line terminator from each $line? If the text you wish to match is in the middle of the $record, then the line terminator at the end of the pattern means you will never succeed.

    I hope my guess is close to the mark and that this is helpful.

    Cheers,

    JohnGG
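
For what it's worth, johngg's guess can be checked in isolation. The following is only a sketch: it assumes an error-file line in the format posted in the question, and the record string and the helper name clean_id are made up for illustration. Run against such a line, the substitutions from the original script appear to leave two trailing spaces and the newline on the ID, so the interpolated pattern cannot match in the middle of a record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line as read from the error file, newline still attached.
my $line = "Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:\n";

# The same substitutions as in the original script, applied to a copy.
(my $raw_id = $line) =~ s/Sample_identifier//;
$raw_id =~ s/IRD.+:$//;
$raw_id =~ s/\///;
$raw_id =~ s/^\s//;

# Hypothetical helper: strip leading and trailing whitespace,
# which also removes the newline that chomp would remove.
sub clean_id {
    my ($id) = @_;
    $id =~ s/^\s+|\s+$//g;
    return $id;
}

# Made-up record with the ID in the middle of the line.
my $record = "John Doe,G building,11SS00342,New York City\n";

my $id = clean_id($raw_id);

# The raw ID still carries whitespace, so it does not match.
print "raw match:     ", ($record =~ /$raw_id/   ? "yes" : "no"), "\n";

# \Q...\E quotes any regex metacharacters that an ID might contain.
print "cleaned match: ", ($record =~ /\Q$id\E/ ? "yes" : "no"), "\n";
```

Printing each ID between brackets, e.g. print "[$id]\n", is a quick way to spot this kind of invisible whitespace.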

Re: Why is the following instance of Matching using an array element not working
by kcott (Abbot) on Aug 08, 2012 at 22:58 UTC

    Without seeing any sample input, the following is just guesswork.

    You're currently matching the entire record against the fourth ID; I think you want to match the fourth element of each record against all IDs.

    I suspect

        foreach my $records(@merged_array){
            if($records =~ /$sample_ids[3]/){
                print $records;
            }
        }

    should probably be more like

        foreach my $records (@merged_array) {
            my @elements = split /,/, $records;
            for my $sample_id (@sample_ids) {
                if ($elements[3] =~ /$sample_id/) {
                    print $records;
                }
            }
        }

    -- Ken
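
If an ID has to equal a field exactly, a hash lookup is one way to avoid the inner loop. Note that this swaps Ken's nested regex loop for an exact-match lookup, so it is a variation rather than his suggestion. The sketch assumes the IDs have already been trimmed, that the sample ID is field index 6 (as in the sample header posted in the question), and that no field contains a comma; the data and the names %is_flagged and @flagged_records are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed data: trimmed IDs and records shaped like the posted sample.
my @sample_ids   = ('UCD80645', 'UCD87111');
my @merged_array = (
    "John Doe,G building,05-Aug-2012,08-Aug-2012,New York City,ABC2345,UCD23467,\n",
    "John Moe,H building,05-Aug-2012,08-Aug-2012,New York City,DEF2345,UCD80645,\n",
    "John Slo,I building,05-Aug-2012,08-Aug-2012,New York City,GHI2345,UCD76765,\n",
    "John Hor,j building,05-Aug-2012,08-Aug-2012,New York City,JKL2345,UCD87111,\n",
);

# Build the lookup table once; each record then needs a single exists().
my %is_flagged = map { $_ => 1 } @sample_ids;

my @flagged_records;
foreach my $record (@merged_array) {
    # A plain split is only safe when no field contains a comma.
    my @elements  = split /,/, $record;
    my $sample_id = $elements[6];
    next unless defined $sample_id;
    push @flagged_records, $record if exists $is_flagged{$sample_id};
}

print @flagged_records;
```

With 438 IDs this does one hash lookup per record instead of 438 regex matches per record.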

      You were right, I need to match the fourth element against each of the records. I added the code you suggested and no value was returned, so I decided to do a check and did this:

          foreach my $records (@merged_array) {
              my @elements = split /,/, $records;
              print $elements[0];
          }

      The output from this was: "John Doe"John Doe"John Doe"John Doe"John Doe

      I realized that some of the fields themselves had commas within and that I couldn't consistently get the field I wanted by "print18" in this case. So back to the drawing board for the next few minutes.

Re: Why is the following instance of Matching using an array element not working
by Kenosis (Priest) on Aug 09, 2012 at 03:36 UTC

    It looks like you're iterating over lines from Errors-email.txt (and it would have helped to see some of these lines, since they contain the IDs):

        ...
        open (ERRORFILE,"<", "Errors-email.txt") || die "File not found\n";
        @error_array = <ERRORFILE>;
        ...
        foreach my $line(@error_array){
            #
            # Some $line substitutions here...
            ...
            push(@sample_ids,$line);
        ...

    Then you have the following:

        ...
        foreach my $records(@merged_array){
            if($records =~ /$sample_ids[3]/){
        ...

    Where @merged_array contains lines from your CSV file.

    Does @sample_ids contain only the IDs? Did you try printing $sample_ids[3] to see what you get? Since you successfully used UCD67832 in the regex, it's evident that $sample_ids[3] does not equal UCD67832. What does it equal? Is UCD67832 a different element of @sample_ids? Does @sample_ids contain whole lines instead of IDs?

    I can't see that you've actually isolated the ID (unless it's via the set of substitutions), so I'd tend to focus on grabbing the ID from $line around those substitutions, and then push(@sample_ids,$id);

    Hope this helps!
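
One way to grab the ID directly, as Kenosis suggests, is a single capturing match in place of the chain of substitutions. This is a sketch under assumptions: the lines follow the error-file format posted in the question, and the ID is taken to be the first run of non-whitespace characters after the Sample_identifier label.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lines in the format of the error file shown in the thread.
my @error_array = (
    "**************************************\n",
    "Errors for file: Merged_CSVfiles_1.txt\n",
    "Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:\n",
    "* The following fields contain invalid values: Blah blah blah blah\n",
    "Sample_identifier 11SS00336 / IRD Clinic clinic_HIV000140174:\n",
);

my @sample_ids;
foreach my $line (@error_array) {
    # \S+ stops at the first whitespace, so the captured ID cannot
    # carry trailing spaces or the line terminator.
    if ($line =~ /^Sample_identifier\s+(\S+)/) {
        push @sample_ids, $1;
    }
}

# Brackets make any stray whitespace visible.
print "[$_]\n" for @sample_ids;
```

Because the capture never includes whitespace, no separate chomp is needed before using the IDs in a match.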

      Thanks a lot for this post. I went back and checked.

      Here is what the contents of the error file look like:

          **************************************
          Errors for file: Merged_CSVfiles_1.txt
          **************************************
          Sample_identifier 11SS00342 / IRD Clinic clinic_HIV000140180:
          * The following fields contain invalid values: Blah blah blah blah
          *
          Sample_identifier 11SS00336 / IRD Clinic clinic_HIV000140174:
          * The following fields contain invalid values: Yada yada yada
          *
          Sample_identifier 11SS00303 / IRD Clinic clinic_HIV000140141:
          * The following fields contain invalid values: yeah yeah yeah

      Just to check that the array @sample_ids contains the actual IDs, I printed its contents:

          push(@sample_ids,$line);
          print(@sample_ids);

      I get the following output, which is the list of the IDs (this is a truncated list, just 8 of the 438 errors):

          UCD11-02580-V
          UCD11-02581-P
          UCD11-02581-V
          UCD11-02583-P
          UCD11-02583-V
          UCD11-02584-P
          UCD11-02584-V
          UCD11-02585-P

      I also did the following as a check:

          push(@sample_ids,$line);
              }# End of the if loop
          }#End of the for-each loop
          print $sample_ids[3];

      and I got: 11SS00304

      What I did realize, though, came from this check:

          foreach my $records (@merged_array) {
              my @elements = split /,/, $records;
              print $elements[0];
          }

      The output from this was: "John Doe"John Doe"John Doe"John Doe"John Doe

      I realized that some of the fields themselves had commas within and that I couldn't consistently get the field I wanted by "print18" in this case.

        You're most welcome, MyJeweledPerls!

        If getting the field you want from the CSV lines is still presenting a problem, consider using Text::CSV to open and parse your CSV file. Its documentation shows how to do this, so you can get that field.
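
A minimal sketch of the Text::CSV approach, for illustration only. Text::CSV is a CPAN module and has to be installed separately. The data, the column index, and the names %is_flagged and @matches are assumptions; the quoted names are invented to show a field that contains a comma, which is the case a plain split on commas gets wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of the Perl core

# Assumed data: the quoted first field contains a comma.
my $data = <<'END_CSV';
Name,Building,Sample ID
"Doe, John",G building,UCD23467
"Moe, John",H building,UCD80645
END_CSV

# Assumed list of flagged IDs, already trimmed.
my %is_flagged = map { $_ => 1 } ('UCD80645');

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# An in-memory filehandle keeps the sketch free of files on disk.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $header = $csv->getline($fh);    # read and skip the header row
my @matches;
while (my $row = $csv->getline($fh)) {
    # $row is an array reference; index 2 is the Sample ID column here.
    push @matches, $row if exists $is_flagged{ $row->[2] };
}
close $fh;

print join(' | ', @$_), "\n" for @matches;
```

For the real file, the in-memory handle would be replaced by a handle opened on Merged_CSVfiles.csv, and the index by the position of the sample ID column in that file.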
