bioinformatics has asked for the wisdom of the Perl Monks concerning the following question:
Hello Friends!!!
What is the best way to pattern match an unknown pattern? Allow me to explain... I have a file that contains a series of data values (microarray probe sets, to be specific) that I need to sort through. Technically, there should be 11 "probes" for each target (e.g. 154115_at is a target name), but that is not always the case. Since these probes have something in common (the target name), I need the program to take the target name from the first line and compare it to successive lines until one doesn't match. (The matching data needs to be further parsed and put on one line, tab delimited, but I know how to do that.) When a mismatch occurs, the mismatched target name needs to become the new pattern to compare against. I'm familiar with pattern matching. However, I don't know how to designate an "unknown" pattern in Perl, since I can't go and write 22,000-some-odd patterns :-). A sample input file:
>probe:MOE430A:1415670_at(target name):549:177; Interrogation_Position=2436; Antisense; GGCTGATCACATCCAAAAAGTCATG(probe sequence)
>probe:MOE430A:1415670_at:549:177; Interrogation_Position=2513; Antisense; GAGGAAACGTTCACCCTGTCTACTA
>probe:MOE430A:1415670_at:467:433; Interrogation_Position=2521; Antisense; GTTCACCCTGTCTACTATCAAGACA
>probe:MOE430A:1415670_at:254:643; Interrogation_Position=2533; Antisense; TACTATCAAGACACTCGAAGAGGCT
>probe:MOE430A:1415670_at:54:269; Interrogation_Position=2556; Antisense; CTGTGGGCAATATTGTGAAGTTCCT
>probe:MOE430A:1415670_at:405:339; Interrogation_Position=2583; Antisense; GAATGCATCCTTGTGAGAGGTCAGA
>probe:MOE430A:1415670_at:60:395; Interrogation_Position=2597; Antisense; GAGAGGTCAGACAAAGTGCCAGAAA
>probe:MOE430A:1415670_at:284:165; Interrogation_Position=2619; Antisense; AAAACAAGAACACCCACACGCTGCT
>probe:MOE430A:1415670_at:622:145; Interrogation_Position=2634; Antisense; ACACGCTGCTGCTAGCTGGAGTATT
>probe:MOE430A:1415670_at:291:661; Interrogation_Position=2804; Antisense; TATCTTGTCCAACACTACGTCGAAG
>probe:MOE430A:1415670_at:146:701; Interrogation_Position=2956; Antisense; TTGTCACCATGCCTGCAAGGAGAGA
>probe:MOE430A:1415671_at:116:525; Interrogation_Position=1156; Antisense; GGAACAGGAATGTCGCAACATCGTA
>probe:MOE430A:1415671_at:655:137; Interrogation_Position=1173; Antisense; ACATCGTATGGATTGCTGAGTGCAT
>probe:MOE430A:1415671_at:398:139; Interrogation_Position=1232; Antisense;

Any help is most appreciated!
Bioinformatics
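The "unknown pattern" in the question does not need to be written out ahead of time: a single regex can capture whatever target name appears on each line, and a plain string comparison (`ne`) against the previously seen name detects where one group ends and the next begins. What follows is only a minimal sketch of that idea, not a definitive solution. The subroutine name `group_probes`, the exact record layout it expects, and the output format (target name followed by its probe sequences, tab delimited) are all assumptions made for illustration. Lines that do not fit the assumed layout, such as a record with no sequence, are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: group consecutive probe records by target name.
# Assumed record layout (taken from the sample data in the question):
#   >probe:ARRAY:TARGET:X:Y; Interrogation_Position=N; Antisense; SEQUENCE
# Returns one tab-delimited string per target: the target name followed
# by every probe sequence seen for it, in file order.
sub group_probes {
    my @records = @_;
    my @out;
    my ($current, @seqs);

    for my $line (@records) {
        # Capture the target name (third colon-separated field) and the
        # trailing sequence. Skip lines that do not fit the assumed layout.
        my ($target, $seq) =
            $line =~ /^>probe:[^:]+:([^:]+):.*;\s*([ACGT]+)\s*$/
            or next;

        # A different target name closes the current group and starts a new one.
        if (defined $current && $target ne $current) {
            push @out, join("\t", $current, @seqs);
            @seqs = ();
        }
        $current = $target;
        push @seqs, $seq;
    }

    # Flush the final group, if any record matched at all.
    push @out, join("\t", $current, @seqs) if defined $current;
    return @out;
}

# Usage with three records from the sample data.
my @sample = (
    '>probe:MOE430A:1415670_at:549:177; Interrogation_Position=2513; Antisense; GAGGAAACGTTCACCCTGTCTACTA',
    '>probe:MOE430A:1415670_at:467:433; Interrogation_Position=2521; Antisense; GTTCACCCTGTCTACTATCAAGACA',
    '>probe:MOE430A:1415671_at:116:525; Interrogation_Position=1156; Antisense; GGAACAGGAATGTCGCAACATCGTA',
);
print "$_\n" for group_probes(@sample);
```

Because the comparison only looks at the previous line's target name, this relies on all probes for a target being adjacent in the file, as they are in the sample. If they were scattered, a hash keyed on target name would be the more usual choice.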
Replies are listed 'Best First'.
Re: Little pattern problem...
by BrowserUk (Patriarch) on Aug 22, 2003 at 19:40 UTC
by BrowserUk (Patriarch) on Sep 05, 2003 at 19:50 UTC
Re: Little pattern problem...
by CombatSquirrel (Hermit) on Aug 22, 2003 at 17:34 UTC
Re: Little pattern problem...
by johndageek (Hermit) on Aug 22, 2003 at 21:03 UTC
Re: Little pattern problem...
by VSarkiss (Monsignor) on Aug 22, 2003 at 17:42 UTC
by bioinformatics (Friar) on Aug 22, 2003 at 20:52 UTC