I would be grateful if you could share your knowledge of parsing blastn output. I have a blastn output file that looks something like this:
>lcl|14079 ref|NC_000009.11|:4900000-5300000 Homo sapiens chromosome 9
GRCh37 primary reference assembly
Score = 270 bits (146), Expect = 2e-74
Identities = 148/149 (99%), Gaps = 0/149 (0%)
Query 1 TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATT
Sbjct 48784 TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATT
Query 61 AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGA
Sbjct 48724 AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGA
Query 121 ACAGGCAACCTACAGAATGGGAGAACATT 149
Sbjct 48664 ACAGGCAACCTACAGAATGGGAGAACATT 48636
I would like to create a summary of the mismatch positions and the type of variation (substitution, insertion or deletion) from the BLAST output. In this case there is a single mismatch, at query position 75, with alleles A-C.
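Once you have the aligned query and subject strings for a row or an HSP, the summary comes down to walking them column by column while keeping a coordinate counter for each sequence. Below is a minimal Python sketch of that idea. The names `find_variants` and `Variant` are hypothetical, gaps are assumed to be written as `-`, and "insertion"/"deletion" are assumed to mean relative to the subject (an extra base in the query is an insertion). The descending Sbjct coordinates in the alignment above suggest a Plus/Minus hit, so the subject counter steps by -1 there. In that case the subject base is the displayed, reverse-complemented one.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    kind: str   # "substitution", "insertion" or "deletion" (relative to the subject)
    qpos: int   # query coordinate (for a deletion: last query base before the gap)
    spos: int   # subject coordinate (for an insertion: last subject base before the gap)
    qbase: str  # base in the query, "-" for a deletion
    sbase: str  # base in the subject as displayed, "-" for an insertion


def find_variants(qseq: str, sseq: str, qstart: int, sstart: int,
                  sstep: int = 1) -> List[Variant]:
    """Compare two aligned strings and report every differing column.

    sstep is +1 when subject coordinates count up (Plus/Plus) and -1 when
    they count down (Plus/Minus). Assumes gaps are written as '-'.
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    # Start one step "before" the first base so the first column lands on qstart/sstart.
    qpos, spos = qstart - 1, sstart - sstep
    for q, s in zip(qseq.upper(), sseq.upper()):
        if q != "-":
            qpos += 1        # a gap in the query does not consume a query base
        if s != "-":
            spos += sstep    # a gap in the subject does not consume a subject base
        if q == s:
            continue
        if s == "-":
            kind = "insertion"     # base present in the query, absent in the subject
        elif q == "-":
            kind = "deletion"      # base present in the subject, absent in the query
        else:
            kind = "substitution"
        variants.append(Variant(kind, qpos, spos, q, s))
    return variants


# Second alignment row from the post (Query starts at 61, Sbjct at 48724, counting down).
q = "AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGA"
s = "AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGA"
print(find_variants(q, s, qstart=61, sstart=48724, sstep=-1))
# [Variant(kind='substitution', qpos=75, spos=48710, qbase='A', sbase='C')]
```

Anchoring each row at its own printed start coordinate, rather than counting across rows, keeps positions correct even when a row is shorter than BLAST's usual 60 columns.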
I tried BioPerl's Bio::SearchIO, which parses the percent identity and the start and end positions of the alignment. I don't want those in my summary; I want the coordinates of each mismatch and the type of variation. Does anyone know of a Perl module, or a simpler (or even harder) way, to find the mismatch positions and the type of variation?
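If no module fits, the pairwise text format is regular enough to read directly: each alignment block is a `Query <start> <seq> [<end>]` line followed (after a match line) by a `Sbjct <start> <seq> [<end>]` line. A minimal Python sketch under that assumption is below. `parse_hsp_rows` and `ROW` are hypothetical names, and it handles a single HSP only; splitting a multi-hit report on the `>` and `Score =` lines is left out. The trailing end coordinate is optional because the rows pasted above do not all show it. The returned start coordinates and aligned strings can then be compared column by column to locate mismatches and gaps.

```python
import re
from typing import List, Tuple

# "Query  61  AATG...  120" or "Sbjct  48724  AATG...  48665"; the end coordinate is optional.
ROW = re.compile(r"^(Query|Sbjct)\s+(\d+)\s+([A-Za-z*-]+)(?:\s+(\d+))?\s*$")


def parse_hsp_rows(text: str) -> List[Tuple[int, str, int, str]]:
    """Return one (query_start, query_seq, sbjct_start, sbjct_seq) tuple per alignment row.

    Lines that are not Query/Sbjct rows (headers, scores, the match line) are skipped.
    """
    rows = []
    pending = None  # the most recent Query row, waiting for its Sbjct partner
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        label, start, seq = m.group(1), int(m.group(2)), m.group(3)
        if label == "Query":
            pending = (start, seq)
        elif pending is not None:
            rows.append((pending[0], pending[1], start, seq))
            pending = None
    return rows


example = """Score = 270 bits (146), Expect = 2e-74
Query 61 AATGGGATCT 70
Sbjct 48724 AATGGGATCT 48715"""
print(parse_hsp_rows(example))
# [(61, 'AATGGGATCT', 48724, 'AATGGGATCT')]
```

If rerunning the search is an option, BLAST+ tabular output with the `qseq` and `sseq` format specifiers (for example `-outfmt "6 qseqid sseqid qstart qend sstart send qseq sseq"`) gives each HSP's full aligned strings on one line, which avoids parsing the pairwise layout altogether.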