Re: Parsing BLAST

Depends on what you're trying to do. OK, you appear to be searching for 20mers against a virus library. Is the purpose to identify how often a given 20mer comes up in the library? To see if the 20mer is unique in a given sequence? To see if the 20mer is represented at all in the library? To map the position of 20mer hits to features within the library? Answers depend upon the purpose.

I suggest that you go to the Pasteur web site that provides BioPerl training and have a look at their examples of working with BLAST; it has several very good examples of how to do this, with variations on the parameters. I agree that BioPerl can be a bit of a beast at the beginning, but I happen to like it. The alternative is to try out a copy of the Tisdall book.

Another question - given that you're looking for 20mers, is BLAST even the best tool to be using for this exercise? You're going to end up with many hits (they're 20mers after all and they're everywhere) and many HSPs based upon each individual hit and your e-values are going to be crap.

Given this, might you be better off taking your library of sequences, walking down it 20 bases at a time, and scoring each 20mer pattern as you observe it? This is more of a regex approach to the problem. Each 20mer is a key in a hash, and you simply increment its value by one every time that pattern occurs.
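A minimal sketch of that hash idea, written in Python for illustration rather than Perl (a Perl hash plays the same role as the `Counter` here). The function name `count_kmers`, the toy sequences, and the choice to slide the window one base at a time so that every overlapping 20mer is seen are all assumptions, not something stated above.

```python
from collections import Counter


def count_kmers(sequence, k=20):
    """Count every overlapping k-mer in one sequence (window moves one base per step)."""
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - k + 1):
        counts[seq[start:start + k]] += 1
    return counts


# Toy library with k=4 so the result is easy to check by eye; use k=20 for real data.
library = ["ACGTACGT", "TTACGTAA"]
totals = Counter()
for record in library:
    totals.update(count_kmers(record, k=4))

# k-mers seen exactly once across the whole toy library.
seen_once = sorted(kmer for kmer, n in totals.items() if n == 1)
```

Memory grows with the number of distinct k-mers, so this only works while the table fits in RAM.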

If you're trying something more ambitious, you'll need to provide more information.

MadraghRua
yet another biologist hacking perl....

Re^2: Parsing BLAST by cumurph (Novice) on Apr 25, 2006 at 00:04 UTC
I am trying to find which 20mers are unique to my sequence. I've read the stuff at Pasteur and it doesn't really seem to help with my particular problem. I also have to do this search using FASTA and have no clue even where to start with that, but that's another bird to kill. Thanks!
Re^3: Parsing BLAST by Anonymous Monk on Apr 25, 2006 at 00:28 UTC
I always parse blast in its -m 8 or -m 9 tabular output format. Much easier to parse.
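For illustration, a rough Python sketch of reading that tabular format. It assumes the twelve default tab-separated columns (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and skips the `#` comment lines that -m 9 adds. The field names, the `parse_tabular` helper, and the sample hits are made up for the example.

```python
FIELDS = ["query", "subject", "pct_identity", "length", "mismatches", "gap_opens",
          "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score"]
INT_FIELDS = ("length", "mismatches", "gap_opens", "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("pct_identity", "evalue", "bit_score")


def parse_tabular(lines):
    """Yield one dict per hit line; skip blanks, '#' comments, and malformed rows."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue
        hit = dict(zip(FIELDS, cols))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        yield hit


# Made-up sample output, not real BLAST results.
SAMPLE = (
    "# BLASTN (made-up example)\n"
    "kmer_1\tvirusA\t100.00\t20\t0\t0\t1\t20\t501\t520\t2e-04\t40.1\n"
    "kmer_1\tvirusB\t95.00\t20\t1\t0\t1\t20\t88\t107\t0.051\t32.2\n"
    "kmer_2\tvirusA\t100.00\t20\t0\t0\t1\t20\t502\t521\t2e-04\t40.1\n"
)

hits = list(parse_tabular(SAMPLE.splitlines()))

# Which library sequences does each query hit?
subjects_per_query = {}
for hit in hits:
    subjects_per_query.setdefault(hit["query"], set()).add(hit["subject"])
```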
Re^3: Parsing BLAST by srdst13 (Pilgrim) on Apr 25, 2006 at 01:58 UTC
Unless your sequence is quite large (and so you have many thousands of unique 20mers), I would go the hash route. It will be VERY fast if memory isn't limiting. If that isn't feasible, break your sequence into fasta sequences of size 20 base pairs and give each a unique ID. Then, blast away using tabular output. Then, you can parse to your heart's content using simple perl. Sean
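The "break your sequence into fasta sequences" step could look roughly like this, again sketched in Python for illustration. The `kmers_to_fasta` name, the `kmer_<position>` ID scheme, and the one-base step between windows are assumptions; the post above only asks for 20-base records with unique IDs.

```python
def kmers_to_fasta(sequence, k=20, prefix="kmer"):
    """Return FASTA text with one record per overlapping k-mer.

    Each ID is the prefix plus the 1-based start position, so IDs are unique
    and a hit can be mapped back to its place in the original sequence.
    """
    seq = sequence.upper()
    return "".join(
        f">{prefix}_{start + 1}\n{seq[start:start + k]}\n"
        for start in range(len(seq) - k + 1)
    )


# Toy input with k=4 so the output is short; use k=20 for the real sequence.
fasta_text = kmers_to_fasta("acgtac", k=4)
print(fasta_text, end="")
```

The resulting text can be written to a file and used as the query for a tabular-output BLAST run.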
Re^3: Parsing BLAST by Anonymous Monk on Apr 25, 2006 at 09:03 UTC
Is this homework?
Re^4: Parsing BLAST by cumurph (Novice) on Apr 25, 2006 at 16:53 UTC
Any suggestions on implementing the hashing method, or web sites with code I might be able to use/modify? This is part of a class project for the bioinformatics class I'm in. The rest of my classmates and I (seven of us) are all trying to figure this out. The professor has given us some leads, but the code he gave us isn't working right. Thanks! -Rob
Re^5: Parsing BLAST by MadraghRua (Vicar) on Apr 25, 2006 at 21:23 UTC
Re^6: Parsing BLAST by cumurph (Novice) on Apr 26, 2006 at 15:12 UTC
Re^5: Parsing BLAST by cumurph (Novice) on Apr 25, 2006 at 17:13 UTC

