You're welcome. Not all modules are on CPAN, and in those cases Google is your friend. :-)
Thank you for the brief description of your project. Unfortunately, a complete analysis of TF binding sites and coregulated genes is very difficult, if not impossible, at present, even in a model organism like Drosophila. Keep in mind, therefore, that any results you obtain are going to be incomplete.
1. Extract all the transcription factors (TFBS) for fruitfly (D. melanogaster).
Until all of the TFs in Drosophila have been characterized, they aren't all going to be in TRANSFAC (or any other database). It is trivial to grab the TF records from TRANSFAC that have some data from Drosophila, though, so you can at least get data for those TFs that have been characterized thus far. Go forth and parse (the TFBS modules may help you here).
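To make "go forth and parse" a bit more concrete, here is a minimal sketch (in Python for brevity; the logic ports directly to Perl). It assumes an EMBL-style flat file in which each line starts with a two-letter tag, records end with `//`, `XX` lines are spacers, and the organism is stored under an `OS` tag. Check those assumptions against the file you actually have. The sample records, accessions, and function names are invented for illustration and are not real TRANSFAC entries.

```python
from typing import Dict, Iterator, List

# Invented toy records; NOT real TRANSFAC data.
SAMPLE = """\
AC  TOY0001
XX
FA  toyFactorA
OS  fruit fly, Drosophila melanogaster
//
AC  TOY0002
XX
FA  toyFactorB
OS  human, Homo sapiens
//
AC  TOY0003
XX
FA  toyFactorC
OS  fruit fly, Drosophila melanogaster
//
"""


def parse_records(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per record, mapping each two-letter tag to its values."""
    record: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.rstrip()
        if line == "//":  # assumed end-of-record marker
            if record:
                yield record
            record = {}
            continue
        if len(line) < 2 or line[:2] == "XX":  # skip blanks and spacer lines
            continue
        tag, value = line[:2], line[2:].strip()
        record.setdefault(tag, []).append(value)
    if record:  # tolerate a final record with no terminator
        yield record


def factors_for_organism(text: str, organism: str = "drosophila") -> List[Dict[str, List[str]]]:
    """Keep records whose OS field mentions the organism (case-insensitive)."""
    wanted = organism.lower()
    return [
        rec for rec in parse_records(text)
        if any(wanted in value.lower() for value in rec.get("OS", []))
    ]


if __name__ == "__main__":
    for rec in factors_for_organism(SAMPLE):
        print(rec["AC"][0], rec["FA"][0])
```

A substring match on the organism field is deliberately loose: it catches every Drosophila species, so tighten it to "drosophila melanogaster" if you only want the one.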
2. Identify all the coregulated genes of each TFBS found above.
If I understand you correctly, you want to identify the genes that are regulated by each TF. That's a much harder problem, and it has been a subject of active research for many years. I encourage you to do some searches on PubMed for background and to talk to someone at your institution who might have experience in the area. This step will likely require some experimental data (e.g., expression microarrays).
It is certainly possible to identify genes that contain sequences matching a given matrix, but a match does not mean that the TF actually binds to that site and regulates the gene in question. Take a look at the PATSER program (Hertz and Stormo, Bioinformatics 1999). It's even part of BioPerl: Bio::Tools::Run::PiseApplication::patser. :-)
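To illustrate what "matching a matrix" means, here is a rough Python sketch of a log-odds weight-matrix scan. This is not PATSER itself; it is a generic toy version that assumes a uniform background, a pseudocount of 1, and forward-strand scanning only. The count matrix is invented, and the function names are my own.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: Dict[str, List[int]],
                    pseudocount: float = 1.0,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Turn per-position base counts into per-position log2-odds scores."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (0-based start, window, score) for windows scoring >= threshold."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Invented counts for a 4-bp toy motif whose consensus is ACGT.
    toy_counts = {
        "A": [10, 0, 0, 0],
        "C": [0, 10, 0, 0],
        "G": [0, 0, 10, 0],
        "T": [0, 0, 0, 10],
    }
    pwm = log_odds_matrix(toy_counts)
    for start, window, score in scan("TTACGTGGACGTAA", pwm, threshold=5.0):
        print(start, window, round(score, 2))
```

Every hit is only a candidate site. The threshold is arbitrary here, and lowering it quickly produces many matches that have nothing to do with real binding, which is exactly why experimental data is needed for step 2.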
3. Finally, extract the sequence from -450 upstream to 50 downstream region of each coregulated gene found.
Once you identify the genes, extracting a portion of the sequence is trivial. BioPerl to the rescue (keep in mind that many TF binding sites could be located outside of your defined region).
HTH
.. so you can at least get data for those TFs that have been characterized thus far.
Yes bobf, my intention is only to extract already-characterized TFBS from the TRANSFAC database. I'm not looking for new TFBS in fruitfly.
Go forth and parse (the TFBS modules may help you here)
Having looked at the TFBS module, I can't seem to find a method that lets me pass the query "fruitfly" or "drosophila" and then returns:
- already characterized TF and their binding sites (BS),
- their respective coregulated genes, and
- the location of each TFBS in the genes,
all from the TRANSFAC database. Perhaps a good example of a database that offers all of this in a straightforward manner is SCPD (a database for yeast). TRANSFAC seems to be more complex for this task. Any advice on how I can make use of the module as you suggested?
I certainly don't expect a silver-bullet method from the TFBS module that returns all of that, but at least a method that gives me a starting point for the tasks above.