PerlMonks  

Re: Deconvolutinng FastQ files

by frozenwithjoy (Curate)
on Aug 06, 2012 at 07:01 UTC ( #985615=note )


in reply to Deconvolutinng FastQ files

You should check out the FASTX Toolkit. It includes a very complete barcode splitting script (written in Perl) in addition to many other useful tools. You can download pre-compiled binaries or the source here. If you want to subsequently trim the barcodes, you can use fastx_trimmer.


Re^2: Deconvolutinng FastQ files
by Anonymous Monk on Aug 07, 2012 at 13:37 UTC
    Thank you frozenwithjoy. With FastQ on Galaxy I need to trim the first three letters of my records to be able to use the barcode splitting function. I haven't tried the stand-alone version yet. I will give that a go once I can get it to work on my computer. These three letters are important for my analysis, so I am not entirely sure if I can use FASTX's barcode splitter tool. I am playing around with the Galaxy version of it at present.
      I took a look at fastx_barcode_splitter.pl and I think I've figured out a solution. I haven't tested it, but if you change line 161 from:
      unless $barcode =~ m/^[AGCT]+$/;
      to:
      unless $barcode =~ m/^[AGCTN]+$/;
      then you should be able to prefix your barcodes w/ 3 N's as long as you set --mismatches to at least 3 on the command line when running the script.
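
      To illustrate why --mismatches needs to be at least 3, here is a minimal, untested-against-FASTX sketch of position-by-position mismatch counting. The helper count_mismatches, the barcode and the read are all made up for illustration, and this is not code from fastx_barcode_splitter.pl; I'm only assuming the script compares the barcode against the start of the read in roughly this way.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from fastx_barcode_splitter.pl):
# count the positions at which two equal-length strings differ.
sub count_mismatches {
    my ($x, $y) = @_;
    die "strings must have equal length\n" unless length($x) == length($y);
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $n;
}

my $barcode = 'NNNACGT';          # 3 N's + a made-up 4-base barcode
my $read    = 'GATACGTTTGACCA';   # made-up read; its first 3 bases are kept
my $prefix  = substr($read, 0, length $barcode);

# Each N in the barcode differs from the real base in the read,
# so the 3-base prefix alone costs 3 mismatches.
printf "%s vs %s: %d mismatches\n",
    $barcode, $prefix, count_mismatches($barcode, $prefix);
```

      For this example the count is 3, all of it coming from the N prefix. If the comparison really works this way, setting --mismatches to exactly 3 leaves no tolerance for errors in the barcode itself.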

      One caveat is that you will want to toss out any reads that have any Ns in the first X bases (where X = 3 + barcode length). Have you run FastQC? If so, it will tell you the per-base N content. It probably won't be an issue if you've already done preliminary filtering based on Illumina's Y/N flags (assuming Illumina sequencing, of course).
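
      If you'd rather do that filtering yourself, something along these lines might work. It is only a sketch: has_early_n and filter_fastq are hypothetical names, the two records are invented, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```perl
use strict;
use warnings;

# Hypothetical helper: true if the first $x bases of $seq contain an N.
sub has_early_n {
    my ($seq, $x) = @_;
    return substr($seq, 0, $x) =~ /N/i ? 1 : 0;
}

# Hypothetical filter: copy 4-line FASTQ records from $in to $out,
# skipping any record with an N in the first $x bases.
sub filter_fastq {
    my ($in, $out, $x) = @_;
    my ($kept, $dropped) = (0, 0);
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;    # ignore a truncated final record
        if (has_early_n($seq, $x)) { $dropped++; next; }
        print {$out} $header, $seq, $plus, $qual;
        $kept++;
    }
    return ($kept, $dropped);
}

# Two made-up records; the second has an N at position 2.
my $fastq  = "\@r1\nGATACGTTT\n+\nIIIIIIIII\n\@r2\nGNTACGTTT\n+\nIIIIIIIII\n";
my $result = '';
open my $in,  '<', \$fastq  or die $!;
open my $out, '>', \$result or die $!;
my ($kept, $dropped) = filter_fastq($in, $out, 7);   # X = 3 + 4-base barcode
close $out;
print "kept $kept, dropped $dropped\n";
```

      For real data you would pass file handles opened on your FASTQ files instead of the in-memory strings used here.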

      Also (depending on your computer, of course), I suspect fastx_barcode_splitter.pl will run a lot faster at the CLI than on Galaxy (at least if you are using the public Galaxy server).

      Edit: to avoid the potential problem w/ Ns, just use some other non-nucleotide character in place of the N's (and allow that character in the regex instead of N)!

        Thank you frozenwithjoy. I should give this a go too. We are thinking about running our own Galaxy server on EC2, so the revised fastx_barcode_splitter might come in handy. BrowserUk's script below works super fast; I am not sure how fastx_barcode_splitter.pl might compare.
