Is this a plain ASCII file, or does it use some multi-byte character encoding? A PC that is simply "too slow" is not likely the cause; some other issue is afoot here. Could it be a Unicode issue? Can you cut this down to a simple example, without huge files, showing a) what works and b) what doesn't? The actual code would also be VERY useful.
Re^4: Regular expressions across multiple lines
by abcd (Novice) on Apr 24, 2016 at 17:27 UTC
I don't know much about file formats, but the input file I am using is a FASTA file, which stores DNA sequences. I am a beginner doing this as a grad school project, so this is pretty much the actual code; there isn't much else to it. The regular expression itself seems fine: it gives the desired results on a test file with a few lines, but it doesn't work on larger files.
To give more context on the actual problem: the 10 random characters are random barcodes flanked by specific sequences (the abc and def in my example code). Once I get the 5 characters (i.e. DNA bases) before and after this fragment, I will use them to figure out which gene the random barcode was inserted into. In this way, each gene will be associated with a unique barcode.
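For anyone following along, here is a minimal sketch of the extraction described above. It is written in Python rather than Perl, and the flanking sequences `LEFT_FLANK`/`RIGHT_FLANK` (`GATC`/`CTAG`), the barcode alphabet `[ACGTN]`, and the function names are hypothetical placeholders, since the original example code is not shown here. Instead of deleting every newline in the file and producing one enormous line, it joins the sequence lines of each FASTA record separately. A barcode that spans a line break inside a record can still match, and memory use is bounded by the longest record rather than the whole file.

```python
import io
import re
from typing import Iterator, List, Optional, TextIO, Tuple

# Hypothetical flanking sequences standing in for "abc" and "def"; substitute the real ones.
LEFT_FLANK = "GATC"
RIGHT_FLANK = "CTAG"

# 5 bases of context, left flank, 10-base barcode, right flank, 5 bases of context.
PATTERN = re.compile(
    r"([ACGTN]{5})"
    + re.escape(LEFT_FLANK)
    + r"([ACGTN]{10})"
    + re.escape(RIGHT_FLANK)
    + r"([ACGTN]{5})"
)


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining each record's sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")  # strip the line terminator
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line.upper())  # normalise soft-masked lowercase bases
    if header is not None:
        yield header, "".join(chunks)


def find_barcodes(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, before, barcode, after) for each non-overlapping flanked barcode."""
    for header, seq in read_fasta(handle):
        for m in PATTERN.finditer(seq):
            yield header, m.group(1), m.group(2), m.group(3)


if __name__ == "__main__":
    # In-memory demo: the match spans the line break inside record "read1".
    sample = io.StringIO(">read1\nAACCGGATCACGTA\nCGTACCTAGTTGCA\n>read2\nACGTACGT\n")
    for hit in find_barcodes(sample):
        print(hit)  # ('read1', 'AACCG', 'ACGTACGTAC', 'TTGCA')
```

On a real file this would be used as `with open("reads.fasta") as fh: for hit in find_barcodes(fh): ...` (the file name is hypothetical). Note that `finditer` returns non-overlapping matches, so two barcodes within 5 bases of each other would need a lookahead-based pattern.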
Yes, the original file displays fine in the text editor. I don't really see bizarre characters either, just normal characters placed one on top of another. That is why I thought it might be an issue with my PC: the output file I create by removing the newlines contains a single, very long line of text, which my PC may be having trouble loading.
Anyway, thanks for the help. I will keep experimenting and see if I can get this to work, because from the replies I have received, the problem doesn't seem to be with the code itself but with something else.
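One way to answer the encoding question raised earlier, without relying on how a text editor renders the file, is to scan the raw bytes. The sketch below (again Python, with hypothetical function names) counts LF, CRLF, and lone CR line endings, NUL bytes (common in UTF-16 files), and bytes outside 7-bit ASCII. A plain-ASCII FASTA file with Unix line endings should report only `LF`; anything else would be worth investigating before blaming the PC.

```python
import io
from collections import Counter
from typing import BinaryIO


def scan_bytes(fh: BinaryIO, chunk_size: int = 1 << 20) -> Counter:
    """Count line-ending styles and suspicious bytes in a binary stream."""
    counts: Counter = Counter()
    prev_cr = False  # True when the previous byte was a carriage return (0x0D)
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        for b in chunk:  # iterating over bytes yields ints in Python 3
            if prev_cr:
                counts["CRLF" if b == 0x0A else "lone CR"] += 1
            elif b == 0x0A:
                counts["LF"] += 1
            if b == 0x00:
                counts["NUL"] += 1  # many NULs suggest UTF-16/UTF-32 rather than ASCII
            elif b >= 0x80:
                counts["non-ASCII"] += 1  # e.g. part of a UTF-8 multi-byte character
            prev_cr = b == 0x0D
    if prev_cr:  # the stream ended with a bare CR
        counts["lone CR"] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Open a file in binary mode and scan it."""
    with open(path, "rb") as fh:
        return scan_bytes(fh)


if __name__ == "__main__":
    # In-memory demo; for a real file use scan_file("reads.fasta") (hypothetical name).
    sample = io.BytesIO(b">r1\r\nACGT\rACGT\n")
    print(dict(scan_bytes(sample)))
```

Reading in binary chunks keeps memory use constant, and the `prev_cr` flag carries across chunk boundaries so a CRLF split between two reads is still counted once.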