Lower-casing Substrings and Iterating Two Files together
by neversaint (Deacon)
on Dec 27, 2008 at 14:14 UTC
neversaint has asked for the wisdom of the Perl Monks concerning the following question:
I have two files as input: data1.txt and data2.txt (the latter is called "hard masked").
My aim is to generate output by lower-casing data1.txt at every position where "N" appears in data2.txt.
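To make the transformation concrete, here is a minimal sketch of the per-sequence step. It assumes plain ASCII upper-case sequences and that the two strings have the same length; the function name lowercase_at_n and the sample strings are made up for illustration and are not taken from the real data files. It builds a mask from the hard-masked string and ORs it onto the sequence, which is one possible way to do this, not necessarily the fastest.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case $seq wherever $masked has an "N".
# Assumes ASCII upper-case letters and equal-length strings.
sub lowercase_at_n {
    my ($seq, $masked) = @_;

    # Build a mask: every non-N character becomes NUL, every N becomes 0x20.
    (my $mask = $masked) =~ tr/N/\x00/c;
    $mask =~ tr/N/\x20/;

    # String-wise bitwise OR: setting bit 0x20 turns an ASCII upper-case
    # letter into its lower-case form; OR with NUL leaves a byte unchanged.
    # (This relies on the classic string behaviour of "|", i.e. the
    # "bitwise" feature is not enabled.)
    return $seq | $mask;
}

# Made-up example strings, not from data1.txt / data2.txt.
print lowercase_at_n('ACGTACGT', 'ACNNNCGT'), "\n";    # prints ACgtaCGT
```

The mask is the same length as the sequence, so this costs one extra copy of each sequence in memory.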
In the real data there are typically ~10^5 sequences, each 10^5 to 10^7 characters long (up to 3.5 GB in file size).
I have a script that slurps both files and then loops over the sequence index. This approach is slow and memory-inefficient, so I am looking for enlightenment from my fellow monks on how to perform this task more efficiently.
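For comparison with slurping, a rough sketch of reading the two files in lockstep, so that only one pair of sequences is in memory at a time. It assumes one sequence per line and that line i of both files describes the same sequence; the real file layout may differ. The sub name soft_mask_files and its arguments are hypothetical. This is only an illustration of the streaming idea, not the script described above and not any particular monk's solution.

```perl
use strict;
use warnings;

# Hypothetical helper: stream both files line by line instead of slurping.
# Assumes one sequence per line and the same line order in both files.
sub soft_mask_files {
    my ($plain_path, $masked_path, $out_path) = @_;

    open my $plain,  '<', $plain_path  or die "Cannot open $plain_path: $!";
    open my $masked, '<', $masked_path or die "Cannot open $masked_path: $!";
    open my $out,    '>', $out_path    or die "Cannot open $out_path: $!";

    while (defined(my $seq = <$plain>)) {
        my $mask = <$masked>;
        die "$masked_path has fewer lines than $plain_path\n"
            unless defined $mask;
        chomp($seq, $mask);
        die "Length mismatch at line $.\n"
            unless length($seq) == length($mask);

        # Walk over each run of N in the hard-masked line and lower-case
        # the same span of the plain sequence in place.
        while ($mask =~ /N+/g) {
            my $start = $-[0];
            my $len   = $+[0] - $-[0];
            substr($seq, $start, $len) = lc substr($seq, $start, $len);
        }

        print {$out} $seq, "\n";
    }
    die "$masked_path has more lines than $plain_path\n"
        unless eof($masked);

    close $out or die "Cannot close $out_path: $!";
    close $plain;
    close $masked;
    return;
}
```

It could be called as soft_mask_files('data1.txt', 'data2.txt', 'out.txt'), where out.txt is a made-up output name. Peak memory is then roughly two lines plus the output buffer rather than both files.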
Update: Running time comparison between BrowserUk's approach and mine is shown below.
neversaint and everlastingly indebted.......