PerlMonks  

Re: input record separator and split

by Laurent_R (Vicar)
on May 28, 2014 at 21:21 UTC


in reply to input record separator and split

Not only does $/ not accept a regex (it is always treated as a literal string), but adding the "\s+" pattern also looks fairly useless in this context. At most, it would remove additional spaces from the chunks you get, but that can easily be done as a second step.
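
A minimal sketch of that two-step approach, assuming the records are delimited by the literal string "\nQuery=". The helper name each_record and the sample data are made up for illustration and are not taken from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: read records delimited by a literal string,
# then trim whitespace as a separate second step.
sub each_record {
    my ($fh, $callback) = @_;
    local $/ = "\nQuery=";           # a literal string, never a pattern
    while ( my $rec = <$fh> ) {
        chomp $rec;                  # strips a trailing "\nQuery=" if present
        $rec =~ s/^\s+|\s+$//g;      # second step: trim surrounding whitespace
        $callback->($rec) if length $rec;
    }
    return;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "header\nQuery= seq1  \nhit A\nQuery=   seq2\nhit B\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @records;
each_record( $fh, sub { push @records, $_[0] } );
close $fh;

print "[$_]\n" for @records;
```

The demo collects the records into an array only because the sample is tiny. With a large file, the callback would process each record and let it go, so that only one record is held in memory at a time.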

The second thing that I don't get is that you split your file on "Query" and then try to split your lines on almost the same pattern. Unless I missed something, it does not seem to me to make much sense with the data sample you provided.

Lastly, a 72320825-line file is pretty big, but I would not call it huge (unless the lines are really very long). I use much larger files on an almost daily basis and don't get any trouble so long as I am not doing something stupid such as trying to load everything into memory (it might just take some time, but it does not fail). Anyway, since this line:

@blastblock = split(/Query=/, $_);
is overwriting the @blastblock array each time through the loop, I don't really believe that you ran out of memory because of the size of the input data. I would suggest that you try to look at line 54725380 to figure out if there is something wrong with it. One possible way to view it might be a one-liner such as this one:
perl -e '$/ = "\nQuery="; while (<>) { print and last if $. == 54725380;}' file.txt
It might have to be adapted depending on your data, but see if this works. More generally, I suspect that your split fails because your data might have a very large section (possibly the whole file) without ever matching the record splitting pattern. So the first thing to be done is to remove the \s+ from your input record delimiter and see whether that works.
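
One way to check that suspicion is to stream the file once per candidate separator and record only the number of records and the length of the longest one. The following is a rough sketch: the helper name record_stats, the sample data and the second separator (which contains the literal characters \s+) are assumptions for illustration, not the original poster's code:

```perl
use strict;
use warnings;

# Hypothetical diagnostic: count the records produced by a given separator
# and remember the longest one, without keeping the records in memory.
sub record_stats {
    my ($fh, $separator) = @_;
    local $/ = $separator;
    my ($count, $longest, $longest_at) = (0, 0, 0);
    while ( my $rec = <$fh> ) {
        $count++;
        my $len = length $rec;
        ($longest, $longest_at) = ($len, $count) if $len > $longest;
    }
    return ($count, $longest, $longest_at);
}

# Made-up sample data; a real run would open the actual file instead.
my $data = "Query= a\nhit 1\nQuery= b\nhit 2\nhit 3\nQuery= c\n";

my %separators = (
    literal      => "\nQuery=",
    pattern_like => "\nQuery=\\s+",  # the characters \s+ are matched literally
);

my %stats;
for my $label ( sort keys %separators ) {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    $stats{$label} = [ record_stats( $fh, $separators{$label} ) ];
    close $fh;
    printf "%-12s records=%d longest=%d chars (record %d)\n",
        $label, @{ $stats{$label} };
}
```

If a separator never matches, the whole input comes back as a single record whose length equals the size of the data. On a very large file, that is the kind of record that could exhaust memory.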

