Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Re: input record separator and split

by Laurent_R (Canon)
on May 28, 2014 at 21:21 UTC ( #1087716=note: print w/replies, xml ) Need Help??

in reply to input record separator and split

Not only does $/ not accept regex, but it also looks fairly useless to add the "\s+" pattern in this context. At most, it would remove additional spaces from the chunks you get, but that can easily be done as a second step.

The second thing that I don't get is that you split your file on "Query" and then try to split your lines on almost the same pattern. Unless I missed something, it does not seem to me to make much sense with the data sample you provided.

Lastly, a 72320825-line file is pretty big, but I would not qualify it as huge (unless the lines are really very long), I am using much larger files on an almost daily basis and don't get any trouble so long as I am not doing something stupid sus as trying to load everything into memory (il might just take some time, but it does not fail). Anyway, since this line:

@blastblock = split(/Query=/, $_);
is overwriting the @blastblock array each time through the loop, I don't really believe that you ran out of memory because of the size of the input data. I would suggest that you try to look at line 54725380 to figure out if there is something wrong with it. One possible to view it might be a one-liner such as this one:
perl -e '$/ = "\nQuery="; while (<>) { print and last if $. == 5472538 +0;}' file.txt
It might have to be adapted depending on your data, but see if this works. More generally, I suspect that your split fails because your data might have a very large section (possibly the whole file) without ever matching the record splitting pattern. So the first thing to be done is to remove the \s+ from your input record delimiter and see whether that works.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1087716]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (7)
As of 2018-06-22 23:06 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (124 votes). Check out past polls.