
Re: input record separator and split

by Laurent_R (Canon)
on May 28, 2014 at 21:21 UTC (#1087716)

in reply to input record separator and split

Not only does $/ not accept a regex (its value is always treated as a literal string), but adding the "\s+" pattern also looks fairly useless in this context. At most, it would remove additional spaces from the chunks you get, but that can easily be done as a second step.
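To illustrate the "second step" idea, here is a minimal sketch. The helper name clean_record and the sample data are made up for the example, and I am assuming each record starts with "Query=" at the beginning of a line; adapt it to your real data.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy one record after it has been read.
sub clean_record {
    my ($rec) = @_;
    $rec =~ s/^Query=//;        # the very first record still carries its marker
    $rec =~ s/^\s+|\s+$//g;     # second step: trim leading/trailing whitespace
    return $rec;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = "Query= seq1  \nhit A\nQuery=   seq2\nhit B\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

{
    local $/ = "\nQuery=";      # a literal string, not a regex
    while (my $rec = <$fh>) {
        chomp $rec;             # chomp removes a trailing "\nQuery=", if any
        print "[", clean_record($rec), "]\n";
    }
}
close $fh;
```

Only one record is held in memory at a time, so the size of the file should not matter much as long as the separator actually occurs in the data.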

The second thing that I don't get is that you split your file on "Query" and then try to split your lines on almost the same pattern. Unless I missed something, it does not seem to me to make much sense with the data sample you provided.

Lastly, a 72320825-line file is pretty big, but I would not qualify it as huge (unless the lines are really very long). I am using much larger files on an almost daily basis and don't get any trouble so long as I am not doing something stupid such as trying to load everything into memory (it might just take some time, but it does not fail). Anyway, since this line:

@blastblock = split(/Query=/, $_);
is overwriting the @blastblock array each time through the loop, I don't really believe that you ran out of memory because of the size of the input data. I would suggest that you try to look at line 54725380 to figure out if there is something wrong with it. One possible way to view it might be a one-liner such as this one:
perl -e '$/ = "\nQuery="; while (<>) { print and last if $. == 54725380;}' file.txt
It might have to be adapted depending on your data, but see if this works. More generally, I suspect that your split fails because your data might have a very large section (possibly the whole file) without ever matching the record splitting pattern. So the first thing to be done is to remove the \s+ from your input record delimiter and see whether that works.
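One way to check that suspicion without loading a big chunk into memory might be to read the file line by line (default $/) and measure the longest stretch of lines that contains no "Query=" marker. This is only a rough sketch: the sub name longest_gap and the sample data are invented for the example, and it assumes the marker sits at the start of a line.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: returns the number of "Query=" lines, the
# longest run of lines without one, and the line number of the marker
# (or 1 for the start of the file) that precedes that run.
sub longest_gap {
    my ($fh) = @_;
    my ($count, $gap, $max_gap, $max_start, $start) = (0, 0, 0, 0, 1);
    while (my $line = <$fh>) {
        if ($line =~ /^Query=/) {
            $count++;
            $gap   = 0;
            $start = $.;        # current line number of the marker
        }
        else {
            $gap++;
            ($max_gap, $max_start) = ($gap, $start) if $gap > $max_gap;
        }
    }
    return ($count, $max_gap, $max_start);
}

# Made-up sample data; replace with a real filehandle on your file.
my $sample = "header\nQuery= a\nx\ny\nz\nQuery= b\nw\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($count, $max_gap, $max_start) = longest_gap($fh);
close $fh;
print "markers: $count, longest gap: $max_gap lines, starting after line $max_start\n";
```

If the marker count is zero or the longest gap is enormous, the record separator is probably never matching over that part of the file.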
