Re^2: Reading files, skipping very long lines...

by Excalibor (Pilgrim)
on Sep 29, 2005 at 17:42 UTC


in reply to Re: Reading files, skipping very long lines...
in thread Reading files, skipping very long lines...

Thanks for the advice...

I actually tried this (using read() instead of setting $/, though I believe it amounts to basically the same thing). It works, but the problem then is time: I am processing the file in real time, and it was taking ages (literally!) to read that 380+ MB line...
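
Roughly what I tried, sketched from memory (the filename, the $MAX limit, and the process() stub are just for illustration):

    use strict;
    use warnings;

    my $MAX = 1024 * 1024;    # made-up limit: anything longer gets dropped

    open my $fh, '<', 'input.log' or die "open: $!";

    my ( $buf, $skip ) = ( '', 0 );
    while ( read( $fh, my $chunk, 8192 ) ) {
        $buf .= $chunk;
        while ( ( my $nl = index( $buf, "\n" ) ) >= 0 ) {
            my $line = substr( $buf, 0, $nl + 1, '' );    # pull one line off
            if ($skip) { $skip = 0; next }    # that was a monster line's tail
            process($line);
        }
        if ( $skip or length($buf) > $MAX ) {
            $buf  = '';    # don't buffer the over-long line...
            $skip = 1;     # ...just ignore it until its newline shows up
        }
    }

    sub process { my ($line) = @_; print $line }    # stand-in for real work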

Better explained: a process appends lines to a file, and I am processing them as they arrive. Somehow it inserts a 380+ MB line, and I want to skip it and wait for the next one... Maybe going really low-level and playing with IPC would do the trick... I gotta go now, but I'll think on it tomorrow...

Conclusion: the method works, but it's too slow... I need a way to skip the line completely, and wait for a new line to be inserted into the file. (I wanna croak my $brain)

Thanks for your help, fellow monks!

--
our $Perl6 is Fantastic;

Replies are listed 'Best First'.
Re^3: Reading files, skipping very long lines...
by pjf (Curate) on Sep 30, 2005 at 01:03 UTC

    G'day Excalibor,

    All the suggestions so far have been fantastic, and it sounds like all you really need now is a very-fast 'discard line' subroutine.

    Be aware that regardless of how efficient your code may be, you'll be limited by the speed of the I/O operations provided by your operating system. If you've got to read 380 MB from disk, that's going to take some time no matter how you process it.

    If possible, set your program running and take a look at what your system is doing. If you're on a unix-flavoured system, then top and time can help a lot. If you're hitting 100% CPU usage, and a lot of that is in userland time, then a tighter reading loop may help. If you're not seeing 100% CPU usage, or you're seeing a very high amount of system time, then you're probably I/O bound, and you'll need faster disks, hardware, and/or filesystems for your program's performance to improve.
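
    If it helps, you can also get a rough user-vs-system split from inside Perl itself with the built-in times function; a tiny sketch:

        use strict;
        use warnings;

        # ... do the work you want to measure here ...

        my ( $user, $system ) = times();
        printf "user: %.2fs  system: %.2fs\n", $user, $system;

        # Lots of system time relative to user time usually points at
        # an I/O-bound program rather than a CPU-bound one.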

    Assuming that you are CPU bound, you can potentially write your 'discard line' subroutine in C, which allows it to be very fast and compact. Here's an example using Inline::C:

        use Inline 'C';

        # Example: skip a line of input from STDIN.
        skip_line();

        # Look!  The next line is read fine by Perl.
        print scalar <STDIN>;

        __END__
        __C__

        /* Read (and discard) until we find a newline.   */
        /* NOTE: This will loop endlessly if it hits EOF */
        /*       before finding a newline. Caveat lector. */

        void skip_line() {
            while( getchar() != '\n' ) { }
        }

    I haven't benchmarked that, but it should be both very memory-efficient and fast. Be aware of the problem you'll encounter if skip_line() hits EOF before finding a newline; unless you're very sure of your input file, you'll want to improve upon the sample code provided here.
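
    For instance, an EOF-aware variant (untested, my own sketch) that you could drop into the __C__ section; it returns false when the stream ends before a newline turns up:

        /* Like skip_line(), but stops at end-of-file too.           */
        /* Returns 1 if a newline was consumed, 0 if EOF came first. */
        int skip_line_safe() {
            int c;
            while ( (c = getchar()) != EOF ) {
                if ( c == '\n' ) return 1;
            }
            return 0;
        }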

    If you do benchmark, keep in mind that caching (by the CPU and, more importantly for file I/O, by the operating system) may make a significant difference to your end results.

    All the very best,

Re^3: Reading files, skipping very long lines...
by Roy Johnson (Monsignor) on Sep 29, 2005 at 18:34 UTC
    If you're reading each line as it's appended to the file, you can seek to the end of the file as soon as you see that the line is too long.
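
    Something along these lines, perhaps (untested; the filename, the $MAX limit, and the details are my own):

        use strict;
        use warnings;
        use Fcntl qw(:seek);

        my $MAX = 65536;    # give up on any line longer than this (made up)

        open my $fh, '<', 'growing.log' or die "open: $!";

        while (1) {
            my $got = read( $fh, my $block, $MAX );
            if ( !$got ) {                  # at EOF: wait for the writer
                sleep 1;
                seek $fh, 0, SEEK_CUR;      # clear the handle's EOF flag
                next;
            }
            my $nl = rindex( $block, "\n" );
            if ( $nl < 0 ) {
                seek $fh, 0, SEEK_END;      # $MAX bytes, no newline: bail out
                next;
            }
            seek $fh, $nl + 1 - $got, SEEK_CUR;   # put back the partial line
            # ... process the complete lines in substr($block, 0, $nl + 1) ...
        }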

    Caution: Contents may have been coded under pressure.
      How do you know if you've skipped over a newline?

      Be well,
      rir

        If you're reading each line as soon as it is appended to the file, there won't be a newline.

        But if there's a chance that another line will follow more quickly than your seek gets there, you could always read backwards $MAX bytes to see if there's a newline.
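
        A sketch of that backwards check (the names and the $MAX limit are invented; assumes we've just seeked to EOF):

            use strict;
            use warnings;
            use Fcntl qw(:seek);

            my $MAX = 65536;    # same made-up limit as above

            # After jumping to EOF, look back up to $MAX bytes for a newline;
            # if one is there, resume just after it instead of at the very end.
            sub resync_after_newline {
                my ($fh) = @_;
                my $end  = tell $fh;
                my $back = $end < $MAX ? $end : $MAX;
                seek $fh, -$back, SEEK_CUR or return 0;
                read $fh, my $tail, $back;
                my $nl = index( $tail, "\n" );
                if ( $nl >= 0 ) {
                    seek $fh, $nl + 1 - $back, SEEK_CUR;   # just past it
                    return 1;
                }
                seek $fh, 0, SEEK_END;    # none found; stay at the end
                return 0;
            }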


        Caution: Contents may have been coded under pressure.
Re^3: Reading files, skipping very long lines...
by ForgotPasswordAgain (Priest) on Sep 30, 2005 at 14:41 UTC

    Maybe you could try replacing the output file with a named pipe (man mkfifo). The program outputs to the pipe, and you have a filter program read from the pipe.
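
    A rough sketch of that setup (the fifo path, output file, and size cutoff are all invented):

        use strict;
        use warnings;
        use POSIX qw(mkfifo);

        my $fifo = '/tmp/filtered.fifo';   # hypothetical path
        my $MAX  = 1024 * 1024;            # hypothetical cutoff

        mkfifo( $fifo, 0600 ) or die "mkfifo: $!" unless -p $fifo;

        # The producer writes to $fifo; this filter drops monster lines.
        open my $in,  '<', $fifo       or die "open fifo: $!";
        open my $out, '>', 'clean.log' or die "open out: $!";

        while ( my $line = <$in> ) {
            print {$out} $line if length($line) <= $MAX;
        }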

    Or, depending on the predictability/frequency of the output, you could record the file size once, then check it again later: if it hasn't grown by more than $MAX_DIFFERENCE, you know you don't have to worry about it.
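
    For example (polling interval and threshold invented):

        use strict;
        use warnings;

        my $file           = 'input.log';       # hypothetical
        my $MAX_DIFFERENCE = 10 * 1024 * 1024;  # suspiciously large growth

        my $last = -s $file || 0;
        while (1) {
            sleep 5;
            my $size = -s $file || 0;
            warn "file grew by more than $MAX_DIFFERENCE bytes\n"
                if $size - $last > $MAX_DIFFERENCE;
            $last = $size;
        }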

    Or patch the program to not insert long lines.
