Rhys
Okay, [id://8874] was a lot of help, since it shows how to keep a bunch of files open at once by storing the filehandles in a hash keyed on filename (very cool).
<p>
Too bad it's not what I need. Let's back up, shall we?
<p>
I have a log file. SNMP traps arrive on UDP port 162, pass through snmptrapd to syslog-ng, and end up in a file that looks roughly like this:
<p>
<code>
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip1>: Trap msga.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip3>: Trap msgg.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip4>: Trap msgd.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip1>: Trap msge.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip2>: Trap msga.
</code>
<p>
I have seven input files, some gzipped and some not. Since they're log files, I can use <code>(stat "$filename")[9]</code> to get each file's last-modified time; sorting the files on that keeps the entries in chronological order without having to parse the timestamps inside the log. Matching <code>/\]:\s(\S+):/</code> then gives me the IP address of the original trap sender.
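<p>
A minimal Perl sketch of those three steps, assuming gzipped files can be recognised by a <code>.gz</code> suffix and using the <code>IO::Uncompress::Gunzip</code> module (shipped with Perl since 5.10) in place of <code>zcat</code>. The helper names <code>files_by_mtime</code>, <code>open_log</code> and <code>sender_of</code> are hypothetical.
<code>
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Order the input files oldest-first by last-modified time, so reading
# them in sequence yields the log entries in chronological order.
sub files_by_mtime {
    my %mtime;
    for my $file (@_) {
        my $t = (stat $file)[9];
        die "Cannot stat $file: $!\n" unless defined $t;
        $mtime{$file} = $t;
    }
    return sort { $mtime{$a} <=> $mtime{$b} } keys %mtime;
}

# Open one log for reading; files named *.gz are decompressed on the fly
# (assumption: compressed files are identified by their suffix).
sub open_log {
    my ($file) = @_;
    if ($file =~ /\.gz\z/) {
        my $z = IO::Uncompress::Gunzip->new($file)
            or die "Cannot gunzip $file: $GunzipError\n";
        return $z;
    }
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    return $fh;
}

# Extract the original trap sender from a line such as
#   "Sep 28 19:45:10 logsrvr snmptrapd[123]: 10.0.0.1: Trap msga."
sub sender_of {
    my ($line) = @_;
    return $line =~ /\]:\s(\S+):/ ? $1 : undef;
}
</code>
Each file from <code>files_by_mtime(@ARGV)</code> is then opened with <code>open_log</code> and read line by line, skipping any line for which <code>sender_of</code> returns <code>undef</code>.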
<p>
Sounds easy, right? Here's the hard part: for each trap sender, I want to write an HTML file containing only the traps from that sender. If there were only a few senders, I could open each sender's file, write the HTML 'top' followed by <code><pre></code>, store the filehandle in a hash keyed on the sender's IP, and write each line to the matching filehandle as it is parsed.
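<p>
For the few-senders case, the hash-of-filehandles approach might look like this minimal sketch. The output directory variable, the <code>traps-IP.html</code> naming scheme and the helper names are hypothetical; lines are HTML-escaped because trap text could contain <code><</code> or <code>&</code>.
<code>
use strict;
use warnings;

my $out_dir = '.';   # hypothetical output directory
my %out;             # sender IP => open output filehandle
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

sub html_escape {
    my ($s) = @_;
    $s =~ s/([&<>])/$entity{$1}/g;
    return $s;
}

# Write one trap line to its sender's page, opening the page (and writing
# the HTML 'top' plus <pre>) the first time that sender is seen.
sub write_trap {
    my ($ip, $line) = @_;
    unless ($out{$ip}) {
        my $path = "$out_dir/traps-$ip.html";
        open my $fh, '>', $path or die "Cannot write $path: $!\n";
        my $title = html_escape($ip);
        print $fh "<html><head><title>Traps from $title</title></head><body>\n<pre>\n";
        $out{$ip} = $fh;
    }
    print { $out{$ip} } html_escape($line);
}

# After the last input line: close <pre> and the page for every sender.
sub close_all {
    for my $ip (keys %out) {
        my $fh = delete $out{$ip};
        print $fh "</pre>\n</body></html>\n";
        close $fh or die "Cannot close page for $ip: $!\n";
    }
}
</code>
This keeps one handle open per sender for the whole run, which is exactly what stops scaling once the sender count gets large.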
<p>
The problem is that there can be <i>hundreds</i> of original senders, and keeping that many filehandles open at once risks running into the per-process limit on open files. The input data is about 100MB, so I'd rather not parse it more than once if I can avoid it (although I wouldn't mind a second pass if a first pass generated some useful meta-information).
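<p>
One common way around the open-file limit is to keep only a bounded number of handles open, closing the least recently used one when a new sender needs a handle, and reopening evicted pages in append mode later. The sketch below shows that general technique, not anything from the original post; <code>$MAX_OPEN</code>, the file naming and the helper names are assumptions. Perl's core <code>FileCache</code> module exists for much the same purpose and may be worth a look before rolling your own.
<code>
use strict;
use warnings;

my $MAX_OPEN = 64;   # assumption: comfortably below the process's fd limit
my $out_dir  = '.';  # hypothetical output directory
my %fh;              # sender IP => currently open handle
my %last_used;       # sender IP => tick of its most recent write
my %started;         # sender IP => 1 once the page has its HTML 'top'
my $tick = 0;
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

sub page_path { return "$out_dir/traps-$_[0].html" }

# Return an open handle for $ip, evicting the least recently used handle
# first if $MAX_OPEN handles are already open.
sub handle_for {
    my ($ip) = @_;
    return $fh{$ip} if $fh{$ip};
    if (scalar(keys %fh) >= $MAX_OPEN) {
        my ($victim) = sort { $last_used{$a} <=> $last_used{$b} } keys %fh;
        my $old = delete $fh{$victim};
        close $old or die "Cannot close page for $victim: $!\n";
    }
    my $first = !$started{$ip};
    my $path  = page_path($ip);
    # Truncate on first use in this run; append when reopening an evicted page.
    open my $h, ($first ? '>' : '>>'), $path or die "Cannot open $path: $!\n";
    if ($first) {
        (my $title = $ip) =~ s/([&<>])/$entity{$1}/g;
        print $h "<html><head><title>Traps from $title</title></head><body>\n<pre>\n";
        $started{$ip} = 1;
    }
    return $fh{$ip} = $h;
}

sub write_trap {
    my ($ip, $line) = @_;
    my $h = handle_for($ip);
    $last_used{$ip} = ++$tick;
    (my $safe = $line) =~ s/([&<>])/$entity{$1}/g;
    print $h $safe;
}

# Finish every page, reopening (in append mode) any that were evicted.
sub finish_all {
    for my $ip (keys %started) {
        my $h = delete $fh{$ip};
        unless ($h) {
            open $h, '>>', page_path($ip) or die "Cannot reopen page for $ip: $!\n";
        }
        print $h "</pre>\n</body></html>\n";
        close $h or die "Cannot close page for $ip: $!\n";
    }
}
</code>
Finding the victim by sorting the open handles is O(n log n) per eviction, which is cheap for a cap of a few dozen; log files tend to have bursts from the same sender, so most writes should hit a handle that is already open.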
<p>
So... what's a good way to deal with this? As it stands, I may be stuck opening the right output file for each line's sender IP, writing the HTML 'top' if the file is new, appending the line, closing the file, and moving on to the next line. Opening and closing a file for every line seems wasteful, so I'm seeking the wisdom of the Monastery.
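<p>
For comparison, the open-write-close-per-line approach is only a few lines. In this minimal sketch (all names hypothetical) a <code>%seen</code> hash records which pages were started in this run, so the first write truncates any stale page left by an earlier run and later writes append. The cost is one <code>open</code> and one <code>close</code> per input line; whether that is too slow is best judged by timing it on the real 100MB of input.
<code>
use strict;
use warnings;

my $out_dir = '.';   # hypothetical output directory
my %seen;            # sender IP => count of lines written this run
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

# Open the sender's page, append one (escaped) line, close it again.
sub append_trap {
    my ($ip, $line) = @_;
    my $path  = "$out_dir/traps-$ip.html";
    my $first = !$seen{$ip}++;
    open my $fh, ($first ? '>' : '>>'), $path or die "Cannot open $path: $!\n";
    print $fh "<html><body>\n<pre>\n" if $first;
    (my $safe = $line) =~ s/([&<>])/$entity{$1}/g;
    print $fh $safe;
    close $fh or die "Cannot close $path: $!\n";
}

# One last append per page to close <pre> and the document.
sub finish_pages {
    for my $ip (keys %seen) {
        my $path = "$out_dir/traps-$ip.html";
        open my $fh, '>>', $path or die "Cannot open $path: $!\n";
        print $fh "</pre>\n</body></html>\n";
        close $fh or die "Cannot close $path: $!\n";
    }
}
</code>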
<p>
A second possibility, if the pages won't be viewed often, is to pull a list of IPs from the log files and generate links to dynamically written CGI scripts instead of static HTML files. When accessed, a CGI would run <code>`zcat logs.gz | grep <ip>`</code>, building the list of traps for a given IP at request time. Quick to build, but slow (and expensive) if it's used often.
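<p>
If the on-demand route wins, the lookup can be done in Perl instead of interpolating the IP into a shell command, which matters once the IP arrives from a query string. A minimal sketch under stated assumptions: only dotted-quad IPv4 senders are accepted, the log paths are passed in, and the CGI plumbing (reading the parameter, printing headers) is left out; <code>traps_for</code> is a hypothetical name. Matching the whole <code>]: IP:</code> field also keeps a search for <code>10.0.0.1</code> from matching <code>10.0.0.10</code>, and <code>\Q...\E</code> stops the dots acting as regex wildcards. <code>IO::Uncompress::Gunzip</code> also reads uncompressed files, because its <code>Transparent</code> option is on by default.
<code>
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Return every log line whose original trap sender is exactly $ip.
sub traps_for {
    my ($ip, @logs) = @_;
    # Shape check (assumption: IPv4 senders only) so untrusted input can
    # never carry shell metacharacters or regex syntax into the search.
    die "Bad address\n" unless $ip =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
    my @hits;
    for my $log (@logs) {
        my $fh = IO::Uncompress::Gunzip->new($log)
            or die "Cannot read $log: $GunzipError\n";
        while (my $line = <$fh>) {
            push @hits, $line if $line =~ /\]:\s\Q$ip\E:/;
        }
        $fh->close;
    }
    return @hits;
}
</code>
This still rereads every log on each request, so it keeps the "slow if used often" trade-off; it just removes the shell from the path.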
<p>
So what do you think? Easy way out of this? Should I just risk opening a zillion filehandles? Should I just open them and close them one at a time? Suggestions are welcome.
<p>
--J