Just to throw my two pennies into the mix, I am not surprised that either forking for every packet or implementing the disk queue was slow. Both forking and disk I/O have high overheads. I really think your initial solution was a pretty fair idea.
If this is still too slow, I might suggest a three-process approach. The packet catcher would spawn a child every 1000 packets. The child writes that batch to disk while the catcher gets back to work. A third process watches for new files (maybe name each file after the child's PID, so you would know that once the child has exited the file is complete) and does the parsing. This would gain some speed because the third process could keep a permanent connection to the database; DBI->connect is slow. It would also make the whole thing almost nightmarishly complex.
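To make the PID-naming idea a bit more concrete, here is a rough sketch of the catcher and watcher sides. Everything named here is made up for illustration: `spill_batch`, `completed_files`, the `.pkts` suffix, and the assumption that each packet is a single line of text (real binary packets would need proper framing). It also assumes the catcher reaps its children, since a zombie still answers `kill 0` and its file would look unfinished forever.

```perl
use strict;
use warnings;
use File::Temp ();
use POSIX ();

# Hypothetical catcher-side helper: fork a child that writes one batch to
# "$dir/<child pid>.pkts". The parent gets the child's PID back at once
# and can return to catching packets.
sub spill_batch {
    my ($dir, @packets) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;                       # parent: back to work

    # Child: write the batch, then leave without running END blocks.
    open(my $fh, '>', "$dir/$$.pkts") or POSIX::_exit(1);
    print {$fh} "$_\n" for @packets;           # assumes one-line packets
    close($fh) or POSIX::_exit(1);
    POSIX::_exit(0);
}

# Hypothetical watcher-side helper: a file counts as complete once the PID
# in its name no longer exists. kill 0 sends no signal, it only probes.
# (It also returns 0 for a process we may not signal, so this is a
# heuristic, and PID reuse could fool it.)
sub completed_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @done;
    for my $name (@names) {
        next unless $name =~ /^(\d+)\.pkts$/;
        push @done, "$dir/$name" unless kill 0, $1;
    }
    return @done;
}

# Small demo: one batch of three fake packets.
my $dir = File::Temp::tempdir(CLEANUP => 1);
my $pid = spill_batch($dir, 'pkt-1', 'pkt-2', 'pkt-3');
waitpid $pid, 0;                 # a real catcher would reap without blocking
my @ready = completed_files($dir);
```

The third process would loop over `completed_files`, parse each file, insert through its one long-lived database handle, and unlink the file when done.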
Just brainstorming now. What if you were to use one of the IPC shared memory modules (I just offered an answer using these)? With some kind of ring buffer in the shared memory, the parent and child could work asynchronously. It would also eliminate some significant overhead, as the child could hold a more or less permanent connection open to the database.
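For what it is worth, a ring buffer like that might look roughly like the sketch below. Note that it uses Perl's built-in SysV calls (`shmget`, `shmread`, `shmwrite`) rather than one of the shared memory modules, just to keep it self-contained. The layout (two 32-bit indices followed by eight 64-byte slots), the helper names, the empty packet as an end marker, and the temp file standing in for the database are all assumptions of mine. It relies on exactly one writer and one reader with no locking; anything more serious would want semaphores.

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);
use File::Temp ();
use POSIX ();
use Time::HiRes qw(usleep);

# Assumed layout: [head u32][tail u32][SLOTS slots of SLOT_SIZE bytes].
use constant { SLOTS => 8, SLOT_SIZE => 64, HEADER => 8 };

sub get_u32 {
    my ($id, $off) = @_;
    my $buf = '';
    shmread($id, $buf, $off, 4) or die "shmread: $!";
    return unpack 'N', $buf;
}

sub put_u32 {
    my ($id, $off, $val) = @_;
    shmwrite($id, pack('N', $val), $off, 4) or die "shmwrite: $!";
}

# Producer side: returns 0 if the ring is full, 1 once the packet is stored.
sub ring_push {
    my ($id, $packet) = @_;
    die "packet too large" if length($packet) > SLOT_SIZE - 2;
    my ($head, $tail) = (get_u32($id, 0), get_u32($id, 4));
    return 0 if ($head + 1) % SLOTS == $tail;          # full
    shmwrite($id, pack('n/a*', $packet), HEADER + $head * SLOT_SIZE, SLOT_SIZE)
        or die "shmwrite: $!";
    put_u32($id, 0, ($head + 1) % SLOTS);              # publish after the data
    return 1;
}

# Consumer side: returns the next packet, or nothing if the ring is empty.
sub ring_pop {
    my ($id) = @_;
    my ($head, $tail) = (get_u32($id, 0), get_u32($id, 4));
    return if $head == $tail;                          # empty
    my $slot = '';
    shmread($id, $slot, HEADER + $tail * SLOT_SIZE, SLOT_SIZE)
        or die "shmread: $!";
    put_u32($id, 4, ($tail + 1) % SLOTS);
    my ($packet) = unpack 'n/a*', $slot;
    return $packet;
}

my $id = shmget(IPC_PRIVATE, HEADER + SLOTS * SLOT_SIZE, S_IRUSR | S_IWUSR);
defined $id or die "shmget: $!";
put_u32($id, $_, 0) for 0, 4;                          # head = tail = 0

my ($out_fh, $out_path) = File::Temp::tempfile();      # stand-in for the database
my @packets = map { "packet-$_" } 1 .. 20;

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: this is where the one long-lived DBI handle would be opened.
    while (1) {
        my $p = ring_pop($id);
        if (!defined $p) { usleep(1000); next }        # ring empty, wait a bit
        last if $p eq '';                              # assumed end marker
        print {$out_fh} "$p\n";                        # real code: execute an INSERT
    }
    close $out_fh;
    POSIX::_exit(0);
}

# Parent: the packet catcher. It only stalls when the ring is full.
for my $p (@packets, '') {
    usleep(1000) until ring_push($id, $p);
}
waitpid $pid, 0;
shmctl($id, IPC_RMID, 0) or die "shmctl: $!";

open(my $in, '<', $out_path) or die "open $out_path: $!";
chomp(my @stored = <$in>);
close $in;
unlink $out_path;
```

One slot is always left empty so that "full" and "empty" can be told apart, which means eight slots hold at most seven packets. Polling with `usleep` is the lazy part of this; a semaphore would let either side sleep until there is actually something to do.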
mikfire