PerlMonks  

Get specific entries from a file which is always growing

by ashok.g (Beadle)
on Jun 09, 2011 at 21:12 UTC ( [id://909007]=perlquestion )

ashok.g has asked for the wisdom of the Perl Monks concerning the following question:

Hi Folks,

Does anyone have an idea how to achieve the following task using a shell or Perl script?

Let's say some data is being written to File1 indefinitely. I need to pick out specific data from File1 (say, lines containing the word 'IDENTIFIER') and keep writing it to File2 as it arrives.

The question seems simple, but I think it is tricky and will take some time to solve.

Any help on this is highly appreciated. Thanks in advance.

-- Ashok

Replies are listed 'Best First'.
Re: Get specific entries from a file which is always growing
by wind (Priest) on Jun 09, 2011 at 22:59 UTC
Re: Get specific entries from a file which is always growing
by Eliya (Vicar) on Jun 09, 2011 at 22:19 UTC

    A quick test shows that something like the following simple approach seems to work rather well (on a Unix file system, where you can read a file that's still open for writing):

    open my $fh, "<", "growingfile" or die "Cannot open growingfile: $!";
    while (1) {
        while (<$fh>) {
            print if /IDENTIFIER/;
        }
        sleep 1;
        seek($fh, 0, 1);    # clear the EOF flag before trying to read again
    }

    I.e. you just resume reading after you've reached the (current) end of file.

    The tricky part is probably when a record containing IDENTIFIER has only been partially read before EOF is encountered. To handle this case, you'd need to join the incomplete record you read last with the data that arrives after resuming. Whether this is relevant at all depends on the buffering mode used to write the file (unbuffered, line-buffered, or fully/block-buffered) and on what your input records are (e.g. lines).
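One way the partial-record case could be handled is sketched below. This is only an illustration under the assumption that records are newline-terminated lines; the helper name `read_complete_lines` and the `$pending` buffer are made up for this example and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: return only complete (newline-terminated) lines that
# are currently readable from $fh. A trailing fragment without a newline is
# kept in $$pending_ref and glued onto the data read by the next call.
sub read_complete_lines {
    my ($fh, $pending_ref) = @_;
    my @lines;
    seek($fh, 0, 1);    # clear the EOF condition so appended data is seen
    while (defined(my $chunk = <$fh>)) {
        $chunk = $$pending_ref . $chunk;    # prepend any earlier fragment
        $$pending_ref = '';
        if ($chunk =~ /\n\z/) {
            push @lines, $chunk;            # complete record
        }
        else {
            $$pending_ref = $chunk;         # incomplete: wait for the rest
        }
    }
    return @lines;
}
```

In the polling loop you would then keep `my $pending = '';` outside the loop and run something like `print grep { /IDENTIFIER/ } read_complete_lines($fh, \$pending);` before each `sleep 1`, so a match is never tested against half a line.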

Re: Get specific entries from a file which is always growing
by NetWallah (Canon) on Jun 10, 2011 at 01:39 UTC
    Assuming this is Linux (use Cygwin otherwise), shell commands are made for this:
    grep "IDENTIFIER" File1 > File2
    tail -f File1 | grep "IDENTIFIER" >> File2

                "XML is like violence: if it doesn't solve your problem, use more."

      This doesn't seem to be working. I am not getting any data in File2 using these commands.

        I suppose you're talking about the second command. The first grep command should definitely put something in File2, as long as there are some occurrences of "IDENTIFIER" in File1 at the time you run the command.

        The "problem" with the second command line is that when redirecting grep's output to a file, the stream will be fully buffered. This means that nothing will be written to File2 before the buffer (typically 4K in size) is full, or tail closes its pipe to grep (the latter won't happen when tail is running in "follow" (-f) mode).   Depending on how much output grep produces with your data, filling the buffer may take a while...

        Whether the buffering is a problem depends on your expectations. If you want File2 to be updated immediately with every newly found occurrence, then the above shell command is not the right tool for the purpose. In this case, you could write the tail/grep combo yourself in Perl (as already suggested), in which case you do have control over buffering (i.e. you can flush data immediately).
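A minimal sketch of such a hand-written tail/grep combo, with output flushed after every match, might look like the following. The sub names `copy_matches` and `follow` are invented for illustration, the file names are the ones from the question, and the one-second polling interval is an arbitrary assumption.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush method on file handles

# Copy every currently readable line of $in that matches $pattern to $out.
# Returns the number of lines written.
sub copy_matches {
    my ($in, $out, $pattern) = @_;
    my $count = 0;
    seek($in, 0, 1);    # clear the EOF condition so appended data is seen
    while (my $line = <$in>) {
        next unless $line =~ $pattern;
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Follow $src indefinitely, appending matching lines to $dst (never returns).
sub follow {
    my ($src, $dst, $pattern) = @_;
    open my $in,  '<',  $src or die "Cannot read $src: $!";
    open my $out, '>>', $dst or die "Cannot append to $dst: $!";
    $out->autoflush(1);    # write each match through immediately
    while (1) {
        copy_matches($in, $out, $pattern);
        sleep 1;           # assumed polling interval
    }
}

# follow('File1', 'File2', qr/IDENTIFIER/);    # uncomment to run
```

Because autoflush is enabled on the output handle, each match reaches File2 as soon as it is printed instead of waiting for a block buffer to fill. This sketch does not deal with lines that are only partially written at the moment they are read.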

Re: Get specific entries from a file which is always growing
by BrowserUk (Patriarch) on Jun 09, 2011 at 21:29 UTC

    Do you control the process that is writing file1?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

Node Type: perlquestion [id://909007]
Approved by BrowserUk