Looking for a string from a huge log file

by perl_req (Initiate)
on Sep 13, 2012 at 17:48 UTC ( #993543=perlquestion )
perl_req has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am looking for a certain string in huge log files for approx. 1000 ID's. The log files are in text format. There is a separate log file for each ID, and the number of lines in each varies. The code I have written goes into each log file and looks for that particular string. When it does not find anything, it does not print anything. But once it hits the string I am looking for, it keeps on printing the same string for all the other ID's. Any help will be greatly appreciated. My code is below:

    $messages = "N/A";
    open(MYINPUTFIL3, "file path");
    while(<MYINPUTFIL3>){
        chomp;
        $SYS=$_;
        $SYS =~ s/\r|\n//g;
        if($SYS =~ /search string/i) {
            $message_1=$SYS ;
            $message_1 =~ s/\r|\n//g;
            $message_1=~ s/^\s+//;
            $message_1=~ s/\s+$//;
            $message_11 = $message_1;
            $message_12 = $message_1;
            $message_12 = s/[^a-zA-Z]*//g;
            $message_11 =~ s/[^0-9.]*//g;
            last;
        }
    }
    close MYINPUTFIL3;
    print "$message_1\n";
    open(FILEWRITE, ">>trial_one.txt");
    print FILEWRITE "$message_1\n";
    close (FILEWRITE);

Replies are listed 'Best First'.
Re: Looking for a string from a huge log file
by kennethk (Abbot) on Sep 13, 2012 at 18:07 UTC
    Your example leaves a little to be desired -- in order for monks to be able to diagnose your issue, we need to be able to replicate it. This means providing some sample input and expected output in addition to code that displays this issue. See How do I post a question effectively?.

    The problem you are describing is frequently caused by a variable that gets set in a loop, and then not cleared before it goes into another instance of that loop; this type of issue can usually be addressed with tight scoping of variables combined with the strict pragma. If I assume the code you posted occurs in a loop over a number of files, I'd expect including something like undef $message_1; at the end of your code might resolve your issue. But this is all conjecture without an effective problem statement.
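
    As a rough illustration of the tight-scoping idea, here is a minimal sketch. It assumes the posted code runs once per ID inside an outer loop; the names first_match, %log_for, and $message are hypothetical, and in-memory strings stand in for the real log files so the sketch runs on its own.

```perl
use strict;
use warnings;

# Return the first line matching $pattern, trimmed, or undef if none matches.
# $source is anything open() accepts: a file path, or a reference to a string
# (an in-memory file, used here so the sketch runs without real logs).
sub first_match {
    my ($source, $pattern) = @_;
    open(my $fh, '<', $source) or die "Cannot open log: $!";
    my $found;                          # starts out undef on every call
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;         # strip the line ending once
        if ($line =~ $pattern) {
            ($found = $line) =~ s/^\s+|\s+$//g;   # trim both ends
            last;
        }
    }
    close $fh;
    return $found;
}

# Hypothetical per-ID logs; only the first contains the search string.
my %log_for = (
    id_001 => "start\n  ERROR disk full  \nend\n",
    id_002 => "start\nall good\nend\n",
);

for my $id (sort keys %log_for) {
    # Declared inside the loop, so nothing carries over from the previous ID.
    my $message = first_match(\$log_for{$id}, qr/error/i);
    print "$id: ", (defined $message ? $message : 'N/A'), "\n";
}
```

    Because $message is declared with my inside the loop, a log without a match falls back to N/A instead of repeating the previous ID's line.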

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Looking for a string from a huge log file
by pvaldes (Chaplain) on Sep 13, 2012 at 21:08 UTC

    Not my intention to be rude, but delete this script, think a little about what you want to do, and try again. Seriously, it is very poor.

    For example: you claim to have a big file, yet you strip the line ending from each line of this huge file up to three times (one chomp, then two s/\r|\n//g substitutions). So your script is very busy... doing nothing. If you need to remove a special input separator, redefine $/ and chomp once.
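
    A minimal sketch of that point, assuming the logs use CRLF line endings (the original post does not say). With $/ set to "\r\n", a single chomp removes the whole terminator. The sample data is made up and read from an in-memory file.

```perl
use strict;
use warnings;

# Assumed sample data with CRLF line endings, read from an in-memory file.
my $data = "first line\r\nsecond line\r\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

my @lines;
{
    local $/ = "\r\n";          # input record separator, restored at end of block
    while (my $line = <$in>) {
        chomp $line;            # removes the full "\r\n" in one step
        push @lines, $line;
    }
}
close $in;

print "[$_]\n" for @lines;
```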

    Don't apply four lines of regex if you want to delete a line containing "my pattern". Use "grep-not" (grep !/my pattern/, <FH>) instead.
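
    For reference, a small sketch of the "grep-not" idiom on made-up data. Note that <$fh> in list context reads every line into memory at once, which may matter for a truly huge file.

```perl
use strict;
use warnings;

# Made-up sample log, read from an in-memory file.
my $log = "keep this\ndrop: my pattern here\nkeep that\n";
open(my $fh, '<', \$log) or die "Cannot open in-memory file: $!";

# Keep only the lines that do NOT contain the pattern.
my @kept = grep { !/my pattern/ } <$fh>;
close $fh;

print @kept;
```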

    keeps on printing the same string for all the other ID's

    Because $_ is not changing at all in the last lines. The loop has finished by the time you print to the output file, so you only have the last matched line of your log in $_.

    Can you explain a little about what you need to do?


    Try to keep it simple and readable, filehandles don't need to be complicated:
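
    One way this could look, as a minimal sketch only: lexical filehandles, three-argument open, and an error check on each open. The file names and log contents are hypothetical, and a temporary directory stands in for the real paths so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical paths inside a scratch directory (removed when the script exits).
my $dir      = tempdir(CLEANUP => 1);
my $log_path = "$dir/sample.log";
my $out_path = "$dir/trial_one.txt";

# Create a small stand-in log file.
open(my $log_out, '>', $log_path) or die "Cannot write $log_path: $!";
print {$log_out} "boot ok\nsearch string found at 10.5\nshutdown\n";
close $log_out or die "Cannot close $log_path: $!";

# Read the log and append the first matching line to the output file.
open(my $in,  '<',  $log_path) or die "Cannot read $log_path: $!";
open(my $out, '>>', $out_path) or die "Cannot append to $out_path: $!";
while (my $line = <$in>) {
    if ($line =~ /search string/i) {
        print {$out} $line;
        last;                   # stop at the first hit
    }
}
close $in;
close $out or die "Cannot close $out_path: $!";
```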

Re: Looking for a string from a huge log file
by ig (Vicar) on Sep 13, 2012 at 22:36 UTC
    it keeps on printing the same string for all the other ID's

    It is hard to imagine how a program that only has two print statements, one to STDOUT and one to a file, outside any loop "keeps on printing". Can you clarify?

Re: Looking for a string from a huge log file
by swampyankee (Parson) on Sep 15, 2012 at 02:56 UTC

    Not to make you feel persecuted...

    What, exactly, are you trying to do? Pretend you're paying a monk to write the code for you, but you're only allowed to talk to the monk once, and you don't get paid unless the program does what you want. The monk always gets paid, and gets paid a lot extra if you want to change something later.

    Aside, what do you consider "huge"? Why are you doing all those substitutions? How many "hits" would you expect in your dataset? On 0.1% of records or 75%?

    Information about American English usage here and here. Floating point issues? Please read this before posting. — emc
