PerlMonks |
Re: Help with speeding up regex
by eversuhoshin (Sexton) on Aug 12, 2012 at 16:41 UTC ( [id://986968]=note )
Hello,

Thank you all so much for the helpful suggestions. I will need some time to fully digest them since I am still learning Perl :)

Basically, my script counts the number of false-positive words related to management guidance. I need this so I don't have to go through all the financial filings by hand. Through manual processing, I identified words that look like they relate to guidance but actually have nothing to do with it. The regex I posted is the list I compiled. If the count of false-positive words in a filing is high, I know the filing is irrelevant and I will not have to read it later.

I have changed the code a bit and used File::Map to speed it up, but I am not sure I am doing it right. Also, someone asked whether the regex works. Yes, it works, but it is slow and I am trying to make it faster.
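For anyone following along, here is a minimal sketch of the counting step described above. It is not my actual script: the phrases in `@false_positive_phrases` are hypothetical placeholders rather than the real list, and `count_false_positives` is just a name chosen for illustration. The idea is to build one alternation, compile it once with `qr//`, and count matches in a single pass. The text is an in-memory string here so the example runs with core Perl only; a File::Map-mapped scalar could presumably be passed by reference in the same way.

```perl
use strict;
use warnings;

# Hypothetical placeholder phrases; the real list is the one in the regex posted earlier.
my @false_positive_phrases = (
    'guidance counselor',
    'regulatory guidance',
    'accounting guidance',
);

# Join the phrases into one alternation, longest first, and compile it once
# so the pattern is not rebuilt for every file.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @false_positive_phrases;
my $false_positive_re = qr/\b(?:$alternation)\b/i;

# Count non-overlapping matches in the text referenced by $text_ref.
# Taking a reference avoids copying a large filing.
sub count_false_positives {
    my ($text_ref) = @_;
    my $count = () = $$text_ref =~ /$false_positive_re/g;
    return $count;
}

my $sample = 'New Accounting Guidance was issued. See regulatory guidance on leases. '
           . 'Management guidance for 2012 is unchanged.';
print count_false_positives(\$sample), "\n";    # prints 2
```

A filing could then be skipped when the count passes some threshold; what that threshold should be depends on the data and is not something this sketch tries to answer.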
I am also attaching some sample text:

http://sec.gov/Archives/edgar/data/1011737/0001193125-06-122041.txt
http://sec.gov/Archives/edgar/data/1012270/0001104659-07-059430.txt
http://sec.gov/Archives/edgar/data/1016281/0001104659-03-016871.txt
http://sec.gov/Archives/edgar/data/1166036/0001104659-09-021080.txt
http://sec.gov/Archives/edgar/data/1019361/0001019361-04-000007.txt
http://sec.gov/Archives/edgar/data/1013934/0000950136-04-003588.txt

Thank you all again for everything!