PerlMonks  

Re^2: REGEX or Not to REGEX for many items

by r1n0 (Beadle)
on Oct 26, 2009 at 12:23 UTC


in reply to Re: REGEX or Not to REGEX for many items
in thread REGEX or Not to REGEX for many items

sflitman/Your Mother,

Thank you very much for your responses. I have used KinoSearch before, on another project, to build an index and a query engine against that index. I like the idea, but it will still require looping over the search strings and running a lookup for each one, correct? Maybe the method I want doesn't exist within Perl, and I need to use a DB with triggers or something similar. I will give the KinoSearch idea a try. I am using this to go through log files. My original goal was to run the entire "search string list" against each log file as it is pulled into the system, but based on the info you supplied, I will wait until all the logs are brought in and search them all at once. This changes my approach, but it should work fine.
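
For comparison, the original per-file idea (the whole "search string list" against each log as it arrives) can be roughly sketched without KinoSearch by folding the list into a single alternation regex, so each line is scanned once instead of once per string. This is only a minimal sketch under assumptions: the terms, the sample log lines, and the count_hits helper are all made up for illustration, and it is a plain-regex alternative, not the indexed approach suggested in the replies.

```perl
use strict;
use warnings;

# Hypothetical search terms; a real list would probably be read from a file.
my @terms = ('failed login', 'segfault', 'disk full');

# Build one alternation. quotemeta keeps each term literal, and sorting
# longest-first lets a longer term win when one term is a prefix of another.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @terms;
my $re = qr/($alternation)/i;

# Hypothetical helper: returns a hash of lowercased term => hit count.
sub count_hits {
    my @lines = @_;
    my %hits;
    for my $line (@lines) {
        # /g in a while loop finds every match on the line, not just the first.
        while ($line =~ /$re/g) {
            $hits{ lc $1 }++;
        }
    }
    return %hits;
}

# Made-up log lines standing in for one incoming log file.
my @sample_log = (
    'Oct 26 12:00:01 host sshd: Failed login for root',
    'Oct 26 12:00:05 host kernel: app[123] segfault at 0',
    'Oct 26 12:00:09 host sshd: failed login for admin',
);

my %hits = count_hits(@sample_log);
print "$_: $hits{$_}\n" for sort keys %hits;
```

Whether this beats an index depends on how many terms and how much log data there are; for a very long term list the alternation may get slow, which is where the index-then-query approach would likely pay off.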

I have never used KinoSearch to index Word files, and I like that idea, too. Is there a site that explains how to index different kinds of files with KinoSearch? I assume several tools are required, depending on the file types that need to be converted to text. In fact, I am now wondering whether there is a Perl module that would help convert many file types to text for KinoSearch ingestion: PDF, Word, PowerPoint, Excel, OCR'd graphics, and so on. Now that would be really cool. Does anyone know of such a module, tool, or project?

Thanks again for the info.