PerlMonks  

Re: makeing refering faster ?

by graff (Chancellor)
on Aug 17, 2004 at 08:23 UTC (#383583)


in reply to makeing refering faster ?

It's a good thing you provided this information:
ps : this script is supposed to get all the lines from a file and refer each word to the sentence that starts with that word (and, there aren't 2 sentences that start identically).
Without that, there'd be no hope of helping with the problem. But even with that, there's still not quite enough to go on. (Looks like perldeveloper made a lucky guess, but I confess that I am still confused.)

Does the input data file really contain exactly one "sentence" per line? Are you certain that the "words" in each sentence are always separated by exactly a single space character? Are the words in "mixed case", and do they include punctuation marks? (And does this have an effect on what you are trying to do?) Why should it matter if a sentence contains a "word" that consists of the single letter "v"?
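To make those questions concrete, here is a minimal sketch of one way to pull "words" out of a line. The sub name words_in is hypothetical, and the choices it makes (any run of whitespace separates words, case is folded, leading and trailing ASCII punctuation is dropped) are assumptions for illustration only; your answers may call for something different.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into normalized "words".
# Assumptions (not from the original post): any run of whitespace
# separates words, case is ignored, and leading/trailing ASCII
# punctuation is not part of a word.
sub words_in {
    my ($line) = @_;
    my @words;
    for my $token ( split ' ', $line ) {
        $token = lc $token;
        $token =~ s/^[^a-z0-9]+//;    # strip leading punctuation
        $token =~ s/[^a-z0-9]+$//;    # strip trailing punctuation
        push @words, $token if length $token;
    }
    return @words;
}

print join( '|', words_in("  Foo,  bar   baz! ") ), "\n";    # prints foo|bar|baz
```

If case or punctuation does matter for your purposes, drop the lc call or the two substitutions accordingly.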

Let's suppose a particular word (e.g. "bar") occurs at the beginning of one sentence (e.g. sentence #23), and also occurs in the middle or at the end of 4 other sentences (e.g. #5, #12, #47, #69). What do you want to accomplish with regard to this word? Locate just the one sentence that begins with "bar"? Locate just the other four sentences that contain "bar"? Locate all five sentences (and identify the one that begins with "bar")? What do you want to do with words that only occur in the middle or at the end of sentences but never at the beginning of any sentence? Ignore them?

How you answer those questions will determine how you should read through the sentences and words, what sort of data structure you should create from the input data, and how you would use that data structure after you've built it.
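As one possible illustration, the sketch below builds two hashes in a single pass: one keyed by the first word of each sentence, and one listing every sentence that contains a given word. The name build_indexes and the three sample sentences are made up for the example, and it assumes one sentence per element with whitespace-separated words. Which of the two hashes you actually need (if either) depends on your answers above.

```perl
use strict;
use warnings;

# Hypothetical helper: build two indexes from a list of sentences.
#   $starts->{$word}   = index of the sentence that begins with $word
#   $contains->{$word} = [ indexes of every sentence containing $word ]
# Assumes one sentence per element and whitespace-separated words.
sub build_indexes {
    my @sentences = @_;
    my ( %starts, %contains );
    for my $i ( 0 .. $#sentences ) {
        my @words = split ' ', $sentences[$i];
        next unless @words;
        $starts{ $words[0] } = $i;
        my %seen;    # record each sentence only once per word
        for my $word (@words) {
            push @{ $contains{$word} }, $i unless $seen{$word}++;
        }
    }
    return ( \%starts, \%contains );
}

my ( $starts, $contains ) = build_indexes(
    'foo bar baz',
    'bar is here',
    'baz then bar',
);
print "starts with bar: $starts->{bar}\n";         # prints starts with bar: 1
print "contains bar: @{ $contains->{bar} }\n";     # prints contains bar: 0 1 2
```

Words that never begin a sentence simply have no entry in the first hash, so "ignore them" falls out for free if that is what you want.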

As for the code you posted at the start of this thread, the reason it slows down so badly as the number of sentences grows is the nesting of your "for" loops:

    foreach sentence in the file {
        ...
        foreach word in the sentence {
            ...
            foreach sentence in the file {
                ...
                # given n sentences with an average of m words each,
                # this block has to execute n*m*n times
            }
        }
    }
As you have learned from experience, this sort of approach "does not scale well" to large numbers of sentences. But to work out a good approach, you need to clarify your goals. You seem to be content with perldeveloper's solution (assuming his additional reply makes sense to you), but it's not clear to me that it is the best approach, or that it does what you really want -- mostly because you haven't provided a clear description of what you really want.
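For what it's worth, if the goal really is "for each word, find the sentence that starts with it", the innermost loop can be traded for a hash lookup, which brings the work down from n*m*n to roughly n + n*m steps. This is only a sketch under that guess, not necessarily what you or perldeveloper have in mind; link_words is a hypothetical name, and it relies on your statement that no two sentences start identically.

```perl
use strict;
use warnings;

# Hypothetical helper: for each word of each sentence, find the sentence
# that starts with that word, using a hash instead of an inner loop.
# Assumes no two sentences start with the same word (as the original
# poster stated) and that words are whitespace-separated.
sub link_words {
    my @sentences = @_;

    # Pass 1: map each first word to its sentence index (n steps).
    my %starts_with;
    for my $i ( 0 .. $#sentences ) {
        my ($first) = split ' ', $sentences[$i];
        $starts_with{$first} = $i if defined $first;
    }

    # Pass 2: one hash lookup per word (about n*m steps in total).
    my @links;
    for my $i ( 0 .. $#sentences ) {
        for my $word ( split ' ', $sentences[$i] ) {
            next unless exists $starts_with{$word};
            push @links, [ $i, $word, $starts_with{$word} ];
        }
    }
    return @links;
}

for my $link ( link_words( 'foo bar', 'bar baz', 'qux foo' ) ) {
    my ( $from, $word, $to ) = @$link;
    print "sentence $from: '$word' -> sentence $to\n";
}
```

Note that a sentence's own first word links back to itself here; skip the case where the two indexes are equal if you don't want that.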
