It's a good thing you provided this information:
ps : this script is supposed to get all the lines from a file and refer each word to the sentence that starts with that word (and, there aren't 2 sentences that start identically).
Without that, there'd be no hope of helping with the problem. But even with that, there's still not quite enough to go on. (Looks like perldeveloper made a lucky guess, but I confess that I am still confused.)

Does the input data file really contain exactly one "sentence" per line? Are you certain that the "words" in each sentence are always separated by exactly a single space character? Are the words in "mixed case", and do they include punctuation marks? (And does this have an effect on what you are trying to do?) Why should it matter if a sentence contains a "word" that consists of the single letter "v"?
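To make those questions concrete, here is a minimal Perl sketch of one possible answer. It assumes that case and surrounding punctuation should be ignored and that words may be separated by any amount of whitespace; the helper name `words_of` is hypothetical. It contrasts `split / /`, which splits on a literal single space and yields empty fields for repeated spaces, with Perl's special `split ' '`, which treats any run of whitespace as one separator.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one line ("sentence") into normalized words,
# ASSUMING case and leading/trailing punctuation should not matter.
sub words_of {
    my ($sentence) = @_;
    # split ' ' is Perl's awk-style split: it skips leading whitespace
    # and treats any run of whitespace as a single separator.
    my @words = split ' ', $sentence;
    for (@words) {
        $_ = lc $_;          # fold case: "Bar" and "bar" become one word
        s/^[[:punct:]]+//;   # drop leading punctuation, e.g. quotes
        s/[[:punct:]]+$//;   # drop trailing punctuation, e.g. "bar,"
    }
    return grep { length } @words;   # discard tokens that were all punctuation
}

my $line  = "  Bar,  the foo -- went   home.";
my @naive = split / /, $line;   # literal single space: produces empty fields
my @clean = words_of($line);

print scalar(@naive), " naive fields\n";
print join('|', @clean), "\n";   # bar|the|foo|went|home
```

If the real answer is that case or punctuation does matter, the `lc` and `s///` lines are exactly the parts to drop.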

Let's suppose a particular word (e.g. "bar") occurs at the beginning of one sentence (e.g. sentence #23), and also occurs in the middle or at the end of 4 other sentences (e.g. #5, #12, #47, #69). What do you want to accomplish with regard to this word? Locate just the one sentence that begins with "bar"? Locate just the other four sentences that contain "bar"? Locate all five sentences (and identify the one that begins with "bar")? What do you want to do with words that only occur in the middle or at the end of sentences but never at the beginning of any sentence? Ignore them?
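One data structure that covers all of those readings at once is an inverted index: a hash mapping each word to the list of sentences that contain it, plus a second hash recording which sentence (if any) each word begins. The sketch below is only an illustration under stated assumptions: the sample sentences are made up, words are split on whitespace with no case or punctuation handling, and, as the original poster said, no two sentences share a first word.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "sentence" per line, numbered from 1.
my @sentences = (
    "bar is where the story starts",
    "the foo met a bar",
    "nobody mentioned it",
    "a bar again",
);

my %found_in;   # word => [ numbers of every sentence that contains it ]
my %starts;     # word => number of the one sentence it begins
for my $n (1 .. @sentences) {
    my @words = split ' ', $sentences[$n - 1];
    next unless @words;          # skip blank lines
    $starts{ $words[0] } = $n;   # assumes no two sentences share a first word
    my %seen;                    # record each word once per sentence
    push @{ $found_in{$_} }, $n for grep { !$seen{$_}++ } @words;
}

# All three readings of the question, for one word:
my $word  = "bar";
my @all   = @{ $found_in{$word} || [] };              # every sentence: 1 2 4
my $first = $starts{$word};                           # begins sentence 1
my @rest  = grep { !defined $first || $_ != $first } @all;   # others: 2 4
print "all: @all; begins: ", (defined $first ? $first : 'none'), "; others: @rest\n";

# Words that occur somewhere but never begin a sentence:
my @never_first = sort grep { !exists $starts{$_} } keys %found_in;
print "never first: @never_first\n";
```

Building both hashes takes a single pass over the file, and each question afterwards is a hash lookup rather than another scan of every sentence.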

How you answer those questions will determine how you should read through the sentences and words, what sort of data structure you should create from the input data, and how you would use that data structure after you've built it.

As for the code you posted at the start of this thread, the reason it takes so much longer as the number of sentences grows is the nesting of your "for" loops:

    foreach sentence in the file {
        ...
        foreach word in the sentence {
            ...
            foreach sentence in the file {
                ...
                # given n sentences with an average of m words each,
                # this block has to execute n*m*n times
            }
        }
    }
As you have learned from experience, this sort of approach "does not scale well" to large numbers of sentences. But to work out a good approach, you need to clarify your goals. You seem to be content with perldeveloper's solution (assuming his additional reply makes sense to you), but it's not clear to me that it is the best approach, or that it does what you really want -- mostly because you haven't provided a clear description of what you really want.
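If the goal really is just the one stated in the original post (link each word to the sentence that starts with that word), the innermost loop can be replaced by a hash lookup. The following is a minimal sketch under that assumption, not a statement of what the poster actually needs: one pass indexes every sentence by its first word, and a second pass does one lookup per word, for roughly n*m hash lookups instead of n*m*n comparisons. The sample data and the names `%sentence_starting_with` and `@links` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input; the real script would read these lines from the file.
my @sentences = (
    "bar is where the story starts",
    "the foo met a bar",
    "foo fighters rock",
);

# Pass 1, n steps: index each sentence by its first word.
# Assumes (as the original poster said) no two sentences start identically.
my %sentence_starting_with;
for my $s (@sentences) {
    my ($first) = split ' ', $s;
    $sentence_starting_with{$first} = $s if defined $first;
}

# Pass 2, about n*m steps: one hash lookup per word replaces the
# innermost loop over all n sentences.
my @links;   # [ sentence number, word, sentence that starts with that word ]
for my $n (1 .. @sentences) {
    for my $w (split ' ', $sentences[$n - 1]) {
        my $target = $sentence_starting_with{$w};
        push @links, [ $n, $w, $target ] if defined $target;
    }
}

printf "sentence %d, '%s' -> \"%s\"\n", @$_ for @links;
```

Note that the first word of every sentence links back to that same sentence; if that is not wanted, skip the first element of each split in pass 2.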

In reply to Re: makeing refering faster ? by graff
in thread makeing refering faster ? by chiburashka
