PerlMonks  


Hi everyone.
Here I am with a new problem I can't solve.
I have two input files. One contains a list of semantic relations structured like the following (let's call it INPUT1):

alligator-n amphibian_reptile attri long-j
alligator-n amphibian_reptile attri old-j
alligator-n amphibian_reptile coord crocodile-n
alligator-n amphibian_reptile coord frog-n
alligator-n amphibian_reptile event walk-v
alligator-n amphibian_reptile hyper animal-n

The other one looks like the following (let's call it INPUT2; this is obviously a very reduced version):

frog-n about adage-n 8.8016
frog-n appearance-1 broad-j 11.9640
frog-n coord albino-n 6.7667
frog-n be jumper-n 6.0272
frog-n be key-n 3.8779
frog-n of body-n 8.3063
frog-n of bone-n 20.7982
frog-n of book-n 0.4229
crocodile-n be key-n 3.2572
crocodile-n of chorus-n 24.9515
crocodile-n of book-n 2.3460
crocodile-n obj sit-v 3.1857
crocodile-n obj size-v 57.3257
crocodile-n obj skewer-v 6.1105
animal-n coord-1 investigation-n 0.9666
animal-n coord-1 irrigation-n 2.6058
animal-n coord-1 isolation-n 1.4074
animal-n coord-1 isotope-n 2.7420

I need to find the rows of INPUT1 whose relation (the third field) is "coord", take the fourth field of each of those rows, and search the second file (INPUT2) for rows whose first field matches it. In this case the matching terms are crocodile-n and frog-n. I then have to build another file that looks like INPUT2 but contains only the rows whose first field is crocodile-n or frog-n. If the same relation and third field have already been found (for example "be key-n", which occurs for both frog-n and crocodile-n), I must not repeat the row; instead I need to add its score to the one I already have.
I understand this explanation is not really clear, so here is an example of the desired output, where the first field is replaced by not_alligator-n:

not_alligator-n about adage-n 8.8016
not_alligator-n appearance-1 broad-j 11.9640
not_alligator-n coord albino-n 6.7667
not_alligator-n be jumper-n 6.0272
not_alligator-n be key-n 7.1351(3.8779+3.2572)
not_alligator-n of body-n 8.3063
not_alligator-n of chorus-n 24.9515
not_alligator-n of bone-n 20.7982
not_alligator-n of book-n 2.7689(0.4229+2.3460)
not_alligator-n obj sit-v 3.1857
not_alligator-n obj size-v 57.3257
not_alligator-n obj skewer-v 6.1105

I have no idea where to start. It has been less than a month since I started using Perl again, and I still have a lot to learn.
Any suggestion, tip, or pointer on what to do would be really appreciated.
I need this because I'm analyzing some statistical measures to be used on semantic relations for my Ph.D. thesis.
Thanks to all,
Giulia
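
For the procedure described in the post (collect the "coord" terms from INPUT1, keep the INPUT2 rows that start with one of them, and sum the scores of repeated rows), one possible starting point is a pair of hashes: one holding the wanted terms, one accumulating scores per "relation third-field" key. This is only a sketch under some assumptions that are not stated in the post: fields are whitespace-separated, INPUT1 has a single target word (alligator-n here), and output rows appear in first-seen order. The sub names coord_terms and merge_rows are made up for illustration, and the sample rows are a subset of the data above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the fourth field of every INPUT1 row whose third field is "coord".
# Returns a hash ref of wanted terms and the first field of the first coord
# row (assumption: there is only one target word, e.g. "alligator-n").
sub coord_terms {
    my @rows = @_;
    my (%wanted, $target);
    for my $row (@rows) {
        my ($word, $class, $rel, $other) = split ' ', $row;
        next unless defined $other && $rel eq 'coord';
        $wanted{$other} = 1;
        $target = $word unless defined $target;
    }
    return (\%wanted, $target);
}

# Keep the INPUT2 rows whose first field is a wanted term. Rows sharing the
# same relation and third field are merged by summing their scores.
sub merge_rows {
    my ($wanted, $target, @rows) = @_;
    my (%sum, @order);
    for my $row (@rows) {
        my ($word, $rel, $other, $score) = split ' ', $row;
        next unless defined $score && $wanted->{$word};
        my $key = "$rel $other";
        push @order, $key unless exists $sum{$key};   # remember first-seen order
        $sum{$key} += $score;
    }
    return map { sprintf "not_%s %s %.4f", $target, $_, $sum{$_} } @order;
}

# Sample rows copied from the post (reduced).
my @input1 = (
    'alligator-n amphibian_reptile attri long-j',
    'alligator-n amphibian_reptile coord crocodile-n',
    'alligator-n amphibian_reptile coord frog-n',
    'alligator-n amphibian_reptile hyper animal-n',
);
my @input2 = (
    'frog-n be key-n 3.8779',
    'frog-n of book-n 0.4229',
    'crocodile-n be key-n 3.2572',
    'crocodile-n of book-n 2.3460',
    'crocodile-n obj sit-v 3.1857',
    'animal-n coord-1 isotope-n 2.7420',
);

my ($wanted, $target) = coord_terms(@input1);
print "$_\n" for merge_rows($wanted, $target, @input2);
```

With real files, the two arrays would be filled by reading each file line by line instead of being hard-coded. Note that this sketch prints only the summed score, not the "(3.8779+3.2572)" breakdown shown in the example, and that first-seen order may differ from the row order in the desired output above.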


In reply to Select only desired features from a text by remluvr
