PerlMonks  


Hi everyone.
Here I am with a new problem I can't solve.
I have two input files. One contains a list of semantic relations structured like the following (let's call it INPUT1):

alligator-n amphibian_reptile attri long-j
alligator-n amphibian_reptile attri old-j
alligator-n amphibian_reptile coord crocodile-n
alligator-n amphibian_reptile coord frog-n
alligator-n amphibian_reptile event walk-v
alligator-n amphibian_reptile hyper animal-n

The other one (INPUT2) looks like the following (obviously this is just a very reduced version):

frog-n about adage-n 8.8016
frog-n appearance-1 broad-j 11.9640
frog-n coord albino-n 6.7667
frog-n be jumper-n 6.0272
frog-n be key-n 3.8779
frog-n of body-n 8.3063
frog-n of bone-n 20.7982
frog-n of book-n 0.4229
crocodile-n be key-n 3.2572
crocodile-n of chorus-n 24.9515
crocodile-n of book-n 2.3460
crocodile-n obj sit-v 3.1857
crocodile-n obj size-v 57.3257
crocodile-n obj skewer-v 6.1105
animal-n coord-1 investigation-n 0.9666
animal-n coord-1 irrigation-n 2.6058
animal-n coord-1 isolation-n 1.4074
animal-n coord-1 isotope-n 2.7420

I need to check INPUT1 for rows whose relation (the third field) is "coord" and take the fourth field of each of those rows. In this case that gives me crocodile-n and frog-n. I then have to build another file that looks like INPUT2 but contains only the rows whose first field is crocodile-n or frog-n. If the same relation and word (second and third fields) have already been found, I must not repeat the row but add its score to the one I already have.
I understand this explanation is not really clear, so here is an example of the desired output:

not_alligator-n about adage-n 8.8016
not_alligator-n appearance-1 broad-j 11.9640
not_alligator-n coord albino-n 6.7667
not_alligator-n be jumper-n 6.0272
not_alligator-n be key-n 7.1351(3.8779+3.2572)
not_alligator-n of body-n 8.3063
not_alligator-n of chorus-n 24.9515
not_alligator-n of bone-n 20.7982
not_alligator-n of book-n 2.7689(0.4229+2.3460)
not_alligator-n obj sit-v 3.1857
not_alligator-n obj size-v 57.3257
not_alligator-n obj skewer-v 6.1105
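
For what it's worth, the two steps described above could be sketched roughly as follows. This is only a minimal starting point, not a finished solution. The sub names `coord_targets` and `merge_scores` are made up for illustration, the data is inlined instead of being read from files, and the `not_alligator-n` label is an assumption read off the example output ("not_" plus the INPUT1 word). Rows are printed in the order in which each relation/word pair is first seen, which may not match the order in the example exactly.

```perl
use strict;
use warnings;

# Step 1: collect the fourth field of every INPUT1 row whose third
# field is "coord". Returns a hash ref: word => 1.
sub coord_targets {
    my (@rows) = @_;
    my %wanted;
    for my $row (@rows) {
        my (undef, undef, $relation, $other) = split ' ', $row;
        next unless defined $other && $relation eq 'coord';
        $wanted{$other} = 1;
    }
    return \%wanted;
}

# Step 2: keep only the INPUT2 rows whose first field is a wanted word,
# and sum the scores of rows that share the same "relation word" pair.
# Returns the pairs in first-seen order plus the summed scores.
sub merge_scores {
    my ($wanted, @rows) = @_;
    my (%total, @order);
    for my $row (@rows) {
        my ($word, $relation, $other, $score) = split ' ', $row;
        next unless defined $score && $wanted->{$word};
        my $key = "$relation $other";
        push @order, $key unless exists $total{$key};
        $total{$key} += $score;
    }
    return (\@order, \%total);
}

# Reduced sample data taken from the rows shown above.
my @input1 = (
    'alligator-n amphibian_reptile attri long-j',
    'alligator-n amphibian_reptile coord crocodile-n',
    'alligator-n amphibian_reptile coord frog-n',
    'alligator-n amphibian_reptile hyper animal-n',
);
my @input2 = (
    'frog-n be key-n 3.8779',
    'frog-n of book-n 0.4229',
    'crocodile-n be key-n 3.2572',
    'crocodile-n of book-n 2.3460',
    'animal-n coord-1 isotope-n 2.7420',
);

# Assumption: the output label is "not_" plus the INPUT1 word.
my $label = 'not_alligator-n';

my $wanted = coord_targets(@input1);
my ($order, $total) = merge_scores($wanted, @input2);
printf "%s %s %.4f\n", $label, $_, $total->{$_} for @$order;
# prints:
# not_alligator-n be key-n 7.1351
# not_alligator-n of book-n 2.7689
```

With real files, the inlined arrays would be replaced by reading each file line by line; a hash keyed on the relation/word pair is what makes the "don't repeat it, sum the score" part cheap.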

I have no idea where to start. It is less than a month since I started using Perl again, and I still have a lot to learn.
Any suggestion, tip or pointer on what to do would be really appreciated.
I need this because I'm analyzing some statistical measures to be used on semantic relations for my PhD thesis.
Thanks to all,
Giulia


In reply to Select only desired features from a text by remluvr
