<?xml version="1.0" encoding="windows-1252"?>
<node id="960394" title="Select only desired features from a text" created="2012-03-19 07:25:42" updated="2012-03-19 07:25:42">
<type id="115">
perlquestion</type>
<author id="717513">
remluvr</author>
<data>
<field name="doctext">
&lt;p&gt;Hi everyone.&lt;br /&gt;
Here I am with a new problem I can't solve.&lt;br /&gt;
I have two input files. The first contains a list of semantic relations structured as follows (let's call it INPUT1):
&lt;/p&gt;
&lt;code&gt;alligator-n		amphibian_reptile	attri	long-j
alligator-n		amphibian_reptile	attri	old-j
alligator-n		amphibian_reptile	coord	crocodile-n
alligator-n		amphibian_reptile	coord	frog-n
alligator-n		amphibian_reptile	event	walk-v
alligator-n		amphibian_reptile	hyper	animal-n&lt;/code&gt;
&lt;p&gt;The second (INPUT2) looks like this (this is obviously a heavily reduced version):&lt;/p&gt;
&lt;code&gt;frog-n	about	adage-n	8.8016
frog-n	appearance-1	broad-j	11.9640
frog-n	coord	albino-n	6.7667
frog-n	be	jumper-n	6.0272
frog-n	be	key-n	3.8779
frog-n	of	body-n	8.3063
frog-n	of	bone-n	20.7982
frog-n	of	book-n	0.4229
crocodile-n	be	key-n	3.2572
crocodile-n	of	chorus-n	24.9515
crocodile-n	of	book-n	2.3460
crocodile-n	obj	sit-v	3.1857
crocodile-n	obj	size-v	57.3257
crocodile-n	obj	skewer-v	6.1105
animal-n	coord-1	investigation-n	0.9666
animal-n	coord-1	irrigation-n	2.6058
animal-n	coord-1	isolation-n	1.4074
animal-n	coord-1	isotope-n	2.7420
&lt;/code&gt;
&lt;p&gt;I need to find the rows in INPUT1 whose relation (third field) is "coord" and take the word in their fourth field; in this example those words are crocodile-n and frog-n. I then have to build a new file in the same format as INPUT2 that contains every INPUT2 row whose first field is one of those words, with the first field replaced by not_alligator-n. If the same combination of second and third fields occurs for more than one of those words, it should appear only once, with the scores added together.&lt;br /&gt;
I understand this explanation is not very clear, so here is an example of the desired output (the sums in parentheses only show where a merged score comes from):&lt;/p&gt;
&lt;code&gt;
not_alligator-n	about	adage-n	8.8016
not_alligator-n	appearance-1	broad-j	11.9640
not_alligator-n	coord	albino-n	6.7667
not_alligator-n	be	jumper-n	6.0272
not_alligator-n	be	key-n	7.1351(3.8779+3.2572)
not_alligator-n	of	body-n	8.3063
not_alligator-n	of	chorus-n	24.9515
not_alligator-n	of	bone-n	20.7982
not_alligator-n	of	book-n	2.7689(0.4229+2.3460)
not_alligator-n	obj	sit-v	3.1857
not_alligator-n	obj	size-v	57.3257
not_alligator-n	obj	skewer-v	6.1105
&lt;/code&gt;
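&lt;p&gt;One possible reading of the steps above, as a rough sketch rather than a finished solution. It assumes that fields are separated by one or more tabs, that scores are printed with four decimals, and that output rows keep the order in which each relation/feature pair is first seen in INPUT2. The sub name merge_coord_rows and its arguments are hypothetical, chosen only for illustration.&lt;/p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): takes an output label plus
# references to the lines of INPUT1 and INPUT2, returns the merged rows.
sub merge_coord_rows {
    my ($label, $input1, $input2) = @_;

    # Step 1: remember the fourth field of every INPUT1 row whose
    # third field is exactly "coord".
    my %wanted;
    for my $row (@$input1) {
        (my $line = $row) =~ s/\s+\z//;      # copy, then drop trailing newline
        my @f = split /\t+/, $line;          # assumption: tab-separated fields
        next unless @f == 4;
        $wanted{ $f[3] } = 1 if $f[2] eq 'coord';
    }

    # Step 2: sum the INPUT2 scores per (relation, feature) pair,
    # looking only at rows whose first field is a wanted word.
    my (%sum, @order);
    for my $row (@$input2) {
        (my $line = $row) =~ s/\s+\z//;
        my ($word, $rel, $feat, $score) = split /\t+/, $line;
        next unless defined $score and $wanted{$word};
        my $key = "$rel\t$feat";
        push @order, $key unless exists $sum{$key};   # keep first-seen order
        $sum{$key} += $score;
    }

    # Step 3: format one output row per pair, in first-seen order.
    return map { sprintf "%s\t%s\t%.4f", $label, $_, $sum{$_} } @order;
}

# Small demonstration on a few of the sample rows.
my @input1 = (
    "alligator-n\t\tamphibian_reptile\tcoord\tcrocodile-n",
    "alligator-n\t\tamphibian_reptile\tcoord\tfrog-n",
    "alligator-n\t\tamphibian_reptile\thyper\tanimal-n",
);
my @input2 = (
    "frog-n\tbe\tkey-n\t3.8779",
    "frog-n\tof\tbook-n\t0.4229",
    "crocodile-n\tbe\tkey-n\t3.2572",
    "crocodile-n\tof\tbook-n\t2.3460",
    "animal-n\tcoord-1\tisotope-n\t2.7420",
);

# Prints two rows: "be key-n" with 7.1351 and "of book-n" with 2.7689.
print "$_\n" for merge_coord_rows('not_alligator-n', \@input1, \@input2);
```

&lt;p&gt;For real files, the two line lists would be read from the opened file handles (for example with readline) instead of being written inline. A hash keyed on the relation and feature joined by a tab is used here because it makes "already seen, so add the score" a single lookup.&lt;/p&gt;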
&lt;p&gt;I have no idea where to start. It has been less than a month since I went back to using Perl, and I still have a lot to learn.&lt;br /&gt;
Any suggestion, tip, or pointer on what to do would be really appreciated.&lt;br /&gt;
I need this because I'm analyzing some statistical measures to be applied to semantic relations for my Ph.D. thesis.&lt;br /&gt;
Thanks to all,&lt;br /&gt;
Giulia
&lt;/p&gt;</field>
</data>
</node>
