PerlMonks  

Re^2: Select only desired features from a text

by remluvr (Sexton)
on Mar 19, 2012 at 15:15 UTC (#960428)


in reply to Re: Select only desired features from a text
in thread Select only desired features from a text

Thanks, this was really useful, but my problem is I don't want to have duplicates. Given this output:

not_alligator-n about adage-n 8.8016
not_alligator-n appearance-1 broad-j 11.9640
not_alligator-n coord albino-n 6.7667
not_alligator-n be jumper-n 6.0272
not_alligator-n be key-n 3.8779
not_alligator-n of body-n 8.3063
not_alligator-n of bone-n 20.7982
not_alligator-n of book-n 0.4229
not_alligator-n be key-n 3.2572
not_alligator-n of chorus-n 24.9515
not_alligator-n of book-n 2.3460
not_alligator-n obj sit-v 3.1857
not_alligator-n obj size-v 57.3257
not_alligator-n obj skewer-v 6.1105

I'd like not_alligator-n be key-n 3.8779 and not_alligator-n be key-n 3.2572 to appear just once, with their scores summed.
How can I achieve that?
Thanks
Giulia


Re^3: Select only desired features from a text
by bitingduck (Friar) on Mar 19, 2012 at 15:29 UTC

    You might want to consider loading the whole thing into a database if it's that large and you need to do a lot of key lookup (e.g. to avoid dupes) as you process the data, particularly if you need to sort on it in different ways or pull out subsets based on certain conditions.

Re^3: Select only desired features from a text
by moritz (Cardinal) on Mar 19, 2012 at 18:03 UTC

    Use a second hash to store those (partial) lines that you've already seen, and only print out those lines that aren't in the hash yet.
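The "seen" hash idea can be sketched roughly as below. This is only an illustration, not code from the thread: the sub names dedupe_first_seen and sum_duplicates are made up here, and it assumes every line has exactly four whitespace-separated fields (target, relation, feature, score). Note that the plain "seen" approach keeps the first line and drops later duplicates; since the question asks for the scores to be added together, the second sub accumulates them in a hash keyed on the first three fields instead.

```perl
use strict;
use warnings;

# Keep only the first occurrence of each "target relation feature" triple.
# %seen plays the role of the "second hash" of partial lines already printed.
sub dedupe_first_seen {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        my ($target, $rel, $feature, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or malformed lines
        my $key = join ' ', $target, $rel, $feature;
        push @out, $line unless $seen{$key}++;
    }
    return @out;
}

# Variant: collapse duplicates into a single line whose score is the sum.
sub sum_duplicates {
    my @lines = @_;
    my (%total, @order);
    for my $line (@lines) {
        my ($target, $rel, $feature, $score) = split ' ', $line;
        next unless defined $score;
        my $key = join ' ', $target, $rel, $feature;
        push @order, $key unless exists $total{$key};    # remember first-seen order
        $total{$key} += $score;
    }
    return map { sprintf '%s %.4f', $_, $total{$_} } @order;
}

# Four of the lines from the question, including both duplicated triples.
my @sample = (
    'not_alligator-n be key-n 3.8779',
    'not_alligator-n of book-n 0.4229',
    'not_alligator-n be key-n 3.2572',
    'not_alligator-n of book-n 2.3460',
);
print "$_\n" for sum_duplicates(@sample);
# prints:
# not_alligator-n be key-n 7.1351
# not_alligator-n of book-n 2.7689
```

Because the totals are only known once all input has been read, the summing variant has to print after the loop rather than line by line; the @order array is there just to keep the output in first-seen order.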
