PerlMonks  

Merging two list with simple operation

by fanticla (Scribe)
on Jul 31, 2010 at 17:48 UTC (#852261=perlquestion)

fanticla has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

My problem should be easy to solve, but I am still struggling very much to find a proper solution to it.

I have two text files containing two lists of words - one word per line - with a score (the score simply being the row number). I'd like to merge the two files, ordering the words according to a new ranking computed by multiplying the scores of the same word in the two files.

The files have the following format:

FILE 1
hello 1
today 2
well 3
yes 4

FILE 2
hello 1
yes 2
today 3
well 4

The output should look like:

hello 1
today 6
yes 8
well 12

Right now I am reading FILE 1 line by line, scanning through FILE 2 to find the same word, multiplying the scores, and writing the result out to a new file. As the lists are huge, this seems to me a very BAD way to do the task. Any idea how I could do it better? Sorry if my question is too simple...

Thanks, Cla

Replies are listed 'Best First'.
Re: Merging two list with simple operation
by Corion (Pope) on Jul 31, 2010 at 17:57 UTC

    To speed this up, you could read the first file into a hash, using the words as keys and the numbers as values. Then you could read through the second file and output the multiplication of the hash lookup as the result.

    As you haven't shown any code, neither will I.

      @Corion: thank you very much. I'll try to change my (poor) script according to your suggestion, and post it here later.
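Corion's hash suggestion could look roughly like the sketch below. It is only one possible reading of that advice: the helper name `merge_scores` is made up for illustration, the sample data is taken from the question, and words that appear only in the second list are simply skipped, which is an assumption rather than something the thread specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads "word score" lines from
# two filehandles and returns [word, product] pairs sorted by product.
sub merge_scores {
    my ($fh1, $fh2) = @_;

    # First list: word => score
    my %score;
    while (my $line = <$fh1>) {
        my ($word, $n) = split ' ', $line;
        next unless defined $n;              # skip blank or malformed lines
        $score{$word} = $n;
    }

    # Second list: look each word up in the hash and multiply
    my @merged;
    while (my $line = <$fh2>) {
        my ($word, $n) = split ' ', $line;
        next unless defined $n && exists $score{$word};
        push @merged, [ $word, $score{$word} * $n ];
    }

    # Order by the new score, lowest first
    return sort { $a->[1] <=> $b->[1] } @merged;
}

# Demo on the sample data from the question, using in-memory filehandles
my $list1 = "hello 1\ntoday 2\nwell 3\nyes 4\n";
my $list2 = "hello 1\nyes 2\ntoday 3\nwell 4\n";
open my $fh1, '<', \$list1 or die "Cannot open in-memory file: $!";
open my $fh2, '<', \$list2 or die "Cannot open in-memory file: $!";

print "$_->[0] $_->[1]\n" for merge_scores($fh1, $fh2);
```

Each file is read exactly once, so the cost is one pass per file plus a final sort, instead of rescanning the second file for every word of the first. With real files, the two `open` calls would take file names instead of scalar references.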

Re: Merging two list with simple operation
by BrowserUk (Pope) on Jul 31, 2010 at 18:11 UTC

    Wrapped. Needs mods for *nix shells:

    sort file1>file1.s & sort file2>file2.s & join file1.s file2.s | perl -anle"print$F[0],' ',$F[1]*$F[2]" | sort -n -k 2 > files.merged & del file1.s file2.s

    c:\test>type files.merged
    hello 1
    FILE 2
    today 6
    yes 8
    well 12

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      @BrowserUk: Thanks, it works. However, I'm trying to come up with a pure Perl solution, without any sort command, as sorting may cause problems. The two lists may also be slightly different.

        without any sort command, as sorting may cause problems.

        Why do you think sorting would cause problems?

        The two lists may also be slightly different.

        Different how? If it is anything other than casing, it will screw up most solutions.
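If the difference between the lists does turn out to be only letter case (the thread never says), one way to cope is to fold every word to lower case before using it as a hash key. The helper name `normalize_key` below is hypothetical, and the two words are made-up sample values.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case a word so "Hello" and "hello" share a key.
sub normalize_key {
    my ($word) = @_;
    return lc $word;
}

my %score;
$score{ normalize_key('Hello') } = 1;    # as if read from the first list

my ($word, $n) = ('hello', 1);           # as if read from the second list
my $key = normalize_key($word);
print "$key ", $score{$key} * $n, "\n" if exists $score{$key};
```

Any other kind of difference (spelling variants, extra punctuation) would need its own normalization rule, and that depends on the actual data.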


Re: Merging two list with simple operation
by ahmad (Hermit) on Jul 31, 2010 at 18:48 UTC

    Try this:

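    #!/usr/bin/perl
    use strict;
    use warnings;

    my %HASH;

    # read the first list: word => score
    open(F1, "<", "file1") or die "Cannot open file1: $!";
    while (<F1>) {
        chomp;
        my ($k,$v) = split /\s+/,$_,2;
        $HASH{$k} = $v;
    }
    close(F1);

    # read the second list: multiply the scores of words already seen
    open(F2, "<", "file2") or die "Cannot open file2: $!";
    while (<F2>) {
        chomp;
        my ($k,$v) = split /\s+/,$_,2;
        if (exists $HASH{$k} ) {
            $HASH{$k} *= $v;
        }else{
            $HASH{$k} = $v;
        }
    }
    close(F2);

    # write output to file (each returns the pairs in no particular order)
    open(OUT, ">", "newfile") or die "Cannot open newfile: $!";
    while (my ($k,$v) = each %HASH) {
        print OUT "$k $v\n";
    }
    close(OUT);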

      @ahmad: THANK YOU VERY MUCH. It works great!!! 1000 times better than my original script!

      I'll endorse ahmad's solution. I'd just point out that the magical quality of @ARGV can be handy for this kind of data munging and that chomp %hash chomps the hash's values.

      Be well,
      rir

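      #!/usr/bin/perl
      use strict;
      use warnings;

      @ARGV = qw/ file1 file2/;    # <> reads these files in turn
      my %h;
      while ( <> ) {
          /(\S+)\s+(\S+)/ or next;    # skip lines that do not match
          $h{$1} = $h{$1} ? $h{$1}*$2 : $2;
      }
      # print the words ordered by their combined score
      print "$_ $h{$_}\n" for sort { $h{$a} <=> $h{$b} } keys %h;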
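rir's remark about `chomp %hash` is easy to miss, since the script above does not use it. A minimal illustration, with made-up values that still carry their trailing newlines:

```perl
use strict;
use warnings;

# Values as they might look if whole lines had been stored unchomped
my %h = ( hello => "1\n", today => "6\n" );

chomp %h;    # removes the trailing newline from every value, not from the keys

print "$_ $h{$_}\n" for sort keys %h;
```

This lets a script store raw lines first and strip the newlines in one step afterwards.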
Re: Merging two list with simple operation
by kikuchiyo (Friar) on Aug 01, 2010 at 19:18 UTC

    I'm late to the party as usual, but...

    This problem reminds me of the algorithm certain sites use to determine the similarity between scientific articles. First they define a wordlist containing the vocabulary of the relevant scientific fields. This list forms the basis of a vector space. Then they assign a vector to every article in their database, the i-th element of the vector being the number of occurrences of the i-th word from the list. The similarity of two articles is calculated as the scalar product of the (normalized) vectors belonging to the two articles. This makes perfect sense: the scalar product of two unit vectors is the cosine of the angle between them, so the above definition basically calculates the angle between the word-vectors.
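The similarity measure kikuchiyo describes can be sketched in a few lines. Everything here is illustrative: the helper names `word_vector` and `cosine_similarity`, the four-word vocabulary, and the sample texts are assumptions, not taken from any real site.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each vocabulary word occurs in a text.
sub word_vector {
    my ($text, @vocab) = @_;
    my %count;
    $count{ lc $_ }++ for $text =~ /\w+/g;
    return map { $count{$_} || 0 } @vocab;
}

# Cosine of the angle between two vectors of equal length:
# the scalar product divided by the product of the two lengths.
sub cosine_similarity {
    my ($u, $v) = @_;
    my ($dot, $uu, $vv) = (0, 0, 0);
    for my $i (0 .. $#$u) {
        $dot += $u->[$i] * $v->[$i];
        $uu  += $u->[$i] ** 2;
        $vv  += $v->[$i] ** 2;
    }
    return 0 unless $uu && $vv;    # a zero vector has no direction
    return $dot / sqrt($uu * $vv);
}

my @vocab = qw(markov chain hash sort);
my @art1 = word_vector("A Markov chain is a chain of states", @vocab);
my @art2 = word_vector("Sort the hash, then sort again", @vocab);
my @art3 = word_vector("markov chain chain", @vocab);

printf "1 vs 2: %.3f\n", cosine_similarity(\@art1, \@art2);    # 0.000
printf "1 vs 3: %.3f\n", cosine_similarity(\@art1, \@art3);    # 1.000
```

Dividing by the vector lengths is the normalization step mentioned above: articles that use the same vocabulary words in the same proportions score 1, and articles with no vocabulary words in common score 0.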
