This is more efficient if the data is available as two separate files: build a lookup table (a hash) from the first file, then consult it while reading the second file. The example below uses in-memory strings in place of real files:
#!/usr/bin/env perl
use strict;
use warnings;
my $f1 = <<F1;
antidemocratica 8 0.000274459
antidemocratiche 1 3.43074e-05
antidemocratici 4 0.00013723
antidemocraticità 1 3.43074e-05
antidemocratico 14 0.000480303
antidemocratico.questa 1 3.43074e-05
consensi 29 0.000994914
consenso 109 0.00373951
consensuale 2 6.86148e-05
consensuali 1 3.43074e-05
consensus 2 6.86148e-05
corrotto 128 0.00439135
disonesti 19 0.00065184
F1
my $f2 = <<F2;
antidemocratica 58 0.000288782
antidemocratiche 33 0.000164307
antidemocratici 31 0.000154349
antidemocraticità 1 4.979e-06
antidemocratico 76 0.000378404
consensi 74 0.000368446
consenso 2543 0.0126616
consensocrazia 1 4.979e-06
consensuale 60 0.00029874
consensuali 15 7.4685e-05
consensualmente 9 4.4811e-05
disonesta 7 3.4853e-05
disonesti 29 0.000144391
F2
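# Build the lookup table: word => value from the first data set
my %f1Words;
open my $fIn, '<', \$f1 or die "Can't open in-memory file: $!";
while (<$fIn>) {
    chomp;
    my ($word, $num, $value) = split;
    $f1Words{$word} = $value;
}
close $fIn;

# Read the second data set and print the difference for shared words
open $fIn, '<', \$f2 or die "Can't open in-memory file: $!";
while (<$fIn>) {
    chomp;
    my ($word, $num, $value) = split;
    next if ! exists $f1Words{$word};
    print "$word ", $f1Words{$word} - $value, "\n";
}
close $fIn;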
Prints:
antidemocratica -1.4323e-005
antidemocratiche -0.0001299996
antidemocratici -1.7119e-005
antidemocraticità 2.93284e-005
antidemocratico 0.000101899
consensi 0.000626468
consenso -0.00892209
consensuale -0.0002301252
consensuali -4.03776e-005
disonesti 0.000507449
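With real files instead of in-memory strings, the same approach might look like the sketch below. It is only an illustration: the helper name diff_values and the two command-line file names are assumptions, not part of the code above, and it assumes each line holds three whitespace-separated fields (word, count, value).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns one "word difference" string for every
# word of the second file that also appears in the first file.
sub diff_values {
    my ($file1, $file2) = @_;

    # Lookup table: word => value from the first file
    my %f1Words;
    open my $fIn, '<', $file1 or die "Can't open $file1: $!";
    while (<$fIn>) {
        my ($word, $num, $value) = split;
        next if !defined $value;    # skip blank or short lines
        $f1Words{$word} = $value;
    }
    close $fIn;

    # Consult the table while reading the second file
    my @lines;
    open $fIn, '<', $file2 or die "Can't open $file2: $!";
    while (<$fIn>) {
        my ($word, $num, $value) = split;
        next if !defined $value || !exists $f1Words{$word};
        push @lines, "$word " . ($f1Words{$word} - $value);
    }
    close $fIn;

    return @lines;
}

# Run only when called as a script with exactly two file names
if (!caller && @ARGV == 2) {
    print "$_\n" for diff_values(@ARGV);
}
```

Only the first file is held in memory, so if one file is much larger than the other it is cheaper to pass the smaller one first (and negate the result if the sign matters).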
If you only have the combined rows available, then you need two lookup tables. Populate both tables in the file input loop, then loop over the keys of one table to generate the output:
#!/usr/bin/env perl
use strict;
use warnings;
my %f1Entries;
my %f2Entries;

while (<DATA>) {
    # Fields per row: word1 count1 value1 word2 count2 value2
    my ($f1, $f2, $perc1, $perc2) = (split)[0, -3, 2, -1];
    $f1Entries{$f1} = $perc1;
    $f2Entries{$f2} = $perc2;
}

# Report only the words that appear in both columns
for my $f2 (sort keys %f2Entries) {
    next if ! exists $f1Entries{$f2};
    print "$f2 ", $f1Entries{$f2} - $f2Entries{$f2}, "\n";
}
__DATA__
antidemocratica 8 0.000274459 antidemocratica 58 0.000288782
antidemocratiche 1 3.43074e-05 antidemocratiche 33 0.000164307
antidemocratici 4 0.00013723 antidemocratici 31 0.000154349
antidemocraticità 1 3.43074e-05 antidemocraticità 1 4.979e-06
antidemocratico 14 0.000480303 antidemocratico 76 0.000378404
antidemocratico.questa 1 3.43074e-05 consensi 74 0.000368446
consensi 29 0.000994914 consenso 2543 0.0126616
consenso 109 0.00373951 consensocrazia 1 4.979e-06
consensuale 2 6.86148e-05 consensuale 60 0.00029874
consensuali 1 3.43074e-05 consensuali 15 7.4685e-05
consensus 2 6.86148e-05 consensualmente 9 4.4811e-05
corrotto 128 0.00439135 disonesta 7 3.4853e-05
disonesti 19 0.00065184 disonesti 29 0.000144391
Prints:
antidemocratica -1.4323e-005
antidemocratiche -0.0001299996
antidemocratici -1.7119e-005
antidemocraticità 2.93284e-005
antidemocratico 0.000101899
consensi 0.000626468
consenso -0.00892209
consensuale -0.0002301252
consensuali -4.03776e-005
disonesti 0.000507449
True laziness is hard work