Re^2: Aligning text and then perfom calculations

by epimenidecretese (Acolyte)
on Dec 15, 2013 at 22:12 UTC


in reply to Re: Aligning text and then perfom calculations
in thread Aligning text and then perfom calculations

I went out fishing and brought something home. I got the data into a shape that fits the previous script, and now I can print the rows that match, but I still can't skip the ones that don't.

Thank you very much for pointing me in the right direction. I'd be happy to figure it out myself if you could give me one more tip along these lines.

What I can't figure out is how to sort before comparing and printing, so that I also get the words that are present in both lists but are not on the same row.

#!/usr/bin/env perl
use strict;
use warnings;

while (<DATA>) {
    my ($f1, $f2, $perc1, $perc2) = (split)[0, -3, 2, -1];
    if ($f1 eq $f2) {
        print $f1, ($perc1 - $perc2), "\n";
    }
    else {
        next;
    }
}
print "\n";
__DATA__
antidemocratica 8 0.000274459 antidemocratica 58 0.000288782
antidemocratiche 1 3.43074e-05 antidemocratiche 33 0.000164307
antidemocratici 4 0.00013723 antidemocratici 31 0.000154349
antidemocraticità 1 3.43074e-05 antidemocraticità 1 4.979e-06
antidemocratico 14 0.000480303 antidemocratico 76 0.000378404
antidemocratico.questa 1 3.43074e-05 consensi 74 0.000368446
consensi 29 0.000994914 consenso 2543 0.0126616
consenso 109 0.00373951 consensocrazia 1 4.979e-06
consensuale 2 6.86148e-05 consensuale 60 0.00029874
consensuali 1 3.43074e-05 consensuali 15 7.4685e-05
consensus 2 6.86148e-05 consensualmente 9 4.4811e-05
corrotto 128 0.00439135 disonesta 7 3.4853e-05
disonesti 19 0.00065184 disonesti 29 0.000144391

OUTPUT:

antidemocratica-1.4323e-05
antidemocratiche-0.0001299996
antidemocratici-1.7119e-05
antidemocraticità2.93284e-05
antidemocratico0.000101899
consensuale-0.0002301252
consensuali-4.03776e-05
disonesti0.000507449

One of Crete's own prophets has said it: 'Cretans are always liars, evil brutes, lazy gluttons'.
He has surely told the truth.

Replies are listed 'Best First'.
Re^3: Aligning text and then perfom calculations
by GrandFather (Sage) on Dec 15, 2013 at 23:24 UTC

    This is more efficient if you have the data available as two files. Build a lookup table (hash) using the first file then consult it while reading the second file:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $f1 = <<F1;
    antidemocratica 8 0.000274459
    antidemocratiche 1 3.43074e-05
    antidemocratici 4 0.00013723
    antidemocraticità 1 3.43074e-05
    antidemocratico 14 0.000480303
    antidemocratico.questa 1 3.43074e-05
    consensi 29 0.000994914
    consenso 109 0.00373951
    consensuale 2 6.86148e-05
    consensuali 1 3.43074e-05
    consensus 2 6.86148e-05
    corrotto 128 0.00439135
    disonesti 19 0.00065184
    F1

    my $f2 = <<F2;
    antidemocratica 58 0.000288782
    antidemocratiche 33 0.000164307
    antidemocratici 31 0.000154349
    antidemocraticità 1 4.979e-06
    antidemocratico 76 0.000378404
    consensi 74 0.000368446
    consenso 2543 0.0126616
    consensocrazia 1 4.979e-06
    consensuale 60 0.00029874
    consensuali 15 7.4685e-05
    consensualmente 9 4.4811e-05
    disonesta 7 3.4853e-05
    disonesti 29 0.000144391
    F2

    my %f1Words;

    open my $fIn, '<', \$f1;
    while (<$fIn>) {
        chomp;
        my ($word, $num, $value) = split;
        $f1Words{$word} = $value;
    }
    close $fIn;

    open $fIn, '<', \$f2;
    while (<$fIn>) {
        chomp;
        my ($word, $num, $value) = split;
        next if ! exists $f1Words{$word};
        print "$word ", $f1Words{$word} - $value, "\n";
    }
    close $fIn;

    Prints:

    antidemocratica -1.4323e-005
    antidemocratiche -0.0001299996
    antidemocratici -1.7119e-005
    antidemocraticità 2.93284e-005
    antidemocratico 0.000101899
    consensi 0.000626468
    consenso -0.00892209
    consensuale -0.0002301252
    consensuali -4.03776e-005
    disonesti 0.000507449
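The script above reads its two lists from in-memory strings. As a minimal sketch of the same lookup-table idea applied to two real files, assuming each line holds "word count value" separated by whitespace, something like the following could be used. The sub names `read_values` and `common_differences` and the command-line file arguments are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read "word count value" lines from a file into a hash of word => value.
# (Hypothetical helper, not part of the original reply.)
sub read_values {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!";
    my %values;
    while (my $line = <$in>) {
        my ($word, $count, $value) = split ' ', $line;
        next if !defined $value;    # skip blank or short lines
        $values{$word} = $value;
    }
    close $in;
    return \%values;
}

# Return [word, difference] pairs for words present in both files,
# in the order they appear in the second file.
sub common_differences {
    my ($path1, $path2) = @_;
    my $first = read_values($path1);
    my @out;
    open my $in, '<', $path2 or die "Can't open $path2: $!";
    while (my $line = <$in>) {
        my ($word, $count, $value) = split ' ', $line;
        next if !defined $value;
        next if !exists $first->{$word};    # word only in the second file
        push @out, [$word, $first->{$word} - $value];
    }
    close $in;
    return @out;
}

# Runs only when two file names (assumed, supplied by the user) are given.
if (@ARGV == 2) {
    printf "%s %s\n", @$_ for common_differences(@ARGV);
}
```

Only the first file is held in memory; the second is streamed line by line, which is why this layout suits the two-file case.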

    If you only have the combined rows available then you need two lookup tables. Populate the tables in the file input loop, then loop over the keys from one of the tables to generate the output:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %f1Entries;
    my %f2Entries;

    while (<DATA>) {
        my ($f1, $f2, $perc1, $perc2) = (split)[0, -3, 2, -1];
        $f1Entries{$f1} = $perc1;
        $f2Entries{$f2} = $perc2;
    }

    for my $f2 (sort keys %f2Entries) {
        next if ! exists $f1Entries{$f2};
        print "$f2 ", $f1Entries{$f2} - $f2Entries{$f2}, "\n";
    }
    __DATA__
    antidemocratica 8 0.000274459 antidemocratica 58 0.000288782
    antidemocratiche 1 3.43074e-05 antidemocratiche 33 0.000164307
    antidemocratici 4 0.00013723 antidemocratici 31 0.000154349
    antidemocraticità 1 3.43074e-05 antidemocraticità 1 4.979e-06
    antidemocratico 14 0.000480303 antidemocratico 76 0.000378404
    antidemocratico.questa 1 3.43074e-05 consensi 74 0.000368446
    consensi 29 0.000994914 consenso 2543 0.0126616
    consenso 109 0.00373951 consensocrazia 1 4.979e-06
    consensuale 2 6.86148e-05 consensuale 60 0.00029874
    consensuali 1 3.43074e-05 consensuali 15 7.4685e-05
    consensus 2 6.86148e-05 consensualmente 9 4.4811e-05
    corrotto 128 0.00439135 disonesta 7 3.4853e-05
    disonesti 19 0.00065184 disonesti 29 0.000144391

    Prints:

    antidemocratica -1.4323e-005
    antidemocratiche -0.0001299996
    antidemocratici -1.7119e-005
    antidemocraticità 2.93284e-005
    antidemocratico 0.000101899
    consensi 0.000626468
    consenso -0.00892209
    consensuale -0.0002301252
    consensuali -4.03776e-005
    disonesti 0.000507449
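Both scripts test for a word with `exists` rather than by checking whether the stored value is true. A small sketch of why that distinction can matter, using made-up entries (the `zero` key and the `lookup_status` helper are hypothetical): a stored value of 0 is false, so a plain truth test would treat that word as if it were absent.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative values only; 'zero' is a made-up key with a stored value of 0.
my %f1Entries = (consensi => 0.000994914, zero => 0);

# Report whether a key is absent, present with a false value,
# or present with a true value.
sub lookup_status {
    my ($table, $key) = @_;
    return 'missing'        if !exists $table->{$key};
    return 'present, false' if !$table->{$key};
    return 'present, true';
}

for my $key (qw(consensi zero consensocrazia)) {
    print "$key: ", lookup_status(\%f1Entries, $key), "\n";
}
```

With frequency data like the lists in this thread a value of exactly 0 may never occur, but `exists` states the intent (is the word in the table?) without depending on that.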
    True laziness is hard work
      next if ! exists $f1Words{$word};

      That is exactly what I was looking for now. Thanks a lot. Problem solved.

      One of Crete's own prophets has said it: 'Cretans are always liars, evil brutes, lazy gluttons'.
      He has surely told the truth.
