plotting roc curve using roc package

by numita (Initiate)
on Aug 02, 2015 at 06:56 UTC (#1137147=perlquestion)

numita has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone, I tried to write code for a ROC curve using the Perl module Statistics::ROC, but when I run the program it gives the error "value out of range for table lookup (2):0.711752437,1 at line 16."

    use Statistics::ROC;
    open(fh,"<roc_tp_fp.txt");
    while ( <fh> ) {
        ($a,$b)=split/,/;
        push @AoA, [ split ];
    }
    for $aref ( @AoA ) {
        #print "[ @$aref ],";
    }
    @curves=roc('decrease',0.95,@AoA);
    print "$curves[0][2][0] $curves[0][2][1] \n";

The input file, roc_tp_fp.txt, looks as follows:

    0.9883817,1
    0.770431568,1
    0.983195895,1
    0.812109932,1
    0.901505931,1
    0.72431528,1
    0.73553418,1
    0.724572657,1

Re: plotting roc curve using roc package
by Athanasius (Bishop) on Aug 02, 2015 at 07:54 UTC

    Hello numita, and welcome to the Monastery!

    I can see three problems with your code (there might be others):

    First, you do not have:

    use strict;
    use warnings;

    at the head of your script. Get into the habit of always adding these pragmata, and of using lexical variables whenever possible.

    Second, this loop:

    while ( <fh> ) { ($a,$b)=split/,/; push @AoA, [ split ]; }

    almost certainly doesn’t do what you want. The first call to split does nothing (because the results are never used); the second call results in @AoA containing this (obtained via Data::Dump):

    What you need is something like this:

    which produces the following output:

    Well, we’re getting closer, but we’re still getting the same error message. Which brings us to the third problem: the input data is almost certainly incorrect. In the examples given in Statistics::ROC’s documentation, the second “true” value in each data pair is zero (i.e. false) for around half the pairs. In your data, the second value is always 1 (true). I’m no mathematician, but I’m guessing that the input data you have supplied is invalid (or at least incomplete) for this algorithm.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
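The third point above (every truth value in the input is 1) can be checked before the data ever reaches the module. What follows is only a minimal sketch under stated assumptions: `count_classes` is a hypothetical helper, not part of Statistics::ROC, and the idea that roc() needs both classes is the guess made in this thread, not something confirmed by the module's documentation. The sample rows are the first three from the OP's file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::ROC): count how many
# [value, truth] pairs fall into each truth class (0 or 1).
sub count_classes {
    my @pairs = @_;
    my %count = (0 => 0, 1 => 0);
    for my $pair (@pairs) {
        my ($value, $truth) = @$pair;
        # Numeric comparison, so an unchomped "0\n" still counts as 0.
        $count{ $truth != 0 ? 1 : 0 }++;
    }
    return %count;
}

# Sample rows from the OP's roc_tp_fp.txt; every truth value is 1.
my @AoA = ([0.9883817, 1], [0.770431568, 1], [0.983195895, 1]);

my %count = count_classes(@AoA);
if ($count{0} == 0 or $count{1} == 0) {
    print "only one truth class present (0s: $count{0}, 1s: $count{1}); ",
          "skipping roc()\n";
}
else {
    # Assumption: with both classes present the data is at least plausible.
    print "both classes present; can try roc('decrease', 0.95, \@AoA)\n";
}
```

With the OP's data this takes the first branch, which at least turns the "value out of range" error into a message about the input itself.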

Re: plotting roc curve using roc package
by GrandFather (Sage) on Aug 02, 2015 at 07:52 UTC
    ($a,$b)=split/,/; push @AoA, [ split ];

    is bogus. It should probably be:

    chomp; push @AoA, [split /,/];

    In addition, shouldn't you have at least one x, 0 value? Looks to me like the module doesn't handle cases where all truth values are the same.

    Premature optimization is the root of all job security

      Hello GrandFather,

      chomp; push @AoA, [split /,/];

      This will work only if there is no more than one pair of values on each line of input data (which, from the OP, I’m guessing is not the case). Otherwise, @AoA will end up like this:

      [ 0.9883817, "1 0.770431568", "1 0.983195895", "1 0.812109932", "1 0.901505931", "1 0.72431528", "1 0.73553418", "1 0.724572657", 1, ]

      :-(

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        I cheated wearing my Janitor hat and looked at the raw text for the OP's node - the data is one pair per line as implied by the OP's code, but without code tags gets flattened into a single line inside p tags.

        Premature optimization is the root of all job security
Re: plotting roc curve using roc package
by Laurent_R (Canon) on Aug 02, 2015 at 07:54 UTC
    Hmmm, you probably want this:
    while ( <fh> ) { ($a,$b)=split/,/; push @AoA, [ $a, $b ]; }
    Update: you also probably need to chomp your data lines.

      Hmmmm, no he probably doesn't want that.

      $a and $b are special variables and even in sample code should be avoided. In fact more effort should go into making sample code clean and clear than even production code because you are providing example code for other people. You could write your sample like:

      while (<$fh>) { my ($value, $truth) = split /,/; push @groups, [$value, $truth]; }

      which hints at using lexical file handles, avoids special variables, uses correctly scoped sensibly named lexical variables and uses consistent white space.

      Update: replaced ) with ] - thanks Laurent_R

      Premature optimization is the root of all job security
        Hum, yes, GrandFather, you're right, I wouldn't write such code, but here I only wanted to point to the obvious error in the code shown, i.e. that the first split was useless (because the result is never used) and that the second split did not split anything but probably only removed trailing spaces and newline character from $_. But you're right that the OP should use strictures and warnings, should not use the $a and $b special variables, should use meaningful variable names, and so on.
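To make the difference between the two split calls discussed above concrete, here is a small self-contained sketch. The helper names `fields_bare_split` and `fields_comma_split` are made up for illustration, and the input line is the first row of the OP's file with its newline. A bare `split` is documented as equivalent to `split ' ', $_`, which is what the first helper spells out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line as it would be read from roc_tp_fp.txt, newline included.
my $line = "0.9883817,1\n";

# Hypothetical helper: what a bare `split` does, i.e. split ' ', $_.
# It splits on whitespace only, so the comma is left alone.
sub fields_bare_split {
    my ($text) = @_;
    return split ' ', $text;
}

# Hypothetical helper: remove the newline, then split on the comma.
sub fields_comma_split {
    my ($text) = @_;
    chomp $text;
    return split /,/, $text;
}

my @bare  = fields_bare_split($line);
my @comma = fields_comma_split($line);

print scalar(@bare),  " field(s) from bare split: @bare\n";
print scalar(@comma), " field(s) from split /,/: @comma\n";
```

The bare split should return the whole "0.9883817,1" as a single field with the newline gone, while the comma split returns the value and the truth flag separately, which is the [value, truth] shape the roc() call in the OP's code is being given.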
Re: plotting roc curve using roc package
by pme (Prior) on Aug 02, 2015 at 08:03 UTC
    Hi Numita,

    This code is untested because I have never used Statistics::ROC but hopefully helps.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    use Statistics::ROC;

    open(my $fh, "<roc_tp_fp.txt") or die "cannot open file 'roc_tp_fp.txt': $!\n";
    my @AoA;
    while ( <$fh> ) {
        chomp;
        push @AoA, [ split /,/ ];
    }
    close $fh;
    print Dumper( \@AoA ) . "\n";

    my @curves = roc('decrease', 0.95, @AoA);
    print Dumper( \@curves ) . "\n";
