Writing tests when you don't know what the output should be

by nysus (Parson)
on May 17, 2016 at 19:54 UTC ( [id://1163257] )

nysus has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE: OK, thanks to everyone who chimed in and helped me get a clearer idea of what's involved with writing tests. Looks like I have my work cut out for me.

So I've decided to dip my toes into writing tests. One of the functions of a module I'll be writing will be to grab the file names from an existing directory.

So I'm already stumped. How do I write a test to ensure that the module is loading the correct files if I don't yet know what the files in the directory will be?

Complicating matters, my module will load data structures from files based on the file names it finds. My module will perform operations on the data. How do I test whether the module is performing these operations correctly? Do I need to create a bunch of dummy data structures that mimic all the different possibilities of what the real data will look like and run the tests with this dummy data? This seems like an awful lot of work and I'm not sure it would really be worth the time.

The short tutorials I've found about testing in Perl use only the most simplistic examples and aren't of much help.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks

Re: Writing tests when you don't know what the output should be
by afoken (Chancellor) on May 17, 2016 at 20:43 UTC
    How do I write a test to ensure that the module is loading the correct files if I don't yet know what the files in the directory will be?

    You create a known set of files in the directory.

    Let's assume a simple "pocket calculator module". Create - say three - files "one-plus-two.calc", "two-plus-three.calc", "three-times-five.calc", containing "1+2", "2+3", "3*5".

    How do I test whether the module is performing these operations correctly?

    You know the input (because you created it), you know the expected output. Compare the output of your module with the expected output. Test::More and friends are extremely useful.

    Your test may either have a hardcoded result for each input file, perhaps in a hash (%expected=( "one-plus-two.calc" => 3, "two-plus-three.calc" => 5, ...)), or a result file for each input file ("one-plus-two.result" contains "3", and so on).
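
    A minimal sketch of the hash-driven variant with Test::More (run_calc() and the t/data/ location are assumptions for the example, not part of any real Calculator module):

    use strict;
    use warnings;
    use Test::More;
    use Calculator;    # hypothetical: Calculator::run_calc($file) reads one .calc file and returns its result

    my %expected = (
        'one-plus-two.calc'     => 3,
        'two-plus-three.calc'   => 5,
        'three-times-five.calc' => 15,
    );

    for my $file ( sort keys %expected ) {
        is( Calculator::run_calc("t/data/$file"), $expected{$file}, "result for $file" );
    }

    done_testing();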

    Do I need to create a bunch of dummy data structures that mimic all the different possibilities of what the real data will look like and run the tests with this dummy data?

    If you want a 100% test, yes. Hint: You can generate the input and the expected output.

    To test all basic operations on numbers from one to ten, use three nested loops:

    my $testNo = 1;
    for my $x (1..10) {
        for my $y (1..10) {
            for my $op (qw( + - * / )) {
                write_file("test$testNo.calc",   "$x $op $y");
                write_file("test$testNo.result", eval "$x $op $y"); # eval just for demo purposes
                ...
                $testNo++;
            }
        }
    }
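
    A matching test could then walk the generated pairs and compare actual against expected results; a rough sketch, assuming write_file()/read_file() come from File::Slurp and that a hypothetical Calculator::evaluate() does the actual work:

    use strict;
    use warnings;
    use Test::More;
    use File::Slurp qw(read_file);
    use Calculator;    # hypothetical module under test

    my @calc_files = glob 'test*.calc';
    for my $calc ( sort @calc_files ) {
        ( my $result_file = $calc ) =~ s/\.calc\z/.result/;
        my $expected = read_file($result_file);
        chomp $expected;
        is( Calculator::evaluate( scalar read_file($calc) ), $expected, "result for $calc" );
    }

    done_testing();
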
    This seems like an awful lot of work and I'm not sure it would really be worth the time.

    It depends on how the module is used, and how reliable it has to be. Start with a few simple tests, then extend.

    If you find bugs later, FIRST add a test for the bug, THEN fix the bug, and re-run the tests to confirm that the bug is gone. This way, you will never re-introduce the bug, because the test for this bug will fail.

    Another hint: Don't test the entire module at once, test its components (functions / methods).

    So if you have a file searching function (something like glob), write a test that runs that function on a set of files, and compare its return value with the expected result. (You would expect it to return ( "one-plus-two.calc", "two-plus-three.calc", "three-times-five.calc" ) in the example.)

    You will likely have a function that parses a file. To test it, call it with a file with known content, and compare the return value with the expected result. (See above: %expected=( "one-plus-two.calc" => { op => '+', left => 1, right => 2 }, ... ).)

    You will have a function that actually works on the parsed data. To test it, call it with known parser output. (use Test::More tests => 3; use Calculator; is(calculate({ op => '+', left => 1, right => 2 }), 3, 'add one plus two'); ...)

    You have a function that outputs data. To test it, call it with known input, as usual. (... is(numberToWords(12), "twelve", "number to words for 12"); ...)
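
    Put together as a runnable test file, the per-component approach could look roughly like this (parse_file(), calculate() and numberToWords() are the hypothetical Calculator functions used in the examples above):

    use strict;
    use warnings;
    use Test::More tests => 3;
    use Calculator;    # hypothetical module providing parse_file(), calculate(), numberToWords()

    # Parsing: known file content in, known structure out.
    is_deeply(
        Calculator::parse_file('t/data/one-plus-two.calc'),
        { op => '+', left => 1, right => 2 },
        'parse one-plus-two.calc'
    );

    # Calculating: known parser output in, known value out.
    is( Calculator::calculate( { op => '+', left => 1, right => 2 } ), 3, 'add one plus two' );

    # Output: known value in, known text out.
    is( Calculator::numberToWords(12), 'twelve', 'number to words for 12' );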

    If your code does everything in one big function: big chance to clean up your code! ;-) Like in the old Unix philosophy, a function should do one thing, and it should do that one thing right. As a rule of thumb, most functions should have far fewer than 100 lines of code.

    Still confused? CPAN is full of tests. Look around at how modules are tested. DBI and DBD::ODBC have lots of tests, for every little detail. CGI has a few tests. HTML::Parser has some. Test::Simple has lots of tests, as does Text::CSV_XS. Most other modules come with at least a few tests. Pick a module you know well, and look at its tests.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Regarding the files, I don't know what the files will be named ahead of time so how do I test if the module is loading the correct ones? If I create dummy files to run tests on, it will interfere with the proper operation of the actual module.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

        How do you plan on deciding which files to load then? Will they be defined by name in some sort of configuration?

        You should be setting up an actual distribution structure. See Module::Starter. All of your tests go in t/, and you can then create a test directory structure and associated data in, for example, t/data/. Your module should have a method or parameter that allows a script to tell it what directory to look in for the files (the tests in the t/*.t test files would do this).

        Setting up a proper testing infrastructure allows you to very precisely control everything, and easily and quickly add new sample data and new tests going forward while making changes, or particularly when a bug has been found.
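
        For example, a t/*.t file could build its own fixture files and then point the module at that directory. A rough sketch, assuming a hypothetical My::Module that takes a dir argument and has a files() method:

        use strict;
        use warnings;
        use Test::More;
        use File::Temp qw(tempdir);
        use My::Module;    # hypothetical name for your module

        # Build a known set of fixture files in a throwaway directory.
        my $dir = tempdir( CLEANUP => 1 );
        for my $name (qw( alpha.pl beta.pl )) {
            open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
            print {$fh} "# dummy fixture\n";
            close $fh;
        }

        # Tell the module where to look, then check what it found.
        my $loader = My::Module->new( dir => $dir );
        my @found  = $loader->files;
        is_deeply( [ sort @found ], [qw( alpha.pl beta.pl )], 'found exactly the fixture files' );

        done_testing();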

        Any chance you could show us the code you have so far?

        Regarding the files, I don't know what the files will be named ahead of time so how do I test if the module is loading the correct ones? If I create dummy files to run tests on, it will interfere with the proper operation of the actual module.

        Then your module lacks a parameter telling it WHERE to look for the files. During the tests, you usually want to look for test input files below the "t" directory, not where the production files are.
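
        On the module side that can be as simple as a constructor argument with a sensible default. A minimal sketch (the module name, method names and default path are all made up):

        package My::Module;    # hypothetical
        use strict;
        use warnings;

        sub new {
            my ( $class, %args ) = @_;
            # Tests pass dir => 't/data'; production code omits it and gets the real location.
            my $self = { dir => $args{dir} // '/path/to/production/data' };
            return bless $self, $class;
        }

        sub files {
            my ($self) = @_;
            opendir my $dh, $self->{dir} or die "Can't read $self->{dir}: $!";
            my @names = grep { /\.pl\z/ } readdir $dh;
            closedir $dh;
            return @names;
        }

        1;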

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Writing tests when you don't know what the output should be
by LanX (Saint) on May 17, 2016 at 20:22 UTC
    I'd split up the different phases ( like
    • reading/filtering directory,
    • reading data from file,
    • operating on data structure,
    • ...
    ) into different subs and unit-test them against typical cases (a minimal sketch follows below).
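
    A rough sketch of unit tests for each phase, assuming the module has already been split that way (My::Module, find_files(), read_data() and process() are all made-up names):

    use strict;
    use warnings;
    use Test::More;
    use My::Module;    # hypothetical module split into small subs

    subtest 'reading/filtering directory' => sub {
        is_deeply(
            [ My::Module::find_files('t/data') ],
            [ 't/data/a.pl', 't/data/b.pl' ],
            'finds exactly the fixture files'
        );
    };

    subtest 'reading data from file' => sub {
        is_deeply( My::Module::read_data('t/data/a.pl'), { name => 'a' }, 'parses a known file' );
    };

    subtest 'operating on data structure' => sub {
        ok( My::Module::process( { name => 'a' } ), 'handles a mocked structure' );
    };

    done_testing();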

    And after testing all the small building blocks, you test bigger ones that combine them, and so on.

    Of course you have to mock data, but with this approach you only need to simulate the simpler cases needed to convince you that each building block works.

    The trick is to add new tests as soon as you discover a bug caused by an edge case you forgot to cover.

    With this incremental approach, the probability of bugs shrinks quickly over time.

    UPDATE

    I think a good test suite combines the qualities of a spider web and a pyramid.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re: Writing tests when you don't know what the output should be
by stevieb (Canon) on May 17, 2016 at 20:11 UTC

    Trust me, it's worth the time. Four months from now, after you think you've got it working and you make a change, how will you know you haven't broken something far away?

    Can you provide more details on your project?

    • will you read only specific files, or all files?
    • do all the files require the same structure of data/text (i.e. will you be defining a data model)?
    • how do you identify what to do for each file/file's data?
    • how many possibilities are you thinking of?

    If you don't create all of the possibilities up front, how will you know if your code is doing the right thing?

      The module grabs the names of the files. Each file is a .pl file. Each .pl file has an associated data file, named after the .pl file, in which it stores its data using Storable. Each data file is basically a little mini database consisting of a keyed hash with objects as values. The objects represent events in the real world. The purpose of this module is to look for duplicate events across the different data files by analyzing the start date, start time, event name, and possibly other fields. The data structures are the same across the data files.
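
      For illustration, one way to test the duplicate detection would be to write a couple of small Storable files with one deliberately duplicated event and check that it is reported. A rough sketch (Event::Dedup, find_duplicates() and the dir argument are made-up names, and plain hashrefs stand in for the event objects):

      use strict;
      use warnings;
      use Test::More;
      use File::Temp qw(tempdir);
      use Storable qw(nstore);
      use Event::Dedup;    # hypothetical name for the module described above

      my $dir = tempdir( CLEANUP => 1 );

      # Two mini "databases": the gig on 2016-06-01 at 20:00 appears in both files.
      nstore(
          { ev1 => { name => 'Gig', start_date => '2016-06-01', start_time => '20:00' } },
          "$dir/venue_a.dat"
      );
      nstore(
          {
              ev2 => { name => 'Gig',   start_date => '2016-06-01', start_time => '20:00' },
              ev3 => { name => 'Other', start_date => '2016-06-02', start_time => '19:00' },
          },
          "$dir/venue_b.dat"
      );

      my @dupes = Event::Dedup->new( dir => $dir )->find_duplicates;
      is( scalar @dupes, 1, 'exactly one duplicate event is reported' );

      done_testing();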

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

Re: Writing tests when you don't know what the output should be
by GrandFather (Saint) on May 18, 2016 at 04:22 UTC

    Retrofitting tests is never fun. The better approach is to develop the tests along with the code. Usually that means writing a test for the function you are about to write. Write the function, then fix both the tests and the function until all the tests succeed. Often that means you will have to "mock" input data that doesn't exist yet, or other functions that don't exist yet, and so on.

    An important part of the process is to measure the code coverage provided by the tests. That is, measure how much of the code under test is actually exercised by the tests. Devel::Cover is the tool of choice for coverage analysis, but it takes a bit of getting your head around.

    This "tests first" technique is known as Test Driven Development and is a popular choice for Perl and agile developers. Ideally the tests and the code are written against documented interfaces without direct reference to each other. In a development team context there may be three different devs involved: API doc writers, test writers and code writers. Training yourself to wear only one of the three hats at a time helps the process work for a single dev.

    Premature optimization is the root of all job security
