Python Testing and Continuous Integration

Modified for 2018 Sydney ResBaz

This is a modified version of the Software Carpentry python-testing lesson by Kathryn D. Huff for the purposes of 2018 Sydney ResBaz. See https://github.com/katyhuff/python-testing for the original.

Before relying on a new experimental device, an experimental scientist always establishes its accuracy. A new detector is calibrated when the scientist observes its responses to known input signals. The results of this calibration are compared against the expected response. An experimental scientist would never conduct an experiment with uncalibrated detectors - that would be unscientific. So too, simulations and analysis with untested software do not constitute science.

You only know what you test

You can only know that code works by testing it. Software bugs hide in all nontrivial software. Testing is the process by which those bugs are systematically exterminated before they have a chance to cause a paper retraction. In software tests, just like in device calibration, expected results are compared with observed results in order to establish accuracy.

The collection of all of the tests for a given code is known as the test suite. You can think of the test suite as a bunch of pre-canned experiments that anyone can run. If all of the tests pass, then the code is at least partially trustworthy. If any of the tests fail, then the code is known to be incorrect with respect to whichever case failed. After this lesson, you will know not to trust software when its tests do not cover its claimed capabilities or when its tests do not pass.
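As a minimal sketch of what such a pre-canned experiment can look like, the following uses a hypothetical `mean` function (it is not part of the lesson files) and two tests written as plain functions containing `assert` statements. Each test compares an observed value with an expected one.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def test_mean_of_integers():
    observed = mean([1, 2, 3, 4])
    expected = 2.5
    assert observed == expected


def test_mean_of_single_value():
    observed = mean([7])
    expected = 7
    assert observed == expected


# Run the tiny "suite" by hand; a failing assert raises AssertionError.
test_mean_of_integers()
test_mean_of_single_value()
```

If both calls return silently, both tests passed. A test runner such as pytest is designed to find and run functions like these automatically.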

Managing Expectations

In the same way that your scientific domain has expectations concerning experimental accuracy, it likely also has expectations concerning allowable computational accuracy. These considerations should surely come into play when you evaluate the acceptability of your own or someone else’s software.
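One way to picture allowable computational accuracy is a comparison within a tolerance rather than an exact match. The sketch below uses `math.isclose` from the Python standard library. The tolerance of `1e-9` is an assumed, illustrative value and not a recommendation for any particular domain.

```python
import math

observed = 0.1 + 0.2
expected = 0.3

# Exact equality fails here because of floating-point rounding.
exactly_equal = observed == expected

# Comparison within a relative tolerance (1e-9 is an illustrative choice).
close_enough = math.isclose(observed, expected, rel_tol=1e-9)

print(exactly_equal, close_enough)  # prints: False True
```

What counts as "close enough" depends on the expectations of your field and the problem at hand.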

In most other programming endeavours, if code is fundamentally wrong, even for years at a time, the impact of this error can be relatively small. Perhaps a website goes down, or a game crashes, or a day's worth of writing is lost to a bug in your word processor. Scientific code, on the other hand, controls planes, weapons systems, satellites, agriculture, and, most importantly, scientific simulations and experiments. If the software that governs the computational or physical experiment is wrong, then disasters (such as false claims in a publication) will result.

This is not to say that scientists have a monopoly on software testing, simply that software cannot be called scientific unless it has been validated.

Code without tests… is legacy code!

In Working Effectively with Legacy Code, Michael Feathers defines legacy code as “any code without tests”. This definition draws on the fact that, once code has been written, its tests provide a powerful guide to other developers (and to your forgetful self, a few months in the future) about how each function is meant to be used. Without runnable tests to provide examples of code use, even brand new programs are unsustainable.

Testing is the calibration step of the computational simulation and analysis world: it lets the scientist trust their own work on a fundamental level and helps others to understand and trust their work as well. Furthermore, tests help you to never fix a bug a second time. Once a bug has been caught and a test has been written, that particular bug can never again re-enter the codebase unnoticed. So, whether motivated by principles or a desire to work more efficiently, all scientists can benefit from testing.
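To illustrate how a test keeps a fixed bug from returning, here is a sketch built around a hypothetical `median` function. The bug described in the comment is invented for illustration. The test records the input that once gave a wrong answer, so any change that reintroduces the mistake makes the test fail.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # Hypothetical earlier bug: even-length input returned ordered[middle]
    # instead of the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


def test_median_even_length_regression():
    # Pins the behaviour that the hypothetical bug broke.
    assert median([4, 1, 3, 2]) == 2.5


test_median_even_length_regression()
```

Keeping this test in the suite means the fix is checked every time the tests run.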

Prerequisites

A basic understanding of Python variables and functions is a necessary prerequisite. Some previous experience with the shell and git is expected. If you have done a Software Carpentry course with Python, Shell and Git, then you are sufficiently prepared for this course.

You are not expected to be familiar with pytest (the Python library we will be using), nor with the new Python concepts we will cover.

Where these lessons are from

Note that this testing lesson was adapted from the Testing chapter in Effective Computation In Physics by Anthony Scopatz and Kathryn Huff. It is often quoted directly.

Schedule

	Setup	Download files required for the lesson
00:00	1. Basics of Testing	Why test?
00:05	2. Exceptions, Status Values and Tracebacks	What does this error mean?
00:15	3. Using Exceptions	How do I handle unusual behavior while the code runs?
00:25	4. Assertions	How can we compare observed and expected values?
00:35	5. Unit Tests	What is a unit of code?
00:45	6. Running Tests with pytest	How do I automate my tests?
00:55	7. Fixtures	How do I create and clean up the data I need to test the code?
01:05	8. Edge and Corner Cases	How do I catch all the possible errors?
01:15	9. Integration and Regression Tests	How do we test more than a single unit of software?
01:25	10. Continuous Integration	How can I automate running the tests on more platforms than my own?
01:35	11. Test Driven Development	How do you make testing part of the code writing process?
01:45	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.