User Guide
TODO
Finish this section
Testing non-Python tools via the command line
Imagine you're testing the GC-content calculator, except it's written in Node.js and reads the sequence from stdin:
// read the data from stdin
var fs = require("fs")
var data = fs.readFileSync(0, "utf-8")
// count the Gs and Cs, skipping the final character (assumed to be a trailing newline)
var g_count = 0
var c_count = 0
for (const letter of data.slice(0, -1)) {
    if (letter == "G") {
        g_count++
    } else if (letter == "C") {
        c_count++
    }
}
// log the GC content to stdout
console.log((g_count + c_count) / (data.length - 1))
To test it using Hypothesis-Bio, we can use Python's subprocess module:
import subprocess

from hypothesis import given
from hypothesis_bio import dna


@given(dna())
def test_node_gc_content(seq):
    gc_content = subprocess.run(
        ["node", "gc_content.js"], input=seq, capture_output=True, encoding="ascii"
    ).stdout
    assert 0.0 <= float(gc_content) <= 1.0


test_node_gc_content()
In essence, we call node with the DNA sequence generated by Hypothesis-Bio passed as stdin and, just as in the example in the README, check that the value printed to stdout is in the allowable range.
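The subprocess pattern above can be factored into a small helper so that several tests share it. This is only a sketch: run_cli and its timeout parameter are hypothetical names, not part of Hypothesis-Bio, and a child Python process stands in for node so the example runs without Node.js installed.

```python
import subprocess
import sys


def run_cli(command, stdin_text, timeout=10.0):
    """Run `command`, feed `stdin_text` on stdin, and return its stripped stdout.

    Hypothetical helper for illustration; not part of Hypothesis-Bio.
    """
    result = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        encoding="ascii",
        timeout=timeout,  # give up if the tool hangs
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Stand-in for ["node", "gc_content.js"]: a child process that prints the
# number of characters it received on stdin.
echo_length = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
print(run_cli(echo_length, "GATTACA"))  # prints 7
```

Passing check=True means a crash in the tool under test surfaces as an exception instead of as an unparsable empty stdout, which tends to make Hypothesis failures easier to read.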
When we run the script (notice that it has to call test_node_gc_content manually), we see:
Falsifying example: test_node_gc_content(seq='')
AssertionError
Sure enough, we get an error on the empty string, just as we expected.
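When reading a failure like this, it may help to know that JavaScript does not raise on division by zero: it yields NaN or Infinity, and console.log prints those as text. Python's float() parses such strings without complaint, and every comparison against NaN is false, so it is the range assertion that fails rather than the parse. The sketch below makes that case explicit; parse_gc_content is a hypothetical helper name, not something provided by Hypothesis-Bio.

```python
import math


def parse_gc_content(stdout_text):
    """Parse a tool's printed GC content, rejecting NaN and infinities explicitly.

    Hypothetical helper for illustration only.
    """
    value = float(stdout_text.strip())
    if not math.isfinite(value):
        raise ValueError(
            f"tool printed a non-finite GC content: {stdout_text.strip()!r}"
        )
    return value


print(parse_gc_content("0.5\n"))  # prints 0.5
print(0.0 <= float("NaN") <= 1.0)  # prints False: comparisons with NaN never hold
```

Raising a ValueError with the offending text gives a more descriptive message than a bare AssertionError.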
Working with other Hypothesis extensions
Many biological data formats are tab- or comma-delimited. For example, the default BLAST+6 output format is just a tab-delimited file like this:
moaC	gi|15800534|ref|NP_286546.1|	100.00	161	0	0	1	161	1	161	3e-114	330
moaC	gi|170768970|ref|ZP_02903423.1|	99.38	161	1	0	1	161	1	161	9e-114	329
The hypothesis-csv package can generate tab-delimited files when given a list of the type of each column.
Hypothesis-Bio provides just such a list.
Going back to the BLAST+6 example, you can use the BLAST6_DEFAULT_HEADERS list to generate BLAST+6 files:
from hypothesis import given
from hypothesis_bio import BLAST6_DEFAULT_HEADERS
from hypothesis_csv.strategies import csv


@given(csv(columns=BLAST6_DEFAULT_HEADERS, dialect="excel-tab"))
def test_blast6(blast6):
    ...
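Inside a test like test_blast6, the generated value is tab-delimited text, so the first step is usually to parse it back into rows. Below is a minimal sketch using only the standard library, applied to the two sample rows shown earlier (with the e-value and bit score as separate columns). BLAST6_COLUMNS and parse_blast6 are hypothetical names introduced here; the column names are the standard ones for BLAST+ tabular output (-outfmt 6), and they are not assumed to match the contents of BLAST6_DEFAULT_HEADERS.

```python
import csv
import io

# Standard column names of BLAST+ tabular output (-outfmt 6), in order.
# Hypothetical constant for this sketch; not imported from Hypothesis-Bio.
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast6(text):
    """Parse tab-delimited BLAST+6 text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), dialect="excel-tab")
    # Skip empty rows (for example, a trailing blank line).
    return [dict(zip(BLAST6_COLUMNS, row)) for row in reader if row]


sample = (
    "moaC\tgi|15800534|ref|NP_286546.1|\t100.00\t161\t0\t0\t1\t161\t1\t161\t3e-114\t330\n"
    "moaC\tgi|170768970|ref|ZP_02903423.1|\t99.38\t161\t1\t0\t1\t161\t1\t161\t9e-114\t329\n"
)
records = parse_blast6(sample)
print(records[0]["evalue"], records[0]["bitscore"])  # prints 3e-114 330
```

All values come back as strings; a real test would convert the fields it cares about (for example float(record["pident"])) before asserting anything about them.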