User Guide

TODO

Finish this section

Testing non-Python tools via the command line

Imagine you're testing the GC-content calculator, except it's written in Node.js and reads the sequence from stdin:

// read the data from stdin
var fs = require("fs")
var data = fs.readFileSync(0, "utf-8")

// count the Gs and Cs, skipping the final character (assumed to be a trailing newline)
var g_count = 0
var c_count = 0
for (const letter of data.slice(0, -1)) {
  if (letter == "G") {
    g_count++
  } else if (letter == "C") {
    c_count++
  }
}

// log the GC content to stdout
console.log((g_count + c_count) / (data.length - 1))

To test it using Hypothesis-Bio, we can use Python's subprocess module:

import subprocess
from hypothesis import given
from hypothesis_bio import dna


@given(dna())
def test_node_gc_content(seq):
    gc_content = subprocess.run(
        ["node", "gc_content.js"], input=seq, capture_output=True, encoding="ascii"
    ).stdout
    assert 0.0 <= float(gc_content) <= 1.0


test_node_gc_content()

In essence, we call node with the DNA sequence generated by Hypothesis-Bio passed on stdin and, just as in the README example, check that the value written to stdout falls within the allowable range. When we run the script (note that it has to call test_node_gc_content manually), we see:

Falsifying example: test_node_gc_content(seq='')

AssertionError

Sure enough, we get an error on the empty string, just as we expected.
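
One wrinkle when shelling out to an external tool: if the tool crashes, stdout is empty and float("") raises a ValueError, which can hide the real cause. The following is a minimal sketch of a wrapper that surfaces a non-zero exit code explicitly. The name run_tool and its timeout parameter are hypothetical helpers for illustration, not part of Hypothesis-Bio, and a small Python one-liner stands in for node so the sketch runs without Node.js installed.

```python
import subprocess
import sys


def run_tool(command, sequence, timeout=10):
    """Run `command`, feed `sequence` on stdin, and return stdout parsed as a float.

    Hypothetical helper for illustration; not part of Hypothesis-Bio.
    """
    result = subprocess.run(
        command,
        input=sequence,
        capture_output=True,
        encoding="ascii",
        timeout=timeout,
    )
    # Report crashes directly instead of letting float("") raise a ValueError.
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return float(result.stdout)


# Stand-in for ["node", "gc_content.js"] so the sketch runs without Node.js.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; s = sys.stdin.read(); print(sum(c in 'GC' for c in s) / len(s))",
]

print(run_tool(STAND_IN, "GGCA"))  # prints 0.75
```

Inside a test, the call would presumably become run_tool(["node", "gc_content.js"], seq), with the range assertion applied to its return value.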

Working with other Hypothesis extensions

Many biological data formats are tab- or comma-delimited. For example, the default BLAST+6 output format is simply a tab-delimited file like this:

moaC	gi|15800534|ref|NP_286546.1|	100.00	161	0	0	1	161	1	161	3e-114	330
moaC	gi|170768970|ref|ZP_02903423.1|	99.38	161	1	0	1	161	1	161	9e-114	329
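
To make the column layout concrete, here is a minimal sketch that parses one record in this format into typed fields. It assumes the twelve default BLAST+ tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Blast6Record and parse_blast6_line are hypothetical names used only for illustration and are not part of Hypothesis-Bio.

```python
from typing import NamedTuple


class Blast6Record(NamedTuple):
    """One row of default BLAST+6 output (hypothetical helper type)."""

    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# One converter per column, in the same order as the fields above.
_CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_blast6_line(line: str) -> Blast6Record:
    """Split one tab-delimited line and convert each field to its column type."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(_CONVERTERS):
        raise ValueError(f"expected {len(_CONVERTERS)} fields, got {len(fields)}")
    return Blast6Record(*(convert(field) for convert, field in zip(_CONVERTERS, fields)))


record = parse_blast6_line(
    "moaC\tgi|15800534|ref|NP_286546.1|\t100.00\t161\t0\t0\t1\t161\t1\t161\t3e-114\t330"
)
print(record.sseqid, record.evalue, record.bitscore)
```

A parser like this is the kind of code a generated BLAST+6 file would be fed into during a property test.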

The hypothesis-csv package can generate tab-delimited files when given a list describing the type of each column. Hypothesis-Bio provides just such a list: going back to the BLAST+6 example, you can use the BLAST6_DEFAULT_HEADERS list to generate BLAST+6 files:

from hypothesis import given
from hypothesis_bio import BLAST6_DEFAULT_HEADERS
from hypothesis_csv.strategies import csv


@given(csv(columns=BLAST6_DEFAULT_HEADERS, dialect="excel-tab"))
def test_blast6(blast6):
    ...
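
What goes in the test body depends on the code under test. As one possibility, here is a minimal sketch of a structural check built on the standard library's csv module. It assumes the strategy yields the whole file as a single string; check_blast6_text is a hypothetical helper, not part of Hypothesis-Bio or hypothesis-csv.

```python
import csv
import io


def check_blast6_text(text, expected_columns=12):
    """Parse tab-delimited text and return its rows, failing on a wrong column count."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect="excel-tab"))
    for number, row in enumerate(rows, start=1):
        assert len(row) == expected_columns, f"row {number} has {len(row)} columns"
    return rows


sample = "moaC\tgi|15800534|ref|NP_286546.1|\t100.00\t161\t0\t0\t1\t161\t1\t161\t3e-114\t330\r\n"
print(len(check_blast6_text(sample)))  # prints 1
```

Under that assumption, the body of test_blast6 could call check_blast6_text(blast6) before handing the text to the parser being tested.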

Last Updated: 10/25/2019, 1:08:05 AM