sequence_identifiers
Strategies for generating sequence identifiers for biological data formats such as FASTA and FASTQ.
sequence_identifier()
@composite
def sequence_identifier(draw, blacklist_characters: Sequence[str] = "", min_size: int = 0, max_size: Optional[int] = None) -> str
Generates sequence identifiers.
Arguments
blacklist_characters
: Characters to not include in the sequence ID.
min_size
: Minimum length of the sequence ID.
max_size
: Maximum length of the sequence ID.
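The three arguments can be read as constraints on the generated string. The sketch below spells them out as a plain checker. `satisfies_constraints` is a hypothetical helper and not part of this module, and it assumes both size bounds are inclusive.

```python
from typing import Optional, Sequence


def satisfies_constraints(
    seq_id: str,
    blacklist_characters: Sequence[str] = "",
    min_size: int = 0,
    max_size: Optional[int] = None,
) -> bool:
    """Hypothetical checker: True if seq_id respects the documented arguments."""
    # Assumption: min_size and max_size are inclusive bounds on the length.
    if len(seq_id) < min_size:
        return False
    if max_size is not None and len(seq_id) > max_size:
        return False
    # No character of the ID may appear in the blacklist.
    return not any(char in blacklist_characters for char in seq_id)
```

A checker like this could serve as the assertion in a property-based test that draws from `sequence_identifier()` with the same arguments.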
illumina_sequence_identifier()
@composite
def illumina_sequence_identifier(draw) -> str
Generates Illumina-style sequence identifiers.
Note
Specifications taken from here
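For orientation, the sketch below splits an Illumina-style header into named fields. The field names and order follow the commonly documented Casava 1.8+ layout, which is an assumption here: it may not match exactly what this strategy emits. `parse_illumina_header` is a hypothetical helper, and the header string is an illustrative example, not output from this module.

```python
from typing import Dict

# Assumed field layout (Casava 1.8+ style), not taken from the strategy's source.
LEFT_FIELDS = ("instrument", "run_number", "flowcell_id", "lane", "tile", "x_pos", "y_pos")
RIGHT_FIELDS = ("read", "is_filtered", "control_number", "index")


def parse_illumina_header(header: str) -> Dict[str, str]:
    """Hypothetical parser: map each colon-separated value to its field name."""
    # The two groups of fields are separated by a single space.
    left, right = header.lstrip("@").split(" ", 1)
    left_values = left.split(":")
    right_values = right.split(":")
    if len(left_values) != len(LEFT_FIELDS) or len(right_values) != len(RIGHT_FIELDS):
        raise ValueError(f"unexpected number of fields in header: {header!r}")
    return dict(zip(LEFT_FIELDS + RIGHT_FIELDS, left_values + right_values))


# Illustrative header only.
fields = parse_illumina_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(fields["lane"], fields["index"])
```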
nanopore_sequence_identifier()
@composite
def nanopore_sequence_identifier(draw) -> str
Generates Nanopore-style sequence identifiers.
Note
No formal specification could be found; this strategy is based on a header produced by Guppy
v2.1.3:
@db127b21-9336-4052-8a8e-5b5d6ac0e3be runid=700c35056d5bf4191f3f9ade0cb342d8406f8ea4 sampleid=madagascar_tb_mdr_3 read=20199 ch=214 start_time=2018-02-26T21:39:56Z
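The header above is a read ID followed by space-separated `key=value` pairs. A minimal sketch of taking it apart, assuming every token after the ID is a single `key=value` pair; `parse_nanopore_header` is a hypothetical helper and not part of this module.

```python
from typing import Dict, Tuple


def parse_nanopore_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical parser: return the read ID and its key=value fields."""
    read_id, *fields = header.lstrip("@").split()
    # Split on the first "=" only, so values such as timestamps stay intact.
    attributes = dict(field.split("=", 1) for field in fields)
    return read_id, attributes


header = (
    "@db127b21-9336-4052-8a8e-5b5d6ac0e3be "
    "runid=700c35056d5bf4191f3f9ade0cb342d8406f8ea4 "
    "sampleid=madagascar_tb_mdr_3 read=20199 ch=214 "
    "start_time=2018-02-26T21:39:56Z"
)
read_id, attributes = parse_nanopore_header(header)
print(read_id)
print(sorted(attributes))
```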