sequence_identifiers

Strategies for generating sequence identifiers for biological data formats such as FASTA and FASTQ.

sequence_identifier()

@composite
def sequence_identifier(draw, blacklist_characters: Sequence[str] = "", min_size: int = 0, max_size: Optional[int] = None) -> str

Generates generic sequence identifiers, optionally excluding given characters and bounding the length.

Arguments

  • blacklist_characters: Characters to exclude from the sequence ID.
  • min_size: Minimum length of the sequence ID.
  • max_size: Maximum length of the sequence ID.
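To make the three arguments concrete, here is a minimal sketch of a checker that tests whether a string honours them. The name `satisfies_constraints` is hypothetical and not part of this library; it only mirrors the parameter names and defaults from the signature above, and it assumes `max_size` is an inclusive upper bound.

```python
from typing import Optional, Sequence


def satisfies_constraints(
    seq_id: str,
    blacklist_characters: Sequence[str] = "",
    min_size: int = 0,
    max_size: Optional[int] = None,
) -> bool:
    """Hypothetical helper: check an ID against the documented arguments."""
    # No character of the ID may appear in the blacklist.
    if any(char in blacklist_characters for char in seq_id):
        return False
    # The ID must be at least min_size characters long.
    if len(seq_id) < min_size:
        return False
    # max_size=None means "no upper bound" (assumed to be inclusive otherwise).
    return max_size is None or len(seq_id) <= max_size


print(satisfies_constraints("read_1", blacklist_characters=" \t", min_size=1))
print(satisfies_constraints("read 1", blacklist_characters=" \t", min_size=1))
```

A check like this could serve as the assertion inside a property-based test that draws from `sequence_identifier()` with the same argument values.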

illumina_sequence_identifier()

@composite
def illumina_sequence_identifier(draw) -> str

Generates Illumina-style sequence identifiers.

Note

Specifications taken from here
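The exact layout this strategy follows is not reproduced on this page. As an illustration only, the sketch below assumes the widely used Casava 1.8+ convention, `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`, and parses a header into named fields. The function name, the regular expression, and the sample header are assumptions rather than part of this library.

```python
import re
from typing import Dict, Optional

# Assumed layout (Casava 1.8+ convention), not confirmed by the note above:
# @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
ILLUMINA_HEADER = re.compile(
    r"^@(?P<instrument>[A-Za-z0-9_-]+):(?P<run>\d+):(?P<flowcell>[A-Za-z0-9]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r" (?P<read>[12]):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S+)$"
)


def parse_illumina_header(header: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: return the named fields, or None if no match."""
    match = ILLUMINA_HEADER.match(header)
    return match.groupdict() if match else None


# Illustrative header in the assumed layout.
fields = parse_illumina_header(
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
)
print(fields)
```

A parser of this kind can act as an oracle in a test: every value drawn from `illumina_sequence_identifier()` should parse, provided the strategy really does follow the assumed layout.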

nanopore_sequence_identifier()

@composite
def nanopore_sequence_identifier(draw) -> str

Generates Nanopore-style sequence identifiers.

Note

No formal specification could be found; this strategy is based on a header produced by Guppy v2.1.3:

@db127b21-9336-4052-8a8e-5b5d6ac0e3be runid=700c35056d5bf4191f3f9ade0cb342d8406f8ea4 sampleid=madagascar_tb_mdr_3 read=20199 ch=214 start_time=2018-02-26T21:39:56Z
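The structure of that header can be read off directly: a read ID after the `@`, followed by whitespace-separated `key=value` pairs. The sketch below splits the example header along those lines. The function name `parse_nanopore_header` is hypothetical, and the parsing rule is inferred from this single example rather than from any specification.

```python
from typing import Dict, Tuple


def parse_nanopore_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: split a header into (read ID, key=value fields)."""
    # The first whitespace-separated token is the read ID, minus the "@".
    read_id, *pairs = header.lstrip("@").split()
    # Every remaining token is assumed to have the form key=value.
    fields = dict(pair.split("=", 1) for pair in pairs)
    return read_id, fields


header = (
    "@db127b21-9336-4052-8a8e-5b5d6ac0e3be "
    "runid=700c35056d5bf4191f3f9ade0cb342d8406f8ea4 "
    "sampleid=madagascar_tb_mdr_3 read=20199 ch=214 "
    "start_time=2018-02-26T21:39:56Z"
)
read_id, fields = parse_nanopore_header(header)
print(read_id)
print(fields)
```

All values are kept as strings here; converting `read` and `ch` to integers or `start_time` to a timestamp is left to the caller.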