sequences

Strategies for generating biological sequences.

dna()

@composite
def dna(draw, allow_ambiguous=True, allow_gaps=True, uppercase_only=False, min_size=0, max_size: Optional[int] = None)

Generates DNA sequences.

Arguments

  • allow_ambiguous: Whether ambiguous bases are permitted.
  • allow_gaps: Whether the gap character (-) may appear in the DNA sequence.
  • uppercase_only: Whether to use only uppercase characters.
  • min_size: The shortest DNA sequence to generate.
  • max_size: The longest DNA sequence to generate.
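
A property test built on dna() usually asserts something about the drawn string. The sketch below shows one such check using only the standard library. The helper is_strict_dna is hypothetical and not part of this module. The commented usage assumes that dna can be imported from the top level of the hypothesis_bio package, and that turning off ambiguity, gaps, and lowercase leaves only A, C, G, and T.

```python
from typing import Optional

# Hypothetical helper for illustration; not part of the library.
CANONICAL_DNA = frozenset("ACGT")


def is_strict_dna(seq: str, min_size: int = 0, max_size: Optional[int] = None) -> bool:
    """Return True if seq uses only uppercase A, C, G, T and respects the size bounds."""
    if len(seq) < min_size:
        return False
    if max_size is not None and len(seq) > max_size:
        return False
    return set(seq) <= CANONICAL_DNA


# Intended use with the strategy (assumed import path, see the note above):
#
# from hypothesis import given
# from hypothesis_bio import dna
#
# @given(dna(allow_ambiguous=False, allow_gaps=False, uppercase_only=True, max_size=20))
# def test_strict_dna(seq):
#     assert is_strict_dna(seq, max_size=20)
```

Since min_size defaults to 0, the empty string is a valid draw, so code under test should be prepared to handle it.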

rna()

@composite
def rna(draw, allow_ambiguous=True, allow_gaps=True, allow_lowercase=True, min_size=0, max_size: Optional[int] = None)

Generates RNA sequences.

Arguments

  • allow_ambiguous: Whether ambiguous bases are permitted.
  • allow_gaps: Whether the gap character (-) may appear in the RNA sequence.
  • allow_lowercase: Whether lowercase characters may be used.
  • min_size: The shortest RNA sequence to generate.
  • max_size: The longest RNA sequence to generate.

protein()

@composite
def protein(draw, allow_extended=False, allow_ambiguous=True, single_letter_protein=True, uppercase_only=False, min_size=0, max_size: Optional[int] = None)

Generates protein sequences.

Tip

By default, only canonical amino acids are used.

Arguments

  • allow_extended: Whether the extended amino acid alphabet should be used.
  • allow_ambiguous: Whether ambiguous amino acids are permitted.
  • single_letter_protein: Whether to use 1-letter (True) or 3-letter (False) amino acid abbreviations.
  • uppercase_only: Whether to restrict the protein sequence to uppercase characters.
  • min_size: The shortest protein sequence to generate.
  • max_size: The longest protein sequence to generate.
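
The single_letter_protein flag switches between two notations for the same residues. To illustrate how the two relate, the sketch below maps the 20 canonical amino acids from 1-letter to 3-letter codes. The helper to_three_letter is hypothetical, and joining the 3-letter codes without a separator is an assumption; the exact output format of protein() is not specified here.

```python
# Standard 1-letter to 3-letter codes for the 20 canonical amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(seq: str) -> str:
    """Translate a 1-letter canonical protein string into concatenated 3-letter codes.

    Hypothetical helper; the separator-free output is an assumption.
    """
    try:
        return "".join(ONE_TO_THREE[residue] for residue in seq.upper())
    except KeyError as err:
        raise ValueError(f"not a canonical amino acid: {err.args[0]!r}") from None
```

Extended or ambiguous residue codes are deliberately rejected by this helper, which makes it a convenient check when a test expects canonical residues only.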

start_codon()

@composite
def start_codon(draw, allow_ambiguous=True) -> str

Generates start codons.

Arguments

  • allow_ambiguous: Whether ambiguous bases are permitted.

stop_codon()

@composite
def stop_codon(draw, allow_ambiguous=True) -> str

Generates stop codons.

Arguments

  • allow_ambiguous: Whether ambiguous bases are permitted.
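
When ambiguous bases are permitted, a single codon can stand for several concrete codons. The sketch below expands a codon using the IUPAC nucleotide codes and checks whether every expansion is a stop codon in the standard genetic code. The helpers expand_codon and is_stop_codon are hypothetical, and DNA letters (T rather than U) are assumed; which codons the strategies actually emit is not specified here.

```python
from itertools import product
from typing import Set

# IUPAC nucleotide codes mapped to the concrete DNA bases they represent.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Stop codons of the standard genetic code, written as DNA.
STANDARD_STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def expand_codon(codon: str) -> Set[str]:
    """Return every concrete codon that a possibly ambiguous codon can represent."""
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    choices = [IUPAC_DNA[base] for base in codon.upper()]
    return {"".join(bases) for bases in product(*choices)}


def is_stop_codon(codon: str) -> bool:
    """True only if all expansions of the codon are standard stop codons."""
    return expand_codon(codon) <= STANDARD_STOP_CODONS
```

Under these definitions TAR expands to TAA and TAG and therefore counts as a stop codon, whereas TRR does not, because one of its expansions is TGG.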

cds()

@composite
def cds(draw, include_start_codon=True, include_stop_codon=True, allow_internal_stop_codons=True, allow_ambiguous=True, uppercase_only=False, min_size=0, max_size=None) -> str

Generates coding DNA sequences (CDSs).

Arguments

  • include_start_codon: Whether to include a start codon at the beginning.
  • include_stop_codon: Whether to include a stop codon at the end.
  • allow_internal_stop_codons: Whether stop codons may occur at any place other than the end.
  • allow_ambiguous: Whether ambiguous bases are permitted.
  • uppercase_only: Whether to use only uppercase characters.
  • min_size: The shortest CDS to generate in base pairs.
  • max_size: The longest CDS to generate in base pairs.
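
A test that draws from cds() with allow_internal_stop_codons=False might verify that no stop codon appears before the final codon. The sketch below is one way to write that check with the standard library. The helper internal_stop_codons is hypothetical; it assumes the standard genetic code, unambiguous bases, and a reading frame that starts at the first base.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def internal_stop_codons(cds: str) -> List[Tuple[int, str]]:
    """Return (offset, codon) for each in-frame stop codon before the last codon.

    Trailing bases that do not form a full codon are ignored, and the last
    full codon is treated as the terminal position, so it is never reported.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [
        (index * 3, codon)
        for index, codon in enumerate(codons[:-1])
        if codon in STOP_CODONS
    ]


# Intended use with the strategy (assumed import path, not confirmed here):
#
# from hypothesis import given
# from hypothesis_bio import cds
#
# @given(cds(allow_internal_stop_codons=False, allow_ambiguous=False))
# def test_no_internal_stops(seq):
#     assert internal_stop_codons(seq) == []
```

Only in-frame codons are inspected, so a stop triplet that straddles two codons is not reported.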

kmers()

@composite
def kmers(draw, seq: str, k: int) -> str

Generates k-mers (substrings of length k taken with a sliding window) from a given sequence.

Arguments

  • seq: The sequence from which k-mers are generated.
  • k: The length of each generated substring.
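
To make the sliding window concrete, the sketch below enumerates every k-mer of a sequence with the standard library. The helper all_kmers is hypothetical and not part of this module. A value drawn from kmers(seq, k) would be expected to be one of the listed substrings, though how the strategy handles k larger than the sequence is an assumption here.

```python
from typing import List


def all_kmers(seq: str, k: int) -> List[str]:
    """Return every length-k substring of seq, in left-to-right window order."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 windows; the range is empty when k > n.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

For example, all_kmers("ATGCA", 3) yields ATG, TGC, and GCA, and a test could assert that each drawn k-mer is a member of that list.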