Package sequence

Sequence and sequence alignment APIs.

This module defines the base interfaces for biological sequences and alignments, AbstractSequence and AbstractAlignment, which are the central abstractions here. It also provides a number of useful enumerations, such as SequenceTypes and SequenceAlphabets.

Sequences

AbstractSequence has a number of implementations. They are interchangeable, but serve different purposes and may differ significantly in performance. The standard Sequence implementation is the right choice when you need high performance and efficient storage (e.g. when parsing big files); Sequence objects store their underlying sequences as strings. RichSequence objects, on the other hand, store their residues as ResidueInfo objects, which have the same basic interface as csb.bio.structure.Residue objects. This comes at the expense of performance. A ChainSequence is a special case of a rich sequence whose residue objects are actual csb.bio.structure.Residue objects.

Basic usage:

>>> import sys
>>> from csb.bio.sequence import RichSequence, SequenceTypes
>>> seq = RichSequence('id', 'desc', 'sequence', SequenceTypes.Protein)
>>> seq.residues[1]
<ResidueInfo [1]: SER>
>>> seq.dump(sys.stdout)
>desc
SEQUENCE

See AbstractSequence for details.
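
The storage trade-off described above can be illustrated without csb itself. The following is a minimal sketch, assuming two hypothetical classes (PlainSequence and RecordSequence) and a hypothetical ResidueRecord type; none of these names belong to csb.bio.sequence. Both classes expose the same 1-based residue() accessor, so they are interchangeable, but the second one allocates one object per residue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueRecord:
    """Hypothetical per-residue record (a stand-in, not csb's ResidueInfo)."""
    rank: int   # 1-based position in the sequence
    code: str   # one-letter residue code


class PlainSequence:
    """Hypothetical string-backed sequence: compact and cheap to build."""

    def __init__(self, id, header, residues):
        self.id = id
        self.header = header
        self._residues = residues.upper()

    @property
    def length(self):
        return len(self._residues)

    def _check(self, rank):
        # Ranks are 1-based, so 0 and anything past the end are rejected.
        if not 1 <= rank <= self.length:
            raise IndexError("rank out of range: {0}".format(rank))

    def residue(self, rank):
        """Return the one-letter code at a 1-based rank."""
        self._check(rank)
        return self._residues[rank - 1]


class RecordSequence(PlainSequence):
    """Hypothetical object-backed sequence: one record object per residue."""

    def __init__(self, id, header, residues):
        super().__init__(id, header, residues)
        self._records = [ResidueRecord(rank, code)
                         for rank, code in enumerate(self._residues, start=1)]

    def residue(self, rank):
        """Return the ResidueRecord at a 1-based rank."""
        self._check(rank)
        return self._records[rank - 1]


plain = PlainSequence("id", "desc", "sequence")
rich = RecordSequence("id", "desc", "sequence")
print(plain.residue(1))   # S
print(rich.residue(1))    # ResidueRecord(rank=1, code='S')
```

The richer variant is more convenient when per-residue attributes are needed, but building and holding a list of objects costs more time and memory than keeping a single string.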

Alignments

AbstractAlignment defines a table-like interface to access the data in an alignment:

>>> from csb.bio.sequence import SequenceAlignment
>>> ali = SequenceAlignment.parse(">a\nABC\n>b\nA-C")
>>> ali[0, 0]
<SequenceAlignment>   # a new alignment, constructed from row #1, column #1
>>> ali[0, 1:3]
<SequenceAlignment>   # a new alignment, constructed from row #1, columns #2..#3

The 0-based table indexing above is just shorthand for the standard 1-based interface:

>>> ali.rows[1]
<AlignedSequenceAdapter: a, 3>                        # row #1 (first sequence)
>>> ali.columns[1]
(<ColumnInfo a [1]: ALA>, <ColumnInfo b [1]: ALA>)    # residues at column #1

See AbstractAlignment for all details and more examples.
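
To make the relationship between the two indexing schemes concrete, here is a minimal sketch, assuming a hypothetical TinyAlignment class that is not part of csb. Its table-style access takes 0-based (row, column) keys and returns a new alignment, while row() and column() take 1-based ranks. The csb classes return richer objects than the plain strings used here.

```python
class TinyAlignment:
    """Hypothetical alignment: a list of equal-length gapped strings."""

    def __init__(self, ids, rows):
        if len({len(r) for r in rows}) > 1:
            raise ValueError("all rows must have the same length")
        self.ids = list(ids)
        self._rows = list(rows)

    @classmethod
    def parse(cls, text):
        """Parse FASTA-like text: '>' header lines followed by residues."""
        ids, rows = [], []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                ids.append(line[1:])
                rows.append("")
            elif rows:
                rows[-1] += line
            else:
                raise ValueError("residues found before the first header")
        return cls(ids, rows)

    @staticmethod
    def _as_slice(key):
        # Turn a single 0-based index into a one-element slice.
        if isinstance(key, slice):
            return key
        return slice(key, key + 1 or None)

    def __getitem__(self, key):
        """0-based table access: ali[row, column] gives a new alignment."""
        row_key, col_key = key
        rs, cs = self._as_slice(row_key), self._as_slice(col_key)
        return TinyAlignment(self.ids[rs], [r[cs] for r in self._rows[rs]])

    def row(self, rank):
        """1-based row access."""
        if not 1 <= rank <= len(self._rows):
            raise IndexError("row rank out of range: {0}".format(rank))
        return self._rows[rank - 1]

    def column(self, rank):
        """1-based column access: one residue per row."""
        if not self._rows or not 1 <= rank <= len(self._rows[0]):
            raise IndexError("column rank out of range: {0}".format(rank))
        return tuple(r[rank - 1] for r in self._rows)


ali = TinyAlignment.parse(">a\nABC\n>b\nA-C")
print(ali[0, 1:3].row(1))   # BC
print(ali.column(2))        # ('B', '-')
```

In this sketch, ali[0, 1:3] selects row #1 and columns #2..#3, which is the same data that row(1) and column(2), column(3) expose through 1-based ranks.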

Several AbstractAlignment implementations are defined here. SequenceAlignment is the default, general-purpose one. A3MAlignment is more specialized: the first sequence in the alignment is a master sequence. This alignment type is usually used in the context of HHpred. Of particular importance is StructureAlignment, which is an alignment of csb.bio.structure.Chain objects. The residues in every aligned sequence are the actual csb.bio.structure.Residue objects taken from those chains.
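
As an illustration of what an operation "relative to a master sequence" can look like, the following sketch keeps only those alignment columns in which the first (master) row has a residue. The function keep_master_columns is hypothetical and is not the A3MAlignment API; it works on plain gapped strings and assumes '-' as the gap character.

```python
def keep_master_columns(rows, gap="-"):
    """Drop every column in which the master (first) row has a gap.

    Hypothetical helper for illustration; rows are equal-length strings.
    """
    if not rows:
        return []
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")

    master = rows[0]
    # 0-based indices of the columns where the master has a residue
    keep = [i for i, residue in enumerate(master) if residue != gap]
    return ["".join(row[i] for i in keep) for row in rows]


rows = ["AB-C",   # master sequence
        "A-DC",
        "ABDC"]
print(keep_master_columns(rows))   # ['ABC', 'A-C', 'ABC']
```

The third column is removed because the master has a gap there, so the result contains no insertions with respect to the master sequence.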

Submodules

Classes
  A3MAlignment
A specific type of multiple alignment, which provides some operations relative to a master sequence (the first entry in the alignment).
  AbstractAlignment
Base class for all alignment objects.
  AbstractSequence
Base abstract class for all Sequence objects.
  AlignedSequenceAdapter
Adapter that wraps a gapped AbstractSequence object and makes it compatible with the MSA row/entry interface expected by AbstractAlignment.
  AlignmentFormats
Enumeration of multiple sequence alignment formats.
  AlignmentRowsTable
  AlignmentTypes
Enumeration of alignment strategies.
  ChainSequence
Sequence view for csb.bio.structure.Chain objects.
  ColumnIndexer
  ColumnInfo
  ColumnPositionError
  DuplicateSequenceError
  NucleicAlphabet
Nucleic sequence alphabet.
  PositionError
  ProteinAlphabet
Protein sequence alphabet.
  ResidueInfo
  RichSequence
Sequence implementation that converts the sequence into a list of ResidueInfo objects.
  Sequence
High-performance sequence object.
  SequenceAdapter
Base wrapper class for AbstractSequence objects.
  SequenceAlignment
Multiple sequence alignment.
  SequenceAlphabets
Sequence alphabet enumerations.
  SequenceCollection
Represents a list of AbstractSequences.
  SequenceError
  SequenceIndexer
  SequenceNotFoundError
  SequencePositionError
  SequenceTypes
Enumeration of sequence types.
  SliceHelper
  StdProteinAlphabet
Standard protein sequence alphabet.
  StructureAlignment
Multiple structure alignment.
  UngappedSequenceIndexer
  UnknownAlphabet
Unknown sequence alphabet.
Variables
  __package__ = 'csb.bio.sequence'