
Module fasta

FASTA format sequence I/O.

This module provides parsers and writers for sequences and alignments in FASTA format. The most basic usage is:

>>> parser = SequenceParser()
>>> parser.parse_file('sequences.fa')
<SequenceCollection>   # collection of AbstractSequence objects

This loads all sequences into memory. To parse a very large file, read it one sequence at a time instead:

>>> for seq in parser.read('sequences.fa'):
        ...            # seq is an AbstractSequence
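
The difference between the two reading styles can be illustrated without CSB. The following is a minimal, standard-library-only sketch and not CSB's implementation; `iter_fasta` and the sample data are hypothetical names introduced here. The generator yields one (header, sequence) pair at a time, so only the current record needs to be held in memory.

```python
import io


def iter_fasta(stream):
    """Yield (header, sequence) tuples from a FASTA text stream, one record at a time."""
    header, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith('>'):
            if header is not None:
                # a new header closes the previous record
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)               # sequence data may span several lines
    if header is not None:
        yield header, ''.join(chunks)         # emit the last record


# Hypothetical in-memory sample standing in for 'sequences.fa'
sample = io.StringIO(">seq1 first\nACDE\nFGH\n>seq2 second\nKLMN\n")
records = list(iter_fasta(sample))
```

Wrapping the generator in `list(...)` reproduces the load-everything behaviour; iterating over it directly keeps memory use bounded by the largest single record.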

BaseSequenceParser is the central class in this module; it defines a common infrastructure for all sequence readers. SequenceParser is the standard implementation, and PDBSequenceParser is specialized to read FASTA sequences with PDB headers.
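
The split between a base parser and specialized subclasses can be sketched as a template-method design. This is an illustration only: the class names, the `read_entry` hook, the dictionary result, and the assumed PDB-style header layout (`1abc_A mol:protein length:4  NAME`) are all assumptions made here, and CSB's actual classes may be organized differently.

```python
class BaseParserSketch:
    """Hypothetical base class: splits FASTA text into records and
    delegates header interpretation to subclasses."""

    def parse_string(self, text):
        entries = []
        header, residues = None, []
        for line in text.splitlines():
            line = line.strip()
            if line.startswith('>'):
                if header is not None:
                    entries.append(self.read_entry(header, ''.join(residues)))
                header, residues = line[1:], []
            elif line and header is not None:
                residues.append(line)
        if header is not None:
            entries.append(self.read_entry(header, ''.join(residues)))
        return entries

    def read_entry(self, header, sequence):
        # hook that each concrete parser must implement
        raise NotImplementedError


class PlainParserSketch(BaseParserSketch):
    def read_entry(self, header, sequence):
        # the ID is taken to be the first whitespace-delimited token
        entry_id = (header.split() or [''])[0]
        return {'id': entry_id, 'header': header, 'sequence': sequence}


class PDBParserSketch(BaseParserSketch):
    def read_entry(self, header, sequence):
        # assumed header layout: '<accession>_<chain> mol:... length:...  NAME'
        entry_id = (header.split() or [''])[0]
        accession, _, chain = entry_id.partition('_')
        return {'id': entry_id, 'accession': accession, 'chain': chain,
                'header': header, 'sequence': sequence}


text = ">1abc_A mol:protein length:4  EXAMPLE\nACDE\n"
plain = PlainParserSketch().parse_string(text)[0]
pdb = PDBParserSketch().parse_string(text)[0]
```

Both subclasses reuse the same record-splitting loop and differ only in how they interpret the header line.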

For parsing alignments, have a look at SequenceAlignmentReader and StructureAlignmentFactory.

Finally, this module provides a number of OutputBuilders, which know how to write AbstractSequence and AbstractAlignment objects to FASTA files:

>>> with open('file.fa', 'w') as out:
        builder = OutputBuilder.create(AlignmentFormats.FASTA, out)
        builder.add_alignment(alignment)
        builder.add_sequence(sequence)
        ...

or you could instantiate any of the OutputBuilders directly.
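
The choice between a factory call and direct instantiation can be illustrated with a small, self-contained sketch. It does not reproduce CSB's code: `BuilderSketch`, `FastaBuilderSketch`, the `FORMAT` attribute, the string format key, and the 60-column line width are hypothetical choices made for this example.

```python
import io


class BuilderSketch:
    """Hypothetical base builder that writes formatted records to a stream."""

    FORMAT = None

    def __init__(self, output):
        self._out = output

    @staticmethod
    def create(fmt, output):
        # pick the subclass registered for the requested format
        for cls in BuilderSketch.__subclasses__():
            if cls.FORMAT == fmt:
                return cls(output)
        raise ValueError('Unsupported format: {0}'.format(fmt))

    def add_sequence(self, header, sequence, width=60):
        raise NotImplementedError


class FastaBuilderSketch(BuilderSketch):
    FORMAT = 'fasta'

    def add_sequence(self, header, sequence, width=60):
        # header line, then the residues wrapped at `width` columns
        self._out.write('>' + header + '\n')
        for i in range(0, len(sequence), width):
            self._out.write(sequence[i:i + width] + '\n')


out = io.StringIO()
builder = BuilderSketch.create('fasta', out)   # factory-style construction
builder.add_sequence('seq1', 'ACDEFGH')
result = out.getvalue()

direct_out = io.StringIO()
FastaBuilderSketch(direct_out).add_sequence('seq1', 'ACDEFGH')   # direct construction
```

The factory is convenient when the format is chosen at run time; direct instantiation is simpler when the format is known in advance.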

Classes
  A3MOutputBuilder
      Formats sequences as A3M strings.
  A3MSequenceIterator
  BaseSequenceParser
      FASTA parser template.
  FASTAOutputBuilder
      Formats sequences as standard FASTA strings.
  OutputBuilder
      Base sequence/alignment string format builder.
  PDBSequenceParser
      PDB FASTA parser.
  PIROutputBuilder
      Formats sequences as PIR FASTA strings, recognized by Modeller.
  SequenceAlignmentReader
      Sequence alignment parser.
  SequenceFormatError
  SequenceParser
      Standard FASTA parser.
  StructureAlignmentFactory
      Protein structure alignment parser.
  klass
      Formats sequences as PIR FASTA strings, recognized by Modeller.

Variables
  __package__ = 'csb.bio.io'