Module fasta

FASTA format sequence I/O.

This module provides parsers and writers for sequences and alignments in FASTA format. The most basic usage is:

>>> parser = SequenceParser()
>>> parser.parse_file('sequences.fa')
<SequenceCollection>   # collection of AbstractSequence objects

This loads all sequences into memory. If you are parsing a huge file, you can instead read it one sequence at a time, so that only the current record is held in memory:

>>> for seq in parser.read('sequences.fa'):
        ...            # seq is an AbstractSequence
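
The idea behind reading sequence by sequence can be sketched with a plain generator. This is a minimal illustration that does not use csb and is not the library's implementation; the function name iter_fasta and the (header, residues) tuple it yields are assumptions made for this example.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_fasta(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, residues) pairs one record at a time.

    Hypothetical helper: only the record currently being assembled
    is kept in memory, regardless of the size of the input.
    """
    header = None
    chunks = []
    for line in stream:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header = line[1:]             # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)           # residue line of the current record
    if header is not None:
        yield header, ''.join(chunks)     # emit the last record


# An in-memory stream stands in for a large file on disk.
sample = io.StringIO(">seq1 first\nACDE\nFGH\n>seq2\nKLMN\n")
records = list(iter_fasta(sample))
```

Because the function is a generator, a caller that loops over it directly (rather than wrapping it in list) never materializes more than one record.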

BaseSequenceParser is the central class in this module; it defines a common infrastructure for all sequence readers. SequenceParser is the standard implementation, and PDBSequenceParser is specialized for FASTA sequences with PDB headers.
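
The relationship between a parser template and its specializations can be sketched as follows. This is a hedged, self-contained illustration of the general pattern, not csb code: the class names, the read_record hook, the dictionary records and the assumed PDB-style header layout ("1abc_A mol:protein ...") are all hypothetical.

```python
from abc import ABC, abstractmethod


class RecordParserSketch(ABC):
    """Hypothetical template: splits FASTA text into records and
    delegates the interpretation of each record to a subclass."""

    def parse_string(self, text):
        results, current = [], []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                if current:
                    results.append(self.read_record(current))
                current = [line]          # start a new record
            elif current:
                current.append(line)      # residue line
        if current:
            results.append(self.read_record(current))
        return results

    @abstractmethod
    def read_record(self, lines):
        """Turn the lines of one record (header first) into an object."""


class PlainParserSketch(RecordParserSketch):
    def read_record(self, lines):
        header = lines[0][1:]
        identifier = (header.split() or [''])[0]
        return {'id': identifier, 'header': header,
                'residues': ''.join(lines[1:])}


class PDBStyleParserSketch(PlainParserSketch):
    def read_record(self, lines):
        record = super().read_record(lines)
        # Assumed header layout: '<accession>_<chain> ...'
        accession, _, chain = record['id'].partition('_')
        record['accession'] = accession
        record['chain'] = chain
        return record


text = (">1abc_A mol:protein length:4  EXAMPLE\nACDE\n"
        ">2xyz_B mol:protein length:3  OTHER\nFGH\n")
parsed = PDBStyleParserSketch().parse_string(text)
```

The record splitting lives in one place, and each specialization only overrides how a single record is interpreted.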

For parsing alignments, have a look at SequenceAlignmentReader and StructureAlignmentFactory.

Finally, this module provides a number of OutputBuilders, which know how to write AbstractSequence and AbstractAlignment objects to FASTA files:

>>> with open('file.fa', 'w') as out:
        builder = OutputBuilder.create(AlignmentFormats.FASTA, out)
        builder.add_alignment(alignment)
        builder.add_sequence(sequence)
        ...

Alternatively, you can instantiate any of the OutputBuilders directly.
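
The two ways of obtaining a builder (through a factory keyed by format, or by direct instantiation) can be sketched without csb. The names FastaWriterSketch, WRITERS and create_writer are hypothetical, sequences are passed as plain (header, residues) pairs rather than csb objects, and the line width of 60 is an illustrative choice, not a statement about how the library formats its output.

```python
import io


class FastaWriterSketch:
    """Hypothetical builder that writes FASTA records to a stream."""

    def __init__(self, output, width=60):
        self._out = output
        self._width = width               # residues per output line

    def add_sequence(self, header, residues):
        self._out.write('>' + header + '\n')
        for start in range(0, len(residues), self._width):
            self._out.write(residues[start:start + self._width] + '\n')

    def add_alignment(self, rows):
        # An alignment is treated here as an iterable of gapped rows.
        for header, residues in rows:
            self.add_sequence(header, residues)


# Registry used by the factory; other formats would be added here.
WRITERS = {'fasta': FastaWriterSketch}


def create_writer(fmt, output):
    """Return a writer for the requested format name."""
    try:
        return WRITERS[fmt](output)
    except KeyError:
        raise ValueError('unsupported format: ' + fmt) from None


buffer = io.StringIO()
writer = create_writer('fasta', buffer)            # via the factory
writer.add_alignment([('a', 'AC-E'), ('b', 'ACDE')])
writer.add_sequence('c', 'KLMN')
text = buffer.getvalue()

direct = FastaWriterSketch(io.StringIO(), width=2)  # direct instantiation
```

The factory keeps calling code independent of the concrete class, while direct instantiation gives access to class-specific options such as the width parameter in this sketch.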

Classes
	A3MOutputBuilder Formats sequences as A3M strings.
	A3MSequenceIterator
	BaseSequenceParser FASTA parser template.
	FASTAOutputBuilder Formats sequences as standard FASTA strings.
	OutputBuilder Base sequence/alignment string format builder.
	PDBSequenceParser PDB FASTA parser.
	PIROutputBuilder Formats sequences as PIR FASTA strings, recognized by Modeller.
	SequenceAlignmentReader Sequence alignment parser.
	SequenceFormatError
	SequenceParser Standard FASTA parser.
	StructureAlignmentFactory Protein structure alignment parser.

Variables
	__package__ = `'csb.bio.io'`