Module wwpdb

PDB structure parsers, format builders and database providers.

The most basic usage is:

>>> from csb.bio.io.wwpdb import StructureParser
>>> parser = StructureParser('structure.pdb')
>>> parser.parse_structure()
<Structure>     # a Structure object (model)

or if this is an NMR ensemble:

>>> parser.parse_models()
<Ensemble>      # an Ensemble object (collection of alternative Structures)

This module introduces a family of PDB file parsers. The common interface of all parsers is defined in AbstractStructureParser. This class has several implementations:

RegularStructureParser - handles normal PDB files with SEQRES fields
LegacyStructureParser - reads structures from legacy or malformed PDB files that lack SEQRES records (initializes all residues from the ATOM records instead)
PDBHeaderParser - reads only the headers of the PDB files and produces structures without coordinates. Useful for reading metadata (e.g. accession numbers or just plain SEQRES sequences) with minimum overhead

Unless you have a special reason, use the StructureParser factory, which returns the appropriate AbstractStructureParser implementation for the input PDB file. If the input looks like a regular PDB file, the factory returns a RegularStructureParser; otherwise it instantiates a LegacyStructureParser. StructureParser is in fact an alias for AbstractStructureParser.create_parser.
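
The dispatch rule described here can be illustrated without csb itself. The snippet below is a minimal sketch, not the library's implementation: pick_parser_name is a hypothetical helper that scans PDB text for a SEQRES record and returns the name of the parser class the factory would be expected to choose. The records fed to it are toy lines made up for the illustration.

```python
import io


def pick_parser_name(stream):
    """Return the name of the parser suited to the PDB text in `stream`.

    Hypothetical helper: it only mirrors the rule stated in the text
    (SEQRES present -> regular parser, otherwise legacy parser).
    """
    for line in stream:
        # PDB record names occupy the first six columns of a line
        if line[:6] == 'SEQRES':
            return 'RegularStructureParser'
    return 'LegacyStructureParser'


# Toy inputs: one with a SEQRES record, one with ATOM records only
regular = io.StringIO(
    "HEADER    EXAMPLE\n"
    "SEQRES   1 A    2  ALA GLY\n"
    "ATOM      1  N   ALA A   1\n"
)
legacy = io.StringIO("ATOM      1  N   ALA A   1\n")

print(pick_parser_name(regular))
print(pick_parser_name(legacy))
```

With csb installed, the same decision is made for you by calling StructureParser on the file name.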

Writing your own customized PDB parser is easy. Suppose you are trying to parse a PDB-like file that misuses the charge column to store custom information. This will certainly crash RegularStructureParser (and rightly so), but you can create your own parser as a workaround. All you need to do is override the virtual _read_charge hook method:

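   from csb.bio.io.wwpdb import RegularStructureParser, StructureFormatError

   class CustomParser(RegularStructureParser):

       def _read_charge(self, line):
           # Fall back to "no charge" when the column holds non-standard data
           try:
               return super(CustomParser, self)._read_charge(line)
           except StructureFormatError:
               return None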
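
To see the hook pattern in isolation, here is a self-contained sketch that runs without csb. StrictReader, TolerantReader, ChargeFormatError and toy_atom_line are all hypothetical stand-ins invented for this illustration; the strict reader's logic is an assumption and is not claimed to match what RegularStructureParser does internally. The only outside fact used is that the PDB format places the formal charge in columns 79-80 of an ATOM record (e.g. "2+" or "1-").

```python
class ChargeFormatError(ValueError):
    """Stand-in for csb's StructureFormatError in this sketch."""


class StrictReader(object):
    """Hypothetical stand-in for a parser exposing a _read_charge hook."""

    def _read_charge(self, line):
        # Columns 79-80 (0-based slice 78:80) hold the charge, e.g. '2+' or '1-'
        field = line[78:80].strip()
        if not field:
            return None
        if len(field) == 2 and field[0].isdigit() and field[1] in '+-':
            # '2+' -> int('+2') -> 2, '1-' -> int('-1') -> -1
            return int(field[1] + field[0])
        raise ChargeFormatError('bad charge field: {0!r}'.format(field))


class TolerantReader(StrictReader):
    """Overrides the hook and maps unreadable charges to None."""

    def _read_charge(self, line):
        try:
            return super(TolerantReader, self)._read_charge(line)
        except ChargeFormatError:
            return None


def toy_atom_line(charge_field):
    # Toy ATOM record, padded so that charge_field lands in columns 79-80
    return 'ATOM      1  N   ALA A   1'.ljust(78) + charge_field


print(TolerantReader()._read_charge(toy_atom_line('2+')))
print(TolerantReader()._read_charge(toy_atom_line('XY')))
```

The subclass changes one small, well-defined step and inherits everything else, which is why overriding a single hook is enough for a file that only abuses one column.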

Another important abstraction in this module is StructureProvider. It has several implementations which can be used to retrieve PDB Structures from various sources: file system directories, remote URLs, etc. You can easily create your own provider as well. See StructureProvider for details.

Finally, this module gives you some FileBuilders, used for text serialization of Structures and Ensembles:

>>> builder = PDBFileBuilder(stream)
>>> builder.add_header(structure)
>>> builder.add_structure(structure)

where stream is any Python stream, e.g. an open file or sys.stdout.

See Ensemble and Structure from csb.bio.structure for details on these objects.

Classes
	AbstractResidueMapper Defines the base interface of all residue mappers, used to align PDB ATOM records to the real (SEQRES) sequence of a chain.
	AbstractStructureParser A base PDB structure format-aware parser.
	AsyncParseResult
	AsyncStructureParser Wraps StructureParser in an asynchronous call.
	CombinedResidueMapper The best of both worlds: attempts to map the residues using FastResidueMapper, but upon failure secures success by switching to RobustResidueMapper.
	CustomStructureProvider A custom PDB data source.
	DegenerateID Looks like a StandardID, except that the accession number may have arbitrary length.
	EntryID Represents a PDB Chain identifier.
	FastResidueMapper RegExp-based residue mapper.
	FileBuilder Abstract base class for all structure file formatters.
	FileSystemStructureProvider Simple file system based PDB data source.
	HeaderFormatError
	InvalidEntryIDError
	LegacyStructureParser This is a customized PDB parser, which is designed to read both sequence and atom data from the ATOM section.
	PDBEnsembleFileBuilder Supports serialization of NMR ensembles.
	PDBFileBuilder PDB file format builder.
	PDBHeaderParser Ultra fast PDB HEADER parser.
	PDBParseError
	RegularStructureParser This is the de facto PDB parser, which is designed to read SEQRES and ATOM sections separately, and then map them.
	RemoteStructureProvider Retrieves PDB structures from a specified remote URL.
	ResidueInfo High-performance struct, which functions as a container for unmapped Atoms.
	ResidueMappingError
	RobustResidueMapper Exhaustive residue mapper, which uses Needleman-Wunsch global alignment.
	SecStructureFormatError
	SeqResID Same as a StandardID, but contains an additional underscore between the accession number and the chain identifier.
	SparseChainSequence Sequence view for reference (SEQRES) or sparse (ATOM) PDB chains.
	StandardID Standard PDB ID in the following form: xxxxY, where xxxx is the accession number (lower case) and Y is an optional chain identifier.
	StructureFormatError
	StructureNotFoundError
	StructureProvider Base class for all PDB data source providers.
	UnknownPDBResidueError
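
The ID formats described for StandardID and SeqResID in the list above can be sketched with two regular expressions. This is an illustration only: split_standard_id and split_seqres_id are hypothetical helpers, not methods of the csb classes, and the sketch assumes a four-character accession number and, for the underscore form, exactly one chain character.

```python
import re

# Hypothetical patterns for the two formats; not taken from csb's source
STANDARD = re.compile(r'^([0-9a-z]{4})(.?)$', re.IGNORECASE)
SEQRES = re.compile(r'^([0-9a-z]{4})_(.)$', re.IGNORECASE)


def split_standard_id(entry_id):
    """Split 'xxxxY' into (accession, chain); chain is '' when absent."""
    match = STANDARD.match(entry_id.strip())
    if match is None:
        raise ValueError('not a standard PDB ID: {0!r}'.format(entry_id))
    accession, chain = match.groups()
    # The accession number is reported in lower case; the chain is case-sensitive
    return accession.lower(), chain


def split_seqres_id(entry_id):
    """Split 'xxxx_Y' into (accession, chain)."""
    match = SEQRES.match(entry_id.strip())
    if match is None:
        raise ValueError('not a SEQRES-style PDB ID: {0!r}'.format(entry_id))
    accession, chain = match.groups()
    return accession.lower(), chain


print(split_standard_id('1abcA'))
print(split_seqres_id('1abc_A'))
```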

Functions
	StructureParser(structure_file, check_ss=False, mapper=None) A StructureParser factory, which instantiates and returns the proper parser object based on the contents of the PDB file. Returns AbstractStructureParser.
	find(id, paths) Try to discover a PDB file for PDB id in paths. Returns str.
	get(accession, model=None, prefix='https://files.rcsb.org/download/') Download and parse a PDB entry. Returns Structure.

Variables
	PDB_AMINOACIDS = `{'2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', 'A...`
	PDB_NUCLEOTIDES = `{' M': 'Amino', 'A': 'Adenine', 'B': 'NotA'...`
	__package__ = `'csb.bio.io'`

Function Details

StructureParser(structure_file, check_ss=False, mapper=None)

A StructureParser factory, which instantiates and returns the proper parser object based on the contents of the PDB file.

If the file contains a SEQRES section, RegularStructureParser is returned, otherwise LegacyStructureParser is instantiated. In the latter case LegacyStructureParser will read the sequence data directly from the ATOMs.

Parameters:

structure_file (str) - the PDB file to parse
check_ss (bool) - if True, secondary structure errors in the file will cause SecStructureFormatError exceptions
mapper (AbstractResidueMapper) - residue mapper, used to align ATOM records to SEQRES. If None, use the default (CombinedResidueMapper)

Returns: AbstractStructureParser

find(id, paths)

Try to discover a PDB file for PDB id in paths.

Parameters:

id (str) - PDB ID of the entry
paths (list of str) - a list of directories to scan

Returns: str

path and file name on success, None otherwise
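
A directory scan of this kind might look like the sketch below. It is not csb's implementation: find_pdb_file is a hypothetical helper, and the two file name patterns it tries are assumptions chosen for the example, so the names the real find() looks for may differ.

```python
import os
import tempfile

# Hypothetical naming patterns; the names csb actually tries may differ
CANDIDATE_PATTERNS = ('{0}.pdb', 'pdb{0}.ent')


def find_pdb_file(entry_id, paths):
    """Return the first existing file for entry_id in paths, or None."""
    for directory in paths:
        for pattern in CANDIDATE_PATTERNS:
            candidate = os.path.join(directory, pattern.format(entry_id))
            if os.path.isfile(candidate):
                return candidate
    return None


# Demo in a throwaway directory holding one empty file
with tempfile.TemporaryDirectory() as directory:
    open(os.path.join(directory, '1abc.pdb'), 'w').close()
    print(find_pdb_file('1abc', [directory]))
    print(find_pdb_file('9xyz', [directory]))
```

Directories are searched in the order given, so earlier entries in paths take precedence when the same entry exists in several places.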

get(accession, model=None, prefix='https://files.rcsb.org/download/')

Download and parse a PDB entry.

Parameters:

accession (str) - accession number of the entry
model (str) - model identifier
prefix (str) - download URL prefix

Returns: Structure

object representation of the selected model
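
The role of the prefix parameter can be shown with a small sketch. Both helpers below are hypothetical and are not part of csb; in particular the '.pdb' suffix is an assumption of this example, since only the prefix appears in the documented signature. fetch_text is defined but not called, so running the snippet performs no network access.

```python
from urllib.request import urlopen


def build_download_url(accession,
                       prefix='https://files.rcsb.org/download/',
                       suffix='.pdb'):
    """Compose the URL for an entry (the suffix is an assumption of this sketch)."""
    return prefix + accession + suffix


def fetch_text(url):
    """Download url and return its body as text (not called in this example)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8')


print(build_download_url('1abc'))
print(build_download_url('1abc', prefix='https://example.org/mirror/'))
```

Pointing prefix at a different base URL is how a mirror or a local web server could be used instead of the default.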

Variables Details

PDB_AMINOACIDS

Value:

{'2AS': 'ASP',
 '3AH': 'HIS',
 '5HP': 'GLU',
 'ACL': 'ARG',
 'AGM': 'ARG',
 'AIB': 'ALA',
 'ALA': 'ALA',
 'ALM': 'ALA',
...
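
Judging by the entries shown, the table maps three-letter residue codes, including non-standard ones, to a standard amino acid (for example AIB to ALA). The sketch below shows one way such a table might be used. It copies only the eight entries visible in the truncated listing, and to_standard_residue is a hypothetical helper, not a csb function.

```python
# Only the entries visible in the truncated listing; the real table is longer
PDB_AMINOACIDS_EXCERPT = {
    '2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', 'ACL': 'ARG',
    'AGM': 'ARG', 'AIB': 'ALA', 'ALA': 'ALA', 'ALM': 'ALA',
}


def to_standard_residue(name, table=PDB_AMINOACIDS_EXCERPT):
    """Map a three-letter residue code to its standard name, or None if unknown."""
    return table.get(name.strip().upper())


for code in ('AIB', 'ALA', 'XXX'):
    print(code, to_standard_residue(code))
```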

PDB_NUCLEOTIDES

Value:

{'  M': 'Amino',
 'A': 'Adenine',
 'B': 'NotA',
 'C': 'Cytosine',
 'D': 'NotC',
 'DA': 'Adenine',
 'DC': 'Cytosine',
 'DG': 'Guanine',
...
