
Module wwpdb

PDB structure parsers, format builders and database providers.

The most basic usage is:

>>> parser = StructureParser('structure.pdb')
>>> parser.parse_structure()
<Structure>     # a Structure object (model)

or if this is an NMR ensemble:

>>> parser.parse_models()
<Ensemble>      # an Ensemble object (collection of alternative Structure-s)

This module introduces a family of PDB file parsers. The common interface of all parsers is defined in AbstractStructureParser. This class has several implementations, including RegularStructureParser and LegacyStructureParser.

Unless you have a special reason, you should use the StructureParser factory, which returns a proper AbstractStructureParser implementation, depending on the input PDB file. If the input file looks like a regular PDB file, the factory returns a RegularStructureParser, otherwise it instantiates LegacyStructureParser. StructureParser is in fact an alias for AbstractStructureParser.create_parser.

Writing your own customized PDB parser is easy. Suppose that you are trying to parse a PDB-like file which misuses the charge column to store custom info. This will certainly crash RegularStructureParser (for good reason), but you can create your own parser as a workaround. All you need to do is override the virtual _read_charge hook method:

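   from csb.bio.io.wwpdb import RegularStructureParser, StructureFormatError

   class CustomParser(RegularStructureParser):

       def _read_charge(self, line):
           # Try the regular charge parser first
           try:
               return super(CustomParser, self)._read_charge(line)
           # The column holds custom info: treat the charge as missing
           except StructureFormatError:
               return None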

Another important abstraction in this module is StructureProvider. It has several implementations which can be used to retrieve PDB Structures from various sources: file system directories, remote URLs, etc. You can easily create your own provider as well. See StructureProvider for details.

Finally, this module gives you some FileBuilders, used for text serialization of Structures and Ensembles:

>>> builder = PDBFileBuilder(stream)
>>> builder.add_header(structure)
>>> builder.add_structure(structure)

where stream is any Python stream, e.g. an open file or sys.stdout.

See Ensemble and Structure from csb.bio.structure for details on these objects.

Classes
  AbstractResidueMapper
Defines the base interface of all residue mappers, used to align PDB ATOM records to the real (SEQRES) sequence of a chain.
  AbstractStructureParser
A base PDB structure format-aware parser.
  AsyncParseResult
  AsyncStructureParser
Wraps StructureParser in an asynchronous call.
  CombinedResidueMapper
The best of both worlds: attempts to map the residues using FastResidueMapper, but upon failure secures success by switching to RobustResidueMapper.
  CustomStructureProvider
A custom PDB data source.
  DegenerateID
Looks like a StandardID, except that the accession number may have arbitrary length.
  EntryID
Represents a PDB Chain identifier.
  FastResidueMapper
RegExp-based residue mapper.
  FileBuilder
Abstract base class for all structure file formatters.
  FileSystemStructureProvider
Simple file system based PDB data source.
  HeaderFormatError
  InvalidEntryIDError
  LegacyStructureParser
This is a customized PDB parser, which is designed to read both sequence and atom data from the ATOM section.
  PDBEnsembleFileBuilder
Supports serialization of NMR ensembles.
  PDBFileBuilder
PDB file format builder.
  PDBHeaderParser
Ultra fast PDB HEADER parser.
  PDBParseError
  RegularStructureParser
This is the de facto PDB parser, which is designed to read SEQRES and ATOM sections separately, and then map them.
  RemoteStructureProvider
Retrieves PDB structures from a specified remote URL.
  ResidueInfo
High-performance struct, which functions as a container for unmapped Atoms.
  ResidueMappingError
  RobustResidueMapper
Exhaustive residue mapper, which uses Needleman-Wunsch global alignment.
  SecStructureFormatError
  SeqResID
Same as a StandardID, but contains an additional underscore between the accession number and the chain identifier.
  SparseChainSequence
Sequence view for reference (SEQRES) or sparse (ATOM) PDB chains.
  StandardID
Standard PDB ID in the following form: xxxxY, where xxxx is the accession number (lower case) and Y is an optional chain identifier.
  StructureFormatError
  StructureNotFoundError
  StructureProvider
Base class for all PDB data source providers.
  UnknownPDBResidueError
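The identifier formats described above for StandardID and SeqResID can be illustrated with a small, self-contained sketch. The helpers split_standard_id and split_seqres_id are hypothetical and are not part of this module; they assume a four-character accession number, as the xxxxY form suggests, and normalize it to lower case.

```python
def split_standard_id(entry_id):
    """Split 'xxxxY' into (accession, chain); chain is '' when absent."""
    entry_id = entry_id.strip()
    # Assumption: the accession number is exactly four characters long
    if len(entry_id) not in (4, 5):
        raise ValueError('Expected xxxx or xxxxY, got: {0!r}'.format(entry_id))
    return entry_id[:4].lower(), entry_id[4:]


def split_seqres_id(entry_id):
    """Split 'xxxx_Y' into (accession, chain)."""
    accession, separator, chain = entry_id.strip().partition('_')
    # The underscore is mandatory in this form
    if not separator or len(accession) != 4:
        raise ValueError('Expected xxxx_Y, got: {0!r}'.format(entry_id))
    return accession.lower(), chain
```

A DegenerateID would need a different splitting rule, because its accession number may have arbitrary length.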
Functions
AbstractStructureParser
StructureParser(structure_file, check_ss=False, mapper=None)
A StructureParser factory, which instantiates and returns the proper parser object based on the contents of the PDB file.
str
find(id, paths)
Try to discover a PDB file for PDB id in paths.
Structure
get(accession, model=None, prefix='https://files.rcsb.org/download/')
Download and parse a PDB entry.
Variables
  PDB_AMINOACIDS = {'2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', 'A...
  PDB_NUCLEOTIDES = {' M': 'Amino', 'A': 'Adenine', 'B': 'NotA'...
  __package__ = 'csb.bio.io'
Function Details

StructureParser(structure_file, check_ss=False, mapper=None)

A StructureParser factory, which instantiates and returns the proper parser object based on the contents of the PDB file.

If the file contains a SEQRES section, RegularStructureParser is returned, otherwise LegacyStructureParser is instantiated. In the latter case LegacyStructureParser will read the sequence data directly from the ATOMs.

Parameters:
Returns: AbstractStructureParser
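
The selection rule described above (SEQRES section present or absent) can be sketched without the library. This is only an illustration of the decision, not the factory's actual implementation: has_seqres and choose_parser_name are hypothetical helpers, and the check assumes that SEQRES records start at the beginning of a line.

```python
import io


def has_seqres(stream):
    """Return True if any line of the stream is a SEQRES record."""
    return any(line.startswith('SEQRES') for line in stream)


def choose_parser_name(stream):
    """Name the parser class the factory would be expected to pick."""
    if has_seqres(stream):
        return 'RegularStructureParser'
    return 'LegacyStructureParser'


# Two tiny in-memory "files": one with a SEQRES record, one without
regular = io.StringIO('HEADER    EXAMPLE\nSEQRES   1 A    2  GLY ALA\n')
legacy = io.StringIO('HEADER    EXAMPLE\nATOM      1  N   GLY A   1\n')

print(choose_parser_name(regular))  # prints "RegularStructureParser"
print(choose_parser_name(legacy))   # prints "LegacyStructureParser"
```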

find(id, paths)

Try to discover a PDB file for PDB id in paths.

Parameters:
  • id (str) - PDB ID of the entry
  • paths (list of str) - a list of directories to scan
Returns: str
path and file name on success, None otherwise
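
A directory scan of this kind might look like the sketch below. It is not the module's implementation: find_pdb_file is a hypothetical helper, and the file name patterns ("<id>.pdb" and "pdb<id>.ent") are assumptions chosen for illustration.

```python
import os


def find_pdb_file(pdb_id, paths, patterns=('{id}.pdb', 'pdb{id}.ent')):
    """Return the first matching file for pdb_id in paths, or None."""
    for directory in paths:
        for pattern in patterns:
            # Assumed naming conventions; adjust patterns to your layout
            candidate = os.path.join(directory, pattern.format(id=pdb_id))
            if os.path.isfile(candidate):
                return candidate
    return None
```

Directories are scanned in the order given, so earlier entries in paths take precedence when several copies of an entry exist.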

get(accession, model=None, prefix='https://files.rcsb.org/download/')

Download and parse a PDB entry.

Parameters:
  • accession (str) - accession number of the entry
  • model (str) - model identifier
  • prefix (str) - download URL prefix
Returns: Structure
object representation of the selected model
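
The role of the prefix parameter can be illustrated with a short sketch that only builds the download URL and fetches the raw text. It does not reproduce what get does internally: build_download_url and fetch_pdb_text are hypothetical helpers, and the ".pdb" file suffix is an assumption.

```python
from urllib.request import urlopen

DEFAULT_PREFIX = 'https://files.rcsb.org/download/'


def build_download_url(accession, prefix=DEFAULT_PREFIX):
    """Join the URL prefix and the accession number (assumed '.pdb' suffix)."""
    return '{0}{1}.pdb'.format(prefix, accession)


def fetch_pdb_text(accession, prefix=DEFAULT_PREFIX):
    """Download the entry and return its contents as text (needs network)."""
    with urlopen(build_download_url(accession, prefix)) as response:
        return response.read().decode('utf-8', errors='replace')
```

Pointing prefix at a local mirror leaves the rest of the calling code unchanged.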

Variables Details

PDB_AMINOACIDS

Value:
{'2AS': 'ASP',
 '3AH': 'HIS',
 '5HP': 'GLU',
 'ACL': 'ARG',
 'AGM': 'ARG',
 'AIB': 'ALA',
 'ALA': 'ALA',
 'ALM': 'ALA',
...
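
As the entries above suggest, this table maps three-letter PDB residue names, including modified residues, to standard amino acid names. A lookup might be sketched as follows; KNOWN_SUBSET contains only the entries shown above, and to_standard is a hypothetical helper rather than part of this module.

```python
# Only the entries visible in the truncated listing above
KNOWN_SUBSET = {
    '2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', 'ACL': 'ARG',
    'AGM': 'ARG', 'AIB': 'ALA', 'ALA': 'ALA', 'ALM': 'ALA',
}


def to_standard(residue_name, table=KNOWN_SUBSET):
    """Return the standard residue name, or None if it is not in the table."""
    return table.get(residue_name.strip().upper())


print(to_standard('AIB'))  # prints "ALA"
print(to_standard('XYZ'))  # prints "None"
```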

PDB_NUCLEOTIDES

Value:
{'  M': 'Amino',
 'A': 'Adenine',
 'B': 'NotA',
 'C': 'Cytosine',
 'D': 'NotC',
 'DA': 'Adenine',
 'DC': 'Cytosine',
 'DG': 'Guanine',
...