The csb.bio.structure module defines some of the most fundamental
abstractions in the library: Structure, Chain, Residue and Atom.
Instances of these objects can exist independently, but usually they
are part of a Composite aggregation.
The root node in this Composite is a Structure (or Ensemble).
Structure-s are composed of Chain-s, and each Chain is a collection
of Residue-s. The leaf nodes are Atom-s.
All of these objects implement the base AbstractEntity interface.
Therefore, every node in the Composite can be transformed:
>>> r, t = [rotation matrix], [translation vector]
>>> entity.transform(r, t)
and it knows its immediate children:
>>> entity.items
<iterator> # over all immediate child entities
If you want to traverse the complete Composite tree, starting at an
arbitrary level and going down to the lowest level, use one of the
CompositeEntityIterators. Or just call AbstractEntity.components():
>>> entity.components()
<iterator> # over all descendants, of any type, at any level
>>> entity.components(klass=Residue)
<iterator> # over all Residue descendants
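The difference between items and components() can be illustrated with a minimal, self-contained sketch. The Entity class and its subclasses below are hypothetical stand-ins that only mimic the Composite behaviour described above; they are not the csb.bio.structure classes, and the real implementation may differ.

```python
# Hypothetical stand-ins that mimic the Composite idea described above.
# They are NOT the csb.bio.structure classes.

class Entity(object):
    """A node that knows its immediate children and can walk its subtree."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    @property
    def items(self):
        # Iterator over the immediate children only
        return iter(self._children)

    def components(self, klass=None):
        # Depth-first iterator over all descendants, optionally
        # restricted to instances of klass
        for child in self._children:
            if klass is None or isinstance(child, klass):
                yield child
            for descendant in child.components(klass):
                yield descendant


class Atom(Entity):
    pass


class Residue(Entity):
    pass


class Chain(Entity):
    pass


chain = Chain('A', [
    Residue('ALA', [Atom('N'), Atom('CA')]),
    Residue('GLY', [Atom('N')]),
])

children = [e.name for e in chain.items]                # residues only
descendants = [e.name for e in chain.components()]      # residues and atoms
atoms = [e.name for e in chain.components(klass=Atom)]  # atoms only
```

Here children holds the two residues, descendants interleaves each residue with its atoms, and atoms contains the leaf nodes only.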
Some of the inner objects in this hierarchy behave just like dictionaries (but are not):
>>> structure.chains['A'] # access chain A by ID
<Chain A: Protein>
>>> structure['A'] # the same
<Chain A: Protein>
>>> residue.atoms['CA']
<Atom: CA> # access an atom by its name
>>> residue['CA']
<Atom: CA> # the same
Others behave like list collections:
>>> chain.residues[10] # 1-based access to the residues in the chain
<ProteinResidue [10]: PRO 10>
>>> chain[10] # 0-based, list-like access
<ProteinResidue [11]: GLY 11>
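The two indexing conventions are easy to mix up, so here is a small sketch of how a 1-based collection and a 0-based container can sit over the same list. OneBasedCollection and MiniChain are hypothetical names invented for this illustration; they are an assumption about the general pattern, not the CSB implementation.

```python
class OneBasedCollection(object):
    """Hypothetical 1-based, read-only view over a list."""

    def __init__(self, items):
        self._items = items

    def __getitem__(self, rank):
        # Ranks start at 1, so rank 0 and negative ranks are rejected
        if not 1 <= rank <= len(self._items):
            raise IndexError('rank out of range: {0}'.format(rank))
        return self._items[rank - 1]

    def __len__(self):
        return len(self._items)


class MiniChain(object):
    """Hypothetical chain exposing both access styles."""

    def __init__(self, residues):
        self._residues = list(residues)
        self.residues = OneBasedCollection(self._residues)  # 1-based

    def __getitem__(self, index):
        return self._residues[index]                        # 0-based


chain = MiniChain(['MET', 'PRO', 'GLY'])

by_rank = chain.residues[2]   # second residue
by_index = chain[2]           # third residue
```

With the same number, the 1-based lookup returns the residue one position earlier than the list-like lookup, which matches the PRO 10 / GLY 11 example above.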
Step-wise building of Ensemble-s, Chain-s and Residue-s is supported
through a number of append methods, for example:
>>> residue = ProteinResidue(401, ProteinAlphabet.ALA)
>>> s.chains['A'].residues.append(residue)
See EnsembleModelsCollection, StructureChainsTable,
ChainResiduesCollection and ResidueAtomsTable in our API docs for more
details.
Some other objects in this module of potential interest are the
self-explanatory SecondaryStructure and TorsionAngles.
CSB comes with a number of PDB structure parsers, format builders and
database providers, all defined in the csb.bio.io.wwpdb package.
The most basic usage is:
>>> parser = StructureParser('structure.pdb')
>>> parser.parse_structure()
<Structure> # a Structure object (model)
or if this is an NMR ensemble:
>>> parser.parse_models()
<Ensemble> # an Ensemble object (collection of alternative Structure-s)
This module introduces a family of PDB file parsers. The common interface
of all parsers is defined in AbstractStructureParser. This class has
several implementations:
RegularStructureParser - handles normal PDB files with SEQRES fields
LegacyStructureParser - reads structures from legacy or malformed PDB
  files, which are lacking SEQRES records (initializes all residues from
  the ATOMs instead)
PDBHeaderParser - reads only the headers of the PDB files and produces
  structures without coordinates. Useful for reading metadata (e.g.
  accession numbers or just plain SEQRES sequences) with minimum overhead
Unless you have a special reason, you should use the StructureParser
factory, which returns a proper AbstractStructureParser implementation,
depending on the input PDB file. If the input file looks like a regular
PDB file, the factory returns a RegularStructureParser, otherwise it
instantiates LegacyStructureParser. StructureParser is in fact an
alias for AbstractStructureParser.create_parser.
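The factory decision can be pictured with a short sketch. How CSB actually decides that a file "looks like a regular PDB file" is not spelled out here, so the check below (presence of at least one SEQRES record) is an assumption derived from the descriptions above. The functions has_seqres and choose_parser_name are hypothetical helpers, not part of CSB, and the sample records are made up for illustration.

```python
import io


def has_seqres(stream):
    """Return True if any line in the stream starts a SEQRES record."""
    return any(line.startswith('SEQRES') for line in stream)


def choose_parser_name(stream):
    # Hypothetical stand-in for the factory decision: it returns the
    # name of the parser class that would be a sensible choice
    if has_seqres(stream):
        return 'RegularStructureParser'
    return 'LegacyStructureParser'


# Made-up, abbreviated records; only the record names matter here
regular_text = (u"HEADER    DEMO\n"
                u"SEQRES   1 A    2  ALA GLY\n"
                u"ATOM      1  N   ALA A   1\n")
legacy_text = (u"HEADER    DEMO\n"
               u"ATOM      1  N   ALA A   1\n")

regular_choice = choose_parser_name(io.StringIO(regular_text))
legacy_choice = choose_parser_name(io.StringIO(legacy_text))
```

The point of the factory is that calling code never needs this branch itself; it asks for a parser and gets an implementation suited to the file.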
Writing your own, customized PDB parser is easy. Suppose that you are
trying to parse a PDB-like file which misuses the charge column to store
custom info. This will certainly crash the standard
AbstractStructureParser implementations (and rightly so), but you can
create your own parser as a workaround. All you need to do is override
the virtual _read_charge hook method:
class CustomParser(RegularStructureParser):

    def _read_charge(self, line):
        try:
            return super(CustomParser, self)._read_charge(line)
        except StructureFormatError:
            return None
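To see the hook override in isolation, here is a self-contained sketch that runs without CSB. BaseParser, TolerantParser, FormatError and make_line are all hypothetical stand-ins that only mimic the pattern; the real parser internals may look different. The column positions follow the PDB format: atom name in columns 13-16 and charge in columns 79-80.

```python
class FormatError(ValueError):
    """Hypothetical stand-in for StructureFormatError."""


class BaseParser(object):
    """Hypothetical strict parser; not the CSB class."""

    def parse_atom(self, line):
        # Template method: the charge column is delegated to a hook
        return {'name': line[12:16].strip(),
                'charge': self._read_charge(line)}

    def _read_charge(self, line):
        field = line[78:80].strip()
        if not field:
            return None
        # PDB charges are written digit first, sign second, e.g. "2+"
        if len(field) == 2 and field[0].isdigit() and field[1] in '+-':
            return int(field[1] + field[0])
        raise FormatError('Malformed charge: {0!r}'.format(field))


class TolerantParser(BaseParser):
    """Overrides only the hook; everything else is inherited."""

    def _read_charge(self, line):
        try:
            return super(TolerantParser, self)._read_charge(line)
        except FormatError:
            return None


def make_line(name, charge):
    # Build an 80-column line with just the two fields used above
    return 'ATOM'.ljust(12) + name.ljust(4) + ' ' * 62 + charge.ljust(2)


good = make_line('CA', '2+')
bad = make_line('CA', 'XY')   # charge column misused for custom info

strict_good = BaseParser().parse_atom(good)
tolerant_bad = TolerantParser().parse_atom(bad)

try:
    BaseParser().parse_atom(bad)
    strict_failed = False
except FormatError:
    strict_failed = True
```

The strict parser raises on the misused column, while the subclass falls back to None and keeps the rest of the parsing logic untouched.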
Another important abstraction in this module is StructureProvider.
It has several implementations which can be used to retrieve PDB
Structures from various sources: file system directories, remote
URLs, etc. You can easily create your own provider as well. See
StructureProvider for details.
Finally, this module gives you some FileBuilder-s, used for text
serialization of Structure-s and Ensemble-s:
>>> builder = PDBFileBuilder(stream)
>>> builder.add_header(structure)
>>> builder.add_structure(structure)
where stream is any Python stream, e.g. an open file or sys.stdout.
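The "any Python stream" idea can be sketched without CSB. MiniBuilder and its add_record method are hypothetical and exist only for this illustration; they are not the PDBFileBuilder API. The sketch assumes a builder that simply writes 80-column text lines to whatever object it is given, as long as that object has a write method.

```python
import io
import sys


class MiniBuilder(object):
    """Hypothetical stand-in for a file builder; not the CSB class."""

    def __init__(self, output):
        self._out = output

    def add_record(self, text):
        # PDB records are 80 columns wide, so pad with spaces
        self._out.write(u'{0:<80}\n'.format(text))


# An in-memory stream, handy for tests
buffer = io.StringIO()
MiniBuilder(buffer).add_record(u'REMARK   demo')
written = buffer.getvalue()

# The same builder works with the console stream
MiniBuilder(sys.stdout).add_record(u'REMARK   demo')
```

Because the builder depends only on write, the same code serves files, in-memory buffers and the console.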