How to calculate percentile based spread and other ProteinRanger features
Load pdb-files
Select the pdb file using the file chooser and click one of the two buttons
marked with either a blue or a red arrow. The order in which files are chosen doesn't matter.
You can see which two files are currently loaded in the list (Model #1 and Model #2).
Process
To proceed with default parameters, just click this button.
The analysis normally takes several seconds, and upon conclusion this graph will appear,
showing the distance distribution.
You can use the buttons in the graph window to pan/zoom,
switch to logarithmic scale, hide the multivariate distribution and
copy the data to the clipboard for transfer to another application.
Results
Calculation results are listed in the bottom half of the ProteinRanger window.
r.m.s.d.
First you have the traditional
r.m.s.d. For the example I tried for this tutorial
(the variable domain of an antibody in apo-form and in complex with hapten)
it is 0.77A. Looking at the distance distribution, you can clearly see that this value
does not represent the shift of most atoms.
p.b.s.
The next listed value is the
percentile based spread,
or p.b.s. In this example it is 0.20A, which is much more reasonable given the
distance distribution. Percentile based spread is roughly the 60th percentile of the
distances, which for a Maxwell-Boltzmann distribution corresponds to the root-mean-square
variation in atomic positions.
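The 60th-percentile figure can be checked numerically. The sketch below is not ProteinRanger code. It assumes an isotropic Maxwell-Boltzmann distribution with per-coordinate standard deviation sigma, so that the root-mean-square displacement is sqrt(3)*sigma. The helper names and the synthetic data are invented for illustration. It evaluates the cumulative distribution at the r.m.s. value, then compares the r.m.s.d. with a percentile-based estimate on data that contains a few large outliers.

```python
import math
import random

def mb_cdf(r, sigma):
    """Fraction of a Maxwell-Boltzmann distribution lying below distance r."""
    x = r / sigma
    return math.erf(x / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x)

def percentile(values, fraction):
    """Percentile with linear interpolation; fraction is between 0 and 1."""
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def rmsd(distances):
    """Root-mean-square of a list of per-atom distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Fraction of the distribution that lies below its own r.m.s. value.
RMS_FRACTION = mb_cdf(math.sqrt(3.0), 1.0)

# Synthetic data (hypothetical): 950 "core" atoms with sigma = 0.1 A per
# coordinate (r.m.s. about 0.17 A) plus 50 outliers displaced by 3 A.
rng = random.Random(0)
core = [math.sqrt(sum(rng.gauss(0.0, 0.1) ** 2 for _ in range(3))) for _ in range(950)]
distances = core + [3.0] * 50

print("fraction below r.m.s.:", round(RMS_FRACTION, 3))
print("r.m.s.d.:", round(rmsd(distances), 2))
print("percentile based spread:", round(percentile(distances, RMS_FRACTION), 2))
```

With these made-up inputs the r.m.s.d. is dominated by the 5% of outliers, while the percentile stays close to the spread of the core atoms.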
Single variance spread
This is obtained from the least
squares fit of the distance distribution to Maxwell-Boltzmann. This is usually a little
smaller than the p.b.s. (in this example it's 0.17A), perhaps because p.b.s. is somewhat
influenced by outliers (but to a much smaller degree than the r.m.s.d.).
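A least-squares fit of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the program's actual fitting routine: the distribution is parameterized by its r.m.s. value, a simple grid search stands in for whatever optimizer ProteinRanger uses, the fitted weight is taken as a rough analogue of the fraction of atoms explained, and all function names and the synthetic data are hypothetical.

```python
import math
import random

def mb_pdf(r, spread):
    """Maxwell-Boltzmann density parameterized by its r.m.s. value `spread`."""
    sigma = spread / math.sqrt(3.0)  # per-coordinate standard deviation
    return math.sqrt(2.0 / math.pi) * r * r / sigma ** 3 * math.exp(-r * r / (2.0 * sigma * sigma))

def histogram_density(distances, bin_width):
    """Return bin centers and a density normalized so the total area is 1."""
    n_bins = int(max(distances) / bin_width) + 1
    counts = [0] * n_bins
    for d in distances:
        counts[int(d / bin_width)] += 1
    scale = 1.0 / (len(distances) * bin_width)
    centers = [(i + 0.5) * bin_width for i in range(n_bins)]
    return centers, [c * scale for c in counts]

def fit_single_mb(centers, density, candidates):
    """Grid search over the spread; the weight has a closed-form least-squares value."""
    best = None
    for spread in candidates:
        model = [mb_pdf(r, spread) for r in centers]
        weight = sum(m * y for m, y in zip(model, density)) / sum(m * m for m in model)
        sse = sum((y - weight * m) ** 2 for m, y in zip(model, density))
        if best is None or sse < best[0]:
            best = (sse, spread, weight)
    return best[1], best[2]

def sample_distances(rng, spread, n):
    """Draw n displacement lengths from an isotropic 3D Gaussian."""
    sigma = spread / math.sqrt(3.0)
    return [math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(3))) for _ in range(n)]

# Synthetic mixture (hypothetical): 80% of atoms with 0.17 A spread, 20% with 0.80 A.
rng = random.Random(1)
distances = sample_distances(rng, 0.17, 1600) + sample_distances(rng, 0.80, 400)
centers, density = histogram_density(distances, 0.02)
candidates = [0.05 + 0.005 * i for i in range(191)]  # 0.05 A to 1.00 A
spread, weight = fit_single_mb(centers, density, candidates)
print("fitted spread:", round(spread, 3), "weight:", round(weight, 2))
```

Because the sharp peak of the core group dominates the squared error, the fit locks onto it and largely ignores the long tail.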
Single variance completeness
This estimates the fraction of the distance distribution
that the single variance distribution accounts for. Just an estimate, so don't take
it too seriously (it is derived from fit parameters of the Maxwell-Boltzmann distribution).
Estimated outlier shift
This is an even more loosely defined parameter. The
assumptions are that (i) your distribution contains two groups of atoms which have sharply different
underlying variation; (ii) the smaller variation amplitude and its fraction are correctly identified
by the fit to a single Maxwell-Boltzmann distribution. The larger variation amplitude is then calculated
from the known overall r.m.s.d., the relative fractions of the two groups of atoms and the smaller variation
amplitude. In this example the estimate comes out as 0.83A. As you can see, the overall
r.m.s.d. under such circumstances is effectively defined by 20% of the atoms, resulting
in the core variation being overestimated roughly 5-fold by r.m.s.d. Most atoms in these two structures
are shifted by ~0.15A (which is probably simply due to the limited precision of the structure
determination and a little bit of "global" structural change induced by the ligand binding).
Some, however, shift by 0.8A, which may be considered the real conformational change.
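One plausible reading of this recipe is that the mean-square displacements of the two groups add in proportion to their fractions. The sketch below solves that relation for the larger amplitude. This is an assumption and may not be the exact formula ProteinRanger uses; the function name is hypothetical, and the inputs are illustrative rather than the tutorial's values.

```python
import math

def outlier_shift(overall_rmsd, core_spread, core_fraction):
    """Solve rmsd^2 = f * core^2 + (1 - f) * outlier^2 for the outlier amplitude."""
    if not 0.0 <= core_fraction < 1.0:
        raise ValueError("core_fraction must be in [0, 1)")
    remainder = overall_rmsd ** 2 - core_fraction * core_spread ** 2
    if remainder < 0.0:
        raise ValueError("core group alone exceeds the overall r.m.s.d.")
    return math.sqrt(remainder / (1.0 - core_fraction))

# Hypothetical inputs: overall r.m.s.d. 0.50 A, core spread 0.20 A, 80% core atoms.
shift = outlier_shift(overall_rmsd=0.50, core_spread=0.20, core_fraction=0.80)
print(round(shift, 2))
```

Under this model a minority of atoms with a large shift can account for most of the overall r.m.s.d., which is the point made above.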
Number of groups
ProteinRanger attempts to fit the distance distribution to multiple Maxwell-Boltzmann distributions. Of course,
this is mathematically an ill-conditioned problem, so results should be evaluated carefully.
Under the hood it tries to fit the complete distribution using as many groups as needed (but
less than the user-defined maximum, 5 by default) to account for at least 95% of all the atoms (this
limit can be changed too). This parameter (you guessed right) says how many groups were used.
In our example, two groups were enough to meet the target.
Multivariance spread
Lists the individual sigmas for multiple groups. In this example the values are 0.16A and 0.41A
(check the screenshot to see that the second group accounts for the "wing" of the overall distribution).
Notice that the first is a little less than the "single variance spread" (which will be biased towards
higher values as it tries to account for the outliers), and the second is much smaller than the predicted
outlier shift, once again emphasizing that r.m.s.d. is heavily influenced by outliers.
Multivariance completeness
In our example, the two fractions are roughly 75% and 25%. Of course, these are approximate. You
can probably say that in this case the second group of atoms is rather substantial (i.e. more than
just a percent or two), but that is probably all you can say. The total may exceed 100%.
Number of outliers
This is the absolute number of atoms not accounted for by multivariance fit. In this example it is
only 6 atoms, indicating that the vast majority of atoms can be accounted for by assuming just two
groups of atoms shifting by different amounts. This parameter is not too reliable. It is obtained by
subtracting the sum of analytically evaluated areas under individual weighted Maxwell-Boltzmann
distributions from unity. Because fitting by multiple distributions is an ill-conditioned problem,
so is this parameter. Sometimes it becomes negative, which the program will report.
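The arithmetic can be sketched in a few lines. The assumption here is that each Maxwell-Boltzmann component is normalized to unit area, so the area under a weighted component equals its weight. The function name, the weights and the atom count are all hypothetical.

```python
def outlier_count(weights, n_atoms):
    """Atoms left unexplained: n_atoms * (1 - sum of group weights)."""
    return round(n_atoms * (1.0 - sum(weights)))

# Hypothetical fit of 1000 matched atoms with two groups.
print(outlier_count([0.74, 0.254], 1000))

# If the fitted weights add up to more than one, the count goes negative.
print(outlier_count([0.80, 0.25], 1000))
```

The second call shows how a negative value arises: nothing constrains the fitted weights to sum to at most one.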
Using only backbone for alignment
This is what the "Fit ..." checkboxes are for. By default the program uses Kabsch alignment with
all atoms, but you can ask it to use only the backbone (or only the side chains, which sounds
ridiculous but was easy to implement so why not). My experience is that in most cases it doesn't make
much difference, but sometimes it will. In the example used here the change in r.m.s.d. and
p.b.s. is in the thousandths of an angstrom.
Select matching atoms
By default, all atoms are matched. "Match ..." checkboxes provide certain limited control over which
atoms are matched. Currently, you can match only backbone, only side chains, or only waters (the latter is distance
based so there is no need to renumber waters). You can also specify that only protein atoms are matched
(the default). In the example, r.m.s.d. for the backbone is reduced to 0.41A and p.b.s. to 0.17A.
Sometimes for really identical structures limiting the match to the backbone brings the r.m.s.d. close to p.b.s.,
suggesting that the outliers are mostly disordered side chains.
Compare identical chains (e.g. NCS)
Use the processing option "match chains" for this. When a pdb file is imported, the list of chains
is placed in the text field. Modify the two lists to select which chains to match. For example,
if the first text box says "AB", and the second "CD", then chain A from the first molecule will be matched
to chain C of the second, chain B from the first to chain D from the second, etc. Note that you need to
import the pdb file twice (red and blue arrow) to compare chains within the same structure.
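The positional pairing described above can be illustrated with a short sketch. The function name is hypothetical, and raising an error for lists of different length is an assumption of this example, not documented ProteinRanger behavior.

```python
def pair_chains(first, second):
    """Pair chain IDs by position: first[i] is matched to second[i]."""
    if len(first) != len(second):
        raise ValueError("both chain lists must have the same length")
    return list(zip(first, second))

# "AB" against "CD": chain A is matched to C, chain B to D.
for a, b in pair_chains("AB", "CD"):
    print(a, "->", b)
```

Comparing NCS copies within one structure corresponds to the same pairing with both lists taken from the same file, for example "A" against "B".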
Compare non-identical proteins
Use the SSM-superposition processing option. This uses CCP4's superpose to run the SSM matching and gets
the sequence alignment from the log file. This option currently assumes that superpose is available from
the command line, so make sure that you have configured CCP4.
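A quick way to confirm that superpose is reachable from the command line is to look it up on the search path. The sketch below uses only the Python standard library and is independent of ProteinRanger; the function name is hypothetical, and the executable name "superpose" is taken from the text above.

```python
import shutil

def superpose_available(executable="superpose"):
    """Return True if the named program is found on the PATH."""
    return shutil.which(executable) is not None

if superpose_available():
    print("superpose found at", shutil.which("superpose"))
else:
    print("superpose not found; check that the CCP4 environment is set up")
```

If the check fails in the same shell from which ProteinRanger is started, the SSM option will most likely fail as well.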