3. QSAR: The Setting
Quantitative structure-activity relationships are used when
• there is little or no receptor information
• but there are measured activities of (many) compounds
4. From Structure to Property
[Figure: a series of structurally related compounds, with a chart of their EC50 values]
5. From Structure to Property
[Figure: a series of structurally related compounds; property of interest: LD50]
6. From Structure to Property
[Figure: a series of structurally related compounds]
7. QSAR: Which Relationship?
Quantitative structure-activity relationships correlate
chemical/biological activities with structural features or
atomic, group or molecular properties
within a range of structurally similar compounds.
8. Quantitative measurements for biological &
physicochemical properties
Most common properties studied
• Hydrophobicity of the molecule
• Hydrophobicity of substituents
• Electronic properties of substituents
• Steric properties of substituents
9. Hydrophobicity of the Molecule
Partition coefficient: $P = \dfrac{[\text{Drug in octanol}]}{[\text{Drug in water}]}$
High P implies high hydrophobicity
10. Hydrophobicity of the Molecule
• Activity of drugs is often related to P
e.g. binding of drugs to serum albumin
(straight line - limited range of log P)
$\log(1/C) = 0.75 \log P + 2.30$
• Binding increases as log P increases
• Binding is greater for hydrophobic drugs
[Figure: plot of log (1/C) against log P; straight line, log P axis marked at 0.78 and 3.82]
11. Hydrophobicity of the Molecule
Example 2: General anaesthetic activity of ethers
(parabolic curve - larger range of log P values)
$\log(1/C) = -0.22 (\log P)^2 + 1.04 \log P + 2.16$
Optimum value of log P for anaesthetic activity = log Po
[Figure: plot of log (1/C) against log P; parabola with its maximum at log Po]
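The optimum log Po follows from setting the derivative of the parabola to zero. A minimal sketch, assuming the coefficients quoted for the ethers (-0.22, 1.04, 2.16); the function names are illustrative only:

```python
def log_inverse_c(log_p, a=-0.22, b=1.04, c=2.16):
    """Evaluate log(1/C) = a*(log P)^2 + b*log P + c for the ether series."""
    return a * log_p ** 2 + b * log_p + c


def optimum_log_p(a=-0.22, b=1.04):
    """Vertex of the parabola: 2*a*log P + b = 0, so log Po = -b / (2*a)."""
    return -b / (2 * a)


log_p_opt = optimum_log_p()
print(f"log Po = {log_p_opt:.2f}")
print(f"log(1/C) at log Po = {log_inverse_c(log_p_opt):.2f}")
```

The vertex lies close to 2.3, the value quoted for anaesthetics of different structural classes.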
12. Hydrophobicity of the Molecule
• QSAR equations are only applicable to compounds in the same
structural class (e.g. ethers)
• However, log Po is similar for anaesthetics of different
structural classes (ca. 2.3)
• Structures with log P ca. 2.3 enter the CNS easily
(e.g. potent barbiturates have a log P of approximately 2.0)
• The log P value of a drug can be shifted away from 2.0 to avoid
CNS side effects
13. Hydrophobicity of Substituents
- the substituent hydrophobicity constant (π)
• A measure of a substituent’s hydrophobicity relative to hydrogen
• Tabulated values exist for aliphatic and aromatic substituents
• Measured experimentally by comparison of log P values with log P of
parent structure
• Positive values imply substituents are more hydrophobic than H
• Negative values imply substituents are less hydrophobic than H
Example:
Benzene (log P = 2.13)
Chlorobenzene (log P = 2.84)
Benzamide (log P = 0.64)
π(Cl) = 2.84 - 2.13 = 0.71
π(CONH2) = 0.64 - 2.13 = -1.49
14. Hydrophobicity of Substituents
- the substituent hydrophobicity constant (π)
Example: meta-Chlorobenzamide
log P(theory) = log P(benzene) + π(Cl) + π(CONH2)
= 2.13 + 0.71 - 1.49
= 1.35
log P(observed) = 1.51
• The value of π is only valid for parent structures
• It is possible to calculate log P using π values
• A QSAR equation may include both P and π
• P measures the importance of a molecule’s overall hydrophobicity
(relevant to absorption, binding etc)
• π identifies specific regions of the molecule which might interact with
hydrophobic regions in the binding site
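The additive estimate for meta-chlorobenzamide can be reproduced in a few lines. This is a sketch using only the values quoted in the examples (benzene log P and the two substituent constants); the table and function names are made up for illustration:

```python
# Substituent hydrophobicity constants relative to H, from the benzene examples.
PI_TABLE = {"Cl": 0.71, "CONH2": -1.49}
LOG_P_BENZENE = 2.13


def estimate_log_p(parent_log_p, substituents, pi_table=PI_TABLE):
    """Add the constant of each substituent to the log P of the parent structure."""
    return parent_log_p + sum(pi_table[name] for name in substituents)


predicted = estimate_log_p(LOG_P_BENZENE, ["Cl", "CONH2"])
observed = 1.51
print(f"predicted log P = {predicted:.2f}, observed = {observed:.2f}, "
      f"difference = {observed - predicted:.2f}")
```

The gap between the predicted and the observed value shows that the additivity is only approximate.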
15. Electronic Effects
Hammett Substituent Constant (σ)
• The constant (σ) is a measure of the electron-withdrawing or
electron-donating influence of substituents
• It can be measured experimentally and tabulated (e.g. σ for
aromatic substituents is measured by comparing the dissociation
constants of substituted benzoic acids with benzoic acid)
[Scheme: X-C6H4-CO2H ⇌ X-C6H4-CO2- + H+]
For X = H: $K_H = \text{dissociation constant} = \dfrac{[\text{PhCO}_2^-][\text{H}^+]}{[\text{PhCO}_2\text{H}]}$
16. Hammett Substituent Constant (σ)
• X = electron-withdrawing group (e.g. NO2)
[Scheme: X-C6H4-CO2H ⇌ X-C6H4-CO2- + H+, X = electron-withdrawing group]
$\sigma_X = \log \dfrac{K_X}{K_H} = \log K_X - \log K_H$
• Charge is stabilised by X
• Equilibrium shifts to right
• KX > KH
• σ has a positive value
17. Hammett Substituent Constant (σ)
• X = electron-donating group (e.g. CH3)
[Scheme: X-C6H4-CO2H ⇌ X-C6H4-CO2- + H+, X = electron-donating group]
$\sigma_X = \log \dfrac{K_X}{K_H} = \log K_X - \log K_H$
• Charge is destabilised by X
• Equilibrium shifts to left
• KX < KH
• σ has a negative value
18. Hammett Substituent Constant (σ)
• σ value depends on inductive and resonance effects
• σ value depends on whether the substituent is meta or para
• ortho values are invalid due to steric factors
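The sign convention of the Hammett constant can be checked numerically. A minimal sketch, assuming made-up dissociation constants (they are not measured values) for one electron-withdrawing and one electron-donating substituent:

```python
import math


def hammett_sigma(k_x, k_h):
    """sigma_X = log(K_X / K_H) = log K_X - log K_H."""
    return math.log10(k_x / k_h)


# Hypothetical dissociation constants, chosen only to illustrate the sign.
K_H = 1.0e-4            # unsubstituted acid
K_WITHDRAWING = 6.0e-4  # anion stabilised, K_X > K_H
K_DONATING = 0.7e-4     # anion destabilised, K_X < K_H

sigma_withdrawing = hammett_sigma(K_WITHDRAWING, K_H)
sigma_donating = hammett_sigma(K_DONATING, K_H)
print(f"withdrawing: {sigma_withdrawing:+.2f}, donating: {sigma_donating:+.2f}")
```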
19. Hammett Substituent Constant (σ)
EXAMPLES: σp (NO2) = 0.78, σm (NO2) = 0.71
• meta-Substitution: e-withdrawing (inductive effect only)
• para-Substitution: e-withdrawing (inductive + resonance effects)
[Figure: resonance structures of a nitro group meta and para to the DRUG substituent]
20. Hammett Substituent Constant (σ)
EXAMPLES: σm (OH) = 0.12, σp (OH) = -0.37
• meta-Substitution: e-withdrawing (inductive effect only)
• para-Substitution: e-donating by resonance more important than inductive effect
[Figure: resonance structures of a hydroxyl group meta and para to the DRUG substituent]
22. Electronic Factors R & F
• R - Quantifies a substituent’s resonance effects
• F - Quantifies a substituent’s inductive effects
23. Aliphatic electronic substituents
• Defined by σI
• Purely inductive effects
• Obtained experimentally by measuring the rates of hydrolysis of
aliphatic esters
[Scheme: X-CH2-CO-OMe → (hydrolysis) → X-CH2-CO-OH + HOMe]
X = electron donating: rate decreases, σI = -ve
X = electron withdrawing: rate increases, σI = +ve
• Hydrolysis rates measured under basic and acidic conditions
• Basic conditions: rate affected by steric + electronic factors;
gives σI after correction for steric effect
• Acidic conditions: rate affected by steric factors only (see Es)
24. Steric Factors
Taft’s Steric Factor (Es)
• Measured by comparing the rates of hydrolysis of substituted
aliphatic esters against a standard ester under acidic conditions
Es = log kx - log ko
kx represents the rate of hydrolysis of a substituted ester
ko represents the rate of hydrolysis of the parent ester
• Limited to substituents which interact sterically with the tetrahedral
transition state for the reaction
• Cannot be used for substituents which interact with the transition
state by resonance or hydrogen bonding
• May undervalue the steric effect of groups in an intermolecular
process (i.e. a drug binding to a receptor)
25. Steric Factors
Molar Refractivity (MR) - a measure of a substituent’s volume
$MR = \dfrac{n^2 - 1}{n^2 + 2} \times \dfrac{\text{mol. wt.}}{\text{density}}$
• $(n^2 - 1)/(n^2 + 2)$: correction factor for polarisation (n = index of refraction)
• mol. wt. / density: defines volume
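As a worked instance of the MR formula, the sketch below uses the Lorentz-Lorenz form with n² + 2 in the denominator. The benzene inputs (refractive index, molecular weight, density) are commonly quoted literature values and are an assumption here, not data from the slides:

```python
def molar_refractivity(n, mol_wt, density):
    """MR = (n^2 - 1) / (n^2 + 2) * mol_wt / density, in cm^3/mol for g/mol and g/cm^3."""
    polarisation_factor = (n ** 2 - 1) / (n ** 2 + 2)
    molar_volume = mol_wt / density
    return polarisation_factor * molar_volume


# Assumed literature values for liquid benzene near 20 degrees C.
mr_benzene = molar_refractivity(n=1.5011, mol_wt=78.11, density=0.8765)
print(f"MR(benzene) = {mr_benzene:.1f} cm^3/mol")
```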
26. Steric Factors
Verloop Steric Parameter
- calculated by software (STERIMOL)
- gives dimensions of a substituent
- can be used for any substituent
Example - Carboxylic acid
[Figure: a carboxylic acid group with its STERIMOL dimensions: length L and widths B1, B2, B3, B4]
27. Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the reaction
constants of ligand-receptor complex formation:
$\Delta G_{binding} = -2.303\, RT \log K = -2.303\, RT \log (k_{on} / k_{off})$
Equilibrium constant K
Rate constants kon (association) and koff (dissociation)
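The relation between the free energy of binding and the constants can be evaluated directly. A minimal sketch, assuming room temperature (298.15 K) and hypothetical rate constants chosen so that K = 10^9 (a nanomolar binder):

```python
import math

R = 8.314  # gas constant in J/(mol K)


def equilibrium_constant(k_on, k_off):
    """K = k_on / k_off for ligand-receptor complex formation."""
    return k_on / k_off


def binding_free_energy(k_eq, temperature=298.15):
    """Free energy of binding in kJ/mol: -2.303 * R * T * log10(K)."""
    return -2.303 * R * temperature * math.log10(k_eq) / 1000.0


# Hypothetical rate constants (assumed for illustration only).
k_eq = equilibrium_constant(k_on=1.0e6, k_off=1.0e-3)
dg = binding_free_energy(k_eq)
print(f"K = {k_eq:.1e}, free energy of binding = {dg:.1f} kJ/mol")
```

Each factor of ten in K changes the free energy by about 5.7 kJ/mol at this temperature.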
28. Concentration as Activity Measure
• A critical molar concentration C that produces the biological
effect is related to the equilibrium constant K
• Usually log (1/C) is used (c.f. pH)
• For meaningful QSARs, activities need to be spread out over
at least 3 log units
29. Free Energy of Binding
ΔGbinding = ΔG0 + ΔGhb + ΔGionic + ΔGlipo + ΔGrot
Term      Meaning                              Energy
ΔG0       entropy loss (translat. + rotat.)    +5.4
ΔGhb      ideal hydrogen bond                  –4.7
ΔGionic   ideal ionic interaction              –8.3
ΔGlipo    lipophilic contact                   –0.17
ΔGrot     entropy loss (rotat. bonds)          +1.4
(Energies in kJ/mol per unit feature)
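The additive scheme can be sketched as a sum over feature counts. The per-feature energies are the tabulated ones; the ligand (two hydrogen bonds, one ionic interaction, 150 units of lipophilic contact, four rotatable bonds) is hypothetical:

```python
# Per-feature contributions in kJ/mol, as tabulated above.
DG0 = 5.4  # entropy loss (translation + rotation), counted once per ligand
DG_PER_FEATURE = {"hbond": -4.7, "ionic": -8.3, "lipo": -0.17, "rot": 1.4}


def estimate_binding_energy(n_hbond, n_ionic, lipo_units, n_rot):
    """Sum the constant term and the contribution of each counted feature."""
    return (DG0
            + n_hbond * DG_PER_FEATURE["hbond"]
            + n_ionic * DG_PER_FEATURE["ionic"]
            + lipo_units * DG_PER_FEATURE["lipo"]
            + n_rot * DG_PER_FEATURE["rot"])


# Hypothetical ligand, used only to show the bookkeeping.
dg_estimate = estimate_binding_energy(n_hbond=2, n_ionic=1, lipo_units=150, n_rot=4)
print(f"estimated free energy of binding = {dg_estimate:.1f} kJ/mol")
```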
30. Basic Assumption in QSAR
The structural properties of a compound contribute
in a linearly additive way to its biological activity provided there
are no non-linear dependencies of transport or binding on some
properties
31. An Example: Capsaicin Analogs
[Figure: capsaicin analog scaffold with variable substituent X]
X        EC50 (µM)   log(1/EC50)
H        11.80       4.93
Cl        1.24       5.91
NO2       4.58       5.34
CN       26.50       4.58
C6H5      0.24       6.62
NMe2      4.39       5.36
I         0.35       6.46
NHCHO     ?          ?
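The tabulated log(1/EC50) values are reproduced when EC50 is taken in micromolar and converted to mol/L before taking the logarithm. A short check, with the measured values copied from the table:

```python
import math

# EC50 values from the table, interpreted as micromolar concentrations.
EC50_MICROMOLAR = {
    "H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
    "C6H5": 0.24, "NMe2": 4.39, "I": 0.35,
}


def log_inverse_ec50(ec50_micromolar):
    """Convert micromolar to mol/L, then return log10(1/EC50)."""
    return -math.log10(ec50_micromolar * 1.0e-6)


activities = {x: round(log_inverse_ec50(c), 2) for x, c in EC50_MICROMOLAR.items()}
for substituent, value in activities.items():
    print(f"{substituent:5s} {value:.2f}")
```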
32. An Example: Capsaicin Analogs
X        log(1/EC50)   MR      π       σ       Es
H        4.93           1.03    0.00    0.00    0.00
Cl       5.91           6.03    0.71    0.23   -0.97
NO2      5.34           7.36   -0.28    0.78   -2.52
CN       4.58           6.33   -0.57    0.66   -0.51
C6H5     6.62          25.36    1.96   -0.01   -3.82
NMe2     5.36          15.55    0.18   -0.83   -2.90
I        6.46          13.94    1.12    0.18   -1.40
NHCHO    ?             10.31   -0.98    0.00   -0.98
MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter;
σ = electronic sigma constant (para position); Es = Taft size parameter
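How such a table is turned into a model can be illustrated with an ordinary least-squares fit. The sketch below is a deliberate simplification that uses only the hydrophobicity parameter as descriptor; it is not the multi-descriptor equation of the deck, and the function names are illustrative:

```python
# Hydrophobicity parameter and activity for the seven measured analogs
# (H, Cl, NO2, CN, C6H5, NMe2, I), copied from the table.
PI_VALUES = [0.00, 0.71, -0.28, -0.57, 1.96, 0.18, 1.12]
ACTIVITY = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_line(PI_VALUES, ACTIVITY)
r2 = r_squared(PI_VALUES, ACTIVITY, slope, intercept)
# Estimate for the unmeasured NHCHO analog from its tabulated parameter (-0.98).
estimate_nhcho = intercept + slope * (-0.98)
print(f"log(1/EC50) = {intercept:.2f} + {slope:.2f} * pi, R^2 = {r2:.2f}")
print(f"one-descriptor estimate for NHCHO: {estimate_nhcho:.2f}")
```

With seven compounds, a model with four descriptors plus an intercept leaves very few degrees of freedom, which is one reason to start from a single descriptor in a sketch like this.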
33. An Example: Capsaicin Analogs
[Figure: capsaicin analog scaffold with variable substituent X]
log(1/EC50) = -0.89 + 0.019 MR + 0.23 π - 10.31 σ - 0.14 Es
36. Free-Wilson Analysis
• The biological activity of the parent structure is measured & compared with
the activity of analogues bearing different substituents
• An equation is derived relating biological activity to the presence or absence
of particular substituents
• Activity = k1X1 + k2X2 + … + knXn + Z
• Xn is an indicator variable which is given the value 0 or 1 depending on
whether the substituent (n) is present or not
• The contribution of each substituent (n) to activity is determined by the value
of kn
• Z is a constant representing the overall activity of the structures studied
37. Free-Wilson Analysis
$\log(1/C) = \sum_i a_i x_i + \mu$
xi: presence of group i (0 or 1)
ai: activity group contribution of group i
µ: activity value of unsubstituted compound
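A Free-Wilson prediction is a sum over indicator variables. In the sketch below, the group labels, their contributions and the activity of the unsubstituted compound are all hypothetical; in practice they come from a regression over the synthesised analogues:

```python
# Hypothetical group contributions a_i (not fitted to any real data set).
GROUP_CONTRIBUTIONS = {"Cl@R1": 0.30, "Me@R1": 0.10, "OH@R2": -0.20}
MU = 5.00  # hypothetical activity of the unsubstituted compound


def free_wilson_activity(groups_present, contributions=GROUP_CONTRIBUTIONS, mu=MU):
    """log(1/C) = sum_i a_i * x_i + mu, with x_i = 1 if group i is present, else 0."""
    total = mu
    for group, a_i in contributions.items():
        x_i = 1 if group in groups_present else 0
        total += a_i * x_i
    return total


analogue = free_wilson_activity({"Cl@R1", "OH@R2"})
parent = free_wilson_activity(set())
print(f"analogue: {analogue:.2f}, unsubstituted compound: {parent:.2f}")
```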
39. Free-Wilson Analysis
Advantages
• No need for physicochemical constants or tables
• Useful for structures with unusual substituents
• Useful for quantifying the biological effects of molecular features that
cannot be quantified or tabulated by the Hansch method
Disadvantages
• A large number of analogues need to be synthesised to represent each
different substituent and each different position of a substituent
• It is difficult to rationalise why specific substituents are good or bad for
activity
• The effects of different substituents may not be additive
(e.g. intramolecular interactions)
40. Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$\log(1/C) = a (\log P)^2 + b \log P + c \sum \sigma + k$
P: n-octanol/water partition coefficient
σ: Hammett electronic parameter
a, b, c: regression coefficients
k: constant term
41. Hansch Analysis
+ Fewer regression coefficients needed for correlation
+ Interpretation in physicochemical terms
+ Predictions for other substituents possible
42. Molecular Descriptors
• Simple counts of features, e.g. of atoms, rings, H-bond donors,
molecular weight
• Physicochemical properties, e.g. polarisability, hydrophobicity
(logP), water-solubility
• Group properties, e.g. Hammett and Taft constants, volume
• 2D Fingerprints based on fragments
• 3D Screens based on fragments
43. 2D Fingerprints
[Figure: capsaicin analog with X = Br]
C  N  O  P  S  X  F  Cl Br I  Ph CO NH OH Me Et Py CHO SO C=C C≡C C=N Am Im
1  1  1  0  0  1  0  0  1  0  1  1  1  1  1  0  0  0   0  1   0   0   1  0
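The bit string is simply a presence/absence lookup against a fixed key list. In this sketch the set of features present in the bromo analog is assigned by hand (read off the bit row); a real implementation would detect the fragments by substructure search:

```python
# Fragment keys in the order shown on the slide.
KEYS = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
        "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C≡C", "C=N", "Am", "Im"]

# Features present in the bromo analog, assigned by hand for illustration.
PRESENT = {"C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me", "C=C", "Am"}


def fingerprint(present, keys=KEYS):
    """Return one bit per key: 1 if the fragment occurs in the molecule, else 0."""
    return [1 if key in present else 0 for key in keys]


bits = fingerprint(PRESENT)
print(" ".join(str(b) for b in bits))
print(f"{sum(bits)} of {len(bits)} bits set")
```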
44. • Molecular docking strategies & different methods of docking
• Mechanism-based drug design including quantum mechanics,
molecular mechanics and molecular modeling
46. Molecular Docking
What? How? Why?
• In silico (computer-based) approach
• Identification of bound conformation
• Prediction of binding affinity
• Docking vs. (Virtual) Screening
47. Molecular Docking
What? How? Why?
Two “Modes”:
– Retrospective:
• How does your molecule bind?
• What is its mode of action?
• What might be the reaction mechanism?
48. Molecular Docking
What? How? Why?
– Prospective:
• What compounds might be good leads?
• What compound(s) should you make?
49. Docking Basics
• Initially – Receptor (protein) and ligand
rigid (fast, simple)
• Most current approaches – Receptor rigid,
ligand flexible
• Advanced approaches – Receptor (to a
degree) and ligand flexible (slow, complex)
51. Stages of Docking
• Pose generation
– Place the ligand in the binding site
– Generally well solved
• Pose selection
– Determine the proper pose
– The hard part
52. Pose Generation
• Rigid docking with a series of conformers
– Most techniques use this approach
– Most techniques will generate the conformers internally rather
than using conformers as inputs
• Incremental construction (FlexX)
– Split ligand into base fragment and side-chains
– Place base
– Add side-chains to grow, scoring as you grow
• In general, use a very basic vdW shape function
• Often see variability with input conformers
54. Pose Selection/Scoring
• Where most of the current research is focused
• More sophisticated scoring functions take longer
– Balance need for speed vs. need for accuracy
– Virtual screening needs to be very fast
– Studies on single compounds can be much slower
– Can do multi-stage studies
56. Principal Component Analysis (PCA)
• Many (>3) variables to describe objects = high dimensionality of
descriptor data
• PCA is used to reduce dimensionality
• PCA extracts the most important factors
(principal components or PCs) from the data
• Useful when correlations exist between descriptors
• The result is a new, small set of variables (PCs) which explain most of
the data variation
59. Different Views on PCA
• Statistically, PCA is a multivariate analysis technique closely
related to eigenvector analysis
• In matrix terms, PCA is a decomposition of matrix X
into two smaller matrices plus a set of residuals:
$X = T P^{T} + R$
• Geometrically, PCA is a projection technique in which X is
projected onto a subspace of reduced dimensions
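The decomposition into scores T, loadings P and residuals R can be sketched for a single principal component. The example uses power iteration on the covariance matrix, which is one simple way to obtain the leading eigenvector; the descriptor rows (MR, hydrophobicity, sigma, Es) are copied from the capsaicin table and are left unscaled here for brevity, although descriptors are usually scaled before PCA:

```python
import math

# Rows: H, Cl, NO2, CN, C6H5, NMe2, I, NHCHO; columns: MR, pi, sigma, Es.
DESCRIPTORS = [
    [1.03, 0.00, 0.00, 0.00], [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52], [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82], [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40], [10.31, -0.98, 0.00, -0.98],
]


def first_principal_component(rows, iterations=200):
    """Return loadings p, scores t, residuals R and explained variance for PC1."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # mean-centred X
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(m)]
           for a in range(m)]
    p = [1.0] * m
    for _ in range(iterations):  # power iteration towards the leading eigenvector
        q = [sum(cov[a][b] * p[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(v * v for v in q))
        p = [v / norm for v in q]
    t = [sum(x[i][j] * p[j] for j in range(m)) for i in range(n)]  # scores
    residual = [[x[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
    explained = sum(v * v for v in t) / sum(v * v for row in x for v in row)
    return p, t, residual, explained


loadings, scores, residuals, explained = first_principal_component(DESCRIPTORS)
print("loadings:", [round(v, 3) for v in loadings])
print(f"variance explained by PC1: {explained:.1%}")
```

Because MR has by far the largest numerical spread, the unscaled first component is dominated by it, which illustrates why scaling matters.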
60. Partial Least Squares (PLS)
$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1$   (compound 1)
$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2$   (compound 2)
$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3$   (compound 3)
…
$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n$   (compound n)
In matrix form: $Y = XA + E$
X = independent variables
Y = dependent variables
61. PLS – Cross-validation
• Squared correlation coefficient $R^2$
– Value between 0 and 1 (> 0.9 desirable)
– Indicates explanatory power of regression equation
• With cross-validation: squared correlation coefficient $Q^2$
– Value of at most 1, negative for models that predict worse than the mean (> 0.5 desirable)
– Indicates predictive power of regression equation
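The difference between the fitted and the cross-validated coefficient can be sketched with leave-one-out cross-validation. For brevity the example uses a one-descriptor least-squares line on the capsaicin data (hydrophobicity parameter against activity) rather than PLS, and defines the cross-validated coefficient as 1 - PRESS / SS_total; both choices are assumptions of this sketch:

```python
PI_VALUES = [0.00, 0.71, -0.28, -0.57, 1.96, 0.18, 1.12]
ACTIVITY = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_q2(x, y):
    """Return (R^2 of the full fit, leave-one-out Q^2)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    slope, intercept = fit_line(x, y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(len(y)):
        # Refit without compound i, then predict the compound that was left out.
        s, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b + s * x[i])) ** 2
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


r2, q2 = r2_and_q2(PI_VALUES, ACTIVITY)
print(f"R^2 = {r2:.2f}, Q^2 (leave-one-out) = {q2:.2f}")
```

The cross-validated value comes out lower than the fitted one, since each compound is predicted by a model that never saw it.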
62. PCA vs PLS
• PCA:
The Principal Components describe the variance
in the independent variables (descriptors)
• PLS:
The Principal Components describe the variance
in both the independent variables (descriptors)
and the dependent variable (activity)
63. Comparative Molecular Field Analysis
(CoMFA)
• Set of chemically related compounds
• Common substructure required
• 3D structures needed (e.g., Corina-generated)
• Bioactive conformations of the active compounds
are to be aligned
67. CoMFA Model Derivation
• Molecules are positioned in a regular grid
according to alignment
• Probes are used to determine the molecular field:
Van der Waals field (probe is neutral carbon)
$E_{vdw} = \sum (A_i r_{ij}^{-12} - B_i r_{ij}^{-6})$
Electrostatic field (probe is charged atom)
$E_c = \sum q_i q_j / (D r_{ij})$
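The two probe fields can be sketched as sums over the atoms of the molecule at each grid point. All atom coordinates, charges and the A and B parameters below are hypothetical and in arbitrary units; as in the formulas, no unit-conversion constant is applied to the electrostatic term:

```python
import math


def field_at_point(point, atoms, probe_charge=1.0, dielectric=1.0):
    """Return (E_vdw, E_c) felt by a probe at `point` from all atoms of the molecule."""
    e_vdw = 0.0
    e_c = 0.0
    for atom in atoms:
        r = math.dist(point, atom["xyz"])  # distance r_ij between atom i and probe j
        e_vdw += atom["A"] * r ** -12 - atom["B"] * r ** -6
        e_c += atom["q"] * probe_charge / (dielectric * r)
    return e_vdw, e_c


# Hypothetical one-atom "molecule" and a three-point slice of the grid.
ATOMS = [{"xyz": (0.0, 0.0, 0.0), "q": -0.5, "A": 1.0, "B": 2.0}]
GRID = [(x, 0.0, 0.0) for x in (1.0, 2.0, 3.0)]

fields = [field_at_point(p, ATOMS) for p in GRID]
for point, (e_vdw, e_c) in zip(GRID, fields):
    print(f"{point}: E_vdw = {e_vdw:.4f}, E_c = {e_c:.4f}")
```

The field values collected over all grid points form the descriptor columns for each aligned molecule.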
69. CoMFA Pros and Cons
+ Suitable to describe receptor-ligand interactions
+ 3D visualization of important features
+ Good correlation within related set
+ Predictive power within scanned space
– Alignment is often difficult
– Training required