Nicholas Jewell MedicReS World Congress 2011

Good Biostatistical Report Practices
Being Honest in Data Analysis
Nicholas P. Jewell
Departments of Statistics &
School of Public Health (Biostatistics)
University of California, Berkeley
March 26, 2011
1

References
•  Bailar, J.C. and Mosteller, F. “Guidelines for
statistical reporting in articles for medical
journals: Amplifications and explanations,”
Annals of Internal Medicine, 108, 1988, 266-273.
•  Lang, T.A. and Secic M. “How to Report
Statistics in Medicine: Annotated Guidelines for
Authors, Editors, and Reviewers,” 2nd Edition,
Amrican College of Physicians, 2006.
•  Altman, D. “Statistical reviewing for medical
journals, Statistics in Medicine, 17, 1998,
2661-2674.
© Nicholas P. Jewell, 2011
4

Overview
•  Opportunistic Data Torturing
–  Multiple comparisons
–  Multiple analyses
–  Subgroup analyses
•  Procustean Data Torturing
–  Subgroup analyses (or lack thereof)
–  Meta-analyses
•  Sensitivity Analyses/Proxies/Assumptions/Honesty
•  Blinding the Statistician (who funds the statistician?)
•  Pre-specified Data Driven Analyses
•  Dissemination and Publication
5

What is the Recipe for a Good
Biostatistical Review?
The Four Hs
•  What basic questions should be asked?
–  GBRS Questions provide an excellent checklist
(guidelines etc)
–  How was the Data Collected (e.g. design/sample size
issues/loss to follow up)
–  How were the variables measured?
–  How was the data analyzed? (e.g. methods/
assumptions, software, pre-specified methods)
–  How were the results reported? (e.g. sensitivity,
interpretation, uncertainty, conflicts of interest)
6

Deeper Questions
The Five Ws
•  What is the population of interest?
•  What is the data-generating mechanism (e.g.
assumptions, missingness or data filtering)
•  What are the parameters of interest that describe the
data-generating mechanism? (e.g. causal inference)
•  Which of these parameters are identifiable and how can
they be estimated effectively?
•  Where can I find the data and the software?
7

The MIRA Trial
•  Gates Foundation study to determine the
effectiveness of a latex diaphragm in the reduction
of heterosexual acquisition of HIV among women
•  Two arm, randomized, controlled trial
•  Primary intervention: diaphragm and gel provision
to diaphragm arm (nothing to control arm).
•  Secondary Intervention: Intensive condom provision
and counseling given to both arms, plus treatment
of STIs
•  Trial is not blinded
•  5000 women seen for 18 months in three sites in
Zimbabwe and South Africa
8

MIRA Trial: Basic Intention to
Treat Results
•  Basic Intent-to-Treat Analysis:
– 158 new HIV infections in Diaphragm Arm
– 151 new HIV infections in Control Arm
•  ITT estimate of Relative Risk is 1.05 with a
95% CI of (0.84, 1.30)
•  End of story . . . . .?
9

MIRA Trial: Basic Intention to
Treat Results
•  However condom use differed between the
two arms:
– 53.5% in Diaphragm Arm (by visit)
– 85.1% in Control Arm (by visit)
•  Could this mean that the diaphragm was more
effective than it appeared from the basic
analysis?
•  To make sense of this—we’d like to
understand the role of condom use in
mediating the effect of treatment assignment
on HIV infection. 10

Most Important Public Health Questions
1. What is the effectiveness of providing study product in
environment of country-level standard condom
counseling?
(in environment of no condom counseling?)
2. How does providing study product alone compare to
consistent condom use alone in reducing HIV
transmission?
3. How does providing the study product alone compare
to unprotected sex, in terms of risk of HIV infection?
None of these questions are answered by basic ITT analysis
11

Reasons effectiveness may be different in environment
of intensive condom counseling vs. environment of
country-level standard condom counseling
A. In trials without blinding, effect of intensive
condom counseling may be different for
treatment arm and control arm.
B. Effects of intensive condom counseling may be
different for those who adhere to assigned
treatment and those who don’t.
C. Intensive condom counseling may itself affect
adherence to assigned treatment.
12

Reason A: In unblinded trials, effect of
intensive condom counseling may be
different for treatment arm and control arm.
Treatment
Arm
Control
Arm
Non-adherers (25%) Adherers (75%)
50 infections 40
50
Treatment
Arm
Control
Arm
Non-adherers (25%) Adherers (75%)
50 40
50 10
RR:170/200=0.85
RR:106/80=1.33
Study Product (or placebo) Users Condom Users
No Condom Counseling
Intensive Condom Counseling
Assume: Condoms 80% protective, Study Product 20% efficacy,
and Intensive Condom Counseling less effect in treatment arm
8
50 infections
4040
5050
8
10 10

MIRA Trial—Randomization of
Diaphragm
HIV
Infection
Tx Assignment
(Diaphragm use)
Condom
Use
14

Estimating Direct Effects: Adjusting for
a Mediator (condom use)
HIV
Infection
Treatment Group
(Diaphragm use)
Condom
Use
DIRECT EFFECT
•  We want to estimate the direct effect of diaphragm provision, at a set
level of condom use. (Petersen et al. 2006, Robins and Greenland 1992,
Pearl 2000, Rosenblum et al. 2009)
•  Still ITT interpretation
•  Requires Stronger Assumptions than basic ITT 15

Estimating Direct Effects: Stratify on
condom use?
(danger: condom use is not randomized)
HIV
Infection
Treatment Group
(Diaphragm use)
Condom
Use
Confounders
after stratification on condom use
HIV
Infection
Treatment Group
(Diaphragm use)
Confounders
randomization hasn’t ruled out confounding of direct effect!
Now have to adjust for confounders (but we are still ITT)

Why Stratification/Regression Gives
Biased Estimates When There Are
Confounders as Causal Intermediates
Treatment
Group (R)
HIV Status
(H)
Condom Use
(C)
Diaphragm Use
(D) (confounder)
Using Regression, if we control for D, we don’t get the direct effect that we want.
© Nicholas P. Jewell, 2009© Nicholas P. Jewell, 2009

Results of Direct Effects Analysis
•  Relative Risk of HIV infection between
Diaphragm arm and Control arm by end of Trial,
with Condom Use Fixed at “Never”: 0.59 (95%
CI: 0.26, 4.56)
•  Relative Risk of HIV infection between
Diaphragm arm and Control arm by end of Trial,
with Condom Use Fixed at “Always”: 0.96 (95%
CI: 0.59, 1.45)
Conclusion: No definitive evidence from direct
effects analysis that diaphragms prevent (or
don’t prevent) HIV.
18

Pain Trials with Self-Reporting
•  Pain is the most disturbing symptom of peripheral
neuropathy among diabetic patients
•  As many as 45% of patients with diabetes develop
peripheral neuropathies
•  Gabapentin was suggested as a treatment option
•  To evaluate the effect of Gabapentin, a randomized,
double-blind, placebo-controlled trial was conducted
•  165 patients with a 1- to 5-year history of pain attributed
to diabetic neuropathy enrolled at 20 different sites
19

Backonja et al. Trial in JAMA
•  The main outcome was daily pain severity as measured on an
11-point Likert scale (0 no pain- 10 worst possible pain)
•  Eighty-four patients received gabapentin, 81 received placebo
•  By intention-to-treat analysis, gabapentin-treated patients'
mean daily pain score (baseline 6.4, end point 3.9) was
signiﬁcantly lower (P<.001) than the placebo-treated patients'
score (baseline 6.5, end point 5.1)
•  Concluded that gabapentin appears to be efficacious for the
treatment of pain associated with diabetic peripheral
neuropathy
20

Handling Treatment-Related
Side Effects
•  Treat side effects singly by
removing those individuals
from the data analysis and
seeing if that changed the
results
21

Perception Effect
•  Patients have a perception about the treatment they
receive
•  In general we may think of the patients assigning a
degree of certainty (probability) to receiving the active
treatment, measured by a variable P
•  In most cases we do not observe the patient’s perception
on a continuous scale
€
P =
1
0
−1
⎧
⎨
⎪
⎩
⎪
convinced on treatment
not sure if on treatment or placebo
convinced on placebo
22

Application to Unmasking of Blinded
Treatment Assignment Due to Side Effects
Outcome
(pain score)
Treatment Group
Side effects Confounders
•  side effects are treatment-related and therefore potentially unmasking
•  outcome is measured subjectively
•  actual use of the drug (adherence) is a potential confounder
23

Data
•  Outcome:
mean pain score for the last 7
diary entries
•  Baseline covariates:
age, sex, race, height, weight,
baseline pain, baseline sleep
•  Treatment:
gabapentin, placebo
•  Perception:
changes from 0 to 1 when a
treatment-related side effect
occurs
24

Parameters of Interest
•  What is the treatment effect if all the patients
remained unknowledgeable about their
treatment? (Perception fixed at 0)
•  What is the treatment effect if all the
patients thought they were receiving the
active treatment? (Perception fixed at 1)
•  (The difference between these two parameters can be
thought of as a perception bias)
preferred ITT
parameter
25

Estimates
26

Subgroups—Interaction?
27

28

29

30

Reviewer Comments
•  Figure 4- The steroid finding is worthy of comment, even if the interaction is not
statistically significant (the power for interactions is low). It may well be that the
new agent has little benefit for those not treated with steroids,”
•  Figure 4 has been deleted as suggested by the editor. While there was not a
significant reduction in relative risk of confirmed PUBS in the subgroup not
taking steroids at baseline, VIGOR was only designed to detect a significant
reduction overall, and not within any specific subgroup. There were, however,
reductions in relative risk in this subgroup albeit not significant. Whenever many
subgroups are examined, it becomes likely that there will be no apparent effect
in some of them purely by chance. This is why the test for interaction, although
underpowered as you rightly pointed out, is the appropriate way to assess
subgroup effects. When there is no treatment-by-subgroup interaction, the more
reliable estimate of treatment effect within a subgroup is the effect in the entire
cohort [1].”
31

Meta-Analyses
•  Random or Fixed Effects
–  heterogeneity
–  Measure of association (RR or ER)
•  Zero-event trials/use of all evidence
–  Drug safety studies (Vioxx, Celebrex, Avandia, etc)
–  Continuity corrections
–  Study duration
32

Meta-Analyses and No Event
Studies
•  One study: 30 events under Tx, 3 under
placebo (5000 individuals in both arms)
– Risk difference of 0.0054 with exact 95%
confidence interval of (0.0032, 0.0076), p < 0.0001
•  Now add three (short duration) studies,all with
5000 in both arms, no events in either arms in
all cases
– exact 95% confidence interval of (-0.0044, 0.0001),
p = 0.46
33

Multiplicity
34

References
•  Ioannidis, J.P.A. “Why most published research findings are false,” PLoS Med, 2(8),
2005, e124.
•  Siegfried, T. “Odds are, it’s wrong,” Science News, March 27, 2010, 26-29.
•  Bailar, J.C. III., ”Science. Statistics, and deception," Annals of Internal Medicine, 104,
1986, 259-260.
•  Mills. J.S., “Data torturing,” New England Journal of Medicine, 329, 1993, 1196-1199.
•  Young, J., Hubbard, A.E., Eskenazi, B. and Jewell, N. P., “A machine-learning
algorithm for estimating and ranking the impact of environmental risk factors in
exploratory epidemiological studies”, to appear, 2011.
•  A.E. Hubbard, Ahern, J., Fleischer, N.L., Lippman, S.A., Jewell, N.P., Bruckner, T.,
Satariano, W.A., “To GEE or not to GEE: Comparing estimating function and
likelihood-based methods for estimating the associations between neighboring risk
factors and health,” Epidemiology, 21, 2010, 467-474.
•  Van der Laan, M., Hubbard, A., Jewell, N.P., ”Semi-parametric models versus faith-
based inference," to appear, Epidemiology, 21, 2010, 479-481.
35

Nicholas Jewell MedicReS World Congress 2011

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (20)

Ähnlich wie Nicholas Jewell MedicReS World Congress 2011

Ähnlich wie Nicholas Jewell MedicReS World Congress 2011 (20)

Mehr von MedicReS

Mehr von MedicReS (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Nicholas Jewell MedicReS World Congress 2011