Validity and Reliability
in Assessment
This work is a summary of previous efforts by great educators.
A humble presentation by Dr Tarek Tawfik Amin
Measurement experts (and many educators) believe
that every measurement device should possess certain
qualities.
The two most common technical concepts in
measurement are reliability and validity.
Reliability Definition (Consistency)

- The degree of consistency between two measures of the same thing (Mehrens and Lehman, 1987).
- The measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time (Worthen et al., 1993).
Validity Definition (Accuracy)

- Truthfulness: Does the test measure what it purports to measure? The extent to which certain inferences can be made from test scores or other measurements (Mehrens and Lehman, 1987).
- The degree to which tests accomplish the purpose for which they are being used (Worthen et al., 1993).


- The term "validity" refers to the degree to which the conclusions (interpretations) derived from the results of any assessment are "well-grounded or justifiable; being at once relevant and meaningful" (Messick S, 1995).



The usual concepts of validity:
- "Content": related to the objectives and their sampling.
- "Construct": referring to the theory underlying the target.
- "Criterion": related to concrete criteria in the real world; it can be concurrent or predictive.
- "Concurrent": correlating highly with another measure already validated.
- "Predictive": capable of anticipating some later measure.
- "Face": related to the test's overall appearance.
Sources of validity in assessment: the old concept [figure].
Sources of validity in assessment: the usual concepts of validity [figure].
All assessments in medical education require
evidence of validity to be interpreted meaningfully.
In contemporary usage, all validity is construct
validity, which requires multiple sources of evidence;
construct validity is the whole of validity, but has
multiple facets (Downing S, 2003).
Construct (concepts, ideas, and notions)
- Nearly all assessments in medical education deal with constructs: intangible collections of abstract concepts and principles which are inferred from behavior and explained by educational or psychological theory.
- Educational achievement is a construct, inferred from performance on assessments: written tests over a domain of knowledge, oral examinations over specific problems or cases in medicine, OSCEs, or history-taking and communication skills.
- Educational ability or aptitude is another example of a construct, one that may be even more intangible and abstract than achievement. (Downing, 2003)
Sources of validity in assessment

- Content: do the instrument items completely represent the construct?
- Response process: the relationship between the intended construct and the thought processes of subjects or observers.
- Internal structure: acceptable reliability and factor structure.
- Relations to other variables: correlation with scores from another instrument assessing the same construct.
- Consequences: do scores really make a difference?

(Downing 2003; Cook 2007)
Sources of validity in assessment (examples of evidence)

Content:
- Examination blueprint
- Representativeness of the test blueprint to the achievement domain
- Test specifications
- Match of item content to test specifications
- Representativeness of items to the domain
- Logical/empirical relationship of the content tested to the achievement domain
- Quality of test questions
- Item writer qualifications
- Sensitivity review

Response process:
- Student familiarity with the test format
- Quality control of electronic scanning/scoring
- Key validation of preliminary scores
- Accuracy in combining scores from different formats
- Quality control/accuracy of final scores/marks/grades
- Subscore/subscale analyses
- Accuracy of applying pass-fail decision rules to scores
- Quality control of score reporting

Internal structure:
- Item analysis data: item difficulty/discrimination, item/test characteristic curves, inter-item correlations, item-total correlations (point-biserial)
- Score scale reliability
- Standard errors of measurement (SEM)
- Generalizability
- Item factor analysis
- Differential Item Functioning (DIF)

Relationship to other variables:
- Correlation with other relevant variables (exams)
- Convergent correlations (internal/external): similar tests
- Divergent correlations (internal/external): dissimilar measures
- Test-criterion correlations
- Generalizability of evidence

Consequences:
- Impact of test scores/results on students/society
- Consequences on learners/future learning
- Reasonableness of the method of establishing the pass-fail (cut) score
- Pass-fail consequences: decision reliability/accuracy, conditional standard error of measurement, false positives/negatives
Sources of validity: 1. Internal structure

Statistical evidence of the hypothesized relationship between test item scores and the construct:

1. Reliability (internal consistency):
   - Test scale reliability
   - Rater reliability
   - Generalizability
2. Item analysis data:
   - Item difficulty and discrimination
   - MCQ option function analysis
   - Inter-item correlations
3. Scale factor structure
4. Dimensionality studies
5. Differential item functioning (DIF) studies
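To make the item-analysis evidence above concrete, the following is a minimal Python sketch that computes item difficulty (proportion correct), a corrected item-total (point-biserial) discrimination index, and inter-item correlations from a small, invented matrix of dichotomous MCQ scores; the data and function name are illustrative only, not taken from any specific exam.

import numpy as np

def item_analysis(scores):
    """Classical item analysis for a persons-by-items matrix of 0/1 scores."""
    scores = np.asarray(scores, dtype=float)
    difficulty = scores.mean(axis=0)          # proportion answering each item correctly
    total = scores.sum(axis=1)                # each examinee's total score
    discrimination = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]           # total score excluding item j
        discrimination.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return difficulty, np.array(discrimination)

# Invented data: 6 examinees x 4 MCQ items (1 = correct, 0 = incorrect)
x = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])
difficulty, discrimination = item_analysis(x)
print("difficulty:", difficulty.round(2))
print("discrimination:", discrimination.round(2))
print("inter-item correlations:\n", np.corrcoef(x.T).round(2))

Items with extreme difficulty or near-zero discrimination flagged by such an analysis are candidates for review before scores are interpreted.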
Sources of validity: 2. Relationship to other variables

Statistical evidence of the hypothesized relationship between test scores and the construct:
- Criterion-related validity studies
- Correlations between test scores/subscores and other measures
- Convergent-divergent studies
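As a minimal illustration of convergent and divergent correlation evidence, the Python sketch below correlates hypothetical scores on a new instrument with a measure of the same construct and with an unrelated measure; all values are invented for demonstration.

import numpy as np

# Invented scores for 8 examinees on three measures
new_instrument = [62, 71, 55, 80, 68, 74, 59, 85]   # instrument being validated
similar_test   = [60, 75, 50, 82, 70, 78, 61, 88]   # same construct (convergent evidence)
unrelated_test = [40, 72, 55, 48, 90, 35, 66, 52]   # dissimilar measure (divergent evidence)

def pearson_r(a, b):
    """Pearson correlation coefficient between two score lists."""
    return np.corrcoef(a, b)[0, 1]

print("convergent r:", round(pearson_r(new_instrument, similar_test), 2))    # expected to be high
print("divergent r: ", round(pearson_r(new_instrument, unrelated_test), 2))  # expected to be near zero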
Keys of reliability assessment

- "Stability": related to consistency over time.
- "Internal": related to the instrument itself.
- "Inter-rater": related to agreement between examiners.
- "Intra-rater": related to consistency within a single examiner.

Validity and reliability are closely related. A test cannot be considered valid unless the measurements resulting from it are reliable. Likewise, results from a test can be reliable and not necessarily valid.
Sources of reliability in assessment

Internal consistency
- Description: Do all the items on an instrument measure the same construct? (If an instrument measures more than one construct, a single score will not measure either construct very well.) We would expect high correlation between scores on items measuring a single construct.
- Measures:
  - Split-half reliability: the correlation between scores on the first and second halves of a given instrument. Rarely used, because the "effective" instrument is only half as long as the actual instrument; the Spearman-Brown formula can adjust for this.
  - Kuder-Richardson 20 (KR-20): a similar concept to split-half, but accounts for all items; assumes all items are equivalent, measure a single construct, and have dichotomous responses.
  - Cronbach's alpha: a generalized form of the Kuder-Richardson formulas; assumes all items are equivalent and measure a single construct; can be used with dichotomous or continuous data.
- Comments: Internal consistency is probably the most commonly reported reliability statistic, in part because it can be calculated after a single administration of a single instrument. Because instrument halves can be considered "alternate forms," internal consistency can be viewed as an estimate of parallel-forms reliability.
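As a minimal sketch of the calculation, the Python function below implements the standard coefficient-alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); for dichotomous (0/1) items this reduces to KR-20. The data and function name are invented for illustration.

import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented data: 5 examinees x 4 dichotomously scored items
x = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
print(round(cronbach_alpha(x), 3))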
Sources of reliability in assessment

Temporal stability
- Description: Does the instrument produce similar results when administered a second time?
- Measure: Test-retest reliability: administer the instrument to the same person at different times. Usually quantified using correlation (e.g., Pearson's r).

Parallel forms
- Description: Do different versions of the "same" instrument produce similar results?
- Measure: Alternate-forms reliability: administer different versions of the instrument to the same individual at the same or different times. Usually quantified using correlation (e.g., Pearson's r).

Agreement (inter-rater reliability)
- Description: When using raters, does it matter who does the rating? Is one rater's score similar to another's?
- Measures:
  - Percent agreement: percentage of identical responses; does not account for agreement that would occur by chance.
  - Phi: simple correlation; does not account for chance.
  - Kappa: agreement corrected for chance.
  - Kendall's tau: agreement on ranked data.
  - Intraclass correlation coefficient: uses ANOVA to estimate how well ratings from different raters coincide.
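The contrast between raw agreement and chance-corrected agreement can be seen in a short Python sketch: two hypothetical raters score the same ten dichotomous checklist items, and Cohen's kappa is computed alongside simple percent agreement. The ratings and function names are invented for illustration.

import numpy as np

def percent_agreement(r1, r2):
    """Proportion of items on which two raters gave identical ratings."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    return np.mean(r1 == r2)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters (nominal ratings)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    observed = np.mean(r1 == r2)                        # observed agreement
    expected = sum(np.mean(r1 == c) * np.mean(r2 == c)  # agreement expected by chance
                   for c in np.union1d(r1, r2))
    return (observed - expected) / (1 - expected)

# Invented ratings: two raters marking 10 checklist items as done (1) / not done (0)
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print("percent agreement:", percent_agreement(rater_a, rater_b))       # 0.8
print("Cohen's kappa:    ", round(cohens_kappa(rater_a, rater_b), 2))  # ~0.52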
Sources of reliability in assessment

Generalizability theory
- Description: How much of the error in measurement is the result of each factor (e.g., item, item grouping, subject, rater, day of administration) involved in the measurement process?
- Measure: Generalizability coefficient: a complex model that allows estimation of multiple sources of error.
- Comments: As the name implies, this elegant method is "generalizable" to virtually any setting in which reliability is assessed; for example, it can determine the relative contributions of internal consistency and inter-rater reliability to the overall reliability of a given instrument.

* "Items" are the individual questions on the instrument. The "construct" is what is being measured, such as knowledge, attitude, skill, or symptom in a specific area. The Spearman-Brown "prophecy" formula allows one to calculate the reliability of an instrument's scores when the number of items is increased (or decreased).

(Cook and Beckman, Validity and Reliability of Psychometric Instruments, 2007)
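The Spearman-Brown "prophecy" formula referenced in the footnote is r_new = (n * r) / (1 + (n - 1) * r), where r is the observed reliability and n is the factor by which test length changes. A short Python sketch follows, using invented reliability values for illustration.

def spearman_brown(r_observed, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * r_observed) / (1 + (length_factor - 1) * r_observed)

# Step a split-half correlation of 0.60 up to the full-length estimate (factor 2)
print(round(spearman_brown(0.60, 2), 2))   # 0.75

# Predicted effect of doubling a 20-item quiz whose alpha is 0.70 to 40 items
print(round(spearman_brown(0.70, 2), 2))   # 0.82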
Keys of reliability assessment

Different types of assessments require different kinds of reliability:
- Written MCQs: scale reliability (internal consistency)
- Written essays: rater reliability, inter-rater agreement, generalizability theory
- Oral exams: rater reliability, inter-rater agreement, generalizability theory
- Observational assessments: rater reliability, inter-rater agreement, generalizability theory
- Performance exams (OSCEs): rater reliability, generalizability theory
Keys of reliability assessment

Reliability: how high should it be?
- Very high stakes (e.g., licensure tests): > 0.90
- Moderate stakes (e.g., OSCE): at least ~0.75
- Low stakes (e.g., quiz): > 0.60
Keys of reliability assessment

How to increase reliability?

For written tests:
- Use objectively scored formats
- Include at least 35-40 MCQs
- Use MCQs that differentiate between high- and low-performing students

For performance exams:
- Include at least 7-12 cases
- Use well-trained standardized patients (SPs)
- Apply monitoring and quality control (QC)

For observational exams:
- Use many independent raters (7-11)
- Use standard checklists/rating scales
- Ensure timely ratings
Conclusion

Validity = meaning
- Evidence to aid interpretation of assessment data
- The higher the test stakes, the more evidence is needed
- Multiple sources or methods
- Ongoing research studies

Reliability
- Consistency of the measurement
- One aspect of validity evidence
- Higher reliability is always better than lower
References

- National Board of Medical Examiners. United States Medical Licensing Exam Bulletin. Produced by the Federation of State Medical Boards of the United States and the National Board of Medical Examiners. Available at: http://www.usmle.org/bulletin/2005/testing.htm.
- Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: a method for assessing clinical skills. Ann Intern Med. 2003;138:476-481.
- Litzelman DK, Stratos GA, Marriott DJ, Skeff KM. Factorial validation of a widely disseminated educational framework for evaluating clinical teachers. Acad Med. 1998;73:688-695.
- Merriam-Webster Online. Available at: http://www.m-w.com/.
- Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-Based Medicine: How to Practice and Teach EBM. Edinburgh: Churchill Livingstone; 1998.
- Wallach J. Interpretation of Diagnostic Tests. 7th ed. Philadelphia: Lippincott Williams & Wilkins; 2000.
- Beckman TJ, Ghosh AK, Cook DA, Erwin PJ, Mandrekar JN. How reliable are assessments of clinical teaching? A review of the published instruments. J Gen Intern Med. 2004;19:971-977.
- Shanafelt TD, Bradley KA, Wipf JE, Back AL. Burnout and self-reported patient care in an internal medicine residency program. Ann Intern Med. 2002;136:358-367.
- Alexander GC, Casalino LP, Meltzer DO. Patient-physician communication about out-of-pocket costs. JAMA. 2003;290:953-958.
References (continued)

- Pittet D, Simon A, Hugonnet S, Pessoa-Silva CL, Sauvan V, Perneger TV. Hand hygiene among physicians: performance, beliefs, and perceptions. Ann Intern Med. 2004;141:1-8.
- Messick S. Validity. In: Linn RL, editor. Educational Measurement, 3rd ed. New York: American Council on Education and Macmillan; 1989.
- Foster SL, Cone JD. Validity issues in clinical assessment. Psychol Assess. 1995;7:248-260.
- American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999.
- Bland JM, Altman DG. Statistics notes: validating scales and indexes. BMJ. 2002;324:606-607.
- Downing SM. Validity: on the meaningful interpretation of assessment data. Med Educ. 2003;37:830-837.
- 2005 Certification Examination in Internal Medicine Information Booklet. Produced by the American Board of Internal Medicine. Available at: http://www.abim.org/resources/publications/IMRegistrationBook.pdf.
- Kane MT. An argument-based approach to validity. Psychol Bull. 1992;112:527-535.
- Messick S. Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. Am Psychol. 1995;50:741-749.
- Kane MT. Current concerns in validity theory. J Educ Meas. 2001;38:319-342.
- American Psychological Association. Standards for Educational and Psychological Tests and Manuals. Washington, DC: American Psychological Association; 1966.
- Downing SM, Haladyna TM. Validity threats: overcoming interference in the proposed interpretations of assessment data. Med Educ. 2004;38:327-333.
- Haynes SN, Richard DC, Kubany ES. Content validity in psychological assessment: a functional approach to concepts and methods. Psychol Assess. 1995;7:238-247.
- Feldt LS, Brennan RL. Reliability. In: Linn RL, editor. Educational Measurement, 3rd ed. New York: American Council on Education and Macmillan; 1989.
- Downing SM. Reliability: on the reproducibility of assessment data. Med Educ. 2004;38:1006-1012.
- Clark LA, Watson D. Constructing validity: basic issues in objective scale development. Psychol
Resources

- For an excellent resource on item analysis:
  http://www.utexas.edu/academic/ctl/assessment/iar/students/report/itemanalysis.php
- For a more extensive list of item-writing tips:
  http://testing.byu.edu/info/handbooks/Multiple-Choice%20Item%20Writing%20Guidelines%20-%20Haladyna%20and%20Downing.pdf
  http://homes.chass.utoronto.ca/~murdockj/teaching/MCQ_basic_tips.pdf
- For a discussion about writing higher-level multiple choice items:
  http://www.ascilite.org.au/conferences/perth04/procs/pdf/woodford.pdf
