Reliability and Validity

As part of the test development process, researchers strive to create psychometrically sound instruments with acceptable levels of reliability and validity.

Types of Reliability
PSY3700 Multimedia Assessment and Psychometrics
©2016 South University

If a test administrator is concerned about error due to testing candidates at different times, the test-retest method can be used. In this method, the same test is administered twice and the test takers' scores on the two administrations are correlated. High correlations are favored because they show that the test is consistent across two or more administrations. However, if the concern is over error due to having a small sample of items, the split-half method is preferred. In this method, the test is split into halves and each half is administered separately.

Scores are correlated using the Kuder-Richardson 20 (KR-20) in some cases or Cronbach's coefficient alpha in other cases. These measures of internal consistency are often used as an estimate of a test's reliability (Kaplan & Saccuzzo, 2013).

Item sampling may be used rather than test-retest. This method addresses the error variance due to the selection of a subset of test items within the domain under investigation. Choosing a new subset of items to compile a test and comparing the scores to those on the old one constitutes the alternate, or parallel, form method of estimation. This estimate can be calculated using Pearson's r. For tests that include behavioral observations, such as those used in criterion-referenced tests, agreement among the raters or scorers can be estimated through the kappa statistic (Kaplan & Saccuzzo, 2013).

The standard error of measurement (SEM) can be computed indirectly from the reliability coefficient. Taken from the normative sample, the standard deviation of test scores and the reliability coefficient yield the SEM: the SEM is the standard deviation multiplied by the square root of (1 − r) for the normative sample (Gregory, 2013). The larger the SEM, the less certain we can be that the test is accurate.

Different types of tests render acceptable reliability indices. For projective tests, such as the Rorschach Inkblot Test, reliability indices may be as low as r = 0.2, whereas for objective measures, such as intelligence quotient (IQ) or personality tests, they are much higher (0.9 and 0.7, respectively) (Rust & Golombok, 2009). In our example of testing the components of creativity, it would be difficult to expect a reliability index of 0.7 or 0.9 due to the subjective nature of interpretations by raters on a test of creativity. Here, the reliability of the test would be expected to fall somewhere between that of projective and objective tests (Rust & Golombok, 2009).
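The internal-consistency and SEM calculations above can be sketched together. The code below computes Cronbach's coefficient alpha from a hypothetical item-score matrix (for 0/1 items, alpha equals KR-20) and then the SEM as the standard deviation times the square root of (1 − r):

```python
# Cronbach's coefficient alpha from an item-score matrix, then the SEM
# from that reliability estimate. All scores are hypothetical; for
# dichotomous (0/1) items, alpha reduces to KR-20.
from statistics import pvariance, pstdev

scores = [  # rows = test takers, columns = items
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]

k = len(scores[0])                        # number of items
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
totals = [sum(row) for row in scores]     # total score per test taker
total_var = pvariance(totals)

# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# SEM = SD * sqrt(1 - r), with SD and r taken from this sample
sem = pstdev(totals) * (1 - alpha) ** 0.5
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

A larger SEM means each observed score is a less precise estimate of the test taker's true score, which is why low-reliability tests warrant wide confidence bands around individual scores.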
References

Gregory, R. (2013). Psychological testing: History, principles, and applications (7th ed.). Boston, MA: Pearson.

Kaplan, R., & Saccuzzo, D. (2013). Psychological testing: Principles, applications, & issues (8th ed.). Belmont, CA: Wadsworth.

Rust, J., & Golombok, S. (2009). Modern psychometrics: The science of psychological assessment (3rd ed.). New York, NY: Taylor & Francis.