Academy of Educational Leadership Journal, Volume 18, Number 4, 2014
USING EXCEL'S PIVOT TABLE FUNCTION FOR
VISUAL DATA ANALYSIS OF EXAM RESULTS: A
SUPPLEMENTAL PROCEDURE TO CLASSICAL TEST
THEORY
Robert D. Slater, University of North Florida
Mary Beal-Hodges, University of North Florida
Anita Reed, Texas A&M University Corpus Christi
ABSTRACT
This paper demonstrates how Excel’s Pivot Table Function can be used to visually examine
electronic exam results. Pivot tables allow users to visually analyze data such as exam results
effectively and efficiently. The paper provides a short discussion of Classical Test Theory statistics
such as Item Difficulty and Item Discrimination. Examples are then presented where exam
questions seemed to perform poorly when analyzed using only the statistical measurements. When
the same examples are explored using visual analysis from Excel’s Pivot Table as a supplement to
the statistical methods, the results are better understood.
Keywords: Item Analysis, Assessment, Pivot-Table, Item Discrimination, Item Difficulty,
Classical Test Theory
INTRODUCTION
In this paper a graphical method of analyzing exam question results using Excel’s Pivot
Table function is proposed. We argue that visual analysis of exam data should be used as a
supplement to the traditional statistical approaches of item analysis. Performing detailed item
analyses on exam question responses allows instructors to understand not only how well students
are grasping the material on the exam as a whole but also to understand how well each question is
measuring the student’s knowledge. However, as Crisp & Palmer (2007) and Vyas & Supe (2008)
point out, many instructors are not specialists in educational theory or the discipline of assessment
and are limited in the statistical training needed to analyze assessment results. Therefore, it is
common practice for many instructors to create an exam, grade it, report the students’ scores and
then give the exam no further thought. In other words, validation of exams and their results tend
to be based around ‘academic acumen rather than quantitative evidence’ (Crisp & Palmer, 2007;
Knight, 2006; Price, 2005). Even when exam item analyses are conducted, oftentimes the measures may be misunderstood. Course management systems such as Blackboard now offer exam
item analysis measures such as Item Discrimination and Item Difficulty. These measures can easily
be misinterpreted if instructors are not aware of how they are calculated and their sensitivity to the
data being measured.
Graphically analyzing electronic exam results gives instructors a method to cross-validate
traditional quantitative analyses. Ackerman (1996) illustrated how graphical analyses enhanced
interpretations of item responses. Performing an analysis of exam results using Excel’s Pivot Table
function allows professors to evaluate each question’s overall effectiveness and to identify
questions where students have performed poorly. The graphical results provided by the pivot table provide an opportunity for instructors to recognize questions that might need to be revised, thrown out, or reviewed further before being used in future assessments. The pivot table
simultaneously presents students’ overall performance on the test, question performance, and the
student performance on each test question. This visual exam analysis is intended to complement
the traditional quantitative item statistics provided by Blackboard and/or other standard electronic
exam result analysis software.
PIVOT TABLES FOR DATA VISUALIZATION ANALYSIS
In this paper we propose using a pivot table and conditional formatting to conduct an exam
item analysis graphically. A pivot table is a data visualization tool that is included in most
spreadsheet programs such as Microsoft Excel. Pivot tables allow for multidimensional
representation of data as can be seen in Figure 1 below. Figure 1 demonstrates the analysis of
exam data. The itemized data was downloaded from Blackboard into an Excel spreadsheet. In
Figure 1, each student is represented by a row in the table and each question from the exam is
represented by a column in the table. Each student’s performance on a particular question can be
found at the intersection of each column and row. In the example below, each question was worth
two points and any student who answered the question correctly would have a “2” at the
intersection of the column and row corresponding with that student’s identification number.
Pivot tables allow users to select, view, and sort a large amount of data in a short period of time.
In Figure 1 below, the data has been sorted by both student performance on the exam and by
question performance. As can be seen on the right hand side of Figure 1, students who performed
poorly on the exam appear near the top of the table and students who performed well on the exam
appear on the bottom of the table. The student grades on this exam ranged from a low of 42 to a
high of 96. At the bottom of Figure 1 is the total score for each question. This total reflects how many students answered the question correctly, with higher values representing easier questions.
The question performance has been sorted from high to low with easy questions presented on the
left hand side of the chart and more difficult questions presented on the right hand side of the chart.
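The layout described above can be reproduced outside Excel. The following Python sketch (with invented scores, not the paper's data) sorts students by ascending exam total and questions by descending column total, matching the arrangement in Figure 1:

```python
# Hypothetical miniature of the Figure 1 layout (invented scores, not the
# paper's data): rows are students, columns are questions, and a correct
# answer is worth 2 points.
scores = {
    "S1": {"Q1": 2, "Q2": 0, "Q3": 0},
    "S2": {"Q1": 2, "Q2": 2, "Q3": 0},
    "S3": {"Q1": 2, "Q2": 2, "Q3": 2},
}
questions = ["Q1", "Q2", "Q3"]

# Poor performers at the top, as in Figure 1 (ascending exam totals).
rows = sorted(scores, key=lambda s: sum(scores[s].values()))

# Easy questions on the left (descending column totals).
cols = sorted(questions, key=lambda q: -sum(scores[s][q] for s in scores))

for s in rows:
    print(s, [scores[s][q] for q in cols])
```

The same two sorts are what the pivot table performs interactively when the row and column fields are sorted by their grand totals.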
At the top left of Figure 1, a Report Filter has been used. Excel’s Pivot Table function
includes the report filter, which allows the data in the table to be filtered by user-defined criteria. In
this example, each question has been assigned to a category based on the particular construct the
question is measuring. The current configuration for Figure 1 is showing all of the constructs in
the exam. Several columns have been hidden in Figure 1 to enhance the quality of the image.
Conditional formatting is another data analysis visualization tool that is available in
Microsoft Excel. The benefits of conditional formatting have been demonstrated in this pivot table
analysis. In Figure 1 below, conditional formatting has been used to highlight any value in the
pivot table which equals zero. By using conditional formatting the instructor can easily see not
only missed questions, but the patterns that have developed for particular questions and the exam
overall.
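Excel's conditional formatting has no direct analogue in plain text, but the underlying idea, flagging every zero so that missed questions stand out, can be sketched with a small helper (the function and its text-based "highlight" are ours, purely for illustration):

```python
def mark_missed(row):
    """Render one student's row of point values, flagging zeros (missed
    questions) the way the conditional-formatting highlight does in Figure 1."""
    return " ".join("XX" if v == 0 else f"{v:2d}" for v in row)

# Example: a student who missed the second and fourth questions.
print(mark_missed([2, 0, 2, 0]))
```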
Figure 1: Pivot Table showing exam item analysis
In Figure 1, it is also easy to quickly identify the questions that are easy and the questions
that are difficult. The two most difficult questions appear to be questions 47 and 49. One interesting
observation is that the two highest performing students on the exam both missed these questions.
Another pattern that can be seen is that for the most part students who answered one of the
questions correctly also answered the other question correctly, as can be seen by the non-red values
that appear paired in the diagram. Further, question 14 is the third most difficult question and was
answered correctly by most of the students who performed well on the test overall.
The visual analysis of the pivot table can supplement traditional statistical analysis. Many
of the statistical measures are subject to false positive outcomes indicating problematic exam
questions that need further review. This paper provides specific examples where statistical
measures indicate there may be problems with particular exam questions and the visual analysis
(pivot table) provides a better understanding of the item responses. The visual analysis may also
help to eliminate questions that need to be thrown out. Identifying poorly worded questions before
an exam is reviewed with students in class can save classroom time and instructor effort. For some
professors, the first time they realize a question is poorly worded is only after students publicly
bring a flawed question to their attention.
STATISTICAL MEASURES FOR ITEM ANALYSIS
Classical Test Theory (CTT) comprises a set of concepts and methods that provides a basis for
many of the measurement tools and statistics that are commonly used by higher education
instructors to both construct and evaluate exams. Dating from the early 20th century, CTT is the simplest and most widely used form of analysis. In recent decades, attention has turned to Item
Response Theory which examines how test performance relates to the underlying abilities that are
measured by the items in the test (Hambleton and Jones, 1993). Item Response Theory, as the
name implies, tends to focus on item-level performance. It carries stringent assumptions, such as the requirement that the set of items composing the test measure a single common trait or ability.
However, CTT forms the basis of the item analysis provided in Blackboard and in other popular
item analysis software such as Softscore or ScorePak. The popularity of CTT is partly due to the
relatively weak statistical assumptions needed to run analyses combined with simple mathematical
procedures. Most procedures in CTT analysis focus on the test as a whole (mean, standard
deviation, etc.) rather than on the individual questions. However, important item-level statistics
such as difficulty and discrimination can still be calculated as part of Classical Theory. Additional
detail about the mathematical and theoretical components of Classical Test Theory can be found
in a variety of books and articles including Baker (1997); Crocker & Algina (1986); Fan (1998); and
Hambleton & Jones (1993).
Blackboard uses both Item Difficulty and Item Discrimination measures in the Item
Analysis function. While these measures are helpful in understanding question performance, both
measures have limitations which may be seen quite clearly using a visual analysis of the exam
results (such as in a pivot table). Next, Item Difficulty and Item Discrimination will be discussed
and it will be illustrated how a visual tool such as a pivot table can supplement an exam analysis
using these two measures.
Item Difficulty is a measure used to show the percentage of students who answered a
particular question correctly (for items with one correct alternative). Item Difficulty is reported on a range from 0% to 100%, whereby higher item difficulty percentages represent easier questions.
According to Lord (1952) desirable difficulty levels are slightly higher than midway between
chance (arriving at correct choice by guessing) and perfect scores for the item. Figure 2 below
represents Lord’s (1952) desirable difficulty levels based on the question format:
Figure 2: Question Format and Ideal Item Difficulty
Question Format Ideal Difficulty
Five-response multiple choice 70
Four-response multiple choice 74
Three-response multiple choice 77
True-false 85
Blackboard arbitrarily classifies questions with percentages greater than 80% as “Easy” and less than 30% as “Hard” and flags these questions for review.
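Item Difficulty as described here is straightforward to compute. The sketch below is ours (the function names are invented; the thresholds are the Blackboard cutoffs quoted above); with 24 students, 7 correct answers reproduces roughly the 29.2% later reported for question 14, an assumed head count consistent with the class size mentioned in the text:

```python
def item_difficulty(responses):
    """Percentage of students answering correctly (one correct alternative)."""
    return 100.0 * sum(1 for r in responses if r) / len(responses)

def blackboard_flag(difficulty):
    """The arbitrary Blackboard bands described in the text; the label for
    the middle band is our placeholder."""
    if difficulty > 80:
        return "Easy"
    if difficulty < 30:
        return "Hard"  # flagged for further review
    return "Medium"

# 7 of 24 correct gives roughly the 29.2% reported for question 14.
d = item_difficulty([True] * 7 + [False] * 17)
print(round(d, 1), blackboard_flag(d))  # prints: 29.2 Hard
```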
Questions where students have performed poorly may fall into one of several categories:
1) incorrectly keyed answers, 2) confusing text, 3) content that was not covered during class, or 4)
higher level questions. By only looking at a measurement like the percentage of students who
answered a question correctly, professors may accidentally throw out higher level questions. Using
Excel’s Pivot Table function to visually analyze the exam results allows instructors to visually
identify and categorize these questions. For example, question 14 in Figure 1 was given an Item
Difficulty of 29.2% which according to Lord would be much lower than ideal. Question 14 would
also be flagged by Blackboard for further review since the Item Difficulty was lower than 30%.
Based on the visual analysis presented by the pivot table it can be seen that most of the students
who received an “A” on the exam answered this question correctly. This question may be a valid
question that tests higher level constructs than the other questions. However, the analysis reveals
that two students who performed poorly on the overall exam still received credit for this question.
Whether these two poorly performing students actually knew the material being tested in the
question and received credit from “informed guessing” or if their correct responses were a function
of “statistical guessing” cannot be determined from either Item Difficulty or Visual Analysis
(Burton 2001).
Another statistical method common in Classical Test Theory and also presented by
Blackboard is Item Discrimination. Item Discrimination refers to the ability of a question to
differentiate among students on the basis of how well each student knows the overall material
being tested. Item Discrimination is a measure of the degree to which students with high overall
exam scores also answered a particular question correctly. A question is a good discriminator when
students who answer the question correctly also do well on the test. One common item
discrimination index is a point biserial correlation coefficient between students’ responses to a
particular question and total scores on all other questions on the test. However, a discrimination
value cannot be calculated when the question’s difficulty is 100% or when all students receive the
same score on a question (Blackboard Learn). Point biserial values can range from -1.0 to +1.0.
Questions with discrimination values above 0.3 are arbitrarily classified as “Good”; between 0.1
and 0.3 as “Fair”; and those less than 0.1 are considered “Poor” and flagged for review by
Blackboard.
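The index as defined here, a Pearson correlation between a 0/1 item vector and each student's total on all other questions, can be sketched as follows. The implementation and example data are ours, not Blackboard's; only the cutoff bands come from the text above:

```python
import math

def point_biserial(item, rest_totals):
    """Correlation between a 0/1 item vector and each student's total on
    all OTHER questions on the test."""
    n = len(item)
    mx, my = sum(item) / n, sum(rest_totals) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item, rest_totals)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in item) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in rest_totals) / n)
    if sx == 0 or sy == 0:
        return None  # undefined when every student scores the same
    return cov / (sx * sy)

def classify(r):
    """The arbitrary Blackboard bands described in the text."""
    if r is None:
        return "Cannot calculate"
    if r > 0.3:
        return "Good"
    if r >= 0.1:
        return "Fair"
    return "Poor"  # flagged for review

# A question answered correctly only by the two strongest students
# discriminates well (invented totals on the remaining questions).
r = point_biserial([0, 0, 1, 1], [40, 55, 80, 92])
print(classify(r))  # prints: Good
```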
The visual analysis from the pivot table can help with the review of items which scored
low on the point biserial scale. For example in Figure 1, question 46 scored a .056 point biserial
rating in Blackboard, suggesting that further review is required. In the pivot table it is shown that
20 out of 24 students answered this question correctly. A couple of students with lower overall
exam grades missed this question while students with even lower overall grades answered the
question correctly. This type of pattern heavily influences the point biserial statistic but visually
shows that there is nothing wrong with the question. The four students who missed this question
may have just marked the wrong response on their exam or they may not have studied that
particular construct being tested.
One problem with item discrimination methods such as the point biserial statistic is that
the calculation assumes that an individual question is measuring the same construct as the rest of
the questions on a particular test. In higher education, exams often have questions from multiple
chapters that cover different constructs. So, a question with a low or negative discrimination index
(point biserial value) might indicate a concept that is covered sparingly throughout the exam. In
other words a student could do very well on this construct but still score poorly on the overall
exam. There would be nothing inherently wrong with the question but statistical tests may flag the
question for review. The report filter in the pivot table can help with this type of item classification
and analysis. An example using the Report Filter function is demonstrated in Figure 3 below. In
Figure 3, the report has been filtered by each question’s category. In this filtered report, only
questions measuring students’ knowledge of the REA Diagraming construct have been included.
Below the pivot table the Item Difficulty and Item Discrimination statistics are presented.
The Item Difficulty rating is question specific and does not change when the report is filtered.
However, the point biserial (item discrimination) may be recalculated for this subset of questions
to see how well each question measures the construct being tested. In the example in Figure 3,
questions 47 and 49, which were the most difficult questions on the overall exam (as reported in Figure 1 for all question categories), are still the most difficult questions for the REA Diagraming
category. The visual analysis also clearly highlights that these two questions are reasonably good
“higher level” questions. That is, these questions can discriminate between high performing
students and lower performing students for questions in the category REA Diagraming. A
comparison of the Item Discrimination measure for how well these questions correlate with
students' overall exam scores (Original Discrimination) versus how well these questions correlate
with the overall score of just questions in the REA Diagraming category (Revised Discrimination)
shows a significant difference. In the Original Discrimination results, three questions (31, 47, and 49) rated only “Fair” when compared with the overall exam scores. The Revised Discrimination results show that these three questions actually correlate very well with the total scores for just the REA Diagraming category.
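The report-filter recalculation can be mimicked by restricting both the item vector and the totals to a single category. The data below are invented for illustration (three students, question numbers borrowed from the text but with made-up responses); the correlation helper is a plain Pearson formula, which with a 0/1 item vector is the point biserial index described above:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation; with a 0/1 vector this is the point
    biserial index described in the text."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Invented data: 1 = correct, 0 = incorrect; category tags are illustrative.
category = {"Q31": "REA", "Q47": "REA", "Q49": "REA", "Q5": "Other"}
answers = {
    "S1": {"Q31": 0, "Q47": 0, "Q49": 0, "Q5": 1},
    "S2": {"Q31": 1, "Q47": 0, "Q49": 1, "Q5": 1},
    "S3": {"Q31": 1, "Q47": 1, "Q49": 1, "Q5": 0},
}

def revised_discrimination(question, cat):
    """Point biserial of `question` against totals over the OTHER questions
    in the same category (the 'Revised Discrimination' of the text)."""
    peers = [q for q, c in category.items() if c == cat and q != question]
    students = sorted(answers)
    item = [answers[s][question] for s in students]
    totals = [sum(answers[s][q] for q in peers) for s in students]
    return pearson(item, totals)

print(revised_discrimination("Q47", "REA"))
```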
Figure 3: Pivot Table Using Report Filter to Show Only Questions Measuring a Particular
Construct (REA Diagraming).
SUMMARY
In this paper we have demonstrated that a visual analysis of exam results using Excel’s
Pivot Table Function can supplement traditional Classical Test Theory measures such as Item
Difficulty and Item Discrimination. Examples were provided for both Item Difficulty and Item
Discrimination where the calculated statistics indicated further analysis of exam questions would
be needed. It was demonstrated how the visual analysis in Excel’s Pivot Table could easily show
that questions with high item difficulty measures may be valid questions that were only answered
correctly by the students who performed better on the exam. The sensitivity of Item Discrimination
measures such as the point biserial statistic to small anomalies in the exam data was also illustrated.
For instance, when students who performed poorly on an exam answered questions correctly, they
heavily influenced the Item Discrimination measure. Visual analysis of the questions indicated that these students' performance on the question being examined may be caused by random guessing rather than informed guessing (Burton 2001).
It was also demonstrated how violations of a required assumption of the point biserial measure may impact the measurement's effectiveness. When multiple constructs are being
measured in a single exam the results of the point biserial statistic may not be applicable to the
exam as a whole. Using the report filter function in the pivot table allows the user to view questions
from the exam based on the question’s category. Viewing the questions by a single category
allowed for a recalculation of the point biserial measurement to examine how well each question correlated with the other questions in that single category.
REFERENCES
Al Muhaidib, Nouriya Saab. 2010. “Types of Item-Writing Flaws in Multiple Choice Question Pattern—A
Comparative Study.” Journal of Educational & Psychological Sciences 10-44.
Ackerman, T. 1996. “Graphical Representation of Multidimensional Item Response Theory Analyses”, Applied
Psychological Measurement, 20(4).
Baker, R. 1997, “Classical Test Theory and Item Response Theory in Test Analysis.”
Blackboard Learn 2013. “Running Item Analysis on a Test.” https://help.blackboard.com/en-
us/Learn/9.1_SP_10_and_SP_11/Instructor/070_Tests_Surveys_Pools/112_Running_Item_Analysis_on_a_
Test accessed September 8, 2013.
Burton, R. F. 2001. “Quantifying the Effects of Chance in Multiple Choice and True/False Tests: question selection
and guessing of answers.” Assessment & Evaluation in Higher Education, 26(1), 41-50.
Crisp, G. T. and E. J. Palmer. 2007. “Engaging Academics with a Simplified Analysis of Their Multiple-Choice Questions (MCQ) Assessment Results.” Journal of University Teaching & Learning Practice. 4(2): 88-106.
Crocker, Linda and James Algina. 1986. Introduction to Classical and Modern Test Theory. Holt, Rinehart, and
Winston., Orlando, Florida.
De Champlain, Andre F. 2010. “A primer on classical test theory and item response theory for assessments in medical
education.” Medical Education. 44(1): 109-117.
Fan, Xitao. 1998. “Item Response Theory and Classical test theory: an empirical comparison of their item/person
statistics.” Educational and Psychological Measurement 58(3): 357-382.
Hambleton, Ronald and Russell W. Jones. 1993. “An NCME Instructional Module on Comparison of Classical Test
Theory and Item Response Theory and Their Applications to Test Development.” Educational Measurement:
Issues and Practice. 12(3): 38-47.
Hambleton, Ronald and Hariharan Swaminathan and H. Jane Rogers. 1991. Fundamentals of Item Response Theory.
Sage Publications Inc. Newbury Park, London, New Delhi.
Kline, Theresa J.B. 2005. Psychological Testing: A Practical Approach to Design and Evaluation. Sage Publications,
Inc. Thousand Oaks, London, New Delhi. 91-165.
Knight, P. 2006. “The Local Practices of Assessment.” Assessment & Evaluation in Higher Education. 31(4): 435-
452.
Lord, F.M. 1952. “The Relationship of the Reliability of Multiple-Choice Test to the Distribution of Item Difficulties.”
Psychometrika, 18: 181-194.
Price, M. 2005. “Assessment Standards: The Role of Communities of Practice and the Scholarship of Assessment.” Assessment & Evaluation in Higher Education. 30(3): 215-230.
Vyas, R and A Supe. 2008. “Multiple Choice Questions: A Literature Review on the Optimal Number of Options.”
The National Medical Journal of India. 21(3), 130-133.
Weiss, David and Michael E. Yoes. 1991. “Item Response Theory.” Advances in Educational and Psychological
Testing: Theory and Applications. Kluwer Academic/Plenum Publishers New York: New York. 69-95.