Item analysis of multiple choice questions: Assessing an assessment tool in medical students
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2395-2296.189670

Aim: Assessment is a very important component of the medical course curriculum. Item analysis is the process of collecting, summarizing, and using information from students' responses to assess the quality of multiple-choice questions (MCQs). The difficulty index (P) and discrimination index (D) are the parameters used to evaluate the standard of MCQs. The aim of the study was to assess the quality of MCQs.

Materials and Methods: The study was conducted in the Department of Pathology. One hundred and twenty second-year MBBS students took an MCQ test comprising 40 questions. There was no negative marking; the test was evaluated out of 40 marks, and a score of 50% was the passing mark. Postvalidation of the paper was done by item analysis. Each item was analyzed for difficulty index, discrimination index, and distractor effectiveness. The relationship between the difficulty and discrimination indices of the items was determined by Pearson correlation analysis using SPSS 20.0.

Results: The difficulty index of 34 (85%) items was in the acceptable range (P = 30–70%), 2 (5%) items were too easy (P > 70%), and 4 (10%) items were too difficult (P < 30%). The discrimination index of 24 (60%) items was excellent (D > 0.4), 4 (10%) items were good (D = 0.3–0.39), 6 (15%) items were acceptable (D = 0.2–0.29), and 6 (15%) items were poor (D = 0–0.19). The 40 items had a total of 120 distractors. Of these, 6 (5%) were nonfunctional distractors and 114 (95%) were functional distractors. The discrimination index correlated positively with the difficulty index (r = 0.563, P = 0.010, significant at the 0.01 level [two-tailed]). The maximum discrimination (D = 0.5–0.6) was observed in the acceptable difficulty range (P = 30–70%).

Conclusion: In this study, the majority of items fulfilled the criteria of acceptable difficulty and good discrimination. Moderately easy/difficult items had the maximal discriminative ability.
Very difficult items displayed poor discrimination, but the very easy items had a high discrimination index, indicating faulty items or incorrect keys. The results of this study should initiate a change in the way MCQ test items are selected for any examination, and there should be a proper assessment strategy as part of curriculum development.

Keywords: Difficulty index, discrimination index, distractor effectiveness, item analysis, multiple choice questions
Multiple choice questions (MCQs) are frequently used to assess students in different educational streams because they are objective and cover a wide range of content in less time. MCQs are used mostly for comprehensive assessment at the end of academic sessions and provide feedback to teachers on their educational actions. Designing MCQs is a complex and time-consuming process in a multidisciplinary, integrated curriculum, so MCQs need to be tested for their standard and quality.[1] Item analysis examines student responses to individual test items (MCQs) to assess the quality of those items and of the test as a whole. Item analysis thus assesses the assessment tool for the benefit of both student and teacher. The aims of the study were to analyze the quality of MCQs, to investigate the relationship of items having good difficulty and discrimination indices with their distractor efficiency, and to find out the correlation between the difficulty index (P) and the discrimination index (D).
The study was conducted in the Department of Pathology as part of the assessment. A total of 120 second-year MBBS students took the MCQ test, which comprised 40 single-best-response questions. There was no negative marking, and the time allotted was half an hour. Prevalidation of the paper was done by the head of the department. The test was evaluated out of 40 marks, and a score of 50% was the passing mark. Postvalidation of the paper was done by item analysis. The scores of all the students were arranged in order of merit. The upper one-third of students were considered high achievers and the lower one-third low achievers. Each item was analyzed for difficulty index (P), discrimination index (D), and distractor efficiency (DE).[2],[3]
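The formulas for P and D are not present in this copy of the article. The sketch below assumes the upper/lower-group formulas that are common in the item-analysis literature: P = (H + L)/N × 100 and D = 2(H − L)/N, where H and L are the numbers of correct answers to an item in the high and low groups and N is the number of students in both groups. The 0/1 response matrix and the function names `split_groups` and `item_indices` are hypothetical illustrations, not the authors' code.

```python
"""Upper/lower-group item analysis (illustrative sketch, not the authors' code).

Assumed formulas:
    P = (H + L) / N * 100   difficulty index, in percent
    D = 2 * (H - L) / N     discrimination index
H and L count correct answers to an item in the high and low groups;
N is the total number of students in both groups.
"""


def split_groups(responses: list[list[int]]) -> tuple[list[list[int]], list[list[int]]]:
    """Rank students by total score and return the upper and lower thirds.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    Ties in total score keep their input order (Python's sort is stable).
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = len(ranked) // 3
    if k == 0:
        raise ValueError("need at least 3 students to form groups")
    return ranked[:k], ranked[-k:]


def item_indices(responses: list[list[int]]) -> list[tuple[float, float]]:
    """Return (P, D) for every item, using the assumed formulas above."""
    high, low = split_groups(responses)
    n = len(high) + len(low)
    indices = []
    for i in range(len(responses[0])):
        h_correct = sum(student[i] for student in high)
        l_correct = sum(student[i] for student in low)
        p = (h_correct + l_correct) / n * 100
        d = 2 * (h_correct - l_correct) / n
        indices.append((p, d))
    return indices


if __name__ == "__main__":
    # Hypothetical toy data: 6 students x 2 items, not the study's data.
    data = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [1, 0]]
    for item, (p, d) in enumerate(item_indices(data), start=1):
        print(f"item {item}: P = {p:.1f}%, D = {d:.2f}")
    # prints:
    # item 1: P = 75.0%, D = 0.50
    # item 2: P = 50.0%, D = 1.00
```

Because both groups contain the same number of students k, 2(H − L)/N is the same as (H − L)/k. With 120 students, each group under this scheme would hold 40.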
Interpretation:
Difficulty index (P): P < 30%, difficult; P = 30–70%, acceptable; P > 70%, easy.
Discrimination index (D): negative D, defective item or wrong key; D = 0–0.19, poor discrimination; D = 0.2–0.29, acceptable discrimination; D = 0.3–0.39, good discrimination; D > 0.4, excellent discrimination.

An item contains a stem and four options: one correct alternative (the key) and three incorrect alternatives (distractors). A nonfunctional distractor (NFD) is an option other than the key that is selected by <5% of students; a functional or effective distractor is one selected by 5% or more of students. Based on the number of NFDs in an item, DE ranges from 0% to 100%: if an item contains three, two, one, or no NFDs, its DE is 0%, 33.3%, 66.6%, or 100%, respectively.

Statistical analysis
The data are reported as percentages and as mean ± standard deviation (SD) of n items. The relationship between the difficulty index and discrimination index values of all items was determined by Pearson correlation analysis using SPSS 20.0 (IBM, Armonk, NY, United States of America). P < 0.05 was considered to indicate statistical significance.
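The interpretation bands and the DE rule described above can be expressed as small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the bands for D are treated as half-open intervals (the text does not say where values such as 0.195 or exactly 0.4 fall), and the 5% NFD threshold is taken relative to the students who answered the item.

```python
"""Interpretation bands and distractor efficiency (illustrative sketch)."""


def classify_difficulty(p: float) -> str:
    """Map a difficulty index P (percent) to the study's bands."""
    if p < 30:
        return "difficult"
    if p <= 70:
        return "acceptable"
    return "easy"


def classify_discrimination(d: float) -> str:
    """Map a discrimination index D to the study's bands.

    Bands are treated as half-open (an assumption), so 0.195 is "poor"
    and exactly 0.4 is "excellent".
    """
    if d < 0:
        return "defective item/wrong key"
    if d < 0.2:
        return "poor"
    if d < 0.3:
        return "acceptable"
    if d < 0.4:
        return "good"
    return "excellent"


def distractor_efficiency(option_counts: dict[str, int], key: str,
                          threshold: float = 0.05) -> float:
    """Return DE (%) for one item.

    option_counts maps each option label to the number of students who
    chose it. A distractor chosen by fewer than `threshold` (5%) of those
    students is nonfunctional (NFD); DE is the percentage of distractors
    that are functional, so 3/2/1/0 NFDs give 0/33.3/66.7/100%.
    """
    total = sum(option_counts.values())
    distractors = [opt for opt in option_counts if opt != key]
    nfd = sum(1 for opt in distractors if option_counts[opt] / total < threshold)
    return (len(distractors) - nfd) / len(distractors) * 100


if __name__ == "__main__":
    print(classify_difficulty(50.16), classify_discrimination(0.34))  # acceptable good
    # Hypothetical item answered by 120 students; option C is an NFD (3/120 < 5%).
    counts = {"A": 50, "B": 30, "C": 3, "D": 37}
    print(round(distractor_efficiency(counts, key="A"), 1))  # 66.7
```

The source rounds one-third steps down to 33.3% and 66.6%; the function returns the unrounded fraction and leaves rounding to the caller.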
A total of 120 students took the test consisting of 40 MCQs. As seen in [Table 1], the mean difficulty index (P) was 50.16 ± 16.15%, while the mean discrimination index (D) was 0.34 ± 0.17. The distributions of difficulty indices (range 23.7–75.0%) and discrimination indices (range 0–0.66) across all 40 MCQ items were analyzed.
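Summary figures of the form "mean ± SD (range)" can be produced from the per-item indices with the Python standard library. This is a hypothetical helper, not the authors' SPSS procedure; it uses the sample standard deviation (n − 1 denominator), which we assume matches the SD reported here.

```python
from statistics import mean, stdev


def describe(values: list[float]) -> str:
    """Format item indices as 'mean ± SD (range min–max)'.

    stdev() is the sample standard deviation (n - 1 denominator);
    using it here is an assumption about how the SD was computed.
    """
    if len(values) < 2:
        raise ValueError("need at least two items for a standard deviation")
    return f"{mean(values):.2f} ± {stdev(values):.2f} (range {min(values)}–{max(values)})"


if __name__ == "__main__":
    # Hypothetical values, not the study's data.
    print(describe([1.0, 2.0, 3.0]))  # 2.00 ± 1.00 (range 1.0–3.0)
```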
The 40 items had a total of 120 distractors. Of these, 6 (5%) were NFDs and 114 (95%) were functional distractors. The mean distractor efficiency was 89.99 ± 24.42%, with a range of 0% to 100% [Table 1]. As shown in [Figure 1], of the 40 items, 5% were easy (P > 70%), about 10% were difficult (P < 30%), and the remaining 85% were within the acceptable range (P = 30–70%). The discrimination indices of the 40 items showed that 15% of items had poor discrimination power (D = 0–0.19) and 60% exhibited excellent discrimination (D > 0.4). Of the remaining 25%, 15% of items were in the acceptable range (D = 0.2–0.29) and 10% showed good discrimination (D = 0.3–0.39) [Figure 2]. The discrimination index correlated positively with the difficulty index (r = 0.563, P = 0.010, significant at the 0.01 level [two-tailed]). The maximum discrimination (D = 0.5–0.6) was observed in the acceptable difficulty range (P = 30–70%).
The effective measurement of acquired knowledge is an important component of medical education. MCQs are a useful assessment tool for measuring factual recall and, if carefully constructed, can test higher-order thinking skills, which are very important for a medical graduate.[4] The method of assessment should be evaluated regularly, and developing an appropriate assessment strategy is a key part of curriculum development. It is important to evaluate MCQ items to see how effective they are in assessing the knowledge of students.[4] Postexamination analysis of the MCQs helps to assess the quality of individual test items and of the test as a whole.[1] Poor items can be modified or removed from the question bank. Previous studies have reported mean difficulty indices of 39.4 ± 21.4%[3] and 52.53 ± 20.59%.[4] Karelia et al. reported mean ± SD difficulty indices ranging from 47.17 ± 19.79 to 58.8 ± 19.33 in a study conducted over a period of 5 years.[5] They also found 61% of items in the acceptable range (P = 30–70%), 24% of items easy (P > 70%), and 15% of items difficult (P < 30%). Another study, by Patel and Mahajan, showed 80% of items in the acceptable range.[6] Our findings corresponded with this study: the mean difficulty index was 50.16 ± 16.15% (range 23.7–75.0%), 34 (85%) items were in the acceptable range, two (5%) items were easy, and 4 (10%) items were difficult. The higher the difficulty index, the easier the question. The difficulty index and discrimination index are reciprocally related.[1] Questions with high P values are considered to be good discriminators.[7] The value of the discrimination index normally ranges between 0 and 1. Any discrimination index of 0.2 or higher is acceptable, meaning that the test item is able to differentiate between weak and good students. In this study, 85% of items had a discrimination index of 0.2 or more, and 60% had a discrimination index above 0.4, indicating that these MCQ items were excellent test items for differentiating between poor and good performers. There were no items with a negative discrimination index. Some studies have reported a negative discrimination index in 20% of items.[8] Items with a negative discrimination index decrease the validity of the test and should be removed from the question bank. Earlier studies have revealed 29% of items with a discrimination index >0.4, 46% with a discrimination index between 0.2 and 0.39, and 21% with a discrimination index <0.19.[7] A positive correlation was noted between the difficulty and discrimination indices. The same observation was reported by Pande et al., 2013[4] and Si-Mui Sim and Rasaiah, 2006[9] in their studies. Mitra et al., 2009[10] showed that the discrimination index correlated poorly with the difficulty index (r = −0.325). A negative correlation signifies that as difficulty index values increase, the discrimination index decreases, indicating that low performers were more likely to get the correct answer. In the present study, moderately easy/difficult (acceptable range) items had the maximal discriminative ability. Very difficult items displayed poor discrimination, but the very easy items had a high discrimination index, indicating faulty items or incorrect keys. The mean distractor efficiency of items in our study was 89.99%. The number of NFDs also affects the discrimination power of an item. It has been seen that reducing the number of distractors from four to three decreases the difficulty index while increasing the discrimination index and reliability. Hingorjo[11] observed that items with one NFD had excellent discrimination ability (D = 0.427) compared with items with all four distractors functioning (D = 0.351).
This compares well with other studies favoring better discrimination with three distractors than with four.[11] It was also observed that items with a good difficulty index (P = 30–70%) and good/excellent discrimination (D > 0.24), considered to be ideal questions, had a DE of 85.15%, which is close to that of items having one NFD.
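Once P, D, and the NFD count are known for each item, items can be screened against an "ideal item" rule. The sketch below is hypothetical (the `ItemStats` fields and `is_ideal` thresholds are illustrative parameters). Its defaults follow the ideal-item definition given in this article's conclusion (P between 31% and 60%, D > 0.25, DE = 100%, i.e. no NFDs); the P = 30–70% and D > 0.24 rule quoted in the paragraph above can be passed in instead.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Per-item results of item analysis (field names are illustrative)."""
    p: float   # difficulty index, percent
    d: float   # discrimination index
    nfd: int   # number of nonfunctional distractors (0-3)


def is_ideal(item: ItemStats,
             p_range: tuple[float, float] = (31.0, 60.0),
             d_min: float = 0.25,
             max_nfd: int = 0) -> bool:
    """True if the item meets the chosen 'ideal item' thresholds.

    Defaults: 31% <= P <= 60%, D > 0.25, and no NFDs (DE = 100%).
    Use p_range=(30, 70), d_min=0.24, max_nfd=3 to apply only the
    difficulty/discrimination rule, with no constraint on distractors.
    """
    low, high = p_range
    return low <= item.p <= high and item.d > d_min and item.nfd <= max_nfd


if __name__ == "__main__":
    # Hypothetical items, not the study's data.
    items = [ItemStats(50.0, 0.45, 0), ItemStats(75.0, 0.50, 0), ItemStats(45.0, 0.30, 1)]
    print([is_ideal(it) for it in items])  # [True, False, False]
```

Items that fail the screen are candidates for review rather than automatic removal; a very easy item with a high D, for example, may point to a wrong key.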
Item analysis is a simple yet valuable procedure performed after the examination that provides information regarding the reliability and validity of an item or test by calculating the difficulty index, discrimination index, distractor efficiency, and their interrelationship. An ideal item (MCQ) is one that has an average difficulty index between 31% and 60%, high discrimination (D > 0.25), and maximum distractor efficiency (100%), with three functional distractors. The items analyzed in this study were neither too easy nor too difficult (mean difficulty index = 50.16%), and the mean discrimination index was 0.34, which is acceptable. In this study, the majority of items fulfilled the criteria of acceptable difficulty and good discrimination. Easy items with a poor discrimination index will be reviewed and reconstructed. The results of this study should initiate a change in the way MCQ test items are selected for any examination, and there should be a proper assessment strategy as part of curriculum development. More analyses of this kind should be carried out after each examination to identify areas of potential weakness in one-best-answer MCQ tests and to improve the standard of assessment.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.
[Figure 1], [Figure 2] [Table 1]