CBE Life Sci Educ. 2014 Summer; 13(2): 159–166.

Abstract

The authors explore a history of grading and review the literature regarding the purposes and impacts of grading. They then suggest strategies for making grading more supportive of learning, including balancing accuracy-based and effort-based grading, using self/peer evaluation, curtailing curved grading, and exercising skepticism about the meaning of grades.

INTRODUCTION

When we consider the practically universal use in all educational institutions of a system of marks, whether numbers or letters, to indicate scholastic attainment of the pupils or students in these institutions, and when we remember how very great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment, we can but be astonished at the blind faith that has been felt in the reliability of the marking systems.
—I. E. Finkelstein (1913)

If your current professional position involves teaching in a formal classroom setting, you are likely familiar with the process of assigning final course grades. Last time you assigned grades, did you assign an “E,” “E+,” or “E−” to any of your students? Likely you assigned variations on “A’s,” “B’s,” “C’s,” “D’s,” and “F’s.” Have you wondered what happened to the “E’s” or talked with colleagues about their mysterious absence from the grading lexicon? While we often commiserate about the process of assigning grades, which may be as stressful for instructors as for students, the lack of conversation among instructors about the mysterious omission of the “E” is but one indicator of the many tacit assumptions we all make about the processes of grading in higher education. Given that the time and stress associated with grading has the potential to distract instructors from other, more meaningful aspects of teaching and learning, it is perhaps time to begin scrutinizing our tacit assumptions surrounding grading.

Below, we explore a brief history of grading in higher education in the United States. This is followed by considerations of the potential purposes of grading and insights from research literature that has explored the influence of grading on teaching and learning. In particular, does grading provide feedback for students that can promote learning? How might grades motivate struggling students? What are the origins of norm-referenced grading, also known as curving? And, finally, to what extent does grading provide reliable information about student learning and mastery of concepts? We end by offering four potential adjustments to our general approach to grading in undergraduate science courses for instructors to consider.

A BRIEF HISTORY OF GRADING IN HIGHER EDUCATION

It can be easy to perceive grades as both fixed and inevitable—without origin or evolution … Yet grades have not always been a part of education in the United States.
—Schneider and Hutt (2013)

Surprisingly, the letter grades most of us take for granted did not gain widespread popularity until the 1940s. Even as late as 1971, only 67% of primary and secondary schools in the United States used letter grades (National Education Association, 1971).

Early 19th Century and Before

The earliest forms of grading consisted of exit exams before awarding of a degree, as seen at Harvard as early as 1646 (Smallwood, 1935).

Harvard and other schools soon experimented with public rankings and evaluations, noting that this resulted in “increasing [student] attention to the course of studies” and encouraged “good moral conduct” (Harvard University, 1832).

Late 19th Century and 20th Century

With schools growing rapidly in size and number and coordination between schools becoming more important, grades became one of the primary means of communication between institutions (Schneider and Hutt, 2013).

By the early 1900s, 100-point or percentage-based grading systems were very common (Cureton, 1971).

As research on intellectual ability appeared to show that, like other continuous biological traits, levels of aptitude in a population conformed to a normal curve, some experts felt grades should similarly be distributed according to a curve in a classroom (Finkelstein, 1913).

Based on the above research and the pressure toward uniformity of grading systems, by the 1940s the “A”–“F” grading system was dominant, with the four-point scale and percentages still also in use (Schneider and Hutt, 2013).
Present Day
Grading systems remain controversial and hotly debated today (Jaschik, 2009).
Though grades were initially meant to serve various pedagogical purposes, more recent reforms have focused on “grades as useful tools in an organizational rather than pedagogical enterprise—tools that would facilitate movement, communication, and coordination”
(Schneider and Hutt, 2013).
PURPOSES OF GRADING—PAST AND PRESENT
Grades as Feedback on Performance—Does Grading Provide Feedback to Help Students Understand and Improve upon Their Deficiencies?
[This] work affirms an observation that many classroom teachers have made about their students: if a paper is returned with both a grade and a comment, many students will pay attention to the grade and ignore the comment.
—Brookhart (2008, p. 8)
For most faculty members, the concept of feedback has at least two applications to the concept of grading. On one hand, grading itself is a form of feedback that may be useful to students. In addition, in the process of grading student work, faculty members sometimes provide written comments as feedback that students could use to improve their work. College students express a desire for feedback (Higgins et al., 2002).
Feedback is generally divided into two categories: evaluative feedback and descriptive feedback. Evaluative feedback, such as a letter grade or written praise or criticism, judges student work, while descriptive feedback provides information about how a student can become more competent (Brookhart, 2008).
While descriptive, written feedback can enhance student performance on problem-solving tasks, reaping those benefits requires students to read, understand, and use the feedback. Anecdotal accounts, as well as some studies, indicate that many students do not read written feedback, much less use it to improve future work (MacDonald, 1991).
Grading does not appear to provide effective feedback that constructively informs students’ future efforts. This is particularly true for tasks involving problem solving or creativity. Even when grading comes in the form of written comments, it is unclear whether students even read such comments, much less understand and act on them.
Grades as a Motivator of Student Effort—Does Grading Motivate Students to Learn?
Our results suggest…that the information routinely given in schools—that is, grades—may encourage an emphasis on quantitative aspects of learning, depress creativity, foster fear of failure, and undermine interest.
—Butler and Nisan (1986)
As described in the history of grading above, our current “A”–“F” grading system was not designed with the primary intent of motivating students. Rather, it stemmed from efforts to streamline communication between institutions and diminish the impacts of unreliable evaluation of students from teacher to teacher (Grant and Green, 2013).
It would not be surprising to most faculty members that, rather than stimulating an interest in learning, grades primarily enhance students’ motivation to avoid receiving bad grades (Butler and Nisan, 1986).
High-achieving students on initial graded assignments appear somewhat sheltered from some of the negative impacts of grades, as they tend to maintain their interest in completing future assignments (presumably in anticipation of receiving additional good grades; Butler, 1988).
This is not to say that classroom evaluation is by definition harmful or a thing to avoid. Evaluation of students in the service of learning, generally including a mechanism for feedback without grade assignment, can serve to enhance learning and motivation (Butler and Nisan, 1986).
Rather than motivating students to learn, grading appears to, in many ways, have quite the opposite effect. Perhaps at best, grading motivates high-achieving students to continue getting high grades—regardless of whether that goal also happens to overlap with learning. At worst, grading lowers interest in learning and enhances anxiety and extrinsic motivation, especially among those students who are struggling.
Grades as a Tool for Comparing Students—Is Grading on a Curve the Fairest Way to Grade?
You definitely compete for grades in engineering; whereas you earn grades in other disciplines … I have to get one point higher on the test than the next guy so I can get the higher grade.
—Student quoted in Seymour and Hewitt (1997, p. 118)
The concept of grading on a curve arose from studies in the early 20th century suggesting that levels of aptitude, for example as measured by IQ, were distributed in the population according to a normal curve.
Some then argued that, if a classroom included a representative sample from the population, grades in the class should similarly be distributed according to a normal curve (Finkelstein, 1913).
Grading on a curve is by definition a type of “norm-referenced” grading, meaning student work is graded based on comparisons with other students’ work (Brookhart, 2004).
Of even more concern, however, is the impact norm-referenced grading has on competition between students. The quote at the start of this section describes how many students respond to curve-graded classes compared with classes that do not use a grading curve. Seymour and Hewitt (1997)
Bloom (1968)
Of course, Bloom's work did not rule out the possibility that some teachers might still give high grades for undesirable reasons unrelated to standards of mastery (e.g., to be nice, to gain the admiration of students, etc.). Such practices would not be in line with Bloom's work and would lead to pernicious grade inflation. Indeed, many of those bemoaning recent trends in grade inflation in higher education (though less prevalent in the sciences) point to the abandonment of curved grading as a major factor (Rojstaczer and Healy, 2012).
In brief, curved grading creates a competitive classroom environment, alienates certain groups of talented students, and often results in grades unrelated to content mastery. Curving is therefore not the fairest way to assign grades.
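The contrast between norm-referenced (curved) and criterion-referenced grading can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and does not come from the literature reviewed here: the 90/80/70/60 cutoffs, the equal 20% quota per letter, the student names, and the scores. Ties are not handled. The sketch is meant only to show how a class in which every student scores high can still receive the full spread of letters under a curve.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical fixed cutoffs for criterion-referenced grading (an assumption).
CUTOFFS: Sequence[Tuple[float, str]] = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))

# Hypothetical curve: each letter is reserved for a fixed share of the class.
QUOTAS: Sequence[Tuple[str, float]] = (
    ("A", 0.2), ("B", 0.2), ("C", 0.2), ("D", 0.2), ("F", 0.2),
)


def criterion_grade(score: float) -> str:
    """Grade one score against fixed standards, ignoring classmates."""
    for minimum, letter in CUTOFFS:
        if score >= minimum:
            return letter
    return "F"


def curved_grades(scores: Dict[str, float]) -> Dict[str, str]:
    """Grade by rank within the class: letters are handed out by quota.

    Ties are broken by input order here; a real policy would need a rule.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades: Dict[str, str] = {}
    cumulative = 0.0
    start = 0
    for letter, share in QUOTAS:
        cumulative += share
        end = round(cumulative * n)  # last rank position that gets this letter
        for name in ranked[start:end]:
            grades[name] = letter
        start = end
    # Anyone left over because of rounding receives the lowest letter.
    for name in ranked[start:]:
        grades[name] = QUOTAS[-1][0]
    return grades


# Hypothetical class in which every student scores 88 or higher.
scores = {"Ana": 95, "Ben": 93, "Cy": 91, "Dee": 90, "Eli": 88}
criterion = {name: criterion_grade(s) for name, s in scores.items()}
curved = curved_grades(scores)
print(criterion)
print(curved)
```

Under these assumed numbers, the fixed cutoffs give four “A’s” and one “B,” while the curve spreads the same five scores across “A” through “F,” so a student's letter depends on classmates' scores rather than on the standard reached.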
Grades as an Objective Evaluation of Student Knowledge—Do Grades Provide Reliable Information about Student Learning?
Study Critiques Schools over Subjective Grading: An Education Expert Calls for Greater Consistency in Evaluating Students' Work.
—Los Angeles Times (2009)
As evidenced by the above headline, some have criticized grading as subjective and inconsistent, meaning that the same student could receive drastically different grades for the same work, depending on who is grading the work and when it is graded. The literature indeed indicates that some forms of assessment lend themselves to greater levels of grading subjectivity than others.
Scoring multiple-choice assessments does not generally require the use of professional judgment from one paper to the next, so instructors should be able to score such assessments objectively (Wainer and Thissen, 1993).
Grading student writing, whether in essays, reports, or constructed-response test items, opens up greater opportunities for subjectivity. Shortly after the rise in popularity of percentage-based grading systems in the early 1900s, researchers began examining teacher consistency in marking written work by students. Starch and Elliott (1912)
Eells (1930)
Designing and using rubrics to grade assignments or tests can reduce inconsistencies and make grading written work more objective. Sharing the rubrics with students can have the added benefit of enhancing learning by allowing for feedback and self-assessment (Jonsson and Svingby, 2007).
In summary, grades often fail to provide reliable information about student learning. Grades awarded can be inconsistent both for a single instructor and among different instructors for reasons that have little to do with a student’s content knowledge or learning advances. Even multiple-choice tests, which can be graded with great consistency, have the potential to provide misleading information on student knowledge.
GRADING—STRATEGIES FOR CHANGE
In part, grading practices in higher education have been driven by educational goals such as providing feedback to students, motivating students, comparing students, and measuring learning. However, much of the research literature on grading reviewed above suggests that these goals are often not being achieved with our current grading practices. Additionally, the expectations, time, and stress associated with grading may be distracting instructors from integrating other pedagogical practices that could create a more positive and effective classroom environment for learning. Below we explore several changes in approaching grading that could assist instructors in minimizing its negative influences.
Kitchen et al. (2006)
Balancing Accuracy-Based Grading with Effort-Based Grading
Multiple research studies described above suggest that the evaluative aspect of grading may distract students from a focus on learning. While evaluation will no doubt always be key in determining course grades, the entirety of students’ grades need not be based primarily on work that rewards only correct answers, such as exams and quizzes. Importantly, constructing a grading system that rewards students for participation and effort has been shown to stimulate student interest in improvement (Swinton, 2010).
Providing Opportunities for Meaningful Feedback through Self and Peer Evaluation
Instructors often perceive grading to be a separate process from teaching and learning, yet well-crafted opportunities for evaluation can be effective tools for changing students’ ideas about biology. Nicol and Macfarlane-Dick (2006)
Making the Move Away from Curving
As documented in the research literature, the practice of grade curving has had unfortunate and often unintended consequences for the culture of undergraduate science classrooms, pitting students against one another as opposed to creating a collaborative learning community (Tobias, 1990).
Becoming Skeptical about What Grades Mean
The research literature raises significant questions about what grades really measure. However, it is likely that grades will continue to be the currency of formal teaching and learning in most higher education settings for the near future. As such, perhaps the most important consideration for instructors about grading is to simply be skeptical about what grades mean. Some instructors will refuse to write letters of recommendation for students who have not achieved grades in a particular range in their course. Yet, if grades are not a reliable reflection of learning and reflect other factors, including language proficiency, cultural background, or skills in test taking, this would seem a deeply biased practice. One practical strategy for making grading more equitable is to grade student work anonymously when possible, just as one would score assays in the laboratory blind to the treatment of the sample. The use of rubrics can also help remove bias from grading (Allen and Tanner, 2006).
IN CONCLUSION—TEACHING MORE BY GRADING LESS (OR DIFFERENTLY)
A review of the history and research on grading practices may appear to present a bleak outlook on the process of grading and its impacts on learning. However, underlying the less encouraging news about grades are numerous opportunities for faculty members to make assessment and evaluation more productive, better aligned with student learning, and less burdensome for faculty and students. Notably, many of the practices advocated in the literature would appear to involve faculty members spending less time grading. The time and energy spent on grading have often been pinpointed as a key barrier to instructors becoming more innovative in their teaching. In some cases, the demands of grading require so much instructor attention that little time remains for reflection on the structure of a course or for aspirations of pedagogical improvement. Additionally, some instructors are hesitant to develop active-learning activities, whether as in-class activities or homework assignments, for fear of the onslaught of grading resulting from these new activities. However, just because students generate work does not mean instructors need to grade that work for accuracy. In fact, we have presented evidence that accuracy-based grading may demotivate students and impede learning. Additionally, the time-consuming process of instructors marking papers and leaving comments may achieve no gain if comments are rarely read by students.

One wonders how much more student learning might occur if instructors’ time spent grading were used in different ways. What if instructors spent more time planning in-class discussions of homework and simply assigned a small number of earned points to students for completing the work? What if students themselves used rubrics to examine their peers’ efforts and evaluate their own work, instead of instructors spending hours and hours commenting on papers? What if students viewed their peers as resources and collaborators, as opposed to competitors in courses that employ grade curving? Implementing small changes like those described above might allow instructors to promote more student learning by grading less or at least differently than they have before.
REFERENCES
- Allen D, Tanner K. Rubrics: tools for making learning goals and evaluation criteria explicit for both teachers and learners. Cell Biol Educ. 2006;5:197–203.
- Anderson VJ. Grading. In: Encyclopedia of Educational Psychology. Thousand Oaks, CA: Sage; 2008.
- Bagg LH. Four Years at Yale. New Haven, CT: Charles C. Chatfield; 1871.
- Bean JC, Peterson D. Grading classroom participation. New Direct Teach Learn. 1998;1998(74):33–40.
- Bloom BS. Learning for Mastery. Instruction and Curriculum. Regional Education Laboratory for the Carolinas and Virginia, Topical Papers and Reprints, Number 1. Eval Comment. 1968;1(2):1–11.
- Bloom BS. Human Characteristics and School Learning. New York: McGraw-Hill; 1976.
- Brookhart S. Grading. Upper Saddle River, NJ: Pearson Education; 2004.
- Brookhart SM. How to Give Effective Feedback to Your Students. Alexandria, VA: Association for Supervision and Curriculum Development; 2008.
- Brown University. 2014. Brown's Grading System. //brown.edu/campus-life/support/careerlab/employers/employer-resources/browns-grading-system/browns-grading-system (accessed 19 February 2014)
- Bull R, Stevens J. The effects of attractiveness of writer and penmanship on essay grades. J Occup Psychol. 1979;52:53–59.
- Butler R. Enhancing and undermining intrinsic motivation: the effects of task-involving and ego-involving evaluation on interest and performance. Br J Educ Psychol. 1988;58:1–14.
- Butler R, Nisan M. Effects of no feedback, task-related comments, and grades on intrinsic motivation and performance. J Educ Psychol. 1986;78:210.
- Crisp BR. Is it worth the effort? How feedback influences students’ subsequent submission of assessable work. Assess Eval High Educ. 2007;32:571–581.
- Crooks TJ. The impact of classroom evaluation practices on students. Rev Educ Res. 1988;58:438–481.
- Cureton LW. The history of grading practices. NCME Measurement in Educ. 1971;2(4):1–8.
- Dufresne RJ, Leonard WJ, Gerace WJ. Making sense of students’ answers to multiple-choice questions. Phys Teach. 2002;40:174–180.
- Ebert-May D, Batzli J, Lim H. Disciplinary research strategies for assessment of learning. Bioscience. 2003;53:1221–1228.
- Eells WC. Reliability of repeated grading of essay type examinations. J Educ Psychol. 1930;21:48.
- Fajardo DM. Author race, essay quality, and reverse discrimination. J Appl Social Psychol. 1985;15:255–268.
- Farrell MJ, Gilbert N. A type of bias in marking examination scripts. Br J Educ Psychol. 1960;30:47–52.
- Finkelstein IE. The Marking System in Theory and Practice. Baltimore: Warwick & York; 1913.
- Freeman S, et al. Prescribed active learning increases performance in introductory biology. Cell Biol Educ. 2007;6:132–139.
- Freeman S, Parks JW. How accurate is peer grading? CBE Life Sci Educ. 2010;9:482–488.
- Grant D, Green WB. Grades as incentives. Empirical Econom. 2013;44:1563–1592.
- Guskey TR. Making the grade: what benefits student. Educ Leadership. 1994;52(2):14–20.
- Harter S. Pleasure derived from challenge and the effects of receiving grades on children's difficulty level choices. Child Dev. 1978;49:788–799.
- Harvard University. Annual Report of the President of Harvard University to the Overseers on the State of the University for the Academic Year 1830–1831. Cambridge, UK: E. W. Metcalf; 1832.
- Higgins R, Hartley P, Skelton A. The conscientious consumer: reconsidering the role of assessment feedback in student learning. Stud High Educ. 2002;27:53–64.
- Humphreys B, Johnson RT, Johnson DW. Effects of cooperative, competitive, and individualistic learning on students’ achievement in science class. J Res Sci Teach. 1982;19:351–356.
- Jaschik S. Imagining College without Grades. (2009). www.insidehighered.com/news/2009/01/22/grades (accessed 20 February 2014)
- Johnson V. Grade Inflation: A Crisis in College Education. Secaucus, NJ: Springer; 2003.
- Jonsson A, Svingby G. The use of scoring rubrics: reliability, validity and educational consequences. Educ Res Rev. 2007;2:130–144.
- Kitchen E, King SH, Robison DF, Sudweeks RR, Bradshaw WS, Bell JD. Rethinking exams and letter grades: how much can teachers delegate to students? CBE Life Sci Educ. 2006;5:270–280.
- Kohn A. Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes. New York: Houghton Mifflin Harcourt; 1999.
- Los Angeles Times. Study critiques schools over subjective grading. Los Angeles Times, October 4, 2009. //articles.latimes.com/2009/oct/04/nation/na-grading-policy4 (accessed 15 April 2014)
- MacDonald RB. Developmental students’ processing of teacher feedback in composition instruction. Rev Res Dev Educ. 1991;8(5):1–5.
- Marble WO, Winne PH, Martin JF. Science achievement as a function of method and schedule of grading. J Res Sci Teach. 1978;15:433–440.
- Meadows M, Billington L. A review of the literature on marking reliability. Unpublished AQA report produced for the National Assessment Agency. 2005. //archive.teachfind.com/qcda/orderline.qcda.gov.uk/gempdf/184962531X/QCDA104983_review_of_the_literature_on_marking_reliability.pdf.
- Meyer M. The grading of students. Science. 1908;28:243–250.
- National Education Association. Reporting pupil progress to parents. Res Bulletin. 1971;49:81–83. (October)
- Nicol DJ, Macfarlane-Dick D. Formative assessment and self-regulated learning: a model and seven principles of good feedback practice. Stud High Educ. 2006;31:199–218.
- Oettinger GS. The effect of nonlinear incentives on performance: evidence from “Econ 101.” Rev Econ Stat. 2002;84:509–517.
- Palmer B. E Is for Fail. (2010). www.slate.com/articles/news_and_politics/explainer/2010/08/e_is_for_fail.html (accessed 19 February 2014)
- Paxton M. A linguistic perspective on multiple choice questioning. Assess Eval High Educ. 2000;25:109–119.
- Pulfrey C, Buchs C, Butera F. Why grades engender performance-avoidance goals: the mediating role of autonomous motivation. J Educ Psychol. 2011;103:683.
- Reddy YM, Andrade H. A review of rubric use in higher education. Assess Eval High Educ. 2010;35:435–448.
- Rocca KA. Student participation in the college classroom: an extended multidisciplinary literature review. Commun Educ. 2010;59:185–213.
- Rogers WT, Harley D. An empirical comparison of three-and four-choice items and tests: susceptibility to testwiseness and internal consistency reliability. Educ Psychol Meas. 1999;59:234–247.
- Rojstaczer S, Healy C. Where A is ordinary: the evolution of American college and university grading, 1940–2009. Teachers College Rec. 2012;114(7):1–23.
- Sadler PM, Good E. The impact of self-and peer-grading on student learning. Educ Assess. 2006;11:1–31.
- Schneider J, Hutt E. Making the grade: a history of the A–F marking scheme. J Curric Stud. 2013:1–24.
- Scouller K. The influence of assessment method on students’ learning approaches: multiple choice question examination versus assignment essay. High Educ. 1998;35:453–472.
- Seymour E, Hewitt N. Talking about Leaving: Why Undergraduates Leave the Sciences. Boulder, CO: Westview; 1997.
- Sinclair HK, Cleland JA. Undergraduate medical students: who seeks formative feedback? Med Educ. 2007;41:580–582.
- Smallwood ML. An Historical Study of Examinations and Grading Systems in Early American Universities: A Critical Study of the Original Records of Harvard, William and Mary, Yale, Mount Holyoke, and Michigan from Their Founding to 1900, vol. 24. Cambridge, MA: Harvard University Press; 1935.
- Spear M. The influence of halo effects upon teachers’ assessments of written work. Res Educ. 1996;1996(56):85–86.
- Spear MG. The biasing influence of pupil sex in a science marking exercise. Res Sci Technol Educ. 1984;2:55–60.
- Stanger-Hall KF. Multiple-choice exams: an obstacle for higher-level thinking in introductory science classes. CBE Life Sci Educ. 2012;11:294–306.
- Starch D. Reliability and distribution of grades. Science. 1913;38:630–636.
- Starch D, Elliott EC. Reliability of the grading of high-school work in English. School Rev. 1912;20:442–457.
- Starch D, Elliott EC. Reliability of grading work in mathematics. School Rev. 1913;21:254–259.
- Stiles E. The Literary Diary of Ezra Stiles … President of Yale College. New York: Scribner's; 1901.
- Swinton OH. The effect of effort grading on learning. Econ Educ Rev. 2010;29:1176–1182.
- Tobias S. They’re Not Dumb, They’re Different: Stalking the Second Tier. Tucson, AZ: Research Corporation; 1990.
- Towns MH, Robinson WR. Student use of test-wiseness strategies in solving multiple-choice chemistry examinations. J Res Sci Teach. 1993;30:709–722.
- Wainer H, Thissen D. Combining multiple-choice and constructed-response test scores: toward a Marxist theory of test construction. Appl Measure Educ. 1993;6:103–118.
- Weaver MR. Do students value feedback? Student perceptions of tutors’ written responses. Assess Eval High Educ. 2006;31:379–394.
- Weigle SC. Investigating rater/prompt interactions in writing assessment: quantitative and qualitative approaches. Assessing Writing. 1999;6:145–178.
- Weld LD. A standard of interpretation of numerical grades. School Rev. 1917;25:412–421.
- Yale University. 2013. Revised Report of the Ad Hoc Committee on Grading. //yalecollege.yale.edu/sites/default/files/2_Report%20from%20Ad%20Hoc%20Committee%20on%20Grading%5B2%5D.pdf (accessed 15 April 2014)
- Zimmerman DW, Williams RH. A new look at the influence of guessing on the reliability of multiple-choice tests. Appl Psychol Measure. 2003;27:357–371.
Articles from CBE Life Sciences Education are provided here courtesy of American Society for Cell Biology