
Exam Scoring Guide

 

What an AP Grade Means

 

The end product of the AP Exams is the set of AP grades reported to students, their schools, and their designated colleges in July. Colleges use these grades as evidence of students' abilities and achievements when deciding whether to grant credit and/or advanced placement.

 

The AP grade scale ranges from 5 to 1:

 

Grade and meaning:

  • 5: Extremely well qualified
  • 4: Well qualified
  • 3: Qualified
  • 2: Possibly qualified
  • 1: No recommendation

 

How is a grade determined?

 

Translating exam performance into an AP grade involves four steps:

 

  1. The multiple-choice answer sheets are scored by computer.
  2. The free-response questions are scored at the annual AP Reading by Readers.
  3. The composite score is calculated.
  4. The composite score is converted to an AP grade.

 

The multiple-choice answer sheets are scored by computer.

 

This is how the process works:

 

  1. Each answer sheet is run through an electronic scanner. This transfers the information directly to cartridges, creating a record for that sheet.
  2. The scanning cartridge is processed by computer. The computer program checks each record for invalid or missing identification data and scores the student's responses.
  3. The computer counts how many answers the student got wrong, then deducts a fraction of that number from the number of right answers. For exams with five-choice items, the fraction is one quarter; for those with four-choice items, it is one third. This type of scoring is appropriate for tests where students are not expected to have mastered all of the material that might be tested. With this procedure, the average multiple-choice score under purely random guessing is zero. (A worked example follows this list.)
  4. The total score is now rounded to the nearest whole number; if the score falls halfway between two whole numbers, it is rounded upward. If the student scores less than zero as a result of the correction for guessing, the score is replaced with a zero.
  5. Finally, the computer creates a record for the student, containing his or her total multiple-choice score, and any subsection scores needed for calculating the composite score.

After the 2002 administration, nearly 938,000 answer sheets were returned to ETS and then scanned in approximately two weeks.
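
The arithmetic in steps 3 and 4 can be illustrated with a short sketch. This is not official AP scoring software; the function name and the sample numbers are hypothetical, and only the rules described above (the deduction of 1/(k-1) of the wrong answers for k-choice items, round-half-up, and the floor at zero) are taken from the text.

    import math
    from fractions import Fraction

    def multiple_choice_score(num_right, num_wrong, choices_per_item):
        # Deduct 1/(k-1) of the wrong answers from the right answers:
        # one quarter for five-choice items, one third for four-choice.
        penalty = Fraction(1, choices_per_item - 1)
        raw = num_right - penalty * num_wrong
        # Round to the nearest whole number, rounding halves upward.
        rounded = math.floor(raw + Fraction(1, 2))
        # A negative result from the guessing correction becomes zero.
        return max(rounded, 0)

    # Hypothetical example: 40 right, 28 wrong on a five-choice exam
    # gives 40 - 28/4 = 33.
    print(multiple_choice_score(40, 28, 5))   # -> 33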

 

The free-response questions are scored at the annual AP Reading by Readers.

 

Unlike the multiple-choice section, which is scored by machine, the free-response section is scored by Readers, a highly labor-intensive process. In June every year, approximately 6,000 Readers come together at the AP Reading, held at various sites throughout the United States, to evaluate over five million essays, solutions to extended problems, audiotaped responses, and works of art. Because it is essential that each Reader score his or her papers fairly, uniformly, and to the same standard as the other Readers, a great deal of attention is paid to the creation of detailed scoring guidelines, the thorough training of all Readers, and various "checks and balances" applied throughout the AP Reading.

 

Creating the Scoring Guidelines

 

During the creation of the free-response questions, the Development Committees propose some preliminary scoring standards.

Before the AP Reading, the Chief Reader prepares a draft of the scoring guidelines for each free-response question, based on suggestions from the committee.

The Chief Reader, Question Leaders, Table Leaders, and ETS content experts meet at the Reading site. They review and revise the draft scoring guidelines, and test them by prescoring randomly selected student papers. A set of suitable student papers is chosen for use in training the Readers.

The Chief Reader, Question Leaders, and Table Leaders conduct training sessions for each free-response question, which are attended by all the Readers who are scoring that question. If problems or ambiguities become apparent, the scoring guidelines are revised and refined until a final consensus is reached. Only then does the actual grading of papers begin.

 

 

The Training Process
 

Since the training of the Readers is so vital in ensuring that students receive a grade that accurately reflects their performance, the process is thorough. Although it varies from subject to subject, this is a common scenario:

 

 

The Readers who will grade a particular question meet as a group with the leaders for that question to discuss the question and acceptable correct responses.

The group reviews the scoring guidelines, adding refinements based on their discussion.

The Readers read sample papers that have been prescored (see above). These samples reflect all levels of ability.

Each group of Readers then compares and discusses the scores for the samples, based on the scoring guidelines.

The groups are then presented with other packets of photocopied papers that have also been prescored; this time, however, the scores are not marked on the samples. The groups work together to assign an appropriate score, and then the Question Leader or Table Leader reveals the original scores. Discussion then follows about the basis for the score.

This process is repeated with other papers that have been chosen, either because they demonstrate a particular level of mastery, or because they contain particular features that will help Readers understand the scoring scale more clearly. As the teams become more proficient in their use of the scale, they are presented with papers that are in some way more problematic. Scores and differences in judgment are discussed until agreement is reached, with the Question Leaders, the Table Leaders, or the Chief Reader acting as arbitrator when needed.

After a team shows consistent agreement on its scores, its members proceed to score individually. Readers are encouraged to seek advice from each other, the Question Leaders and Table Leaders, or the Chief Reader when in doubt about a score. A student response that is problematic receives multiple readings and evaluations.

Checks and Balances

Various steps are taken to ensure that grading is done fairly and consistently.

 

The student's identification information is covered: at the exam administration, students are instructed to cover their names and other identifying information on the exam booklet by sealing a special flap.

Clerical aides at the Reading record data and handle paper flow, thus freeing Readers from these duties and enabling them to concentrate on the scoring of student papers. They also randomly distribute books and tapes so that materials from any particular school are scored by a wide variety of people.

All scores given by other Readers are completely masked. The clerical aides conceal any previous scores on the books and tapes so that no outside influence might prejudice the Readers.

In smaller-volume AP subjects that have a larger number of short-answer free-response questions, a Reader might read two responses from a given student. The potential problem is that a Reader could give an answer a higher or lower score than it deserves because the same student has performed well or poorly on other questions. To avoid this so-called "halo effect," in most cases each of a student's questions is read by a different Reader. This also ensures that any idiosyncrasies a particular Reader may have (a tendency to be stricter or more lenient than others, or a fondness or antipathy for the student's writing style or approach to a problem) will not affect a student's score on more than one of the free-response questions.

These practices permit each Reader to evaluate free-response answers without being prejudiced by knowledge about individual students. Several other methods may be used to help ensure that everyone adheres closely to the scoring guidelines:

 

The entire group discusses prescored papers each morning, and as necessary during the day.

Table Leaders review some of each of their Readers' scores to ensure that everyone is applying the scoring guidelines in a similar manner.

"Spot checks," in which the same paper is read by more than one Reader in the group, are conducted on a regular basis. These checks allow individual scores to be compared, and provide information on retraining needs.

Each Reader is asked at least once to rescore a set of selected papers that he or she has already scored, without seeing the previously assigned scores. When the original and rescored evaluations differ, the Reader reconsiders the final score, perhaps in consultation with colleagues or the Question Leader.

The Chief Reader and the Question Leaders monitor use of the full range of the scoring scale, for the group and for each Reader, by checking daily graphs of score distributions. Currently, 21 of the largest-volume AP subjects use the computerized Reader Management System (RMS):

 

Sets of 25 free-response booklets are placed in folders.

Readers record student scores on computer-generated forms.

The scored forms are fed through a scanner.

Twice a day, the Chief Reader and Table Leaders receive statistical information about the scores given by each Reader. This allows Table Leaders to give quick feedback to anyone who may need to "recalibrate" standards. (A sketch of such a per-Reader summary follows this list.)
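
For illustration only, a per-Reader score-distribution summary of the kind described above might be computed like this. The data layout (a Reader ID mapped to the scores that Reader has assigned on a 0-9 scale) is hypothetical, not the RMS's actual format.

    from collections import Counter

    def score_distribution(scores, scale_max=9):
        # Fraction of papers given each score point on the scale, so
        # leaders can see whether a Reader uses its full range.
        counts = Counter(scores)
        total = len(scores)
        return {point: counts.get(point, 0) / total
                for point in range(scale_max + 1)}

    # Hypothetical Readers: R102's scores cluster at the top of the
    # scale, a pattern the daily graphs would surface.
    readers = {"R101": [4, 5, 6, 5, 7, 3], "R102": [9, 9, 8, 9, 9, 8]}
    for reader_id, scores in readers.items():
        print(reader_id, score_distribution(scores))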

Scoring Reliability Studies

In addition to monitoring Readers' performance during the Reading, scoring reliability is examined from a more general point of view through special studies of the reading process. Such studies might look at, for example:

the reliability of a single reading of an essay;

time-of-day/day-of-week impact on the reliability of the scoring; and

consistency of scoring from one group of Readers to another.

Results from these studies provide input toward decisions regarding the timing and length of breaks for Readers, the amount of training required, the nature of the scoring guidelines used to score the free-response questions, and the frequency of consistency checks during the Reading. You can see the results of some scoring reliability studies in Table 3.2.

 

The composite score is calculated.

 

When the free-response section of an exam contains two or more parts, those parts are weighted according to values assigned by the Development Committee. This allows the committee to place more importance on certain skills, matching their emphasis in the corresponding college course curriculum. Weighting also comes into play in balancing the multiple-choice section against the free-response section.

 

For each AP Exam, there is a formula for combining the scores for the multiple-choice and free-response sections or subsections into a maximum weighted score (composite score). Table 3.3 shows the composite-score formula for the 2003 Computer Science A Exam. Once the weights have been decided and the free-response section scored, computing each student's composite score is a purely mechanical process and is done by computer.
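
As a purely illustrative sketch, the composite-score arithmetic amounts to a weighted sum. The weights and section scores below are hypothetical, not those of the 2003 Computer Science A Exam or any other actual exam.

    def composite_score(mc_score, fr_scores, mc_weight, fr_weights):
        # Weighted multiple-choice score plus the weighted sum of the
        # free-response (sub)section scores.
        weighted_fr = sum(w * s for w, s in zip(fr_weights, fr_scores))
        return mc_weight * mc_score + weighted_fr

    # Hypothetical example: multiple-choice worth 1.25 per point and
    # two free-response questions worth 2.00 per point each.
    print(composite_score(33, [6, 8], 1.25, [2.0, 2.0]))  # -> 69.25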

 

The composite score is converted to an AP grade.

 

Deciding on the cut-off point between each of the five grades is not a simple process. Because it cannot be assumed that one AP Exam is as difficult as the previous year's exam, nor that the student group is equally strong, the statistical processes of equating and scaling are used to adjust the cut-off scores each year. These adjusted cut-off scores are presented to the Chief Readers along with other information about the students' performance on the exam. The Chief Reader then makes the final decision about the four cut-off scores that determine the five AP grades.
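
Once the four cut-off scores are set, converting a composite score to a grade is a simple threshold lookup. The cut-off values below are hypothetical; real cut-offs are re-established for each exam each year.

    # Hypothetical minimum composite score required for each grade.
    CUT_OFFS = [(104, 5), (82, 4), (61, 3), (45, 2)]

    def ap_grade(composite):
        for minimum, grade in CUT_OFFS:
            if composite >= minimum:
                return grade
        return 1  # below every cut-off: no recommendation

    print(ap_grade(69.25))  # -> 3 with these hypothetical cut-offs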

 

For each AP Exam, a grade-setting session is held after the reading of the free-response sections has been completed. Participants at a typical grade-setting session include:

 

The Chief Reader and, if there is one, the Chief Reader Designate;

The ETS Director or Associate Director of the AP Program;

The College Board Director or Associate Director of the AP Program;

ETS content experts for each exam; and

An AP Program statistician.

Although these participants each provide assistance and advice, such as interpreting statistical evidence, the Chief Reader has the principal responsibility for establishing AP grades.

 

Continuity of AP standards is important, so that colleges can be confident that an AP grade of, say, 3 on this year's exam will represent, as nearly as possible, the same level of achievement as a grade of 3 on last year's exam. To choose grade boundaries that will maintain AP grading standards over time, the Chief Readers make use of the following types of evidence:

 

Statistical information based on common items (multiple-choice questions that were included in both the current exam and one or more previous exams).

College/AP Grade Comparability Studies.

The Chief Reader's own observations of students' free-response answers.

The distribution of scores on different parts of the exam.

AP grade distributions from the past three years.

Subscore Grades

For the Calculus BC and Music Theory Exams, in addition to an AP grade based on performance on the overall exam, students also receive subscore grades.

 

A Calculus AB subscore grade is reported for students who take the Calculus BC Examination, based on their performance on the portion of the exam devoted to AB topics (approximately 60 percent of the exam).

 

An aural and a non-aural subscore grade are reported for students who take the Music Theory Exam, based on their performance on the aural and non-aural portions of the exam; each portion makes up half of the exam.

 

Subscore grades are designed to give colleges and universities more information about the student. Although each college and university sets its own policy for awarding credit, placement, or both for AP Exams, it is recommended that institutions apply the same policy to the subscore grades that they apply to the overall grade.

 
