Output Education

Criterion-Referenced Test

6 April, 2016

A criterion-referenced test is designed to measure student performance against a fixed set of predetermined criteria or learning standards, that is, concise, written descriptions of what students are expected to know and be able to do at a specific stage of their education.

  • In elementary and secondary education, criterion-referenced tests are used to evaluate whether students have learned a specific body of knowledge or acquired a specific skill set, for example, the curriculum taught in a course, academic program, or content area. If students perform at or above the established expectations (for example, by answering a certain percentage of questions correctly), they will pass the test, meet the expected standards, or be deemed “proficient.”
  • Every student taking the exam could theoretically fail if they don’t meet the expected standard; alternatively, every student could earn the highest possible score.
  • It is not only possible, but desirable, for every student to pass the test or earn a perfect score.
  • Criterion-referenced tests have been compared to driver’s-license exams, which require would-be drivers to achieve a minimum passing score to earn a license.
  • Examples: Advanced Placement exams and the National Assessment of Educational Progress. Scores on criterion-referenced tests are often expressed as a percentage, a numerical score, or a performance level such as “proficient.”
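
The pass rule described in the list above can be sketched in a few lines of Python. This is only an illustration, not the scoring method of any real exam: the function name `meets_standard`, the 70 percent cut score, and the sample results are all assumptions made up for the example.

```python
def meets_standard(correct, total, cut_percent=70.0):
    """Return True if the share of correct answers reaches the cut score.

    cut_percent is a hypothetical passing threshold, not taken from any real test.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total >= cut_percent


# Hypothetical class results: (questions correct, questions on the test).
strong_class = [(18, 20), (15, 20), (14, 20)]
weak_class = [(10, 20), (13, 20), (9, 20)]

# Each student is judged against the same fixed criterion, not against
# classmates, so a whole class can pass or a whole class can fail.
all_pass = all(meets_standard(c, t) for c, t in strong_class)
none_pass = not any(meets_standard(c, t) for c, t in weak_class)
```

Because the threshold is fixed in advance, the outcome for one student never depends on how the others performed.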


  1. Criterion-referenced tests may include multiple-choice questions, true-false questions, “open-ended” questions (e.g., questions that ask students to write a short response or an essay), or a combination of question types.
  2. Individual teachers may design the tests for use in a specific course, or the tests may be created by teams of experts for large companies that have contracts with state departments of education.
  3. Criterion-referenced tests may be high-stakes tests, that is, tests used to make important decisions about students, educators, schools, or districts, or they may be “low-stakes tests” used to measure the academic achievement of individual students, identify learning problems, or inform instructional adjustments.

Objectives of Criterion-referenced Tests

  1. To determine whether students have learned expected knowledge and skills. If the criterion-referenced tests are used to make decisions about grade promotion or diploma eligibility, they would be considered “high-stakes tests.”
  2. To determine if students have learning gaps or academic deficits that need to be addressed.
  3. To evaluate the effectiveness of a course, academic program, or learning experience by using “pre-tests” and “post-tests” to measure learning progress over the duration of the instructional period.
  4. To evaluate the effectiveness of teachers by factoring test results into job-performance evaluations.
  5. To measure progress toward the goals and objectives described in an “individualized education plan” for students with disabilities.
  6. To determine if a student or teacher is qualified to receive a license or certificate.
  7. To measure the academic achievement of students in a given state, usually for the purposes of comparing academic performance among schools and districts.
  8. To measure the academic achievement of students in a given country, usually for the purposes of comparing academic performance among nations.
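
The pre-test and post-test comparison mentioned in the list above amounts to simple arithmetic. The following is a minimal sketch, assuming both tests are scored as percentages on the same criteria; the function name `learning_gains`, the student labels, and the scores are hypothetical.

```python
def learning_gains(pre_scores, post_scores):
    """Return each student's change in percent score from pre-test to post-test."""
    if pre_scores.keys() != post_scores.keys():
        raise ValueError("pre-test and post-test must cover the same students")
    return {student: post_scores[student] - pre_scores[student]
            for student in pre_scores}


# Hypothetical percent scores on the same criterion-referenced test.
pre = {"student_a": 40, "student_b": 55, "student_c": 70}
post = {"student_a": 65, "student_b": 80, "student_c": 75}

gains = learning_gains(pre, post)
average_gain = sum(gains.values()) / len(gains)
```

A comparison like this is only meaningful if the pre-test and post-test measure the same standards at a similar level of difficulty.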


  1. To hold schools and educators accountable for educational results and student performance. In this case, test scores are used as a measure of effectiveness, and low scores may trigger a variety of consequences for schools and teachers.
  2. To evaluate whether students have learned what they are expected to learn. In this case, test scores are seen as a representative indicator of student achievement.
  3. To identify gaps in student learning and academic progress. Test scores may be used, along with other information about students, to diagnose learning needs so that educators can provide appropriate services, instruction, or academic support.
  4. To identify achievement gaps among different student groups. Students of color, students who are not proficient in English, students from low-income households, and students with physical or learning disabilities tend to score, on average, well below white students from more educated, higher-income households on standardized tests. In this case, exposing and highlighting achievement gaps may be seen as an essential first step in the effort to educate all students well, which can lead to greater public awareness and resulting changes in educational policies and programs.
  5. To determine whether educational policies are working as intended. Elected officials and education policy makers may rely on standardized-test results to determine whether their laws and policies are working as intended, or to compare educational performance from school to school or state to state. They may also use the results to persuade the public and other elected officials that their policies are in the best interest of children and society.

Advantages of Criterion-referenced Tests


  1. The tests are better suited to measuring learning progress than norm-referenced exams, and they give educators information they can use to improve teaching and school performance.
  2. The tests are fairer to students than norm-referenced tests because they don’t compare the relative performance of students; they evaluate achievement against a common and consistently applied set of criteria.
  3. The tests apply the same learning standards to all students, which can hold underprivileged or disadvantaged students to the same high expectations as other students. Historically, students of color, students who are not proficient in English, students from low-income households, and students with physical or learning disabilities have suffered from lower academic achievement, and many educators contend that this pattern of underperformance results, at least in part, from lower academic expectations. Raising academic expectations for these student groups, and making sure they reach those expectations, is believed to promote greater equity in education.
  4. The tests can be constructed with open-ended questions and tasks that require students to use higher-level cognitive skills such as critical thinking, problem solving, reasoning, analysis, or interpretation. Multiple-choice and true-false questions promote memorization and factual recall, but they do not ask students to apply what they have learned to solve a challenging problem or write insightfully about a complex issue, for example.
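
The contrast with norm-referenced exams drawn in the list above can be made concrete with a small sketch. It assumes a simple "percent of test takers scoring below" definition of percentile rank and a hypothetical 70 percent cut score; the function names and scores are invented for illustration.

```python
def percentile_rank(score, all_scores):
    """Norm-referenced view: percent of test takers scoring below this score."""
    below = sum(1 for other in all_scores if other < score)
    return 100.0 * below / len(all_scores)


def meets_criterion(score, cut_percent=70.0):
    """Criterion-referenced view: compare the score with a fixed cut score."""
    return score >= cut_percent


# Hypothetical percent scores for a high-performing group.
scores = [82, 85, 88, 91, 95]

lowest = min(scores)
lowest_rank = percentile_rank(lowest, scores)  # standing relative to the group
lowest_passes = meets_criterion(lowest)        # standing against the criterion
```

In this made-up group, the lowest scorer ranks at the bottom when compared with peers but still meets the fixed standard, which is the difference the two kinds of test are designed to capture.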


Disadvantages of Criterion-referenced Tests

  1. The tests are only as accurate or fair as the learning standards upon which they are based. If the standards are vaguely worded, or if they are either too difficult or too easy for the students being evaluated, the associated test results will reflect the flawed standards. A test administered in eleventh grade that reflects a level of knowledge and skill students should have acquired in eighth grade would be one general example. Alternatively, tests may not be appropriately “aligned” with learning standards, so that even if the standards are clearly written, age appropriate, and focused on the right knowledge and skills, the test might not be designed well enough to measure achievement of the standards.
  2. The process of determining proficiency levels and passing scores on criterion-referenced tests can be highly subjective or misleading, and the potential consequences can be significant, particularly if the tests are used to make high-stakes decisions about students, teachers, and schools. Because reported “proficiency” rises and falls in direct relation to the standards or cut-off scores used to make a proficiency determination, it’s possible to manipulate the perception and interpretation of test results by raising or lowering either standards or passing scores. And when educators are evaluated based on test scores, their job security may rest on potentially misleading or flawed results. Even the reputations of national education systems can be negatively affected when a large percentage of students fail to achieve “proficiency” on international assessments.
  3. The subjective nature of proficiency levels allows the tests to be exploited for political purposes to make it appear that schools are doing either better or worse than they actually are. For example, some states have been accused of lowering proficiency standards on standardized tests to increase the number of students achieving “proficiency,” and thereby avoid the consequences that may result from large numbers of students failing to achieve expected or required proficiency levels, such as negative press, public criticism, or large numbers of students being held back or denied diplomas (in states that base graduation eligibility on test scores).
  4. If the tests primarily utilize multiple-choice questions—which, in the case of standardized testing, makes scoring faster and less expensive because it can be done by computers rather than human scorers—they will promote rote memorization and factual recall in schools, rather than the higher-order thinking skills students will need in college, careers, and adult life. For example, the overuse or misuse of standardized testing can encourage a phenomenon known as “teaching to the test,” which means that teachers focus too much on test preparation and the academic content that will be evaluated by standardized tests, typically at the expense of other important topics and skills.
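
The sensitivity of reported “proficiency” to the cut-off score, raised in the list above, can be illustrated with a short sketch. The function name `proficiency_rate`, the two cut scores, and the ten student scores are hypothetical and do not come from any real assessment.

```python
def proficiency_rate(percent_scores, cut_percent):
    """Return the percentage of students scoring at or above the cut score."""
    if not percent_scores:
        raise ValueError("at least one score is required")
    proficient = sum(1 for score in percent_scores if score >= cut_percent)
    return 100.0 * proficient / len(percent_scores)


# Hypothetical percent scores for ten students; the scores stay the same below.
scores = [42, 55, 58, 61, 64, 66, 71, 75, 83, 90]

rate_at_70 = proficiency_rate(scores, 70)  # stricter cut score
rate_at_60 = proficiency_rate(scores, 60)  # more lenient cut score
```

With these made-up scores, lowering the cut score from 70 to 60 raises the reported proficiency rate from 40 percent to 70 percent, even though no student's performance changed.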