Chapter Three
Desired Attributes of the Assessment

Some Assessment Issues

NAEP provides information about (1) the knowledge and scientific understanding of the nation's youth and (2) the features of science education programs that relate to high levels of student achievement. Consequently, the design plan for the NAEP Science Assessment includes strategies for measurement in both of these areas. Research has produced practical and theoretical knowledge that is important to the assessment design process.

The Nature of Testing and the Nature of Knowledge and Learning

Except for early assessments, NAEP science tests have consisted primarily of short-answer, paper-and-pencil questions that were mostly multiple choice. The previous assessments tended to focus on discrete components of science, each of which was usually learned independently of the others. Hence, tests were made up of independent items, each comparable to the others and weighted the same.

The Framework is based on a different view. It holds that scientific knowledge should be organized to provide a structure that connects and creates meaning for factual information, and that this organization is influenced by the context in which the knowledge is presented. Learning is perceived as an activity in which the learner interacts with the physical world, with peers and teachers, and with the scientific community. In this view, science proficiency depends upon the ability to know facts and integrate them into larger constructs and to use the tools, procedures, and reasoning processes of science for an increased understanding of the natural world.

Rather than concentrate on facts in isolation, assessments will reflect the organization and structure of scientific knowledge and the nature of learning in science. Because scientific knowledge is expanding faster than can be accommodated by any curriculum, teachers and assessment designers must make choices about what topics, concepts, and factual information to address. Consequently, the 1996 and 2000 NAEP Science Assessment Framework concentrates on assessing students' ability to relate basic facts and concepts as well as their ability to discuss and evaluate approaches to science-related problems. The Framework also stresses that an assessment of what students know and can do must employ techniques reflective of the nature of science.

Inferring Understanding From Student Responses

Test items present students with tasks that may require a range of responses, from recall of factual information to performance of scientific investigations to complex reasoning. Based on analysis of the responses, experts in the field make inferences about students' understanding -- that is, the knowledge and reasoning skills that are assumed to have produced the responses. The validity of these inferences is a central issue in assessment.

Responses to exercises designed to assess thinking or mental processing are generally more difficult to interpret than responses to items designed to assess factual knowledge. In practice, basing an assessment of the quality of mental processing on short responses is problematic. Often the decision about the mental processes applied is based only on the accuracy of the factual knowledge in the answer. When the answer is factually correct, the observer infers that the mental processes represent scientific reasoning (for example, those mental processes that are necessary to understand information in the test question stem, to retrieve scientific knowledge from memory, to reason from the stem to the correct response, or to eliminate incorrect responses). But this is not necessarily a valid inference. An incorrect answer may be the result of misinformation, not flawed reasoning; a correct answer is not necessarily the product of sound reasoning. Illogical thinking or using a wrong assumption or incorrect information can produce a seemingly correct answer. Furthermore, a correct answer may not require any higher order thinking at all; it simply may have been recalled. Only if the student response includes some indication of how the answer was obtained will those who score the assessment have information from which to choose among alternative interpretations.

Science entails observing objects and phenomena in the natural world and collecting and interpreting information about them. For this reason, pencil-and-paper tests have been criticized as too limited for assessing what students know and can do in science. Over the past several years, groups have developed assessment exercises that engage students in "performance tasks" using scientific equipment and materials. Student responses are recorded by an observer or by the students themselves in written form. However, these exercises have limitations as well.

Indeed, in the course of developing the new Framework, numerous examples of performance exercises were examined with respect to science concepts and reasoning, but many did not stand up to rigorous analysis. These exercises might have led science teachers and educators to draw faulty inferences.

The following is typical of a performance exercise that is counterproductive: An exercise requires a student to identify several unknown substances by means of indicators, but the student is given minutely detailed directions for performing each step in the identification process. Unfortunately, even if the answers are correct, the only inference to be drawn is that the student can follow written instructions. A test item formulated with such detailed step-by-step directions reduces to zero the science understanding needed for problem solving.

Many of the so-called performance assessment tasks that were reviewed turned out to be standard laboratory exercises that, again, were reduced to "follow-the-instructions" problems. No inferences about a student's knowledge of science or its tools and procedures can be drawn from such exercises. To test higher order thinking skills -- a major goal of performance assessments -- problems need to be placed in new contexts, applied to new situations, or have new elements introduced that preclude students' recalling what they have done before (Resnick, 1987).

Class, culture, ethnicity, gender, language ability, and access to quality instruction may influence the manner in which science is learned and the manner in which science attitudes and knowledge are produced. Hence, individuals need opportunities to demonstrate knowledge or competencies in different contexts.

Assessment techniques that show group differences are more likely to reveal problems with student learning and classroom instruction than with assessment, per se. However, this does not eliminate the assessment community's responsibility to the broader society. In addressing the issues of pluralism, multiple assessment methods may be more effective than any one method -- no matter how well it is developed.

Developments in Assessment

Currently, some state pilot-testing efforts are providing new ideas about assessment exercise and task formats. These pilot activities are also aimed at assessing new types of information relevant to new curriculum guides emerging in the states.

The experimental assessment work (much of it pioneered in the United Kingdom) uses new approaches, including performance tasks, open-ended tasks, and new types of multiple-choice questions that are thematic or conceptual or that ask students to explain their choices in short written answers. Moreover, state assessments are experimenting with collecting information on other meaningful outcomes: sustained student work, proficiency in designing and conducting experiments, and fluency of ideas critical to the natural sciences and related fields. They are also experimenting with innovative reporting approaches. The performance measures are created through consensual judgment about what students should know and be able to do at given grade levels or developmental stages.

Criteria for Assessing Learning and Achievement in Science

The Framework for the 1996 and 2000 NAEP Science Assessments has been developed according to the following broad guidelines:

  • By focusing on meaningful knowledge and skills, NAEP should be a force in fostering progress as well as measuring it, enabling more students to learn more science.

  • A range of assessment means must be used, including some in which the student is required to create and construct, not just to recognize and respond.

  • Assessment exercises should challenge students at developmentally appropriate levels to:

    • Explain commonplace natural phenomena using appropriate scientific theory, principles, and concepts.

    • Plan the investigation of a novel scientific problem.

    • Demonstrate understanding of the basic knowledge structures of science by using the appropriate techniques to connect concepts to one another and to the theory within which they are embedded.

    • Demonstrate some understanding of pervasive crosscutting themes in science.

    • Solve practical problems by using the appropriate theories, principles, concepts, and techniques of science.

  • Assessments must be sensitive to the need and ability of students to function in a variety of contexts.

  • Assessment exercises should use a variety of formats to allow students to display the wide range of competencies expected as the outcomes of science education.

  • Assessment tasks that are larger than single items should be analyzed in multiple ways, not restricted to providing information for single scales.

  • Test results should not be normalized -- that is, students' outcomes should not be manipulated to fit a normal distribution curve.

  • Assessments must have enough questions about enough topics to explore students' knowledge in depth.
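The guideline against normalizing results can be illustrated with a small sketch. It assumes one common way of forcing scores onto a normal curve, a rank-based transformation; the Framework does not name a specific procedure. The function name and the raw scores below are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def force_to_normal_curve(raw_scores):
    """Replace each score with the standard-normal value of its rank percentile.

    This is the kind of manipulation the guideline says NOT to apply.
    Assumes all scores are distinct (ties are not handled in this sketch).
    """
    n = len(raw_scores)
    # Indices of the scores, ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: raw_scores[i])
    forced = [0.0] * n
    for rank, i in enumerate(order):
        percentile = (rank + 0.5) / n  # midpoint percentile for this rank
        forced[i] = NormalDist().inv_cdf(percentile)
    return forced


# Hypothetical raw scores: four students cluster low, one scores far higher.
raw = [1, 2, 3, 4, 20]
forced = force_to_normal_curve(raw)

raw_gap_top = raw[4] - raw[3]        # 16 raw points between the top two
raw_gap_bottom = raw[1] - raw[0]     # 1 raw point between the bottom two
forced_gap_top = forced[4] - forced[3]
forced_gap_bottom = forced[1] - forced[0]

print("raw gaps:   ", raw_gap_bottom, raw_gap_top)
print("forced gaps:", round(forced_gap_bottom, 3), round(forced_gap_top, 3))
```

In this sketch the 16-point gap between the two highest raw scores and the 1-point gap between the two lowest become the same size after the scores are forced onto the curve, so the reported distribution no longer reflects what students actually did.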

Achievement Levels in Science

Achievement levels describe how well students should perform on the content and thinking levels required by the assessment. They evaluate the quality of the outcomes of students' education in science at grades 4, 8, and 12 as measured by NAEP.

The National Assessment Governing Board has defined three achievement levels -- Basic, Proficient, and Advanced -- for each grade level assessed.

Basic denotes partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade. Proficient represents solid academic performance for each grade assessed. Students reaching this level have demonstrated competency over challenging subject matter, including subject-matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter. Advanced represents superior performance.

Appendix A lists the final achievement level descriptions for students participating in the 1996 and 2000 NAEP Science Assessment in grades 4, 8, and 12. The assessment was constructed to measure and report student performance according to the three levels of achievement, as required by NAEP policy.


Science Framework for the 1996 and 2000 National Assessment of Educational Progress