Chapter Four: Characteristics of Assessment Exercises

Innovative assessments in the United States and other countries use three major item types: performance exercises, open-ended paper-and-pencil exercises, and multiple-choice items probing understanding of conceptual and reasoning skills. A fourth type often added to time-limited tests is the collection and evaluation of portfolios of student work done in the course of instruction. There is also an emerging assessment technique that involves two-phase testing. The following sections discuss these exercise types and provide some guidelines for the amount of assessment time to be devoted to each. Further details are provided in the Specifications and Reporting Formats and Issues documents.

In performance exercises, students actually manipulate selected physical objects and try to solve a scientific problem about the objects before them. Although various types of performance tests have been piloted extensively, their standardization and administration differ widely. One method for ensuring uniform administration is the use of standardized performance test kits, with each test proctored and scored by trained personnel. Depending on the objectives established for the assessment, student answer sheets can also be used to provide responses to be scored.

To adequately measure the goals outlined for the assessment, performance items generally should make up at least 30 percent of the assessment, as measured by student response time. An extra period of time (20 or 30 minutes) may be necessary for students who have been assigned to perform complex tasks.

The shortcomings of many performance tasks currently being used were discussed in the preceding chapter. How can a performance exercise be designed so that it meets criteria for assessing science concepts and their relations? The exercise should be meaningful and not a context-free laboratory problem.
Personal context, for example, is seen in the following problem: "You have just been given this new drink. It is claimed to be sugar- and calorie-free. What can you find out about the accuracy of these claims with these indicators?" Students can be given the names and procedures for the safe use of each indicator, but no information about their scientific use. The students would have to know, for example, that iodine solution is a test for starch and what the negative and positive reactions are. The students would also have to know that if fats, proteins, or carbohydrates are present, they will yield calories. They would have to plan how to conduct an investigation of the unknown in such a way as not to waste the materials, to be able to repeat the investigation when they believe their procedure is faulty, and to have enough solution left to replicate the investigation for verification. They would also have to design their data-collecting procedures. Finally, they would need to interpret and justify their findings.

The questions asked of students as part of a performance exercise need to enable the students to display understanding and to justify interpretations. Such questions as "What substance is in the unknown?" or "How far did the dye diffuse?" do not elicit responses that demonstrate understanding.

If students need additional information to carry out the task, they could be asked before they begin if they would like any other materials and why. If they request known substances for each indicator to refresh their memories, those could be provided. Such a request demonstrates one aspect of understanding the processes of science -- knowing what one doesn't know and how to acquire more information.

Scoring for such a problem could give points for science knowledge, for laboratory procedures, and (if using an observer) for a systematic approach to problem solving in contrast to a trial-and-error or random approach.
Open-Ended Paper-and-Pencil Items

Open-ended items that require written responses provide particularly useful insights into students' levels of conceptual understanding. They can also be used to assess students' abilities to communicate in the sciences. In addition, open-ended items, if carefully crafted, can be used to reflect students' abilities to generate rather than recognize information related to scientific concepts and their interconnections. Open-ended items should make up at least 50 percent of the assessment, as measured by student response time. About one-third of the open-ended questions should consist of extended response items.

The 1996 and 2000 NAEP Science Assessments will send important messages about science curriculum and classroom instruction. The use of multiple-choice items should be considered carefully because they are often overused to test low-level recall. Balanced with other item types, however, multiple-choice items are worthwhile for measuring knowledge of important facts and concepts as well as deductive reasoning skills. Multiple-choice items should make up no more than 50 percent of the assessment, as measured by student response time.

Performance exercises, open-ended paper-and-pencil items, and multiple-choice items could produce responses less subject to faulty interpretation if students are given an opportunity to explain their responses, their reasoning processes, or their approach to a problem. Hence, an assessment should afford this opportunity. But care must be taken, particularly with fourth graders, that language ability is not confounded with science ability. This caution also applies to the more complex multiple-choice items needed to probe conceptual understanding.

The new NAEP Science Assessment, to be consonant with current reform efforts in science education, needs to probe students' depth of knowledge and scientific understanding.
For this reason, it is recommended that for at least half the students sampled, the assessment include an in-depth examination involving a single problem or topic. The format could be a set of linked performance tasks, open-ended paper-and-pencil exercises, multiple-choice items, or a combination thereof. Pending modification after pilot testing, the suggested time to be spent by students on this type of exercise is 10 minutes for grade 4, 20 minutes for grade 8, and 30 minutes for grade 12.

Multiple approaches need to be tried in the pilot testing of the assessment exercises. It would be especially useful to test the same concept(s) and performance skill(s) in different ways to see which method provides the richest, most reliable, and most valid information. For example, if an open-ended question can easily be turned into a multiple-choice question without losing its intent and validity, it should be multiple choice. Open-ended questions should tap skills and knowledge that are truly "open" -- probing for the integrated application of relevant knowledge, not for the recall of a series of unconnected facts.

The following additional issues need to be investigated during pilot testing:
The planning committee responsible for developing the new NAEP Science Assessment Framework was concerned that the nature and specifics of the Framework be faithfully mirrored in the actual instrument of the assessment. The committee therefore recommended that a detailed review of individual items and of each proposed assessment as a whole be conducted at the conclusion of the following four stages:
Special studies are often recommended as a part of the National Assessment process because new or emerging techniques offer promise and, if they yield useful information, will make a positive contribution to the assessment process. Special studies are a part of the main NAEP process and are usually reported along with the results from the national sample. In 1996, a special study was carried out to assess students with advanced training in science.

Past NAEP science assessments have been criticized for having too low a ceiling -- that is, not including an adequate number of items at advanced levels of difficulty. As a result, NAEP tests are assumed not to have reflected what the best prepared students know or can do in science. This issue has become more serious since the formulation of the National Education Goals, particularly goal 4: "By the year 2000, U.S. students will be first in the world in science and mathematics achievement." An international assessment conducted in 1995 included students who are concentrating on science in high school. It would be highly desirable to have similar results from NAEP so that reports on progress toward the National Education Goals will be based on more than one study.

Suggestions on how to include the most competent students in this special study included sampling students who are taking advanced courses in science. Assessments must contain a sufficient number of challenging exercises to measure what these "best" students know and are able to do. It was recommended, therefore, that a special study be conducted in 1996 with a subsample of the national NAEP sample to determine whether this is a useful approach to establish the achievement and performance of the best science students.