The National Assessment of Educational Progress:
Design 2000-2010


National Assessment Redesign: A Summary and Status Report

Introduction: The Redesign Principles

Over its thirty-year history, the National Assessment has earned respect and credibility. The National Assessment is widely recognized for the comprehensiveness of its tests, the quality of its technical design, the accuracy of its reports, and innovation in its execution. The data produced by the National Assessment are unique. No other program provides regular reports on the status and progress of student achievement that cover the nation as a whole and are comparable state by state.

Although its original purpose was to measure and report on the status of student achievement and on change over time, recognition of the quality and integrity of the National Assessment led to a multitude of demands and expectations beyond reporting on achievement. Meeting those expectations was done with good intentions and seemed right for the situation at the time. However, some additions that the National Assessment performs less effectively were "tacked on" to the original design.

The National Assessment was being asked to do too many things, some even beyond its reach to do well, and was attempting to serve too many audiences. For example, in contrast to the 1970's, when a single 120-page report on mathematics was deemed sufficient, the 1992 NAEP mathematics reports numbered seven and totaled about 1,800 pages.

The result of attempting to respond to demands beyond NAEP's central purpose was to overburden NAEP's design, drive up costs and reduce the number of subjects that could be tested. For example, the National Assessment tested two or three subjects each year during the 1970's, its first decade, but only every other year after the 1980's. Another indicator that NAEP had too many distractions was that results could be released as many as two to three years after testing. This simply was not acceptable, particularly with the advent of state-level assessments in the 1990's.

The Governing Board's solution was to focus NAEP on what it does best: measure and report on the status of student achievement and change over time. Focusing NAEP on what it does best would permit NAEP's design to be simplified and also would mean putting limits on demands that are outside NAEP's central purpose. Another part of focusing NAEP is to define the audience for reports. The Governing Board has determined that the NAEP program should not attempt to serve multiple audiences directly. The audience for reports should be the general public.

Specialized needs for NAEP data should be accommodated by making the data easily accessible for analysis by others, including educators, researchers, policymakers, and the media. In order to make data more understandable and useful to the general public, the Governing Board has determined that achievement levels, or performance standards, should be the primary means for reporting NAEP results.

Thus, five principles undergird the Governing Board's policy for the redesign of the National Assessment:

  • Conduct assessments annually, following a dependable schedule
  • Focus NAEP on what it does best
  • Define the audience for NAEP reports
  • Report results using performance standards
  • Simplify NAEP's technical design

Details on these and other aspects of the redesign policy follow.

Annual Schedule

A centerpiece of the National Assessment redesign is a dependable annual schedule of assessments through the year 2010 (Table 1). In the past decade, the focus on education reform, new and revised state assessments, and the national education goals have led to demand for National Assessment testing more frequently than the biennial schedule of the 1980's and most of the 1990's. The schedule for the period 1996 through 2010 was adopted in March 1997 and revised in November 1998. It provides for annual assessments at the national level and state-level assessments in even-numbered years. The long-term trend assessments in reading, writing, mathematics, and science continue on a once per four-year cycle beginning in 1999.

At the national level, grades assessed will be 4, 8 and 12. Subjects covered will be reading, writing, mathematics, science, geography, U.S. history, world history, civics, economics, foreign language, and the arts. These are the subjects listed in the current national education goals. Reading, writing, mathematics and science will be assessed once every four years. Other subjects will be assessed less frequently, but there will generally be two assessments in a subject over a ten-year period.

Testing at the state level will occur in even-numbered years, with reading and writing in grades 4 and 8 alternating with mathematics and science in grades 4 and 8. Student achievement results in these subjects and grades at the state level will be reported on a once per four-year basis.
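The alternation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the data layout are hypothetical, and the starting point (mathematics and science in 2000) is taken from Table 1.

```python
def state_schedule(first_year=2000, last_year=2010):
    """Return {year: subjects} for state-level testing in even-numbered years.

    Assumes the cycle starts with mathematics and science in first_year,
    as Table 1 shows for 2000, and then alternates with reading and writing.
    """
    cycles = (("mathematics", "science"), ("reading", "writing"))
    schedule = {}
    for i, year in enumerate(range(first_year, last_year + 1, 2)):
        schedule[year] = cycles[i % 2]
    return schedule


if __name__ == "__main__":
    for year, subjects in state_schedule().items():
        print(year, "grades 4 and 8:", ", ".join(subjects))
```

Because the two subject pairs alternate across even-numbered years, each pair returns every four years, which is why state results in a given subject are reported on a once per four-year basis.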

Many of the other redesign policies, described below, are aimed at making the annual schedule affordable through cost-saving efficiencies.

Table 1. Schedule for the National Assessment of Educational Progress

The following schedule was adopted by the National Assessment Governing Board on March 8, 1997 and revised in November 1998. Assessments shown as scheduled for 1996, 1997, and 1998 were approved previously by the Board.

YEAR | NATIONAL | STATE
1996 | Mathematics; Science; Long-term trend* (reading, writing, mathematics, science) | Mathematics (4, 8); Science (8)
1997 | Arts (8) | (none)
1998 | Reading; Writing; Civics | Reading (4, 8); Writing (8)
1999 | Long-term trend* | (none)
2000 | Mathematics; Science; Reading (4) | Mathematics (4, 8); Science (4, 8)
2001 | U.S. History; Geography | (none)
2002 | Reading; Writing | Reading (4, 8); Writing (4, 8)
2003 | Civics; FOREIGN LANGUAGE (12); Long-term trend* | (none)
2004 | MATHEMATICS; Science | MATHEMATICS (4, 8); Science (4, 8)
2005 | WORLD HISTORY (12); ECONOMICS (12) | (none)
2006 | READING; Writing | READING (4, 8); Writing (4, 8)
2007 | ARTS; Long-term trend* | (none)
2008 | Mathematics; SCIENCE | Mathematics (4, 8); SCIENCE (4, 8)
2009 | U.S. HISTORY; GEOGRAPHY | (none)
2010 | Reading; WRITING | Reading (4, 8); WRITING (4, 8)

Note: Grades 4, 8, and 12 will be tested unless otherwise indicated. Comprehensive assessments are indicated in ALL CAPS; standard assessments are indicated in upper and lower case (see the section "Reporting NAEP Results" for discussion of comprehensive and standard assessments).

* Long-term trend assessments are conducted in reading, writing, mathematics and science. These assessments provide trend data as far back as 1970 and use tests developed by the National Assessment at that time.

Status of Implementation:

The work in the new NAEP contracts covers the schedule as adopted by the Governing Board for the years 1999-2003. The long-term trend assessments in reading, writing, mathematics, and science will be conducted in 1999 and 2003. In 2000, mathematics and science assessments will be conducted in grades 4 and 8 at the state level and at grades 4, 8, and 12 at the national level. In addition, a reading assessment at grade 4 at the national level will be conducted. In 2001, geography and U.S. history assessments will be conducted at grades 4, 8, and 12 at the national level. In 2002, reading and writing assessments will be conducted at the state level in grades 4 and 8 and at the national level in grades 4, 8, and 12. In 2003, assessments will be conducted at the national level in civics in grades 4, 8, and 12 and in foreign language at grade 12.

Define the Audience for NAEP Reports

The expanded demands and expectations noted above reflected the many varied audiences that NAEP was attempting to serve. Trying to serve too many audiences has meant that no audience is optimally served by the National Assessment. The NAEP redesign policy makes the distinction between the audience for reports prepared by the NAEP program and the users of NAEP data. The audience for NAEP reports is the American public. The primary users of NAEP data are national and state policymakers, educators, and researchers.

This distinction in the policy between the audience for reports and users of data is important. It is intended to address the needs of various groups and individuals interested in NAEP results, while providing an appropriate division of labor between them and the federal government.

National Assessment reports released by the National Center for Education Statistics should be objective, providing the facts about the status and progress of student achievement. Providing objective information about student achievement is an appropriate federal role. Since the public is the primary audience, NAEP reports should be understandable, jargon free, easy to use, widely disseminated, and timely.

On the other hand, the redesign policy suggests that interpreting NAEP data (e.g., developing hypotheses about achievement from relationships between test scores and background questions) is a role that falls primarily to those outside the Department of Education -- the states that participate in NAEP, policymakers, curriculum specialists, researchers, and the media, to name a few. For the NAEP program itself to address the myriad of interests and questions of these diverse groups seems both impractical and inappropriate. However, the federal government should encourage and provide funds for a wide range of individuals and organizations with varied interests and perspectives to analyze NAEP data and use the results to improve education. This is the point of the redesign policy. Thus, the redesign policy provides that National Assessment data are to be made available in easily accessible forms to support the efforts of states and others to analyze the data, interpret results to the public, and improve education performance.

Status of Implementation:

The National Center for Education Statistics is placing a high priority on "highlight" reports and national report cards for each subject, which are aimed at the general public. NAEP data will be accessible through a new Internet web site, customized for particular data users. Priorities for NAEP secondary analysis grants were revised to encourage wider use of NAEP data by national and state policy makers, educators, and researchers and to focus the analyses on interpretive and education improvement purposes. Also, NCES is continuing to develop and provide training on software for analyzing NAEP data.

Report Results Using Performance Standards

In 1988, Congress created the Governing Board and authorized it to set performance standards -- called achievement levels -- for reporting National Assessment results. Under the redesign policy, achievement levels are to be used as the primary (although not exclusive) means for reporting National Assessment results. The achievement levels describe "how good is good enough" on the various tests that make up the National Assessment. Previously, the National Assessment reported average scores on a 500-point scale. There was no way of knowing whether a particular score represented strong or weak performance and whether the amount of change from previous years' assessments should give cause for concern or celebration. The National Assessment now also reports the percentage of students who are performing at or above "Basic," "Proficient," and "Advanced" levels of achievement.
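To make the "at or above" reporting concrete, the following minimal Python sketch computes the percentage of scores at or above each level. The cut scores and sample scores are invented for illustration and are not actual NAEP values; the sketch also ignores the sampling weights and statistical procedures that real NAEP estimates depend on.

```python
from typing import Dict, Sequence

# Hypothetical cut scores on a 500-point scale (NOT actual NAEP cut scores).
HYPOTHETICAL_CUTS = {"Basic": 200, "Proficient": 240, "Advanced": 280}


def percent_at_or_above(scores: Sequence[float],
                        cut_scores: Dict[str, float]) -> Dict[str, float]:
    """Return the percentage of scores at or above each cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return {
        level: 100.0 * sum(1 for s in scores if s >= cut) / n
        for level, cut in cut_scores.items()
    }


if __name__ == "__main__":
    # Made-up scores for ten students, for illustration only.
    sample_scores = [180, 205, 215, 230, 245, 250, 260, 285, 290, 300]
    for level, pct in percent_at_or_above(sample_scores, HYPOTHETICAL_CUTS).items():
        print(f"At or above {level}: {pct:.0f}%")
```

Because the levels are cumulative, a student counted at or above "Advanced" is also counted at or above "Proficient" and "Basic," so the three percentages do not sum to 100.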

The achievement levels have been the subject of several independent evaluations, some controversy, and conflicting recommendations. Recommendations have been carefully considered and some have been used to improve the standard-setting procedures. While the current procedures are based on what may be the nation's most widely used formal process for setting standards on tests, the Governing Board remains committed to making continual improvements.

Status of Implementation:

The Governing Board will continue to set achievement levels for reporting NAEP results. These achievement levels are to be used on a developmental basis until a determination is made that the levels are reasonable, valid, and informative to the public. At that point, the developmental designation will be removed.

The Governing Board views standard setting as a judgmental, not a scientific, process. However, the process must be conducted in a manner that is technically sound and defensible. The Governing Board is preparing a report required by Congress to respond to the assertion that the process for setting the achievement levels is "flawed." This report will include a detailed plan for reviewing the criticisms and compliments found in the evaluation reports that studied the achievement levels. The plan also will address alternatives to the current level-setting procedures.

Simplify the Technical Design for the National Assessment

The current design of the National Assessment is very complex. The redesign policy requires that the research and testing companies that compete for the contract to conduct the National Assessment must identify options to simplify the design of the National Assessment. Examples of NAEP's complexity include: (1) National and state results are based on completely separate samples. (2) No student takes the complete set of test questions in a subject and as many as twenty-six different test booklets are used within a grade; thus scores on NAEP are calculated using very sophisticated statistical procedures. (3) Students, teachers, and principals complete separate background questionnaires, which may be submitted at different times, complicating their use in calculating assessment results. (4) The data for every background question collected must be compiled before any report can be produced, regardless of whether the data from the background question will be included in a report, lengthening the time from data collection to reporting.

Status of Implementation:

This is a "work in progress." Options for combining the national and state samples are being developed by the contractors in collaboration with NCES and the Governing Board. Similarly, options to reduce the size of the state sample are being considered. An option to increase the precision of the state results will be implemented in the year 2000 mathematics and science state assessments. Progress also has been made in shortening the time between data collection and reporting by eliminating the requirement to link certain background questionnaires to student achievement data. Plans for a short form of the National Assessment, using a single test booklet, are being implemented. The purpose of the short-form test is to enable faster, more understandable initial reporting of results and, possibly, to give states access to test instruments that allow them to obtain NAEP assessment results in years in which NAEP assessments are not scheduled in particular subjects. Plans for improving the quality, relevance, and efficiency of background questionnaires also are in the development stage.

Measure Student Achievement at Grades 4, 8, and 12

The primary purpose of the National Assessment is to measure student achievement at grades 4, 8, and 12 in academic subjects at the state and national level and for subgroups, showing trends over time in the percent of students at or above each achievement level. The subjects to assess are those listed in the national educational goals -- reading, writing, mathematics, science, U.S. history, geography, world history, civics, economics, the arts, and foreign language. Grades 4, 8 and 12 are considered to be important transition points in American education.

Although grade 12 performance is important as an "exit" measure from the K-12 system, there are problems with grade 12 results. The problems are that student and school participation rates and student motivation at grade 12 are low. The Governing Board has considered whether to change NAEP to another grade at the high school level, examining both anecdotal and empirical evidence. Anecdotal evidence about the low motivation of high school students taking low stakes tests in the spring of their senior year raises serious questions about whether NAEP should test at grade 12. However, the empirical evidence in NAEP does not indicate that switching to grade 11 would result in higher motivation on the part of students or greater accuracy in the results. In fact, there is some evidence that twelfth graders taking NAEP may try harder in some cases than eleventh graders. The redesign policy asks the companies that compete for the NAEP contract to find ways to increase school and student participation rates and student motivation. Until they increase, National Assessment reports should include clear caveats about interpreting grade 12 results.

Status of Implementation:

Because the empirical evidence does not warrant a change at this time, NAEP should continue to test at grade 12. New NAEP contracts have been awarded for the conduct of assessments through the year 2003. The contracts are designed to measure student achievement at grades 4, 8, and 12; report state, national, and subgroup results; report trends over time; and use performance standards for reporting results. Caveats for interpreting grade 12 results have been added to reports. However, more attention needs to be placed on improving grade 12 participation rates and student motivation. Toward this end, NCES is planning a series of studies to examine the relationship between student achievement and motivation, including studies of the relationship between courses taken and NAEP results.

What NAEP is Not Designed to Do

The NAEP redesign policy attempts to focus NAEP on what it does best. What the National Assessment does best is measure student achievement. Focusing NAEP on what it does best comes with a related idea -- recognizing and limiting what NAEP is not designed to do.

Although the National Assessment is well designed for measuring student achievement and trends over time, it is not a good source of data for drawing conclusions about or providing explanations for the level of performance that is reported. It also is not a measure of personal values, a national curriculum, an appropriate means for improving instruction in individual classrooms, or a basis for evaluating specific pedagogical approaches.

The National Assessment is what is known as a "cross-sectional survey," an effective and cost-efficient means for gathering data on student achievement. A cross-sectional survey gathers data at one point in time. In the case of NAEP, data are gathered on national and state-representative samples of students at a particular time during the school year. The sample is large enough to permit reasonably accurate estimates of subgroup performance (e.g., by gender, race, and ethnicity). Change over time can be measured by administering the same survey again in later years, under the same testing conditions, with samples of students that are similar to the ones tested earlier. Comparisons can be made within and across the subgroups and for the whole sample.

However, a cross-sectional survey cannot provide answers about what causes the level of performance that is reported. Measuring the causes of achievement would involve an experimental design, with specific research questions to answer, pre- and post-testing of students, and comparisons of results between groups of students receiving a particular educational approach and those that are not. While some may view such research as a worthwhile part of NAEP, the need for pre- and post-testing alone would double the costs of NAEP testing. Because pre- and post-testing would impose additional administrative burden on schools and take more time away from instruction for students, it could severely hamper school and student participation rates in NAEP, especially with NAEP's annual assessment schedule. Too few schools and students in the sample, in turn, would jeopardize NAEP's ability to provide national and state-representative student achievement results.

The best that can be done regarding explanation or interpretation of results is to report on background variables that may be associated with achievement. However, in many cases, the data from background questions collected by NAEP are inconclusive. Even where the associations are stronger, the data are not adequate for supporting conclusions that explain why achievement is at the level reported. Clearly, the use of NAEP background data to explain or interpret achievement results should be done with caution.

Status of Implementation:

Under the new NAEP contracts, the collection of background information will be more focused. The plan is to collect a well-defined core of background information. For example, the well-defined core of background information will include the data that are required for every assessment -- e.g., data on gender, race, ethnicity, whether the students are in public or private schools, etc. In addition, each assessment will have a set of background questions designed specifically for the subject being assessed, with each set being determined by policy. Therefore, the background questions for the mathematics assessment will vary from those for the science or reading assessments.

The intent is not only to be more purposeful about what is collected, but more strategic about how it is collected as well. For example, in the past, information on TV watching by students was collected regularly as a part of every assessment. In the same year, the same background questions could be asked of the students in each separate national sample. Clearly, where two or more subjects are being assessed in a particular year, it may not be necessary to ask identical questions across all of the assessments. Similarly, it may not be necessary to ask certain questions every year. In addition, the background questions themselves will be pilot tested to reduce the possibility of misinterpretation.

Reporting NAEP Results

The redesign policy sets a goal of releasing National Assessment results six to nine months after testing. Reports should be written for the American public as the primary audience and should be understandable, free of jargon, easy to use and widely disseminated. National Assessment reports should be of high technical quality, with no erosion of reliability, validity, or accuracy.

The amount of detail in reporting should be varied. Comprehensive reports would be prepared to provide an in-depth look at a subject the first time it is assessed using a newly adopted test framework, testing many students and collecting background information. Although scale scores also will be used, achievement levels shall continue to be the primary method for reporting NAEP results. Test questions, scoring guides, and samples of student work that illustrate the achievement levels -- Basic, Proficient, and Advanced -- will receive prominence in reports. Data also would be reported by gender, race/ethnicity, socio-economic status, and for public and private schools; other reporting categories also are possible. Standard reports would be more modest, providing overall results in the same subject in subsequent years using achievement levels and average scores. Data could be reported by gender, race/ethnicity, socio-economic status, and for public and private schools, but would not be broken down further. The amount of background data collected and reported would be somewhat limited in comparison to a comprehensive report. Special, focused assessments on timely topics also would be conducted, exploring a particular question or issue and possibly limited to one or two grades.

Status of Implementation:

The new NAEP contracts provide for faster release of data, standards-based reporting, reports that are targeted to the general public, and three different kinds of reports: "comprehensive," "standard," and "focused." The 1998 national reading results were released within 11 months of testing; the state results within 12 months. Although still short of the Board's goal of reporting results in 6 to 9 months following testing, progress is being made.

Simplify Trend Reporting

The NAEP redesign policy requires the development of a carefully planned transition to enable "the main National Assessment" to become the primary way to measure trends in reading, writing, mathematics and science. This is because there are now two NAEP testing programs for reading, writing, mathematics and science. The two programs use different tests, draw different samples of students (i.e., one based on age -- 9-, 13- and 17-year-olds, the other based on grade -- 4, 8 and 12), and report results in two different ways. Not surprisingly, the two different programs can yield different results. Having different results is not necessarily a problem, for it can raise important questions about student achievement and prompt helpful discussions about NAEP results. However, a trade-off is that having two separate testing programs boosts costs, potentially limiting assessments in other subjects.

The first program, referred to as the "long-term trend assessments," monitors change in student performance using tests developed during the 1960's and 1970's. The sample of students is based on age (i.e., 9-, 13-, and 17-year-olds) for reading, mathematics, and science and on grade for writing (i.e., grades 4, 8 and 11). The age-based samples include students from two or more grades. For example, the 9-year-old sample has 3rd, 4th, and 5th grade students. Long-term trend assessment results are reported displaying changes over time in average scores. The NAEP long-term trend data are the only measure we have in the United States to gauge how well current students are performing compared to students who took the same tests in the 1970's, the 1980's, and the first half of this decade. The second program, referred to as "main NAEP," uses tests developed more recently, reports results by grade, and employs performance standards for reporting whether achievement is good enough. As an example of the value of maintaining two separate programs, in 1996 the long-term trend assessment program declared mathematics results flat since 1990, while main NAEP reported significant gains.

Status of Implementation:

The Governing Board initiated the review of the long-term trend assessments in part because significant cost savings could be achieved if main NAEP is able to assume the purpose of the long-term trend assessments. This review continues as a "work in progress." The National Center for Education Statistics is just beginning to develop options. Identifying options that are practical, affordable, and technically feasible is a significant challenge. The Governing Board has scheduled long-term trend assessments to be conducted in 1999, 2003, and 2007. This will afford adequate time to evaluate the viability of the options that may be proposed and at the same time maintain the long-term trend line. The immediate effect is to change the schedule for this part of the testing program from once every two years to once every four years.

Keep NAEP Assessment Frameworks Stable

The NAEP redesign policy states that assessment frameworks shall remain stable for at least ten years. The purpose is three-fold: to provide for measuring trends in student achievement, to allow for change to frameworks when the case for change is compelling, and to manage costs.

By law, National Assessment frameworks are developed by the Governing Board through a national consensus process involving hundreds of teachers, curriculum experts, state and local testing specialists, administrators, and members of the public. The assessment frameworks describe how an assessment will be constructed, provide for the subject area content to be covered, determine what will be reported, and influence the cost of an assessment.

Both current practice and important developments in each subject area are considered: How much algebra should be in the 8th grade mathematics assessment? Should there be both multiple choice and constructed response items and if so, what is the appropriate mix? How much of what is measured should students know and be able to do? The frameworks receive wide public review before adoption by the Governing Board.

Status of Implementation:

The Governing Board is solely responsible for developing and approving assessment frameworks and has been adhering to its policy of keeping the frameworks stable. With a decision to be made this year about whether to conduct a national consensus process for the 2004 mathematics assessment, the Governing Board is beginning to examine criteria for determining when a new framework is necessary. An important factor will be the impact of changing the framework on the measurement of trends in student achievement.

Use International Comparisons

The NAEP redesign policy states that National Assessment frameworks, test specifications, achievement levels, and data interpretations shall take into account, where feasible, curricula, standards, and student performance in other nations, and promote studies to "link" the National Assessment with international assessments.

The National Assessment is, and should be, an assessment of student achievement in the United States. It should be focused on subjects and content deemed important for the U.S. through the national consensus process used to develop NAEP frameworks. However, decisions on content, achievement levels, and interpretation of NAEP results, where feasible, should be informed, in part, by the expectations for education set by other industrialized countries, and comparative test results. Although there are technical hurdles to overcome, consideration of such information can be useful in determining "how good is good enough" in an assessment for U.S. students.

Status of Implementation:

The National Center for Education Statistics conducted a linking study of the 1996 NAEP science and mathematics assessments with the 1995 Third International Mathematics and Science Study (TIMSS). The Governing Board used information from this linking study in setting the achievement levels for the 1996 science assessment. NCES will be conducting TIMSS again in the spring of 1999 and thirteen states have agreed to participate to collect state-representative TIMSS data. NCES will be applying a methodology for relating TIMSS to NAEP and will be evaluating the strength of the relationship.

Use Innovations in Measurement and Reporting

The NAEP redesign policy states that the National Assessment shall assess, and, where warranted, implement advances related to technology and the measurement and reporting of student achievement. In addition, the competition for NAEP contracts for assessments beginning around the year 2000 shall include a plan for conducting testing by computer in at least one subject and grade and for using technology to improve test administration, measurement, and reporting.

Status of Implementation:

The newly awarded NAEP contracts include plans for a short-form test (described above) in 4th grade mathematics in the year 2000 and for the development of a computer-based assessment.

Help States and Others Link to NAEP and Use NAEP Data to Improve Education Performance

The NAEP redesign policy states that the National Assessment shall assist states, districts and others, who want to do so at their own cost, to link their test results to the National Assessment. The policy also provides that NAEP shall be designed to permit access and use by others of NAEP data and materials. These include frameworks, specifications, scoring guides, results, questions, achievement levels, and background data. In addition, the policy provides that steps be taken to protect the integrity of the NAEP program and the privacy of individual test takers.

Status of Implementation:

The state of Maryland and the state of North Carolina have collaborated with the Governing Board on studies to examine the content of their respective state mathematics tests in light of the content of NAEP. The National Center for Education Statistics has a special grants program that provides funds to analyze NAEP data. The NCES has amended priorities for this grants program to encourage applications from states (and others) to conduct analyses that will be of practical benefit in interpreting NAEP results and in improving education performance. The National Academy of Sciences report "Uncommon Measures" describes the many technical difficulties involved in linking state results to NAEP. The NCES is planning a major conference with the states to provide a forum for discussing and addressing these difficulties. In addition, NCES is planning to conduct studies on various linking methodologies to provide insight on how the linking of NAEP and state assessments may best be done.
