Order Code RL31407

Educational Testing: Implementation of ESEA Title I-A Requirements Under the No Child Left Behind Act

Updated October 31, 2008

Wayne C. Riddle
Specialist in Education Policy
Domestic Social Policy Division

Summary

The No Child Left Behind Act of 2001 (NCLB) contains several requirements related to pupil assessments for states and local educational agencies (LEAs) participating in Elementary and Secondary Education Act (ESEA) Title I-A (Education for the Disadvantaged). Under the NCLB, in addition to previous requirements for standards and assessments in reading and mathematics at three grade levels, all states participating in Title I-A were required to implement standards-based assessments for pupils in each of grades 3-8 in reading and mathematics by the end of the 2005-2006 school year. States must also implement assessments at three grade levels in science by the end of the 2007-2008 school year. Pupils who have been in U.S. schools for at least three years must be tested (for reading) in English, and states must annually assess the English language proficiency of their limited English proficient (LEP) pupils. Grants to states for assessment development are authorized, and $408.7 million was appropriated for FY2008.

In addition, the NCLB requires all states receiving grants under Title I-A to participate in National Assessment of Educational Progress (NAEP) tests in 4th and 8th grade reading and mathematics to be administered every two years, with all costs to be paid by the federal government. NAEP is a series of ongoing assessments of the academic performance of representative samples of pupils primarily in grades 4, 8, and 12.
Beginning in 1990, NAEP has conducted a limited number of state-level assessments wherein the sample of pupils tested in each participating state is increased in order to provide reliable estimates of achievement scores for pupils in the state. Previously, all participation in state NAEP was voluntary, and additional costs associated with state NAEP were borne by participating states. The statutory provisions authorizing NAEP are amended by the NCLB to maximize consistency with the NCLB requirements and prohibit the use of NAEP assessments by agents of the federal government to influence state or LEA instructional programs or assessments.

The authorization for ESEA programs expired at the end of FY2008, and the 111th Congress is expected to consider whether to amend and extend the ESEA. Issues regarding expanded ESEA Title I-A pupil assessment requirements that are being addressed by the 111th Congress include the following: Are states meeting the expanded assessment requirements on schedule? Will federal grants be sufficient to pay the costs of meeting the assessment requirements? What might be the impact on NAEP of requiring state participation, as well as the impact of NAEP on state standards and assessments? What are the likely major benefits and costs of the expanded ESEA Title I-A pupil assessment requirements? And should the assessment requirements be expanded further?

This report will be updated regularly to reflect major legislative developments and available information.

Contents

Introduction
Pre-NCLB State Testing Policies and Practices
Testing Program Costs
Federal Policies or Activities Regarding Pupil Assessments Under the No Child Left Behind Act
ESEA Title I-A Requirements for Standards and Assessments
Schedule for Implementation of All Assessment Requirements
Limits on ED Influence Over State Standards and Assessments
State Assessment Grants
National Assessment of Educational Progress
State NAEP
NAEP Provisions in the No Child Left Behind Act
Status of Implementation of the Assessment Requirements
ED Review of Evidence Regarding Assessments to Meet the "1994 Requirements" Under Title I-A
Common Problem Areas Found in Reviews of State Assessment Systems with Respect to the "1994 Requirements"
Interpretation by ED of the Expanded Standard and Assessment Requirements of the No Child Left Behind Act
Title I-A Standard and Assessment Requirements
Implementation of the NAEP Requirements
Bush Administration Reauthorization Proposals
Issues Regarding the ESEA Title I-A Pupil Assessment Requirements
What Types of Assessments Meet the Expanded Assessment Requirements?
How Strict Is ED's Review of State Assessment Systems?
What Is the Cost of Developing and Implementing the Required Assessments, and to What Extent Will Federal Grants Be Available to Pay for Them?
What Might Be the Impact of the Requirement for Annual Assessment of English Language Proficiency of LEP Pupils?
What Might Be the Impact of Requiring State Participation in NAEP?
Possible Influence on State Standards and Assessments Arising from (Marginally) Increased Stakes
Voluntary Participation by LEAs, Schools, and Pupils
Can NAEP Results Be Used to "Confirm" State Test Score Trends?
What Are the Likely Benefits and Costs of the Expanded Title I-A Assessment Requirements?
Glossary of Selected Terms Used in This Report

Introduction

The No Child Left Behind Act of 2001 (NCLB, P.L. 107-110), signed into law on January 8, 2002, contains a number of new requirements related to pupil assessments for states and local educational agencies (LEAs) participating in Title I-A (Education for the Disadvantaged) of the Elementary and Secondary Education Act (ESEA). These assessment requirements expand upon an earlier series of requirements for participating states to adopt curriculum content standards, academic achievement standards, and assessments linked to these at three grade levels, which were adopted under the Improving America's Schools Act (IASA) of 1994 (P.L. 103-382).

The authorization for ESEA programs expired at the end of FY2008, and the 111th Congress is expected to consider whether to amend and extend the ESEA. On January 24, 2007, the Bush Administration released "Building on Results: A Blueprint for Strengthening the No Child Left Behind Act,"1 which outlined its recommendations for ESEA reauthorization. Key recommendations in that document will be mentioned at relevant places in this report.
This report provides background information on state pupil assessment programs and policies, a description of the ESEA Title I-A assessment requirements as expanded by the NCLB, a review of the implementation status of these requirements, and an analysis of related issues likely to be addressed by the 111th Congress. This report will be updated regularly to reflect major legislative developments and available information.

1 The document is available from the Department of Education website, online at [http://www.ed.gov/policy/elsec/leg/nclb/buildingonresults.pdf].

Pre-NCLB State Testing Policies and Practices

The academic achievement of pupils in public elementary and secondary schools is assessed using many types of tests. Pupils may take tests developed by individual teachers or schools, commercially published tests selected by their LEA, or assessments selected or developed by their state educational agency (SEA). This report will focus almost entirely on state-mandated assessments -- tests which must be administered to virtually all pupils in selected grades who attend a state's public K-12 schools -- because such tests are the primary focus of federal policies regarding pupil assessment.

According to published surveys,2 every state except one (Iowa) now requires its LEAs to administer specified assessments to all pupils attending public schools in one or more grades.3 The number of grades and subjects in which state-mandated assessments are administered varies widely, from only one grade and subject (e.g., the only state-mandated assessment in Nebraska currently is a writing test for pupils in grade 4) to tests in multiple subjects and most K-12 grades (e.g., Alabama requires pupils in each of grades 3-11 to take state-selected tests in English, mathematics, science, and history).
Few state-mandated tests are administered to pupils below grade 3, because of a variety of concerns about administering standardized tests to very young pupils, or in grade 12, in part because most assessment activity for these pupils is focused on college entrance tests. With respect to grades 3-8 in particular, 15 states plus the District of Columbia currently administer assessments in mathematics and reading to pupils in each of these grades; however, it is unclear how many of these assessments are linked to state content and achievement standards.

State-mandated assessments have been developed in one of three basic patterns. They are either: (a) developed by the states themselves, usually with technical assistance from commercial firms employing assessment specialists; (b) developed almost completely by commercial test publishers, either as generic tests sold in the same form throughout the nation,4 or special versions of such tests which are customized to be more consistent with the curriculum content and achievement standards of a state; or (c) developed through multi-state consortia.5

Some state-mandated assessments, whether developed by the states themselves or in cooperation with other states or commercial firms, are "criterion-referenced" tests, or CRTs (see Glossary) designed to determine the extent to which pupils have mastered specific curriculum content and skills. Other state-mandated tests are either generic or customized "norm-referenced" tests, or NRTs (see Glossary) -- tests designed primarily to rank pupils' achievement level in comparison to a nationally representative sample of pupils -- purchased by states from commercial test publishers. These two types of tests vary primarily regarding how test results are analyzed, but also typically differ to some degree with respect to such characteristics as the range of questions included.6

2 Much of the data in this section is derived from No State Left Behind: The Challenges and Opportunities of ESEA 2001, by the Education Commission of the States, available at [http://www.ecs.org]; and Assessment and Accountability Systems: 50 State Profiles, by the Consortium for Policy Research in Education, available at [http://www.cpre.org/Publications/Publications_Accountability.htm].

3 While Iowa does not mandate participation in any specific assessment, tests developed by the Iowa Testing Programs at the University of Iowa and published nationwide by Riverside Publishing are administered to a large majority of pupils attending public K-12 schools in Iowa, on the basis of voluntary decisions by each LEA.

4 Three of the largest such commercial test publishers are: (1) CTB/McGraw-Hill, at [http://www.ctb.com/]; (2) Riverside (Houghton Mifflin) Publishing, at [http://www.riverpub.com]; and (3) Harcourt Assessment, at [https://harcourtassessment.com/hai/International.aspx].

5 One example of such a consortium is the New Standards Project, a joint effort of several states and LEAs, the National Center on Education and The Economy, and the Learning Research and Development Center at the University of Pittsburgh. Another is a consortium for assessment development formed by three New England states -- New Hampshire, Rhode Island, and Vermont.

As of spring 2000, immediately preceding consideration of the NCLB, two states (Montana and South Dakota) administered only NRTs, 17 administered only CRTs, and 29 administered both kinds of tests in different grades and/or subject areas, with six of the latter states (Alabama, Idaho, Montana, South Dakota, West Virginia, and Wisconsin) using NRTs as their primary assessment instruments.
In addition, six states (California, Delaware, Indiana, Missouri, New Mexico, and Tennessee) had developed state tests that are designed to produce both achievement results linked to state standards (criterion-referenced results) and nationally normed results (norm-referenced results).

6 For example, in order to clarify distinctions between high- and low-achieving pupils, a norm-referenced test will typically include some very difficult questions that only a few pupils can answer, and some very easy questions that almost all pupils can answer correctly. Test content and questions are selected largely on the basis of how efficiently they rank pupils. In contrast, a CRT would be focused solely on the relevant content standards, with no direct emphasis on distinguishing the highest- from the lowest-achieving pupils.

Testing Program Costs. Complete information on the costs associated with state-mandated pupil testing programs is not available. There are many potential sources of such costs, both direct and indirect, at the state, LEA, and school levels, and there are unresolved debates over how to estimate and whether to consider certain types of costs, especially indirect ones.7 A survey of direct, state-level expenditures for state-mandated assessment programs was conducted in early 2001 by the Pew Center on the States.8 These data combine state-level expenditures for both test development and administration for FY2001 (FY2000 for North Dakota and Vermont). The figures do not include any LEA-level expenditures, either direct or indirect, nor possible indirect state-level expenditures for state-mandated testing programs.

According to this survey, state-level, direct expenditures for K-12 pupil assessment programs in FY2001 totaled $422.8 million. The expenditures per state varied from zero for Iowa and $0.2 million for North Dakota, to $44.0 million for California and $26.7 million for Texas. On a per-pupil basis, these costs were found to vary from $1.46 per pupil in West Virginia to $82.55 per pupil in Alaska. Per pupil costs of state-mandated assessments tend to be low in states which rely primarily on versions of commercially-published NRTs, such as West Virginia, Alabama ($7.80 per pupil), New Mexico ($3.21 per pupil), and Utah ($3.16 per pupil). In contrast, per pupil costs were found to be highest for several states which rely primarily or solely on state-specific CRTs, such as Alaska, Wyoming ($78.34 per pupil), Virginia ($68.90 per pupil) and Massachusetts ($68.02 per pupil).9

7 Direct expenditures include those for such activities and services as development and field testing of assessments, purchase of test materials, scoring, or dissemination of results. Indirect expenditures might include those for time spent by teachers and other staff preparing pupils for or administering assessments or overhead costs. For a review of related issues, see Richard P. Phelps, "Estimating the Costs of Standardized Student Testing in the United States," Journal of Education Finance, winter 2000, pp. 343-380.

8 Available at [http://www.stateline.org/live/ViewPage.action?siteNodeId=136&languageId=1&contentId=14274].

More detailed, but less comprehensive or current, information may be found in a study of the costs of developing and initially implementing assessments aligned with curriculum standards in two states -- Kentucky and North Carolina. According to this study,10 the total five-year state-level costs of developing and implementing a new assessment aligned with state standards for Kentucky were $9.55 million ($1.9 million per year) for test development and $33.3 million ($6.67 million per year) in total (including development, administration, etc.). For North Carolina, the total three-year state-level costs were found to be $4.0 million ($1.34 million per year) for test development and $27.5 million ($5.5 million per year) in total.
The costs for these two states are not necessarily representative of the costs for all states. For example, costs might be lower for states which develop tests jointly with a group of other states, or which contract with a commercial test publisher for a customized version of a test which is marketed nationwide in a generic form.

9 See Education Commission of the States, Estimated Per-Student Spending on Statewide Testing Programs, October 2001, available at [http://www.ecs.org].

10 Lawrence O. Picus, Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis, CRESST Technical Report 408, February 1996.

Federal Policies or Activities Regarding Pupil Assessments Under the No Child Left Behind Act

The following section of this report describes the major pupil assessment-related provisions of the ESEA as amended by the NCLB.

ESEA Title I-A Requirements for Standards and Assessments

The provisions of ESEA Title I-A, as amended by the NCLB, regarding standards and assessments reinforce and expand upon provisions initially adopted in the Improving America's Schools Act of 1994 (IASA). Whether under the IASA or the NCLB, these standards and assessment provisions are linked to receipt of financial assistance under ESEA Title I-A -- that is, they apply only to states wishing to maintain eligibility for Title I-A grants. However, since Title I-A is the largest federal K-12 education program, funded at $13.9 billion for FY2008, it is generally considered unlikely that many states would decline to participate in the program in order to avoid implementing the expanded assessment requirements.

The IASA of 1994 attempted to raise the instructional standards of Title I-A programs, and the academic expectations for participating pupils, by tying Title I-A instruction to state-selected curriculum content and academic achievement standards.
These provisions were adopted in response to concerns that Title I-A programs had not been sufficiently challenging academically; had not been well integrated with the "regular" instructional programs of participants; and had required extensive pupil testing that was of little instructional or diagnostic value, and was not linked to the curriculum to which pupils were exposed. Further, the legislation attempted to make Title I-A tests more meaningful by using state assessments to determine whether schools and LEAs are making "adequate yearly progress" (AYP) toward meeting state achievement standards.11 States were given several years to meet the IASA requirements. In particular, the full system of standards and assessments was not required to be in place until the 2000-2001 school year and, as is discussed in detail below, only a minority of states met that deadline. Thus, in its debates on the NCLB in 2001, the Congress considered not only the expanded assessment requirements proposed by the Bush Administration, but also the implementation status of requirements adopted in 1994. Under the ESEA, as amended first by the IASA of 1994 and later by the NCLB of 2001, states wishing to remain eligible for Title I-A grants are required to develop or adopt curriculum content standards as well as academic achievement standards and assessments tied to the standards. In general, these standards and assessments are to be applicable to all LEAs, schools, and pupils statewide. 
One major exception to this general policy is that if no agency or entity in a state has the authority to establish statewide standards or assessments (as is generally assumed to be the case for Iowa and Nebraska), then the state may adopt either: (a) statewide standards and assessments applicable only to Title I-A pupils and programs, or (b) a policy providing that each LEA receiving Title I-A grants will adopt standards and assessments which meet the requirements of Title I-A and are applicable to all pupils served by each such LEA. Another possible exception, which is discussed further below, is that ED regulations would allow local variation in the assessments used for at least some grade levels. Thus, it should be kept in mind that "state systems of standards and assessments," as referred to frequently below, may not in some cases be uniform statewide.

11 See CRS Report RL32495, Adequate Yearly Progress (AYP): Implementation of the No Child Left Behind Act, by Wayne C. Riddle.

In order to comply with the provisions of ESEA Title I-A, state systems of standards and assessments are required to meet a number of specific statutory requirements, as follows:

1. Standards and assessments at 3 grade levels were to be developed or adopted at least in the subjects of mathematics and reading/language arts by the 2000-2001 school year.12 Standards were to be adopted in science by the end of the 2005-2006 school year, and assessments in science by the end of the 2007-2008 school year.

2. The standards and assessments used to meet the Title I-A eligibility requirements must be the same as those applied to all public school pupils in the state (with the two possible exceptions discussed above).

3. The content standards are to specify what pupils are expected to know and be able to do, and are to be "coherent and rigorous."

4. Achievement standards must establish at least three performance levels for all pupils -- advanced, proficient, and partially proficient (or basic).

5. Assessments must be aligned with state content and achievement standards.

6. Assessments in mathematics, reading and, beginning in 2007-2008, science must be administered annually to students in at least one grade in each of three grade ranges -- grades 3-5, grades 6-9, and grades 10-12. In addition, assessments in mathematics and reading were to be administered to pupils in each of grades 3-8 by the end of the 2005-2006 school year.13,14

7. All pupils in the relevant grades who have attended schools in the LEA for at least one year must participate in the assessments.15

8. LEP pupils are to be assessed in a valid and reliable manner and provided with "reasonable" accommodations. To the extent practicable, LEP pupils are to be assessed in the language and form most likely to yield accurate and reliable information on what they know and can do in academic content areas (in subjects other than English itself). However, pupils who have attended schools in the United States (excluding Puerto Rico) for three or more consecutive school years are to be assessed in English.16

9. "Reasonable" adaptations and accommodations are to be provided for students with disabilities, consistent with the provisions of the Individuals with Disabilities Education Act (IDEA) where such adaptations or accommodations are necessary to measure the achievement of those students relative to state standards.

10. The assessment system must involve multiple approaches with up-to-date measures of student performance, including measures that assess higher order thinking skills and understanding.

11. Assessments must be used for purposes for which they are valid and reliable, and they must meet relevant, nationally recognized, professional and technical standards. In particular, the state educational agency (SEA) must provide evidence from a test publisher or other relevant source that the assessments are of adequate technical quality for the purposes required under Title I-A.

12. The assessment system must produce individual student interpretive and diagnostic reports that are provided to parents, teachers, and principals as soon as is "practically possible" after the assessments are administered. It must also enable "itemized score analyses" to be produced and reported to LEAs and schools, so that specific academic needs may be identified.

13. The assessment system must enable results for each state, LEA, and school, to be disaggregated (i.e., reported separately) by gender, major racial and ethnic groups, English proficiency status, migrant status, students with disabilities as compared to students without disabilities, and economically disadvantaged students as compared to students who are not economically disadvantaged. However, such disaggregation is not required in cases where the number of pupils in a group would be too small to yield statistically reliable information or where personally identifiable information would be revealed.

14. Assessments must objectively measure academic achievement, knowledge, and skills, and not assess personal or family beliefs and attitudes, or disclose personally identifiable information.

15. Assessment results must be provided to LEAs, schools, and teachers before the beginning of the subsequent school year.

16. In addition to the general assessment system described in 1-15 above, states are to provide that their LEAs will annually assess the English language proficiency of their LEP pupils -- including pupils' oral, reading, and writing skills.17

12 As is discussed later in this report, most states did not meet this deadline, established in the 1994 IASA.

13 There is explicit authority for a one-year delay of this requirement in cases of exceptional or uncontrollable circumstances.

14 There is some obvious overlap in these requirements -- e.g., states meeting the requirement for assessments in reading and math at three grade levels already meet the requirements for one or two of grades 3-8.

15 Separately, the provisions regarding AYP provide that at least 95% of the pupils in each demographic group within each school must be included in the assessments in order for the school to meet AYP requirements. Pupils may be excluded from school-level score reporting and accountability if they have attended a specific school for less than one year.

16 LEAs may continue to administer assessments to pupils in non-English languages for up to five years if, on a case-by-case basis, they determine that this would likely yield more accurate information on what the students know and can do.

17 A one-year waiver of this requirement is specifically authorized in cases of exceptional or uncontrollable circumstances.

Finally, as is discussed further below, states receiving grants under ESEA Title I-A must participate in biennial state-level administrations of the National Assessment of Educational Progress in 4th and 8th grade reading and mathematics, beginning in the 2002-2003 school year. The timing of several of the key requirements listed above is summarized in the following schedule.

Schedule for Implementation of All Assessment Requirements.

School Year 2000-2001
- States were to have adopted content and performance standards, plus assessments linked to these, at three grade levels in mathematics and reading. These requirements were included in the 1994 reauthorization of the ESEA.
  (As of the date of this report, 21 states fully met these requirements.)

School Year 2002-2003
- States were required to begin to annually assess the English language proficiency of LEP pupils (possible one-year waiver for "exceptional or uncontrollable circumstances").
- States were first required to participate in biennial administration of NAEP.
- Annual report cards on state and LEA school systems and schools were required to be published (with a possible one year waiver authorized for "exceptional or uncontrollable circumstances").
- States were required to begin reporting annually to ED on progress toward meeting new assessment and related requirements under the NCLB.

School Year 2005-2006
- Standards-based assessments in reading and mathematics were to be administered to pupils in each of grades 3-8 by the end of this year.
- States were required to adopt content and achievement standards at three grade levels in science by the end of this year.

School Year 2007-2008
- States must begin to administer assessments at three grade levels in science by the end of this year.

Limits on ED Influence Over State Standards and Assessments. Several statutory constraints have been placed on the authority of the Secretary of Education to enforce these standard and assessment requirements. First, the ESEA contains a provision -- similar to others found in the Department of Education Organization Act and the General Education Provisions Act -- stating that nothing in ESEA Title I shall be construed to authorize any federal official or agency to "mandate, direct, or control a State, local educational agency, or school's specific instructional content, academic achievement standards and assessments, curriculum, or program of instruction" (Section 1905).18 Second, states may not be required to submit their standards to the U.S. Secretary of Education (Section 1111(b)(1)(A)) or to have their content or achievement standards approved or certified by the federal government (Section 9527(c)) in order to receive funds under the ESEA, other than the (limited) review necessary in order to determine whether the state meets the Title I-A requirements. Finally, no state plan may be disapproved by ED on the basis of specific content or achievement standards or assessment items or instruments (Section 1111(e)(1)(F)).

18 Similar, although somewhat less specific, language may be found in ESEA Section 9526(b)(1) and Section 9527(a).

State Assessment Grants. The ESEA authorizes (in Title VI-A-1) annual grants to the states to help pay the costs of meeting the Title I-A standard and assessment requirements added by the NCLB (i.e., the newly required assessments in science at three grade levels and at grades 3-8 in mathematics and reading). These grants may be used by states for development of standards and assessments or, if these have been developed, for assessment administration and such related activities as developing or improving assessments of the English language proficiency of LEP pupils.

The amount authorized to be appropriated for these state assessment grants, plus grants for development of enhanced assessment instruments (see below), is $490 million for FY2002 and "such sums as may be necessary" for each of FY2003-FY2008. The state assessment requirements that were newly adopted under the NCLB are contingent upon the appropriation of minimum annual amounts for these state assessment grants. The administration, but not the development, of grade 3-8 and science assessments may be delayed by one year for each year that the following minimum amounts are not appropriated: FY2002, $370 million; FY2003, $380 million; FY2004, $390 million; and each of FY2005-FY2008, $400 million.
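
The delay rule just described can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not a statement of how ED applies the provision: the function and variable names are hypothetical, the statute says administration "may be delayed" whereas the sketch simply counts one year of delay per shortfall year, and fiscal years missing from the input are assumed to have met their minimum.

```python
# Minimum ("trigger") appropriations named in the text, in millions of dollars.
TRIGGER_MILLIONS = {
    2002: 370, 2003: 380, 2004: 390,
    2005: 400, 2006: 400, 2007: 400, 2008: 400,
}


def years_of_delay(appropriations_millions):
    """Count the fiscal years whose appropriation fell below the trigger amount.

    Assumption: fiscal years absent from the input met their minimum.
    """
    return sum(
        1
        for fiscal_year, minimum in TRIGGER_MILLIONS.items()
        if appropriations_millions.get(fiscal_year, minimum) < minimum
    )


def delayed_school_year(deadline_end_year, appropriations_millions):
    """Return the school year (e.g. "2006-2007") by whose end tests are due."""
    end_year = deadline_end_year + years_of_delay(appropriations_millions)
    return f"{end_year - 1}-{end_year}"


# Hypothetical FY2005 shortfall; the $390 million figure is invented.
print(delayed_school_year(2006, {2005: 390}))
```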
For example, if an amount less than $400 million had been appropriated for state assessment grants for FY2005, the deadline for state administration of tests in reading and mathematics for each of grades 3-8 would have moved from 2005-2006 to 2006-2007. For each of FY2002-FY2008, at least the minimum amounts have been appropriated for these grants.

The state assessment grants are to be allocated as follows: after reservation of 0.5% of the total for the Outlying Areas and 0.5% for the Bureau of Indian Affairs, each state will first receive $3 million. Remaining funds will be allocated among the states in proportion to their number of children and youth aged 5-17 years. This allocation formula reflects an implicit assumption that costs of assessment development are partially similar for all states, regardless of their size, and partially related to the size of the state's school age population.

The ESEA also authorizes competitive grants to states for the development of enhanced assessment instruments. Aided activities may include efforts to improve the quality, validity, and reliability of assessments beyond the levels required by Title I-A, to track student progress over time, or to develop performance or technology-based assessments. Funds appropriated each year for state assessment grants which are in excess of the "trigger" amounts for assessment development grants listed above are to be used for enhanced assessment grants. The amounts available for assessment enhancement grants thus far are $17 million for FY2002, $4.5 million for FY2003, none for FY2004, $11.7 million for FY2005, $7.6 million for each of FY2006 and FY2007, and $8.7 million for FY2008.

Finally, the NCLB authorizes a study of the impact of the expanded Title I-A assessment requirements.
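
As an aside on the grant mechanics described earlier in this section, the allocation formula for state assessment grants and the residual available for enhanced assessment grants can be sketched in Python. This is a minimal sketch under assumptions, not ED's actual computation: the function names, state labels, dollar total, and population figures are hypothetical, and rounding or any other statutory adjustments are ignored.

```python
BASE_GRANT = 3_000_000      # flat amount each state receives first
RESERVATION_SHARE = 0.005   # 0.5% each for the Outlying Areas and the BIA


def allocate_state_grants(formula_funds, school_age_population):
    """Split formula funds among states per the rule described in the text.

    formula_funds: dollars available for formula grants (assumed input).
    school_age_population: dict of state -> children and youth aged 5-17.
    Returns (outlying_areas, bia, {state: grant}).
    """
    outlying_areas = RESERVATION_SHARE * formula_funds
    bia = RESERVATION_SHARE * formula_funds
    remaining = (formula_funds - outlying_areas - bia
                 - BASE_GRANT * len(school_age_population))
    if remaining < 0:
        raise ValueError("funds do not cover the base grants")
    total_population = sum(school_age_population.values())
    grants = {
        state: BASE_GRANT + remaining * population / total_population
        for state, population in school_age_population.items()
    }
    return outlying_areas, bia, grants


def enhanced_assessment_funds(appropriation, trigger_amount):
    """Funds above the trigger amount go to enhanced assessment grants."""
    return max(0.0, appropriation - trigger_amount)


# Hypothetical two-state example; the total and populations are invented.
outlying, bia, grants = allocate_state_grants(
    10_000_000, {"State A": 1_000_000, "State B": 3_000_000}
)
print(round(grants["State A"]), round(grants["State B"]))
```

The flat $3 million term is what makes small states receive more per child than large ones. The residual helper is consistent with the report's own figures: the $408.7 million appropriated for FY2008, less the $400 million minimum, leaves the $8.7 million reported for enhanced assessment grants.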
The Secretary of Education is authorized to use the lesser of 15% of total appropriations for Title I, Part E (National Assessment of Title I) or $1.5 million per year to contract for an independent study of "assessments used for State accountability purposes," including the correlations between such assessments and pupil achievement, instructional practices, dropout and graduation rates, and school staff turnover rates; effects on different groups of pupils, such as LEP pupils, pupils from low-income families, or pupils with disabilities; and relationships between accountability systems and exclusion of pupils from state assessments.

National Assessment of Educational Progress19

The National Assessment of Educational Progress (NAEP) is a federally funded series of assessments of the academic performance of elementary and secondary students in the United States. NAEP tests generally are administered to public and private school pupils in grades 4, 8, and 12 in a variety of subjects, including reading, mathematics, science, writing and, less frequently, geography, history, civics, social studies, and the arts. NAEP assessments have been conducted since 1969. NAEP is administered by the National Center for Education Statistics (NCES), with oversight and several aspects of policy set by the National Assessment Governing Board (NAGB), both within the U.S. Department of Education. Since 1983, the assessment has been developed primarily under a cooperative agreement with the Educational Testing Service (ETS), a private, non-profit organization which also develops and administers such assessments as the SAT. A private business firm, Westat, Inc., carries out much of the test administration activities. Two other private firms, National Computer Systems and American Institutes for Research, distribute and score the assessments and develop the background questionnaires, respectively.

NAEP consists of two separate groups of tests.
One is the main assessment, in which test items (questions) are revised over time in both content and structure to reflect more current views and practices. The main assessment also reports pupil scores in relation to performance levels -- standards for pupil achievement that are based on score thresholds set by NAGB. The performance levels are considered to be "developmental," and are intended to place NAEP scores into context. They are based on determinations by NAGB of what pupils should know and be able to do at a basic ("partial mastery"), proficient ("solid academic performance"), and advanced ("superior performance") level with respect to challenging subject matter.

The second group of NAEP tests forms the long-term trend assessment, which monitors trends in math and reading achievement.20 The tests in each subject area have not changed in content or structure since they were originally developed in 1969, purportedly making it possible to reliably compare results from year to year. However, many have expressed concerns that the long-term trend assessment questions may be increasingly disconnected from what pupils are actually taught with the passage of decades of time.21 Since the long-term trend assessment is not involved with the ESEA Title I-A assessment requirements, it will not be discussed further.

19 For additional information on NAEP, see CRS Report 98-348, National Assessment of Educational Progress: Background and Reauthorization Issues, by Wayne C. Riddle (out of print; available from the author: 7-7382).
20 Additional long-term trend assessments in writing and science were last administered in 1999. There is no current plan to administer the writing assessment in the future; revised science assessment test items are being developed, and may be administered in the future.

All NAEP tests are administered to only a sample of pupils, and the tests are designed so that no pupil takes an entire NAEP test.
The use of sampling is intended to minimize both the costs of NAEP and test burdens on pupils. It also makes it possible to include a broad range of items in each test. Since no individual pupil takes an entire NAEP test, it is impossible for NAEP to report individual pupil scores.22 It is intended that NAEP tests be administered to a representative sample of all pupils in public and private schools, although there has been ongoing debate over whether LEP pupils or those with disabilities are adequately represented and whether appropriate accommodations or adaptations are being provided for them. The frameworks for NAEP tests provide a broad outline of the content on which pupils are to be tested. Frameworks are developed by NAGB through a national consensus approach involving teachers, curriculum specialists, policymakers, business representatives, and the general public. In developing the test frameworks, national and various standards are taken into consideration, but the frameworks are not intended to specifically reflect any particular set of standards. In addition, pupils and school staff fill out background questionnaires. The NAEP statute limits the range of background information that may be collected to data "directly related to the appraisal of academic achievement, and to the fair and accurate presentation of such information" (Section 303(b)(5)(B)) plus demographic data on pupil race, ethnicity, socioeconomic status, disability, LEP status, and gender. State NAEP. While NAEP, as currently structured, cannot provide assessment results for individual pupils, the levels at which scores could be provided, whether the nation overall, states, LEAs, or schools, depend on the size and specificity of the sample group of pupils tested. NAEP has always provided scores for the Nation as a whole and four multistate regions. Beginning in 1990, NAEP has conducted a limited number of state-level assessments in 4th and 8th grade mathematics and reading. 
In addition, state science assessments have been administered to 4th and 8th grade pupils in 1996 (8th grade only), 2000, and 2005. Only the main NAEP, not the long-term trend assessment, is administered at the state level. Under state NAEP, the sample of pupils tested in a state is increased in order to provide reliable estimates of achievement scores for pupils in each participating state.

21 An NAGB policy adopted in May 2002 addresses this concern with respect to the science assessment, and changes were to be made to the content of the science assessment before its next administration.
22 The Voluntary National Test proposal of the Clinton Administration was to develop individual versions of the NAEP 4th grade reading and 8th grade math tests (see CRS Report 97-774, National Tests: Administration Initiative, by Wayne C. Riddle [archived; available from the author: 7-7382]). Activity related to this proposal has been terminated.

Until enactment of the NCLB (see below), participation in NAEP was voluntary for states,23 the additional cost associated with state NAEP administration was borne by the states and, after participating in any state NAEP test, states could separately decide whether to allow release of NAEP results for their state. As with other main NAEP tests, state NAEP scores are reported with respect to performance levels -- basic, proficient, and advanced -- developed by NAGB. In general, approximately 40 states participated in each state-level NAEP assessment administered between 1990 and 2000, and all states except one (South Dakota) participated in state NAEP at least once during this period.

In addition to this administration of NAEP at a state level, the FY2002 appropriations provided for a Trial Urban Assessment of achievement in reading and writing: experimental administration of NAEP to expanded pupil samples in a limited number of large urban LEAs.
The assessment was administered to extended samples of pupils in 2002 in Atlanta, Chicago, the District of Columbia, Houston, Los Angeles, and New York City, as part of the regular state and national assessment activities.24 Additional trial urban assessments were conducted in 2003, 2005, and 2007.

NAEP Provisions in the No Child Left Behind Act. The NCLB provides that all states wishing to remain eligible for grants under ESEA Title I-A will be required to participate in state NAEP tests in 4th and 8th grade reading and mathematics, which are to be administered every two years. The costs of testing expanded pupil samples in the states in these subjects are now paid by the federal government. An unstated, but implicit, purpose of this new requirement is to "confirm" trends in pupil achievement, as measured by state-selected assessments.25 The results from the initial state NAEP assessment in 4th and 8th grade reading and mathematics involving all 50 states were released in 2003, with subsequent rounds of results released in 2005 and 2007.

23 Once states decided to participate they were not prohibited from mandating participation by LEAs or schools under state and local law, although it appears that most states have always attempted to obtain LEA and school participation through voluntary recruitment.
24 For a description of the Trial Urban Assessment, and available results, see [http://nationsreportcard.gov/tuda_reading_2007/] and [http://nationsreportcard.gov/tuda_math_2007/], accessed January 8, 2008.
25 The role of NAEP in "confirming" state test score trends is not explicitly stated in the final statute, but is explicitly mentioned in ED documents, such as the following: Confirming Progress -- Under H.R. 1 a small sample of students in each state will participate in the 4th and 8th grade National Assessment of Educational Progress (NAEP) in reading and math every other year in order to help the U.S. Department of Education verify the results of statewide assessments required under Title I to demonstrate student performance and progress. See Using the National Assessment of Educational Progress to Confirm State Test Results, prepared by an Ad Hoc Committee on Confirming Test Results, National Assessment Governing Board, at [http://www.nagb.org].

In addition, the authorizing statute for NAEP (at that time, Sections 411-412 of the National Education Statistics Act, or NESA) was almost completely rewritten in the NCLB. Although most of the new provisions are essentially the same as previous law, the statute has been amended in several respects. It is explicitly provided that pupils in home schools may not be required to participate in NAEP tests. Agents of the federal government are prohibited from using NAEP assessments to influence state or LEA instructional programs or assessments. Mechanisms are provided for limited public access to NAEP questions and test instruments and for review of complaints about NAEP tests. Provisions regarding NAGB are revised to specify that at least two members must be parents who are not employed by any educational agency. Regarding the release of state NAEP results, participating states still may choose not to allow such release but only with respect to state NAEP tests other than those required for Title I-A purposes.

There are conflicting statutory and regulatory provisions regarding participation in NAEP tests by LEAs and schools that may be selected for NAEP test administration. The NCLB itself explicitly provides that participation in NAEP tests is voluntary for all pupils and schools, but it contains conflicting provisions regarding voluntary participation by LEAs. The NAEP authorization statute (redesignated in 2002 as Section 303 of the Education Sciences Reform Act by P.L.
107-279) states that participation is voluntary for LEAs as well, but ESEA Title I-A provides that the plans of LEAs receiving aid under that program must include an assurance that they will participate in state NAEP tests if selected (Section 1112(b)(1)(F)). Finally, program regulations published by the U.S. Department of Education (Federal Register, December 2, 2002) require both LEAs that receive Title I-A grants, and schools within such LEAs, to participate in NAEP if selected to be among the samples tested (34 C.F.R. § 200.11(b)).

The NCLB authorizes funds specifically for state NAEP tests for FY2002-FY2007: $72 million for FY2002 and "such sums as may be necessary" for the succeeding years. The NCLB did not extend the authorization for NAEP overall. However, Title III of P.L. 107-279, the National Assessment of Educational Progress Authorization Act, extended the general NAEP authorization through FY2008. The authorization level is $107.5 million for all NAEP activities (including state assessments), plus $4.6 million for NAGB, for FY2003, and "such sums as may be necessary" for each of FY2004-FY2008. P.L. 107-279 also redesignates NAEP's statutory language as Title III of the Education Sciences Reform Act of 2002 (ESRA), but does not otherwise directly or substantially amend the provisions.26

For FY2002, the total amount appropriated for all NAEP and NAGB activities was $111.6 million. This was a large increase over the FY2001 level of $40 million, primarily as a result of the shift in responsibility for state NAEP costs from states to the federal government. The FY2002 appropriation also included $2.5 million for the Trial Urban Assessment described above. The total amount appropriated for NAEP and NAGB was $94.8 million for each of FY2003 and FY2004, $94.1 million for FY2005, $93.1 million for each of FY2006 and FY2007, and $104.1 million for FY2008.

26 See CRS Report RL31353, Educational Research, Statistics, and Evaluation: Legislation in the 107th Congress, by Paul M. Irwin (out of print report, available from the author: 7-7573).

Status of Implementation of the Assessment Requirements

The scheduled deadlines for implementation of major assessment requirements under ESEA Title I-A are outlined earlier in this report. Thus far, almost all implementation activity has taken place with respect to requirements adopted initially in the 1994 IASA and continued under the NCLB. The process of implementing the 1994 requirements is still incomplete.

ED Review of Evidence Regarding Assessments to Meet the "1994 Requirements" Under Title I-A

In their reviews of state systems of standards and assessments, peer reviewers (specialists in the areas of standards and assessments who are not federal employees) and ED staff have been considering only various forms of "evidence" submitted by the states which are intended to document that state standards and assessments meet the specific Title I-A requirements outlined earlier in this report; that is, they are not reviewing the assessments themselves.27 Examples of such "evidence" include results from studies, by test publishers or others, of the degree of alignment between state standards and assessments; evaluations of the validity, reliability, or other aspects of the technical quality of state assessments; state policies on providing native language testing or other accommodations for LEP pupils, or alternate assessments or other accommodations for pupils with disabilities; provisions for reporting scores by disaggregated pupil groups; or data on the extent of actual participation in assessments of LEP pupils or pupils with disabilities.

Both before and after the NCLB, the ESEA authorized sanctions for states failing to meet the deadlines for adopting standards and assessments.
The 1994 version provided that the Secretary of Education may withhold funds for state administration plus program improvement from states failing to meet any of the Title I-A state plan requirements, including those related to standards and assessments (Section 1111(d)(2)). As amended by the NCLB, the ESEA provides that the Secretary shall withhold 25% of funds otherwise available for state administration and program improvement activities from states that fail to meet the 1994 requirements, and may withhold additional state administration funds for failure to meet new assessment requirements adopted under the NCLB. In addition, states that persistently and thoroughly fail to meet the standard and assessment requirements over an extended period of time potentially may be subject to elimination of their Title I-A grants altogether, since they would be out of compliance with a basic program requirement.

27 Peer reviewers have relied primarily upon the Department's Peer Reviewer Guidance for Evaluating Evidence of Final Assessments Under Title I of the Elementary and Secondary Education Act (available at [http://www.ed.gov/policy/elsec/guid/cpg.pdf]) to guide their activities. While this document was published before enactment of the NCLB, it remains applicable, at least for the present, mainly because most applicable underlying requirements are essentially unchanged.

Common Problem Areas Found in Reviews of State Assessment Systems with Respect to the "1994 Requirements".
The peer reviews of state assessment systems conducted thus far have identified a number of common problem areas, as indicated in "decision letters" from ED officials to the states.28 These are: (a) lack of adequate inclusion, accommodation, and incorporation of alternate assessments for LEP and disabled pupils; (b) insufficient documentation of the technical quality of assessments (i.e., their reliability, alignment, validity, etc.), especially the degree of alignment of assessments with content and pupil performance/achievement standards; and (c) inadequate timelines for completion and implementation of the assessments. The first of these three problem areas has received the greatest attention. The revised ESEA, ED's "Summary Guidance on the Inclusion Requirement for Title I Final Assessments," as well as other letters and policy guidance documents, indicate that the only students who should be excluded from assessments are those who have attended public schools in a LEA for less than one year. Otherwise, all pupils should be included in both the assessments and associated accountability systems.29 Where appropriate, accommodations (for example, extended time to complete an assessment) or alternate assessments30 should be provided for pupils with disabilities. LEP pupils should be assessed in the language most likely to yield valid results, except that those who have attended schools in the United States (other than Puerto Rico) for three or more years must generally be assessed in English, and they should be provided with other accommodations (e.g., extended time or use of bilingual word lists or dictionaries) where appropriate, as determined on an individual basis. With respect to inclusion of LEP pupils and those with disabilities, ED is reviewing "evidence" not only of state policies but also practices (i.e., actual rates of participation by LEP and disabled pupils). 
Many of the states whose assessments have not yet been approved have been informed that they need to make changes regarding assessment of or reporting of scores for LEP and/or disabled pupils.

28 These are available at [http://www.ed.gov/admins/lead/account/finalassess/index.html].
29 Pupils who have attended schools in a LEA for one year or more, but who have attended a particular school for less than one year, may be excluded from accountability determinations for the school (but not for the LEA overall).
30 Section 612(a)(17) of the Individuals with Disabilities Education Act (IDEA) requires states to develop guidelines for the administration of alternate assessments for pupils with disabilities who cannot participate in state- and LEA-wide assessment programs.

Interpretation by ED of the Expanded Standard and Assessment Requirements of the No Child Left Behind Act

Title I-A Standard and Assessment Requirements. On July 5, 2002, ED published regulations on the Title I-A assessment requirements newly adopted under the NCLB.31 Under the provisions of ESEA Title I, Part I, ED was required to establish a "negotiated rulemaking" procedure, as authorized under the Negotiated Rulemaking Act of 1990, in developing regulations regarding the Title I-A standards and assessments requirements. Under negotiated rulemaking, ED solicits advice from "representatives of Federal, State, and local administrators, parents, teachers, paraprofessionals, and members of local school boards or other organizations involved with the implementation and operation of" Title I-A programs (Section 1901(b)(1)), after which an initial draft of proposed regulations is prepared.
ED selects representatives of these organizations to participate in a negotiated rulemaking process, to include persons "from all geographic regions of the United States, in such numbers as will provide an equitable balance between representatives of parents and students and representatives of educators and education officials" (Section 1901(b)(3)(B)). The selected representatives are to discuss the Department's draft of proposed regulations, and make any changes to this, consistent with the authorizing statute, on which they can reach consensus. The NCLB provides that "published proposed regulations shall conform to agreements that result from negotiated rulemaking" unless "the Secretary reopens the negotiated rulemaking process or provides a written explanation to the participants involved in the process explaining why the Secretary decided to depart from, and not adhere to, such agreements" (ESEA Title I, Section 1902(a)). Thus, ED is encouraged, but not required, to follow the recommendations of the negotiated rulemaking panel, and the process may be viewed primarily as an additional mechanism, beyond publication for comments in the Federal Register, of obtaining input on proposed regulations from concerned organizations.32

Significant features of the Department's final regulations, developed through the negotiated rulemaking process33 and published in the Federal Register on July 5, 2002, are described below.

31 Federal Register, July 5, 2002, pp. 45038-45047. As is discussed below, proposed amendments to these regulations were published in the Federal Register on March 20, 2003.
32 ED's implementation of the negotiated rulemaking requirement was challenged in federal court. Four organizations (the Center on Law and Education, the National Coalition for the Homeless, the National Law Center on Homelessness, and Designs for Change) and an individual parent charged that parents and students were inadequately represented in the process, particularly in view of the language requiring an "equitable balance between representatives of parents and students and representatives of educators and education officials." The negotiated rulemaking panel included 17 persons; while only 2 of the 17 persons represented parents specifically, several of the others were parents in addition to representing other groups. On May 22, 2002, the United States District Court for the District of Columbia ruled in favor of the Department of Education and the case was dismissed. An analysis of the legal issues associated with this suit is beyond the scope of this report.
33 In the negotiated rulemaking process, which took place in mid-March 2002, the initial draft proposed regulations were changed in very few significant respects. The primary changes: (a) it was further clarified that the assessment requirements apply only to public schools and their pupils, not to private (or home) schools; (b) for purposes of disaggregated score reporting, "pupils with disabilities" would be only those identified under the IDEA (continued...)
In general, the regulations repeat statutory requirements, while clarifying the following points:
(a) content standards can cover multiple grades, but they must include grade-specific "content expectations," and achievement standards must be grade-specific;
(b) high school standards must cover what all high school students are expected to know and be able to do;
(c) assessments may include extended or essay response items or ask a pupil to analyze text or express opinions;
(d) assessments may include either CRTs or NRTs, although any NRTs used must be augmented to "measure accurately the depth and breadth of" the state's content standards, provide results expressed in terms of the state's achievement standards, and be "designed to provide a coherent system across grades and subjects";
(e) state assessment systems may include assessments which vary by LEA in some grades,34 and any LEA-selected assessments used to meet the Title I-A requirements must be "equivalent to one another and to state assessments, where they exist, in their content coverage, difficulty, and quality," "have comparable validity and reliability," provide "consistent determinations of the annual progress of schools and LEAs within the state," and produce results which are sufficiently comparable that they can be aggregated;
(f) LEP, migrant, and homeless pupils are to be included in the assessment system at all times;
(g) states are to determine the minimum number of students from specific demographic groups to include in public reports or accountability calculations, to maintain statistical reliability and protect privacy;
(h) the requirement for dissemination of "itemized score analyses" does not require the release of individual test items;
(i) states must provide evidence, from test publishers or other "relevant sources," that their assessment systems are of adequate technical quality to meet each purpose required under Title I-A, and this information can be made available by ED to the public, consistent with applicable federal laws on disclosure of information;
(j) the assessment requirements apply only to public schools and their pupils, not to private (or home) schools, although the achievement of private school pupils who participate in Title I-A must be assessed in some manner;
(k) while states must develop achievement (as well as content) standards in science by 2005-2006, they need not develop specific cut scores for the achievement levels until 2007-2008, when the assessments must be implemented; and
(l) for purposes of disaggregated score reporting, "pupils with disabilities" are only those identified under the IDEA,35 although all pupils with disabilities, whether identified under the IDEA or Section 504 of the Rehabilitation Act, are to be included in assessments and provided with appropriate accommodations.

33 (...continued) (this would exclude pupils identified only under Section 504 of the Rehabilitation Act); and (c) the criteria to be met by varying local assessments was changed from "equivalent content, rigor, and quality" and "concurrent validity" to "equivalent to one another in their content coverage, difficulty, and quality," and "comparable validity and reliability." These changes constituted essentially fine-tuning of certain points of clarification in the draft proposed regulations.
34 In states that lack authority to require the use of the same assessments statewide (only), the assessment system may consist entirely of locally selected assessments.

Evolution of ED Policy Regarding Participation Rates Plus Treatment of Limited English Proficient Pupils and Certain Pupils With Disabilities in Assessments and AYP Determinations.
ED published supplementary "non-regulatory draft guidance" on the standard and assessment requirements, as well as those related to NAEP participation, on March 10, 2003.36 This document was intended to provide guidance consistent with that in the regulations discussed above, but it is more detailed. This guidance specifically provided that states were to include in their ESEA consolidated application/plan academic content standards in reading/language arts and mathematics for each of grades 3-8, as well as a detailed timeline for meeting subsequent deadlines for the development and implementation of assessments in these subjects and grades, plus standards and assessments at three grade levels in science, by May 1, 2003. Assessment Participation Rates. More recently, ED officials have published regulations and other policy guidance on participation rates plus the treatment of limited English proficient pupils and certain pupils with disabilities in assessments and the calculation of AYP for schools and LEAs, in an effort to provide additional flexibility and reduce the number of schools and LEAs identified as failing to make AYP. On March 29, 2004, ED announced that schools could meet the requirement that 95% or more of pupils (all pupils as well as pupils in each designated demographic group) participate in assessments (in order for the school or LEA to make AYP) on the basis of average participation rates for the last two or three years, rather than having to post a 95% or higher participation rate each year. In other words, if a particular demographic group of pupils in a school has a 93% test participation rate in the most recent year, but had a 97% rate the preceding year, the 95% participation rate requirement would be met. In addition, the new guidance would allow schools to exclude pupils who fail to participate in assessments due to a "significant medical emergency" from the participation rate calculations. 
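The averaging rule in the March 29, 2004 guidance, as summarized above, can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, the multi-year average is assumed to be a simple mean of annual rates (the text does not say whether rates are pooled or averaged), and pupils excused for a significant medical emergency are assumed to be removed from the enrollment count used as the denominator.

```python
def participation_rate_pct(tested, enrolled, medical_emergencies=0):
    """Participation rate in percent, with pupils who missed the test
    because of a significant medical emergency left out of the
    calculation (an assumed reading of the guidance)."""
    denominator = enrolled - medical_emergencies
    if denominator <= 0:
        raise ValueError("no pupils remain in the calculation")
    return 100.0 * tested / denominator


def meets_participation_requirement(annual_rates_pct, threshold_pct=95.0):
    """annual_rates_pct lists rates in percent, most recent year first.

    The requirement is treated as met if the most recent rate, or the
    simple average of the last two or three years, reaches the threshold.
    """
    if not annual_rates_pct:
        raise ValueError("at least one annual rate is required")
    for years in (1, 2, 3):
        if len(annual_rates_pct) >= years:
            window = annual_rates_pct[:years]
            if sum(window) / years >= threshold_pct:
                return True
    return False


# The example from the text: 93% in the most recent year and 97% in the
# preceding year average to 95%, so the requirement would be met.
example_met = meets_participation_requirement([93.0, 97.0])
```

The same check would be applied separately to all pupils and to each designated demographic group, since the 95% requirement applies to each of them.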
The new guidance further emphasizes the authority for states to allow pupils who miss a primary assessment date to take make-up tests, and to determine the minimum size for demographic groups of pupils to be considered in making AYP determinations (including those related to participation rates). According to ED, in some states, as many as 20% of the schools failing to make AYP did so on the basis of assessment participation rates alone. It is not known how many of these schools would meet the new, somewhat more relaxed standard.

35 This would exclude pupils identified only under Section 504 of the Rehabilitation Act.
36 See [http://www.ed.gov/topics/topicsTier2.jsp?&top=Policy&subtop=Policy+guidance&subtop2=Elementary+%26+secondary+education&type=T].

LEP Pupils. In a letter dated February 19, and proposed regulations published on June 24, 2004, ED officials announced two new policies with respect to LEP pupils.37 First, with respect to assessments, LEP pupils who have attended schools in the United States (other than Puerto Rico) for less than 12 months must participate in English language proficiency and mathematics tests. However, the participation of such pupils in reading tests (in English), as well as the inclusion of any of these pupils' test scores in AYP calculations, is to be optional (i.e., schools and LEAs need not consider the scores of first year LEP pupils in determining whether schools or LEAs meet AYP standards). Such pupils are still considered in determining whether the 95% test participation requirement has been met. Second, in AYP determinations, schools and LEAs may continue to include pupils in the LEP demographic category for up to two years after they have attained proficiency in English.
However, these formerly LEP pupils need not be included when determining whether a school or LEA's count of LEP pupils meets the state's minimum size threshold for inclusion of the group in AYP calculations, and scores of formerly LEP pupils may not be included in state, LEA, or school report cards. Both these options, if exercised, should increase average test scores for pupils categorized as being part of the LEP group, and reduce the extent to which schools or LEAs fail to meet AYP on the basis of LEP pupil groups.38 Finally, it was reported in August 2005 that the Secretary of Education had formed a working group to consider better ways to assess the achievement of LEP pupils for purposes of accountability under the NCLB.39 Pupils With Disabilities. Regulations addressing the application of the Title I-A standards and assessment requirements to certain pupils with disabilities were published in the Federal Register on December 9, 2003 (pp. 68698-68708). The purpose of these regulations is to clarify the application of standard, assessment, and accountability provisions to pupils "with the most significant cognitive disabilities." Under the regulations, states and LEAs may adopt alternate assessments based on alternate achievement standards -- aligned with the state's academic content standards and reflecting "professional judgment of the highest achievement standards possible" -- for a limited percentage of pupils with disabilities.40 The number of pupils whose proficient or higher scores on these alternate assessments may be considered as proficient or above for AYP purposes is limited to a maximum of 1.0% of all tested pupils (approximately 9% of all pupils with disabilities) at the state and LEA level (there is no limit for individual schools). SEAs may request from the U.S. Secretary of Education an exception allowing them to exceed the 1.0% cap statewide, and SEAs may grant such exceptions to LEAs within their state. 
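The 1.0% cap described above lends itself to a small worked calculation. The sketch below is illustrative only: the function name and sample figures are hypothetical, and rounding the cap down to a whole number of pupils is an assumption, since the text does not say how fractional pupils are handled. Scores above the cap are reported separately here; as the report goes on to note, absent an exception they are counted with scores below proficient in LEA and state AYP determinations.

```python
def apply_alternate_assessment_cap(alt_proficient, total_tested,
                                   cap_per_thousand=10):
    """Split proficient alternate-assessment scores at the cap.

    alt_proficient: pupils scoring proficient or higher on alternate
        assessments based on alternate achievement standards.
    total_tested: all pupils tested in the LEA or state.
    cap_per_thousand: 10 per 1,000 corresponds to the 1.0% limit.
    Returns (counted_as_proficient, excess_over_cap).
    """
    if alt_proficient < 0 or total_tested < 0:
        raise ValueError("counts must be non-negative")
    # Integer arithmetic avoids floating-point surprises; rounding the
    # cap down is an assumption made for this sketch.
    cap = total_tested * cap_per_thousand // 1000
    counted = min(alt_proficient, cap)
    excess = alt_proficient - counted
    return counted, excess


# Hypothetical LEA: 10,000 pupils tested, 150 proficient alternate scores.
counted, excess = apply_alternate_assessment_cap(150, 10_000)
```

In this invented example the cap is 100 pupils, so 100 scores count as proficient and 50 fall above the cap. The cap is applied at the LEA and state levels only, so no such limit would be computed for an individual school.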
According to ED 37 See 69 Federal Register, pp. 35462-35465, June 24, 2004; and [http://www.ed.gov/nclb/ accountability/schools/factsheet-english.html]. 38 A bill introduced in the 108th Congress, H.R. 3049, would have authorized the exclusion of scores of LEP pupils who have resided in the United States for less than three years, and would allow formerly LEP pupils to be included in that group for AYP calculation purposes indefinitely. 39 "Task Force to Gauge Progress of English Language Learners," Education Daily, August 10, 2005, p. 1. 40 This limitation does not apply to the administration of alternate assessments based on the same standards applicable to all students, for other pupils with (non-cognitive or less severe cognitive) disabilities. CRS-20 staff, three states in 2003-2004 (Montana, Ohio, and Virginia), and four states in 2004-2005 (the preceding three states plus South Dakota), received waivers to go marginally above the 1.0% limit statewide. In the absence of a waiver, the number of pupils scoring at the proficient or higher level on alternate assessments, based on alternate achievement standards, in excess of the 1.0% limit is to be added to those scoring below proficient in LEA or state level AYP determinations. ED policy affecting an additional group of pupils with disabilities was announced initially in April 2005, with final regulations based on it published in the Federal Register on April 9, 2007. The new policy is divided into short-term and long-term phases. It is focused on pupils with disabilities whose ability to perform academically is assumed to be greater than that of the pupils with "the most significant cognitive disabilities" discussed in the above paragraph, and who are capable of achieving high standards, but may not reach grade level within the same time period as their peers. In ED's terminology, these pupils would be assessed using alternate assessments based on modified achievement standards. 
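The treatment of scores above the 1.0% cap, described above for alternate assessments based on alternate achievement standards, can be illustrated with a short sketch. The function name, its parameters, and the LEA figures are hypothetical, and rounding the cap down to a whole number of pupils is an assumption; the regulations as summarized here do not state a rounding rule.

```python
def apply_proficiency_cap(tested, alt_proficient, cap_per_thousand=10):
    """Split proficient alternate-assessment scores into counted and excess.

    tested: all pupils tested in the state or LEA
    alt_proficient: pupils scoring proficient or higher on alternate
        assessments based on alternate achievement standards
    cap_per_thousand: cap in pupils per 1,000 tested (10 = 1.0%)
    """
    # Assumption: the cap is rounded down to a whole number of pupils.
    cap = tested * cap_per_thousand // 1000
    counted = min(alt_proficient, cap)
    # Scores above the cap are added to the below-proficient count for AYP.
    excess = alt_proficient - counted
    return counted, excess


# Hypothetical LEA: 20,000 pupils tested, 260 proficient alternate scores.
counted, excess = apply_proficiency_cap(20000, 260)
print(counted, excess)
```

In this hypothetical LEA the cap works out to 200 pupils, so 200 of the 260 scores would count as proficient and the remaining 60 would be treated as below proficient. The cap applies only at the LEA and state levels, not to individual schools.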
The short-term policy may apply, with the approval of the Secretary, to states until they develop and administer alternate assessments under the long-term policy (described below).41 Under this short-term policy, in eligible states that have not yet adopted modified achievement standards, schools may add to their proficient pupil group a number of pupils with disabilities equal to 2.0% of all pupils assessed (in effect, deeming the scores of all of these pupils to be at the proficient level).42 This policy would be applicable only to schools and LEAs that would otherwise fail to meet AYP standards due solely to their pupils with disabilities group. According to ED staff, as of the date of this report, 28 states are currently exercising this flexibility. Alternatively, in eligible states that have adopted modified achievement standards (currently six states), schools and LEAs may count proficient scores for pupils with disabilities on these assessments, subject to a 2.0% (of all assessed pupils) cap at the LEA and state levels.

41 Under current regulations, the short-term policy cannot be extended beyond the 2008-2009 school year.
42 This would be calculated based on statewide demographic data, with the resulting percentage applied to each affected school and LEA in the state. In making the AYP determination using the adjusted data, no further use may be made of confidence intervals or other statistical techniques. (The actual, not just the adjusted, percentage of pupils who are proficient must also be reported to parents and the public.)

The long-term policy is embodied in final regulations published in the Federal Register on April 9, 2007. These regulations affect standards, assessments, and AYP for a group of pupils with disabilities who are unlikely to achieve grade level proficiency within the current school year, but who are not among those pupils with the most significant cognitive disabilities (whose situation was addressed by an earlier set of regulations, discussed above). For this second group of pupils with disabilities, states would be authorized to develop "modified academic achievement standards" and alternate assessments linked to these. The modified achievement standards must be aligned with grade-level content standards, but may reflect reduced breadth or depth of grade-level content in comparison to the achievement standards applicable to the majority of pupils. The standards must provide access to grade-level curriculum, and not preclude affected pupils from earning a regular high school diploma.

As with the previous regulations regarding pupils with the most significant cognitive disabilities, there would be no direct limit on the number of pupils who take alternate assessments based on modified achievement standards. However, in AYP determinations, pupil scores of proficient or advanced on alternate assessments based on modified achievement standards may be counted only as long as they do not exceed a number equal to 2.0% of all pupils tested at the state or LEA level (i.e., an estimated 20% of pupils with disabilities); such scores in excess of the limit would be considered "non-proficient." As with the 1.0% cap for pupils with the most significant cognitive disabilities, this 2.0% cap does not apply to individual schools. In general, LEAs or states could exceed the 2.0% cap only if they did not reach the 1.0% limit with respect to pupils with the most significant cognitive disabilities. Thus, in general, scores of proficient or above on alternate assessments based on alternate and modified achievement standards may not exceed a total of 3.0% of all pupils tested at a state or LEA level.43 In particular, states are no longer allowed to request a waiver of the 1.0% cap regarding pupils with the most significant cognitive disabilities.

The April 9, 2007, regulations also include provisions that are widely applicable to AYP determinations.
First, states are no longer allowed to use varying minimum group sizes ("n") for different demographic groups of pupils. This prohibits the previously common practice of setting higher "n" sizes for pupils with disabilities or LEP pupils than for other pupil groups. Second, states and LEAs may use the highest score for pupils who take state assessments more than once. Finally, as with LEP pupils, states and LEAs may include the test scores of former pupils with disabilities in the disability subgroup for up to two years after such pupils have exited special education.44

43 The 3.0% limit might be exceeded for LEAs, but only if -- and to the extent that -- the SEA waives the 1.0% cap applicable to scores on alternate assessments based on alternate achievement standards.
44 In such cases, the former pupils with disabilities would not have to be counted in determining whether the minimum group size was met for the disability subgroup.

Thus, eligible states and LEAs will be allowed to count as "proficient or above" in AYP determinations the proficient or higher scores of up to 1.0% of all tested pupils on "alternate assessments based on alternate achievement standards," and of up to an additional 2.0% of all tested pupils on "alternate assessments based on modified achievement standards." For both groups, there is no limit for individual schools on the percentage of pupils in either of these categories, and there is no limit on the number or percentage of pupils to whom either type of alternate assessment may be administered.

Regulations Published in October 2008 on Title I-A Assessments and Accountability. Several new final regulations affecting the Title I-A assessment, AYP, and accountability policies were published in the Federal Register on October 29, 2008 (pages 64435-64513). Many of the regulations deal with policy areas other than assessments and related accountability topics.
Many of the regulations clarify previous regulations or codify as regulations policies that had previously been established through less formal mechanisms (such as policy guidance or peer reviewer guidance). The regulations relevant to assessments are briefly described below.

The October 2008 regulations clarify that assessments required under Title I-A may include multiple formats as well as multiple assessments within each subject area (reading, mathematics, and science). This does not include the concept of "multiple measures," as this term has been used by many to refer to proposals to expand NCLB through inclusion of a variety of indicators other than standards-based assessments in reading, mathematics, and science. Also, states are required to include results from the most recent National Assessment of Educational Progress (NAEP) assessments on their state and LEA performance report cards. Further, ED policies regarding growth models of AYP are codified in regulations (previously they were published only in policy guidance and peer reviewer guidance documents).

States must provide a more extensive rationale than previously required for their selection of minimum group sizes, use of confidence intervals, and related aspects of their AYP policies. Although no specific limits are placed on these parameters, states must explain in their Accountability Workbooks how their policies provide statistically reliable information while minimizing the exclusion of designated pupil groups in AYP determinations, especially at the school level. States must also report on the number of pupils in designated groups that are excluded from separate consideration in AYP determinations due to minimum group size policies.
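The reporting requirement for pupils excluded by minimum group size policies can be illustrated with a small sketch. The school names, group counts, and the minimum group size of 30 are invented for illustration, since the regulations as described here set no specific value for "n". Treating a group as excluded when its count falls strictly below the minimum is also an assumption.

```python
def pupils_excluded_by_min_n(group_counts, min_n):
    """Sum, per demographic group, the pupils in school-level groups below min_n.

    group_counts maps (school, group) to the number of pupils in that group.
    """
    excluded = {}
    for (school, group), count in group_counts.items():
        # Assumption: a group is considered separately only if count >= min_n.
        if 0 < count < min_n:
            excluded[group] = excluded.get(group, 0) + count
    return excluded


# Hypothetical counts for two schools.
counts = {
    ("School A", "LEP"): 12,
    ("School A", "pupils with disabilities"): 45,
    ("School B", "LEP"): 28,
    ("School B", "pupils with disabilities"): 9,
}
excluded = pupils_excluded_by_min_n(counts, min_n=30)
print(excluded)
```

With these invented figures, all 40 LEP pupils fall in groups too small to be considered separately at the school level, while only 9 of the 54 pupils with disabilities do. This kind of exclusion is what the expanded rationale and reporting requirements are meant to make visible.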
In addition, the regulations codify provisions for the National Technical Advisory Council that was established in August 2008 to advise the Secretary on a variety of technical aspects of state standards, assessments, AYP, and accountability policies. Each state is required to submit its Accountability Workbook, modified in accordance with the regulations, to ED for a new round of technical assistance and peer review. Workbooks must be submitted in time to implement any needed changes before making AYP determinations based on assessment results for the 2009-2010 school year.

ED Review to Determine Whether States Meet 2005-2006 Assessment Requirements. Peer reviews are being conducted of each state's assessment program to determine whether it meets the NCLB requirements to test pupils in each of grades 3-8 in reading and mathematics, and to adopt content and achievement standards in science. This round of review includes content and achievement standards (but not "cut scores") in science, in addition to the reading and mathematics assessments in each of grades 3-8. A letter sent to chief state school officers in April 2006 by the Assistant Secretary for Elementary and Secondary Education45 describes the current categories of results from the state reviews. These categories, and the number of states in each category as of the publication date of this report, include the following:

45 See [http://www.ed.gov/admins/lead/account/saapr3.pdf].

• Full Approval. Meets all statutory and regulatory requirements (31 states: Alabama, Alaska, Arkansas, Arizona, Delaware, Florida, Georgia, Idaho, Iowa, Kansas, Kentucky, Maine, Maryland, Massachusetts, Michigan, Minnesota, Missouri, Montana, New Mexico, New York, North Dakota, Ohio, Oklahoma, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Virginia, Washington, and West Virginia).
• Full Approval with Recommendations. Meets all statutory and regulatory requirements, but ED makes selected recommendations for improvement (4 states: Indiana, New York, North Carolina, and Utah).
• Approval Expected. "Evidence to date" suggests that the state's assessment system is fully compliant with the statutory and regulatory requirements, but some elements of the system were not complete as of July 1, 2006. The state must provide evidence of compliance with remaining requirements before administering its assessments for the 2006-2007 school year (2 states: Connecticut and Illinois, plus the District of Columbia).
• Approval Pending. A limited number (generally one to three) of fundamental components of the state assessment system fail to meet the statutory or regulatory requirements (13 states: all of those not listed in another category, plus Puerto Rico, which has entered into a Compliance Agreement with ED).

Peer reviews are continuing for the states whose assessment systems have not yet been fully approved. States in the last two categories above (Approval Pending and Not Approved) face the possibility of loss of Title I-A administrative funds (25% in the case of the two "not approved" states, 10% or 15% in the case of "approval pending" states), plus the additional sanctions of limitations on approval of flexibility requests, and heightened oversight by ED. According to ED, withheld funds (from the SEA) would be distributed to LEAs in the state. In addition, states that persistently and thoroughly fail to meet the standard and assessment requirements over an extended period of time potentially may be subject to elimination of their Title I-A grants altogether, since they would be out of compliance with a basic program requirement.46

46 Thus far, the sanction of withholding 25% of state administration funds for failure to meet the 1994 assessment requirements has been applied at least twice, to Georgia in 2003 and the District of Columbia in 2005, for failure to administer assessments linked to state content standards.

Implementation of the NAEP Requirements. In the period since enactment of the NCLB, a number of steps have been taken toward implementation of the new requirements for state participation in NAEP. First, the schedule for test administration has been revised to provide for administration of state NAEP tests in 4th and 8th grade reading and mathematics every two years, beginning with the 2002-2003 school year (spring 2003). Initial NAEP 4th and 8th grade reading and mathematics results for all states were released in November 2003. Subsequent rounds of NAEP tests were administered in all states in 2005 and 2007. Further, as is discussed in a later section of this report, the NAGB has published a report, "Using the National Assessment of Educational Progress to Confirm State Test Results," which examines issues related to the possible use of state NAEP results to "confirm" trends in state assessment results.

Several changes to NAEP policies and practices have been implemented that are supportive of, or were adopted primarily in response to, the expanded role for NAEP under the NCLB.47 In recognition of the increased emphasis on measurement of performance gaps among different demographic groups of pupils in the NCLB, more questions are being added at the upper and lower ends of the difficulty range, so that achievement gaps among pupil groups can be more reliably measured. In addition, studies are being conducted of possible ways to adjust sampling strategy in order to assure adequate numbers of pupils in the various demographic groups referenced in the NCLB.
At the same time, a number of administrative adjustments are being implemented that are intended to reduce required pupil sample sizes in the aggregate (e.g., the main NAEP state and national pupil samples will be combined for the first time), although samples of pupils will likely be increased in small and/or sparsely populated states in order to enhance the precision of results. Efforts are being made to minimize time demands, with a goal of reporting results of reading and mathematics assessments within six months of test administration.

Special issues arise with respect to Puerto Rico, which is treated as a state under ESEA Title I-A but did not participate in state NAEP tests prior to the enactment of the NCLB. Questions have been raised about the comparability of tests administered in different languages, especially in reading. NAEP tests in mathematics were administered to 4th- and 8th-grade pupils in Puerto Rico in 2003 and 2005, and results from both test administrations have recently been released.48

Finally, state NAEP tests are now administered by contractors, rather than (as in the past) local teachers; there is a full-time NAEP coordinator in every state, and a State Service Center has been established to support these coordinators; and NAGB has established procedures for limited public access to NAEP test items, and for submission, review, and resolution of complaints about NAEP tests by parents and other members of the public.

47 See NAGB Adopts Policies to Implement the No Child Left Behind Act of 2001 at [http://www.nagb.org/], plus [http://nces.ed.gov/nationsreportcard/about/current.asp].
48 See [http://nces.ed.gov/nationsreportcard/puertorico/], visited on April 16, 2007.

Bush Administration Reauthorization Proposals

The Bush Administration's Reauthorization Blueprint contains two proposals regarding the ESEA Title I-A assessment provisions.
First, participating states would be required to develop content and performance standards in English and math covering 2 additional years of high school by 2010-2011, and assessments linked to these standards by 2012-2013. The assessments would include a pair of 11th grade assessments of college readiness in reading and math. However, states would be required only to report the results of these assessments, not to use them for adequate yearly progress determinations. In addition, states receiving Title I-A grants would be required to include NAEP results, along with results on state assessments, on state report cards, to facilitate cross-state comparisons of achievement levels. Finally, the Administration has requested an increased FY2008 appropriation of $116.6 million for NAEP, in order to support expansion of biennial state-level NAEP assessments in reading and math to the 12th grade in 2009.

Issues Regarding the ESEA Title I-A Pupil Assessment Requirements

What Types of Assessments Meet the Expanded Assessment Requirements?

As described above, the NCLB includes explicit reference to a number of criteria that state assessments must meet in order to comply with the ESEA Title I-A requirements. However, the statute does not appear to directly or explicitly address two major issues with respect to the assessments: (a) whether qualifying state assessment systems must include only CRTs or whether they may include a mix of CRTs and NRTs, as long as the latter are modified to provide the required linkage to state content and achievement standards; and (b) whether qualifying state assessment systems must include only assessments that are the same statewide (except in states that lack authority to require statewide assessments) or whether they may include a mixture of statewide and locally varying assessments, as long as the latter are deemed to be "equivalent" and adequately linked to state content and achievement standards.
The statute states that assessments must "be the same academic assessments used to measure the achievement of all children" (Section 1111(b)(3)(C)(i)), but the implications of this provision are ambiguous in cases where a state has no assessment to measure the achievement of all children in certain grades.

Arguably, criterion-referenced assessments which are administered to all public school pupils statewide in the relevant grades are most fully consistent with the requirements which are explicitly stated in Title I-A. Only CRTs are designed comprehensively and "from the ground up" to measure pupil achievement with respect to specific content and academic achievement standards. While certain NRTs may be somewhat related to state standards in their generic form, with substantial overlap in test items with CRTs, and more closely related if modified specifically for this purpose -- as would be required under the regulations -- they are nevertheless initially designed primarily for the purpose of ranking and sorting pupils, not for the purpose of determining whether pupils meet state-determined achievement levels. In fact, it is not yet clear whether modified versions of assessments designed initially as NRTs can indeed meet the Title I-A requirements for linkage with state content and performance standards; some states, such as California, have attempted to meet the 1994 assessment requirements through use of modified NRTs, but no such assessments have yet been fully approved by ED.49

Similarly, assessments that are the same statewide would seem to most fully meet the purposes of Title I-A, especially with respect to the use of assessment results to determine whether schools or LEAs meet state standards of adequate yearly progress (AYP). The best way to assure that assessments of the extent to which pupils meet state achievement standards are equivalent and consistent statewide is to use the same assessments throughout the state.
This is especially important in view of the use of assessment results to determine whether schools or LEAs meet AYP standards, and the need to aggregate local results to determine whether states overall meet such requirements. Establishing equivalence among varying local tests might be possible, but is likely to be very difficult. According to a National Research Council report, "Under limited conditions it may be possible to calculate a linkage between two tests, but multiple factors affect the validity of inferences that may be drawn from the linked scores. These factors include the context, format, and margin of error of the tests; the intended and actual uses of the tests; and the consequences attached to the results of the tests."50 Further, there is no precedent for allowing states to meet Title I-A assessment requirements through use of different assessments in different LEAs -- except for the two states that may lack authority to establish statewide assessments, no states have been allowed to meet the 1994 standard and assessment requirements through the use of locally varying assessments.

49 However, ED has approved the assessment systems of three other states (Delaware, Indiana, Missouri) where state-specific tests were reportedly designed from the beginning to produce both criterion-referenced and norm-referenced results.
50 National Research Council, Uncommon Measures: Equivalence and Linkage Among Educational Tests (Washington: National Academies Press, 1998), p. 5-4.

Articulation between the tests used in different grades, and coherence of the overall assessment system, are also important concerns. If, for example, statewide tests are used in some grades but locally varying tests in other grades, or if CRTs are used in some grades and modified NRTs in others, this would likely create significant articulation difficulties, with variations from grade to grade in the proportion of pupils meeting state standards which result solely from the assessment instrument used, separate from any underlying differences in achievement levels.

Criteria established in the regulations published by ED for mixed state assessment systems are relatively demanding. Any NRTs used must be augmented to "measure accurately the depth and breadth of the State's academic content standards" (34 C.F.R. § 200.3(a)(2)(ii)(A)), and have results expressed in terms of the state's achievement standards; and any LEA-selected assessments used to meet the Title I-A requirements must be "equivalent to one another ... in their content coverage, difficulty and quality," have "comparable validity and reliability," and produce results which can be aggregated (34 C.F.R. § 200.3(c)(2)). If these criteria were to be strictly interpreted by ED in the assessment review process, it would likely be very difficult for mixed state assessment systems to be approved. However, opponents of proposals to allow states to meet the Title I-A requirements through mixed assessment systems are concerned that ED's review process may not be very strict, and that in some states, systems may be approved which are not well aligned with state standards or are not consistent among LEAs statewide, at least in certain grades, with the result that the standards for determining whether schools are meeting AYP standards would significantly vary among LEAs.

In contrast, proponents of a relatively high degree of state flexibility in meeting the Title I-A requirements through mixed assessment systems argue that this will minimize federal influence and intrusion, recognize state primacy in selecting assessment systems which meet their needs, minimize costs, and still meet the purposes of Title I-A because of the criteria which such systems would have to meet.
Proponents of allowing the use of modified NRTs to meet the requirements, at least for some grades, argue that the differences between NRTs and CRTs have more to do with how test results are analyzed and presented than with the test items themselves. The fact that several states currently use a mix of statewide CRTs in some grades and NRTs in others, or statewide tests of either type in some grades and locally varying tests in others, may indicate that such mixed assessment systems meet important educational needs and goals, as perceived by the states themselves.

How Strict Is ED's Review of State Assessment Systems?

As indicated by the relevant policy guidance and the published communications to states, peer reviewers and ED staff appear to have been conducting relatively rigorous and detailed reviews of the "evidence" submitted by states regarding whether their assessment systems meet the ESEA's requirements. The features which the Title I-A statute requires state assessment systems to exhibit are themselves numerous and relatively detailed, and any substantial implementation of them is likely to involve a somewhat exhaustive review.

The assessment reviews have focused especially on issues regarding testing, score reporting, and inclusion in accountability systems for LEP pupils and those with disabilities. While there are complex issues and considerations in these areas, they are not being raised solely, and possibly not even primarily, because of the Title I-A requirements.
For example, while there are general guidelines, applicable under Title VI of the Civil Rights Act of 1964 to any LEA receiving federal grants, regarding the use of an appropriate language and/or other accommodations for assessment of LEP pupils,51 and requirements under the IDEA for alternate assessments where necessary for pupils with disabilities, it is largely in the context of Title I-A that such requirements are having an impact because of the scrutiny currently being given to whether state assessments meet the Title I-A requirements.

51 See U.S. Department of Education, Office for Civil Rights, "Testing the Academic Educational Achievement Of Limited English Proficient Students," in The Use of Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers, a draft document dated July 6, 2000, available on the Internet at [http://www.ed.gov/legislation/FedRegister/other/2000-4/121500b.html].

Although it may be questioned whether ED should be reviewing state assessment systems in such detail, this scrutiny may be necessary to enforce Title I-A's statutory requirements, and might also be necessary to establish outcome accountability for all major groups of disadvantaged pupils. If, for example, significant numbers of LEP pupils or those with disabilities were excluded from state assessments, or were not provided with appropriate accommodations, then it would be impossible to determine whether they, along with the pupil population in general, are adequately meeting state performance goals.
Such inclusive assessment, combined with disaggregated score reporting, becomes increasingly important as focus shifts toward outcome measures to assure accountability for use of federal aid funds, and Title I-A programs are increasingly conducted in a schoolwide program format, in which services are not targeted on the individual pupils with lowest achievement in a participating school.52

52 There are two basic types of Title I-A programs. Schoolwide programs are authorized when 40% or more of the pupils in a school are from low-income families. In these programs, Title I-A funds may be used to improve the performance of all pupils in a school, and there is no requirement to focus services on only the most disadvantaged pupils. The other major type of Title I-A service model is the targeted assistance school program, under which services are generally limited to the lowest achieving pupils in the school.

Although detailed review by ED of state assessment systems may raise concerns about undue federal influence over this fundamental aspect of state and local public education systems, there are many statutory limitations on the review process. As noted earlier, the federal government is prohibited from mandating, directing, or controlling a state's, LEA's, or school's standards, assessments, or curriculum; states may not be required to submit their standards to ED; and no state plan may be disapproved by ED on the basis of specific content or achievement standards or assessment items or instruments. Nevertheless, the degree of federal influence over at least the broad parameters of state pupil assessment systems -- such as grades and subject areas tested, inclusion of special needs pupil groups, disaggregated reporting of results -- has increased under the NCLB.

The rigor of ED's assessment review process, and the flexibility of the assessment regulations, will also likely influence the extent to which states meet the expanded requirements on schedule. A Government Accountability Office report published in 2002 identified four additional factors which have influenced the pace of state compliance with Title I-A assessment requirements: "(1) the efforts of state leaders to make Title I compliance a priority; (2) coordination between staff of different agencies and levels of government; (3) obtaining buy-in from local administrators, educators, and parents; and (4) the availability of state level expertise."53

53 U.S. Government Accountability Office (GAO), Title I, Education Needs to Monitor States' Scoring of Assessments, GAO-02-393, April 2002, p. 13.

What Is the Cost of Developing and Implementing the Required Assessments, and to What Extent Will Federal Grants Be Available to Pay for Them?

The addition of requirements to conduct annual reading and mathematics assessments in at least four more grades than required previously, and to include standards and assessments at three grade levels in science, has required most states to significantly increase their expenditures for standard and test development and administration. As indicated earlier, it is very difficult, if not impossible, to specify all of these potential costs with precision.

The NCLB conference report directed the Government Accountability Office to conduct a study of the costs to each state of developing and administering the assessments required under Title I-A, both overall and for each of fiscal years 2002-2008. In 2003, GAO published a report (Title I: Characteristics of Tests Will Influence Expenses; Information Sharing May Help States Realize Efficiencies, GAO-03-389) that discussed issues related to potential costs of meeting the NCLB assessment requirements, and provided a range of alternative cost projections. GAO based its conclusions on a survey of assessment practices in all states, and a detailed examination of the costs of assessment development and administration in seven states.
According to the GAO, the level of state costs for assessment development and administration, as well as the relationship between those costs and funding provided by the NCLB's assessment development grants, depends primarily on the kinds of test questions states choose to utilize: multiple choice, open-ended (essay questions), or a combination of these. Tests with questions that elicit open-ended responses, which require people who can evaluate pupils' responses, are much more expensive to administer and score than multiple-choice questions that can be scored by computers. Over the period of FY2002-FY2008, in comparison to a total of the annual minimum assessment development grant appropriations of $2.7 billion, GAO estimated that it would cost states $1.9 billion to meet the NCLB assessment requirements using only multiple choice tests, $5.3 billion using a mixture of multiple choice and open-ended test items in all states, and $3.9 billion if states use the same mixture of multiple choice and open-ended test items as in the recent past. It should be noted that this study considered only the projected state-level costs of developing standard assessments on reading, mathematics, and science, and not costs for developing alternate assessments for pupils with disabilities, or English language proficiency assessments for LEP pupils, or possible increased costs for LEAs.54 54 Earlier, two organizations attempted during 2001-2002 to estimate costs for states of meeting assessment requirements similar to those of the NCLB. In 2001, the National Association of State Boards of Education (NASBE) estimated that the new grade 3-8 assessments (only) would cost states between $2.7 and $7.0 billion in the aggregate over a seven-year period [http://www.nasbe.org/Archives/cost.html]. On an annual basis, if costs were equally distributed across the seven years, this would represent a range of $386 million to $1 billion per year. 
In contrast, Accountability Works, a private consulting firm, estimated that the annual cost of meeting all of the new assessment requirements in the NCLB would range from approximately $312 million to $388 million for each of 2002-2003 through 2007-2008 [http://www.schoolreport.com/AWNCLBTestingCostsStudy.pdf].
The NCLB authorizes $400 million for FY2002, and "such sums as may be necessary" through FY2008, for state assessment development and administration grants. The administration, although not the development, of assessments newly required by the NCLB (grades three through eight reading and mathematics assessments, plus science assessments at three grade levels) may be delayed by one year for each year that the minimum amounts (e.g., $400 million for FY2007) are not appropriated. Thus far, the minimum amount has been appropriated for each of FY2002-FY2008.
The available information on direct, state-level expenditures for testing programs indicates that the "trigger" appropriation levels for state assessment grants are, in the aggregate, similar to these estimates.55 They are also either similar to, or substantially below, the test development and administration costs projected by GAO (above), depending on assumptions regarding types of test items used.56
55 The $400 million "trigger" amount (and actual appropriation) for FY2007 is 95% of the estimated aggregate expenditure level for FY2001 (discussed earlier in this report) of $422.8 million.
56 Estimates of the state-level costs of developing and administering assessments required by the NCLB are becoming available for a limited number of individual states. For example, a study published in September 2005 for Virginia [http://www.pen.k12.va.us/VDOE/nclb/coststudyreport-state.pdf] concluded that estimated assessment costs for this state ranged from $7.3-$8.2 million for each of the 2004-2005 through 2007-2008 school years. These amounts are somewhat less than the assessment grants to Virginia of $8.5-$8.8 million for FY2004-FY2005.
It is probable that the costs of meeting the expanded assessment requirements have varied widely from state to state, not only because of differences in state size, but also particularly because of substantial differences in the extent to which state-mandated tests in reading and mathematics were already being administered to all pupils in grades three through eight, or tests in science for pupils in selected grade ranges, and whether the tests met the Title I-A technical requirements of alignment with state standards, inclusion of all pupil groups, etc. Assessment development costs may also be reduced through cooperative arrangements among some states to jointly develop certain assessments, such as the New England Common Assessment Program involving New Hampshire, Rhode Island, and Vermont.
With respect to the distribution among the states of funds for test development and administration, the NCLB provides for allocation of a substantial share of these funds in equal amounts to each state, with the remainder allocated in proportion to children and youth aged five to 17 years. The allocation formula does not recognize the substantial variation in the extent to which states may already administer the required assessments, and therefore face varying levels of additional assessment program costs. The allocation of funds by formula to all states, regardless of the current status of their state assessment policies and programs, might recognize that all states face ongoing costs, and might possibly reward states which have already adopted relatively extensive assessment programs. At the same time, the formula does not target funds on the states with the greatest needs.
What Might Be the Impact of the Requirement for Annual Assessment of English Language Proficiency of LEP Pupils?
As noted earlier, the NCLB requires states to provide that their LEAs will annually assess the English language proficiency of their LEP pupils. This is separate from the requirements regarding treatment of LEP pupils in states' general assessment systems -- that is, the requirement that LEP pupils be included in such assessments, in which they are to be assessed in a valid and reliable manner and provided with "reasonable" accommodations, in the language and form most likely to yield accurate and reliable information on what they know and can do in academic content areas (in subjects other than English itself), with pupils who have attended schools in the United States (excluding Puerto Rico) for three or more consecutive school years to be assessed in English. In contrast to such requirements regarding treatment of LEP pupils in states' general assessment systems, the separate requirement for annual assessments of English language proficiency lacks specificity. There are no statutory details regarding technical characteristics of the tests -- except that the assessment must consider the pupils' oral, reading, and writing skills -- and (thus far) no policy guidance from ED. It is also somewhat ambiguous regarding whether states or LEAs are ultimately or primarily responsible for implementing this requirement. Depending on possible future regulations or policy guidance from ED, this new requirement may lead to relatively little change in current activities in LEAs. Although comprehensive and detailed surveys of such assessment practices are not currently available, there is substantial evidence that LEAs in general already assess the English language proficiency of LEP pupils for purposes of placement in instructional programs, determination of needed accommodations in general assessment programs, evaluation of programs targeted on LEP pupils, and movement of pupils from special programs to mainstream instruction. 
While a variety of assessment methods are used, including teacher observation and home language surveys, recent surveys indicate that a large majority of LEAs administer formal English language proficiency tests to their LEP (or potentially LEP) pupils.57 Policy guidance from ED's Office for Civil Rights indicates that such assessments should be undertaken especially, but not only, for purposes of assigning pupils to instructional programs targeted at LEP pupils, determining the timing of transition to regular or mainstream instruction for such pupils, and evaluating the effectiveness of special programs for LEP pupils, although this guidance is unspecific regarding the type of assessment LEAs should use.58 In addition, LEAs participating in the new English Language Acquisition program authorized under ESEA Title III, Part A, must report annually the number and percentage of participating pupils who attain English proficiency, as determined by a "valid and reliable assessment of English proficiency" (Section 3121(a)(3)). If ED's future policy guidance is consistent with the statute's lack of specificity regarding the new Title I-A requirement, there may be little required change in LEA activities as a result of the requirement.
57 See National Research Council, Improving Schooling for Language-Minority Children: A Research Agenda (Washington: National Academies Press, 1997), pp. 115-116.
58 See [http://www.ed.gov/about/offices/list/ocr/docs/laumemos.html].
What Might Be the Impact of Requiring State Participation in NAEP?
Possible Influence on State Standards and Assessments Arising from (Marginally) Increased Stakes. Two key characteristics of the NAEP program since its inception have been: (1) the content frameworks, upon which test items are based, have been independent of the content standards adopted by any state or national organization; and (2) the "stakes" associated with performance on the tests have been extremely low.
The NCLB's requirement for states to participate in NAEP in order to retain eligibility for ESEA Title I-A grants, with the implicit purpose of using the results to "confirm" performance trends on state-selected assessments, has potential implications for both of these characteristics of NAEP. Previously, the only "stakes" associated with state participation in NAEP have been the symbolic ones arising from public dissemination of NAEP results for states that chose to participate and which allowed their assessment results to be published. Public attention to these results, among persons other than selected policymakers, researchers, and policy analysts, seems to have been limited. The NAEP scores have had no impact on state finances or eligibility for federal programs or services. While state involvement with NAEP will change significantly under the NCLB, the stakes for states will remain relatively low. State results will be published as an implicit "confirmation" of test score trends on state assessments, but these NAEP scores will still have no direct impact on state eligibility for federal assistance. Provisions of the House- and Senate-passed versions of the NCLB for state bonuses and sanctions based in part on NAEP score trends were eliminated from the conference version. Under the NCLB as enacted, ED is required to establish a peer review process to evaluate whether states have met their statewide AYP goals; states which fail to meet them are to be listed in an annual report to Congress, and technical assistance is to be provided to states that fail to meet their goals for two consecutive years. State NAEP scores will likely be considered in this review process. However, there is no provision for state bonuses or sanctions under this procedure, only publicity and technical assistance. This increases the "stakes" associated with state NAEP performance, but only to a very modest degree. 
Nevertheless, even a small increase in the stakes associated with state performance on NAEP tests attracts attention to the possibility that NAEP frameworks and test items might influence state standards and assessments. To the extent that the required participation in NAEP increases attention to state performance on these tests, there might be a basis for concern that states would have an incentive to modify their curriculum content standards to more closely resemble the NAEP test frameworks. To counteract this potential problem, the NCLB prohibits the use of NAEP assessments by agents of the federal government to influence state or LEA instructional programs or assessments. However, subtle, indirect, and/or unintended forms of influence may be impossible to detect or prohibit. A "White Paper" policy statement released by NAGB on May 18, 2002, attempts to distinguish between "active attempts ... to persuade others to adopt NAEP policies, procedures, or content," which are prohibited, and "influence by good example," which (according to this document) is not.
Voluntary Participation by LEAs, Schools, and Pupils. Might a conflict arise between the requirement for NAEP participation by states participating in ESEA Title I-A and the provision that participation in NAEP tests is voluntary for all pupils, schools, and possibly LEAs? While participation by states, LEAs, schools, and pupils was voluntary under previous federal law and policy, states or LEAs were not prohibited from requiring participation by LEAs, schools, or pupils under their own laws or policies. However, as noted earlier (see the section of this report titled "NAEP Provisions in the No Child Left Behind Act"), there are conflicting statutory and regulatory provisions regarding participation in NAEP tests by LEAs and schools which may be selected for NAEP test administration.
Some have expressed concern that the new provisions regarding voluntary participation in NAEP might lead to two types of difficulties: (a) in a time of likely increased assessment activity for pupils nationwide, resistance to participation in NAEP might grow to an extent that it threatens the quality of the national sample of tested pupils and makes it difficult to maintain trend lines; and (b) more specifically, states might be caught between a requirement to participate in NAEP and an inability to recruit a sufficiently large sample of LEAs, schools, and pupils to participate in order to produce valid and reliable assessment results. In the past, some states have attempted to participate in NAEP but found themselves unable to induce sufficient numbers of LEAs or schools to do so.59
The primary counter to this concern is that the policies regarding voluntary participation in NAEP have changed only modestly. As far as federal policies are concerned, participation has already been voluntary at all levels. While states or LEAs previously could have mandated participation by LEAs, schools, or pupils, apparently they generally attempted to avoid doing so. Thus, in practice, little may have changed. There may nevertheless be some cause for concern, with the expansion of NAEP to states that have not previously chosen to participate.
Can NAEP Results Be Used to "Confirm" State Test Score Trends? An unstated, but clearly implicit, purpose of the state NAEP participation requirement is to "confirm" trends in pupil achievement, as measured by state-selected assessments, by comparing them with trends in NAEP results. Some have questioned whether it is possible or appropriate to use results on one assessment to "confirm" results on another assessment which may have been developed very differently, and what form this "confirmation" might take.
59 In 2000, 48 states (all except Alaska and South Dakota) initially stated their intention of participating in state NAEP, although ultimately only 41 did so. States which intended to participate, but did not do so, reportedly were unable to recruit sufficient numbers of LEAs and schools. See "Test Weary Schools Balk at NAEP," Education Week, February 16, 2000.
State assessments vary widely in terms of several important characteristics, such as the content and skills which they are designed to assess, their format, and modes of response. They are likely to continue to vary widely, especially as the final assessment regulations allow the use of both CRTs and modified NRTs, as well as locally varying assessments. As a result, some state assessments will be much more similar to NAEP in these important respects than others, and there will be consequent variation in the significance of similarities or differences when comparing NAEP and state assessment score trends for pupils. If, for example, a state test is closely aligned to state curriculum content standards which are substantially different from the content embodied in NAEP assessment frameworks, and if instruction is modified to better match the state standards, then it is possible that scores on the state assessment will rise while those on NAEP will be flat or even decline.
NAEP frameworks are designed with the intention that they substantially reflect state standards on average; according to a recent analysis, "States vary in the amount that their assessment domains [i.e., the content and skills covered by the assessments] overlap with NAEP. For some, there is almost complete overlap. For others, the overlap is modest."60 Other major differences between NAEP and state assessments include (a) the time of year when tests are administered; (b) relative placement of cut scores for achievement levels; (c) the (often high, but varying) stakes associated with state assessments versus the low stakes associated with NAEP; and (d) test format and modes of response.
As for the form which a comparison of NAEP and state test scores might take, two obvious candidates are average raw scores and the percentages of pupils at different achievement levels (basic, proficient, etc.). While these are key benchmarks, either alone, or even both together, might overlook important changes or differences in the distribution of pupil scores. For example, the scores of several pupils might improve but not by enough to raise them above the cut score for the next highest achievement level.
As noted above, the NAGB has published a report, "Using the National Assessment of Educational Progress to Confirm State Test Results," whose authors argue that state NAEP scores can be used as evidence to confirm the general trends in scores on individual state assessments, although such confirmation should not be viewed as, or take the form of, a strict statistical "validation" of state test results. They address the question of whether comparisons should be based on raw scores or percentages of pupils at various achievement levels by recommending a new method of comparison which considers changes and differences in the overall achievement score distribution, not focusing solely on overall averages or cut scores.61
60 Mark D. Reckase, "Using NAEP to Confirm State Test Results: Opportunities and Problems," in No Child Left Behind: What Will It Take? (Washington: Thomas B. Fordham Foundation, February 2002), p. 14.
61 See the report for details, available at [http://www.nagb.org].
What Are the Likely Benefits and Costs of the Expanded Title I-A Assessment Requirements?
This report concludes with a review of major potential benefits and costs of the expanded pupil assessment requirements of ESEA Title I-A. The primary benefit from annual administration of a consistent series of standards-based tests would be the provision of timely information on the performance of pupils, schools, and LEAs, throughout most of the elementary and middle school grades. While a majority of pupils have already been taking assessments in many of grades 3-8, these have been typically a mix of CRTs and NRTs, state-mandated and locally selected tests, with no provision that most of these are either equivalent statewide or aligned to state content and achievement standards. Even under the broadest interpretation of ED's draft policy guidance, which would allow states to use modified NRTs in addition to CRTs, and locally varying tests which are deemed to be equivalent, the resulting state assessment systems would be more coherent, consistent, and well articulated than the current systems in most states. The availability of such consistent, annual assessment results would be of value for both diagnostic and accountability purposes. The resulting assessment systems would also continuously emphasize the importance of meeting state standards as embodied by the assessments. These expanded requirements regarding pupil assessments -- and school, LEA, and state accountability based on performance on the assessments -- have been enacted in the context of a broader strategy, also initiated in the 1994 ESEA amendments and expanded by the NCLB, which involves increased state and local flexibility in the use of federal education assistance funds.62 Under this strategy, accountability for appropriate use of federal aid funds is to be established more on the basis of pupil performance outcomes, and less on prescribed procedures or targeting of resources, than in the past. 
Such a strategy implicitly relies heavily on high quality, current, detailed, and widely disseminated information on pupil achievement as a basis for outcome accountability policies and procedures. It is desirable that achievement data be as comparable and current as possible while not compromising the primacy of states and LEAs in setting K-12 education policy.
According to the ED publication, "Testing for Results, Helping Families, Schools and Communities Understand and Improve Student Achievement,"63 annual standards-based assessments "will empower parents, citizens, educators, administrators and policymakers with data ... in annual report cards on school performance and on statewide progress." Further: "The tests will give teachers and principals information about how each child is performing and help them to diagnose and meet the needs of each student. They will also give policymakers and leaders at the state and local levels critical information about which schools and school districts are succeeding and why, so this success may be expanded and any failures addressed.... A good evaluation system provides invaluable information that can inform instruction and curriculum, help diagnose achievement problems and inform decision making in the classroom, the school, the district and the home. Testing is about providing useful information and it can change the way schools operate."
62 These provisions are described in CRS Report RL31284, K-12 Education: Highlights of the No Child Left Behind Act of 2001 (P.L. 107-110), by Wayne C. Riddle.
63 See [http://www.ed.gov/nclb/accountability/ayp/testingforresults.html].
At the same time, the expanded Title I-A assessment requirements might lead to a variety of costs, or unintended consequences, in both financial and other forms. One such "cost" is expanded federal influence on state and local education policies.
Assuming that states will continue to implement them in order to maintain Title I-A eligibility, then assessment requirements attached to an aid program focused on disadvantaged pupils are broadly influencing policies regarding standards, assessments, and accountability affecting all pupils in the participating states. This represents a substantial increase in federal influence in the assessment and accountability aspects of K-12 education policy. In the majority of states that did not previously mandate standards-based assessments in each of grades 3-8, their policies may have resulted primarily from cost or time constraints, or the states may have determined that annual testing of this sort is not educationally appropriate, or at least that its benefits are not equal to the relevant costs. These costs may include not only the direct costs of test development, administration, scoring, reporting, etc., not all of which may be paid through federal assessment grants, but also an increased risk of "over-emphasis" on preparation for the tests, especially if the tests do not adequately assess the full range of knowledge and skills which schools are expected to impart. The authors of a recent study of the effects of high-stakes assessment policies in 18 states have posited an "Uncertainty Principle," which may be relevant to such concerns: "The more important that any quantitative social indicator becomes in social decision-making, the more likely it will be to distort and corrupt the social process it is intended to monitor."64 At the least, annual testing of pupils in grades 3-8 would increase the importance of having tests that are well designed and closely linked to state content and achievement standards which are truly challenging. Nevertheless, even within the specific realm of standards and assessments, federal influence remains limited in several important respects. 
With the exception of the limited role of state NAEP tests, the standards and assessments are selected entirely by the states. ED is not authorized by the NCLB to review the substance of any state standards, and no state plan may be disapproved by ED on the basis of specific content or achievement standards or test items or instruments. Ultimately, whether increased federal influence in certain respects, combined with less federal control over certain other aspects of state and local use of federal aid funds, is a "balanced tradeoff" is a subjective political judgment. The key analytical point is that the increase in federal influence is constrained, and is balanced by a decrease of federal influence in certain other respects.
64 Audrey L. Amrein and David C. Berliner, High Stakes Testing, Uncertainty, and Student Learning, published on the Internet at the Education Policy Analysis Archives, vol. 10, no. 18, at [http://epaa.asu.edu/epaa/v10n18/].
Glossary of Selected Terms Used in This Report
Criterion-Referenced Test (CRT): "Criterion-referenced" tests measure the extent to which pupils have mastered specified content (content standard) to a predetermined degree (achievement standard). A typical criterion-referenced test result is that a 4th grade pupil's achievement in mathematics is at the "proficient" level, which is above a "basic" level, but below an "advanced" level. Most state-developed assessments, such as the Connecticut Mastery Test, the North Carolina End-of-Grade Tests, or the Texas Assessment of Academic Skills, are criterion-referenced tests.
Domain (of a test): The content and skills upon which a test is based.
Item (of a test): A test question.
Norm-Referenced Test (NRT): The primary distinguishing characteristic of "norm-referenced" tests is that pupil performance is measured against that of other pupils, rather than against some fixed standard of performance. Norm-referenced test results are usually expressed in terms of population percentiles along a bell-shaped distribution of tested pupils. A typical norm-referenced test result is that a 4th grade pupil's achievement in mathematics is at the 55th percentile, meaning that her or his performance is better than that of 55% of a nationally representative sample of 4th grade pupils who have taken the test under the same conditions, but worse than that of the other 45% of tested pupils in the sample. Most of the widely administered, commercially published K-12 achievement tests, such as the Iowa Test of Basic Skills, TerraNova, or the Stanford series, are norm-referenced tests, at least in their standard forms.
Standardized Test: Any test for which the test items, as well as the conditions under which the test is administered, are constant. Thus, both CRTs and NRTs may be standardized tests.