[Federal Register: January 25, 2005 (Volume 70, Number 15)]
[Notices]
[Page 3585-3589]
From the Federal Register Online via GPO Access [wais.access.gpo.gov]
[DOCID:fr25ja05-87]
[[Page 3585]]
-----------------------------------------------------------------------
Part II
Department of Education
-----------------------------------------------------------------------
Scientifically Based Evaluation Methods; Notice
[[Page 3586]]
-----------------------------------------------------------------------
DEPARTMENT OF EDUCATION
RIN 1890-ZA00
Scientifically Based Evaluation Methods
AGENCY: Department of Education.
ACTION: Notice of final priority.
-----------------------------------------------------------------------
SUMMARY: The Secretary of Education announces a priority that may be
used for any appropriate programs in the Department of Education
(Department) in FY 2005 and in later years. We take this action to
focus Federal financial assistance on expanding the number of programs
and projects Department-wide that are evaluated under rigorous
scientifically based research methods in accordance with the Elementary
and Secondary Education Act of 1965 (ESEA), as reauthorized by the No
Child Left Behind Act of 2001 (NCLB). The definition of scientifically
based research in section 9101(37) of NCLB includes other research
designs in addition to the random assignment and quasi-experimental
designs that are the subject of this priority. However, the Secretary
considers random assignment and quasi-experimental designs to be the
most rigorous methods to address the question of project effectiveness.
While this action is of particular importance for programs authorized
by NCLB, it is also an important tool for other programs and, for this
reason, is being established for all Department programs. Establishing
the priority on a Department-wide basis will permit any office to use
the priority for a program for which it is appropriate.
EFFECTIVE DATE: This priority is effective February 24, 2005.
FOR FURTHER INFORMATION CONTACT: Margo K. Anderson, U.S. Department of
Education, 400 Maryland Avenue, SW., room 4W333, Washington, DC
20202-5910. Telephone: (202) 205-3010.
If you use a telecommunications device for the deaf (TDD), you may
call the Federal Relay Service (FRS) at 1-800-877-8339.
Individuals with disabilities may obtain this document in an
alternative format (e.g., Braille, large print, audiotape, or computer
diskette) on request to the contact person listed under FOR FURTHER
INFORMATION CONTACT.
SUPPLEMENTARY INFORMATION:
General
The ESEA as reauthorized by the NCLB uses the term scientifically
based research more than 100 times in the context of evaluating
programs to determine what works in education or ensuring that Federal
funds are used to support activities and services that work. This final
priority is intended to ensure that appropriate federally funded
projects are evaluated using scientifically based research.
Establishing this priority makes it possible for any office in the
Department to encourage or to require appropriate projects to use
scientifically based evaluation strategies to determine the
effectiveness of a project intervention.
We published a notice of proposed priority in the Federal Register
on November 4, 2003 (68 FR 62445). Except for a technical change to
correct an error in the language of the priority, one minor clarifying
change, and the addition of a definitions section, there are no
differences between the notice of proposed priority and this notice of
final priority. The definitions section provides the generally accepted
meaning for technical terms used throughout the document.
Analysis of Comments
In response to our invitation in the notice of proposed priority,
almost 300 parties submitted comments on the proposed priority.
Although we received substantive comments, we determined that the
comments did not warrant changes. However, we have reviewed the notice
since its publication and have made a change based on that review. An
analysis of the comments and changes is published as an appendix to
this notice.
Note: This notice does not solicit applications. In any year in
which we choose to use this priority, we invite applications for new
awards under the applicable program through a notice in the Federal
Register. When inviting applications we designate the priority as
absolute, competitive preference, or invitational. The effect of
each type of priority follows:
Absolute priority: Under an absolute priority we consider only
applications that meet the priority (34 CFR 75.105(c)(3)).
Competitive preference priority: Under a competitive preference
priority we give competitive preference to an application by either
(1) awarding additional points, depending on how well or the extent
to which the application meets the competitive preference priority
(34 CFR 75.105(c)(2)(i)); or (2) selecting an application that meets
the competitive priority over an application of comparable merit
that does not meet the priority (34 CFR 75.105(c)(2)(ii)).
When using the priority to give competitive preference to an
application, the Secretary will review applications using a two-stage
process. In the first stage, the application will be reviewed without
taking the priority into account. In the second stage of review, the
applications rated highest in stage one will be reviewed for
competitive preference.
Invitational priority: Under an invitational priority we are
particularly interested in applications that meet the invitational
priority. However, we do not give an application that meets the
invitational priority a competitive or absolute preference over other
applications (34 CFR 75.105(c)(1)).
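The two-stage review described above for competitive preference can be illustrated with a minimal sketch. It is not the Department's scoring procedure: the function name, the `top_n` cutoff for "rated highest," and the additive treatment of priority points are all assumptions made for illustration.

```python
def two_stage_review(base_scores, priority_points, top_n):
    """Illustrative two-stage review (hypothetical helper, not an official method).

    Stage one ranks applications on base score alone.
    Stage two adds priority points only to the top_n applications from stage one.
    """
    # Stage one: the priority is not taken into account.
    stage_one = sorted(base_scores, key=lambda a: (-base_scores[a], a))[:top_n]
    # Stage two: only the highest-rated applications are reviewed for preference.
    final = {a: base_scores[a] + priority_points.get(a, 0) for a in stage_one}
    return sorted(final, key=lambda a: (-final[a], a))


# Made-up scores: application C has a weak base score but many priority points.
base = {"A": 90, "B": 85, "C": 60}
points = {"B": 10, "C": 40}
ranking = two_stage_review(base, points, top_n=2)
```

In this sketch, application C never reaches stage two, so its priority points cannot lift it above applications of higher program quality.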
Priority
The Secretary establishes a priority for projects proposing an
evaluation plan that is based on rigorous scientifically based research
methods to assess the effectiveness of a particular intervention. The
Secretary intends that this priority will allow program participants
and the Department to determine whether the project produces meaningful
effects on student achievement or teacher performance.
Evaluation methods using an experimental design are best for
determining project effectiveness. Thus, when feasible, the project
must use an experimental design under which participants--e.g.,
students, teachers, classrooms, or schools--are randomly assigned to
participate in the project activities being evaluated or to a control
group that does not participate in the project activities being
evaluated.
If random assignment is not feasible, the project may use a quasi-
experimental design with carefully matched comparison conditions. This
alternative design attempts to approximate a randomly assigned control
group by matching participants--e.g., students, teachers, classrooms,
or schools--with non-participants having similar pre-program
characteristics.
In cases where random assignment is not possible and participation
in the intervention is determined by a specified cutting point on a
quantified continuum of scores, regression discontinuity designs may be
employed.
For projects that are focused on special populations in which
sufficient numbers of participants are not available to support random
assignment or matched comparison group designs, single-subject designs
such as multiple baseline or treatment-reversal or interrupted time
series that are capable of demonstrating causal relationships can be
employed.
Proposed evaluation strategies that use neither experimental
designs with random assignment nor quasi-experimental designs using a
matched comparison group nor regression discontinuity designs will not
be considered responsive to the priority
[[Page 3587]]
when sufficient numbers of participants are available to support these
designs. Evaluation strategies that involve too small a number of
participants to support group designs must be capable of demonstrating
the causal effects of an intervention or program on those participants.
The proposed evaluation plan must describe how the project
evaluator will collect--before the project intervention commences and
after it ends--valid and reliable data that measure the impact of
participation in the program or in the comparison group.
If the priority is used as a competitive preference priority,
points awarded under this priority will be determined by the quality of
the proposed evaluation method. In determining the quality of the
evaluation method, we will consider the extent to which the applicant
presents a feasible, credible plan that includes the following:
(1) The type of design to be used (that is, random assignment or
matched comparison). If matched comparison, include in the plan a
discussion of why random assignment is not feasible.
(2) Outcomes to be measured.
(3) A discussion of how the applicant plans to assign students,
teachers, classrooms, or schools to the project and control group or
match them for comparison with other students, teachers, classrooms, or
schools.
(4) A proposed evaluator, preferably independent, with the
necessary background and technical expertise to carry out the proposed
evaluation. An independent evaluator does not have any authority over
the project and is not involved in its implementation.
In general, depending on the implemented program or project, under
a competitive preference priority, random assignment evaluation methods
will receive more points than matched comparison evaluation methods.
Definitions
As used in this notice--
Scientifically based research (section 9101(37) NCLB):
(A) Means research that involves the application of rigorous,
systematic, and objective procedures to obtain reliable and valid
knowledge relevant to education activities and programs; and
(B) Includes research that--
(i) Employs systematic, empirical methods that draw on observation
or experiment;
(ii) Involves rigorous data analyses that are adequate to test the
stated hypotheses and justify the general conclusions drawn;
(iii) Relies on measurements or observational methods that provide
reliable and valid data across evaluators and observers, across
multiple measurements and observations, and across studies by the same
or different investigators;
(iv) Is evaluated using experimental or quasi-experimental designs
in which individuals, entities, programs, or activities are assigned to
different conditions and with appropriate controls to evaluate the
effects of the condition of interest, with a preference for random-
assignment experiments, or other designs to the extent that those
designs contain within-condition or across-condition controls;
(v) Ensures that experimental studies are presented in sufficient
detail and clarity to allow for replication or, at a minimum, offer the
opportunity to build systematically on their findings; and
(vi) Has been accepted by a peer-reviewed journal or approved by a
panel of independent experts through a comparably rigorous, objective,
and scientific review.
Random assignment or experimental design means random assignment of
students, teachers, classrooms, or schools to participate in a project
being evaluated (treatment group) or not participate in the project
(control group). The effect of the project is the difference in
outcomes between the treatment and control groups.
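As a minimal sketch of this definition, the following example randomly splits a set of units into treatment and control groups and computes the difference in mean outcomes. The function names, the equal-halves split, and the fixed seed are assumptions made for illustration; the notice does not prescribe an assignment procedure.

```python
import random
from statistics import mean


def randomly_assign(units, seed=0):
    """Shuffle the units and split them into treatment and control halves."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, treatment, control):
    """Difference in mean outcomes between the treatment and control groups."""
    return mean(outcomes[u] for u in treatment) - mean(outcomes[u] for u in control)


# Hypothetical units: eight classrooms identified by name.
classrooms = [f"class_{i}" for i in range(8)]
treatment, control = randomly_assign(classrooms, seed=42)
```

Once outcomes have been measured for every unit, `estimated_effect` returns the treatment-minus-control difference that the definition describes.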
Quasi-experimental designs include several designs that attempt to
approximate a random assignment design.
Carefully matched comparison groups design means a quasi-
experimental design in which project participants are matched with non-
participants based on key characteristics that are thought to be
related to the outcome.
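One simple way to picture matching is nearest-neighbor pairing on a single pre-program characteristic, such as a pretest score. The sketch below assumes one numeric characteristic and greedy matching without replacement; both are simplifications chosen for illustration, and an actual evaluation would typically match on several characteristics.

```python
def match_comparison_group(participants, non_participants):
    """Greedy nearest-neighbor matching without replacement (illustrative only).

    Both arguments map a unit id to one pre-program score.
    Returns a list of (participant_id, matched_non_participant_id) pairs.
    """
    available = dict(non_participants)
    pairs = []
    # Process participants in id order so the result is reproducible.
    for pid, score in sorted(participants.items()):
        if not available:
            break  # no non-participants left to match
        # Pick the closest remaining non-participant; break ties by id.
        best = min(available, key=lambda nid: (abs(available[nid] - score), nid))
        pairs.append((pid, best))
        del available[best]
    return pairs


# Made-up pretest scores for two participants and three non-participants.
pairs = match_comparison_group({"p1": 50, "p2": 70}, {"n1": 52, "n2": 69, "n3": 90})
```

Matching of this kind can only balance the characteristics that are measured, which is why such designs approximate rather than replicate random assignment.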
Regression discontinuity design means a quasi-experimental design
that closely approximates an experimental design. In a regression
discontinuity design, participants are assigned to a treatment or
control group based on a numerical rating or score of a variable
unrelated to the treatment such as the rating of an application for
funding. Eligible students, teachers, classrooms, or schools above a
certain score (``cut score'') are assigned to the treatment group and
those below the score are assigned to the control group. In the case of
the scores of applicants' proposals for funding, the ``cut score'' is
established at the point where the program funds available are
exhausted.
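A minimal sketch of the idea follows: units are assigned by cut score, a straight line is fitted to the outcomes on each side, and the gap between the two lines at the cut score serves as the effect estimate. Treating a score exactly at the cut as "treatment," using a linear fit, and all function names are assumptions made for illustration.

```python
from statistics import mean


def assign_by_cut_score(scores, cut_score):
    """Assumption: units at or above the cut score go to treatment, the rest to control."""
    treatment = [u for u, s in scores.items() if s >= cut_score]
    control = [u for u, s in scores.items() if s < cut_score]
    return treatment, control


def fit_line(xs, ys):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def discontinuity_at_cut(scores, outcomes, cut_score):
    """Gap between the treatment-side and control-side fitted lines at the cut score."""
    treatment, control = assign_by_cut_score(scores, cut_score)
    t_int, t_slope = fit_line([scores[u] for u in treatment],
                              [outcomes[u] for u in treatment])
    c_int, c_slope = fit_line([scores[u] for u in control],
                              [outcomes[u] for u in control])
    return (t_int + t_slope * cut_score) - (c_int + c_slope * cut_score)
```

Each side needs at least two units with distinct scores for the line fit to be defined.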
Single-subject design means a design that relies on the comparison
of treatment effects on a single subject or group of single subjects.
There is little confidence that findings based on this design would be
the same for other members of the population.
Treatment reversal design means a single-subject design in which a
pre-treatment or baseline outcome measurement is compared with a post-
treatment measure. Treatment would then be stopped for a period of
time, a second baseline measure of the outcome would be taken, followed
by a second application of the treatment or a different treatment. For
example, this design might be used to evaluate a behavior modification
program for disabled students with behavior disorders.
Multiple baseline design means a single-subject design that addresses
concerns about the effects of normal development, timing of the
treatment, and amount of the treatment with treatment-reversal designs
by using a varying time schedule for introduction of the treatment
and/or treatments of different lengths or intensity.
Interrupted time series design means a quasi-experimental design in
which the outcome of interest is measured multiple times before and
after the treatment for program participants only.
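One common way to analyze such a series is to fit a trend to the pre-treatment measurements, project that trend forward, and compare the projection with what was actually observed after the treatment. The sketch below does this with a straight-line trend and equally spaced measurements; those choices and the function name are assumptions made for illustration.

```python
from statistics import mean


def its_level_change(pre, post):
    """Average gap between post-treatment observations and the projected pre-treatment trend.

    pre and post are lists of outcome measurements taken at equally spaced times.
    """
    n = len(pre)
    times = list(range(n))
    t_bar, y_bar = mean(times), mean(pre)
    # Least squares slope and intercept of the pre-treatment trend.
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(times, pre))
             / sum((t - t_bar) ** 2 for t in times))
    intercept = y_bar - slope * t_bar
    # Project the pre-treatment trend onto the post-treatment time points.
    projected = [intercept + slope * (n + k) for k in range(len(post))]
    return mean(obs - proj for obs, proj in zip(post, projected))


# Made-up series: four measurements before the treatment and three after.
change = its_level_change([10, 11, 12, 13], [17, 18, 19])
```

Because there is no separate comparison group, the estimate depends entirely on the assumption that the pre-treatment trend would have continued unchanged.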
Executive Order 12866
This notice of final priority has been reviewed in accordance with
Executive Order 12866. Under the terms of the order, we have assessed
the potential costs and benefits of this regulatory action.
The potential costs associated with the notice of final priority
are those we have determined as necessary for administering applicable
programs effectively and efficiently.
In assessing the potential costs and benefits--both quantitative
and qualitative--of this notice of final priority, we have determined
that the benefits of the final priority justify the costs.
We have also determined that this regulatory action does not unduly
interfere with State, local, and tribal governments in the exercise of
their governmental functions.
Intergovernmental Review
Some of the programs affected by this final priority are subject to
Executive Order 12372 and the regulations in 34 CFR part 79. One of the
objectives of the Executive order is to foster an intergovernmental
partnership and a strengthened federalism. The Executive order relies
on processes developed by State and local governments for coordination
and review of proposed Federal financial assistance.
[[Page 3588]]
This document provides early notification of our specific plans and
actions for these programs.
Electronic Access to This Document
You may view this document, as well as all other Department of
Education documents published in the Federal Register, in text or Adobe
Portable Document Format (PDF) on the Internet at the following site:
http://www.ed.gov/news/fedregister.
To use PDF you must have Adobe Acrobat Reader, which is available
free at this site. If you have questions about using PDF, call the U.S.
Government Printing Office (GPO), toll free, at 1-888-293-6498; or in
the Washington, DC, area at (202) 512-1530.
Note: The official version of this document is the document
published in the Federal Register. Free Internet access to the
official edition of the Federal Register and the Code of Federal
Regulations is available on GPO Access at:
http://www.gpoaccess.gov/nara/index.html.
(Catalog of Federal Domestic Assistance Number does not apply.)
Program Authority: ESEA, as reauthorized by the No Child Left
Behind Act of 2001, Pub. L. 107-110, January 8, 2002.
Dated: January 17, 2005.
Rod Paige,
Secretary of Education.
Appendix--Analysis of Comments
Comment: Twenty-nine comments were received in support of the
priority for random assignment studies of education policies and
program interventions. Commenters noted that random assignment
evaluations have been essential to understanding what works, what
does not work, and what is harmful among interventions in many areas
of public policy--including employment and training, welfare
programs, health insurance, subsidies, pregnancy prevention,
criminal justice, and substance abuse.
Discussion: The Secretary agrees with this comment.
Change: None.
Comment: One hundred and eighty-three respondents commented that
random assignment is not the only method capable of generating
understandings of causality. They stated that the Secretary's
proposal would elevate experimental over quasi-experimental,
observational, single-subject, and other designs which are sometimes
more feasible and equally valid. However, 21 respondents commented
that the priority correctly identifies random assignment
experimental designs as the methodological standard for what
constitutes scientific evidence for determining whether an
intervention produces meaningful effects. The commenters pointed out
that attempts to draw conclusions about intervention effects based
on other methods have often led to misleading results. They stated
that the priority is consistent with widely recognized
methodological standards in the social and medical sciences.
Discussion: The Secretary agrees that a random assignment design
is not the only method capable of providing estimates of program
effectiveness; however, it is the most defensible method in that it
reliably produces an unbiased estimate of effectiveness. Conclusions
about causality based on other methods, including the quasi-
experimental designs included in this priority, have been shown to
be misleading compared with experimental evidence. This is largely
due to the difficulty in establishing equal treatment and comparison
groups on all important characteristics related to the outcome
variable with methods other than random assignment. The Secretary
agrees with the latter commenters that random assignment is the
standard for scientific evidence for determining project
effectiveness.
Change: None.
Comment: One hundred and seventy-three respondents commented
that random assignment methods examine a limited number of isolated
factors that are neither limited nor isolated in natural settings.
These commenters stated that the complex nature of causality renders
random assignment methods less capable of discovering causality than
designs sensitive to local culture and conditions. Four respondents
commented that random assignment methods estimate only the impact of
the treatment and that the response to the treatment may vary
according to contextual factors. These four respondents noted that
random assignment assures that the contextual factors affecting
outcomes are the same for the treatment and the control group and,
therefore, the impact of the treatment is unambiguous. They noted
further that it has not been demonstrated that evaluation methods
``sensitive'' to local culture and conditions can provide
unambiguous answers as to whether the treatment is the cause of the
observed outcome.
Discussion: The Secretary agrees with the latter comments. A
major strength of the random assignment design is that it yields
comparable treatment and control groups with respect to all
characteristics and conditions, both observable and unobservable.
When participants, e.g., students, teachers, classrooms, or schools,
are randomly assigned to the project or to a control group, the only
difference between the two groups is the impact of the treatment.
While quasi-experimental designs, including carefully matched
comparison groups, are also permitted under this priority, it is a
practical impossibility to match on numerous characteristics and
conditions, especially those that are unobservable or difficult to
measure. However, case studies that collect information on local
culture and conditions are an important complement to a random
assignment study by providing a deeper understanding of the
conditions that may influence the effectiveness of an intervention.
Change: None.
Comment: One hundred and eighty-six respondents commented that
random assignment should sometimes be ruled out for reasons of
ethics. For example, randomly assigning experimental subjects to
educationally inferior treatments, or denying control groups access
to important instructional opportunities, is not ethically
acceptable even when the results might be enlightening. Another 13
respondents commented that the priority recognizes that there are
cases in which random assignment is not ethical and, in such cases,
identifies quasi-experimental designs and single-subject designs as
alternatives that may be justified by the circumstances of
particular interventions.
Discussion: The Secretary agrees with both comments. There are
occasions when random assignment is not an acceptable or feasible
method of evaluation. The Department will address these issues in
deciding whether or not to apply this priority in specific program
competitions. Also, consistent with the American Psychological
Association ethics code and in accordance with 34 CFR part 97, the
Department has adopted the Common Rule for protection of human
subjects in research including Subpart D dealing with inclusion of
children in research. Grantees submit their plans for all research
involving human subjects to an Institutional Review Board. All
research involving human subjects must be conducted in accordance
with an approved research protocol. This includes obtaining informed
consent for participation when required by the Institutional Review
Board as a condition of approval.
In general, random assignment does not pose ethical issues when
employed to test the effectiveness of a new service or product that
is believed to be beneficial and when the number of students who are
equally eligible for and seeking that service is more than the
number who can be served. When all applicants cannot be served,
random assignment is fair, because it gives all participants an
equal chance of being selected for the program.
When a random assignment evaluation is not ethical or not
feasible, this priority includes quasi-experimental designs such as
carefully matched comparison groups, regression discontinuity
designs, single-subject designs, and interrupted time series that
are capable of estimating program impacts. However, quasi-
experimental designs do not provide the level of confidence in
causal relationships that random assignment designs provide.
Change: None.
Comment: One hundred and seventy-four respondents commented that
although it may be important to examine causality prior to wide
implementation, pilot or exploratory programs are often too small in
scale to provide reliable conclusions.
Discussion: The priority recognizes that for projects that are
focused on special populations in which sufficient numbers of
participants are not available to support random assignment or
matched comparison group designs, single-subject designs such as
multiple baseline or treatment-reversal or interrupted time series
that are capable of demonstrating causal relationships can be
employed. These small-scale or efficacy studies should lead to
large-scale or effectiveness studies. Further, this priority is only
relevant to programs for which demonstrations of effectiveness are
[[Page 3589]]
reasonable and relevant. The priority would generally not be applied
in competitions to fund pilot or exploratory programs.
Change: None.
Comment: Two hundred and forty-two respondents commented that
the choice of a research method must be determined by the goal or
question being asked. They stated that alternative and mixed methods
are rigorous and scientific and are important in knowing how well a
program was implemented and what is ``inside the box.'' Another
group of 14 respondents commented that the priority does not
preclude non-experimental designs, but gives clear priority to
experimental designs for determining project effectiveness. These
commenters noted that there may be areas in which an experimental
design may not be feasible and non-experimental methods, including
observational studies, may provide information on how to move
research forward.
Discussion: The Secretary agrees with these comments. There are
many research questions other than effectiveness that can be
pursued. For these questions, research designs other than
experimental and quasi-experimental would be appropriate. This
priority is to be applied only when the question to be addressed is
program effectiveness. The priority would be inappropriate if it
were applied, for example, to applications in which the primary
question is the fidelity of program implementation.
Change: None.
Comment: Twenty respondents expressed concern that the
Department will make the priority a requirement for all grant
competitions regardless of the intervention.
Discussion: The Secretary does not intend to make random
assignment a requirement for all of the Department's grant
competitions. The priority is intended for use only with
discretionary grant programs in which grantees may use their funds
to implement clearly specified interventions, and when the
Department desires to obtain evidence of the impact of those
interventions on relevant outcomes.
Change: None.
Comment: One hundred and sixty-eight respondents disagreed with
the Department's statement in the notice of proposed priority that
``this regulatory action does not unduly interfere with State,
local, and tribal governments in the exercise of their governmental
functions.'' They took the position that as provision and support of
programs are governmental functions so, too, is determining program
effectiveness.
Discussion: As indicated above, the priority is for use only
with discretionary grant programs in which awards are made on the
basis of competition. The Secretary often establishes priorities for
such programs and does not agree that supporting projects that would
use scientific methods to evaluate the effectiveness of the
interventions being implemented with grant funds would interfere
with State, local, and tribal governments in the exercise of their
governmental functions.
Change: None.
Comment: Six respondents expressed concern that the priority
might limit what is studied or result in poorer quality programs
being funded because of the additional points given to the
evaluation priority.
Discussion: When using the priority to give competitive
preference to an application, the Secretary intends to review
applications using a two-stage process. The first stage would review
the application without taking the priority into account. In the
second stage of review, the applications rated highest in stage one
would be reviewed for competitive preference. This will ensure that
applications of lower program quality will not be funded as a result
of additional points for the evaluation priority.
Change: Although no change has been made in the priority, the
description of the competitive preference is clarified to include a
two-stage review.
Comment: Nine respondents recommended that the Department
continue to recognize the importance of independent evaluators.
Discussion: The priority gives preference to independent
evaluators who have no authority over the project and are not
involved in its implementation. Thus the importance of independent
evaluators is recognized.
Change: None.
Comment: Twenty-three respondents expressed concern that small
programs and programs in rural areas would lack the financial and
technical resources to carry out a random assignment study, and that
this may prevent congressionally intended beneficiary communities
from receiving Federal assistance.
Discussion: The priority provides for the use of alternate
designs where insufficient numbers of participants are available to
support random assignment or matched comparison group designs. The
Secretary believes that investing in projects that generate evidence
regarding the effectiveness of specified interventions would provide
benefits beyond the individual grantee, and thus would represent a
wise use of program dollars.
Change: None.
Comment: None.
Discussion: In order to make this priority more understandable
to the general public, the Secretary believes that the priority
would be improved by adding generally accepted definitions for
technical terms used throughout the document. This may be helpful to
practitioners and others who are interested in strengthening the
evaluations of proposed projects but who may not be familiar with
the specific types of evaluation described in this notice.
Change: The Secretary has added a definitions section to provide
generally accepted definitions of terms used throughout the
document.
[FR Doc. 05-1317 Filed 1-24-05; 8:45 am]
BILLING CODE 4000-01-P