How Should I Design the Evaluation?

Evaluation Design
You are most likely evaluating your program because you want to know to what extent it works, under what conditions or with what supports it works, for which students it works best, and how to improve it. You spent the last step defining the program and what you mean when you say it “works.” A strong evaluation design can help you to rule out other plausible explanations as to why your program may or may not have met the expectations you set through your indicators and targets. How many programs are continued with little examination of how they are benefiting the students? How often do we “experiment” in education by putting a new program into the classroom without following up to see if there was any benefit (much less, any adverse effect)? When do we make our decisions based on data, and how often do we accept anecdotal stories or simple descriptions of use as though they were evidence of effectiveness (because we have nothing else on which to base our decisions)? Evaluation can provide us with the necessary information to make sound decisions regarding the methods and tools we use to educate our students.

In an effort to organize their logic model and associated information, the E-Team created an Evaluation Matrix. At this stage, the Evaluation Matrix included the READ logic model components, evaluation questions, indicators, and targets by the READ logic model strategies, early/short-term and intermediate objectives, and long-term goals. A copy of the READ Evaluation Matrix starts at Table 7: Evaluation Matrix Addressing Strategies and Activities During the Initial Implementation—Indicators and Targets.
Evaluation should be built into your program so you can continually monitor and improve your program—and so you know whether students are benefiting (or not). Your evaluation also should help you determine the extent to which your program influenced your results. Suppose you are evaluating a mathematics program and your results show that student scores in mathematics, on average, increased twofold after your program was put into place. But upon further investigation, you find that half of the students had never used the program, and that the students who used the program in fact had much lower scores than those who did not.
What if you had not investigated? This program may have been hindering, rather than helping, student learning.
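The kind of investigation described above amounts to disaggregating results by actual program use. As a minimal sketch (all student records and field names here are invented for illustration), you might compare average score gains for students who did and did not use the program:

```python
# Hypothetical illustration: disaggregating score gains by actual program use.
# The records and field names below are invented for the example.
students = [
    {"used_program": True,  "score_gain": 4},
    {"used_program": True,  "score_gain": 6},
    {"used_program": False, "score_gain": 15},
    {"used_program": False, "score_gain": 13},
]

def mean_gain(records, used):
    """Average score gain for students whose usage status matches `used`."""
    gains = [r["score_gain"] for r in records if r["used_program"] == used]
    return sum(gains) / len(gains)

participants = mean_gain(students, used=True)       # students who used the program
nonparticipants = mean_gain(students, used=False)   # students who did not
print(f"Participants: {participants:.1f}, Non-participants: {nonparticipants:.1f}")
```

In this invented example the non-participants gained more, which is exactly the kind of finding that would be hidden by looking only at the overall average.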
The questions and example above are intended to show that while evaluation is important, it is a good evaluation (one that gives you valid information about how your program is working) that really matters. Evaluation relies on attribution: the more directly you can attribute your evaluation findings to the program activities you implemented, the more meaningful your findings will be, and the more useful they will be to you as you work to improve your program.
Some evaluation designs provide you with stronger evidence of causality than others. So, how do you choose the strongest possible design and methods to answer your evaluation questions, taking into account any constraints that you may have? This will partly depend upon the extent to which you have control over your implementation setting and other, similar settings.
Common evaluation designs include:
single-group designs
comparison group designs
randomized controlled experiments
Strong comparison group designs are often referred to as quasi-experimental designs.
Randomized controlled experiments are also called true experiments or randomized controlled trials (RCTs).
If you are implementing a project in only one of the schools in your district, your evaluation may focus on a single group: one school. In a single-group design, one group participates in the program, and that same group is evaluated. While the single-group design is the simplest evaluation design, it is also the weakest, because there may be many competing explanations as to why your evaluation produced the results it did. If your evaluation showed promising results, could it be because of something else that was going on at the same time? Or would the participants have had the same results without the program?
Using your logic model along with the single-group design can help to improve the credibility of your findings. For instance, suppose you are working with an evaluator to examine a new program in your classroom or school focused on improving reading comprehension among third graders. If the evaluation results are promising, the principal has agreed to incorporate the funding for the program into the ongoing budget. If you do not have another classroom or
school against which to compare progress (i.e., you have a single-group design), you can explain how the program operates by using your logic model and the data collected at each stage of operation. You can give evidence showing that the program’s activities were put into place, use data from your early and intermediate objectives to show change in teacher practice and student progress, and present your long-term outcomes showing how reading comprehension changed. While you cannot claim that your program caused the change in reading comprehension, you can use your logic model and its associated indicators to demonstrate a theoretical association
between your program and long-term outcomes.
Using your logic model to guide your evaluation will strengthen your evaluation design.
Other ways to strengthen your design include:
measuring indicators multiple times
studying the program longitudinally
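Repeated measurement, the first of these strengtheners, can be sketched very simply. In the hypothetical fragment below (the measurement points and scores are invented), an indicator is measured three times during the year and the change between consecutive measurements is computed, letting you see whether progress is steady rather than relying on a single before-and-after snapshot:

```python
# Illustrative sketch: one indicator measured repeatedly in a single-group design.
# Measurement points and scores are invented for the example.
measurements = [("fall", 58.0), ("winter", 63.5), ("spring", 70.0)]

# Change between each pair of consecutive measurement points.
changes = [
    (later_name, later_score - earlier_score)
    for (_, earlier_score), (later_name, later_score)
    in zip(measurements, measurements[1:])
]
print(changes)
```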
Comparison Group Designs
If you are able to have more than one group participate in your evaluation, typically you can improve the usability of your findings. For instance, one teacher could use the program in the classroom in one year, and another the next year—and you could compare the results not only within the evaluation classroom from one year to the next, but also between the two classrooms in the evaluation. Using multiple groups, referred to as a comparison group design, can help you rule out some of the other competing explanations as to why your program may have worked. The comparison group is the group that does not use the program being evaluated. However, the groups must be comparable. Comparing test scores from a district that used a new program to test scores from another district that did not use the new program would not yield meaningful information if the two districts are not comparable.
The strength of your evaluation design will vary with how closely matched your comparison group is with the group that will be implementing your program. Convenience groups, such as a district chosen because it neighbors your district, will likely not yield results that are as meaningful as would a comparison district that is purposefully chosen to match your school district based on multiple key indicators that you believe might influence your outcomes, such as gender, ethnic and socioeconomic composition, or past test performance.
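Purposeful matching on key indicators can be sketched as a nearest-neighbor search over district profiles. Everything in the example below is hypothetical: the district names, the indicator names, and their values. It simply picks the candidate district whose profile is closest to the program district's:

```python
# Sketch of choosing a matched comparison district rather than a convenience
# neighbor. District names, indicators, and values are all hypothetical.
program_district = {"pct_low_income": 0.42, "pct_ell": 0.15, "prior_mean_score": 61.0}

candidates = {
    "District A": {"pct_low_income": 0.40, "pct_ell": 0.14, "prior_mean_score": 60.0},
    "District B": {"pct_low_income": 0.10, "pct_ell": 0.02, "prior_mean_score": 80.0},
}

def distance(a, b):
    # Unweighted Euclidean distance over the chosen key indicators.
    return sum((a[k] - b[k]) ** 2 for k in a) ** 0.5

best_match = min(candidates, key=lambda name: distance(program_district, candidates[name]))
print(best_match)
```

In practice you would standardize the indicators first so that no single indicator (here, prior test scores, which are on a much larger scale) dominates the distance.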
Just because a good comparison group does not readily exist for your program, do not give up on the possibility of finding or creating one. Use some creativity when creating your evaluation design and identifying comparison groups. If you are implementing a project across a district, you may have flexibility such that you could vary the timing of the implementation in order to create a comparison group. For instance, several schools could implement the program in one year, leaving the remaining schools as a comparison group. If your evaluation results are promising, the comparison schools can be brought on board in the following year.
Strong comparison group designs are often referred to as quasi-experimental designs. When considering a comparison group, seek to identify or create a group that is as similar as possible, especially on the key indicators that you believe might influence your results, to the group that will be implementing your program. However, the only way to make certain, to the extent possible, that groups are equivalent is through random assignment. Random assignment is discussed in the following section.
Randomized Controlled Experiments
The gold standard of evaluation design is the true experiment. Comparison group designs, discussed in the section above, attempt to approximate, to the extent possible, a true experiment. In an experimental design, participants are randomly assigned to the program or to a nonprogram control group. True experiments are also referred to as randomized controlled experiments or randomized controlled trials (RCTs).
In a true experiment, participants are randomly assigned either to the program or to an alternative condition (such as a different program or no program at all). Theoretically, the process of random assignment creates groups that are equivalent across both observable and unobservable characteristics. By randomly assigning program participants, you can rule out other explanations for, and validity threats to, your evaluation findings. See the Research and Evaluation Design, Including Reliability and Validity section and the Threats to Validity section in Appendix C for resources addressing random assignment and threats to validity.
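Random assignment itself is mechanically simple. The sketch below (the student roster is invented) shuffles a list of students with a fixed seed, so the split is reproducible and auditable, and divides it into a treatment group and a control group:

```python
import random

# Minimal sketch of random assignment. The roster is invented for the example.
students = [f"student_{i}" for i in range(20)]

rng = random.Random(42)   # fixed seed so the assignment can be reproduced
shuffled = students[:]    # copy so the original roster is left untouched
rng.shuffle(shuffled)

midpoint = len(shuffled) // 2
treatment = shuffled[:midpoint]   # receives the program
control = shuffled[midpoint:]     # alternative condition or no program

print(len(treatment), len(control))
```

Because chance alone decides who is in each group, any student has the same probability of receiving the program, which is what justifies causal comparisons between the groups.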
For some programs, random assignment may align well with program resources. For instance, for programs that do not have the resources to include all students from the start, randomly assigning students or classrooms to the program would address in a fair manner who participates in the program and would allow you to draw causal conclusions from your