evaluation findings. As mentioned in the section on comparison groups, be creative when designing your evaluation. You might find that, with a little resourcefulness at the design stage, you can implement a stronger evaluation than you originally thought. For example, instead of purposefully assigning students or teachers to the program or allowing participants to self-select, you might consider a lottery at the start to determine who will participate. In such cases, if results are promising for the first cohort that participates, additional resources could be sought to expand the program to all students and classrooms.
Enriching Your Evaluation Design
Whether you have chosen a single-group, comparison-group, or experimental design, there are several methods and approaches you can use to enrich your evaluation. These methods are added supports that can increase the usefulness of your results and the credibility of your findings, make your evaluation more manageable, and expand upon information obtained throughout program implementation. They include using repeated measures, longitudinal data, and sampling. Logic modeling can also enrich your evaluation, as it can be used to construct a reasoned explanation for your evaluation findings. Supplementing your design with a case study can likewise provide in-depth information regarding implementation and participant experiences.
Using repeated measures, that is, collecting the same data elements at multiple time points, can also strengthen your evaluation design. If the program you are evaluating is intended to improve critical thinking skills over a specified time period (e.g., 1 year), taking repeated measurements (perhaps monthly) of indicators that address critical thinking skills will not only provide baseline and frequent data with which to compare end-of-year results, but will also enable program staff to use midterm information to make midcourse corrections and improvements.
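A minimal sketch of this idea is shown below, assuming hypothetical monthly means on a single critical thinking indicator; the baseline, scores, and flagging rule are illustrative stand-ins for whatever instruments and decision rules your program actually uses.

```python
# Sketch: monitoring repeated measures of a critical-thinking indicator.
# All values are hypothetical; replace them with your own collected data.

baseline = 62.0  # mean score on the indicator at the start of the year (hypothetical)

# Mean score on the same indicator, collected each month (hypothetical)
monthly_means = {
    "Sep": 62.0, "Oct": 63.1, "Nov": 61.8, "Dec": 64.5,
    "Jan": 64.0, "Feb": 65.9,
}

for month, mean_score in monthly_means.items():
    change = mean_score - baseline
    status = "on track" if change >= 0 else "consider a midcourse correction"
    print(f"{month}: mean = {mean_score:.1f} ({change:+.1f} vs. baseline) -> {status}")
```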
Using longitudinal data, that is, data collected over an extended period of time, enables you to follow program participants long term and examine post-program changes. Longitudinal data can also enable you to examine a program's success using a time series analysis. For example, suppose your district made the change from half-day to full-day kindergarten 5 years ago, and you are asked whether the program positively affected student learning. The district has been using the same reading assessment for kindergarteners for the past 10 years, administered in September and May of each year. You examine the September scores over the past 10 years and find that there has been little variability in mean scores. Mean scores by gender, ethnicity, and English Language Learner (ELL) status have been fairly steady. You conclude that kindergarteners have been entering school at approximately the same mean reading level for the past 10 years.
Next, you examine the May reading scores for the past 10 years. You notice that for the first 5 years, the mean end-of-year scores (overall and by subgroup) were significantly greater than the September scores, but varied little from year to year. However, for the past 5 years, the May scores were about 15 percent higher than in the previous 5 years. The increase by gender and ethnicity was similar and also consistent over the past 5 years, while reading scores for ELL students were over 30 percent higher in the spring, after the full-day program was instituted. After ruling out other possible explanations for the findings to the extent possible, you conclude that the full-day kindergarten program appears to have been beneficial for all students and, particularly, for the district’s ELL students.
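A simple sketch of this kind of time series comparison appears below; the May score values are hypothetical and are used only to illustrate how the before-and-after means might be compared.

```python
# Sketch: comparing mean May reading scores for the 5 years before and the
# 5 years after the switch to full-day kindergarten. Scores are hypothetical;
# a real analysis would also examine subgroups (e.g., ELL students) and rule
# out alternative explanations.

may_means_half_day = [48.2, 47.9, 48.5, 48.1, 48.4]  # years 1-5 (hypothetical)
may_means_full_day = [55.3, 55.8, 55.1, 55.6, 55.9]  # years 6-10 (hypothetical)

pre_mean = sum(may_means_half_day) / len(may_means_half_day)
post_mean = sum(may_means_full_day) / len(may_means_full_day)
pct_change = 100 * (post_mean - pre_mean) / pre_mean

print(f"Mean May score, half-day years: {pre_mean:.1f}")
print(f"Mean May score, full-day years: {post_mean:.1f}")
print(f"Change after the switch to full-day kindergarten: {pct_change:+.1f}%")
```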
If your program has many participants, or if you lack the funds to include all of them in your evaluation, sampling (choosing a smaller group from the larger population of program participants) is an option. Random sampling selects evaluation participants at random from the larger group of program participants and may be more easily accepted by teachers, parents, and students, as well as other stakeholder groups. Whether you are using random sampling or purposeful sampling, you should select a sample group that is as representative as possible of all of your participants (i.e., the population). Typically, the larger the sample you use, the more precise and credible your results will be. For more information on Research and Evaluation Design, Including Reliability and Validity and Threats to Validity, see Appendix C.
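The sketch below illustrates one way a simple random sample might be drawn from a participant roster; the roster, identifiers, and sample size are hypothetical.

```python
import random

# Sketch: drawing a simple random sample of evaluation participants from a
# larger roster of program participants. Roster and sample size are hypothetical.

random.seed(42)  # fixing the seed makes the selection reproducible and auditable

program_participants = [f"student_{i:03d}" for i in range(1, 501)]  # 500 participants
sample_size = 100

evaluation_sample = random.sample(program_participants, sample_size)

print(f"Selected {len(evaluation_sample)} of {len(program_participants)} participants")
print(evaluation_sample[:5])  # preview the first few selections
```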
Using logic modeling in your evaluation can also help to strengthen the credibility of your findings. By tracking the implementation of your strategies and activities and measuring progress on your early, intermediate, and long-term indicators, your logic model can provide interim data that can be used to adjust and improve your program during its operation. As described with the reading comprehension example in the single-group design section, logic modeling can help to show a theoretical association between strategies and outcomes in even the weakest of evaluation designs.
Evaluation reporting should be ongoing. While formal evaluation data and reports may be issued once or twice a year, informal updates should be provided to program staff on a regular basis. It is a good idea to explicitly identify (on your logic model) when evaluation updates will occur and to delineate these evaluation milestones in the evaluation contract with your evaluator. Frequent and ongoing evaluation updates give a program the best opportunity to monitor and improve implementation, as well as to help maintain stakeholder support.
Finally, case studies are in-depth examinations of a person, group of people, or context. Case studies can enrich your understanding of a program, as well as provide a more accurate picture of how a program operates. See the Evaluation Methods and Tools section for more information on case studies.
Building Reporting into Your Evaluation Design
You do not need to wait until the end of the evaluation to examine progress toward your goals. In fact, you should not wait until the end! Just as our teachers always told us that our grades should not be a surprise, your evaluation findings should not be a surprise. You should build reporting into your evaluation design from the very start.
It works well to align your evaluation's schedule with your program's timeline. If you aim to have your program infrastructure in place by the end of summer, monitor the logic model indicators that address this activity before the end of the summer (to verify that the program is on track) and again at the end of summer (or early fall). If the infrastructure is not in place on schedule, or if it is not operating properly, program staff need to know right away to minimize delays in program implementation (and so you do not waste time measuring intermediate indicators when early indicators tell you the program is not in place). Likewise, do not wait until the end of the year to observe classrooms to determine how the program is being used. Frequent and routine observations will provide program staff with valuable information from which they can determine whether additional professional development or resources are needed.
Grovemont School District had 80 third- through fifth-grade classrooms across six elementary schools (28 third-grade classrooms, 28 fourth-grade classrooms, and 24 fifth-grade classrooms). District class size for grades three through five ranged from 22 to 25 students per classroom. Because of state budget cuts and reduced funding for the program, the E-Team knew that Mrs. Anderson and the READ oversight team would have to make some difficult choices about how to structure and evaluate their program.
Some members of the oversight team wanted to implement the program in fifth grade only for the first year and then reexamine funds to see whether they might be able to expand down to fourth grade in Year 2. Others preferred to start the program at two of the six elementary schools and then try to include an additional school in Year 2.
Dr. Elm and the E-Team recommended that they consider partially implementing the program at all six schools and across all three grades. Dr. Elm explained that they would receive much better information about how their program was working and, more importantly, how it could be improved, if they were able to compare results from those classrooms that were using the program with those that were not. Dr. Elm knew that students at all of the schools in Grovemont School District were randomly assigned to teachers during the summer before each school year. However, Dr. Elm explained that in order to minimize initial differences between those classrooms that participate in READ and those that do not, they should consider randomly assigning half of the classrooms to continue with the existing district curriculum while the other half would supplement their existing curriculum with the READ program.
Dr. Elm also recommended that they first divide the classrooms by school and grade level so that each school and grade would have half of its classrooms assigned to the program. Teachers whose classrooms were not assigned to the program would be assured that, if the program proved successful, their classrooms would be added by Year 3. However, if the program did not show sufficient benefits for students, it would be discontinued in all classrooms after Year 2. Dr. Elm concluded that building a strong evaluation into their program would provide them with credible information as to how their program was working and that having data to direct program adjustments and improvements would give the program the best opportunity to be successful.
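For illustration, the sketch below shows one way an assignment like the one Dr. Elm describes could be carried out, with classrooms randomly split within each school-and-grade stratum; the school names and classroom identifiers are hypothetical.

```python
import random
from collections import defaultdict

# Sketch: within each school and grade, randomly assign half of the classrooms
# to the READ program and half to the existing curriculum. The classroom list
# is hypothetical; actual counts would come from district records.

random.seed(7)  # reproducible assignment

# (school, grade, classroom_id) for each third- through fifth-grade classroom
classrooms = [
    ("School A", 3, "A3-1"), ("School A", 3, "A3-2"),
    ("School A", 4, "A4-1"), ("School A", 4, "A4-2"),
    ("School B", 5, "B5-1"), ("School B", 5, "B5-2"),
    # ... remaining classrooms would be listed here
]

# Group classrooms into strata defined by school and grade
strata = defaultdict(list)
for school, grade, room in classrooms:
    strata[(school, grade)].append(room)

# Randomly split each stratum in half
assignments = {}
for (school, grade), rooms in strata.items():
    random.shuffle(rooms)
    half = len(rooms) // 2
    for room in rooms[:half]:
        assignments[room] = "READ program"
    for room in rooms[half:]:
        assignments[room] = "existing curriculum"

for room, group in sorted(assignments.items()):
    print(f"{room}: {group}")
```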
The READ oversight team agreed to think about this idea and reconvene in 1 week to make a decision. The E-Team also distributed the evaluation matrix it had created based on the READ logic model. The E-Team asked the oversight team to review the matrix and provide any feedback or comments.
The following week, the E-Team and READ oversight team reconvened to decide how to structure the program and to work on the evaluation design. Mrs. Anderson had spoken with the district superintendent about the evaluator’s suggestion of implementing READ in half the district’s third- through fifth-grade classrooms, with the promise that it would be expanded to all classrooms in Year 3 if the program was successful. Although logistically it would be easier to implement the program in two or three schools or one or two grades than to implement it in half the classrooms in all schools and at all grades, the superintendent understood the benefit of the added effort. The evaluation would provide higher quality data to inform decisions for program improvement and decisions regarding the program’s future.