
Tips for Measuring Impact in Environmental Education and Outreach

This article offers 10 tips for improving and measuring the short-term impact of an environmental education program on adult audiences.
Updated: August 22, 2023

How people shop, vote, and manage their land ultimately affects the quality of the environment and the health of our planet. Professionals working in environmental education and outreach seek to improve these outcomes by promoting environmentally responsible behaviors. This article offers 10 tips for improving and measuring the short-term impact of programs on adult audiences, along with guidance on data analysis and reporting.

Definition of an Impact Analysis

Impact measures are not the same as teaching evaluations and metrics such as attendance and participation. An impact analysis is an evaluation procedure that determines the effect or influence of a program on the target audience. More specifically, it measures changes in people's thinking processes (i.e., short-term impact) which are expected to lead to changes in behavior (i.e., long-term impact). The widespread adoption of environmentally responsible behaviors can eventually lead to changes in environmental quality (i.e., broader impact; Figure 1).

Figure 1. Stages of progression associated with environmentally responsible behaviors.

Understand the Broader Context

There are many factors that influence your audience's behavior toward the environment. These factors often occur within social, personal, and structural domains. Educational experiences are best suited for influencing personal factors, or thinking processes, including knowledge, perceptions, attitudes, beliefs, values, and preferences. However, educational interventions alone are often not enough to mitigate the influence of some structural or social factors.

Some beliefs and behaviors are also a matter of conditioning or a subconscious response within a given context. In these cases, the participant needs to learn how to adopt the use of external signals that prompt them to pay attention and think about their behavior (e.g., a sign on the door that says, "Turn off the light!").

Audiences that are consistently under pressure (lack of time) or in a state of crisis (unable to meet basic needs) are also more constrained in their cognitive abilities and less able to make reasoned decisions. In this case, partnering with organizations that provide assistance will be important for helping certain audiences prepare to learn.

Tip 1. It is easier to impact audiences when the barriers to behavior change are due to a lack of knowledge and misleading perceptions. It is more difficult to encourage new behaviors when audiences also have financial barriers or don't have the opportunity to implement new or expected behaviors.

Identify Goals and Outcomes

Learning is a multi-stage process where a person first becomes aware of the issue and engages in critical thinking. Only after this will they have the capacity or motivation to take meaningful action to address the issue (Figure 2).

Figure 2. The four stages of learning by Noel Burch.

When the topics presented in the program are selected by the experts, the goal of your program will most likely be audience engagement. Experts have new insights about broader drivers of environmental change, which the average person may not fully understand or appreciate. The goal of engagement is to transform not just people's knowledge, but also their thinking about the issue and perceived responsibility as citizens (e.g., attitudes, beliefs).

You can be confident that an audience is prepared to take action when the topics or ideas for changing behaviors come from the audience. This is why needs assessment procedures are typically done with an already engaged audience. Programs that offer easy ways to change behavior, increase a person's social credit (e.g., certification), and help participants reduce their costs are more likely to encourage sustained changes in behavior.

Tip 2. It is important that audiences are already engaged in the issue before they generate attitudes towards the behavior; otherwise, opinions about the behavior may interfere with opinions about the environmental problem.

Link Outcomes with Changes in Thinking Processes

There are a variety of theories from psychology, economics, and environmental education that link responsible behaviors with underlying thinking processes and intentions. One of the more established theories is the Theory of Planned Behavior, which asserts that factors such as knowledge, perceptions, attitudes, social norms (i.e., peer approval), and perceived ability to execute new behaviors underlie a person's behavioral intentions (Figure 3).

Figure 3. Theory of Planned Behavior by Icek Ajzen.

If the expected behavioral outcome is "engagement", then impact measures need to relate to changes in knowledge or understanding about the environmental issue. Participants also need information on how changes in environmental quality may affect human welfare. This supports critical thinking and helps participants generate a position (i.e., attitude) towards all possible outcomes. Many also need to determine if their position will be approved by the groups they identify with (i.e., social norms).

If the expected behavioral outcome is "taking action", then the knowledge gained by participants needs to support the practice or adoption of the desired behavior. They should no longer have to be convinced that the issue is important. They also need to feel that by adopting the behavior they can make a difference in addressing the environmental issue (i.e., attitude). This attitude is supported when they believe that others are also willing to adopt the same behavior or would support the participant (i.e., social norms). Finally, participants need to feel confident that they have the skills needed to conduct the behavior correctly (i.e., perceived skills and abilities).

Another barrier to behavior change relates to the delivery of the education program itself. If participants express favorable opinions about the quality, importance, and usefulness of a program, then the educator can be confident that participants are getting the support they need to change their behavior (Figure 4).

Figure 4. The relationship between the perceived quality, importance, and usefulness (value) of the program and behavior change.

Tip 3. The thinking processes that underlie behavioral intentions will be different across programs and audiences. It is useful to test which barriers are important during the pilot stage of a program to help focus the curriculum. In cases where participants are already highly motivated to adopt new behaviors, educators may only need to measure impact using customer satisfaction questions.

Measure Participant Characteristics

Some educators avoid asking participants to share information about themselves, thinking that it may raise privacy concerns. However, it is important to collect data describing participant characteristics. Demographic data can help you determine which groups were most impacted by your program and which groups you are not reaching. It can also help you make predictions about how well your program may work in contexts where the demographic characteristics are different.

Collecting information about current behaviors is also important for describing the counterfactual condition, or what the person was doing before they were impacted by your program. For example, if you want to encourage people to walk to work to reduce their carbon footprint, it is important to determine not only if they walk to work already, but also how far it is to walk to their workplace. Questions that measure your participants' sphere of environmental influence (e.g., acres owned, number of cars owned) will help you better describe how your program contributed to changes in environmental quality.

Tip 4. It is just as important to measure "who" is being impacted by your program as "how" they are impacted.

Getting Defensible Data

The quality of a survey is based on two types of metrics, validity and reliability. Survey findings are considered valid when the data accurately describe the phenomena that you are attempting to measure. To design a survey that produces valid measures of behavioral intentions, it is best to use a logic model to help link expected barriers, curriculum strategies, and thinking processes with statements that describe short-term impact. An example logic model is provided in Table 1.

Table 1. Example of the logic model used to design survey questions to measure impact on participant intentions to engage.

| Barrier | Curriculum | Change in thinking | Impact statement |
| --- | --- | --- | --- |
| Knowledge about plant ecology | Demonstrates that the native red berry bush is an important part of local ecology (e.g., supports birds and insects) | Has a broader understanding of the bush within a natural setting | I learned something new about the red berry bush and how it supports other plants and animals |
| Knowledge about invasive plants | Illustrates morphological and chemical differences between native and invasive red berry bushes | Knows the difference between native and invasive bush species | I learned something new about the difference between the invasive and native red berry bush |
| Attitude towards conservation outcomes | Demonstrates that the native red berry bush is part of local history (e.g., seen in historic photos, used in traditional foods) | Positive attitude towards the native bush as representing community history and culture | Today, I feel more positive about the native red berry bush as a symbol of our community |
| Attitude about perceived peer approval | Demonstrates that people are concerned that the invasive bush has toxic berries that may harm people through accidental ingestion | Positive attitude about neighbors supporting control of the invasive bush to help protect people | Today, I feel more positive that my neighbors will support my working to prevent ingestion of the toxic berries of the invasive red berry bush |

Survey questions are considered reliable when they provide consistent measures from person to person. That is, variation in response to a question is due to variation in people's thinking processes and not because it was a confusing question.

Tip 5. Use your needs assessment findings, expert advice, and social science theory to identify which thinking processes are important to measure. Survey testing with the intended audience is also critical for identifying questions that may be confusing or appear biased.

Question Format

In an impact survey, multiple-choice questions are best suited for collecting data about the participants in your program and the counterfactual condition (Table 2). For these questions to be reliable, the data needs to be collected using the same units of measure, and participants need to retrieve the information easily from their memory.

Table 2. Multiple-choice questions for measuring impact.

It is reasonable to assume that multiple-choice questions could also be used to measure change in knowledge. However, adults can have test anxiety and may avoid questionnaires that appear to be tests of knowledge. Most people prefer questionnaires that ask for their opinion, which is neither correct nor incorrect.

Likert scales are best suited for measuring opinions, attitudes, perceptions, and preferences. The scale functions by asking respondents to react to a statement using options that represent strongly positive to strongly negative positions. Most impact surveys use a 5-point scale, which is useful for identifying a middle or neutral position (Table 3). 

Table 3. An example of a Likert scale for measuring impact.

Likert scales that describe changes in quality, importance, and value can also be used to measure customer satisfaction. Frequency and magnitude scales may be used to help participants describe important phenomena or past behaviors when exact details are difficult to remember. For example, instead of asking participants to report the number of invasive beetles in their backyard, they can use a magnitude scale to report whether they think they have more or fewer beetles than the demonstration site (which serves as the baseline).

When used correctly, Likert scales can provide meaningful data about changes in participants' thinking processes. The statements in the scale should represent the thinking processes and behavioral outcomes identified in the logic model. If a participant agrees that they experienced positive changes in multiple areas of thinking, then it is reasonable to expect that they have engaged in critical thinking and this may affect their future behaviors.
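As an illustration of how Likert data are prepared for analysis, text responses are typically coded to numbers before any scores are computed. Below is a minimal Python sketch assuming the 5-point agreement scale described above; the function name and exact labels are illustrative and should match the wording on your own survey:

```python
# Illustrative coding of 5-point Likert responses to numeric values (1-5).
# The scale labels below are assumptions; use the exact wording from your survey.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def code_response(label: str) -> int:
    """Convert a Likert label to its numeric code (case-insensitive)."""
    return LIKERT_CODES[label.strip().lower()]

responses = ["Agree", "Strongly agree", "Neutral"]
codes = [code_response(r) for r in responses]
print(codes)  # [4, 5, 3]
```

Coding responses this way makes it straightforward to compute the individual and group mean scores discussed later in this article.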

Tip 6. It is important to consider that some preformatted questions may appear like a Likert scale, but are really multiple-choice questions with ambiguous units. When questions fail to use established units of measure (e.g., days, acres) or don't contain both positive and negative positions within the same scale, what the choice options mean from person to person is unknown. Variation in how the options are interpreted can obscure important indicators of impact. 

Data Collection and Sampling

Data collection procedures should seek to gather a representative sample of not only your participants but the target audience as well. Data that represents your participants helps you identify who is underrepresented in your program. A representative sample of the target audience in your data can help you make more accurate estimates of your program's potential broader impact.

If you want to collect a random sample and the total number of participants is less than 500, then you need to collect data from at least half of the participants. In most cases, a 50% sample rate yields estimates with roughly a 5% margin of error at a 95% confidence level.

Another approach is to strategically select respondents according to categories that are important to your program and/or representative of the target audience (e.g., demographic characteristics, homeowner characteristics, location). A good rule of thumb is to have at least 30 observations from people representing each category of interest.
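As a rough check on these sampling rules of thumb, the margin of error for a proportion can be approximated using the normal approximation with a finite population correction. The following Python sketch is a simplification of formal sample size calculations, and the function name is illustrative:

```python
import math

def margin_of_error(n: int, population: int, z: float = 1.96, p: float = 0.5) -> float:
    """Approximate margin of error for a proportion, with a finite
    population correction (z = 1.96 corresponds to ~95% confidence)."""
    se = math.sqrt(p * (1 - p) / n)                       # standard error
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    return z * se * fpc

# Sampling 250 of 500 participants (a 50% sample rate):
moe = margin_of_error(250, 500)
print(f"{moe:.1%}")  # roughly 4.4%, consistent with the ~5% rule of thumb
```

Note that the required sampling fraction shrinks as the population grows, so the 50% rule of thumb is most relevant for smaller programs.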

To help increase the response rate, educators need to find a balance between having a robust survey tool (i.e., containing many questions) and a survey that is more likely to be completed (i.e., has a limited number of questions).

Tip 7. It is reasonable to expect participants to spend about five minutes answering a survey for every hour of participation. You are also more likely to get a complete response if the survey is taken in person, directly after an event.

Data Analysis

The goal of any data analysis is to describe general trends in the data and understand variation in responses across key areas. The most basic type of data reporting is to calculate the percentage associated with each question and response option. When statistics are performed separately for each question, this is known as an analysis of group data (Figure 5).

Question: What is your gender?

  • Male (48%)
  • Female (49%)
  • Prefer not to say (3%)

Figure 5. Example of a multiple-choice question.
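A group analysis of this kind reduces to counting the responses for each option. Here is a minimal Python sketch using hypothetical data matching Figure 5; the function name is illustrative:

```python
from collections import Counter

def response_percentages(responses: list[str]) -> dict[str, float]:
    """Percentage of respondents choosing each option for one question."""
    counts = Counter(responses)
    total = len(responses)
    return {option: 100 * count / total for option, count in counts.items()}

# Hypothetical responses to a single multiple-choice question:
answers = ["Male"] * 48 + ["Female"] * 49 + ["Prefer not to say"] * 3
print(response_percentages(answers))
# {'Male': 48.0, 'Female': 49.0, 'Prefer not to say': 3.0}
```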

The challenge with this simple approach is that the interpretation of the data is limited. How individual responses are correlated across questions is not well understood. A matrix can be used to compare group responses across different questions, but this approach is still limited because the matrix becomes more cumbersome and difficult to interpret as more questions are added. The inherent problem is the reliance on percentages to report data.

One way to reduce this complexity is to assign a grade or score to each person, indicating how much they were impacted. The score is then used to describe variation in impact across respondent characteristics. Table 4 illustrates a simple scoring method. On a 5-point scale, mean scores above the midpoint of three suggest that, on balance, the participant agreed with the statements and the program was impactful, while mean scores below three suggest the participant mostly disagreed and no meaningful impact occurred. Using an odd number of statements helps ensure that an individual's score reflects agreement or disagreement with more than half of the statements.

Table 4. Example of how mean scores are calculated for individuals and groups based on level of agreement with statements in the Likert scale (i.e., 1 = strongly disagree, 5 = strongly agree).

| Participant # | Change in Knowledge | Change in Attitude | Perceived Value | Individual Mean Score |
| --- | --- | --- | --- | --- |
| 1 | 3 | 5 | 2 | 3.33 |
| 2 | 4 | 3 | 1 | 2.67 |
| 3 | 5 | 4 | 3 | 4.00 |
| 4 | 4 | 5 | 4 | 4.33 |
| Group Mean Score | 4.00 | 4.25 | 2.50 | |

Tip 8. When using the individual scoring method, it is important that all the statements in the scale are measured using the same choice options. Just as with multiple choice questions, you can only combine data that have the same units of measure.
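The scoring method illustrated in Table 4 can be sketched in Python as follows; the function names and data layout are assumptions made for illustration:

```python
def individual_scores(ratings: dict[int, list[int]]) -> dict[int, float]:
    """Mean Likert score per participant (1 = strongly disagree ... 5 = strongly agree)."""
    return {pid: round(sum(r) / len(r), 2) for pid, r in ratings.items()}

def group_scores(ratings: dict[int, list[int]]) -> list[float]:
    """Mean score per statement across all participants."""
    columns = zip(*ratings.values())  # one column per statement
    return [round(sum(col) / len(col), 2) for col in columns]

# Ratings from Table 4: {participant: [knowledge, attitude, value]}
ratings = {1: [3, 5, 2], 2: [4, 3, 1], 3: [5, 4, 3], 4: [4, 5, 4]}
print(individual_scores(ratings))  # {1: 3.33, 2: 2.67, 3: 4.0, 4: 4.33}
print(group_scores(ratings))       # [4.0, 4.25, 2.5]
```

Individual scores can then be cross-tabulated with participant characteristics to describe variation in impact across groups.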

Writing an Impact Report

A good impact report describes areas of excellence in teaching, correlations between impact and respondent characteristics, and potential broader impacts.

The opening section should briefly describe the environmental issue, the goal of the education program, and the target audience. Next, the methods section should include a brief description of the survey items measured and sampling procedures.

The findings section should start with a description of who responded to the survey. Group analysis findings are also reported here and should be used to illustrate areas of excellence in teaching. Figures in the report should focus on who was impacted and what their counterfactual behaviors were. This will allow you to highlight what outcomes could occur as a result of your program.

Findings should also include no impact results and a discussion of what factors may be contributing to that outcome. In some cases, the reason for no impact can be due to factors outside the educator's control. In other cases, it may be that educators misunderstood the problem (i.e., barriers to behavior change), and the curriculum and survey need to be improved.

The final section of the report describes broader impacts. Here, you use evidence from the analysis to make predictions of impact if the education program was distributed more widely or over time. Research describing economic impact and impacts on human health can also be used here so that the implications of expanding or continuing the program are better understood.

Tip 9. It is important to be sincere in your efforts to learn about potential weaknesses in your program. Honesty in this area can leave one feeling vulnerable, but it can also help open the door to working with more people, which can lead to more effective solutions.

Next Steps

An impact analysis doesn't stop here. Evidence of real behavior change is needed to help validate the tool and inform programming (i.e., long-term impact). Educators should follow up with a subsample of participants 6 months to 1 year after the program to determine whether the recommended behaviors were adopted and how these changes may have affected the environment. This can be done quickly using email and a few short survey questions.

Tip 10. Long-term impact questions should ask participants (1) if they actually used the information when making a decision and (2) if the behavior change resulted in the desired outcome. Interviews and comment boxes can be used to collect detailed stories about how the education program helped facilitate the behavior change and how this may have helped improve or protect the environment.