For most experiments we want to compare two or more treatments. Sometimes experiments involve quantitative treatments, such as several rates of fertilizer or several rates of a growth regulator. Other times the treatments are qualitative in nature, such as comparing different rootstocks or strains of a variety, different methods of pruning, or different fruit thinning treatments. After the treatments are applied, we measure various things, such as number of fruit per 100 blossom clusters, yield per tree, number of fruit per branch, fruit weight, shoot length, number of fire blight strikes per tree, or the percentage of fruit on a tree with bitter pit or an insect defect. I call all of these things that we measure "response variables" because they respond to our treatments.
After one has recorded the data, what next? How do we summarize it so we can tell if the treatments are different? In research we test hypotheses. The hypothesis that we always test is that all the treatments are equal, or that the differences between treatment means are equal to zero. A mean is the term we use for an average: the sum of the values for a response variable divided by the number of values comprising that total. Obviously two means are rarely exactly equal, but by taking into account the amount of variability of the observations between and within treatments, relative to the magnitude of the difference between treatment means, we can estimate the probability of incorrectly declaring that the difference between treatment means is due to the treatments rather than to random variation. Most scientists use the 5% level (alpha = 0.05) of significance to decide if treatments are different. Thus, there is a 5% chance of incorrectly concluding that the difference between the treatment means is due to the treatments when it is actually due to something else. I think this is a bit conservative for many field experiments, and most growers are probably willing to accept a risk of 10% or more that the treatment differences are actually not due to the treatments. The choice of a probability level is up to the researcher.
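As a concrete illustration, a mean can be computed in a couple of lines of Python; the shoot-length values here are made up for the example, not data from a real trial:

```python
# Hypothetical shoot lengths (inches) for one treatment; made-up example values.
shoot_lengths = [8.5, 9.1, 7.8, 8.8]

# The mean: the sum of the values divided by the number of values.
mean = sum(shoot_lengths) / len(shoot_lengths)
print(round(mean, 2))
```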
Step 1: Data Proofing
The first step in summarizing the data is to look at the data and make sure that the values seem realistic. This first step is called "data proofing" because we want to make sure that we recorded the data correctly. Look for values that seem much larger or smaller than expected. For example, it is unlikely that a shoot will be 85 inches long, so maybe whoever recorded the data accidentally put the decimal in the wrong place and the shoot was actually 8.5 inches long. I have analyzed thousands of data sets and I am always surprised at the number of mistakes that I find, so this is a very important aspect of data analysis and summarization.
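A simple proofing pass can be sketched in Python. The plausible range below is an assumption you would set for each response variable, not a fixed rule:

```python
# Hypothetical shoot-length records (inches); 85.0 is the misplaced-decimal
# typo described above.
records = [7.2, 8.5, 85.0, 6.9, 9.4]

# Flag anything outside a plausible range for this measurement
# (the 1-30 inch window is an assumed example).
suspect = [x for x in records if not 1.0 <= x <= 30.0]
print(suspect)  # values to re-check against the original field notes
```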
Step 2: Calculating Means
The second step is to calculate the means. For each treatment add up all the values and divide by the number of values to obtain the treatment means. You can do this by hand or you can use Excel or some other software package to perform this task. This will tell you the average response to each treatment. Means can sometimes be misleading because they can be influenced by large or small values, so it is a good idea to look at the values that went into the calculations for the mean. If one unusual value was responsible for the difference in means, then you might consider the difference to be an anomaly, but you might want to look at the tree or shoot that was responsible for the unusual value to determine why it was unusual. If the reason for the unusual value can be identified, such as short shoots due to mice girdling the trunk or small fruit on a tree because it was not thinned properly, then there may be a valid reason to delete that observation from the data set. There are differing philosophies about deleting unusual observations, but I delete them only if I can identify the reason for the unusual value, so I rarely delete values.
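This step can be sketched in Python using made-up shoot-length data for two treatments; printing the minimum and maximum alongside each mean makes unusual values easy to spot:

```python
# Hypothetical shoot lengths (inches) for two pruning treatments.
data = {
    "Treatment A": [7.2, 8.5, 6.9, 9.4],
    "Treatment B": [11.1, 10.4, 12.0, 10.5],
}

# Treatment mean: sum of the values divided by the number of values.
means = {name: sum(values) / len(values) for name, values in data.items()}
for name, values in data.items():
    # min and max help flag the unusual observations discussed above
    print(name, round(means[name], 2), min(values), max(values))
```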
Step 3: Analyzing Simple Experimental Designs
By looking at the treatment means, you should be able to tell if the treatments had an effect on the response variable, and you might want to stop there. However, we can also perform a statistical test called analysis of variance (ANOVA) to provide us with the probability of incorrectly declaring that the differences between treatment means are due to the treatments. Most researchers use statistical software packages to analyze data, but these packages are too expensive for most small businesses. Excel offers an Analysis ToolPak add-in that can perform ANOVA for simple experimental designs, such as those that I described in the previous article. To enable the ToolPak, click on the "File" tab, from the list click on "Options", then click on "Add-Ins", and finally select "Analysis ToolPak". A "Data Analysis" button should now appear at the right-hand end of the Data tab.
Below is a small data set from an apple rootstock trial where I measured the trunk cross-sectional area (TCA) for 8 trees (reps) of three rootstocks (B.9, M.26 and Pajam2). The treatment (rootstock) names are listed in the first row of the data set and the TCA values are listed in the 8 rows below the treatment labels.
Table 1: Apple Rootstock Trial

B9 rootstock    M26 rootstock    Pajam2 rootstock
6.7             11.8             12.2
8.8             11.7             17.9
7.1             9.5              8.3
7               17.2             13.7
7.8             10.3             10.3
8               8                20.9
6.4             10               11.1
6.2             14.9             14.7
Using Excel to Analyze Data
To perform an ANOVA on this data set, enter the data into an Excel spreadsheet, just as the data are presented above. On the Data tab, click on the "Data Analysis" button. In the data analysis window click on "Anova: Single Factor" because there is only one factor (rootstock) in this experiment, and click OK. If we had a second factor, let's say variety, where we had 2 or more varieties on each of the three rootstocks, then we would click on "Anova: Two-Factor With Replication". Since the data for each treatment are arranged in columns, click on "Columns", then check the box for "Labels in first row". By default, Excel chooses alpha = 0.05, but you can choose another alpha level. If you multiply alpha by 100, you will get the percent chance of incorrectly rejecting the hypothesis that the treatment means are equal. Click on OK and the two tables shown below are output to a new sheet.
Table 2: Summary

Groups   Count   Sum     Average   Variance
B9       8       58      7.25      0.782857
M26      8       82.1    10.2625   5.539821
Paj2     8       109.1   13.6375   17.16839
Table 3: ANOVA

Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        163.3758   2    81.68792   10.43221   0.000714   3.4668
Within Groups         164.4375   21   7.830357
Total                 327.8133   23
Table 2 shown above presents a summary of the data; these values are usually referred to as descriptive statistics. The first column lists the three treatments (rootstocks) and the second column lists the number of observations (trees) for each rootstock. The third column shows the sum of the eight values for each rootstock and the fourth column presents the average, or mean, TCA for each rootstock. The fifth column contains the variance for each rootstock. Variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. A small variance indicates that the data points tend to be close to the mean and to each other, whereas a large variance indicates that the data are spread out widely around the mean and from each other. The variance for Pajam2 is quite a bit larger than the other two variances. This is because the range for Pajam2 is 8.3 to 20.9, whereas the range for B9 is only 6.2 to 8.8. When variances are large, it becomes difficult to detect statistical differences, so we need differences between treatments that are large in magnitude to be able to reject the hypothesis that the treatment means are equal.
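The Variance column is the sample variance: the sum of squared deviations from the treatment mean, divided by the number of observations minus 1. As a check, this short Python calculation on the B9 column of Table 1 reproduces the value shown in Table 2:

```python
# B9 trunk cross-sectional areas from Table 1.
b9 = [6.7, 8.8, 7.1, 7.0, 7.8, 8.0, 6.4, 6.2]

mean = sum(b9) / len(b9)  # 7.25
# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in b9) / (len(b9) - 1)
print(round(variance, 6))  # 0.782857, matching Table 2
```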
Table 3 shows the ANOVA. ANOVA is a method to partition the variation into the variation due to treatments and the remaining variation that is not explained by our treatments, often called random or residual variation. The first row, labeled "Between Groups", pertains to the variation due to rootstock effects. The second row, labeled "Within Groups", concerns the random variation. The third row shows the total variation, obtained by adding the values for between groups and within groups. The second column shows the sum of squares (SS), which is a measure of variation, or deviation, from the mean. SS is obtained by squaring the difference between each observation and the overall mean for all three rootstocks and summing those squared differences. These differences are also called residuals. The column labeled df is the degrees of freedom. This represents a penalty for estimating the means and is calculated as the number of treatments minus 1 (3 - 1 = 2) for rootstocks and the number of observations minus 1 (24 - 1 = 23) for the total. Degrees of freedom for within groups is obtained by subtracting the between-groups df from the total df (23 - 2 = 21). The mean square (MS) is obtained by dividing the sum of squares by the df. The MS for within groups is sometimes called the error or residual mean square and is an estimate of the overall variance for the experiment.
The F statistic is calculated by dividing the between-groups MS by the within-groups MS. The F statistic is actually the ratio of the variation explained by rootstocks to the unexplained or random variation. F-values near or less than 1.0 indicate that there is as much unexplained variation as variation explained by the rootstocks, and the rootstock means are not different. Large F-values indicate that a large proportion of the variation is explained by the treatments. The magnitude of the F-value needed to reject the hypothesis that treatments are equal depends on the df, but values larger than about 4.0 usually indicate that treatments are different.
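These relationships between SS, df, MS and F can be verified directly from the numbers in Table 3:

```python
# Sum of squares and degrees of freedom from Table 3.
ss_between, df_between = 163.3758, 2
ss_within, df_within = 164.4375, 21

ms_between = ss_between / df_between  # mean square = SS / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within       # F = between-groups MS / within-groups MS
print(round(f_stat, 2))  # 10.43, as reported in Table 3
```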
The right-hand column, labeled F crit, is the F-value needed to declare that all three rootstocks are not equal. In this case the actual value for F is 10.43, which is much larger than the critical value of 3.47, so we would reject the hypothesis that all three rootstocks are equal and conclude that at least two of the rootstocks are different. The P-value is the probability of incorrectly declaring that the three rootstocks are not all equal. If we multiply the P-value by 100 we can express this probability as a percentage. Therefore there is only a 0.07% chance that the differences in the rootstock means are due to something other than the rootstocks. As I mentioned previously, most researchers look for P-values less than 0.05, and the P-value (0.000714) obtained from this analysis is well below 0.05, so we can be quite certain that the differences between the three rootstock means for TCA are due to the rootstocks.
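For readers who want to see the whole computation outside Excel, a single-factor ANOVA can be sketched in plain Python. The numbers below are made up for the illustration (they are not the rootstock data), and the sketch assumes an equal number of reps per treatment:

```python
# Hypothetical data: three treatments, four reps each.
groups = {
    "A": [10.0, 12.0, 11.0, 11.0],
    "B": [14.0, 16.0, 15.0, 15.0],
    "C": [12.0, 14.0, 13.0, 13.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
group_means = {g: sum(v) / len(v) for g, v in groups.items()}

# Between-groups SS: squared deviations of the group means from the grand mean,
# weighted by the number of observations per group.
ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2
                 for g, v in groups.items())
# Within-groups SS: squared deviations of each observation from its group mean.
ss_within = sum((x - group_means[g]) ** 2
                for g, v in groups.items() for x in v)

df_between = len(groups) - 1               # treatments minus 1
df_within = len(all_values) - len(groups)  # observations minus treatments

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
print(ss_between, ss_within, round(f_stat, 2))
```

A large F here suggests most of the variation is explained by the treatments; Excel additionally reports the P-value and F crit for the chosen alpha.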
Summary
The analysis just described is for the simplest type of experiment because it involves a single treatment variable (rootstock) in a completely randomized design. By simply looking at the treatment means, one should be able to determine if treatments are different enough to be economically important. The statistical analysis is useful for determining the probability that the observed differences are actually due to the treatments rather than random variation. The amount of risk one is willing to take in concluding that the treatments are not different (5%, 10% or more) is an individual choice.