Explanatory Variables: Improving the Ability to Detect Changes in Water Quality in Nonpoint Source Watershed Studies

Jean Spooner, Jon B. Harcum, and Steven A. Dressing. 2014. Explanatory variables: improving the ability to detect changes in water quality in nonpoint source watershed studies. Tech Notes 12, August 2014. Developed for U.S. Environmental Protection Agency by Tetra Tech, Inc., Fairfax, VA, 45 p. Available online at https://www.epa.gov/polluted-runoff-nonpoint-source-pollution/nonpoint-source-monitoring-technical-notes.

August 2014

Through the National Nonpoint Source Monitoring Program (NNPSMP), states monitor and evaluate a subset of watershed projects funded by the Clean Water Act Section 319 Nonpoint Source Control Program. The program has two major objectives:

1. To scientifically evaluate the effectiveness of watershed technologies designed to control nonpoint source pollution
2. To improve our understanding of nonpoint source pollution

NNPSMP Tech Notes is a series of publications that shares this unique research and monitoring effort. It offers guidance on data collection, implementation of pollution control technologies, and monitoring design, as well as case studies that illustrate principles in action.

Introduction

An important objective of many nonpoint source (NPS) watershed projects is to document water quality changes and associate them with changes in land management. Accounting for major sources of variability in water quality and land treatment/land use data increases the likelihood of isolating water quality trends resulting from best management practices (BMPs). Correlation of water quality and land treatment changes alone is not sufficient to infer causal relationships. Factors not related to BMPs may be causing the water quality changes, such as changes in land use, climatic, or hydrologic conditions. These factors are often referred to as explanatory variables or covariates.

Including explanatory variables in water quality trend analyses yields estimates of changes that are closer to those that would have been measured if the “non-BMP” factors did not vary over time. For example, precipitation totals and patterns that differ substantially between the periods before and after BMPs are implemented can essentially shroud the impacts of the BMPs on water quality. By accounting for, or filtering out, these changes in precipitation it becomes easier to isolate changes in water quality that may be associated with the BMPs. In statistical terms, accounting for variability in water quality due to these other factors decreases “unexplained” variation in the now-adjusted^1 water quality data, facilitating documentation of statistically significant trends.

This Tech Note describes explanatory variables that are often important in NPS watershed studies and offers suggestions on how to determine which explanatory variables should be tracked for a specific project. Techniques to incorporate explanatory variables into statistical trend models are highlighted, and example data sets are provided. The statistical trend approaches discussed here are parametric. Although explanatory variables are also part of biological monitoring efforts, this Tech Note focuses on water chemistry.

(^1) Data are considered to be “adjusted” after values are altered using appropriate statistical methods to account for explanatory variables.

Information provided here is directed primarily to water quality personnel, but all involved in a NPS watershed project should find the following three sections useful in deciding which explanatory variables to monitor. The subsequent section on statistical trend analysis approaches and the examples (with sample data sets) are written with additional statistical details intended for data analysts.

What are Explanatory Variables and Why are They Important in NPS Watershed Studies

Definition of Explanatory Variable

Explanatory variables can be defined in several related ways. In statistical trend analysis, explanatory variables are broadly defined as variables that can be used to explain some of the variability in the response of a primary variable of interest. The response variable is usually referred to as the “Y” or “Dependent” variable. The explanatory variables are the “X” or “Independent” variables.

In NPS watershed studies, explanatory variables refer to the variables that affect the relationship between the dependent variable (e.g., water quality variable) and the independent variable of primary interest (e.g., trend). Inclusion of measured values of explanatory variables in trend analysis enables adjustment for their influence on measured water quality variables. Under this definition, variables such as streamflow and season would be examples of explanatory variables, as well as paired water quality values from a control (non-treated) watershed.

Another definition commonly found in statistics books is applicable to studies in which a response is measured in two or more categorical treatments. In this case, a covariate is a continuous variable that is correlated to the response (Y) variable and therefore “explains” some of the variation in Y in addition to that explained by the categorical treatment variable. For example, a NPS watershed study might use Pre- and Post- BMP time

Basic Terms^1

Categorical Variable: A variable that can take on one of a limited, and usually fixed, number of possible values (e.g., seasons).
Continuous Variable: A variable that can take on any value between its minimum and maximum value (e.g., flow rate).
Control: The absence of treatment with BMPs or other land treatment. Pertains to the control watershed in NPS monitoring studies.
Control Variable: A water quality variable (e.g., nitrate) measured in a control watershed at the same time it is also measured in the treatment watershed, resulting in a paired observation.
Covariate: Essentially equivalent to explanatory variable.
Dependent or Response Variable: The “Y” variable in an equation, typically the primary water quality variable of interest in NPS watershed studies.
Explanatory Variable: Variable that affects the relationship between the primary water quality variable of interest and the primary land treatment variable of interest (e.g., flow).
Factor: A variable that influences the value of the primary variable. Independent and explanatory variables are factors influencing the value of the primary water quality variable of interest in NPS watershed studies.
Independent Variable: Each “X” variable in an equation (e.g., trend variable, land treatment variable such as acres with cover crops, control watershed water quality variable, and other explanatory variables such as flow or season).
LS-Means: The mean values of Y for each time period that have been adjusted for explanatory variable values.
Primary Variable: The water quality variable of primary interest (e.g., total phosphorus).
Treatment: The application of BMPs or land treatment during a monitoring study. Occurs in the treatment watershed of a NPS monitoring study.

(^1) Definitions are tailored to the purposes of this Tech Note.

These two designs incorporate two monitoring periods (calibration and treatment periods) to allow for comparisons of statistically valid relationships established between paired observations of the same primary variable in the two watersheds before and after BMPs are implemented. Differences in the paired-observations relationships from the two monitoring periods are used as evidence of the effects of the BMPs. In these studies, the primary variable(s) is considered a response variable when measured in the treatment watershed and an explanatory variable when measured in the control watershed.

Explanatory variables play an important role in the analysis of data from other monitoring designs as well, including above/below and single-station trend designs (Dressing and Meals 2005). These weaker monitoring designs, in fact, generally rely more on the use of explanatory variables to tease out the effects of BMPs on measured water quality because they do not have the built-in control of the two stronger designs described above. This creates a need, for example, to use flow, precipitation, land use, and other factors in statistical analyses to account for their influence on the measured parameter(s) of interest.

Tracking of relevant meteorologic, hydrologic, and land use factors is essential to document the impacts of land management and BMPs on water quality. With this information, analysts can account for the influence of non-BMP factors to more accurately interpret the impacts of NPS management. Observed changes (or lack thereof) could be artifacts of hydrologic and/or meteorologic variability or some other hidden variable that also changes over time (Hirsch et al., 1982; Joiner, 1981; Baker 1988). Therefore, the addition of explanatory variables helps ensure an unbiased estimate of the true differences over time due to BMP implementation.

The ability to detect trends can be increased by the incorporation of explanatory variables into trend models, thereby decreasing the unexplained variance in the models. For the same reason, the amount of change in water quality needed to be able to detect statistically significant changes is decreased (Spooner et al. 2011). In addition, use of explanatory variables may also minimize the influence of outlier observations (Joiner, 1981). Adjustment for explanatory variables such as stream discharge can also reduce autocorrelation (i.e., correlation between the current observation and the past or adjacent observations), which will increase the effective sample size and increase the power to detect trends.

Explanatory Variables Commonly Used in NPS Watershed Studies

This section lists and describes various types of explanatory variables that can be of importance in NPS monitoring efforts. How to incorporate these variables into monitoring designs is addressed in the subsequent section.

Watershed Design Variables from the “Control Watershed”

The control variables that are measured as a direct part of the experimental design (for paired-watershed, above/below-before/after, or nested-pair designs) are explanatory variables (Table 1). These paired observations from the control watershed could be from the same date, the same time period for composite samples, or from the same storm event as those from the treatment watershed. For example, weekly, flow-weighted composite samples taken at the outlet of both control and study (or above/below) watersheds would satisfy this requirement.

Table 1. Explanatory variables from control watersheds.

| Watershed Design | Control Explanatory Variables |
|---|---|
| Paired watershed | Concentration or load values from the control watershed that can be paired with the treatment watershed water quality values |
| Above/Below-Before/After | Concentration or load values from the upstream watershed that can be paired with the treatment watershed |
| Nested watershed | Concentration or load values from the non-treated watershed that can be paired with the treatment watershed |

BMPs and Land Use

The basic hypothesis associated with NPS watershed implementation projects is that implementation of BMPs or other land management measures will cause an improvement in water quality, so it follows that measurement of this activity is essential. Quantitative documentation of land treatment trends is a necessary step in linking water quality to land treatment in statistical analysis.

Examples of quantitative measures of land treatment include:

• Number or percent of watershed animal units under animal waste management
• Acres or percent of cropland in cover crops or residue management
• Annual manure-based nutrient or fertilizer application rate and extent
• Extent and capacity of stormwater infiltration practices

Land use changes can influence water quality in a number of ways, including changing hydrology (e.g., increased impervious surface), altering temperature regimes (e.g., decreased shading of stream), and modifying pollutant source areas (e.g., cropland converted to pasture). These changes must also be recorded to help isolate the impact of BMPs and land treatment on measured water quality. Land use modifications that could affect water quality include:

• Conversion from pasture to row crops or changes in cropping patterns
• Agricultural set-asides

in sediment concentrations from field runoff. Diurnal variations were caused by irrigation schedules, and seasonal variations were characterized by maximum sediment concentrations in June and July with a dramatic drop during July due to declining erosion rates after cultivation.

Explanatory variables that can be used to account for seasonal changes include:

• Monthly or seasonal indicator variables
• Sine and cosine trigonometric functions
• Other explanatory variables that also exhibit seasonal patterns (e.g., streamflow)

Time series models can also incorporate seasonality using a “differencing” technique.

Details on how to calculate explanatory variables for each of these seasonal adjustment approaches are given in Attachment 1.
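As a rough illustration of the first two approaches listed above, the sketch below builds monthly indicator variables and one pair of sine/cosine terms for an annual cycle. It is only a sketch under stated assumptions: the function names, the choice of January as the reference month, and the 365.25-day cycle length are illustrative and are not taken from Attachment 1, which may calculate these variables differently.

```python
import math
from datetime import date

def seasonal_terms(d, period_days=365.25):
    """Return (sine, cosine) terms describing one annual cycle at date d.

    period_days is an assumed cycle length, not a value from the Tech Note.
    """
    day_of_year = d.timetuple().tm_yday
    angle = 2.0 * math.pi * day_of_year / period_days
    return math.sin(angle), math.cos(angle)

def month_indicators(d, reference_month=1):
    """Return 11 zero/one indicator variables, omitting the reference month.

    Dropping one month avoids perfect collinearity with a model intercept.
    """
    months = [m for m in range(1, 13) if m != reference_month]
    return [1 if d.month == m else 0 for m in months]

# Hypothetical sampling dates.
sample_dates = [date(2013, 1, 15), date(2013, 4, 15), date(2013, 7, 15)]
for d in sample_dates:
    s, c = seasonal_terms(d)
    print(d.isoformat(), round(s, 3), round(c, 3), month_indicators(d))
```

Either set of columns can then be supplied as "X" variables in a regression model alongside the trend variable.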

Meteorologic and Hydrologic Variables

Meteorologic and hydrologic processes also contribute to the variability in water quality data, often accounting for a portion of the seasonal variation noted above due to seasonal patterns in rainfall amount and intensity.

Hydrologic and meteorologic variables include:

• Stream discharge/flow (stage height is sometimes a surrogate)
• Antecedent flow conditions prior to a storm
• Storm volume
• Duration of time to peak of storm hydrograph
• Rising or falling limb of storm hydrograph
• Direction of the change in flow
• Magnitude/peak of event maximum discharge
• Precipitation
• Storm event intensity and frequency
• Ground water table depth
• Humidity
• Salinity
• Water or air temperature

How to Determine Which Explanatory Variables are Most Important to Measure and Incorporate into Trend Analyses

Some watershed projects begin with a dataset that can be explored for relationships between primary and explanatory variables. Many projects, however, begin with no data or such a small dataset that possibilities for analysis are limited. In these cases, project personnel should examine data from nearby, similar watersheds and examine the literature for information to guide selection of explanatory variables. Past studies have shown that a number of explanatory variables are generally useful in most projects. Where projects base selection of explanatory variables on information from similar watersheds or from the literature, it is important to confirm these relationships in the current study as data are collected.

It is important to keep in mind that monitoring designs should begin with clear goals and an outline of data analysis plans designed to determine if these goals have been met (Dressing and Meals 2005). The types and uses of explanatory variable data needed for statistical analyses should be considered and specified before monitoring begins. Reassessment of the value of selected explanatory variables is a necessary component of data analysis, and exploratory analysis of new data may reveal relationships between variables that were not expected. For these and other reasons, projects should examine data frequently (e.g., monthly) to ensure that the monitoring program is on track to meet objectives. The information below is designed to help both projects with existing data and those that are essentially starting from scratch.

General Rules of Thumb

Projects should apply some basic rules of thumb when selecting explanatory variables, whether or not they begin with a rich dataset.

• The date should be associated with every variable value, thus allowing assignment of month or season to address seasonal considerations.
• The literature has many examples of relationships between flow measurements and pollutant concentrations and loads (Baker 1988, Foster 1980, Johnson et al. 1969, Lowrance and Leonard 1988, and Schilling and Spooner 2006), so flow or a flow surrogate (e.g., stage) should be measured whenever possible.
• Runoff begins with precipitation and a multitude of studies has shown the effects of rainfall intensity and amount on runoff quality and amount, so precipitation should be measured or weather data obtained from a nearby existing weather station.
• Information on land use and ground cover is essential to most projects, particularly given that BMPs are generally targeted on the basis of land use and management.

Box and whisker plots can reveal important explanatory variables. For example, if data are stratified into groupings of a potential explanatory variable, inspection of the plots may reveal its importance. In this application, concentration/load values would be on the Y-axis, and groupings of the potential explanatory variable on the X-axis. Examples may include data stratified by season, baseflow and stormflow, or land management types. Visual inspection of medians and extreme values may indicate the need to include these variables as explanatory variables.
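The stratification step behind such a plot can be sketched without any plotting library by computing the summary values a box and whisker plot displays for each group. The concentrations and group labels below are hypothetical, and the "inclusive" quartile method is an assumption; plotting packages may use a different quartile convention.

```python
from statistics import quantiles

def box_summary(values):
    """Minimum, quartiles, and maximum shown by a box and whisker plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

def stratified_summaries(records):
    """Group (label, value) pairs by label and summarize each group."""
    groups = {}
    for label, value in records:
        groups.setdefault(label, []).append(value)
    return {label: box_summary(vals) for label, vals in groups.items()}

# Hypothetical suspended sediment concentrations (mg/L) by flow regime.
records = [
    ("baseflow", 4.0), ("baseflow", 6.0), ("baseflow", 5.0),
    ("baseflow", 7.0), ("baseflow", 5.5),
    ("stormflow", 40.0), ("stormflow", 55.0), ("stormflow", 120.0),
    ("stormflow", 80.0), ("stormflow", 65.0),
]

summaries = stratified_summaries(records)
for label, summary in summaries.items():
    print(label, summary)
```

A large separation between group medians, as in this made-up example, would suggest that flow regime is worth carrying into the trend model.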

Time series plots of water quality variable values versus time can reveal seasonal patterns in data. For example, weekly flow data from the Corsica River, MD, Clean Water Act Section 319 National Nonpoint Source Monitoring Program (NNPSMP) Project show a seasonal pattern that should be accounted for in the monitoring program (Figure 2).

Figure 2. Time series of weekly flow from the Three Bridges Branch subwatershed of Corsica River National Nonpoint Source Monitoring Program Project.

Although graphical approaches can help to reveal strong candidates for explanatory variables, they are not rigorous statistical approaches and do not reveal potential correlations between the explanatory variables being considered (e.g., multicollinearity). Examples of more advanced statistical approaches are provided below.

Statistical Measures to Determine Important Explanatory Variables

Univariate analyses

Correlation and regression analysis between primary and explanatory variables can help identify important relationships that can guide selection and use of explanatory variables in subsequent data analysis. Correlation analysis is supported by both spreadsheet and advanced statistical software. The output of a correlation analysis between two variables includes the correlation coefficient (r), which ranges from -1 to 1, and a probability value indicating the statistical significance of the correlation. The regression of a “Y” variable on an “X” variable reveals if there is a significant relationship as well, but also yields information on the significance, magnitude, and direction of the slope of the relationship. Similarly, correlation or regression analysis of primary variables with seasonal explanatory variables (e.g., sine/cosine seasonal components, monthly indicator variables) can be performed to test for significant seasonal patterns.
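The correlation and regression screening described above can be sketched in a few lines. The data are hypothetical log-transformed flow and total phosphorus values, and the function names are illustrative. The sketch stops at the t statistic for the slope; in practice the probability value would be read from a t distribution with n - 2 degrees of freedom using statistical software.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares slope, intercept, and t statistic for the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # requires a non-perfect fit
    return slope, intercept, slope / se_slope

# Hypothetical paired observations: log10 flow (X) and log10 total phosphorus (Y).
log_flow = [0.5, 0.9, 1.2, 1.6, 2.0, 2.3]
log_tp = [-1.9, -1.6, -1.5, -1.1, -0.9, -0.6]

r = pearson_r(log_flow, log_tp)
slope, intercept, t_slope = simple_regression(log_flow, log_tp)
print("r =", round(r, 3), "slope =", round(slope, 3), "t =", round(t_slope, 2))
```

The sign of the slope gives the direction of the relationship, and its magnitude indicates how strongly the primary variable responds to the candidate explanatory variable.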

Analysis of variance (ANOVA) and the non-parametric Kruskal-Wallis test are methods that can be used to test for differences between seasons. ANOVA analyzes the differences between group means whereas Kruskal-Wallis uses ranks to test whether samples originate from the same distribution.
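To make the contrast between the two tests concrete, the sketch below computes the one-way ANOVA F statistic from group means and the Kruskal-Wallis H statistic from ranks. The seasonal groups are hypothetical, the function names are illustrative, and the H statistic here omits the tie correction that statistical packages usually apply. Probability values are not computed; they would come from the F and chi-square distributions, respectively.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    ranks = average_ranks(all_values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Hypothetical concentrations grouped by season.
seasons = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print("F =", anova_f(seasons), "H =", kruskal_wallis_h(seasons))
```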

Another approach to determining if a seasonal element exists in a dataset is to examine the autocorrelation structure, or the similarity between observations as a function of the time lag between them. This type of test is generally not available in spreadsheet software, but is commonly found in statistical software packages. A seasonal component in a data time series can be indicated by a strong positive autocorrelation at the seasonal lag value corresponding to the length of the seasonal cycle. For example, an annual cycle will appear as a strong positive autocorrelation at lag 12 when the data consist of monthly values. Negative autocorrelations may also appear at lag intervals corresponding to one-half of the seasonal cycle length. So, while seasonality introduces variability to a dataset, it can also introduce autocorrelation, which is discussed in greater detail under Data Examination and Required Adjustments. Thus, attention must be given to seasonality in the analysis of trends in NPS watershed studies, both to explain some of the variability in the primary variable and to adjust for seasonally-based autocorrelation to ensure valid results.
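The lag-12 and lag-6 behavior described above can be checked on a synthetic series. The sketch below uses a common sample autocorrelation estimator and an artificial monthly series with a pure annual cycle; both the estimator choice and the series are assumptions made for illustration, and real monitoring data would be noisier.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Synthetic monthly values over six years with an annual cycle (hypothetical).
monthly = [10.0 + 5.0 * math.sin(2.0 * math.pi * m / 12.0) for m in range(72)]

r12 = autocorrelation(monthly, 12)  # lag equal to the seasonal cycle length
r6 = autocorrelation(monthly, 6)    # lag equal to half the cycle length
print("lag 12:", round(r12, 3), "lag 6:", round(r6, 3))
```

The lag-12 value is strongly positive and the lag-6 value strongly negative, which is the signature of an annual cycle in monthly data.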

Multivariate analyses

Multivariate statistical procedures such as factor analysis, principal component analysis (PCA), and canonical correlation analysis (CCA) are advanced procedures that can be used to define (and perhaps subsequently adjust for) complex relationships among variables such as precipitation, flow, season, land use, or agricultural activities that influence NPS problems. These procedures require a rich dataset that many projects will not have before monitoring begins. Projects can also use these methods later, however, to analyze newly collected data to strengthen regression analyses.

Projects with robust historic datasets can apply PCA and factor analysis to help determine the most important water quality indicators and stressors, aiding in the selection of water quality and land use/treatment variables to be used in the monitoring program. PCA is a multivariate technique for examining linear relationships among several quantitative variables, particularly when the variables are correlated to each other. This technique can be used to determine the relative importance of each independent variable and determine the relationship among several variables. The results of PCA can often be enhanced

over time that is consistent in direction, but not necessarily linear. Ramp trends may include time periods of little change (e.g., pre-BMP) followed by improving trends as BMP implementation occurs, and perhaps a leveling out when maximum water quality improvement has been achieved. The methods presented in this section focus on step, linear, and monotonic trends.

Statistical Test Assumptions

The degree to which the data meet test assumptions must be assessed to ensure appropriate application of either parametric or nonparametric tests. Assumptions for the residuals^4 from parametric trend tests are generally:

• Data are normally distributed and independent
• Variance is homogenous (i.e., variance doesn’t change over time)
• Residuals from the regression models are independent and normally distributed

Clearly, some of these tests can be performed prior to trend analysis, whereas others such as testing of residuals are completed as part of the trend analysis.

Data Examination and Required Adjustments

Exploratory data analysis (EDA) procedures should be applied to determine if a dataset satisfies the requirements of planned statistical tests. Readers are referred to Meals and Dressing (2005) for detailed information on EDA and data transformation in addition to what is presented below.

Data Distribution and Transformation

Most statistics software packages contain a range of options for testing whether a dataset meets the distributional requirements of a statistical test, while spreadsheet software may be limited to tests for kurtosis and skewness. Nonpoint source datasets are often characterized by skewness caused by a long right tail in the distribution (i.e., higher values typically occurring during high flows). While many data transformations are possible, the log-transformation is most commonly used in NPS watershed studies to reduce skewness and enable valid results from parametric statistical trend tests. Data should be re-tested after transformation to confirm that test requirements are met.
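A quick way to see the effect of the log-transformation is to compare a skewness coefficient before and after transforming. The concentrations below are hypothetical, and the moment-based skewness formula is one of several in common use, so values may differ slightly from those reported by a particular software package.

```python
import math

def skewness(values):
    """Moment-based skewness coefficient (third moment over variance^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical concentrations (mg/L) with a long right tail from high-flow samples.
conc = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.5, 9.0, 30.0]
log_conc = [math.log10(v) for v in conc]

print("raw skewness:", round(skewness(conc), 2))
print("log10 skewness:", round(skewness(log_conc), 2))
```

In this made-up example the transformation reduces skewness but does not remove it entirely, which is why re-testing after transformation is still needed.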

Autocorrelation

Time series data collected through monitoring of water resources often exhibit autocorrelation (also called serial correlation or dependent observations) where the value of an observation is closely related to a previous observation (usually the one immediately before it). Autocorrelation in water quality observations is usually positive in that high values are followed by high values and low values are followed by low values. For example, streamflow data often show autocorrelation, as numerous high wet-weather flows tend to occur in sequence, while low values follow low values during dry periods. Autocorrelation can also be introduced by seasonality in a dataset.

(^4) Residuals are the differences between the observed and predicted values of the dependent variable (Y) in statistical trend analysis.

Autocorrelation can affect statistical trend analyses and their interpretations because it reduces the effective sample size (degrees of freedom). Adjustment for autocorrelation is needed to ensure that trend tests yield valid results. For example, in a typical weekly or biweekly water quality dataset with positive autocorrelation, the significance of simple step and linear trends given by the test statistic is artificially increased if autocorrelation is not considered in the trend analysis, in some cases indicating a trend when it does not exist. In these cases, autocorrelation can be addressed by using a software regression program that incorporates the autocorrelation in the error term, for example PROC AUTOREG by SAS (SAS Institute 2010). Alternatively, a correction of the standard deviation of the slope estimate and revised confidence intervals can be used (see p. 11 of Spooner et al. 2011). Aggregating data by computing monthly means or medians from weekly data throughout the period of record will reduce autocorrelation, but this approach also reduces the sample size and information content of the dataset.
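The loss of information from positive autocorrelation can be illustrated with a widely used first-order approximation, n_eff = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation. This is offered only as a sketch: it assumes a first-order autoregressive error structure and is not necessarily the correction given in Spooner et al. (2011) or the model fitted by PROC AUTOREG. The function names and the sample size are hypothetical.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom

def effective_sample_size(n, r1):
    """Approximate effective sample size under an assumed AR(1) structure."""
    return n * (1.0 - r1) / (1.0 + r1)

# Hypothetical case: 104 weekly samples with lag-1 autocorrelation of 0.5.
n_weekly = 104
n_eff = effective_sample_size(n_weekly, 0.5)
print("effective sample size:", round(n_eff, 1))
```

Under these assumptions, two years of weekly samples carry roughly the information of about 35 independent observations, which is why unadjusted test statistics overstate significance.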

Trend Analysis: Statistical Models and Examples

The following sections provide details on appropriate statistical models to use for analysis of step and linear trends and examples using sample datasets accessible by the reader. Brief summaries of step and linear trend approaches are provided for readers with limited expertise or interest in statistics, followed by more detailed discussions for those with greater interest or expertise. Discussions highlight ways to incorporate explanatory variables into the analyses. Additional considerations are highlighted in Attachment 1.

Step Trends

Summary of Statistical Approach

Analysis of covariance (ANCOVA) is the most appropriate parametric test for assessing a step trend between mean water quality values from before and after BMPs are implemented. This method incorporates explanatory variables to isolate the effects of the BMPs. The appropriate statistical model will either accommodate a change in both slope and mean or just a change in mean. Explanatory variables can be added to either model. A t-test is performed to determine if there is a significant difference between the mean Y values (adjusted for explanatory variables) from the two periods.

Detailed Discussion of Statistical Method

The graphical depictions of conceptualized step trends in Figure 4 can be used to help select the appropriate statistical trend analysis model for NPS studies using the paired-watershed, above/below-before/after, and single-station trend monitoring designs. In the

suitable for isolating the effect of BMPs. The ANOVA or t-test model becomes ANCOVA when explanatory variables are added to the model. ANCOVA combines the features of ANOVA with regression (Snedecor and Cochran 1989) and can be used to compare LS-mean values from each period instead of simply comparing the unadjusted means.

When applied to the analysis of paired-watershed data (Figures 4A and 4B), ANCOVA is used both (a) to compare pre- and post-BMP regression equations between water quality measurement values (e.g., sediment concentration/load) for the treatment and control watersheds and (b) to test for differences in the average value (e.g., of sediment concentration/load) for the treatment watershed between the two time periods after adjusting for measured values of the control watershed and other explanatory variables.

In the analysis of an above/below-before/after watershed design (Figures 4C and 4D), the control variable consists of the upstream values (e.g., concentrations/loads), which are paired with the values obtained from the monitoring site downstream of BMP treatment. The ANCOVA is used to determine if significant changes occurred in the downstream values in the post-BMP period compared to the pre-BMP period, after adjustment for variations in the upstream values.

In the analysis of a single-station step trend design (Figures 4E and 4F), the control variable consists of values of the hydrologic variable (or other appropriate explanatory variable), which are paired with the values obtained from the monitoring site downstream of BMP treatment. The ANCOVA is used to determine if significant changes occurred in the downstream values in the post-BMP period compared to the pre-BMP period, after adjustment for variations in the values of the explanatory variable(s).

There are two basic steps to performing ANCOVA:

1. Determine the proper form of the statistical trend model, considering both whether the slopes are the same in the pre- and post-BMP periods and whether explanatory variables should be included.
2. Calculate the adjusted means (LS-means) and their confidence intervals to determine if there is a significant difference in the water quality pollutant values between the two periods. This is the estimate for the magnitude of change between the pre- (calibration) and post- (treatment) BMP periods.

The trend model that allows for different slopes for the pre- and post-BMP periods in the regression of the treatment watershed variable (Y-axis) on the control watershed variable (X-axis) is called the “Full Model” (Figures 4B, 4D, and 4F). If there is no statistically significant evidence of different slopes, a “Reduced Model” that assumes the same slope for each time period is appropriate (Figures 4A, 4C, and 4E). For example, in the paired-watershed study:

• Full Model: The slope of the relationship between the treatment watershed concentrations/loads and the control watershed concentrations/loads changes from the calibration period to the treatment period (Figure 4B). A change in slope indicates that pollutant concentrations for the treatment watershed exhibited a different response, or a response of different magnitude, to the conditions that also produced changes in the control watershed values after BMPs were applied, as compared to the calibration period.

• Reduced Model: The slope of the relationship between the treatment watershed concentrations/loads and control watershed concentrations/loads remains constant throughout both time periods (Figure 4A).

The homogeneity of slopes (i.e., same or different slopes) is tested using the full model to determine which of these statistical trend models is appropriate by evaluating the significance of the interaction term ($b_3$ in Equation 1). The full model for the paired-watershed or above/below-before/after watershed designs is:

$$Y_t = b_0 + b_1 X_{1t} + b_2 X_{2i} + b_3 (X_{1t} \cdot X_{2i}) + \sum_{c} b_c X_{ct} + e_t$$

Equation 1. Full regression model.

Where: $t$ = time of sample (e.g., date the sample was taken; could also be sequential, such as day, week, or month since sampling began)

$i$ = time period (e.g., pre-BMP or “Calibration” period, or post-BMP or “Treatment” period)

$Y_t$ = observation for $Y$ at time $t$ (e.g., weekly pollutant concentration or load from the treatment watershed or downstream monitoring station)

$X_{1t}$ = observation for $X_1$ at time $t$ ($X_1$ is the pollutant concentration or load from the control watershed or upstream monitoring station that is paired with $Y_t$)

$X_{2i}$ = Step Trend Variable value in period $i$ (e.g., “0” for the “Calibration” period and “1” for the “Treatment” period). Because the values are not continuous, $X_2$ is a categorical variable.

$(X_{1t} \cdot X_{2i}) = X_3$ = interaction term that enables different regression slopes for the pre- and post-BMP periods

$X_{ct}$ = observation for $X_c$ (covariate or explanatory variable) at time $t$

$b_0$ = y-intercept of the pre-BMP (calibration) period regression line (i.e., during the period for which $X_{2i} = 0$)

$b_1$ = slope of the pre-BMP (calibration) period regression line (i.e., during the period for which $X_{2i} = 0$)

• $X_{2i}$ is also used to depict the step trend for the single-station trend design. $X_{1t}$ becomes the key explanatory variable (e.g., stream flow), and other explanatory variables ($X_c$) can be included as appropriate.

This full statistical model allows the slopes to be different for each time period. Substituting the values of $X_{2i}$ into Equation 1 for the calibration ($X_{21} = 0$) and treatment ($X_{22} = 1$) periods, respectively, yields Equations 2 and 3:

$$Y_t = b_0 + b_1 X_{1t} + \sum_{c} b_c X_{ct} + e_t$$

Equation 2. Full regression model for the calibration period.

$$Y_t = (b_0 + b_2) + (b_1 + b_3) X_{1t} + \sum_{c} b_c X_{ct} + e_t$$

Equation 3. Full regression model for the treatment period.

The homogeneity of slopes is determined by looking at the statistical significance of the interaction term, $b_3$, in the statistical software program output. The full model is the correct model if the interaction term is significant. If there is no evidence for separate slopes, then a reduced model with the same slopes assumed for each group (based on pooled data) should be used.
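The homogeneity-of-slopes check can be sketched in plain Python as an ordinary least squares fit of the full model without additional covariates. The function names, the simulated data, and the omission of covariates and autocorrelation adjustment are all assumptions made for illustration; in practice a statistics package would report the same coefficients together with p-values.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_full_model(y, x1, x2):
    """OLS fit of Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2); returns (coefs, t_stats)."""
    rows = [[1.0, a, p, a * p] for a, p in zip(x1, x2)]
    k, n = 4, len(y)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t_stats = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = s2 * solve(xtx, unit)[i]  # i-th diagonal of s2 * (X'X)^-1
        t_stats.append(coefs[i] / math.sqrt(var_i))
    return coefs, t_stats


# Simulated paired data: 10 calibration and 10 treatment observations.
x1 = [float(v) for v in range(1, 11)] * 2    # control watershed values
x2 = [0.0] * 10 + [1.0] * 10                 # 0 = calibration, 1 = treatment
noise = [0.1 * (-1) ** j for j in range(10)]  # small deterministic scatter
y = [2.0 + 1.0 * x + e for x, e in zip(x1[:10], noise)] + \
    [1.0 + 0.5 * x + e for x, e in zip(x1[10:], noise)]

coefs, t_stats = fit_full_model(y, x1, x2)
print("b0, b1, b2, b3:", [round(c, 3) for c in coefs])
print("t statistic for b3:", round(t_stats[3], 2))
```

The t statistic for $b_3$ is compared against a critical value from the t distribution with n - 4 degrees of freedom. If it is not significant, the interaction column is dropped and the reduced model is fitted instead.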

When the reduced model with common slopes is used, the interaction term is dropped and the trend model is rerun. Equation 4 would then be used to describe the linear regression for each time period (i) which would have the same slope, but be allowed to have different intercepts:

$$Y_t = b_0 + b_1 X_{1t} + b_2 X_{2i} + \sum_{c} b_c X_{ct} + e_t$$

Equation 4. Reduced regression model.

Where: $b_1$ = slope of both the pre-BMP (calibration) and post-BMP (treatment, $X_{2i} = 1$) period regression lines

$(b_0 + b_2)$ = y-intercept of the post-BMP ($X_{2i} = 1$) period regression line

$Y_t$, $X_{1t}$, $X_{2i}$, $c$, $d$, $b_0$, $b_2$, $b_c$, $X_c$, and $e_t$ are defined as above.

Finally, to test for a statistically significant trend, the LS-means and their confidence intervals are examined. LS-means correct for the bias in the $X_1$ and $X_c$ values between the pre- and post-BMP periods. The LS-mean of each period (pre- and post-BMP periods) is the period mean for $Y$ ($Y_i$) adjusted to the overall mean value of each of the $X_1$ and $X_c$ values. In other words, the LS-means are the calibration and treatment period regression values for the treated watershed evaluated at the mean of all the control watershed and explanatory values over both time periods (e.g., mean of all the $X$ values). Operationally, inserting the mean of all $X$ values into the regression equations for the calibration and treatment periods and evaluating the equations for the estimated adjusted value of $Y_i$ will yield the LS-mean values for each period, respectively. A t-test on the adjusted LS-means then determines if there is sufficient evidence to conclude that the adjusted LS-mean for the treatment period is different from the adjusted LS-mean for the calibration period. Most statistics software provides this information. The red lines in Figure 4 indicate the comparison of LS-means from the pre-BMP and post-BMP periods. For example, in Figure 4A, for the same concentration in the control watershed, there is a lower LS-mean value for the treatment watershed in the post-BMP period, indicating an improvement in water quality after BMP implementation.
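The operational description of LS-means can be illustrated with a short sketch. It assumes a model with a single control variable $X_1$ and no additional covariates, and it takes already-fitted coefficients as inputs. The function name `ls_means` and the numeric values are hypothetical.

```python
def ls_means(b0, b1, b2, b3, x1_all):
    """Evaluate both period regression lines at the overall mean of X1.

    b0, b1 -- intercept and slope of the calibration-period line
    b2     -- change in intercept for the treatment period
    b3     -- change in slope for the treatment period (0 for the reduced model)
    x1_all -- X1 observations pooled over BOTH periods
    """
    x_bar = sum(x1_all) / len(x1_all)
    calibration = b0 + b1 * x_bar
    treatment = (b0 + b2) + (b1 + b3) * x_bar
    return calibration, treatment


# Hypothetical reduced-model coefficients (b3 = 0) and pooled control values.
x1_pooled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cal_mean, trt_mean = ls_means(b0=2.0, b1=1.0, b2=-1.0, b3=0.0, x1_all=x1_pooled)

print("LS-mean, calibration:", cal_mean)
print("LS-mean, treatment:  ", trt_mean)
print("adjusted change:     ", trt_mean - cal_mean)
```

For the reduced model the difference between the two LS-means equals $b_2$ regardless of where the lines are evaluated. For the full model the difference depends on the mean of $X_1$, which is one reason the comparison needs care when slopes differ.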

Caution must be used when interpreting the results of comparing adjusted means in the full model with individual slopes. When the slopes are not parallel, a comparison of adjusted means may not address the most meaningful question. One may be more interested in the behavior over the entire range of X. For example, the regression lines may cross, potentially indicating a breakpoint where BMP effectiveness takes effect as described by Meals (2001). In this case a graphical presentation may be most appropriate.
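Where the two period lines cross can be worked out directly. The short derivation below assumes a full model in which the calibration line has intercept $b_0$ and slope $b_1$, the treatment line has intercept $(b_0 + b_2)$ and slope $(b_1 + b_3)$, and any additional covariates are held fixed. The symbol $X_1^{*}$ for the crossing point is introduced here for illustration only.

```latex
\begin{aligned}
\hat{Y}_{\text{treatment}} - \hat{Y}_{\text{calibration}}
  &= \left[(b_0 + b_2) + (b_1 + b_3) X_1\right] - \left[b_0 + b_1 X_1\right] \\
  &= b_2 + b_3 X_1 \\[4pt]
b_2 + b_3 X_1^{*} = 0
  \quad &\Longrightarrow \quad
  X_1^{*} = -\frac{b_2}{b_3}, \qquad b_3 \neq 0
\end{aligned}
```

The estimated difference between periods changes sign at $X_1^{*}$, so a single adjusted-mean comparison describes the change only at the overall mean of $X_1$. If $X_1^{*}$ falls within the observed range of $X_1$, plotting both lines over that range conveys more than the LS-mean comparison alone.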

Step Trend Example

Analysis of step trends is illustrated in Attachment 2 using data from Sinbad Creek (a simulated dataset based upon a watershed study). Step trend analysis was chosen for this example because implementation of livestock exclusion and pasture management occurred rapidly between the two monitoring sites and a step improvement in water quality was anticipated. Weekly TP and TSS loads were simulated from weekly grab samples and continuous flow monitoring conducted before and after BMP implementation.

Linear Trend Over Time Analysis

Summary of Statistical Approach

The most appropriate parametric test for gradual trends is regression analysis. This approach requires paired observations of the primary variable and any explanatory variables used in the statistical model. Statistical models can be selected that address linear or ramp trends, and all appropriate explanatory variables (e.g., control watershed values, discharge, BMPs) can be added to the model. Simple linear regression involves a single explanatory variable, while multiple linear regression incorporates two or more