PARTICIPANT WORKBOOK

Creating an Analysis Plan

Created: 2013

Creating an Analysis Plan. Atlanta, GA: Centers for Disease Control and Prevention (CDC), 2013.

Introduction

The Creating an Analysis Plan training module is one of three modules that will provide you with the skills needed to analyze and interpret quantitative^1 noncommunicable disease (NCD) data. When you apply these quantitative analysis skills, you will turn data into information that can be used to make informed decisions on public health program and policy recommendations.

An analysis plan helps you think through the data you will collect, what you will use it for, and how you will analyze it. Creating an analysis plan is an important way to ensure that you collect all the data you need and that you use all the data you collect. Analysis planning can be an invaluable investment of time. It can help you select the most appropriate research methods and statistical tools. It will ensure that the way you collect your data and structure your database will help you get reliable analytic results.

LEARNING OBJECTIVES

Given information about a noncommunicable disease (NCD) health problem and a request for health-related information, you will be able to create an analysis plan that includes the following:

  • Research question(s) and/or hypotheses,
  • Dataset(s) to be used,
  • Inclusion/exclusion criteria,
  • Variables to be used in the main analysis,
  • Statistical methods and software to be used, and,
  • Table shells to prepare for:
      o Univariable analysis,
      o Bivariable analysis,
      o Calculating measures of association, and,
      o Assessing for confounding and effect measure modification.

[Figure: Data into Action module series: Creating an Analysis Plan, Managing Data, Analyzing and Interpreting Large Datasets]

(^1) Collecting, analyzing, and reporting qualitative data is a valuable epidemiologic skill that requires careful consideration but will not be covered in this module.

ESTIMATED COMPLETION TIME

The workbook should take between 6 and 7 hours to complete.

TARGET AUDIENCE

The workbook is designed for FETP residents who specialize in NCDs; however, you can also complete the module if you are working in the infectious disease area.

PRE-WORK AND PREREQUISITES

Before participating in this training module, you must complete training in:

  • Basic epidemiology and surveillance
  • Basic analysis

ABOUT THIS WORKBOOK AND THE ACTIVITY WORKBOOK

The format of the Participant Workbook consists of 3 sections. You will read information about creating an analysis plan and complete 2 exercises to practice the skills and knowledge learned. At the end of the training module you will access the Activity Workbook and complete a skill assessment which combines all skills taught.

ICON GLOSSARY

The following icons are used in this workbook:

  • Activity Icon: Activity, exercise, assessment or case study that you will complete
  • Stop Icon: Stop and consult with your facilitator/mentor for further instruction
  • Tip Icon: Supplemental information, or key idea to note and remember

Section 1: Overview of Data Analysis

STEPS TO COMPLETE BEFORE ANALYZING DATA

There are several steps you must complete before you analyze data. For this training, these steps have been divided into two modules – Create an Analysis Plan and Manage Data. The main tasks are as follows:

1. Create an analysis plan
   - Identify research questions and/or hypotheses.
   - Select and access a dataset.
   - List inclusion/exclusion criteria.
   - Review the data to determine the variables to be used in the main analysis.
   - Select the appropriate statistical methods and software.
   - Create table shells.

2. Manage the data
   - Create a data dictionary.
   - Create a working copy of the dataset.
   - Clean the data in the working file.
   - Create an analysis file.
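The "create table shells" task above can be made concrete with a short sketch. A table shell is a table with its title, row labels, and column headings filled in but its cells left blank until the analysis is run. The function name `table_shell`, the table title, and the row and column labels below are all hypothetical choices for illustration and do not come from the workbook.

```python
def table_shell(title, row_labels, column_labels, stub="Characteristic"):
    """Return a plain-text table shell: labelled rows and columns, blank cells."""
    # Width of the first (label) column and of each data column.
    stub_width = max(len(label) for label in [stub, *row_labels])
    widths = [max(len(col), 6) for col in column_labels]

    header = " | ".join(
        [stub.ljust(stub_width)]
        + [col.ljust(w) for col, w in zip(column_labels, widths)]
    )
    rule = "-" * len(header)
    # Each body row has the label followed by blanks to be filled in later.
    body = [
        " | ".join([label.ljust(stub_width)] + ["_" * w for w in widths])
        for label in row_labels
    ]
    return "\n".join([title, header, rule, *body])


# Hypothetical univariable table shell for a survey population.
shell = table_shell(
    "Table 1. Characteristics of survey respondents (hypothetical)",
    ["Sex: male", "Sex: female", "Age 18-44 years", "Age 45+ years"],
    ["n", "%", "95% CI"],
)
print(shell)
```

Drafting shells like this before touching the data is a quick check that the dataset contains every variable the planned tables need.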

OVERVIEW OF STEPS IN ANALYZING NCD DATA

The focus of this and subsequent data analysis modules is on analyzing NCD data. In order to accurately create an analysis plan, you should be familiar with the steps in analyzing NCD data, for example, from a large complex survey. The steps are as follows:

1. Conduct univariable analyses:

   i. Review characteristics of the population. Describe the sample population by person, place, and time characteristics. Summarize variables using population-level frequencies and calculate stratified frequencies across important subgroups (if any). Determine the frequency distribution of these characteristics.

   ii. Determine frequency of outcome variables. The objective of many surveys is to estimate the prevalence of one or more health-related behaviors, practices, or outcomes, such as seat-belt use, smoking, hypertension, or diabetes. Univariable analyses should include the frequency distribution of these variables and also confidence intervals for the prevalence estimates.

2. Conduct bivariable analyses:

   i. Display the data in two-variable tables. Sometimes the characterization of the population can be extended to two-variable tables, such as age by sex. For surveys in which the objective is to estimate prevalence, the data are often analyzed by population characteristics. For example, you can use two-variable tables to determine whether the prevalence varies by sex or education level. For analytic studies in which the objective is to quantify associations between exposures and outcomes, the two-variable table displays the core result, with rows representing levels of exposure and columns representing presence or absence of the outcome.

   ii. Compute and interpret measures of association. Determine the magnitude of association between an exposure variable and an outcome variable. If there are two or more populations, consider comparing their demographic data to determine whether they were different before the study/analysis was conducted.

   iii. Calculate confidence intervals and/or statistical significance. Use confidence intervals to quantify the precision of your estimates. Use t-tests for continuous data, chi-square tests for categorical data, and other statistical tests as appropriate for the data to determine whether the results are “statistically significant.”

   iv. Assess for effect measure modification. Effect measure modification (also known as “effect modification”) is present when an effect measure, such as a risk ratio or odds ratio for an exposure-disease relationship, differs across levels of a third variable such as sex, age, or geographic location. This is evaluated through statistical assessment of interaction between variables.

   v. Assess the effect of potential confounders.
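Steps 2.i to 2.iii above can be illustrated with a minimal sketch for a single two-variable (2x2) table. It computes a risk (or prevalence) ratio and an odds ratio with approximate 95% confidence intervals on the log scale, plus an uncorrected chi-square test with one degree of freedom. The function name and the cell counts are hypothetical, and the sketch assumes unweighted data with no empty cells; it is not the workbook's own procedure, and complex survey data would need design-based methods instead.

```python
import math


def two_by_two_measures(a, b, c, d, z=1.96):
    """Measures of association for a 2x2 table (all cells must be > 0).

                   outcome   no outcome
    exposed           a          b
    unexposed         c          d
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)

    # Risk ratio (a prevalence ratio in a cross-sectional survey), log-scale CI.
    risk_ratio = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    rr_ci = (risk_ratio * math.exp(-z * se_log_rr),
             risk_ratio * math.exp(z * se_log_rr))

    # Odds ratio with a Woolf (log-scale) CI.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (odds_ratio * math.exp(-z * se_log_or),
             odds_ratio * math.exp(z * se_log_or))

    # Pearson chi-square (no continuity correction), 1 degree of freedom.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square(1) variable.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {
        "risk_ratio": risk_ratio, "risk_ratio_ci": rr_ci,
        "odds_ratio": odds_ratio, "odds_ratio_ci": or_ci,
        "chi2": chi2, "p_value": p_value,
    }


# Hypothetical counts: 40 of 100 exposed and 20 of 100 unexposed have the outcome.
result = two_by_two_measures(a=40, b=60, c=20, d=80)
for name, value in result.items():
    print(name, value)
```

Running the same function separately within each level of a third variable (for example, once for men and once for women) gives stratum-specific estimates that can be compared when assessing effect measure modification.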

a. Nominal: (define) ______________________________

   i. Nominal data with just two values can be called ______________________________

   ii. Example: ______________________________

   iii. Nominal data can be assigned a ______________________________ to facilitate analysis.

b. Ordinal: (define) ______________________________

   i. Example: ______________________________

Continuous (scale) data are: ______________________________

Two types of continuous data are: ______________________________

a. Interval (define): ______________________________

   i. Example: ______________________________

b. Ratio (define): ______________________________

   i. Example: ______________________________

  • the National Health and Nutrition Examination Survey (NHANES - U.S.), and,
  • the Behavioral Risk Factor Surveillance System (BRFSS - U.S., Jordan).

The databases typically are representative of a population either through a census (all persons included) or a sample (number of people selected to represent the population). For example, NHANES 1999–2000 interviewed 9,965 persons in the United States, and the database includes hundreds of variables. Before attempting data analysis for large datasets, it is very important you locate the survey sampling methodology, questionnaire, data variable dictionary and any other supporting documentation.

Tip: Because you most likely did not create the dataset, you must take the time to understand the dataset in its entirety.

Some of the questions you should answer about the dataset are:

  • Who owns the database?
  • How can you get access to the database?
  • Do you need permission to use the database?
  • Does the database cost anything to use?
  • Are there rules about storing the database?
  • What was the purpose of the study?
  • What are the study hypotheses?
  • What methods were used to identify (select) the population under study and gather information from them?
  • How were the data collected, entered, and checked for quality control?
  • In what program (e.g., MS Access/Epi Info, Excel, SQL, etc.) and in what format is the dataset stored (e.g., text, ASCII, comma-delimited, etc.)?
  • How many records are in the dataset?
  • Were weights used?^2
  • What is the number of observations?

(^2) Use weights to account for complex survey design (including oversampling), survey non-response, and post-stratification. When a sample is weighted, it is representative of the population.

Determine the original purpose of the data and the sponsor or collector of the data. Then determine the study design and methods. Identify whether the data include:

  • all persons in the population of interest (census),
  • a sample representative of the population (e.g., a probability sample such as a simple random sample or a cluster sample), or,
  • a sample not representative of the population (e.g., a non-probability sample such as a convenience sample or a purposive sample).

Determine whether the dataset contains the variables you need to answer the research questions. Assess how complete and recent the data are. Determine whether you need to conduct a new survey to obtain the required data.

Verify that appropriate instruments were used to collect the data. Keep the questionnaire and codebook (data dictionary) accessible. You can use the data dictionary to learn the coding scheme and the variable names. (In the Managing Data module you will learn how to create a data dictionary.)

INCLUSION/EXCLUSION CRITERIA

Describe the criteria you will use to determine which records to analyze. For example, if you have data from an entire country or region but you work in a particular district, your inclusion criteria might include “all records of participants residing in District X.” Similarly, if you are assigned to the Diabetes Unit and you are analyzing hospital discharge data, your inclusion criteria might be “all hospital discharge records with ICD-10 codes E10 to E14.” You might exclude readmissions within 3 days of a previous discharge (which are likely a continuation of the previous problem or a complication from the previous hospitalization rather than a new episode). If your intention is to look at discharge planning, you would exclude any patient who died while hospitalized.
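The hospital discharge example above can be written as an explicit filter, which is a useful way to check that the criteria are unambiguous. The sketch below is only an illustration: the record layout, field names, and sample records are hypothetical, and it assumes that a "previous discharge" means the patient's most recent earlier discharge among diabetes-coded records.

```python
from datetime import date

# Hypothetical discharge records; every field name here is an assumption.
records = [
    {"record_id": 1, "patient_id": 1, "icd10": "E11.9",
     "admitted": date(2012, 1, 5), "discharged": date(2012, 1, 9), "died": False},
    {"record_id": 2, "patient_id": 1, "icd10": "E11.9",
     "admitted": date(2012, 1, 11), "discharged": date(2012, 1, 14), "died": False},
    {"record_id": 3, "patient_id": 2, "icd10": "I10",
     "admitted": date(2012, 1, 20), "discharged": date(2012, 1, 22), "died": False},
    {"record_id": 4, "patient_id": 3, "icd10": "E10.1",
     "admitted": date(2012, 2, 1), "discharged": date(2012, 2, 6), "died": True},
    {"record_id": 5, "patient_id": 1, "icd10": "E11.9",
     "admitted": date(2012, 3, 1), "discharged": date(2012, 3, 3), "died": False},
]


def is_diabetes_code(code):
    """True when the ICD-10 category (first three characters) is E10 to E14."""
    return code.strip().upper()[:3] in {"E10", "E11", "E12", "E13", "E14"}


def select_records(records, readmission_window_days=3, exclude_deaths=False):
    """Apply the inclusion and exclusion criteria and return the kept records."""
    selected = []
    last_discharge = {}  # patient_id -> most recent diabetes-coded discharge date
    for rec in sorted(records, key=lambda r: (r["patient_id"], r["admitted"])):
        # Inclusion: ICD-10 codes E10 to E14 only.
        if not is_diabetes_code(rec["icd10"]):
            continue
        previous = last_discharge.get(rec["patient_id"])
        last_discharge[rec["patient_id"]] = rec["discharged"]
        # Exclusion: readmission within the window after a previous discharge.
        if previous is not None and (rec["admitted"] - previous).days <= readmission_window_days:
            continue
        # Optional exclusion (e.g., a discharge-planning analysis): in-hospital deaths.
        if exclude_deaths and rec["died"]:
            continue
        selected.append(rec)
    return selected


kept = select_records(records)
kept_for_discharge_planning = select_records(records, exclude_deaths=True)
print([r["record_id"] for r in kept])
print([r["record_id"] for r in kept_for_discharge_planning])
```

Writing the criteria this way forces decisions the prose leaves open, such as whether "within 3 days" includes day 3, and those decisions belong in the analysis plan.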

VARIABLES TO BE USED IN THE MAIN ANALYSIS

The Analysis Plan should contain a list of variables^3 to analyze that will be kept in the analysis file (a computer file derived from the original data). For

(^3) The listing of variables (i.e., data dictionary) is taught in the Managing Data module.

STATISTICAL METHODS AND SOFTWARE

There are different statistical methods you will use depending on the research questions. For example, if you want to estimate the prevalence of a behavioral risk factor such as smoking or an outcome such as hypertension, you would first conduct a univariable analysis, then stratify by subgroups. If you need to determine the magnitude of association between an exposure variable and an outcome variable, you will conduct bivariable analysis. There are many quantitative statistical software packages to use for the analysis. Some examples are:

  • SPSS
  • STATA
  • SAS
  • SUDAAN
  • Epi Info
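The first workflow described above (estimate a prevalence, then stratify by subgroup) can be sketched in a few lines, whatever package is eventually used. The example below computes unweighted prevalence with a Wilson score 95% confidence interval, overall and by sex. The data, variable names, and function names are hypothetical, and the interval assumes a simple random sample, so it is not appropriate as-is for a complex survey design.

```python
import math
from collections import defaultdict


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def prevalence_by_group(rows, outcome, group):
    """Return {group value: (events, n, prevalence, ci_low, ci_high)}."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [events, total]
    for row in rows:
        counts[row[group]][0] += 1 if row[outcome] else 0
        counts[row[group]][1] += 1
    summary = {}
    for value, (events, n) in counts.items():
        low, high = wilson_interval(events, n)
        summary[value] = (events, n, events / n, low, high)
    return summary


# Hypothetical survey rows: 30 of 100 men and 15 of 100 women report smoking.
rows = (
    [{"sex": "male", "smoker": True}] * 30
    + [{"sex": "male", "smoker": False}] * 70
    + [{"sex": "female", "smoker": True}] * 15
    + [{"sex": "female", "smoker": False}] * 85
)

overall_events = sum(1 for row in rows if row["smoker"])
overall = (overall_events / len(rows), *wilson_interval(overall_events, len(rows)))
by_sex = prevalence_by_group(rows, outcome="smoker", group="sex")

print("overall", overall)
for sex, stats in by_sex.items():
    print(sex, stats)
```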

Suppose, for example, you are planning to conduct descriptive analysis on the most recent BRFSS study. Because analysis of BRFSS can involve weights for clustering of samples, you will likely need statistical software that can account for this weighting. You will also need a person with training in complex analyses or someone who can help you learn these analysis techniques.
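To see why the weighting matters, the following sketch compares an unweighted prevalence with a weighted one. The respondents, the weights, and the field names are invented for illustration and are not BRFSS values. The sketch produces a point estimate only; standard errors and confidence intervals for weighted survey data require design-based methods, which is one reason to use software built for complex surveys.

```python
def unweighted_prevalence(rows, outcome):
    """Share of respondents with the outcome, ignoring survey weights."""
    return sum(1 for row in rows if row[outcome]) / len(rows)


def weighted_prevalence(rows, outcome, weight):
    """Weighted share: each respondent counts as `weight` people in the population."""
    total_weight = sum(row[weight] for row in rows)
    weighted_events = sum(row[weight] for row in rows if row[outcome])
    return weighted_events / total_weight


# Hypothetical sample in which urban residents were oversampled:
# each urban respondent represents 100 people, the rural respondent 700.
respondents = [
    {"area": "urban", "hypertension": True, "weight": 100},
    {"area": "urban", "hypertension": False, "weight": 100},
    {"area": "urban", "hypertension": False, "weight": 100},
    {"area": "rural", "hypertension": True, "weight": 700},
]

crude = unweighted_prevalence(respondents, "hypertension")
weighted = weighted_prevalence(respondents, "hypertension", "weight")
print("unweighted:", crude)
print("weighted:", weighted)
```

In this made-up sample the unweighted estimate is 0.5 and the weighted estimate is 0.8, because the single rural respondent stands in for most of the population.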

Stop: Let the facilitator or mentor know you are ready for the group discussion.

KEY POINTS TO REMEMBER

Use the space below to record any key points from the facilitator-led discussion:

Activity: Practice Exercise #1 (Estimated Time: 45 minutes)

Hypertension case study

The past few decades have brought a new global phenomenon called the “nutrition transition” in many low- and middle-income countries.^4 This transition includes a large shift from traditional diets and lifestyles to one

(^4) Popkin, Barry. (2002) Stages of the Nutrition Transition: Dynamic Global Shifts Appear to be Accelerating. Available online as of 5/7/2008 at: http://www.cgdev.org/doc/events/9.10.07/Barry_Popkin_Presentation.pdf

deaths and about 50% of all NCDs (WHO, 2011a). In 2008, CVD caused an estimated 17 million deaths and led to 151 million DALYs. Common behavioral risk factors, including tobacco use, physical inactivity, unhealthy diet and the harmful use of alcohol, are responsible for approximately 80% of the global CVD burden.^8

[Figure: Global Burden of Disease. Disease mortality (axis 0% to 40%) by condition: diabetes, chronic respiratory diseases, other chronic diseases, injuries, cancer, other conditions*, cardiovascular diseases.]

Source: Adapted from Global health risks: mortality and burden of disease attributable to selected major risks. Geneva, World Health Organization,

*Includes communicable diseases, maternal and prenatal conditions, and nutritional deficiencies.

Raised blood pressure, or hypertension^9, is the leading risk factor for mortality and is ranked third as a cause of disability-adjusted life-years.^10 It has been estimated that hypertension resulted in 51% of stroke deaths and 45% of coronary heart disease deaths in 2008.^11 Mean blood pressure has decreased significantly in nearly all high-income countries due to widespread diagnosis and treatment along with access to low-cost medications. In contrast, mean blood pressure has been stable or increasing in most African countries; approximately 40% (and up to 50%) of adults in many of these countries are estimated to have high blood pressure. Most of these people remain undiagnosed, although many could be treated with low-cost medications; this would significantly reduce the risk of death and disability from heart disease and stroke.^12

Effective prevention strategies for NCDs, and specifically for reducing the burden of hypertension and CVDs, do exist. For example, efforts to reduce sodium consumption have been identified as a cost-effective means of reducing and reversing hypertension. One approach is to increase consumer awareness and education through campaigns that encourage dietary change within households, including low-sodium alternatives and salt substitutes. Another proven strategy is working with industry to encourage manufacturers to voluntarily reduce the salt content of processed foods and condiments. Recent estimates indicate that implementation of a salt reduction program could avert 8.5 million deaths globally. Combining this with implementing the WHO Framework Convention on Tobacco Control could save an additional 5.5 million lives. This would be at a cost of less than USD $0.40 per person per year in low-income and lower middle-income countries, and

(^8) Gaziano TA, Bitton A, Anand S, Abrahams-Gessel S, Murphy A. Curr Probl Cardiol. 2010 Feb;35(2):72-115.

(^9) Hypertension is defined as blood pressure 140/90 mmHg or above most of the time. If blood pressure is > 120/80 mmHg and < 140/90 mmHg, it is called pre-hypertension. Normal blood pressure is 120/80. (Chobanian, A, Bakris, G. et al. The Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. JAMA. 2003;289:2560-91. www.nhlbi.nih.gov/guidelines/hypertension/jnc7full.pdf)

(^10) Asaria P, Chisholm D, Mathers C, Ezzati M, Beaglehole R. Chronic disease prevention: health effects and financial costs of strategies to reduce salt intake and control tobacco use. The Lancet 2007; 370(9604):2044-2053.

(^11) World Health Statistics: A Snapshot of Global Health. World Health Organization, 2012. Geneva, Switzerland. http://who.int/gho/publications/world_health_statistics/2012/en/index.html

(^12) Yach D, Hawkes C, Gould C, Hofman KJ. The Global Burden of Chronic Diseases: Overcoming Impediments to Prevention and Control. JAMA. 2004;291(21):2616-2622. doi:10.1001/jama.291.21.2616.