

































This document covers various aspects of the data analytics life cycle, including data acquisition, exploratory data analysis, predictive modeling, and data reporting. It discusses common data analysis techniques such as regression, data mining, and data representation. It also touches on project management considerations, stakeholder engagement, and ethical considerations in data analytics.
Typology: Exams


































What is an effective method for a data analyst to prepare for a one-on-one meeting with a manager? a. Make a written list of all source code comments b. Ask other inside employees about the manager's reputation c. Bring a set of questions to draw on to keep the conversation going d. Create an essay summarizing steps in the source code
Numerical measurements of the amount of a toxic chemical substance are recorded in a large database. Which hypothesis can the data analyst answer through exploratory data analytic methods? a. The chemical will not cause harm to the habitat's native species. b. The chemical contamination is a result of human activity. c. The statistical distribution of the chemical measurements is normal. d. The best analytic approach for analyzing the data is linear regression.
c. Hypothesis testing d. Measures of central tendency
b. Cleaning data c. Identifying outliers d. Identifying business needs - ANSWER d. Identifying business needs
In which phase of the data analytics life cycle does an analyst build a histogram? a. Data acquisition b. Data exploration c. Discovery d. Predictive modeling - ANSWER b. Data exploration
An analyst applies a statistical formula to obtain the average temperature for a city over the last 50 years. Which phase of the data analytics life cycle is represented by this activity? A. Data acquisition B. Exploratory data analysis C. Predictive modeling D. Data reporting - ANSWER B. Exploratory data analysis
An analyst has been tasked with defining data columns that could contain null values. Which activity of the data acquisition phase is represented? a. Collecting data b. Disqualifying data sources c. Detecting missing values d. Transforming improperly formatted text - ANSWER c. Detecting missing values
Which activity in the data analytics life cycle occurs during the data acquisition phase and requires the most time and effort from the data analyst? a. Selecting the data sources b. Importing data into a database c. Cleaning the data d. Defining goals - ANSWER c. Cleaning the data
What might be developed by data analysts when acquiring data from a data warehouse? a. The procedures for extracting files from the data warehouse b. The procedures for updating tables in the data warehouse c. The relational structure of tables d. The SQL queries of data within the tables - ANSWER d. The SQL queries of data within the tables
What can be identified using a box plot? a. Frequency b. Correlation c. Interquartile range d. Mean - ANSWER c. Interquartile range
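To make the box plot answer concrete, here is a minimal sketch of how the interquartile range and the usual 1.5 x IQR outlier fences can be computed. The temperature values and the function names are illustrative assumptions, not taken from the exam, and the "inclusive" quartile method is only one of several common conventions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the values a box plot draws."""
    # method="inclusive" treats the data as the whole population of interest;
    # other quartile conventions can give slightly different results.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def iqr_and_outliers(values):
    """Return the interquartile range and the points outside the 1.5*IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return iqr, outliers


# Hypothetical daily temperatures with one suspicious reading.
temps = [12, 14, 15, 15, 16, 17, 18, 19, 35]
iqr, outliers = iqr_and_outliers(temps)
print("IQR:", iqr, "outliers:", outliers)
```

The box in a box plot spans Q1 to Q3, so its height is the IQR; points beyond the fences are the ones typically drawn as individual outlier markers.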
What will be a consequence of poor attention to detail during the data exploration phase? a. Not enough variables will be considered in the analysis. b. The outcome of the analysis will be misaligned to business needs. c. The analyst will lack insight into the structure of the data set. d. The model will be built using the wrong data set. - ANSWER c. The analyst will lack insight into the structure of the data set.
Which aspect of data exploration occurs when an analyst writes code to compile a bar graph of dog food sales per month? a. Performance of a correlation analysis b. Analysis of data anomalies c. Verification through visualization d. Determination of variabilities - ANSWER c. Verification through visualization
An oil company uses robots and sensors to detect how pipeline corrosion changes over time. The collected data is then used in a predictive model that estimates when a pipe should be replaced. How does the predictive model serve this oil company? a. To minimize interruptions from maintenance shutdowns b. To minimize the need for workforce safety training c. To improve compliance with pipeline construction standards d. To improve compliance with pipeline disposal standards - ANSWER a. To minimize interruptions from maintenance shutdowns
During which phase in the data analytics life cycle would a churn analysis be performed? a. Data cleaning b. Data acquisition c. Predictive analysis d. Representation and reporting - ANSWER c. Predictive analysis
Which mistake is commonly made during the predictive analytics phase? a. The data are separated into different sets b. The variables are separated into response and independent variables c. The data are prepared before the model is developed d. The model is developed before the research question is known - ANSWER d. The model is developed before the research question is known
Why might a data analyst resample a data set with replacement in a data mining project? a. Misidentification of causation due to correlation b. Wrong variables chosen for analysis c. Too little data for training and testing data sets d. Skewed data resulting from outliers - ANSWER c. Too little data for training and testing data sets
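Resampling with replacement is commonly called bootstrapping. The sketch below shows the basic idea on a small, made-up sample; the function name, the sample values, and the choice of the mean as the statistic are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Draw resamples with replacement and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        # random.choices samples WITH replacement, so a value can repeat.
        resample = rng.choices(sample, k=len(sample))
        means.append(statistics.fmean(resample))
    return means


# Hypothetical small data set: too few rows for a comfortable train/test split.
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.4, 8.8]
means = bootstrap_means(sample)
print("mean of bootstrap means:", statistics.fmean(means))
print("spread of bootstrap means:", statistics.stdev(means))
```

Each resample has the same size as the original data, which is how a small data set can be reused many times to estimate the variability of a statistic or to build multiple training sets.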
d. Define business needs at the onset of a project - ANSWER c. Maintain data on the IT infrastructure
What is an example of an external stakeholder for a data analytics project? a. President/CEO b. Project manager c. Regulatory body d. Data analyst's supervisor - ANSWER c. Regulatory body
Which party has the primary vision for a data analytics project and brings resources to complete it? a. Project sponsors b. Project managers c. Customers d. Data analysts - ANSWER a. Project sponsors
What does the critical path represent in data analytics project management? a. Minimum time to complete independent tasks b. Maximum time to complete independent tasks c. Minimum time to complete dependent tasks d. Maximum time to complete dependent tasks - ANSWER c. Minimum time to complete dependent tasks
A data analytics project manager has been asked to complete a project on a very short timeline. Which action is likely to yield positive results? a. Outsource the skilled work to an unproven vendor b. Expand the team with experienced staff c. Require current team to work overtime d. Accept lowered quality standards - ANSWER b. Expand the team with experienced staff
Which type of project management problem occurs when a data mining task has started but a data acquisition task has not been completed? a. Scope b. Schedule c. Procedure d. Cost - ANSWER b. Schedule
How can an organization improve interprofessional communication among team members? a. By setting work priorities for team members b. By requiring weekly updates on project deadlines c. By using tools that provide a team-based collaboration space d. By ensuring employees can recite the desired outcomes - ANSWER c. By using tools that provide a team-based collaboration space
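For the critical path question above, a small sketch may help: the longest chain of dependent tasks sets the minimum time in which the project can finish. The task names, durations, and dependencies below are hypothetical, chosen only to resemble a data analytics project.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (duration in days, list of prerequisite tasks).
TASKS = {
    "define_goals": (2, []),
    "acquire_data": (5, ["define_goals"]),
    "clean_data": (8, ["acquire_data"]),
    "explore_data": (3, ["clean_data"]),
    "build_model": (6, ["explore_data"]),
    "draft_report_template": (4, ["define_goals"]),
    "report": (2, ["build_model", "draft_report_template"]),
}


def critical_path(tasks):
    """Return (minimum project duration, list of tasks on the critical path)."""

    @lru_cache(maxsize=None)
    def earliest_finish(name):
        duration, prereqs = tasks[name]
        # A task can start only when its slowest prerequisite has finished.
        start = max((earliest_finish(p) for p in prereqs), default=0)
        return start + duration

    # The task that finishes last ends the critical path; walk back from it
    # through whichever prerequisite finished latest at each step.
    last = max(tasks, key=earliest_finish)
    path = [last]
    while tasks[path[-1]][1]:
        path.append(max(tasks[path[-1]][1], key=earliest_finish))
    path.reverse()
    return earliest_finish(last), path


duration, path = critical_path(TASKS)
print(duration, "days:", " -> ".join(path))
```

In this made-up plan, drafting the report template runs in parallel and does not affect the finish date, while any delay in a task on the critical path delays the whole project.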
A data analyst needs to contact a specific member of the database administration team. Which method should be used to discover the person's email address? A. Ask the project's customers B. Ask the project's sponsors C. Send an email to project stakeholders D. Send an email to the team member's manager - ANSWER D. Send an email to the team member's manager
Which feature is commonly found in collaboration tools like Jira, Slack, Teams, and PivotalTracker? a. Real-time messaging b. Multivariate analysis c. Equation editor d. Source code management - ANSWER a. Real-time messaging
Which action can the project manager take to keep the team engaged in the analytics project? a. At the end of the project, the team publishes an extensive research report and includes it in an email to project stakeholders. b. Throughout the project, the project manager communicates insights from the data analytics team and provides ideas of ways to act on those insights. c. At the end of the project, the project manager sends an email with the predictive model to the stakeholders so they can use it. d. Throughout the project, the project manager holds regular meetings so the entire data analytics team can showcase their work to different departments. - ANSWER b. Throughout the project, the project manager communicates insights from the data analytics team and provides ideas of ways to act on those insights.
Data scientists are able to find ______, _________, and _____ in unstructured data. - Ans - order, meaning, and value
What is involved in the planning phase? - Ans - 1. Defining goals
__________ describes the data that is present: mean, median, mode, counting things. How many of each size and color of shirt were sold in the last month? Do we sell more shirts in the summer vs. winter? - Ans - Descriptive analysis
__________ makes predictions about the future state of the business, such as forecasting volumes. Based on last summer and winter, what will we sell next year? - Ans - Predictive analytics
__________ is analysis with an end goal of making a recommendation. What colors and sizes of shirts should we sell to maximize profits? - Ans - Prescriptive analytics
__________ is just looking at any variable over time. - Ans - Time series analysis
__________ is a programming language that is specific to statistics. It also has capabilities to visualize data. - Ans - R
__________ is a multipurpose programming language that has libraries that extend its capabilities to do statistical analysis. - Ans - Python
__________ are platforms that specialize in visualization. This is where you can make graphs and charts for presentations and data storytelling to executive leaders. - Ans - Tableau and Power BI
__________ are instant messaging platforms that facilitate communication in a faster, but less formal, way than email. - Ans - Teams, Slack
A European Union law requiring that its citizens give informed consent to, and be able to request or delete, the personal data you collect about them. - Ans - GDPR
When the researching organization consciously ignores data that calls its results into question or only presents the side of the results that puts it in a positive light. - Ans
In this phase, the analyst begins to understand the basic nature of data and the relationships within it. This phase often relies on the use of data visualization tools and numerical summaries, such as measures of central tendency and variability. - Ans - Data Exploration
__________ enables an analyst to move beyond describing the data to creating models that enable predicting outcomes of interest. - Ans - Predictive Modeling
Tools such as __________ play an important role in automating the training and use of models. - Ans - Python and R
In this phase, an analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses. - Ans - Reporting and Visualization
Even if you have a wide spread of a variable, let's say, age in a population, and you take lots of sample groups, the mean age of those sample groups would tend to have a normal distribution. - Ans - Central Limit Theorem
This is the phase of collecting data. Frequently, data will be retrieved from a database, perhaps a component of a data warehouse, by using a language like SQL. - Ans - Data Acquisition
"Collect the data" is synonymous with __________ - Ans - data acquisition
Exploring the data could be seen either in "__________" or "__________" - Ans - Prepare the data, Create a model
Predictive or data mining models could be considered in the "__________" grouping. - Ans - Create a model
__________ examines the distances between each point and the closest point to it, and then compares these to expected values for a random sample of points from a CSR (complete spatial randomness) pattern. - Ans - Nearest Neighbor
__________ is a simple mathematical formula used for calculating conditional probabilities. - Ans - Bayes' Theorem
Interactive dashboard tools, such as __________, give even the novice user the ability to interact with the data and spot trends and patterns. - Ans - Tableau
Data Acquisition (Step 5), Data Cleaning (Step 6), and Data Exploration (Step 7) in this framework all fall under the "__________" domain. - Ans - "Wrangling" domain.
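Since Bayes' Theorem is described above only as "a simple mathematical formula," a short worked sketch may be useful. The formula is P(A|B) = P(B|A) P(A) / P(B). The churn scenario and all of the probabilities below are invented for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Total probability of observing B, whether or not A holds.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 10% of customers churn; a warning flag fires for
# 80% of churners and for 20% of customers who stay.
p_churn_given_flag = bayes_posterior(prior=0.10, likelihood=0.80,
                                     false_positive_rate=0.20)
print(f"P(churn | flag) = {p_churn_given_flag:.3f}")
```

Even with a fairly sensitive flag, the posterior stays well below 50% here because churn is rare to begin with, which is the kind of conditional-probability reasoning the theorem formalizes.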
True or false: Data science can be done without machine learning. - Ans - True
If a person feels that they have been harmed by a decision made by a neural network, such as a refused loan application, they can sue the organization. - Ans - right to explanation
__________ is data that is characterized by any or all of three characteristics: unusual volume, unusual velocity, and unusual variety. - Ans - Big data
__________ analytics is about causation. - Ans - Prescriptive
The gold standard for establishing cause and effect is what's called a __________ - Ans - Randomized controlled trial (RCT)
__________ are a whole host of research designs that let you use correlational data to try to estimate the size of the causal relationship between two variables. - Ans - quasi-experiments
You can do a very good __________ without needing everything that goes into data science. - Ans - prescriptive analysis
__________ may be, at least in theory, impossible. But __________ can get you close enough for any practical purposes and help put you and your organization on the right path to maximizing the outcomes that are most important to you. - Ans - Causality, prescriptive analytics
__________ is all about getting the insight to do something better in your business. - Ans - Business intelligence
You can get the analytics and see how well this is performing, who's watching it, and when. That's a __________ of a form. - Ans - business intelligence dashboard
Two of the most important things you can do in business intelligence are to __________, to predict what's likely to happen next, and to __________. - Ans - find trends, flag anomalies
__________ is what makes business intelligence possible. - Ans - Data science
__________ really shows to the best extent how data science can be used to make practical decisions that make organizations function more effectively and more efficiently. - Ans - Business intelligence
We have open-source programming languages like __________ that make more rigorous data analysis inexpensive and relatively easy as well. - Ans - R and Python
We can convey key performance indicators of our business to __________, __________, and __________ using dashboards. - Ans - executives, management, and employees
We can convey complex information about our business to a wider audience using __________ that allow users to rapidly consume and digest data. - Ans - infographics
__________ analytics answers what has happened in the past. - Ans - Descriptive
__________ data is information that is gathered in non-numerical form that is typically __________ and may be recoded to try to quantify its meaning. - Ans - Qualitative, descriptive
__________ data includes things such as: summaries of written comments on customer cards collected from suggestion boxes at stores, results from interviews of store managers by an outside consultant, a paragraph taken from an employee's self-evaluation on a performance review. - Ans - Qualitative
Data is made up of a set of __________, the individual units being measured. - Ans - Observations
The __________ is the middle number in a series that is arranged from smallest to largest. - Ans - median
The __________ is the most commonly occurring number in the dataset. - Ans - mode
The fact that the __________ is not close to the __________ or the __________ tells us the distribution of scores is skewed. The scores are not evenly distributed around the __________. - Ans - average, median, mode, mean
The _______ and the _________ inform a user of the central points in skew of the data.
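The relationship between mean, median, and mode described above can be seen in a few lines of code. This is a minimal sketch using made-up test scores with one extreme value; the function name and data are assumptions for illustration.

```python
import statistics


def central_tendency(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Hypothetical scores: most cluster in the 60s, one extreme value at 150.
scores = [55, 60, 60, 62, 65, 68, 70, 72, 150]
summary = central_tendency(scores)
print(summary)

# A mean well above the median suggests a long right tail (right skew).
if summary["mean"] > summary["median"]:
    print("The mean is pulled above the median: the scores look right-skewed.")
```

The single extreme score drags the mean upward while the median and mode barely move, which is why a gap between them is a quick signal of skew.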
__________ include mean, median, max, and min. - Ans - Descriptive methods
Data sources usually occur within some combination of four different types. The types are __________ - Ans - structured, unstructured, internal, and external
What does GDPR stand for? - Ans - European Union's General Data Protection Regulation
__________, which is the actual things that you end up with; __________, how the decisions are made; __________, how the decision is communicated. - Ans - distributive justice, procedural justice, interactional justice
__________ is where the algorithm processes your data and makes a recommendation or suggestion to you, and you can either take it or leave it. - Ans - Recommendations
__________ decision making is where advanced algorithms can make and even implement their own decisions, as with self-driving cars. - Ans - Human-in-the-Loop
__________ decision making - Many algorithmic decisions are made automatically, and even implemented automatically. But they're designed such that humans can at least understand what happened in them, as with an online mortgage application. - Ans - Human-Accessible
__________ decision making is when machines are talking to other machines. The best example of this is the Internet of Things, which can include things like wearables. My smart watch talks to my phone, which talks to the internet, which talks to my car, sharing and processing data at each point. - Ans - Machine-Centric
The Electronic Communications Privacy Act (ECPA) was passed by Congress in 1986, during the age of __________. - Ans - police wiretapping
__________ analytics answers what might happen in the future. - Ans - Predictive
__________ analytics attempts to answer the toughest question of all: what should we do going forward? - Ans - Prescriptive
__________ data is data that can be measured in numerical form. - Ans - Quantitative
A __________ means that the data are close together around the mean. - Ans - small standard deviation
What is the 68-95-99.7 rule? - Ans - 68% of the observations are within one standard deviation above or below the mean, 95% are within two standard deviations above or below the mean, and 99.7% are within three standard deviations above or below the mean.
The output of research in a business environment must not only drive value but also be simple enough and easy enough to __________. - Ans - consume by the end users
Who are the end users? - Ans - Business managers, executives, and even customers who are using the outputs of the research.
The three principal sources of U.S. law are: __________, __________, __________ - Ans - Common law or judge-made law, statutory law, and constitutional law
__________ is law created by courts. In the U.S. and other former British colonies, judges have authority to actually create rules of law that determine individual and organizational rights and responsibilities. - Ans - Common law
__________, by contrast, is law created by representative bodies such as the U.S. Congress. - Ans - Statutory law
__________ gives the government the authority to act and restricts that authority to ensure that the branches don't overstep their bounds or infringe unnecessarily on individual rights such as rights to fairness and equality. - Ans - Constitutional law
Privacy is sourced in common law in what we call __________. It's sourced in statutory law such as GINA and other legislation that protects financial, health, and student privacy. And it is sourced in the U.S. Constitution's amendments. - Ans - privacy torts
It is the traditional method for analyzing legal problems that arise in any context. The legal analysis framework we will be using in this course has four parts: __________, __________, __________, __________. - Ans - Issue, Rule, Application, and Conclusion (IRAC)
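The 68-95-99.7 rule quoted earlier in this set can be checked numerically. The sketch below uses Python's standard-library normal distribution; the helper name `share_within` is an assumption for illustration, and the rule holds only for data that is approximately normally distributed.

```python
from statistics import NormalDist


def share_within(k, dist=NormalDist()):
    """Return the share of a normal distribution within k standard deviations."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

Because the share depends only on the number of standard deviations, the same percentages apply to any normal distribution, whatever its mean and spread.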