Data Analytics Lifecycle and Techniques, Exams of Advanced Data Analysis

This document covers the data analytics lifecycle, including data acquisition, exploratory data analysis, predictive modeling, and data reporting. It discusses common data analysis techniques such as regression, data mining, and data representation, and it touches on project management considerations, stakeholder engagement, and ethics in data analytics.

Typology: Exams

2024/2025

Available from 10/17/2024

premium-essay
D204 The Data Analytic 2024 EXAM
QUESTIONS WITH ACCURATE ANSWERS
GUARANTEED PASS.
What is an effective method for a data analyst to prepare for a one-on-one meeting with
a manager?
a. Make a written list of all source code comments
b. Ask other inside employees about the manager's reputation
c. Bring a set of questions to draw on to keep the conversation going
d. Create an essay summarizing steps in the source code.
- ANSWER c. Bring a set of questions to draw on to keep the conversation going
What is a characteristic of active listening?
a. Actively working on a task while listening to the speaker
b. Seeking to understand the speaker's emotions and intent
c. Focusing intently on the content of the message
d. Waiting patiently to share one's own thoughts
- ANSWER b. Seeking to understand the speaker's emotions and intent
Which circumstances could cause a data analyst to have difficulty developing a model
to answer a business question?
a. Project scope creep
b. Poor project budgeting
c. Lack of relevant data sources
d. Lack of stakeholder support
- ANSWER c. Lack of relevant data sources
A data analytics project team is preparing to develop a predictive model that will be
included within a business intelligence tool for upper management. Which step should
be considered for inclusion when creating the project schedule?
a. Model testing and validation for users
b. Business intelligence tool interface training
c. Model training and testing for stakeholders
d. Business intelligence tool data transformation training
- ANSWER b. Business intelligence tool interface training
Which task would an analyst consider first during the discovery phase of the data
analytics lifecycle?
a. Seek out necessary data sources
b. Formulate a project plan
c. Identify project goals
d. Develop key metrics
- ANSWER c. Identify project goals

Numerical measurements of the amount of a toxic chemical substance are recorded in a large database. Which hypothesis can the data analyst answer through exploratory data analytic methods?
a. The chemical will not cause harm to the habitat's native species.
b. The chemical contamination is a result of human activity.
c. The statistical distribution of the chemical measurements is normal.
d. The best analytic approach for analyzing the data is linear regression.
- ANSWER c. The statistical distribution of the chemical measurements is normal.
A restaurant owner wants to sponsor a data analytics project to provide insights regarding hamburger sales before developing a strategy for increasing sales. Which question is framed appropriately for the data analytics project?
a. What are the characteristics of customers who buy hamburgers?
b. What does the supply and demand curve look like for hamburgers?
c. Which discount coupons should we send to neighborhood residents?
d. Which varieties of hamburgers are featured by competitors?
- ANSWER a. What are the characteristics of customers who buy hamburgers?
Which organizational objective could be accomplished with a descriptive data analytics project using website request logs as a data source?
a. Explain why web data transfer has increased 25%
b. Estimate the traffic increase for a new product launch
c. Improve the speed of server request processing
d. Recommend a strategy to increase network capacity
- ANSWER a. Explain why web data transfer has increased 25%
A travel website tabulated the results of their latest marketing campaign to understand the relationship of clicks-to-sales conversions. Which area of analytics does this activity represent?
a. Prescriptive
b. Proactive
c. Descriptive
d. Predictive
- ANSWER c. Descriptive
An analyst is looking at data that includes the customer's address, date of purchase, and age. Which question could be answered from this data?
a. Which customer has spent the highest dollar amount?
b. Which customer is most likely to respond favorably to the next marketing campaign?
c. Which state has the highest total customers?
d. Which product has sold the most in a certain state?
- ANSWER c. Which state has the highest total customers?
Which outcome should be expected when working with data aggregated from multiple sources? Select two answers.
a. Consistently named fields

- ANSWER c. Privacy
A specific drug is manufactured for the treatment of depression. The company decides to ignore research results on an alternative, less expensive, drug treatment in order to make higher profits. Which ASA ethical standard has the company violated?
a. Unfair discrimination
b. Reproducible results
c. Conflict of interest
d. Transparent assumptions
- ANSWER c. Conflict of interest
What do open-source software tools and widely available analysis tools, such as spreadsheets, help accomplish?
a. Data schemas
b. Data democratization
c. Data security
d. Data compliance
- ANSWER b. Data democratization
What is a feature of SQL? Choose 2 answers.
a. It is an object-oriented programming language.
b. The basic language is the same across database servers.
c. It has built-in chart and graph creation.
d. It is used with structured data and unstructured data.
- ANSWER b. The basic language is the same across database servers. d. It is used with structured data and unstructured data.
What is an example of unstructured data?
a. Names, dates, and addresses
b. Credit card numbers that include a credit score
c. Text messages that include video
d. Height, weight, and gender
- ANSWER c. Text messages that include video
Which tool should a researcher use to conduct a univariate analysis on complex statistical data?
a. Tableau
b. Power BI
c. R
d. SQL
- ANSWER c. R
Which statistical technique should be used to draw conclusions about an entire population based on a representative sample?
a. Correlation
b. Bayes' Theorem
c. Hypothesis testing
d. Measures of central tendency
- ANSWER c. Hypothesis testing
What is an example of random sampling of college students?
a. Surveying students chosen arbitrarily from around the entire college campus
b. Surveying every student in the college library
c. Surveying students chosen arbitrarily in the library of the university
d. Surveying every student on campus
- ANSWER a. Surveying students chosen arbitrarily from around the entire college campus
Which type of analysis would be used to predict a binary outcome based on a set of independent variables?
a. Hypothesis testing
b. Descriptive statistics
c. Regression
d. Time series
- ANSWER c. Regression
Which type of data analysis is appropriate if the goal is to minimize the cost of a diet, using a data set consisting of the following variables: protein content, fat content, and cost per unit?
a. Decision trees
b. Calculus
c. Optimization
d. Bayes' Theorem
- ANSWER c. Optimization
Which technique can be used to determine the likelihood that a positive diagnostic test result indicates whether the disease is actually present?
a. Bayes' Theorem
b. Central limit theorem
c. Regression
d. Optimization
- ANSWER a. Bayes' Theorem
Which concept should be considered when choosing variables for inclusion in a linear regression model?
a. Feasibility of merging the variables
b. Feasibility of controlling the variables
c. Feasibility of testing the variables
d. Feasibility of classifying the variables
- ANSWER b. Feasibility of controlling the variables
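
To make the diagnostic-test item above concrete, here is a minimal sketch of Bayes' Theorem applied to a positive test result. The prevalence, sensitivity, and specificity figures are hypothetical values chosen only for illustration, and the function name `posterior_probability` is likewise an assumption, not something from the exam.

```python
def posterior_probability(prior: float, sensitivity: float, specificity: float) -> float:
    """Return P(disease | positive test) using Bayes' Theorem."""
    # P(positive and disease)
    true_positive = sensitivity * prior
    # P(positive and no disease)
    false_positive = (1.0 - specificity) * (1.0 - prior)
    # Divide by the total probability of a positive result
    return true_positive / (true_positive + false_positive)


# Hypothetical figures, for illustration only
prior = 0.01        # 1% of the population has the disease
sensitivity = 0.95  # P(positive | disease)
specificity = 0.90  # P(negative | no disease)

posterior = posterior_probability(prior, sensitivity, specificity)
print(f"P(disease | positive) = {posterior:.3f}")
```

With these assumed inputs the posterior is well below the test's sensitivity, because false positives from the large healthy group outnumber true positives from the small affected group.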

b. Cleaning data
c. Identifying outliers
d. Identifying business needs
- ANSWER d. Identifying business needs
In which phase of the data analytics life cycle does an analyst build a histogram?
a. Data acquisition
b. Data exploration
c. Discovery
d. Predictive modeling
- ANSWER b. Data exploration
An analyst applies a statistical formula to obtain the average temperature for a city over the last 50 years. Which phase of the data analytics life cycle is represented by this activity?
A. Data acquisition
B. Exploratory data analysis
C. Predictive modeling
D. Data reporting
- ANSWER B. Exploratory data analysis
An analyst has been tasked with defining data columns that could contain null values. Which activity of the data acquisition phase is represented?
a. Collecting data
b. Disqualifying data sources
c. Detecting missing values
d. Transforming improperly formatted text
- ANSWER c. Detecting missing values
Which activity in the data analytics life cycle occurs during the data acquisition phase and requires the most time and effort from the data analyst?
a. Selecting the data sources
b. Importing data into a database
c. Cleaning the data
d. Defining goals
- ANSWER c. Cleaning the data
What might be developed by data analysts when acquiring data from a data warehouse?
a. The procedures for extracting files from the data warehouse
b. The procedures for updating tables in the data warehouse
c. The relational structure of tables
d. The SQL queries of data within the tables
- ANSWER d. The SQL queries of data within the tables
What can be identified using a box plot?
a. Frequency
b. Correlation
c. Interquartile range
d. Mean
- ANSWER c. Interquartile range
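
For the box plot item above, the interquartile range is the width of the box: the third quartile minus the first. The sketch below computes it with Python's standard `statistics.quantiles`; the sample values and the helper name `interquartile_range` are illustrative assumptions, and the "inclusive" quartile method is one of several conventions that plotting tools may use.

```python
import statistics


def interquartile_range(values):
    """Return Q3 - Q1, the spread of the middle half of the data."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical sample, for illustration only
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
iqr = interquartile_range(sample)
print(f"IQR = {iqr}")
```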

What will be a consequence of poor attention to detail during the data exploration phase?
a. Not enough variables will be considered in the analysis.
b. The outcome of the analysis will be misaligned to business needs.
c. The analyst will lack insight into the structure of the data set.
d. The model will be built using the wrong data set.
- ANSWER c. The analyst will lack insight into the structure of the data set.
Which aspect of data exploration occurs when an analyst writes code to compile a bar graph of dog food sales per month?
a. Performance of a correlation analysis
b. Analysis of data anomalies
c. Verification through visualization
d. Determination of variabilities
- ANSWER c. Verification through visualization
An oil company uses robots and sensors to detect how pipeline corrosion changes over time. The collected data is then used in a predictive model that estimates when a pipe should be replaced. How does the predictive model serve this oil company?
a. To minimize interruptions from maintenance shutdowns
b. To minimize the need for workforce safety training
c. To improve compliance with pipeline construction standards
d. To improve compliance with pipeline disposal standards
- ANSWER a. To minimize interruptions from maintenance shutdowns
During which phase in the data analytics life cycle would a churn analysis be performed?
a. Data cleaning
b. Data acquisition
c. Predictive analysis
d. Representation and reporting
- ANSWER c. Predictive analysis
Which mistake is commonly made during the predictive analytics phase?
a. The data are separated into different sets
b. The variables are separated into response and independent variables
c. The data are prepared before the model is developed
d. The model is developed before the research question is known
- ANSWER d. The model is developed before the research question is known
Why might a data analyst resample a data set with replacement data in a data mining project?
a. Misidentification of causation due to correlation
b. Wrong variables chosen for analysis
c. Too little data for training and testing data sets
d. Skewed data resulting from outliers
- ANSWER c. Too little data for training and testing data sets
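
The resampling item above can be illustrated with a short bootstrap sketch: when there is too little data, new samples of the same size are drawn from the original observations with replacement. The observations, the seed, and the helper name `bootstrap_sample` are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    # random.Random.choices samples with replacement by design
    return rng.choices(data, k=len(data))


rng = random.Random(42)             # fixed seed so the run is repeatable
observations = [12, 15, 9, 20, 17]  # hypothetical small data set

# Build many resampled data sets and record the mean of each one
resamples = [bootstrap_sample(observations, rng) for _ in range(1000)]
bootstrap_means = [sum(s) / len(s) for s in resamples]

print(f"lowest resampled mean:  {min(bootstrap_means):.1f}")
print(f"highest resampled mean: {max(bootstrap_means):.1f}")
```

The spread of `bootstrap_means` gives a rough sense of how much the sample mean could vary, even though only five original observations exist.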

d. Define business needs at the onset of a project
- ANSWER c. Maintain data on the IT infrastructure
What is an example of an external stakeholder for a data analytics project?
a. President/CEO
b. Project manager
c. Regulatory body
d. Data analyst's supervisor
- ANSWER c. Regulatory body
Which party has the primary vision for a data analytics project and brings resources to complete it?
a. Project sponsors
b. Project managers
c. Customers
d. Data analysts
- ANSWER a. Project sponsors
What does the critical path represent in data analytics project management?
a. Minimum time to complete independent tasks
b. Maximum time to complete independent tasks
c. Minimum time to complete dependent tasks
d. Maximum time to complete dependent tasks
- ANSWER c. Minimum time to complete dependent tasks
A data analytics project manager has been asked to complete a project on a very short timeline. Which action is likely to yield positive results?
a. Outsource the skilled work to an unproven vendor
b. Expand the team with experienced staff
c. Require current team to work overtime
d. Accept lowered quality standards
- ANSWER b. Expand the team with experienced staff
Which type of project management problem occurs when a data mining task has started but a data acquisition task has not been completed?
a. Scope
b. Schedule
c. Procedure
d. Cost
- ANSWER b. Schedule
How can an organization improve interprofessional communication among team members?
a. By setting work priorities for team members
b. By requiring weekly updates on project deadlines
c. By using tools that provide a team-based collaboration space
d. By ensuring employees can recite the desired outcomes
- ANSWER c. By using tools that provide a team-based collaboration space
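
The critical path item above describes the minimum time needed to finish a chain of dependent tasks. As a rough sketch, that time equals the longest duration path through the task dependency graph. The task names, durations, and dependencies below are hypothetical and exist only to show the calculation.

```python
from functools import lru_cache

# Hypothetical project plan: task -> (duration in days, prerequisite tasks)
tasks = {
    "acquire_data": (5, []),
    "clean_data": (8, ["acquire_data"]),
    "explore_data": (3, ["clean_data"]),
    "stakeholder_interviews": (4, []),
    "build_model": (6, ["explore_data", "stakeholder_interviews"]),
    "report": (2, ["build_model"]),
}


@lru_cache(maxsize=None)
def earliest_finish(task):
    """Earliest day a task can finish, given that all prerequisites finish first."""
    duration, prerequisites = tasks[task]
    # A task starts only after its slowest prerequisite is done
    latest_prerequisite = max((earliest_finish(p) for p in prerequisites), default=0)
    return latest_prerequisite + duration


# The project cannot finish sooner than its longest chain of dependent tasks
project_duration = max(earliest_finish(t) for t in tasks)
print(f"Minimum project duration: {project_duration} days")
```

In this assumed plan the interviews run in parallel with data work, so they do not lengthen the project; only the acquire, clean, explore, model, report chain does.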

A data analyst needs to contact a specific member of the database administration team. Which method should be used to discover the person's email address?
A. Ask the project's customers
B. Ask the project's sponsors
C. Send an email to project stakeholders
D. Send an email to the team member's manager
- ANSWER D. Send an email to the team member's manager
Which feature is commonly found in collaboration tools like Jira, Slack, Teams, and PivotalTracker?
a. Real-time messaging
b. Multivariate analysis
c. Equation editor
d. Source code management
- ANSWER a. Real-time messaging
Which action can the project manager take to keep the team engaged in the analytics project?
a. At the end of the project, the team publishes an extensive research report and includes it in an email to project stakeholders.
b. Throughout the project, the project manager communicates insights from the data analytics team and provides ideas of ways to act on those insights.
c. At the end of the project, the project manager sends an email with the predictive model to the stakeholders so they can use it.
d. Throughout the project, the project manager holds regular meetings so the entire data analytics team can showcase their work to different departments.
- ANSWER b. Throughout the project, the project manager communicates insights from the data analytics team and provides ideas of ways to act on those insights.
Data scientists are able to find __________, __________, and __________ in unstructured data. - Ans - order, meaning, and value
What is involved in the planning phase? - Ans -
  1. Define goals
  2. Organize resources
  3. Coordinate people
  4. Schedule project
What is involved in the wrangling phase? - Ans -
  5. Get data
  6. Clean data
  7. Explore data
  8. Refine data
What is involved in the modeling phase? - Ans -
  9. Create model
  10. Validate model
  11. Evaluate model
  12. Refine model

__________ describes the data that is present: mean, median, mode, counting things. How many of each size and color of shirt were sold in the last month? Do we sell more shirts in the summer vs. winter? - Ans - Descriptive analysis
__________ makes predictions about the future state of the business, for example forecasting volumes. Based on last summer and winter, what will we sell next year? - Ans - Predictive analytics
__________ is analysis with an end goal of making a recommendation. What colors and sizes of shirts should we sell to maximize profits? - Ans - Prescriptive analytics
__________ is just looking at any variable over time. - Ans - Time series analysis
__________ is a programming language that is specific to statistics. It also has capabilities to visualize data. - Ans - R
__________ is a multipurpose programming language that has libraries that extend its capabilities to do statistical analysis. - Ans - Python
__________ are platforms that specialize in visualization. This is where you can make graphs and charts for presentations and data storytelling to executive leaders. - Ans - Tableau and Power BI
__________ are instant messaging platforms that facilitate communication in a faster, but less formal, way than email. - Ans - Teams, Slack
A European Union law requiring informed consent from its citizens and giving them the ability to request or delete their own data that you collect. - Ans - GDPR
When the researching organization consciously ignores data that calls their results into question or only presents one side of the results that puts them in a positive light. - Ans - Conflict of interest
Sometimes data might not be available and the analyst will use tools such as web scraping or surveys to acquire it during which phase? - Ans - Data acquisition
The __________ states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger. For example, if you take 50 random people out of a population and compute their mean age, then take another 50 random people and compute their mean age, and so forth, all of those means would follow the normal distribution (bell curve). - Ans - Central Limit Theorem
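
The Central Limit Theorem item above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is a made-up, strongly skewed set of values (exponentially distributed with a mean near 40), the sample size of 50 follows the example in the text, and the names `sample_means`, `population`, and `means` are illustrative.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (not bell-shaped at all)
population = [rng.expovariate(1 / 40) for _ in range(20_000)]


def sample_means(population, sample_size, n_samples, rng):
    """Mean of each of n_samples random samples drawn from the population."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=50, n_samples=2000, rng=rng)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print(f"population mean:        {pop_mean:.2f}")
print(f"mean of sample means:   {statistics.fmean(means):.2f}")
print(f"spread of sample means: {statistics.stdev(means):.2f}")
print(f"pop_sd / sqrt(50):      {pop_sd / 50 ** 0.5:.2f}")
```

The sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size, even though the underlying values are skewed.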

In this phase, the analyst begins to understand the basic nature of data and the relationships within it. This phase often relies on the use of data visualization tools and numerical summaries, such as measures of central tendency and variability. - Ans - Data Exploration
__________ enables an analyst to move beyond describing the data to creating models that enable predicting outcomes of interest. - Ans - Predictive Modeling
Tools such as __________ play an important role in automating the training and using of models. - Ans - Python and R
In this phase, an analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses. - Ans - Reporting and Visualization
Even if you have a wide spread of a variable, let's say, age in a population, and you take lots of sample groups, the mean age of those sample groups would tend to have a normal distribution. - Ans - Central Limit Theorem
This is the phase of collecting data. Frequently, data will be retrieved from a database, perhaps a component of a data warehouse, by using a language like SQL. - Ans - Data Acquisition
"Collect the data" is synonymous with __________ - Ans - data acquisition
Exploring the data could be seen either in "__________" or "__________" - Ans - Prepare the data, Create a model
Predictive or data mining models could be considered in the "__________" grouping. - Ans - Create a model
__________ examines the distances between each point and the closest point to it, and then compares these to expected values for a random sample of points from a CSR (complete spatial randomness) pattern. - Ans - Nearest Neighbor
__________ is a simple mathematical formula used for calculating conditional probabilities. - Ans - Bayes' Theorem
Interactive dashboard tools, such as __________, allow even the novice user the ability to interact with the data and spot trends and patterns. - Ans - Tableau
Data Acquisition (Step 5), Data Cleaning (Step 6), and Data Exploration (Step 7) in this framework all fall under the "__________" domain. - Ans - "Wrangling" domain

True or false: Data science can be done without machine learning. - Ans - True
If a person feels that they have been harmed by a decision made by a neural network, such as it refused a loan application, they can sue the organization. - Ans - right to explanation
__________ is data that is characterized by any or all of three characteristics: unusual volume, unusual velocity, and unusual variety. - Ans - Big data
__________ analytics is about causation. - Ans - Prescriptive
The gold standard for establishing cause and effect is what's called an __________ - Ans - Randomized controlled trial (RCT)
__________ are a whole host of research designs that let you use correlational data to try to estimate the size of the causal relationship between the two variables. - Ans - quasi-experiments
You can do a very good __________ without needing everything that goes into data science. - Ans - prescriptive analysis
__________ may be, at least in theory, impossible. But __________ can get you close enough for any practical purposes and help put you and your organization on the right path to maximizing the outcomes that are most important to you. - Ans - Causality, prescriptive analytics
__________ is all about getting the insight to do something better in your business. - Ans - Business intelligence
You can get the analytics and see how well is this performing, who's watching it and when. That's a __________ of a form. - Ans - business intelligence dashboard
Two of the most important things you can do in business intelligence are __________, to predict what's likely to happen next, and to __________. - Ans - find trends, flag anomalies
__________ is what makes business intelligence possible. - Ans - Data science
__________ really shows to the best extent how data science can be used to make practical decisions that make organizations function more effectively and more efficiently. - Ans - Business intelligence

We have open-source programming languages like __________ that make more rigorous data analysis inexpensive and relatively easy as well. - Ans - R and Python
We can convey key performance indicators of our business to __________, __________, __________ using dashboards. - Ans - executives, management, and employees
We can convey complex information about our business to a wider audience using __________ that allow users to rapidly consume and digest data. - Ans - infographics
__________ analytics answers what has happened in the past. - Ans - Descriptive
__________ data is information that is gathered in non-numerical form that is typically __________ and may be recoded to try and quantify its meaning. - Ans - Qualitative, descriptive
__________ data includes things such as: summaries of written comments on customer cards collected from suggestion boxes at stores, results from interviews of store managers by an outside consultant, a paragraph taken from an employee's self-evaluation on a performance review. - Ans - Qualitative
Data is made up of a set of __________, the individual units being measured. - Ans - Observations
The __________ is the middle number in a series that is arranged from smallest to largest. - Ans - median
The __________ is the most commonly occurring number in the dataset. - Ans - mode
The fact that the __________ is not close to the __________ or the __________ tells us the distribution of scores is skewed. The scores are not evenly distributed around the __________. - Ans - average, median, mode, mean
The __________ and the __________ inform a user of the central points in skew of the data. - Ans - mean, median
If the mean and the median are fairly close to each other, we likely have some type of __________ - Ans - normal distribution
__________ attempts to determine which future events are the most likely. - Ans - Predictive analytics
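
As a small worked example of the mean, median, and mode items above, the sketch below uses a hypothetical list of scores with one large outlier. The outlier pulls the mean far from the median and mode, which is the sign of skew those items describe.

```python
import statistics

# Hypothetical scores with one extreme value, for illustration only
scores = [2, 3, 3, 3, 4, 4, 5, 40]

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle of the sorted values
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean={mean}, median={median}, mode={mode}")

# A mean far above the median suggests a long right tail (positive skew)
is_right_skewed = mean > median
print(f"right-skewed: {is_right_skewed}")
```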

__________ include mean, median, max, and min. - Ans - Descriptive methods
Data sources usually occur within some combination of four different types. The types are __________ - Ans - structured, unstructured, internal, and external
What does GDPR stand for? - Ans - European Union's General Data Protection Regulation
__________, which is the actual things that you end up with; __________, how are the decisions made; __________, how is the decision communicated - Ans - distributive justice, procedural justice, interactional justice
__________ is where the algorithm processes your data and makes a recommendation or suggestion to you, and you can either take it or leave it. - Ans - Recommendations
__________ decision making is where advanced algorithms can make and even implement their own decisions, as with self-driving cars. - Ans - Human-in-the-Loop
__________ decision making: Many algorithmic decisions are made automatically, and even implemented automatically. But they're designed such that humans can at least understand what happened in them. Such as, for instance, with an online mortgage application. - Ans - Human-Accessible
__________ decision making is when machines are talking to other machines. The best example of this is the Internet of Things, and that can include things like wearables. My smart watch talks to my phone, which talks to the internet, which talks to my car in sharing and processing data at each point. - Ans - Machine-Centric
Electronic Communications Privacy Act, or the ECPA. This law was passed by Congress in 1986, during the age of __________. - Ans - police wiretapping
__________ analytics answers what might happen in the future. - Ans - Predictive

__________ analytics attempts to answer the toughest question of all, what should we do going forward? - Ans - Prescriptive
__________ data is data that can be measured in numerical form. - Ans - Quantitative
A __________ means that the data are close together around the mean. - Ans - small standard deviation
What is the 68, 95, 99 rule? - Ans - 68% of the data are within one standard deviation, above or below the mean. 95% of the observations are within two standard deviations above or below the mean. And 99.7% of the observations are within three standard deviations above or below the mean.
The output of research and business environment must not only drive value but be simple enough and easy enough to __________. - Ans - consume by the end users
Who are the end users? - Ans - Business managers, executives, and even customers who are using the outputs of the research.
The three principal sources of U.S. law are: __________, __________, __________ - Ans - Common law or judge-made law, statutory law, and constitutional law
__________ is law created by courts. In the U.S. and other former British colonies, judges have authority to actually create rules of law that determine individual and organizational rights and responsibilities. - Ans - Common law
__________ by contrast is law created by representative bodies such as the U.S. Congress. - Ans - Statutory law
__________ gives the government the authority to act and restricts that authority to ensure that the branches don't overstep their bounds or infringe unnecessarily on individual rights such as rights to fairness and equality. - Ans - Constitutional law
Privacy is sourced in common law in what we call __________. It's sourced in statutory law such as GINA and other legislation that protects financial, health, and student privacy. And it is sourced in the U.S. Constitution's amendments. - Ans - privacy torts
It is the traditional method for analyzing legal problems that arise in any context. The legal analysis framework we will be using in this course has four parts: __________, __________, __________, __________. - Ans - Issue, Rule, Application, and Conclusion (or IRAC)
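
As a numerical check on the 68, 95, 99 rule stated earlier in this set, the following sketch computes the share of a normal distribution that lies within one, two, and three standard deviations of the mean. It uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The result holds for any normal distribution, because measuring distance in standard deviations removes the effect of the particular mean and spread.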