A set of revision questions for Data Analytics Exam 1 (2025). It covers key concepts such as relational databases, supervised and unsupervised approaches to evaluating data, data analytics techniques (clustering, classification, similarity matching, link prediction), and the ETL process. The questions also cover data quality, data manipulation, data visualization, statistical data analysis, data scrubbing, and data preparation.
Which of the following best describes the purpose of a non-key attribute? - to provide business information
Which of the following best describes the purpose of a relational database? - to support business processes across the organization
Which of the following best describes the purpose of a foreign key? - to create the relationship between two tables
A data dictionary is paramount in helping database administrators do which of the following? - maintain databases
Which of the following best describes an unsupervised approach to the evaluation of data? - data exploration looking for potential patterns of interest
Which of the following best describes a supervised approach to the evaluation of data? - data exploration to examine the relationships between variables that are hypothesized to exist
Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach? - clustering
______ refers to data that are stored in a database or spreadsheet and are readily searchable. - structured data
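To make the key terminology concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `Supplier` and `PurchaseOrder` tables, their columns, and the sample rows are hypothetical. The sketch shows a primary key rejecting a duplicate identifier and a foreign key rejecting an order that points at a supplier that does not exist; `SupplierName` and `Amount` play the role of non-key (descriptive) attributes. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign-key constraints when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE Supplier ("
        " SupplierID INTEGER PRIMARY KEY,"  # primary key: unique identifier per supplier
        " SupplierName TEXT NOT NULL)"      # non-key (descriptive) attribute
    )
    conn.execute(
        "CREATE TABLE PurchaseOrder ("
        " PurchaseOrderID INTEGER PRIMARY KEY,"
        " SupplierID INTEGER NOT NULL REFERENCES Supplier(SupplierID),"  # foreign key
        " Amount REAL NOT NULL)"
    )
    conn.execute("INSERT INTO Supplier VALUES (1, 'Acme Paper')")
    conn.execute("INSERT INTO PurchaseOrder VALUES (100, 1, 250.0)")
    return conn


def insert_rejected(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the database refuses the insert because of a key constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


conn = build_demo_db()
# A second supplier reusing SupplierID 1 violates the primary key.
print(insert_rejected(conn, "INSERT INTO Supplier VALUES (?, ?)", (1, "Duplicate Co")))  # True
# An order for nonexistent supplier 99 violates the foreign key.
print(insert_rejected(conn, "INSERT INTO PurchaseOrder VALUES (?, ?, ?)", (101, 99, 10.0)))  # True
```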
Big data is often described by the three Vs, or: - volume, velocity, and variety
Which approach to data analytics attempts to assign each unit in a population into a small set of classes (or groups) where the unit best fits? - classification
Which approach to data analytics attempts to identify similar individuals based on data known about them? - similarity matching
Which approach to data analytics attempts to predict a relationship between two data items? - link prediction
Which of these terms is defined as being a central repository of descriptions for all of the data attributes of the dataset? - data dictionary
Which skill was not emphasized as one that analytics-minded accountants should have? - classification of test approaches
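Similarity matching can be sketched as "find the known case closest to a new individual." The minimal example below assumes each individual is described by a small numeric feature vector and uses Euclidean distance (`math.dist`); the feature names, the `KNOWN_FRAUD_CASES` data, and the choice of distance measure are all hypothetical illustrations, not a method from the source. In practice the features would usually be standardized first so that no single feature dominates the distance.

```python
import math

# Hypothetical, made-up feature vectors:
# (refund rate, average order size in $100s, account age in years)
KNOWN_FRAUD_CASES = {
    "case_A": (0.40, 9.0, 0.2),
    "case_B": (0.35, 7.5, 0.5),
}


def most_similar_case(profile, cases=KNOWN_FRAUD_CASES):
    """Return (case name, Euclidean distance) for the known case closest to `profile`."""
    name = min(cases, key=lambda k: math.dist(profile, cases[k]))
    return name, math.dist(profile, cases[name])


new_customer = (0.38, 8.8, 0.3)
print(most_similar_case(new_customer))  # closest match is case_A
```

A small distance to a known fraud case does not prove fraud; it only marks the individual as worth a closer look.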
As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation? - clean up trailing zeros
Why is supplier ID considered to be a primary key for a supplier table? - it contains a unique identifier for each supplier
What are attributes that exist in a relational database that are neither primary nor foreign keys? - descriptive attributes
Which of these is not included in the five steps of the ETL process? - learn what data is available in the data warehouse
________ is a set of data used to assess the degree and strength of a predicted relationship. - test data
Data that are organized and reside in a fixed field within a record or a file are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is: - structured data
An observation about the frequency of leading digits in many real-life sets of numerical data is called: - Benford's law
In general, the more complex the model, the greater the chance of: - overfitting the data
In general, the simpler the model, the greater the chance of: - underfitting the data
_______ is a discriminating classifier that is defined by a separating hyperplane and works first to find the widest margin (or biggest pipe) and then to find the middle line. - support vector machines
________ mark the split between one class and another. - decision boundaries
Models associated with regression and classification data approaches have all of these important parts except: - test data
Which approach to data analytics attempts to assign each unit in a population into a small set of classes where the unit belongs? - classification
Which of the following best describes the goal of descriptive data analysis? - perform basic analysis to understand the quality of the underlying data and its ability to address the business question
Which approach to data analytics attempts to characterize the typical behavior of an individual, group, or population by generating summary statistics about the data? - profiling
Which of the following best describes the profiling approach to data analytics? - an attempt to characterize the typical behavior of an individual, group, or population by generating summary statistics about the data
Mastering the data can also be described via the ETL process. The ETL process stands for: - extract, transform, and load data
When using [EmployeeID] as the unique identifier of the employee table, [EmployeeID] is an example of which of the following? - primary key
All of the following are included in the five steps of the ETL process except: - scrub the data
Removing headings or subtotals from data is an example of which of the following? - cleaning the data
Comparing descriptive statistics for numeric fields within the data is an example of which of the following? - validating the data for completeness
Comparing the number of records within the data is an example of which of the following? - validating the data for completeness
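The cleaning and validation steps above can be sketched in a few lines. This minimal example assumes a hypothetical export in which repeated heading rows and a subtotal row are mixed in with invoice rows; it drops those rows (cleaning), then compares the record count and the amount total against control figures assumed to come from the source system (validating for completeness). The column names, rows, and control figures are invented for illustration.

```python
from statistics import mean

# Hypothetical raw export: repeated headings and a subtotal row mixed with data.
raw_rows = [
    ["InvoiceID", "Amount"],
    ["1001", "250.00"],
    ["1002", "75.50"],
    ["Subtotal", "325.50"],
    ["InvoiceID", "Amount"],
    ["1003", "120.00"],
]


def clean(rows):
    """Keep only rows whose first field is a numeric ID (drops headings and subtotals)."""
    return [(int(r[0]), float(r[1])) for r in rows if r[0].isdigit()]


def validate(records, expected_count, expected_total):
    """Compare record count and amount total with control figures from the source."""
    amounts = [amount for _, amount in records]
    return {
        "count_ok": len(records) == expected_count,
        "total_ok": abs(sum(amounts) - expected_total) < 0.005,
        "mean_amount": round(mean(amounts), 2),  # a descriptive statistic to compare
    }


records = clean(raw_rows)
print(validate(records, expected_count=3, expected_total=445.50))
# {'count_ok': True, 'total_ok': True, 'mean_amount': 148.5}
```

Had the subtotal row not been removed, both checks would fail, which is exactly why cleaning and validation are paired in the ETL steps.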
Regression analysis typically involves all of the following steps except: - set boundaries or thresholds
When working with a predictive model, underfitting the data is most likely caused by _____ - an overly simple model
Which of the following best describes an independent variable? - input
_____ are existing data that have been manually evaluated and assigned a class, and ____ are existing data used to evaluate the model. - training data; test data
Red, yellow, and blue would be best described as an example of: - nominal data
Letter grades of A, B, and C would be best described as an example of: - ordinal data
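The roles of training data, test data, and an "overly simple model" can be illustrated with a tiny regression. This is a sketch under assumptions: the receivables/allowance pairs are made up, the even/odd split is an arbitrary choice, and the mean-only model is a deliberately oversimplified baseline standing in for an underfit model. Both models are fit on the training data only and then scored by mean squared error on the held-out test data.

```python
# Hypothetical data: x = accounts receivable ($000s), y = allowance for doubtful accounts ($000s)
data = [(10, 1.1), (20, 2.0), (30, 3.2), (40, 3.9), (50, 5.1), (60, 6.0), (70, 7.1), (80, 7.9)]
train = data[::2]   # training data: used to fit the models
test = data[1::2]   # test data: held out to evaluate the models


def fit_mean(points):
    """Overly simple model: always predict the average y (ignores x, so it underfits)."""
    avg = sum(y for _, y in points) / len(points)
    return lambda x: avg


def fit_line(points):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(model, points):
    """Mean squared error of a model's predictions on the given points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


for name, fit in [("mean-only", fit_mean), ("linear", fit_line)]:
    model = fit(train)
    print(name, "test MSE:", round(mse(model, test), 3))
```

The mean-only model has a much larger test error because it cannot capture the relationship at all; an overfit model would show the opposite symptom, a very low training error with a noticeably higher test error.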
XBRL is a global standard for exchanging financial reporting information that uses XML. - true
When considering a question such as "do our customers form natural groups based on similar attributes?" you would use an unsupervised approach. - true
Benford's law is absolute, and all data must conform to it. - false
All of the following are examples of a supervised approach to evaluating data except: - data reduction
Diagnostic analytics include which of the following? - all of the choices are correct (similarity matching, clustering, profiling)
____ states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. - Benford's law
Unaware of the data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, just below the $100 amount that requires manager approval. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior? - Benford's law
Which of the following best describes a dependent variable? - output
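Benford's law gives the expected share of leading digit d as log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. A minimal first-digit test compares observed shares with those expectations. The refund amounts below are hypothetical, and real audit work usually adds a formal goodness-of-fit measure (for example chi-square); this sketch only reports the raw deviation per digit. Because Benford's law is not absolute, a deviation is a lead for investigation, not proof.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Benford's law: expected probability that the leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)


def leading_digit(amount: float) -> int:
    """First significant digit, e.g. 99.0 -> 9 and 0.075 -> 7."""
    return int(f"{abs(amount):e}"[0])  # scientific notation, e.g. '9.900000e+01'


def first_digit_deviation(amounts):
    """Observed share minus Benford-expected share for each leading digit 1-9."""
    counts = Counter(leading_digit(a) for a in amounts if a > 0)
    total = sum(counts.values())
    return {d: counts[d] / total - benford_expected(d) for d in range(1, 10)}


# Hypothetical cash returns: a few ordinary amounts plus many $99 returns.
returns = [12.50, 31.00, 150.00, 18.75, 240.00, 45.10] + [99.00] * 10
deviation = first_digit_deviation(returns)
print(max(deviation, key=deviation.get))  # 9: far more leading 9s than expected
```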
What is data analytics? - the process of evaluating data with the purpose of drawing conclusions to address business questions
What is big data? - datasets that are too large and complex for businesses' existing systems to handle using their traditional capabilities to capture, store, manage, and analyze them
How does data analytics affect business? - companies use it to discover the buying patterns of their customers, investigate anomalies that were not anticipated, and forecast future possibilities
How does data analytics affect auditing? - technology will enhance the quality, transparency, and accuracy of the audit. Because of big data, audits will yield not only important findings from a financial perspective but also information that can help companies refine processes, improve efficiency, and anticipate future problems.
How does data analytics affect financial reporting? - it improves the quality of estimates and valuations
How does data analytics affect tax? - it enhances tax planning strategies because it helps tax staff predict what will happen rather than react to what has just happened
What is an example of regression? - given a firm's total accounts receivable balance, what is the appropriate allowance for doubtful accounts (bad debts)?
What is an example of similarity matching? - an attempt to identify seller and customer fraud by comparing the characteristics known about them with known fraud cases to see whether they are similar
What is an example of clustering? - segmenting customers into a small number of groups for additional analysis and marketing activities
What is an example of co-occurrence grouping? - Amazon might use it to sell you another item by knowing which items are "frequently bought together" or which items "customers who bought this item also bought"
What is an example of profiling? - in accounting, identifying fraud or just those transactions that might warrant additional investigation (e.g., travel expenses that are three standard deviations above the norm)
What is an example of link prediction? - someone has 22 mutual Facebook friends with me and we both attended BYU; is there a chance we would like to be Facebook friends too?
What is an example of data reduction? - new ways to highlight which transactions do not need the same level of vetting as other transactions
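The profiling example (travel expenses three standard deviations above the norm) can be sketched as two steps: summarize typical behavior from past data, then flag new transactions that exceed the mean plus k standard deviations. The expense history and new claims are hypothetical, and building the profile from historical data (rather than from the batch being tested) is a design choice assumed here so that a single large claim does not inflate the standard deviation it is judged against.

```python
from statistics import mean, stdev

# Hypothetical historical travel expenses ($) that define "normal" behavior.
history = [420, 450, 480, 510, 390, 460, 440, 500]


def build_profile(amounts):
    """Summary statistics describing typical behavior: mean and sample standard deviation."""
    return mean(amounts), stdev(amounts)


def flag_unusual(new_amounts, profile, k=3.0):
    """Return amounts more than k standard deviations above the profiled mean."""
    avg, sd = profile
    return [a for a in new_amounts if a > avg + k * sd]


profile = build_profile(history)
print(flag_unusual([470, 610, 455, 1200], profile))  # [610, 1200]
```

Here the profile has a mean of about $456 and a standard deviation of about $40, so the threshold is roughly $577.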
What are the seven skills that analytics-minded accountants should have? - develop an analytics mindset, data scrubbing and data preparation, data quality, descriptive data analysis, data analysis through data manipulation, defining and addressing problems through statistical data analysis, and data visualization and data reporting
What are the five steps in the ETL process? - determining the purpose and scope of the data request, obtaining the data, validating the data for completeness and integrity, cleaning the data, and loading the data for analysis
What does SQL stand for? - Structured Query Language
What is SQL used for? - extracting data; with it we can combine data from one or more tables and organize the data in a way that is more useful for data analysis than the way the data are stored in the relational database
What does RDBMS stand for? - relational database management system
What is UML? - Unified Modeling Language, a graphical notation for drawing diagrams of an object-oriented system
What is the benefit of using a relational database? - we do not have to export, validate, and sanitize the data every time we need to analyze the information
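Combining data from more than one table is typically done with a SQL join. The sketch below runs an ordinary `JOIN ... GROUP BY` query through Python's built-in `sqlite3` module; the `Customer` and `Sale` tables and their rows are hypothetical. The query links the tables through the `CustomerID` key and reshapes the data into total sales per customer, a form more useful for analysis than the way the rows are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: Sale.CustomerID is a foreign key to Customer.CustomerID.
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY,
                       CustomerID INTEGER REFERENCES Customer(CustomerID),
                       Amount REAL);
    INSERT INTO Customer VALUES (1, 'Alpha LLC'), (2, 'Beta Inc');
    INSERT INTO Sale VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 80.0);
""")

# Join the two tables on their shared key and total the sales per customer.
query = """
    SELECT c.CustomerName, SUM(s.Amount) AS TotalSales
    FROM Customer AS c
    JOIN Sale AS s ON s.CustomerID = c.CustomerID
    GROUP BY c.CustomerName
    ORDER BY TotalSales DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alpha LLC', 150.0), ('Beta Inc', 80.0)]
```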
What are the four main categories of data analytics? - descriptive analytics, diagnostic analytics, predictive analytics, and prescriptive analytics
What is descriptive analytics? - to understand what happened
What is diagnostic analytics? - to understand why it happened
What is predictive analytics? - to estimate a future value or category
What is prescriptive analytics? - to make recommendations for a course of action
What approaches can descriptive analytics use? - summary statistics and data reduction or filtering
What approaches can diagnostic analytics use? - profiling, clustering, similarity matching, and co-occurrence grouping
What approaches can predictive analytics use? - regression, classification, and link prediction
What approaches can prescriptive analytics use? - decision support systems, machine learning, and artificial intelligence
What is causal modeling? - a data approach similar to regression, used when it is hypothesized that the independent variables cause, or are associated with, the dependent variable
What is fuzzy matching? - a process that finds matches that may be less than 100 percent matching by finding correspondences between portions of the text or other entries
What is XBRL? - XBRL stands for eXtensible Business Reporting Language, a type of XML (Extensible Markup Language) used for organizing and defining financial elements
What is structured data? - data that are organized and reside in a fixed field within a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms.
What is qualitative data? - categorical data; all you can do with these data is count and group them, and in some cases rank them. Qualitative data can be further defined in two ways: nominal data or ordinal data. There are not as many options for charting qualitative data because they are not as sophisticated as quantitative data.
What is quantitative data? - more complex than qualitative data. Quantitative data can be further defined in two ways: interval and ratio. In all quantitative data, the intervals between data points are meaningful, allowing the
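Fuzzy matching can be sketched with Python's standard-library `difflib.SequenceMatcher`, whose `ratio()` returns a similarity score between 0 and 1. The vendor names and the 0.8 threshold below are hypothetical, and `SequenceMatcher` is just one of several possible similarity measures; the sketch only illustrates accepting matches that are less than 100 percent identical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and trimming; 1.0 means identical."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_matches(name, candidates, threshold=0.8):
    """Return the candidates whose similarity to `name` meets the (hypothetical) threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


vendors = ["Acme Supply Co.", "Acme Supply Company", "Zenith Office Products"]
print(fuzzy_matches("ACME Supply Co", vendors))  # both Acme variants match; Zenith does not
```

Raising the threshold reduces false matches but misses more genuine variants, so in practice it is tuned against a sample of known matches.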
What is interval data? - the third of the four data types on the scale of nominal, ordinal, interval, and ratio (second only to ratio data in sophistication); a type of quantitative data. Interval data can be counted and grouped like qualitative data, and the differences between data points are meaningful. However, interval data do not have a meaningful zero.
What are declarative visualizations? - visualizations made when the aim of your project is to "declare," or present, your findings to an audience. Declarative charts are typically made after the data analysis has been completed and are meant to exhibit what was found in the analysis steps.
What are exploratory visualizations? - visualizations made when the lines between steps P, A, and C are not as clearly divided as they are in a declarative visualization project. Often when you explore the data with visualizations, you perform the test plan directly in visualization software such as Tableau instead of creating the chart after the analysis has been done.
In the late 1960s, Ed Altman developed a model to predict whether a company was at severe risk of going bankrupt. He called his statistic Altman's Z-score, now a widely used score in finance. Based on the name of the statistic, which statistical distribution would you guess it came from? - standardized normal distribution
What is the most appropriate chart for showing a relationship between two variables, according to Exhibit 4-8? - scatter plot
_______ data would be considered the least sophisticated type of data. - nominal
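The "no meaningful zero" property of interval data is easiest to see with temperature, a standard textbook example that does not appear in the flashcards themselves. Celsius is an interval scale (0 °C does not mean "no temperature"), while Kelvin starts at absolute zero and behaves as ratio data. The sketch shows that differences agree on both scales but ratios do not.

```python
import math


def celsius_to_kelvin(c: float) -> float:
    """Kelvin has a true zero (absolute zero), so Kelvin readings are ratio data."""
    return c + 273.15


low_c, high_c = 10.0, 20.0

# Differences are meaningful on both scales: the gap is 10 degrees either way.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)
print(math.isclose(diff_c, diff_k))  # True

# Ratios are not meaningful on the Celsius (interval) scale.
ratio_c = high_c / low_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)
print(round(ratio_k, 3))  # 1.035
```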
Which of the following is not a typical example of nominal data? - SAT scores
Exhibit 4-8 gives chart suggestions for the data you'd like to portray. Those options include all of the following except: - normalization
Line charts are not recommended for what type of data? - qualitative data
________ data would be considered the most sophisticated type of data. - ratio
Gold, silver, and bronze medals would be an example of: - ordinal
Justin Zobel suggests that revising your writing requires you to "be egoless - ready to dislike anything you have previously written," suggesting that it is ______ you need to please. - the reader