COMPTIA DATA+ CERTIFICATION EVALUATION EXAM Q&A: 2026 STUDY GUIDE 100% CORRECT, Exams of Database Management Systems (DBMS)


Typology: Exams

2025/2026

Available from 12/21/2025

FocusFile7


◍ HTML. Answer: Formatting tags enclosed in angle brackets (<>)
◍ Ident & Auth. Answer: Separate steps: identification claims an identity, authentication proves it
◍ JSON. Answer: Key/value pairs
  • { "name" : "MikeMcMac1" }
  • between curly braces
◍ Measures & Dimensions. Answer: Measures - data observed & explored. Dimensions - space & time attributes that segment data
◍ Non-Relational Databases. Answer: Developed to handle large sets of data that are not easily organized into tables, columns, and rows. Efficient by reducing overhead via key/value pairs
◍ Record Subsets & Temp Table. Answer: Record Subsets - a few records returned from a larger dataset, for less waiting while creating/testing commands. Ex: first 100

Temp Table - a table not kept outside of the session, for running stats on it during that session
◍ Query Execution Plan. Answer: SQL is declarative (it states what to do); the execution plan is the sequence of steps for how to run the query, with speed varying based on optimization by a human or the DBMS
◍ Correlation Coefficient. Answer: a statistical index of the relationship between two things (from -1 to +1), computed via statistical apps
◍ Sample. Answer: random subset of a population that is representative of the population
◍ Sample Standard Deviation. Answer: square root of { sum of [ (sample value minus sample average) squared ] divided by number of samples less one }. Without the square root, this quantity is the sample variance. Used as an approximation of the population standard deviation
◍ T-Test. Answer: Compares mean values of a continuous variable between 2 categories/groups.
◍ Chi-Square Test. Answer: hypothesis testing method for whether your data is as expected. If you have a single categorical variable, you use a Chi-square goodness of fit test. If you have two categorical variables, you use a Chi-square test of independence.
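The sample standard deviation and correlation coefficient cards above can be checked by hand with a short Python sketch. The function names and the study-hours data are invented for illustration; only the standard library is used, and statistics.stdev serves as a cross-check.

```python
import math
import statistics


def sample_std_dev(values):
    """Square root of the summed squared deviations divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


def correlation_coefficient(xs, ys):
    """Pearson's r: co-variation of x and y scaled by both spreads (-1 to +1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Hypothetical sample data, invented for this example.
hours_studied = [2, 4, 6, 8]
exam_scores = [50, 60, 70, 80]

print(sample_std_dev(hours_studied))       # hand-rolled formula
print(statistics.stdev(hours_studied))     # standard-library cross-check
print(correlation_coefficient(hours_studied, exam_scores))
```

Because the scores rise by a fixed amount for every extra two hours, the relationship is perfectly linear and r comes out at (or within rounding of) +1.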

◍ Extract, load, transform (ELT). Answer: An alternative to ETL used with data lakes, where the data is not transformed on entry to the data lake, but stored in its original raw format
◍ Delta Load (incremental load). Answer: the delta between target and source data is loaded at regular intervals. The last extract date is stored so that only records added after this date are loaded. Incremental loads come in two flavors that vary based on the volume of data you're loading:
  • Streaming incremental load - better for loading small data volumes
  • Batch incremental load - better for loading large data volumes
Understand the time available for performing delta loads into your data warehouse. Regardless of how long your batch window is, think carefully about moving current data into the data warehouse without losing history.
◍ Non-Parametric Data. Answer: Data that does not fit a known or well-understood distribution. Usually ordinal or interval data. For real-valued data, nonparametric statistical methods are required in applied machine learning when you are trying to make claims on data that does not fit the familiar Gaussian distribution.
◍ Normalize Ratings Data. Answer: normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging
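The delta load card above can be sketched as follows. This is a minimal illustration, not a production loader: the created_at field, the delta_load function, and the sample rows are all assumptions, and the sketch handles only newly added records, not updates or deletes.

```python
from datetime import datetime


def delta_load(source_rows, target_rows, last_extract):
    """Append only source rows newer than the last extract; return the new watermark."""
    new_rows = [row for row in source_rows if row["created_at"] > last_extract]
    target_rows.extend(new_rows)
    # Advance the stored extract date to the newest record loaded,
    # or leave it unchanged when nothing new arrived.
    return max((row["created_at"] for row in new_rows), default=last_extract)


# Hypothetical source system and warehouse contents.
source = [
    {"id": 1, "created_at": datetime(2025, 1, 1)},
    {"id": 2, "created_at": datetime(2025, 1, 5)},
    {"id": 3, "created_at": datetime(2025, 1, 9)},
]
warehouse = [source[0]]            # row 1 was loaded by an earlier run
watermark = datetime(2025, 1, 1)   # last extract date stored after that run

watermark = delta_load(source, warehouse, watermark)
print([row["id"] for row in warehouse], watermark)
```

Running delta_load again with the returned watermark loads nothing, which is the point of storing the last extract date.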

◍ Descriptive Statistical Methods. Answer:
  • Measures of central tendency
  • Measures of dispersion
  • Frequencies/percentages
  • Percent change
  • Percent difference
  • Confidence interval
◍ Inferential Statistical Methods. Answer: provide predictions about characteristics of a population based on information in a sample from that population
◍ Measures of Central Tendency. Answer: mean, median, mode
◍ Measures of Dispersion (variability). Answer: range (max, min), distribution, variance, standard deviation
◍ Percent Change. Answer: Percentage change equals the change in value divided by the absolute value of the original value, multiplied by 100.
◍ Percent Difference. Answer: Percentage difference equals the absolute value of the change in value, divided by the average of the 2 numbers, all multiplied by 100. We then append the percent sign, %, to designate the % difference.

◍ P-Value. Answer: The probability of obtaining results at least as extreme as those observed if the null hypothesis were true, i.e. how readily the results of the experiment can be attributed to chance. The p-value measures the evidence against a null hypothesis. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. When the p-value is less than or equal to the significance level (conventionally 0.05), you should reject the null hypothesis. Alternatively, when the p-value is greater than 0.05, you should retain the null hypothesis as there is not enough statistical evidence to accept the alternative hypothesis. Remember this saying: "When the p is low, the null must go. When the p is high, the null must fly!"
◍ Hypothesis Testing. Answer: a decision-making process for evaluating claims about a population. When hypothesis testing, the null and alternative hypothesis describe the effect in terms of the total population. To perform the hypothesis test itself, you need sample data to make inferences about characteristics of the overall population.
◍ Type I error (alpha). Answer: false positive. A Type I error:
  • is the incorrect rejection of the null hypothesis
  • has a maximum probability that is set in advance as alpha
  • is not affected by sample size, as alpha is set in advance
  • increases with the number of tests or end points (i.e. run 20 tests of a true H0 and about 1 is likely to be wrongly significant for alpha = 0.05)
◍ Type II error (beta). Answer: false negative. A Type II error:
  • is the incorrect retention of (failure to reject) a false null hypothesis
  • has probability beta
  • beta depends upon sample size and alpha
  • beta can't be estimated except as a function of the true population effect
  • beta gets smaller as the sample size gets larger
  • beta gets smaller as the number of tests or end points increases
◍ Simple Linear Regression. Answer: linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable.
◍ Correlation. Answer: A measure of the extent to which two factors vary together, and thus of how well either factor predicts the other.
◍ Trend Analysis. Answer: hypothetical extension of a past series of events into the future
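The simple linear regression card above can be illustrated with an ordinary least-squares fit written out by hand. This sketch assumes least squares is the fitting criterion (the card only says "as accurately as possible"); fit_line and the sample points are invented for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cross / spread_x
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept   # predict the dependent variable at x = 5
print(slope, intercept, prediction)
```

Extending the fitted line beyond the observed x values, as the last line does, is also the simplest form of the trend analysis described above.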

◍ Report Cover Page. Answer:
  • Instructions
  • Summary
  • Observations and insights: why it matters, how it was studied, what the analyst recommends
◍ Corporate Reporting Standards (Style Guide). Answer:
  • Branding
  • Color codes
  • Logos/trademarks
  • Watermark
◍ Documentation Elements. Answer:
  • Version number
  • Reference data sources
  • Reference dates (report run date, data refresh date)
◍ Data Sources and Attributes. Answer:
  • Field definitions
  • Dimensions
  • Measures
◍ Delivery Considerations. Answer:
  • Subscription
  • Scheduled delivery
  • Interactive (drill down/roll up)
  • Saved searches
  • Filtering
  • Static
  • Web interface
  • Dashboard optimization
  • Access permissions
◍ Bubble Chart. Answer: A type of scatter plot with circular symbols used to compare three variables; the area of the circle indicates the value of a third variable
◍ Histogram. Answer: a bar graph depicting a frequency distribution
◍ Waterfall Chart. Answer: helps in understanding the cumulative effect of sequentially introduced positive or negative values. These intermediate values can either be time based or category based. The waterfall chart is also known as a flying bricks chart or Mario chart due to the apparent suspension of columns (bricks) in mid-air.
◍ Heat Map. Answer: A two-dimensional graphical representation of data that uses different shades of color to indicate magnitude
◍ Tree Map Chart. Answer: treemapping is a method for displaying hierarchical data using nested figures, usually rectangles.
◍ Stacked Chart. Answer: extends the standard bar chart from looking at numeric values across one categorical variable to two. Each bar in a standard bar chart is divided into a number of sub-bars

Consistency - reliability of an attribute. Data consistency typically comes into play in large organizations that store the same data in multiple systems. Considering data consistency is especially important when designing a data warehouse as it sources data from multiple systems.
Integrity/Validity - indicates whether or not an attribute's value is within an expected range. One way to ensure data validity is to enforce referential integrity in the database.
Uniqueness - describes whether or not a data attribute exists in multiple places within your organization. Closely related to data consistency, the more unique your data is, the less you have to worry about
◍ Data Quality Validation Methods. Answer:
  • Cross-validation
  • Sample/spot check
  • Reasonable expectations
  • Data profiling
  • Data audits
◍ Cardinality. Answer: the relationship between two entities, showing how many instances of one entity relate to instances in another entity. You specify cardinality in an ERD with various line endings. The first component of the terminator indicates whether the relationship between two entities is optional or required. The second component indicates whether an entity instance in the first table is associated with a single entity instance in the related table or if an association can exist with multiple entity instances.

◍ Schema. Answer: an ERD with the additional details needed to create a database.
◍ Column-family databases. Answer: use an index to identify data in groups of related columns. They optimize performance when you need to examine the contents of a column across many rows.
◍ Graph databases. Answer: specialize in exploring relationships between pieces of data. Graph models map relationships between actual pieces of data. Graphs are an optimal choice if you need to create a recommendation engine, as graphs excel at exploring relationships between data.
◍ Normalization Process. Answer: Objective is to ensure that each table conforms to the concept of well-formed relations
  • Each table represents a single subject
  • No data item will be unnecessarily stored in more than one table
  • All non-prime attributes in a table are dependent on the primary key
  • Each table is void of insertion, update, and deletion anomalies

design complex surveys without worrying about building a database. Qualtrics is a powerful tool for developing and administering surveys. What makes Qualtrics so compelling is its API, which you can use to integrate survey response data into a data warehouse for additional analysis.
◍ Data manipulation in SQL. Answer: CRUD: Create (INSERT), Read (SELECT), Update (UPDATE), Delete (DELETE)
◍ Common SQL aggregate functions. Answer: COUNT, MIN, MAX, AVG, SUM, STDDEV
◍ Parametrization. Answer: using variables in a query
◍ DB Index. Answer: a database index can point to a single column or multiple columns. When running queries on large tables, it is ideal if all of the columns you are retrieving exist in the index. If that is not feasible, you at least want the first column in your SELECT statement to be covered by an index.
◍ Star and Snowflake Schema. Answer: analytical databases prioritize reading data and follow a denormalized approach. The star and snowflake schema designs are two approaches to structuring data for analysis. Both methods implement dimensional modeling, which organizes quantitative data into facts and qualitative data into dimensions.
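The CRUD, aggregate function, and parametrization cards above can be tried together using Python's built-in sqlite3 module. The sales table and its values are invented for illustration. SQLite does not ship a STDDEV function, so that aggregate is left out of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Create: parameterized INSERT, with values bound to the ? placeholders.
rows = [("East", 100.0), ("East", 300.0), ("West", 50.0)]
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)

# Update and Delete, again passing variables instead of building SQL strings.
conn.execute("UPDATE sales SET amount = ? WHERE region = ?", (75.0, "West"))
conn.execute("DELETE FROM sales WHERE amount < ?", (80.0,))

# Read: aggregate functions summarize the rows for one region.
region = "East"
summary = conn.execute(
    "SELECT COUNT(*), MIN(amount), MAX(amount), AVG(amount), SUM(amount) "
    "FROM sales WHERE region = ?",
    (region,),
).fetchone()
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(summary)     # count, min, max, average, sum for the East region
print(remaining)   # rows left after the DELETE
```

Binding values through placeholders keeps the query text fixed while the variables change, which is what the parametrization card describes.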

◍ Historical Analysis. Answer: Although an effective date approach is valid, the SQL queries to retrieve a value at a specific point in time are complex. A table design that adds start date and end date columns allows for more straightforward queries. Enhancing the design with a current flag column makes analytical queries even easier to write.
◍ Specification Mismatch. Answer: occurs when an individual component's characteristics are beyond the range of acceptable values. OR specification mismatch occurs when data doesn't conform to its destination data type. For example, you might be loading data from a file into a database. If the destination column is numeric and you have text data, you'll end up with a specification mismatch. To resolve this mismatch, you must validate that the inbound data consistently maps to its target data type.
◍ Recoding Data. Answer: technique you can use to map original values for a variable into new values to facilitate analysis. Recoding groups data into multiple categories, creating a categorical variable. A categorical variable is either nominal or ordinal. Nominal variables are any variables with two or more categories where there is no natural order of the categories, like hair color or eye color. Ordinal variables are categories with an inherent rank.
◍ Derived Variable. Answer: new variable resulting from a calculation on an existing variable.
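The start date / end date / current flag design from the historical analysis card above might look like the following sqlite3 sketch. The product_price table, its column names, the far-future end date used for the open-ended row, and the prices are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_price ("
    "product_id INTEGER, price REAL, "
    "start_date TEXT, end_date TEXT, is_current INTEGER)"
)
# ISO-formatted date strings sort correctly as text, so BETWEEN works on them.
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?, ?, ?, ?)",
    [
        (1, 9.99, "2024-01-01", "2024-06-30", 0),
        (1, 11.49, "2024-07-01", "9999-12-31", 1),
    ],
)


def price_on(conn, product_id, as_of):
    """Point-in-time lookup: the row whose date range contains as_of."""
    row = conn.execute(
        "SELECT price FROM product_price "
        "WHERE product_id = ? AND ? BETWEEN start_date AND end_date",
        (product_id, as_of),
    ).fetchone()
    return row[0] if row else None


def current_price(conn, product_id):
    """The current flag avoids any date comparison for the common case."""
    row = conn.execute(
        "SELECT price FROM product_price WHERE product_id = ? AND is_current = 1",
        (product_id,),
    ).fetchone()
    return row[0] if row else None


print(price_on(conn, 1, "2024-03-15"))
print(current_price(conn, 1))
```

With only an effective date column, the point-in-time lookup would instead need to find the latest row dated on or before as_of, which is the extra complexity the card refers to.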

◍ Reduction. Answer: process of shrinking an extensive dataset without negatively impacting its analytical value. There are a variety of reduction techniques from which you can choose. Selecting a method depends on the type of data you have and what you are trying to analyze. Dimensionality reduction and numerosity reduction are two techniques for data reduction. Dimensionality reduction removes attributes from a dataset. Numerosity reduction reduces the overall volume of data.
◍ Aggregation. Answer: summarization of raw data for analysis. OR also a means of controlling privacy.
◍ Transposition. Answer: Transposing data is when you want to turn rows into columns or columns into rows to facilitate analysis.
◍ Data Profiling. Answer: statistical measures to check for data discrepancies, including values that are missing, that occur either infrequently or too frequently, or that should be eliminated. Profiling can also identify irregular patterns within your data.
◍ Data Audits. Answer: look at your data and help you understand whether or not you have the data you need to operate your business. Data audits use data profiling techniques and can help identify data integrity and security issues.
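The transposition and aggregation cards above can be shown on a tiny dataset in plain Python. The quarterly sales figures are invented for illustration.

```python
from collections import defaultdict

# Transposition: rows become columns and columns become rows.
table = [
    ["region", "q1", "q2"],
    ["East", 10, 20],
    ["West", 30, 40],
]
transposed = [list(column) for column in zip(*table)]
for row in transposed:
    print(row)

# Aggregation: summarize raw records into one total per region.
sales = [("East", 10), ("West", 30), ("East", 20), ("West", 40)]
totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount
print(dict(totals))
```

The aggregated totals no longer expose individual transactions, which is why aggregation also works as a privacy control.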

◍ Cross-Validation. Answer: statistical technique that evaluates how well predictive models perform. Cross-validation works by dividing data into two subsets. The first subset is the training set, and the second is the testing, or validation, set. You use data from the training set to build a predictive model. You then cross-validate the model using the testing subset to determine how accurate the prediction is. Cross-validation is also helpful in identifying data sampling issues. Cross-validation can help identify sampling bias since predictions using biased data are inaccurate.
◍ Skewed Distribution. Answer: has an asymmetrical shape, with a single peak and a long tail on one side. Skewed distributions have either a right (positive) or left (negative) skew. When the skew is to the right, the mean is typically greater than the median. On the other hand, a distribution with a left skew typically has a mean less than the median.
◍ Bimodal Distribution. Answer: has two distinct modes, whereas a multimodal distribution has multiple distinct modes. When you visualize a bimodal distribution, you see two separate peaks. Suppose you are analyzing the number of customers at a restaurant over time. You would expect to see large numbers of customers at lunch and dinner.
◍ Standard Normal Distribution (Z-distribution). Answer: a special normal distribution with a mean of 0 and a standard deviation of 1. You can standardize any normal distribution by converting its values into z-scores. Converting to the standard normal lets you compare normal distributions with different means and standard deviations.
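The z-score conversion in the standard normal distribution card above can be sketched in a few lines of Python. The z_scores helper and the data are invented for illustration, and the sketch assumes the values are treated as the whole population (hence statistics.pstdev rather than the sample version).

```python
import statistics


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)   # population standard deviation
    return [(v - mean) / std_dev for v in values]


# Hypothetical data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_scores(data)

print(standardized)
print(statistics.mean(standardized), statistics.pstdev(standardized))
```

Each z-score states how many standard deviations a value sits above or below the mean, so scores from distributions with different means and spreads become directly comparable.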