
















































































An exam focused on developing strong analytical skills in Vantage. It tests the use of analytic functions, decision trees, path analysis, predictive scoring, and SQL-based statistical operations. Scenarios include customer segmentation, risk scoring, retention modelling, and real-time analytics execution.
Typology: Exams

















































































Question 1. Which Teradata Vantage function is used to replace NULL values with a specified default?
A) COALESCE()
B) NULLIF()
C) ISNULL()
D) REPLACE()
Answer: A
Explanation: COALESCE() returns the first non-NULL expression in its argument list, making it ideal for substituting NULLs with a default value.

Question 2. In a Teradata CREATE TABLE statement, which attribute improves query performance for large fact tables by reducing I/O on skewed data?
A) PRIMARY INDEX (PI) on a high-cardinality column
B) UNIQUE PRIMARY INDEX (UPI) on a low-cardinality column
C) NO PRIMARY INDEX (NOPI) with a partitioned primary index
D) PRIMARY INDEX on a column with a uniform distribution
Answer: D
Explanation: A primary index on a uniformly distributed column balances data across AMPs, minimizing skew and improving I/O performance.

Question 3. Which statistical measure indicates the average distance of data points from the mean?
A) Median
B) Variance
C) Standard deviation
D) Interquartile range
Answer: C
Explanation: Standard deviation is the square root of the variance and quantifies the typical spread of values around the mean.

Question 4. When interpreting a ROC curve, what does a point closest to the top-left corner represent?
A) Highest false-positive rate and lowest true-positive rate
B) Lowest false-positive rate and highest true-positive rate
C) Equal false-positive and true-positive rates
D) Random guessing performance
Answer: B
Explanation: The top-left corner corresponds to high sensitivity (TPR) and a low false-positive rate (1 - specificity), indicating optimal classifier performance.

Question 5. Which Teradata function tokenizes a string into individual words for text mining?
A) OREPLACE()
B) REGEXP_SPLIT_TO_TABLE()
C) SUBSTRING()
D) CAST()
Answer: B
Explanation: REGEXP_SPLIT_TO_TABLE() splits a string on a regular-expression delimiter and is commonly used for tokenization.

Question 6. A data set contains outliers that distort the mean. Which aggregation function is most robust to outliers?
A) AVG()
B) SUM()
C) MEDIAN()
C) SELECT CASE sales > 1000 'High' 'Low' END
D) CASE sales > 1000? 'High' : 'Low'
Answer: A
Explanation: The standard CASE expression in Teradata follows the pattern CASE WHEN condition THEN result ELSE result END.

Question 10. Which EXPLAIN output attribute indicates that a query will perform a full table scan?
A) Access Path = Primary Index
B) Join Strategy = MERGE
C) Table Scan = Yes
D) Partition Elimination = No
Answer: C
Explanation: “Table Scan = Yes” signals that the optimizer will read the entire table rather than using an index. In practice, Teradata EXPLAIN output is narrative text, and a full-table scan usually appears as a phrase such as “by way of an all-rows scan”.

Question 11. In a correlation analysis, a Pearson coefficient of -0.85 indicates:
A) Strong positive linear relationship
B) Weak negative linear relationship
C) Strong negative linear relationship
D) No linear relationship
Answer: C
Explanation: Coefficients near -1 denote a strong inverse linear correlation.

Question 12. Which Teradata window function assigns a unique sequential number to rows ordered by sales descending?
A) RANK() OVER (ORDER BY sales DESC)
B) ROW_NUMBER() OVER (ORDER BY sales DESC)
C) DENSE_RANK() OVER (ORDER BY sales DESC)
D) NTILE(10) OVER (ORDER BY sales DESC)
Answer: B
Explanation: ROW_NUMBER() provides a unique sequential identifier without gaps, even when rows tie on the ordering column.

Question 13. What does a p-value of 0.03 imply for a hypothesis test at a 5% significance level?
A) Fail to reject the null hypothesis
B) Accept the alternative hypothesis with 97% confidence
C) Reject the null hypothesis; result is statistically significant
D) The test is inconclusive
Answer: C
Explanation: Since 0.03 < 0.05, the null hypothesis is rejected, indicating statistical significance.

Question 14. Which visualization is most appropriate to compare the proportion of market share among five product categories?
A) Scatter plot
B) Bar chart
C) Pie chart
D) Line chart
Answer: C
Explanation: Pie charts effectively display parts-of-whole percentages for a limited number of categories.

Question 15. In a time-series table, which attribute defines the interval at which data points are stored?
A) Primary Index
Question 18. In a scatter plot, what does a tight clustering of points along a diagonal line suggest?
A) High variance, low correlation
B) Strong linear correlation
C) No relationship between variables
D) Presence of outliers only
Answer: B
Explanation: Points aligning closely to a line indicate a strong linear relationship between the two variables.

Question 19. Which statistical test is appropriate for comparing the means of two independent groups?
A) Chi-square test
B) Paired t-test
C) Independent samples t-test
D) ANOVA
Answer: C
Explanation: The independent samples t-test evaluates mean differences between two unrelated groups.

Question 20. What does the GINI coefficient represent in a binary classification model?
A) Ratio of true positives to false positives
B) Area between the ROC curve and the diagonal line
C) Difference between sensitivity and specificity
D) Probability of random guessing
Answer: B
Explanation: Gini = 2 × AUC - 1; it measures the separation between the ROC curve and the no-skill diagonal (numerically, twice the area between them).

Question 21. Which clause should be used to remove duplicate rows from a SELECT result in Teradata?
A) DISTINCT
B) UNIQUE
C) GROUP BY ALL
D) HAVING COUNT(*) = 1
Answer: A
Explanation: DISTINCT eliminates duplicate rows in the result set.

Question 22. In a sessionization analysis, what does the TIMEOUT parameter control?
A) Maximum number of sessions per user
B) Inactivity period that ends a session
C) Length of each session in minutes
D) Number of parallel session threads
Answer: B
Explanation: TIMEOUT defines the idle time after which a new session is started for the same user.

Question 23. Which of the following best describes a “high-cardinality” column?
A) Contains only two distinct values
B) Has many distinct values relative to row count
C) Stores numeric data only
D) Is always a primary key
Answer: B
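The TIMEOUT behaviour described in Question 22 can be illustrated with a minimal Python sketch. It is not Vantage's sessionization function: the name `sessionize`, the sample click times, and the choice that a gap exactly equal to the timeout stays in the same session are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout):
    """Assign a 1-based session number to each event of a single user.

    A new session starts whenever the gap since the previous event is
    strictly greater than `timeout` (an assumption of this sketch).
    """
    sessions = []
    session_id = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            session_id += 1  # first event, or inactivity exceeded the timeout
        sessions.append((ts, session_id))
        previous = ts
    return sessions


# Hypothetical click times for one user; gaps are 10, 55 and 15 minutes.
clicks = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 20),
]
result = sessionize(clicks, timeout=timedelta(minutes=30))
session_ids = [sid for _, sid in result]
print(session_ids)
```

With a 30-minute timeout, only the 55-minute gap is long enough to open a second session; the length of a session itself is not limited by the parameter.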
Explanation: Histograms display frequency counts across intervals, revealing the shape of a distribution.

Question 27. In Teradata, which keyword is used to create a temporary table that persists only for the session?
A) VOLATILE
B) TEMPORARY
C) GLOBAL TEMPORARY
D) SESSION ONLY
Answer: A
Explanation: VOLATILE tables are session-specific and automatically dropped when the session ends.

Question 28. Which metric is defined as TP / (TP + FP) in binary classification?
A) Sensitivity
B) Specificity
C) Precision
D) Recall
Answer: C
Explanation: Precision measures the proportion of positive predictions that are correct.

Question 29. Which SQL construct can be used to filter rows after a window function calculation?
A) WHERE
B) HAVING
C) QUALIFY
D) GROUP BY
Answer: C
Explanation: QUALIFY applies predicates to the results of window functions, much as WHERE does for ordinary rows and HAVING does for aggregates.

Question 30. In a time-series aggregation, which function computes a moving average over the previous 7 days?
A) AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
B) SUM(value) OVER (PARTITION BY date)
C) MAX(value) OVER (RANGE UNBOUNDED PRECEDING)
D) COUNT(*) OVER (ORDER BY date)
Answer: A
Explanation: The ROWS BETWEEN clause defines a sliding window of the current row and the six preceding rows, which yields a 7-day moving average when the table holds exactly one row per day.

Question 31. Which of the following is a consequence of not scaling features before training a k-Nearest Neighbors model?
A) Faster model training
B) Increased model interpretability
C) Distance calculations become biased toward larger-scale features
D) No impact on model performance
Answer: C
Explanation: k-NN relies on a distance metric, typically Euclidean distance; unscaled features with larger ranges dominate that metric.

Question 32. Which Teradata function extracts the year part from a DATE column?
A) EXTRACT(YEAR FROM date_col)
B) YEAR(date_col)
Answer: B
Explanation: PARTITION BY specifies the column(s) used to segment the table into partitions for parallel processing.

Question 36. When using the SENTENCE tokenization function, which option preserves punctuation as separate tokens?
A) KEEP_PUNCTUATION = FALSE
B) SPLIT_ON_WHITESPACE = TRUE
C) INCLUDE_DELIMITERS = TRUE
D) REMOVE_STOPWORDS = FALSE
Answer: C
Explanation: INCLUDE_DELIMITERS = TRUE tells the tokenizer to treat punctuation characters as individual tokens.

Question 37. Which of the following best describes a “false positive” in a fraud detection model?
A) A legitimate transaction incorrectly flagged as fraud
B) A fraudulent transaction missed by the model
C) A transaction correctly identified as fraud
D) A transaction correctly identified as legitimate
Answer: A
Explanation: False positives are benign cases mistakenly classified as the target (fraud).
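To make the four outcomes in Question 37 concrete, the following Python sketch counts them for a small, made-up set of transactions. The function name `confusion_counts` and the sample labels are hypothetical; the precision line uses the TP / (TP + FP) definition given in Question 28.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN and TN, where True means "fraud"."""
    tp = fp = fn = tn = 0
    for is_fraud, flagged in zip(actual, predicted):
        if flagged and is_fraud:
            tp += 1  # fraud correctly flagged
        elif flagged and not is_fraud:
            fp += 1  # legitimate transaction incorrectly flagged (false positive)
        elif not flagged and is_fraud:
            fn += 1  # fraud missed by the model (false negative)
        else:
            tn += 1  # legitimate transaction correctly passed
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical ground truth and model output for six transactions.
actual = [True, False, False, True, False, False]
predicted = [True, True, False, False, False, True]

counts = confusion_counts(actual, predicted)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
print(counts, precision)
```

In this sample two legitimate transactions are flagged, so only one of the three fraud alerts is correct.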
Question 38. In Teradata, which statement is used to change the data type of an existing column?
A) ALTER TABLE … MODIFY COLUMN …
B) UPDATE TABLE … SET column = CAST(column AS new_type)
C) ALTER COLUMN … TYPE …
D) REDEFINE TABLE …
Answer: A
Explanation: ALTER TABLE … MODIFY COLUMN allows changing a column’s definition, including its data type.

Question 39. Which visualization type is most effective for showing the relationship between three variables: two numeric and one categorical?
A) 3-D scatter plot
B) Grouped bar chart
C) Bubble chart
D) Heat map
Answer: C
Explanation: Bubble charts map two numeric axes to X/Y and use bubble size (or color) to represent the third variable, often a categorical grouping.

Question 40. Which Teradata function can be used to calculate the median of a numeric column without a GROUP BY?
A) MEDIAN(column) OVER ()
B) PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column)
C) APPROX_MEDIAN(column)
D) SELECT column FROM table ORDER BY column FETCH FIRST 1 ROW ONLY
Answer: B
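For Question 40, the reason a continuous percentile at 0.5 gives the median can be sketched in Python. This is an illustration only, assuming the usual linear-interpolation definition of a continuous percentile; it is not Teradata's implementation, and the helper name `percentile_cont` is borrowed from the SQL function purely for readability.

```python
import math
import statistics


def percentile_cont(values, fraction):
    """Continuous percentile using linear interpolation between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    # Interpolate between the two neighbouring values (identical when
    # the position falls exactly on a row).
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


even_sample = [1, 3, 2, 4]
odd_sample = [5, 1, 3]
print(percentile_cont(even_sample, 0.5), statistics.median(even_sample))
print(percentile_cont(odd_sample, 0.5), statistics.median(odd_sample))
```

For an even number of rows the result is the midpoint of the two middle values, which matches the conventional median.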
D) SPLIT_PART(email, '@', 2)
Answer: B
Explanation: REGEXP_SUBSTR with a pattern captures the substring after '@', returning the domain.

Question 44. In a time-series model, what does seasonality refer to?
A) Random noise in the data
B) Trend component over time
C) Repeating patterns at regular intervals
D) Data gaps due to missing timestamps
Answer: C
Explanation: Seasonality is a systematic, periodic fluctuation that recurs at known intervals (e.g., monthly, yearly).

Question 45. Which of the following is a disadvantage of using a bar chart to display time-series data?
A) Bars cannot be stacked
B) Time axis is not continuous, leading to visual gaps
C) Colors cannot be varied
D) Bar charts cannot show negative values
Answer: B
Explanation: Bar charts treat time points as discrete categories, which may obscure continuity compared to line charts.

Question 46. Which Teradata function removes leading and trailing spaces from a string?
A) TRIM()
B) LTRIM()
Answer: A
Explanation: TRIM() eliminates both leading and trailing whitespace characters.

Question 47. In a classification report, which metric combines precision and recall into a single value?
A) F1-Score
B) Accuracy
C) Specificity
D) Gini
Answer: A
Explanation: F1-Score is the harmonic mean of precision and recall, balancing both concerns.

Question 48. Which SQL construct allows you to retrieve the top 5 customers by total sales using a window function?
A) SELECT TOP 5 * FROM (SELECT customer, SUM(sales) AS total FROM orders GROUP BY customer) ORDER BY total DESC
B) SELECT customer, SUM(sales) FROM orders GROUP BY customer QUALIFY ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC) <= 5
C) SELECT customer, SUM(sales) FROM orders GROUP BY customer HAVING ROW_NUMBER() <= 5
D) SELECT * FROM orders WHERE RANK() <= 5
Answer: B
Explanation: QUALIFY filters the result of the window function ROW_NUMBER() after aggregation, returning the top 5.
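The order of operations in Question 48 (aggregate first, number the aggregated rows, then filter on that number) can be mirrored in a short Python sketch. It only imitates the logic of the query, not how Teradata executes it; `top_customers` and the sample orders are hypothetical.

```python
from collections import defaultdict


def top_customers(orders, n=5):
    """Return the n customers with the highest total sales."""
    # Step 1: GROUP BY customer with SUM(sales).
    totals = defaultdict(float)
    for customer, sales in orders:
        totals[customer] += sales
    # Step 2: ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC).
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    numbered = [
        (row_number, customer, total)
        for row_number, (customer, total) in enumerate(ranked, start=1)
    ]
    # Step 3: QUALIFY row_number <= n.
    return [(customer, total) for row_number, customer, total in numbered if row_number <= n]


# Hypothetical order rows: (customer, sales).
orders = [
    ("ann", 100.0),
    ("bob", 50.0),
    ("ann", 25.0),
    ("cy", 80.0),
    ("bob", 40.0),
]
top_two = top_customers(orders, n=2)
print(top_two)
```

Because the numbering is applied to the aggregated rows, a filter on it cannot run in WHERE (evaluated before aggregation) or HAVING (evaluated before window functions), which is why the query relies on QUALIFY.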
Explanation: The window clause with UNBOUNDED PRECEDING computes a running total.

Question 52. What does the term “data skew” refer to in a distributed database environment?
A) Uneven distribution of rows across AMPs
B) Missing values in a column
C) Duplicate primary keys
D) Incorrect data types
Answer: A
Explanation: Skew occurs when some AMPs hold significantly more rows than others, causing performance bottlenecks.

Question 53. Which statistical test evaluates the association between two categorical variables?
A) T-test
B) ANOVA
C) Chi-square test of independence
D) Pearson correlation
Answer: C
Explanation: The chi-square test assesses whether observed frequencies differ from expected frequencies under independence.

Question 54. In Teradata, which option enables automatic compression for a table?
A) COMPRESS = ON
B) BLOCKSIZE = 8192
C) COMPRESS = ZLIB
D) USING COMPRESSION = AUTO
Answer: C
Explanation: COMPRESS = ZLIB (or other algorithms) activates column-level compression for eligible data types.

Question 55. Which visualization best conveys the change in market share for multiple products over several years?
A) Stacked area chart
B) Pie chart
C) Histogram
D) Scatter plot
Answer: A
Explanation: Stacked area charts show cumulative totals and individual contributions over time, ideal for market-share trends.

Question 56. Which function would you use to replace all occurrences of the word “error” with “issue” in a column?
A) REPLACE(column, 'error', 'issue')
B) OREPLACE(column, 'error', 'issue')
C) TRANSLATE(column, 'error', 'issue')
D) SUBSTR(column, 'error', 'issue')
Answer: B
Explanation: OREPLACE performs a case-sensitive replacement of a substring within a string.

Question 57. When performing a binary classification, which metric is most affected by class imbalance?
A) Accuracy
B) AUC
C) F1-Score