Teradata Vantage Data Science Practice Exam, Exams of Technology

Evaluates deep knowledge of Vantage data science workflows, including model training, A/B testing, algorithmic selection, supervised and unsupervised methods, and in-database scoring pipelines. Candidates must demonstrate proficiency in scaling ML workloads and optimizing model performance within Vantage's architecture.

Typology: Exams

2025/2026

Available from 01/06/2026

shilpi-jain-1 🇮🇳


Teradata Vantage Data Science Practice Exam
**Question 1.** In a Teradata Vantage ETL workflow, which stage is primarily responsible for
applying business rules to raw data?
A) Extraction
B) Transformation
C) Loading
D) Validation
Answer: B
Explanation: The transformation stage cleanses, integrates, and applies business logic to the
extracted data before loading it into the target schema.
**Question 2.** Which Vantage utility is most efficient for bulk loading terabytes of data from
flat files into a table?
A) BTEQ
B) FastLoad
C) TPump
D) MultiLoad
Answer: B
Explanation: FastLoad is optimized for high‑speed loading of large, empty tables and bypasses
transaction logging.
**Question 3.** When encountering duplicate rows during data cleaning, which SQL clause
should you use to keep only the first occurrence based on a timestamp column?
A) GROUP BY
B) QUALIFY ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) = 1
C) DISTINCT
D) HAVING COUNT(*) = 1
Answer: B


Explanation: ROW_NUMBER with a QUALIFY filter keeps the earliest row per duplicate group.
**Question 4.** Which function can be used to replace NULL numeric values with the column’s median in a single SQL statement?
A) COALESCE
B) NULLIFZERO
C) MEDIAN_IGNORE_NULLS
D) REPLACE_NULL_WITH_MEDIAN (user‑defined)
Answer: C
Explanation: MEDIAN_IGNORE_NULLS computes the median while ignoring NULLs, which can be used in an UPDATE.
**Question 5.** A dataset shows a heavily right‑skewed distribution. Which sampling technique best preserves the distribution when creating a training set?
A) Simple random sampling
B) Systematic sampling
C) Stratified sampling based on quantiles
D) Cluster sampling
Answer: C
Explanation: Stratifying by quantiles ensures each segment of the skewed distribution is proportionally represented.
**Question 6.** Which Vantage function returns the first four statistical moments (mean, variance, skewness, kurtosis) for a numeric column?
A) UNIVARIATE_STATISTICS
B) MOMENTS_TABLE
C) DESCRIPTIVE_STATS
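The logic behind Questions 3 and 5 can be illustrated outside the database. The following is a minimal Python sketch, not Vantage code: `keep_earliest` mimics what a ROW_NUMBER-based filter achieves (one row per key, earliest timestamp wins), and `quantile_stratified_sample` draws the same fraction from each quantile bin. The function names, column names, and toy data are all hypothetical.

```python
import random

def keep_earliest(rows, key_col, ts_col):
    """Keep one row per key_col value: the one with the smallest ts_col."""
    earliest = {}
    for row in rows:
        key = row[key_col]
        if key not in earliest or row[ts_col] < earliest[key][ts_col]:
            earliest[key] = row
    return list(earliest.values())

def quantile_stratified_sample(values, n_bins, frac, seed=0):
    """Draw the same fraction from each quantile bin of the sorted values."""
    rng = random.Random(seed)
    ordered = sorted(values)
    size = len(ordered)
    sample = []
    for b in range(n_bins):
        # Equal-count bins over the sorted data approximate quantile strata.
        stratum = ordered[b * size // n_bins:(b + 1) * size // n_bins]
        if stratum:
            k = max(1, round(len(stratum) * frac))
            sample.extend(rng.sample(stratum, k))
    return sample

# Hypothetical duplicate rows; ISO date strings sort chronologically.
rows = [
    {"customer_id": 1, "updated_at": "2024-03-02", "email": "a@new.example"},
    {"customer_id": 1, "updated_at": "2024-01-15", "email": "a@old.example"},
    {"customer_id": 2, "updated_at": "2024-02-10", "email": "b@example.com"},
]
deduped = keep_earliest(rows, "customer_id", "updated_at")

# Right-skewed toy data: cubes of 1..100.
skewed = [x ** 3 for x in range(1, 101)]
train = quantile_stratified_sample(skewed, n_bins=4, frac=0.2)
```

Because every bin contributes the same share, the long right tail is represented in the sample in proportion to its share of the data, which simple random sampling on a small sample does not guarantee.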


B) Gaussian Mixture Model (GMM)
C) Canopy
D) DBSCAN
Answer: C
Explanation: Canopy uses a loose distance threshold to form overlapping “canopies” before finer clustering.
**Question 10.** In graph analytics, which centrality measure quantifies the number of shortest paths that pass through a node?
A) Degree Centrality
B) Betweenness Centrality
C) Closeness Centrality
D) Eigenvector Centrality
Answer: B
Explanation: Betweenness centrality counts how often a node lies on shortest paths between other nodes.
**Question 11.** The PageRank algorithm in Vantage is most appropriate for which type of analysis?
A) Customer segmentation
B) Ranking web pages or nodes by importance
C) Time‑series forecasting
D) Spatial clustering
Answer: B
Explanation: PageRank computes a probability distribution over nodes, reflecting their relative importance.
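To make the definition in Question 10 concrete, here is a small Python sketch of betweenness centrality for an unweighted, undirected graph, using Brandes' algorithm. It is an illustration of the measure itself, not of any Vantage function; the `betweenness` name and the four-node path graph are assumptions chosen for the example.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness for an undirected graph {node: [neighbours]}."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Path graph a - b - c - d: only the inner nodes lie between other pairs.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
centrality = betweenness(path)
```

In the path graph, node b sits on the shortest paths for the pairs (a, c) and (a, d), and node c on those for (a, d) and (b, d), so both score 2 while the endpoints score 0.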


**Question 12.** Which boosting technique is designed for binary classification and adjusts instance weights after each weak learner?
A) Gradient Boosting
B) AdaBoost
C) XGBoost
D) LightGBM
Answer: B
Explanation: AdaBoost updates weights to focus subsequent learners on previously mis‑classified instances.
**Question 13.** Linear regression in Vantage (LAR) differs from Principal Component Analysis (PCA) primarily in that LAR:
A) Reduces dimensionality without regard to the response variable
B) Models the relationship between predictors and a continuous target
C) Is an unsupervised technique
D) Uses eigenvalue decomposition
Answer: B
Explanation: LAR is a supervised method that predicts a numeric target, whereas PCA is unsupervised dimensionality reduction.
**Question 14.** In the CFilter function, which metric indicates the strength of a filter’s ability to separate two classes?
A) Gini impurity
B) Information Gain
C) Chi‑square statistic
D) Correlation coefficient
Answer: B
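The weight adjustment described in Question 12 can be sketched as a single round of the classic AdaBoost update. This is a textbook illustration in Python, assuming labels in {-1, +1} and a weighted error strictly between 0 and 0.5; the `adaboost_round` helper and the five-point example are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step for labels in {-1, +1}."""
    # Weighted error of the weak learner (assumed to satisfy 0 < err < 0.5).
    err = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified points (y * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * y * p)
               for w, y, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Five equally weighted points; the weak learner gets the last one wrong.
weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After the update, the single misclassified point carries half of the total weight, so the next weak learner is pushed to classify it correctly.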


D) Filtering out punctuation
Answer: B
Explanation: POS tagging assigns grammatical categories, enabling syntactic analysis such as noun‑verb relationships.
**Question 18.** Lemmatization differs from stemming because it:
A) Always produces shorter words
B) Uses a dictionary to return the base form of a word
C) Is language‑agnostic
D) Removes all suffixes regardless of meaning
Answer: B
Explanation: Lemmatization relies on linguistic dictionaries to produce the correct base (lemma) form.
**Question 19.** Latent Dirichlet Allocation (LDA) in Vantage is best suited for:
A) Predicting numeric time series
B) Clustering geographic coordinates
C) Discovering hidden topics in a collection of documents
D) Classifying images
Answer: C
Explanation: LDA is a probabilistic topic‑modeling technique that uncovers latent topics.
**Question 20.** A Naive Bayes model outputs a probability of 0.82 for class “Churn”. How should this be interpreted?
A) The model predicts “Churn” with 82 % confidence.
B) 82 % of the training data belongs to “Churn”.


C) The odds ratio for “Churn” is 0.82.
D) The model is 82 % accurate overall.
Answer: A
Explanation: The probability indicates the model’s confidence that the instance belongs to the “Churn” class.
**Question 21.** Which statement correctly describes how VARMAX extends ARIMA?
A) VARMAX adds exogenous variables and multiple interrelated time series.
B) VARMAX replaces differencing with moving averages.
C) VARMAX only works with seasonal data.
D) VARMAX is a non‑parametric alternative to ARIMA.
Answer: A
Explanation: VARMAX (Vector Autoregressive Moving Average with eXogenous inputs) handles multivariate series and external regressors.
**Question 22.** In the ARIMA function call ARIMA(col, 2, 1, 0), what does the second argument represent?
A) Number of seasonal periods
B) Differencing order (d)
C) Autoregressive order (p)
D) Moving‑average order (q)
Answer: C
Explanation: The signature is ARIMA(column, p, d, q); thus the second argument is the AR order.
**Question 23.** Which Vantage data type is specifically designed to store a range of dates, such as a fiscal quarter?


**Question 26.** In survival analysis, which order of COX functions yields a valid model pipeline?
A) COX_FIT → COX_PREDICT → COX_EVALUATE
B) COX_PREP → COX_FIT → COX_SCORE
C) COX_SPLIT → COX_FIT → COX_EVAL
D) COX_TRAIN → COX_VALIDATE → COX_DEPLOY
Answer: A
Explanation: First fit the model, then predict survival probabilities, and finally evaluate performance.
**Question 27.** Which geospatial function calculates the shortest distance between two points stored as ST_Point?
A) ST_Distance_Sphere
B) ST_Length
C) ST_Intersects
D) ST_Area
Answer: A
Explanation: ST_Distance_Sphere returns the great‑circle distance between two geographic points.
**Question 28.** Using the ST_Contains function, you can determine:
A) Whether two polygons overlap partially
B) If a point lies inside a polygon
C) The centroid of a polygon
D) The area of a multipolygon
Answer: B


Explanation: ST_Contains returns true when the first geometry completely contains the second.
**Question 29.** Which Teradata‑specific open‑source library provides a dplyr‑like interface for Vantage?
A) TeradataML
B) TDPLYR
C) VantageR
D) TDPandas
Answer: B
Explanation: TDPLYR mimics dplyr syntax, enabling fluent data manipulation directly on Vantage.
**Question 30.** TeradataML is primarily used for:
A) Data visualization only
B) In‑database machine learning model training and scoring
C) ETL orchestration
D) Managing user privileges
Answer: B
Explanation: TeradataML offers in‑database ML algorithms, allowing training and scoring without data export.
**Question 31.** If a sample’s histogram is approximately normal, which hypothesis test is most appropriate for comparing its mean to a known population mean?
A) Wilcoxon signed‑rank test
B) One‑sample t‑test
C) Chi‑square goodness‑of‑fit
D) Kolmogorov‑Smirnov test
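The two geospatial ideas from Questions 27 and 28, great-circle distance and point-in-polygon containment, can be sketched in plain Python. This is not how the Vantage geospatial functions are implemented; it is a minimal illustration using the haversine formula with an assumed mean Earth radius of 6371 km and a ray-casting containment test on planar coordinates. Both function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# One degree of longitude along the equator.
one_degree = great_circle_km(0.0, 0.0, 0.0, 1.0)

# A 4 x 4 square with one point inside and one outside.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
inner = point_in_polygon(2, 2, square)
outer = point_in_polygon(5, 2, square)
```

The ray-casting test treats coordinates as planar, so it is only a rough stand-in for containment on a sphere; points exactly on an edge are not handled specially.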


Answer: D
Explanation: PCA transforms correlated variables into orthogonal principal components, eliminating multicollinearity.
**Question 35.** In a scatter‑plot matrix, a strong linear pattern between two variables suggests:
A) High correlation
B) Causation
C) Heteroscedasticity
D) Multimodality
Answer: A
Explanation: Linear alignment indicates a strong Pearson correlation, though not causation.
**Question 36.** Including extreme outliers in a linear regression model typically results in:
A) Improved R‑squared
B) Inflated standard errors and biased coefficients
C) Reduced overfitting
D) Better residual plots
Answer: B
Explanation: Outliers can disproportionately influence parameter estimates, causing bias and larger errors.
**Question 37.** If outliers are not removed before training a tree‑based model, the most likely effect is:
A) The tree will become deeper with many splits to accommodate outliers.
B) The model will ignore them automatically.


C) Accuracy will always improve.
D) Feature importance scores become zero.
Answer: A
Explanation: Decision trees try to partition around outliers, often leading to over‑complex trees.
**Question 38.** Shapley values are used in attribution to:
A) Rank features by their absolute coefficients.
B) Quantify each feature’s contribution to a specific prediction.
C) Compute global model accuracy.
D) Replace cross‑validation.
Answer: B
Explanation: Shapley values distribute the prediction difference among features fairly.
**Question 39.** Which model input is NOT required by the Vantage ATTRIBUTION function?
A) Model coefficients
B) Feature matrix for the observation
C) Baseline (reference) prediction
D) Hyperparameter tuning grid
Answer: D
Explanation: Attribution uses model outputs and feature values; hyperparameter grids are irrelevant.
**Question 40.** In a ROC curve, the point closest to the top‑left corner represents:
A) Maximum false‑positive rate
B) Minimum true‑positive rate


C) Increase model bias.
D) Select the best hyperparameter automatically.
Answer: B
Explanation: By rotating validation folds, cross‑validation provides a robust performance estimate.
**Question 44.** Translating a churn model’s lift of 2.5 into business value for a marketing team could be expressed as:
A) “The model reduces data storage costs by 2.5 %.”
B) “Targeting the top 10 % of scored customers yields 2.5 × more retained customers than random outreach.”
C) “Model training time is 2.5 hours faster.”
D) “The model improves SQL query speed by 2.5×.”
Answer: B
Explanation: Lift directly relates to the increased number of positive outcomes (retained customers) compared with random selection.
**Question 45.** Which SQL syntax correctly applies a trained Vantage model named CUSTOMER_CHURN_MODEL to score a new table NEW_CUSTOMERS?
A) SELECT * FROM NEW_CUSTOMERS SCORE USING MODEL CUSTOMER_CHURN_MODEL;
B) SELECT SCORE(CUSTOMER_CHURN_MODEL, *) FROM NEW_CUSTOMERS;
C) SELECT * FROM NEW_CUSTOMERS, MODEL_APPLY('CUSTOMER_CHURN_MODEL');
D) SELECT * FROM NEW_CUSTOMERS APPLY_MODEL('CUSTOMER_CHURN_MODEL');
Answer: B
Explanation: The SCORE function takes the model name and the input columns to produce predictions.
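The lift figure in Question 44 is a ratio of two rates: the positive-outcome rate among the top-scored customers divided by the rate across everyone. A minimal Python sketch of that arithmetic follows; the `lift_at_top` helper and the 20-customer toy data are hypothetical and were chosen so that the lift comes out at 2.5.

```python
def lift_at_top(scores, outcomes, fraction=0.1):
    """Positive rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate

# 20 customers with scores from 1.00 down to 0.05; 1 marks a positive outcome.
scores = [i / 20 for i in range(20, 0, -1)]
outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
lift = lift_at_top(scores, outcomes, fraction=0.1)
```

Here the top 10 % is two customers with one positive (rate 0.5), against an overall rate of 4 in 20 (0.2), giving a lift of 2.5.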


**Question 46.** The first step in building an operational predictive pipeline in Vantage is:
A) Deploying the model to a REST endpoint.
B) Performing feature engineering on the source data.
C) Registering the model in the Model Store.
D) Writing a Python script for inference.
Answer: B
Explanation: Feature engineering creates the inputs that the model will consume; without it, the pipeline cannot proceed.
**Question 47.** A best practice for model governance in Vantage includes:
A) Storing only the latest model version.
B) Tagging models with metadata such as business owner, version, and training date.
C) Deleting training logs after deployment.
D) Allowing any user to overwrite models.
Answer: B
Explanation: Rich metadata supports traceability, auditability, and lifecycle management.
**Question 48.** In a containerized Vantage environment, Docker is used to:
A) Orchestrate multiple Vantage nodes.
B) Package and isolate custom analytics code for deployment.
C) Replace the Vantage query optimizer.
D) Store large datasets on disk.
Answer: B
Explanation: Docker creates lightweight containers that encapsulate code, dependencies, and runtime.


Explanation: The windowed COUNT_IF tallies NULLs over a moving window, detecting runs of missing data.
**Question 52.** To efficiently load JSON data stored in an S3 bucket into a Vantage table, which combination is recommended?
A) FastLoad + JSON_EXTRACT
B) TPT + External Table referencing S3 with FORMAT JSON
C) BTEQ + INSERT SELECT from S
D) TPump + XMLTABLE
Answer: B
Explanation: TPT external tables can directly read JSON files from S3, handling semi‑structured data natively.
**Question 53.** Which statistical test is appropriate for comparing the medians of two independent, non‑normally distributed samples?
A) Two‑sample t‑test
B) Mann‑Whitney U test
C) Paired t‑test
D) ANOVA
Answer: B
Explanation: The Mann‑Whitney U test is a non‑parametric alternative to compare medians.
**Question 54.** In Vantage, the UNIFORM function is often used for:
A) Generating random numbers for stratified sampling.
B) Normalizing numeric columns.
C) Creating a uniform distribution of dates.


D) Calculating the uniformity of a dataset.
Answer: A
Explanation: UNIFORM produces pseudo‑random numbers useful for random sampling partitions.
**Question 55.** Which method best handles categorical variables with high cardinality in a tree‑based model?
A) One‑hot encoding all categories.
B) Frequency encoding (replace category with its count).
C) Dropping the variable.
D) Using label encoding only.
Answer: B
Explanation: Frequency encoding reduces dimensionality while preserving information about rare vs. common categories.
**Question 56.** The ST_Buffer function applied to a point geometry with radius 10 km returns:
A) A polygon representing all locations within 10 km of the point.
B) The distance between two points.
C) The area of a circle with radius 10 km.
D) The centroid of the point.
Answer: A
Explanation: ST_Buffer creates a buffer zone (polygon) around the given geometry.
**Question 57.** When using the TDPLYR function filter(), which of the following is true?
A) It pushes the filter predicate to the Vantage engine for in‑database execution.