[CNLP] Certificate in Natural Language Processing using Python Certification Exam Guide, Exams of Technology

This certification exam guide introduces natural language processing concepts using Python. Topics include text preprocessing, language modeling, machine learning algorithms, and practical NLP applications. Candidates gain hands-on knowledge for developing intelligent language-based systems and data-driven solutions.

Typology: Exams

2025/2026

Available from 02/10/2026

shilpi-jain-3

[CNLP] Certificate in Natural Language Processing using
Python Certification Exam Guide
**Question 1.** Which step in the NLP pipeline typically follows data collection?
A) Model deployment
B) Text preprocessing
C) Hyperparameter tuning
D) Evaluation
Answer: B
Explanation: After gathering raw text, the next essential step is preprocessing (cleaning, tokenizing, etc.)
to make the data suitable for modeling.
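The cleaning and tokenizing mentioned in this explanation can be illustrated with a minimal sketch. It assumes a deliberately crude recipe (lowercasing, URL removal, punctuation stripping, whitespace splitting); the function name `preprocess` and the sample sentence are illustrative choices, not part of the exam guide.

```python
import re

def preprocess(text):
    """Lowercase, drop URLs and punctuation, then split on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # keep only letters, digits, whitespace
    return text.split()

# Hypothetical input used only for illustration.
tokens = preprocess("Visit https://example.com NOW! NLP is fun.")
print(tokens)
```

Real pipelines usually rely on a library tokenizer (for example NLTK or spaCy) instead of whitespace splitting, since this sketch breaks contractions and ignores non-English characters.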
**Question 2.** In handling lexical ambiguity, which technique is most commonly used?
A) Dependency parsing
B) Word sense disambiguation
C) Named entity recognition
D) Stemming
Answer: B
Explanation: Word sense disambiguation resolves which meaning of a word is intended, directly
addressing lexical ambiguity.
**Question 3.** Which Python library provides pre-trained statistical models for part-of-speech tagging
and named entity recognition with a focus on speed?
A) NLTK
B) spaCy
C) Gensim
D) TextBlob
Answer: B
Explanation: spaCy is designed for industrial-strength NLP, offering fast POS tagging and NER out of the
box.
**Question 4.** Which of the following is NOT a core component of the scikit-learn library for NLP?
A) CountVectorizer
B) TfidfTransformer
C) Word2Vec
D) Pipeline
Answer: C
Explanation: Word2Vec is provided by Gensim, not scikit-learn. The other options are scikit-learn classes: CountVectorizer and TfidfTransformer handle feature extraction, and Pipeline chains processing steps.
**Question 5.** When using regular expressions, which pattern matches any digit character?
A) \w
B) \d
C) \s
D) \b
Answer: B
Explanation: \d is the regex token for a digit (0-9); \w matches word characters, \s matches whitespace, and \b matches a word boundary.
**Question 6.** Which tokenization method splits text into sub-word units that can handle out-of-vocabulary words?
A) Word tokenization
B) Sentence tokenization
C) Byte-Pair Encoding (BPE)
D) Whitespace tokenization
Answer: C
Explanation: BPE merges frequent character sequences to create sub-word tokens, enabling representation of unseen words.
**Question 7.** What is the primary purpose of stop-word removal?
A) Reduce dimensionality by eliminating high-frequency, low-information tokens
B) Convert words to their base forms
C) Detect named entities

Explanation: Lemmatization relies on morphological analysis and POS tags to return the correct base form (lemma) of a word.
**Question 11.** In case folding, why is converting text to lowercase important for many NLP models?
A) It removes punctuation
B) It normalizes tokens, reducing vocabulary size
C) It improves model interpretability
D) It encodes semantic meaning
Answer: B
Explanation: Lowercasing treats “Apple” and “apple” as the same token, decreasing the number of unique words and improving model efficiency.
**Question 12.** Which POS tag corresponds to a proper noun in the Penn Treebank tag set?
A) NN
B) VB
C) NNP
D) JJ
Answer: C
Explanation: “NNP” denotes a singular proper noun; “NN” is a common noun, “VB” a verb, and “JJ” an adjective.
**Question 13.** Dependency parsing primarily aims to identify:
A) Word frequencies
B) Syntactic head-dependent relationships
C) Document topics
D) Sentiment polarity
Answer: B
Explanation: Dependency parsing creates a tree that links each word (dependent) to its governing head, revealing grammatical structure.
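The vocabulary reduction described in Question 11 can be checked with a small sketch. The three sample sentences and the helper name `vocabulary` are assumptions made for illustration; tokenization is plain whitespace splitting.

```python
def vocabulary(documents, lowercase):
    """Collect the set of distinct whitespace-separated tokens."""
    vocab = set()
    for doc in documents:
        for token in doc.split():
            vocab.add(token.lower() if lowercase else token)
    return vocab

# Hypothetical mini-corpus: "Apple", "apple", and "APPLE" differ only in case.
docs = ["Apple released a phone", "I ate an apple", "APPLE stock rose"]

raw_vocab = vocabulary(docs, lowercase=False)
folded_vocab = vocabulary(docs, lowercase=True)
print(len(raw_vocab), len(folded_vocab))
```

The same example shows the trade-off: after folding, the company name and the fruit share one token, so case folding may be undesirable when proper nouns carry meaning.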

**Question 14.** Which NER label typically represents a geopolitical entity?
A) PERSON
B) ORG
C) LOC
D) GPE
Answer: D
Explanation: “GPE” (Geo-Political Entity) tags locations such as countries, cities, and states.
**Question 15.** In a Bag-of-Words model, what does the term “sparse matrix” refer to?
A) A matrix containing many zero entries because most words are absent from a given document
B) A matrix with dense word embeddings
C) A matrix storing word order information
D) A matrix that includes only stop words
Answer: A
Explanation: BoW vectors have dimensions equal to vocabulary size; each document contains only a small subset, leading to many zeros.
**Question 16.** TF-IDF weighting reduces the impact of words that are:
A) Rare across the corpus
B) Common across many documents
C) Long in character length
D) Proper nouns
Answer: B
Explanation: Inverse Document Frequency penalizes terms appearing in many documents, highlighting discriminative words.
**Question 17.** Which word embedding technique learns vector representations by predicting surrounding words within a fixed window?
A) GloVe

D) k-Nearest Neighbors
Answer: C
Explanation: Multinomial Naive Bayes models the probability of word counts assuming conditional independence, making it well-suited for bag-of-words features.
**Question 21.** In binary classification, the F1-Score is the harmonic mean of:
A) Accuracy and Recall
B) Precision and Recall
C) Precision and Specificity
D) Accuracy and Precision
Answer: B
Explanation: F1 = 2 · (Precision · Recall) / (Precision + Recall); it balances false positives and false negatives.
**Question 22.** Which metric is most appropriate when classes are highly imbalanced?
A) Accuracy
B) Macro-averaged F1-Score
C) Mean Squared Error
D) R²
Answer: B
Explanation: Macro-averaged F1 treats each class equally, mitigating the bias that accuracy can have toward majority classes.
**Question 23.** In a confusion matrix, the term “False Positive” refers to:
A) Correctly predicted positive instances
B) Incorrectly predicted negative instances
C) Incorrectly predicted positive instances
D) Correctly predicted negative instances
Answer: C

Explanation: A false positive occurs when the model predicts the positive class but the true label is negative.
**Question 24.** Which kernel is commonly used with SVMs for text data to handle high-dimensional sparse vectors?
A) Linear kernel
B) Polynomial kernel
C) Radial Basis Function (RBF) kernel
D) Sigmoid kernel
Answer: A
Explanation: Linear kernels work well with sparse, high-dimensional data like TF-IDF vectors and are computationally efficient.
**Question 25.** Logistic regression outputs probabilities because it applies which function to its linear combination of features?
A) Softmax
B) ReLU
C) Sigmoid
D) Tanh
Answer: C
Explanation: The sigmoid function maps any real-valued input to a (0,1) probability range.
**Question 26.** In sentiment analysis, which preprocessing step is most likely to improve model performance on social media text?
A) Lemmatization of proper nouns
B) Removing emojis and emoticons
C) Expanding contractions (e.g., “don’t” → “do not”)
D) Keeping all punctuation
Answer: C
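The F1 formula from Question 21 and the sigmoid mapping from Question 25 can be sketched in a few lines. The counts `tp=8, fp=2, fn=4` and the function names are hypothetical values chosen for illustration only.

```python
import math

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def sigmoid(z):
    """Map a real-valued score to the open interval (0, 1).

    Note: math.exp overflows for very large negative z; this sketch
    assumes moderate inputs.
    """
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical confusion-matrix counts for the positive class.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f1, 3))
print(sigmoid(0.0))
```

Here precision is 8/10 and recall is 8/12, so F1 works out to 8/11, which sits below the arithmetic mean of the two because the harmonic mean is pulled toward the smaller value.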

Explanation: Hierarchical clustering builds a tree of clusters (dendrogram) without needing to pre-specify the number of clusters.
**Question 30.** Recurrent Neural Networks (RNNs) are particularly suited for:
A) Image classification
B) Fixed-size tabular data
C) Sequential data where order matters
D) Graph data
Answer: C
Explanation: RNNs maintain hidden states that capture information from previous time steps, making them ideal for sequences.
**Question 31.** The vanishing gradient problem in standard RNNs primarily affects:
A) Model convergence speed on large datasets
B) Ability to learn long-range dependencies
C) Memory consumption during training
D) Compatibility with GPU acceleration
Answer: B
Explanation: Gradients shrink exponentially over many time steps, preventing the network from learning relationships far apart in the sequence.
**Question 32.** Which gate in an LSTM cell controls how much of the previous cell state is retained?
A) Input gate
B) Forget gate
C) Output gate
D) Reset gate
Answer: B
Explanation: The forget gate decides which information from the prior cell state should be discarded.
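The exponential shrinkage described in Question 31 can be made concrete with a toy model. The sketch below assumes a one-dimensional recurrence h_t = tanh(w · h_{t-1}) with no inputs, which is a simplification of a real RNN; the weight 0.9 and the starting state 0.5 are arbitrary illustrative values.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h * h)
    return grad

grad_short = gradient_through_time(w=0.9, h0=0.5, steps=5)
grad_long = gradient_through_time(w=0.9, h0=0.5, steps=50)
print(grad_short, grad_long)
```

Each step multiplies the gradient by a factor no larger than 0.9, so after 50 steps the signal reaching the first time step is bounded by 0.9 raised to the 50th power. Gated cells such as the LSTM in Question 32 are designed to ease this effect.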

**Question 33.** GRU units differ from LSTMs by:
A) Having separate input and forget gates
B) Using a single update gate instead of separate input and forget gates
C) Not maintaining a cell state
D) Being incompatible with attention mechanisms
Answer: B
Explanation: GRUs combine the input and forget functionalities into an update gate, simplifying the architecture.
**Question 34.** The “Attention is All You Need” paper introduced which architecture that relies solely on attention mechanisms?
A) Convolutional Neural Network (CNN)
B) Recurrent Neural Network (RNN)
C) Transformer
D) Autoencoder
Answer: C
Explanation: The Transformer architecture discards recurrence and convolutions, using self-attention to model dependencies.
**Question 35.** In self-attention, the term “query” refers to:
A) The vector representation of the entire document
B) The token whose relationships to other tokens are being computed
C) The output of the feed-forward network
D) The positional encoding vector
Answer: B
Explanation: Each token generates a query vector that is compared with keys of other tokens to compute attention scores.
**Question 36.** Positional encodings in Transformers are added to token embeddings to:
A) Encode part-of-speech information

C) Leveraging sub-word models like FastText or Byte-Pair Encoding
D) Re-training the entire embedding matrix
Answer: C
Explanation: Sub-word approaches can compose vectors for unseen words from known character n-grams or sub-tokens.
**Question 40.** Which evaluation metric is most appropriate for multi-label text classification where each document can belong to multiple categories?
A) Macro-averaged F1-Score
B) Exact match ratio
C) Hamming loss
D) ROC-AUC per label
Answer: D
Explanation: ROC-AUC can be computed per label and then averaged, handling the presence of multiple simultaneous classes.
**Question 41.** When visualizing word embeddings with t-SNE, the primary goal is to:
A) Reduce dimensionality to 2-D for human interpretation while preserving local structure
B) Increase the number of dimensions for better performance
C) Convert embeddings to binary vectors
D) Perform clustering on raw text
Answer: A
Explanation: t-SNE projects high-dimensional data into 2-D/3-D, preserving neighborhood relationships for visual inspection.
**Question 42.** Which Python function from pandas is commonly used to read a CSV file containing textual data?
A) pd.read_excel()
B) pd.read_sql()
C) pd.read_csv()

D) pd.read_json()
Answer: C
Explanation: pd.read_csv() loads comma-separated values into a DataFrame, a typical format for text corpora.
**Question 43.** In NumPy, which method creates an array of zeros with the same shape as an existing array X?
A) np.empty_like(X)
B) np.ones_like(X)
C) np.zeros_like(X)
D) np.full_like(X, 0)
Answer: C
Explanation: np.zeros_like() returns a zero-filled array matching the shape and dtype of X.
**Question 44.** Which NLTK function tokenizes a string into sentences?
A) word_tokenize()
B) sent_tokenize()
C) regexp_tokenize()
D) pos_tag()
Answer: B
Explanation: sent_tokenize() splits text into a list of sentence strings.
**Question 45.** The re.findall() function in Python returns:
A) The first match object
B) All non-overlapping matches of a pattern as a list
C) A compiled regular expression object
D) The number of matches
Answer: B
Explanation: re.findall() extracts every non-overlapping occurrence of the pattern.
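The behaviour of re.findall() from Question 45, together with the \d, \w, and \b tokens from Question 5, can be seen in a short sketch. The sample sentence is a made-up string used only for illustration.

```python
import re

# Hypothetical sample text.
text = "Order 66 shipped on 2024-05-17 to 3 customers."

digits = re.findall(r"\d", text)       # every single digit character
numbers = re.findall(r"\d+", text)     # maximal runs of digits
words = re.findall(r"\b\w+\b", text)   # word-character runs between boundaries

print(digits)
print(numbers)
print(words)
```

Because matches are non-overlapping and scanned left to right, the date is returned as three separate digit runs: the hyphens are not matched by \d or \w.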

B) TfidfVectorizer
C) HashingVectorizer
D) FeatureHasher
Answer: B
Explanation: TfidfVectorizer converts raw documents to TF-IDF weighted term-document matrices.
**Question 50.** In a typical machine-learning text classification workflow, the step that determines the optimal hyper-parameters is called:
A) Feature extraction
B) Model training
C) Hyper-parameter tuning (e.g., GridSearchCV)
D) Data cleaning
Answer: C
Explanation: Hyper-parameter tuning searches over parameter space (e.g., regularization strength) to improve performance.
**Question 51.** Which of the following is a common method for handling class imbalance before training a classifier?
A) Feature scaling
B) Oversampling the minority class (e.g., SMOTE)
C) Removing stop words
D) Using a larger batch size
Answer: B
Explanation: Synthetic Minority Over-sampling Technique (SMOTE) creates new minority samples to balance the dataset.
**Question 52.** In the context of LDA, the hyper-parameter α (alpha) controls:
A) The number of topics
B) The Dirichlet prior on per-document topic distribution (sparsity)
C) The learning rate of the algorithm

D) The word-topic distribution sparsity
Answer: B
Explanation: α influences how many topics are likely to appear in each document; lower α leads to sparser topic mixtures.
**Question 53.** Which loss function is used for training a binary classifier with logistic regression?
A) Hinge loss
B) Mean Squared Error
C) Binary Cross-Entropy (log loss)
D) Kullback-Leibler divergence
Answer: C
Explanation: Binary cross-entropy measures the difference between predicted probabilities and true binary labels.
**Question 54.** In a confusion matrix for a multi-class problem, the sum of the diagonal elements represents:
A) Total number of predictions
B) Overall accuracy (correct predictions)
C) Number of false negatives
D) Number of false positives
Answer: B
Explanation: Diagonal entries are true positives for each class; summing them gives the total number of correct predictions, and dividing that sum by the total number of predictions gives the accuracy.
**Question 55.** Which of the following is a key advantage of using a linear SVM for high-dimensional text data?
A) Ability to model non-linear decision boundaries without kernels
B) Low memory consumption due to sparse representations
C) Automatic feature selection
D) Built-in handling of missing values

Answer: C
Explanation: ReLU (or GELU) is applied after the linear transformation to introduce non-linearity.
**Question 59.** When fine-tuning BERT for a question-answering task, the model typically predicts:
A) The next word in the sequence
B) The start and end token positions of the answer span
C) A binary relevance score
D) A summary of the passage
Answer: B
Explanation: BERT QA heads output two probability distributions over tokens for answer start and end positions.
**Question 60.** Which Python package provides the transformers library for accessing pre-trained models like BERT and GPT?
A) torchtext
B) keras
C) huggingface-transformers
D) gensim
Answer: C
Explanation: The Hugging Face transformers library (installed under the package name transformers; earlier releases were called pytorch-transformers) offers a unified API for many pre-trained language models.
**Question 61.** In the context of word embeddings, cosine similarity between two vectors is computed as:
A) Dot product divided by product of their magnitudes
B) Euclidean distance
C) Manhattan distance
D) Jaccard index
Answer: A

Explanation: Cosine similarity measures the angle between vectors, calculated as the dot product over the product of norms.
**Question 62.** Which preprocessing step is essential before applying a CountVectorizer to a corpus containing HTML content?
A) Stemming
B) Removing HTML tags (e.g., using BeautifulSoup)
C) Lemmatization
D) Adding bigrams
Answer: B
Explanation: HTML tags introduce noise; stripping them ensures only meaningful text is tokenized.
**Question 63.** In Gensim’s Word2Vec, the window parameter controls:
A) The size of the embedding vectors
B) The maximum number of training epochs
C) The number of surrounding words considered as context
D) The learning rate
Answer: C
Explanation: window defines the context size on each side of the target word during training.
**Question 64.** Which evaluation technique provides an unbiased estimate of model performance by repeatedly splitting the data into training and validation sets?
A) Hold-out validation
B) Leave-One-Out cross-validation
C) k-fold cross-validation
D) Bootstrapping
Answer: C
Explanation: k-fold cross-validation partitions data into k subsets, rotating the validation set across folds.
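The cosine similarity formula from Question 61 and the fold rotation from Question 64 can both be sketched with the standard library. The helper names, the toy vectors, and the choice of 10 samples in 3 folds are assumptions for illustration; the splitter does not shuffle, unlike what a library implementation may offer.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Toy vectors: the second is a scaled copy of the first, so the angle is zero.
similarity = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
folds = list(k_fold_indices(10, 3))
print(similarity)
for train, validation in folds:
    print(validation)
```

Cosine similarity ignores vector length, which is why the scaled copy scores (up to floating-point rounding) 1. In the splitter, every index appears in exactly one validation fold, which is the rotation the explanation refers to.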