ARTIFICIAL NEURAL NETWORKS – IMPLEMENTATION AND APPLICATIONS IN COMPUTER SCIENCES.pdf, Exams of Artificial Intelligence

This test bank contains 150 practice questions on artificial neural networks. Topics covered: fundamentals (perceptrons; ReLU, sigmoid, tanh, and softmax activation functions; backpropagation; MSE and cross-entropy loss functions; SGD, Adam, and RMSProp optimizers); convolutional neural networks (filters, stride, padding, pooling, receptive field; AlexNet, ResNet, and VGG architectures); recurrent neural networks (vanishing gradient problem, LSTM, GRU, backpropagation through time); transformers (self-attention, positional encoding, multi-head attention, BERT, GPT); autoencoders (latent space, variational autoencoders, denoising autoencoders); generative adversarial networks (generator, discriminator); regularization techniques (dropout, batch normalization, weight decay); transfer learning and fine-tuning; large language models and prompt engineering; reinforcement learning (DQN, policy gradients, PPO); graph neural networks; attention mechanisms; and diffusion models.

Typology: Exams

2025/2026

Available from 06/07/2026

ndirangu-smurf


ARTIFICIAL NEURAL NETWORKS – IMPLEMENTATION AND APPLICATIONS IN COMPUTER SCIENCES

150 Practice Questions with Correct Answers & Detailed Explanations

Comprehensive Study Resource

Question 1 What is the basic computational unit of an artificial neural network? A) A weight B) A bias C) A perceptron (neuron) D) A layer

Correct Answer: C Explanation: The perceptron (or artificial neuron) is the basic computational unit that receives inputs, applies weights, sums them with a bias, and passes the result through an activation function.

Question 2 Which activation function is most commonly used in hidden layers of deep neural networks? A) Sigmoid B) Tanh C) ReLU (Rectified Linear Unit) D) Softmax

Correct Answer: C Explanation: ReLU (max(0,x)) is the most popular activation function for hidden layers because it mitigates the vanishing gradient problem and is computationally efficient.

Question 3 What is the primary purpose of the activation function in a neural network? A) To initialize weights B) To introduce non-linearity into the network C) To regularize the model D) To reduce overfitting

Correct Answer: B Explanation: Activation functions introduce non-linearity, allowing neural networks to learn complex, non-linear relationships between inputs and outputs.
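To see why non-linearity matters, the following minimal sketch stacks two scalar "layers" with and without an activation. The weights (2.0, -1.0, -3.0, 0.5) and the function names are hypothetical values chosen only for illustration. Without an activation, the two layers collapse into a single linear function; with ReLU in between, they do not.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)

def two_layer(x, activation=None):
    # First layer: h = 2x - 1 (hypothetical weight and bias).
    h = 2.0 * x - 1.0
    if activation is not None:
        h = activation(h)
    # Second layer: y = -3h + 0.5 (hypothetical weight and bias).
    return -3.0 * h + 0.5

def collapsed(x):
    # Algebraically, -3 * (2x - 1) + 0.5 = -6x + 3.5: one linear layer.
    return -6.0 * x + 3.5

for x in (-2.0, 0.0, 1.5):
    print(x, two_layer(x), collapsed(x), two_layer(x, relu))
```

The first two printed columns after each input agree for every input, while the ReLU version departs from them whenever the hidden value is negative.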

Question 4 Which of the following is NOT a type of artificial neural network architecture? A) Feedforward Neural Network (FNN) B) Convolutional Neural Network (CNN) C) Recurrent Neural Network (RNN) D) Linear Regression Network

Correct Answer: D Explanation: Linear Regression is a statistical method, not a neural network architecture. Common ANN architectures include FNN, CNN, RNN, LSTM, GAN, and Autoencoders.

Question 5 What does the term "epoch" mean in neural network training? A) One forward pass of a single sample B) One forward and backward pass of the entire training dataset C) The learning rate decay step D) The number of hidden layers

Correct Answer: B Explanation: An epoch is one complete pass of the entire training dataset through the neural network (both forward and backward propagation).

Question 6 What is a "batch" in the context of neural network training? A) The entire dataset B) A subset of the training dataset used for one weight update C) The test dataset D) The validation dataset

Correct Answer: B Explanation: A batch is a subset of training samples used to compute the gradient and update weights in mini-batch gradient descent.
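The relationship between epochs and batches can be sketched as a pair of nested loops. The dataset of 10 placeholder samples, the batch size of 4, and the helper name iterate_minibatches are assumptions for illustration; the weight update itself is only counted here, not performed.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of the dataset, each at most batch_size long."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

dataset = list(range(10))  # 10 placeholder training samples
updates = 0

for epoch in range(3):                      # 3 full passes over the dataset
    for batch in iterate_minibatches(dataset, 4):
        updates += 1                        # one weight update per batch

# Each epoch yields batches of size 4, 4, and 2, so 3 epochs give 9 updates.
print(updates)
```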

A) Mean Squared Error (MSE) B) Binary Cross-Entropy (Log Loss) C) Categorical Cross-Entropy D) Hinge Loss

Correct Answer: B Explanation: Binary cross-entropy is the standard loss function for binary classification problems where the output is a probability between 0 and 1.
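As a rough illustration, binary cross-entropy can be computed directly from labels and predicted probabilities. The function name and the eps clamp (used to avoid log(0)) are assumptions of this sketch rather than part of the source material.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0]
print(binary_cross_entropy(labels, [0.9, 0.1]))  # confident and correct: small loss
print(binary_cross_entropy(labels, [0.1, 0.9]))  # confident and wrong: large loss
```

Confident correct predictions give a loss near -ln(0.9), while confident wrong predictions give a loss near -ln(0.1), which is much larger.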

Question 11 Which optimization algorithm is most commonly used in modern deep learning? A) Standard Gradient Descent B) Stochastic Gradient Descent (SGD) C) Adam (Adaptive Moment Estimation) D) Newton's Method

Correct Answer: C Explanation: Adam combines the advantages of momentum and adaptive learning rates, making it the most popular optimizer for deep learning due to its efficiency and robustness.

Question 12 What is the purpose of the learning rate in gradient descent? A) To determine the number of epochs B) To control the step size during weight updates C) To initialize the weights D) To activate neurons

Correct Answer: B Explanation: The learning rate controls how much to change weights in response to the estimated error each time weights are updated.
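The effect of the step size can be seen on a toy problem. This sketch minimizes the hypothetical loss f(w) = (w - 3)^2, whose gradient is 2(w - 3); the loss, starting point, and step count are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=50, w=0.0):
    """Run plain gradient descent on f(w) = (w - 3)^2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # derivative of (w - 3)^2
        w -= learning_rate * grad     # step size is scaled by the learning rate
    return w

print(gradient_descent(0.1))   # small steps: converges toward the minimum at w = 3
print(gradient_descent(1.1))   # steps too large: overshoots and diverges
```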

Question 13 What is "dropout" in neural networks? A) A regularization technique that randomly deactivates neurons during training B) A method to drop low-weight connections C) A data augmentation technique D) A type of activation function

Correct Answer: A Explanation: Dropout randomly drops out (deactivates) a fraction of neurons during training, preventing co-adaptation and reducing overfitting.
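A minimal sketch of dropout on a list of activations follows. It uses the common "inverted dropout" convention, in which surviving activations are scaled by 1/(1 - rate) during training so that nothing needs to change at inference time; that scaling convention, the function name, and the parameters are assumptions of this sketch.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)      # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                # fixed seed so the run is repeatable
hidden = [1.0] * 8
print(dropout(hidden, rate=0.5, rng=rng))     # mix of 0.0 and 2.0
print(dropout(hidden, rate=0.5, training=False))  # unchanged
```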

Question 14 What is the main advantage of Convolutional Neural Networks (CNNs) over standard feedforward networks? A) They are faster to train B) They automatically learn spatial hierarchies of features (translation invariance) C) They use fewer parameters D) Both B and C

Correct Answer: D Explanation: CNNs use weight sharing and local connectivity to learn hierarchical features and require fewer parameters than fully connected networks for image data.

Question 15 What is a "filter" (kernel) in a convolutional layer? A) A data preprocessing step B) A small matrix that slides over the input to detect features C) A pooling operation D) A type of activation function

Correct Answer: B Explanation: Filters (or kernels) are small learnable matrices that slide over the input to detect specific features like edges, textures, or patterns.

Question 16 What is the purpose of the pooling layer in a CNN? A) To increase the spatial dimensions of feature maps B) To downsample feature maps, reducing dimensionality and providing translation invariance C) To apply non-linearity D) To classify images

Correct Answer: B Explanation: Pooling (max pooling or average pooling) reduces the spatial dimensions of feature maps, decreasing computational load and providing some translation invariance.
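Max pooling can be illustrated on a small feature map. This sketch assumes a 2×2 window with stride 2 and an input whose height and width are even; the function name and the sample values are hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even dimensions."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

fmap = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(max_pool_2x2(fmap))  # [[6, 5], [7, 9]]
```

The 4×4 input is reduced to 2×2, keeping only the strongest response in each window.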

B) The output layer C) The compressed representation (bottleneck layer) between encoder and decoder D) The loss function

Correct Answer: C Explanation: The latent space (bottleneck) is the compressed, lower-dimensional representation of the input data learned by the autoencoder.

Question 21 What are the two main components of a Generative Adversarial Network (GAN)? A) Encoder and Decoder B) Generator and Discriminator C) Convolution and Pooling D) Forward and Backward

Correct Answer: B Explanation: GANs consist of a Generator (creates fake data) and a Discriminator (distinguishes real from fake), which are trained adversarially.

Question 22 What is "transfer learning" in deep learning? A) Moving a trained model to a different hardware B) Using a pre-trained model on a new, related task C) Transferring data between training and test sets D) Changing the learning rate during training

Correct Answer: B Explanation: Transfer learning reuses a model trained on a large dataset (e.g., ImageNet) as a starting point for a new, related task, reducing training time and data requirements.

Question 23 What is "fine-tuning" in transfer learning? A) Unfreezing some layers of a pre-trained model and retraining them on the new task B) Adjusting the learning rate C) Changing the loss function D) Adding more layers to the network

Correct Answer: A Explanation: Fine-tuning involves unfreezing some of the top layers of a pre-trained model and training them on the new dataset to adapt the features to the new task.

Question 24 Which Python library is most commonly used for building deep learning models? A) NumPy B) Pandas C) TensorFlow / Keras D) Matplotlib

Correct Answer: C Explanation: TensorFlow (with Keras API) and PyTorch are the most popular deep learning libraries for building and training neural networks.

Question 25 What is the purpose of the validation set? A) To train the model B) To evaluate model performance during training and tune hyperparameters C) To test the final model D) To augment the training data

Correct Answer: B Explanation: The validation set is used during training to monitor performance, tune hyperparameters, and detect overfitting before evaluating on the test set.

Question 26 What is "overfitting" in neural networks? A) The model performs well on training data but poorly on unseen data B) The model performs poorly on both training and test data C) The model trains too slowly D) The model has too few parameters

Correct Answer: A Explanation: Overfitting occurs when the model learns noise and specific patterns in the training data instead of generalizable features.

D) A type of loss function

Correct Answer: B Explanation: Batch normalization normalizes layer outputs, reducing internal covariate shift and allowing higher learning rates.

Question 31 What is the difference between "supervised" and "unsupervised" learning in neural networks? A) Supervised uses labeled data; unsupervised uses unlabeled data B) Unsupervised uses labeled data; supervised uses unlabeled data C) Both require labeled data D) Both require unlabeled data

Correct Answer: A Explanation: Supervised learning uses input-output pairs (labels). Unsupervised learning finds patterns in unlabeled data (e.g., clustering, autoencoders).

Question 32 Which layer in a CNN is responsible for reducing spatial dimensions? A) Convolutional layer B) Activation layer C) Pooling layer D) Fully connected layer

Correct Answer: C Explanation: Pooling layers (max pooling, average pooling) downsample feature maps, reducing spatial dimensions and computational load.

Question 33 What is "stride" in a convolutional layer? A) The number of filters B) The step size at which the filter moves across the input C) The size of the filter D) The learning rate

Correct Answer: B Explanation: Stride is the number of pixels the filter shifts at each step. Larger stride produces smaller output dimensions.

Question 34 What is "padding" in a convolutional layer? A) Adding zeros around the input to control output size B) Adding random noise to the input C) A type of activation function D) A regularization technique

Correct Answer: A Explanation: Padding adds zeros around the input border to control output spatial dimensions (same padding vs. valid padding).

Question 35 Which architecture won the ImageNet 2012 competition and sparked the deep learning revolution? A) VGGNet B) AlexNet C) ResNet D) Inception

Correct Answer: B Explanation: AlexNet (Krizhevsky et al., 2012) won ImageNet 2012, popularizing deep CNNs for image recognition.

Question 36 What is the key innovation of ResNet (Residual Network)? A) Inception modules B) Skip connections (residual connections) that allow training of very deep networks C) Depthwise separable convolutions D) Attention mechanisms

Correct Answer: B Explanation: ResNet introduces skip connections that bypass layers, allowing gradients to flow directly through the network and enabling training of hundreds of layers.

Question 37

Correct Answer: B Explanation: Softmax exponentiates and normalizes outputs, producing a probability distribution over classes.
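The exponentiate-then-normalize step can be sketched in a few lines. Subtracting the maximum score before exponentiating is a common numerical-stability trick and is an addition of this sketch; it does not change the result. The sample logits are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                              # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]     # exponentiate
    total = sum(exps)
    return [e / total for e in exps]             # normalize

probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest score receives the largest probability, and the outputs sum to 1 up to floating-point rounding.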

Question 41 What does "backpropagation" compute? A) The forward pass of the network B) The gradient of the loss function with respect to each weight C) The activation of neurons D) The accuracy of the model

Correct Answer: B Explanation: Backpropagation computes gradients using the chain rule, allowing weights to be updated to minimize the loss.

Question 42 What is "exploding gradients"? A) Gradients become extremely large, causing unstable weight updates B) Gradients become extremely small C) Gradients oscillate between positive and negative D) Gradients become zero

Correct Answer: A Explanation: Exploding gradients occur when gradients become very large, often due to deep networks or poor initialization. Gradient clipping is a common solution.

Question 43 What is "gradient clipping"? A) A technique to limit the size of gradients during backpropagation B) Removing small gradients entirely C) Increasing all gradients D) A type of activation function

Correct Answer: A Explanation: Gradient clipping caps gradient values to a maximum threshold, preventing exploding gradients.
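One common variant, clipping by norm, rescales the whole gradient vector when its Euclidean norm exceeds a threshold, which preserves its direction. The sketch below assumes that variant; clipping each value independently is another option. The function name and sample gradients are hypothetical.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)            # already within the threshold
    scale = max_norm / norm
    return [g * scale for g in grads]

print(clip_by_norm([3.0, 4.0], max_norm=1.0))   # norm 5 is scaled down to norm 1
print(clip_by_norm([0.3, 0.4], max_norm=1.0))   # norm 0.5 is left unchanged
```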

Question 44 What is the "curse of dimensionality" in machine learning? A) Data becomes sparse as the number of features increases, requiring exponentially more data B) Models train too slowly C) Overfitting is reduced D) Features are redundant

Correct Answer: A Explanation: As dimensionality increases, the volume of space grows exponentially, making data sparse and requiring exponentially more samples for reliable learning.

Question 45 What is "dimensionality reduction" in neural networks? A) Reducing the number of input features using techniques like PCA or autoencoders B) Reducing the number of layers C) Reducing the batch size D) Reducing the learning rate

Correct Answer: A Explanation: Dimensionality reduction techniques compress high-dimensional data into lower-dimensional representations while preserving important information.

Question 46 What is the role of an autoencoder? A) To classify images B) To learn efficient data encodings (compression) in an unsupervised manner C) To generate new data D) To perform regression

Correct Answer: B Explanation: Autoencoders learn to reconstruct their input, creating a compressed bottleneck representation (latent space) useful for dimensionality reduction and denoising.

Question 47 What is a "variational autoencoder" (VAE)?

Correct Answer: A Explanation: Data augmentation artificially expands the training set by applying transformations, reducing overfitting and improving generalization.

Question 51 What is the output of a neuron with inputs [1, 2, 3], weights [0.1, 0.2, 0.3], bias 0.5, using sigmoid activation? A) 0. B) 1. C) 0. D) 0.

Correct Answer: C Explanation: Weighted sum = 1×0.1 + 2×0.2 + 3×0.3 + 0.5 = 0.1 + 0.4 + 0.9 + 0.5 = 1.9. Sigmoid(1.9) = 1/(1+e⁻¹·⁹) ≈ 0.87.
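The arithmetic in this question can be checked with a short sketch. The function name neuron_output is hypothetical; the inputs, weights, and bias are the ones given in the question.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

out = neuron_output([1, 2, 3], [0.1, 0.2, 0.3], 0.5)
print(round(out, 4))  # about 0.87
```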

Question 52 What is the output of a ReLU activation for input -2? A) - B) 0 C) 2 D) 1

Correct Answer: B Explanation: ReLU(x) = max(0, x). For negative inputs, ReLU outputs 0.

Question 53 A neural network has 3 input features, 2 hidden layers with 4 neurons each, and an output layer with 2 neurons. How many weights are there between the input and first hidden layer? A) 3 × 4 = 12 B) 4 × 4 = 16 C) 4 × 2 = 8 D) 3 + 4 = 7

Correct Answer: A Explanation: Each of the 3 inputs connects to each of the 4 hidden neurons, giving 3×4 = 12 weights.

Question 54 What is the total number of weights in a network with architecture 5-3-2 (input, hidden, output)? A) 5×3 + 3×2 = 21 B) 5+3+2 = 10 C) 5×3 + 3×2 + (3+2) biases = 26 D) 5×2 = 10

Correct Answer: A Explanation: Input to hidden: 5×3 = 15 weights; hidden to output: 3×2 = 6 weights; total weights = 21 (biases are separate parameters).
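The counting rule used in the two questions above generalizes to any fully connected architecture: multiply the sizes of each pair of adjacent layers and sum the products. The helper name count_parameters is hypothetical.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network."""
    # One weight per connection between adjacent layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases

print(count_parameters([5, 3, 2]))      # (21, 5)
print(count_parameters([3, 4, 4, 2]))   # (36, 10)
```

For the 5-3-2 network this gives 21 weights and 5 biases, for 26 trainable parameters in total.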

Question 55 If a CNN has an input of size 32×32×3 and a convolutional layer with 16 filters of size 3×3, stride = 1, and padding = "same", what is the output size? A) 30×30× B) 32×32× C) 32×32× D) 30×30×

Correct Answer: B Explanation: With padding = "same" and stride 1, the output spatial dimensions remain the same as the input (32×32). The number of filters determines the number of output channels (16).

Question 56 What is the formula for output size in a convolutional layer? A) (W - F + 2P)/S + 1 B) (W + F - 2P)/S - 1 C) W × F / S D) (W - F)/S

Correct Answer: A Explanation: Output size = (Input size - Filter size + 2×Padding) / Stride + 1.
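The formula can be applied directly along one spatial dimension. This sketch assumes floor division when the stride does not divide evenly, which is a common convention; the function and parameter names are hypothetical.

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """Output length along one dimension: (W - F + 2P) / S + 1, rounded down."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 32-pixel input, 3x3 filter, stride 1:
print(conv_output_size(32, 3, padding=1, stride=1))  # 32 (one pixel of padding keeps the size)
print(conv_output_size(32, 3, padding=0, stride=1))  # 30 (no padding shrinks the output)
print(conv_output_size(32, 3, padding=1, stride=2))  # 16 (larger stride, smaller output)
```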

Question 57 What is "Max Pooling"? A) Taking the average of all values in a pooling window B) Taking the maximum value in a pooling window

Explanation: LSTM has three gates (forget, input, output) and a cell state that maintains long-term memory.

Question 61 What is "teacher forcing" in RNN training? A) Using the true previous output instead of the model's predicted output during training B) The teacher training the student C) A regularization method D) An optimization algorithm

Correct Answer: A Explanation: Teacher forcing uses ground truth outputs as inputs to the next time step during training, stabilizing RNN training.

Question 62 What is the "BLEU score" used for? A) Evaluating classification models B) Evaluating machine translation and text generation quality C) Evaluating image generation D) Evaluating regression models

Correct Answer: B Explanation: BLEU (Bilingual Evaluation Understudy) measures the similarity between generated text and reference translations based on n-gram overlap.

Question 63 What is "self-attention"? A) Attending to the input itself to weigh the importance of different positions B) A type of pooling C) A regularization technique D) An activation function

Correct Answer: A Explanation: Self-attention computes relationships between all pairs of positions in a sequence, allowing the model to capture long-range dependencies.

Question 64 What is "positional encoding" in Transformers? A) A way to inject information about the position of tokens since attention itself is permutation-invariant B) A type of pooling C) A regularization method D) A loss function

Correct Answer: A Explanation: Positional encodings add information about token order to the input embeddings, as self-attention does not inherently capture sequence order.
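One widely used scheme is the sinusoidal encoding from the original Transformer paper, where even dimensions use sine and odd dimensions use cosine of the position divided by a power of 10000. The sketch below assumes that scheme (learned position embeddings are another option); the function name is hypothetical.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = sinusoidal_positional_encoding(num_positions=4, d_model=4)
for row in pe:
    print([round(v, 3) for v in row])
```

Each position gets a distinct vector, which is added to the token embedding so that the model can tell positions apart.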

Question 65 What is "masked self-attention" in decoder layers? A) Preventing the model from attending to future positions during training B) Randomly dropping attention weights C) A type of regularization D) A pooling operation

Correct Answer: A Explanation: Masked self-attention hides future positions, so each position can attend only to itself and earlier positions. This keeps training consistent with autoregressive generation, where future tokens are not yet available.
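Self-attention and its masked variant can be sketched together. This simplified version uses the input vectors directly as queries, keys, and values, omitting the learned projection matrices a real Transformer layer would have; that simplification, the function names, and the sample inputs are assumptions of the sketch.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """Scaled dot-product self-attention over a list of equal-length vectors."""
    d = len(x[0])
    out = []
    for i, query in enumerate(x):
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        if causal:
            # Mask future positions so they receive zero attention weight.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
print(self_attention(tokens, causal=True))
```

With the causal mask, the first position can attend only to itself, so its output equals its input; without the mask, every output mixes information from all positions.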

Question 66 What is "fine-tuning" in the context of large language models? A) Training a model from scratch B) Taking a pre-trained model and training it further on a specific task C) Adjusting the learning rate D) Reducing the model size

Correct Answer: B Explanation: Fine-tuning adapts a pre-trained LLM (e.g., BERT, GPT) to a specific downstream task with relatively small amounts of labeled data.

Question 67 What is "prompt engineering"? A) Designing input prompts to get desired outputs from LLMs without fine-tuning B) Engineering the model architecture