CDSP Exam Set 2 (New) Quiz

1. Which of the following machine learning models is best suited for a binary classification task with imbalanced data?

A: k-means clustering
B: Logistic regression with L1 regularization
C: Decision tree with Gini impurity
D: Random Forest with SMOTE (Synthetic Minority Over-sampling Technique)

2. What is the primary difference between L1 and L2 regularization in machine learning?

A: L1 regularization penalizes large weights more heavily than L2 regularization.
B: L2 regularization results in sparse models by reducing some weights to zero.
C: L1 regularization encourages sparsity, while L2 regularization encourages small weights but not exactly zero.
D: L1 regularization increases the complexity of the model, while L2 regularization reduces the complexity.
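
Study note for question 2: the contrast between the two penalties can be sketched with the per-weight shrinkage step each one induces. This is a minimal illustration, not a full training loop. The function names, the example weights, and the penalty strength are all hypothetical, and the L2 step assumes a penalty of the form (penalty / 2) * w^2.

```python
def l1_shrink(weight, penalty):
    """Soft-thresholding: the shrinkage step associated with an L1 penalty."""
    if weight > penalty:
        return weight - penalty
    if weight < -penalty:
        return weight + penalty
    return 0.0  # weights smaller than the penalty are set exactly to zero


def l2_shrink(weight, penalty):
    """Proportional shrinkage: the step associated with a (penalty / 2) * w^2 term."""
    return weight / (1.0 + penalty)


# Hypothetical weights and penalty strength, chosen only for illustration.
weights = [2.0, -0.3, 0.05, -1.5]
penalty = 0.5

l1_result = [l1_shrink(w, penalty) for w in weights]
l2_result = [l2_shrink(w, penalty) for w in weights]

print(l1_result)  # small weights become exactly 0.0
print(l2_result)  # every weight shrinks, none reaches exactly 0.0
```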

3. Given the confusion matrix below, calculate the precision for the positive class:

A: 77%
B: 83.3%
C: 90%
D: 85.7%
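
Study note for question 3: precision for the positive class is the number of true positives divided by everything the model predicted as positive. The sketch below uses hypothetical counts that are not the confusion matrix the question refers to; they only show the arithmetic.

```python
def precision(true_positives, false_positives):
    """Share of predicted positives that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0  # convention used here when nothing was predicted positive
    return true_positives / predicted_positives


# Hypothetical counts, NOT taken from the quiz's confusion matrix.
tp, fp, fn, tn = 40, 10, 5, 45

positive_precision = precision(tp, fp)
print(f"{positive_precision:.1%}")
```

Note that false negatives and true negatives do not enter the precision formula at all.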

4. Which of the following metrics is most appropriate to evaluate the performance of a model on imbalanced datasets?

A: Accuracy
B: F1 score
C: Root Mean Squared Error (RMSE)
D: Adjusted R-squared

5. Which of the following Python libraries provides a Gradient Boosting classifier implementation commonly used for machine learning tasks?

A: Pandas
B: TensorFlow
C: Scikit-learn
D: Matplotlib

6. Which of the following techniques is typically used to handle categorical variables with many categories in machine learning?

A: Label encoding
B: One-hot encoding
C: Frequency encoding
D: Target encoding

7. You are working on a dataset with missing values. Which of the following methods would you use if the missing values are randomly distributed and only affect a small percentage of the data?

A: Drop the columns containing missing values
B: Impute missing values using the mean of the column
C: Use a forward-fill or backward-fill method
D: Replace missing values with zeros

8. In principal component analysis (PCA), which of the following describes the first principal component?

A: It is the component with the smallest eigenvalue.
B: It captures the maximum variance in the data.
C: It is always orthogonal to the second principal component.
D: It is the component that has the most correlation with the target variable.

9. You are preprocessing a dataset with highly skewed data. Which of the following transformations is most appropriate to normalize the distribution?

A: Log transformation
B: One-hot encoding
C: Label encoding
D: Feature scaling
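
Study note for question 9: the effect of a log transformation on right-skewed data can be checked by comparing skewness before and after. The values below are hypothetical (a geometric sequence picked so the effect is easy to see), and the sketch assumes all values are strictly positive.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed standardized deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical right-skewed values; each one doubles the previous.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log(v) for v in raw]  # requires strictly positive inputs

print(skewness(raw))     # clearly positive: long right tail
print(skewness(logged))  # approximately zero: evenly spaced after the log
```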

10. Which of the following algorithms is most suitable for a multiclass classification task (more than two classes)?

A: Logistic Regression
B: Naive Bayes
C: k-Means Clustering
D: Decision Tree

11. In k-means clustering, which of the following is true about the k parameter?

A: k represents the number of features used in the dataset.
B: k represents the number of clusters to divide the data into.
C: k represents the number of outliers in the data.
D: k represents the number of iterations during the clustering process.

12. Which evaluation metric would you use for a regression model predicting house prices, and why?

A: Precision, because it measures correct predictions.
B: F1 score, because it balances precision and recall.
C: Mean Absolute Error (MAE), because it calculates the average magnitude of errors.
D: Accuracy, because it measures overall correctness.

13. Consider the ROC curve of two models (Model A and Model B). Based on the ROC curve illustration, which model performs better in distinguishing between the two classes?

A: Model A, because its ROC curve is closer to the diagonal line.
B: Model B, because its ROC curve is farther from the diagonal line.
C: Model A, because it has a higher false positive rate.
D: Model B, because it has a lower Area Under the Curve (AUC).

14. What is the purpose of hyperparameter tuning in machine learning?

A: To adjust the weights of the model during training.
B: To minimize the loss function by adjusting model parameters.
C: To optimize the settings that are fixed before training and control the learning process (e.g., learning rate, number of trees).
D: To evaluate the model on unseen test data.

15. Which of the following is a key feature of ensemble methods like Random Forest or Gradient Boosting?

A: They are based on unsupervised learning.
B: They combine multiple weak learners to improve model performance.
C: They require less data than other machine learning algorithms.
D: They are primarily used for clustering tasks.

16. Which of the following optimization techniques is commonly used in neural networks to update weights and minimize the loss function?

A: Adam optimizer
B: Principal Component Analysis (PCA)
C: L2 regularization
D: k-means clustering

17. Which Python library is primarily used for natural language processing (NLP) tasks, such as tokenization and stemming?

A: TensorFlow
B: scikit-learn
C: NLTK
D: Matplotlib

18. In deep learning, what is the role of an activation function in a neural network?

A: To initialize the weights of the network.
B: To determine whether a neuron should be activated based on its inputs.
C: To optimize the loss function during training.
D: To regularize the model and prevent overfitting.

19. Which of the following is a common method to prevent overfitting in deep learning models?

A: Dropout
B: Increasing the number of layers
C: Reducing the size of the training dataset
D: Using one-hot encoding

20. Which of the following feature scaling techniques ensures that all features have a mean of 0 and a standard deviation of 1?

A: Min-Max scaling
B: Standardization
C: Normalization
D: One-hot encoding
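
Study note for question 20: the two most common rescaling formulas can be compared side by side. This is a minimal sketch with hypothetical measurements; it uses the population standard deviation and assumes the values are not all identical.

```python
import statistics


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature values.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

z_scores = standardize(heights_cm)
unit_range = min_max_scale(heights_cm)

print(statistics.mean(z_scores), statistics.pstdev(z_scores))
print(unit_range)
```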

21. You are working with a dataset that contains highly correlated features. Which of the following methods is most appropriate for reducing the dimensionality of the dataset while retaining the most variance?

A: Cross-validation
B: One-hot encoding
C: Principal Component Analysis (PCA)
D: Feature scaling

22. Which of the following is an example of data leakage in a machine learning pipeline?

A: Performing feature scaling before model training
B: Including features in the training data that are not available at the time of prediction
C: Using cross-validation to evaluate model performance
D: Applying PCA to reduce the number of features

23. Which technique is commonly used to handle class imbalance in a dataset where one class significantly outnumbers the other?

A: Removing the majority class samples
B: Adding noise to the data
C: Oversampling the minority class using SMOTE
D: Performing cross-validation

24. In time series analysis, which of the following is typically used to account for seasonality in the data?

A: Autoregressive models (AR)
B: Differencing the data
C: Fourier Transforms
D: Exponential smoothing with a seasonal component

25. Which of the following is a characteristic of stationary time series data?

A: The data exhibits a clear upward or downward trend over time.
B: The mean and variance of the data change over time.
C: The statistical properties, such as mean and variance, remain constant over time.
D: The data has frequent outliers and sudden spikes.

26. Which of the following metrics is most commonly used to evaluate the performance of forecasting models in time series analysis?

A: Mean Squared Error (MSE)
B: Precision
C: Accuracy
D: F1 score

27. You are working with daily sales data and want to make short-term predictions for the next 7 days. Which of the following models is most appropriate for this task?

A: Linear regression
B: Decision tree
C: ARIMA (AutoRegressive Integrated Moving Average)
D: k-Means clustering

28. In a time series forecasting problem, which of the following techniques is used to make the data stationary?

A: Feature scaling
B: One-hot encoding
C: Differencing
D: Cross-validation
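
Study note for question 28: differencing replaces each observation with its change from an earlier observation. The sketch below is a plain-Python illustration with hypothetical series; the `lag` parameter is an assumption added here so the same helper can also show seasonal differencing.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series with a steady upward trend.
sales = [100, 103, 106, 109, 112, 115]
first_diff = difference(sales)
print(first_diff)  # the trend is gone; a constant level remains

# Hypothetical series with a repeating pattern of period 2 plus a trend.
seasonal = [10, 20, 12, 22, 14, 24]
seasonal_diff = difference(seasonal, lag=2)
print(seasonal_diff)  # differencing at the seasonal lag removes the repeating pattern
```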

29. Which of the following deep learning architectures is best suited for image recognition tasks?

A: Recurrent Neural Networks (RNN)
B: Convolutional Neural Networks (CNN)
C: Autoencoders
D: Support Vector Machines (SVM)

30. In deep learning, what does dropout refer to?

A: The process of removing features from a dataset.
B: The act of stopping the training process early when performance plateaus.
C: The regularization technique that randomly ignores a subset of neurons during training.
D: The process of increasing the learning rate for faster convergence.
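
Study note for question 30: a minimal sketch of "inverted" dropout applied to one layer's activations, assuming a drop probability strictly below 1. The function name, its parameters, and the example activations are hypothetical; deep learning frameworks provide their own dropout layers.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob during training.

    Surviving activations are scaled by 1 / (1 - drop_prob) so the expected
    value is unchanged. At inference time the input passes through as-is.
    """
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the illustration repeatable
hidden = [1.0] * 10     # hypothetical layer activations

dropped = dropout(hidden, drop_prob=0.5, rng=rng)
unchanged = dropout(hidden, drop_prob=0.5, rng=rng, training=False)
print(dropped)
print(unchanged)
```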

31. Which of the following is a key advantage of using transfer learning in deep learning?

A: It allows models to be trained on very small datasets by leveraging pre-trained networks.
B: It ensures that the model will not overfit the training data.
C: It reduces the need for data augmentation.
D: It avoids the need for hyperparameter tuning.

32. Which type of problem is the Recurrent Neural Network (RNN) architecture best suited for?

A: Image classification
B: Time series forecasting
C: Unsupervised clustering
D: Dimensionality reduction

33. Which activation function is most commonly used in the hidden layers of a deep neural network to introduce non-linearity?

A: Sigmoid
B: ReLU (Rectified Linear Unit)
C: Softmax
D: Linear
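
Study note for question 33: the activation functions named in the options can be written in a few lines each. This is a plain-Python sketch for scalar or list inputs, not how a framework implements them.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive inputs, clamps negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1."""
    shift = max(xs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(softmax([1.0, 2.0, 3.0]))
```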

34. In deep learning, which of the following is a primary advantage of using batch normalization?

A: It reduces the amount of training data needed.
B: It allows for faster convergence and stabilizes the training process.
C: It eliminates the need for an activation function.
D: It increases the regularization effect of dropout.

35. Which of the following describes the primary purpose of attention mechanisms in deep learning models?

A: To increase the model’s interpretability by highlighting important features.
B: To allow the model to focus on specific parts of the input when making predictions.
C: To add regularization to prevent overfitting.
D: To replace the need for backpropagation.

36. Which of the following is a key component of Convolutional Neural Networks (CNNs) that allows them to detect local features in images?

A: Activation function
B: Fully connected layers
C: Convolutional layers
D: Dropout layers

37. Which of the following best describes feature selection in machine learning?

A: Reducing the number of features by applying dimensionality reduction techniques such as PCA.
B: Choosing a subset of relevant features based on their importance to improve model performance.
C: Adding new features to increase model accuracy.
D: Standardizing and normalizing the features to bring them to a similar scale.

38. Which of the following is an appropriate method for handling categorical features with high cardinality in a machine learning model?

A: One-hot encoding
B: Mean encoding (target encoding)
C: Label encoding
D: Standardization

39. You are working with a dataset that contains missing values in multiple columns. Which method is typically used to impute missing values for numerical features based on their distribution?

A: Forward-fill method
B: Backward-fill method
C: Impute with the median
D: Impute with a constant value

40. Which of the following describes the goal of dimensionality reduction in machine learning?

A: To reduce the number of data samples in the training set.
B: To reduce the number of features while retaining as much information as possible.
C: To increase the number of features to improve the model’s accuracy.
D: To combine similar features into new ones to improve the model's complexity.

41. In a machine learning workflow, which of the following dimensionality reduction techniques is based on finding the linear combinations of features that maximize variance?

A: Cross-validation
B: k-Means clustering
C: Principal Component Analysis (PCA)
D: Logistic regression

42. Which of the following NLP tasks involves breaking down text into individual words or phrases?

A: Stemming
B: Tokenization
C: Named Entity Recognition (NER)
D: Lemmatization

43. Which of the following models is commonly used for word embeddings in NLP tasks, capturing semantic relationships between words?

A: Term Frequency-Inverse Document Frequency (TF-IDF)
B: k-Means clustering
C: Word2Vec
D: Decision tree

44. In sentiment analysis, which of the following machine learning algorithms is most suitable for classifying text as positive or negative?

A: Support Vector Machine (SVM)
B: k-Means clustering
C: Principal Component Analysis (PCA)
D: Logistic Regression

45. Which of the following is the purpose of Named Entity Recognition (NER) in NLP?

A: To classify text into different categories based on sentiment.
B: To identify and extract named entities such as people, organizations, or locations from text.
C: To reduce words to their root forms.
D: To compute word frequencies in a document.

46. Which of the following metrics is used to evaluate a binary classification model when class distribution is imbalanced?

A: Accuracy
B: Precision-Recall AUC
C: Mean Squared Error (MSE)
D: Root Mean Squared Error (RMSE)

47. Which of the following metrics measures the proportion of true positives out of all actual positives in a classification problem?

A: Precision
B: Recall (Sensitivity)
C: F1 score
D: Specificity
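
Study note for question 47: recall and specificity are both "out of all actual" rates, one for each class. The counts below are hypothetical and serve only to show which cells of the confusion matrix each metric uses.

```python
def recall(true_positives, false_negatives):
    """True positives divided by all actual positives (also called sensitivity)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """True negatives divided by all actual negatives."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical confusion-matrix counts.
tp, fn, tn, fp = 40, 10, 45, 5

print(recall(tp, fn))
print(specificity(tn, fp))
```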

48. Which of the following is the primary purpose of using a confusion matrix in a classification problem?

A: To visualize the relationship between precision and recall.
B: To show the number of correct and incorrect predictions for each class.
C: To evaluate the regression model's residuals.
D: To calculate the variance and bias of the model.

49. In a binary classification model, which metric would you use if false negatives are more costly than false positives?

A: Accuracy
B: Precision
C: Recall
D: Specificity

50. Which of the following describes the F1 score in the context of model evaluation?

A: It is the harmonic mean of precision and recall.
B: It is the sum of precision and recall.
C: It measures the overall accuracy of the model.
D: It calculates the variance of the model’s predictions.
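
Study note for question 50: a short sketch of the F1 computation, using hypothetical precision and recall values. Comparing it with the arithmetic mean shows how strongly F1 penalizes a large gap between the two.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: high precision, very low recall.
p, r = 0.9, 0.1

print(f1_score(p, r))   # pulled toward the lower of the two values
print((p + r) / 2)      # arithmetic mean, for comparison
```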

51. Which of the following metrics is best suited for evaluating the performance of a regression model?

A: Accuracy
B: Precision
C: Root Mean Squared Error (RMSE)
D: F1 Score

52. Which of the following evaluation metrics would you prioritize when testing a binary classifier on an imbalanced dataset?

A: Accuracy
B: Specificity
C: Precision-Recall AUC
D: R-squared

53. You are building a model to detect credit card fraud, where minimizing false negatives is a priority. Which of the following evaluation metrics should you prioritize?

A: Precision
B: Recall
C: Accuracy
D: F1 Score

54. You are tasked with engineering new features from a dataset. Which of the following is considered a derived feature in feature engineering?

A: A column whose missing values were imputed with the median.
B: A one-hot encoded categorical variable.
C: A new feature created by combining two existing variables.
D: A standardized numerical feature.

55. Which of the following is an appropriate method for handling missing values in categorical data?

A: Impute missing values using the mean of the column.
B: Impute missing values using the most frequent category (mode).
C: Replace missing values with zeros.
D: Use forward fill.

56. When performing feature scaling, which of the following is used to scale features to a range between 0 and 1?

A: Standardization
B: One-hot encoding
C: Min-Max scaling
D: Z-score normalization

57. In feature engineering, which of the following methods is commonly used to reduce the dimensionality of a dataset while retaining as much information as possible?

A: Logistic regression
B: Principal Component Analysis (PCA)
C: Decision Trees
D: Cross-validation

58. Which of the following is an effective method to handle an imbalanced dataset where the positive class is much smaller than the negative class?

A: Randomly remove samples from the majority class.
B: Upsample the minority class using techniques like SMOTE.
C: Normalize all features to a common scale.
D: Drop all rows with missing values.

59. Which of the following machine learning algorithms is most commonly used for clustering tasks?

A: Linear Regression
B: Support Vector Machines
C: k-Means
D: Decision Trees

60. Which of the following unsupervised learning techniques can be used to reduce the number of features in a dataset while retaining important information?

A: Principal Component Analysis (PCA)
B: Random Forest
C: Logistic Regression
D: Gradient Boosting

61. Which of the following is a characteristic of unsupervised learning?

A: The algorithm is provided with labeled data to make predictions.
B: The algorithm tries to find patterns in the data without labels.
C: The algorithm’s performance is evaluated using accuracy and precision.
D: The algorithm is typically used for classification tasks.

62. Which of the following metrics is commonly used to evaluate the performance of a clustering algorithm?

A: Silhouette score
B: Mean Squared Error
C: Accuracy
D: F1 score

63. Which of the following describes the purpose of hierarchical clustering?

A: To find the optimal number of clusters using a fixed number of iterations.
B: To recursively build nested clusters by merging or splitting them.
C: To classify data into binary labels.
D: To reduce the dimensionality of the dataset.

64. Which of the following is the main challenge of unsupervised learning compared to supervised learning?

A: Lack of labeled data to validate model performance.
B: Higher computational complexity.
C: Difficulty in interpreting the results.
D: Poor scalability to large datasets.

65. Which of the following methods is used in k-means clustering to assign data points to clusters?

A: Calculating the distance between each point and the centroid of each cluster.
B: Calculating the maximum likelihood estimate of each point.
C: Finding the probability distribution for each cluster.
D: Performing a decision tree split based on the feature values.
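
Study note for question 65: one iteration of k-means consists of an assignment step followed by a centroid update. The sketch below assumes Euclidean distance, Python 3.8 or later (for `math.dist`), and that no cluster ends up empty. The points and starting centroids are hypothetical.

```python
import math


def assign_to_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for point in points:
        distances = [math.dist(point, centroid) for centroid in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(points, labels, k):
    """Move each centroid to the mean of the points assigned to it."""
    centroids = []
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        # Assumes every cluster has at least one member.
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids


# Hypothetical 2-D points and starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_to_clusters(points, centroids)
new_centroids = update_centroids(points, labels, k=2)
print(labels)
print(new_centroids)
```

A full implementation would repeat these two steps until the labels stop changing.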

66. Which of the following techniques is commonly used to determine the optimal number of clusters in k-means clustering?

A: Precision-Recall curve
B: Elbow method
C: Grid search
D: Cross-validation

67. Which of the following best describes dimensionality reduction techniques such as t-SNE?

A: They transform high-dimensional data into a lower-dimensional space for visualization.
B: They optimize the learning rate of neural networks.
C: They are used to perform hyperparameter tuning in machine learning models.
D: They replace missing values in the dataset.

68. Which of the following is a common strategy to address the vanishing gradient problem in deep neural networks?

A: Increasing the learning rate.
B: Using ReLU (Rectified Linear Unit) as the activation function.
C: Adding more layers to the network.
D: Performing L2 regularization.

69. Which of the following techniques is used to prevent overfitting in deep neural networks?

A: Gradient descent
B: Dropout
C: Increasing the number of neurons
D: Using one-hot encoding

70. Which of the following describes the purpose of batch gradient descent in training neural networks?

A: It updates the model weights after processing the entire training dataset.
B: It updates the model weights after processing a single sample.
C: It uses a subset of the training dataset to update the model weights in each iteration.
D: It tunes the hyperparameters of the model to minimize loss.

71. Which of the following neural network architectures is most effective for sequential data like time series or natural language processing?

A: Convolutional Neural Networks (CNNs)
B: Recurrent Neural Networks (RNNs)
C: Decision trees
D: Random forests

72. Which of the following is an appropriate method to handle outliers in a dataset?

A: Removing all rows with missing values.
B: Scaling the data using Min-Max scaling.
C: Applying log transformation to reduce the impact of outliers.
D: Imputing missing values with the mean.

73. Which of the following methods is used to impute missing values for numerical features in a dataset with a skewed distribution?

A: Impute with the mean
B: Impute with the median
C: Impute with zeros
D: Drop rows with missing values
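
Study note for question 73: with a skewed column, the mean and the median can give very different fill values. The data and the `impute` helper below are hypothetical; `None` stands in for a missing entry.

```python
import statistics


def impute(values, fill):
    """Replace every None in values with the given fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical right-skewed column with one extreme value and two gaps.
column = [30, 32, None, 35, 38, 40, None, 45, 400]
observed = [v for v in column if v is not None]

mean_fill = statistics.mean(observed)
median_fill = statistics.median(observed)

print(mean_fill)    # dragged upward by the single extreme value
print(median_fill)  # stays near the bulk of the data
print(impute(column, median_fill))
```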

74. You are working with a large dataset with a high number of features. Which of the following techniques would help in reducing the number of features without significant loss of information?

A: Cross-validation
B: Random forests
C: Principal Component Analysis (PCA)
D: Decision tree pruning

75. Which of the following techniques is used to ensure that all features in a dataset are on the same scale?

A: One-hot encoding
B: Standardization
C: Principal Component Analysis (PCA)
D: Cross-validation

76. Which of the following is a common practice to handle categorical features with many unique categories in a dataset?

A: One-hot encoding
B: Mean encoding (target encoding)
C: Feature scaling
D: Imputation with the mode

77. Which of the following methods is commonly used for hyperparameter tuning in machine learning models?

A: Cross-validation
B: Logistic regression
C: Grid search
D: Decision tree pruning

78. Which of the following best describes cross-validation in model evaluation?

A: A method used to estimate how well a model generalizes, and so to detect overfitting, by splitting the data into multiple training and validation sets.
B: A technique for reducing the dimensionality of the dataset.
C: A model evaluation metric for calculating the mean squared error.
D: A method used to impute missing values.

79. In the context of model evaluation, which of the following best describes k-fold cross-validation?

A: The data is split into k equal parts, and the model is trained k times, each time leaving out one part for validation.
B: The data is split into k parts, with each part representing a different feature.
C: The model is trained on k different machine learning algorithms and combined.
D: The model is evaluated k times on the training data.
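
Study note for question 79: the splitting logic behind k-fold cross-validation can be sketched with index lists alone. This is a minimal version without shuffling; the function name is hypothetical, and libraries such as scikit-learn ship their own splitters.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for training, validation in folds:
    print(validation, training)
```

Each sample lands in the validation set exactly once and in the training set for the other k - 1 rounds.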

80. In a time series analysis, which of the following techniques is used to remove seasonal components from the data?

A: Autoregressive models (AR)
B: Differencing
C: Exponential smoothing
D: Moving average

81. Which of the following models is commonly used for time series forecasting that captures both the trend and seasonality in the data?

A: k-Means clustering
B: Principal Component Analysis (PCA)
C: Seasonal ARIMA (SARIMA, the seasonal extension of AutoRegressive Integrated Moving Average)
D: Logistic regression

82. Which of the following algorithms is commonly used for binary classification tasks?

A: Linear Regression
B: Logistic Regression
C: k-Means clustering
D: Principal Component Analysis (PCA)

83. Which of the following techniques can be used to handle class imbalance in a dataset for a classification task?

A: Random undersampling of the majority class
B: Feature scaling using Min-Max scaling
C: Principal Component Analysis (PCA)
D: Cross-validation

84. Which of the following is a key benefit of using ensemble methods like Random Forests?

A: They always provide higher accuracy than other models.
B: They combine multiple weak learners to create a stronger overall model.
C: They reduce the number of features in the dataset.
D: They require fewer data points for training.

85. Which of the following is a characteristic of the bagging technique in ensemble learning?

A: It uses the majority voting mechanism to select the final prediction.
B: It trains models sequentially and each model corrects the errors of the previous one.
C: It reduces variance by training multiple models in parallel on different subsets of the data.
D: It increases bias by focusing on difficult-to-predict samples.

86. Which of the following is an advantage of using Gradient Boosting over bagging methods?

A: It reduces bias by building models sequentially and focusing on difficult samples.
B: It always outperforms bagging in terms of accuracy.
C: It trains models in parallel, reducing computation time.
D: It removes the need for hyperparameter tuning.

87. Which of the following techniques improves the training of neural networks by normalizing the inputs to each layer to have zero mean and unit variance?

A: Dropout
B: Batch normalization
C: Gradient clipping
D: Weight initialization

88. Which of the following is a reason for applying feature scaling to the input data before training a machine learning model?

A: To reduce the number of features in the dataset.
B: To ensure that features with larger values do not dominate the learning process.
C: To encode categorical variables into numerical format.
D: To improve the model’s ability to handle missing values.

89. Which of the following is an advantage of decision trees in machine learning?

A: They require fewer data points than other algorithms.
B: They are highly interpretable and easy to visualize.
C: They are not prone to overfitting, even with large trees.
D: They require all features to be on the same scale.

90. Which of the following metrics would you use to evaluate a regression model?

A: Accuracy
B: Precision
C: Mean Squared Error (MSE)
D: F1 score

91. Which of the following machine learning algorithms is best suited for multiclass classification tasks?

A: Support Vector Machine (SVM)
B: Naive Bayes
C: Decision Tree
D: Linear Regression

92. Which of the following neural network architectures is best suited for image classification tasks?

A: Recurrent Neural Networks (RNN)
B: Convolutional Neural Networks (CNN)
C: Autoencoders
D: Support Vector Machines (SVM)