CDSP Exam Set 2 (New) Quiz

1. Which of the following machine learning models is best suited for a binary classification task with imbalanced data?





2. What is the primary difference between L1 and L2 regularization in machine learning?
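Study note: the two penalties can be compared side by side. The sketch below assumes the usual textbook definitions (L1 adds the sum of absolute weights, L2 adds the sum of squared weights); the function names, the `lam` strength parameter, and the sample weights are illustrative and not taken from the exam.

```python
def l1_penalty(weights, lam):
    """L1 (lasso-style) penalty: lam * sum of |w|."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (ridge-style) penalty: lam * sum of w squared."""
    return lam * sum(w * w for w in weights)


# Hypothetical weight vector and regularization strength.
weights = [0.5, -2.0, 0.0, 3.0]
lam = 0.1

print("L1 penalty:", l1_penalty(weights, lam))
print("L2 penalty:", l2_penalty(weights, lam))
```

The L1 term grows linearly with each weight, which in practice tends to push some weights to exactly zero, while the L2 term penalizes large weights more heavily and tends to shrink all weights without zeroing them.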





3. Given the confusion matrix below, calculate the precision for the positive class:





4. Which of the following metrics is most appropriate to evaluate the performance of a model on imbalanced datasets?





5. Which of the following Python libraries provides a Gradient Boosting classifier implementation commonly used for machine learning tasks?





6. Which of the following techniques is typically used to handle categorical variables with many categories in machine learning?





7. You are working on a dataset with missing values. Which of the following methods would you use if the missing values are randomly distributed and only affect a small percentage of the data?





8. In principal component analysis (PCA), which of the following describes the first principal component?





9. You are preprocessing a dataset with highly skewed data. Which of the following transformations is most appropriate to normalize the distribution?





10. Which of the following algorithms is most suitable for a multiclass classification task (more than two classes)?





11. In k-means clustering, which of the following is true about the k parameter?





12. Which evaluation metric would you use for a regression model predicting house prices, and why?





13. Consider the ROC curve of two models (Model A and Model B). Based on the ROC curve illustration, which model performs better in distinguishing between the two classes?





14. What is the purpose of hyperparameter tuning in machine learning?





15. Which of the following is a key feature of ensemble methods like Random Forest or Gradient Boosting?





16. Which of the following optimization techniques is commonly used in neural networks to update weights and minimize the loss function?





17. Which Python library is primarily used for natural language processing (NLP) tasks, such as tokenization and stemming?





18. In deep learning, what is the role of an activation function in a neural network?





19. Which of the following is a common method to prevent overfitting in deep learning models?





20. Which of the following feature scaling techniques ensures that all features have a mean of 0 and a standard deviation of 1?
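Study note: the property named in this question (mean of 0, standard deviation of 1) can be checked numerically. The sketch below is a minimal z-score transform using only the standard library; the `standardize` name and the sample values are illustrative, and the use of the population standard deviation is a choice made for this sketch.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, an assumption of this sketch
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Hypothetical feature column.
scaled = standardize([2.0, 4.0, 6.0, 8.0])
print(scaled)
print("mean:", statistics.fmean(scaled), "std:", statistics.pstdev(scaled))
```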





21. You are working with a dataset that contains highly correlated features. Which of the following methods is most appropriate for reducing the dimensionality of the dataset while retaining the most variance?





22. Which of the following is an example of data leakage in a machine learning pipeline?





23. Which technique is commonly used to handle class imbalance in a dataset where one class significantly outnumbers the other?





24. In time series analysis, which of the following is typically used to account for seasonality in the data?





25. Which of the following is a characteristic of stationary time series data?





26. Which of the following metrics is most commonly used to evaluate the performance of forecasting models in time series analysis?





27. You are working with daily sales data and want to make short-term predictions for the next 7 days. Which of the following models is most appropriate for this task?





28. In a time series forecasting problem, which of the following techniques is used to make the data stationary?
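Study note: one commonly used way to remove a trend is differencing, where each value is replaced by its change from an earlier value. The sketch below assumes a plain Python list as the series; the `difference` helper and the sample data are illustrative only.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series with a steady upward trend.
trend = [10, 12, 14, 16, 18]
print(difference(trend))  # [2, 2, 2, 2]
```

After first-order differencing the linear trend is gone and the values are constant. Using a lag equal to the seasonal period (for example 7 for daily data with a weekly pattern) applies the same idea to a seasonal component.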





29. Which of the following deep learning architectures is best suited for image recognition tasks?





30. In deep learning, what does dropout refer to?
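Study note: dropout randomly zeroes a fraction of activations during training. The sketch below shows the "inverted dropout" variant, in which surviving activations are rescaled so their expected value is unchanged; the function name, its parameters, and the sample activations are assumptions made for illustration, not any framework's API.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical layer output and a seeded generator for repeatability.
rng = random.Random(0)
layer_output = [0.2, 1.5, -0.7, 3.1, 0.9]
print(dropout(layer_output, rate=0.5, rng=rng))
print(dropout(layer_output, rate=0.5, rng=rng, training=False))
```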





31. Which of the following is a key advantage of using transfer learning in deep learning?





32. Which type of problem is a Recurrent Neural Network (RNN) architecture best suited for?





33. Which activation function is most commonly used in the hidden layers of a deep neural network to introduce non-linearity?





34. In deep learning, which of the following is a primary advantage of using batch normalization?





35. Which of the following describes the primary purpose of attention mechanisms in deep learning models?





36. Which of the following is a key component of Convolutional Neural Networks (CNNs) that allows them to detect local features in images?





37. Which of the following best describes feature selection in machine learning?





38. Which of the following is an appropriate method for handling categorical features with high cardinality in a machine learning model?





39. You are working with a dataset that contains missing values in multiple columns. Which method is typically used to impute missing values for numerical features based on their distribution?





40. Which of the following describes the goal of dimensionality reduction in machine learning?





41. In a machine learning workflow, which of the following dimensionality reduction techniques is based on finding the linear combinations of features that maximize variance?





42. Which of the following NLP tasks involves breaking down text into individual words or phrases?





43. Which of the following models is commonly used for word embeddings in NLP tasks, capturing semantic relationships between words?





44. In sentiment analysis, which of the following machine learning algorithms is most suitable for classifying text as positive or negative?





45. Which of the following is the purpose of Named Entity Recognition (NER) in NLP?





46. Which of the following metrics is used to evaluate a binary classification model when class distribution is imbalanced?





47. Which of the following metrics measures the proportion of true positives out of all actual positives in a classification problem?
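Study note: the quantity described in this question, true positives divided by all actual positives, is usually called recall (or sensitivity), and it can be computed directly from labels. The helper name and the label lists below are hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): share of actual positives that were found."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no actual positives in y_true")
    return tp / (tp + fn)


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(recall(y_true, y_pred))  # 0.75 (3 of the 4 actual positives were found)
```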





48. Which of the following is the primary purpose of using a confusion matrix in a classification problem?





49. In a binary classification model, which metric would you use if false negatives are more costly than false positives?





50. Which of the following describes the F1 score in the context of model evaluation?
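Study note: the F1 score is the harmonic mean of precision and recall. A minimal sketch computed from confusion-matrix counts follows; the function name and the counts are hypothetical, and returning 0.0 when there are no true positives is a convention chosen for this sketch.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        # Convention for this sketch: with no true positives, F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision 0.8, recall 0.5.
print(f1_from_counts(tp=8, fp=2, fn=8))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1 by maximizing precision alone or recall alone.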





51. Which of the following metrics is best suited for evaluating the performance of a regression model?





52. Which of the following evaluation metrics would you prioritize when testing a binary classifier on an imbalanced dataset?





53. You are building a model to detect credit card fraud, where minimizing false negatives is a priority. Which of the following evaluation metrics should you prioritize?





54. You are tasked with engineering new features from a dataset. Which of the following is considered a derived feature in feature engineering?





55. Which of the following is an appropriate method for handling missing values in categorical data?





56. When performing feature scaling, which of the following is used to scale features to a range between 0 and 1?
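Study note: rescaling to the range 0 to 1 is commonly done with min-max scaling, which subtracts the minimum and divides by the range. The sketch below uses plain Python; the `min_max_scale` name and the sample values are illustrative.

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column.
print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```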





57. In feature engineering, which of the following methods is commonly used to reduce the dimensionality of a dataset while retaining as much information as possible?





58. Which of the following is an effective method to handle an imbalanced dataset where the positive class is much smaller than the negative class?





59. Which of the following machine learning algorithms is most commonly used for clustering tasks?





60. Which of the following unsupervised learning techniques can be used to reduce the number of features in a dataset while retaining important information?





61. Which of the following is a characteristic of unsupervised learning?





62. Which of the following metrics is commonly used to evaluate the performance of a clustering algorithm?





63. Which of the following describes the purpose of hierarchical clustering?





64. Which of the following is the main challenge of unsupervised learning compared to supervised learning?





65. Which of the following methods is used in k-means clustering to assign data points to clusters?
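Study note: in the standard k-means formulation, each point is assigned to the cluster whose centroid is nearest, typically by Euclidean distance. The sketch below covers only that assignment step (not the centroid update); the helper name, points, and centroids are hypothetical.

```python
import math


def assign_to_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


# Hypothetical 2-D points and two centroids.
points = [(0, 0), (1, 1), (9, 9), (10, 8)]
centroids = [(0, 1), (10, 10)]
print(assign_to_clusters(points, centroids))  # [0, 0, 1, 1]
```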





66. Which of the following techniques is commonly used to determine the optimal number of clusters in k-means clustering?





67. Which of the following best describes dimensionality reduction techniques such as t-SNE?





68. Which of the following is a common strategy to address the vanishing gradient problem in deep neural networks?





69. Which of the following techniques is used to prevent overfitting in deep neural networks?





70. Which of the following describes the purpose of batch gradient descent in training neural networks?





71. Which of the following neural network architectures is most effective for sequential data like time series or natural language processing?





72. Which of the following is an appropriate method to handle outliers in a dataset?





73. Which of the following methods is used to impute missing values for numerical features in a dataset with a skewed distribution?





74. You are working with a large dataset with a high number of features. Which of the following techniques would help in reducing the number of features without significant loss of information?





75. Which of the following techniques is used to ensure that all features in a dataset are on the same scale?





76. Which of the following is a common practice to handle categorical features with many unique categories in a dataset?





77. Which of the following methods is commonly used for hyperparameter tuning in machine learning models?





78. Which of the following best describes cross-validation in model evaluation?





79. In the context of model evaluation, which of the following best describes k-fold cross-validation?
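Study note: k-fold cross-validation splits the data into k folds, and each fold serves once as the validation set while the remaining folds are used for training. The sketch below generates the index splits only, without shuffling; the `k_fold_indices` name and the sizes used are assumptions for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical dataset of 10 samples split into 3 folds.
for train, validation in k_fold_indices(10, 3):
    print("train:", train, "validation:", validation)
```

Every sample appears in exactly one validation fold, so averaging the k validation scores uses all of the data for evaluation without ever scoring a model on samples it was trained on.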





80. In a time series analysis, which of the following techniques is used to remove seasonal components from the data?





81. Which of the following models is commonly used for time series forecasting that captures both the trend and seasonality in the data?





82. Which of the following algorithms is commonly used for binary classification tasks?





83. Which of the following techniques can be used to handle class imbalance in a dataset for a classification task?





84. Which of the following is a key benefit of using ensemble methods like Random Forests?





85. Which of the following is a characteristic of the bagging technique in ensemble learning?





86. Which of the following is an advantage of using Gradient Boosting over bagging methods?





87. Which of the following techniques improves the training of neural networks by normalizing the inputs to each layer to have zero mean and unit variance?





88. Which of the following is a reason for applying feature scaling to the input data before training a machine learning model?





89. Which of the following is an advantage of decision trees in machine learning?





90. Which of the following metrics would you use to evaluate a regression model?





91. Which of the following machine learning algorithms is best suited for multiclass classification tasks?





92. Which of the following neural network architectures is best suited for image classification tasks?