1. Scikit-learn pipelines are used *
2. Stratified splitting is used to *
3. Which of the following is FALSE *
4. Which of the following are the two ways a machine learning platform provides predictions? *
5. Shown below is an example of a data frame a data scientist can use to build a machine learning model *
6. Which of the following is the BEST definition of an outlier? It must be *
7. Which of the following models has the highest tendency to overfit by the nature of its design? *
8. Which of the following is TRUE about ridge regression? *
9. Which of the following are sampling techniques that can be used to resample imbalanced datasets? Choose all that apply *
10. Which of the following is a model that is insensitive to feature scaling? *
11. Which of the following is TRUE about the ROC curve? *
12. Which of the following is NOT considered a preprocessing task? *
13. Which of the following pairs of metrics is resistant to outliers? *
14. Which of the following is a method used for reporting a pattern underlying time series data? *
15. Which of the following is the difference between "UNION" and "UNION ALL"? *
16. Which of the following are libraries in Python to create visualizations? *
17. R-squared is used for evaluating *
18. Which of the following is TRUE of forecasting sales for a retail business? *
19. Which of the following functions from the pandas module can read an Excel file (.xlsx) in Python? *
20. Which of the following formats cannot be read directly into data frames using pandas methods? *
21. Which of the following is TRUE about TF-IDF? *
22. Which of the following pairs of techniques helps to explain a model? *
23. Which of the following statements is TRUE about feature scaling? *
24. A data scientist has just implemented a churn model that predicts customers with a high probability of leaving based on their transaction data in the last six months. The company takes effective action with the test group. What outcome would demonstrate t
25. Which of the following metrics are used to evaluate regression models? *
26. Which of the following helps with avoiding overfitting? *
27. When identifying outliers, which of the following is the generally accepted number of standard deviations away from the mean at which a data scientist should remove them? *
28. Which of the following is a method that can be applied on a dataset involving 500 variables to avoid the curse of dimensionality? *
29. Which of the following plots describes relationships among several variables simultaneously? *
30. Which of the following is an advantage of gathering more training examples? *
31. Which of the following is a platform or technology that can be used to deploy ML models? *
32. If the following numbers represent the p-value, which of them is significant at 95%? *
33. What is the difference between boosting and bagging? *
34. A model in production works by crawling the web and downloading HTML content. Which of the following is a disadvantage of the model? *
35. Which of the following is a term that can be used to represent missing data of type object in Python? *
36. Which of the following statements is FALSE about gradient descent? *
37. Which of the following is a classification algorithm? *
38. Which of the following are the techniques or values that can be used to impute missing data in a predictor variable? *
39. Which of the following functions from scikit-learn is a feature scaling method? *
40. Which of the models shown is the best performing model? *
41. Which of the following data visualizations is useful for identifying outliers in the data? *
42. Which of the following are good practices to manage stakeholder expectations when working on a new data science project? *
43. What do residuals represent in the simple linear regression model? *
44. Which of the following are popular metrics that can be used to measure user engagement? *
45. Which of the following is an algorithm that can handle values without any transformation? *
46. Which of the following is a characteristic of K-Means clustering? *
47. Which of the following methods can be used to tune the hyperparameters of a machine learning model? *
48. Which of the following is the purpose of feature selection in machine learning? *
49. Which of the following is a common metric for evaluating clustering algorithms? *
50. Which of the following correctly describes principal component analysis (PCA)? *
51. Which of the following is NOT an advantage of using Random Forest models? *
52. Which of the following can reduce model overfitting? *
53. Which of the following is a tree-based ensemble algorithm? *
54. Which of the following is NOT true about cross-validation? *
55. Which of the following statements about SMOTE is FALSE? *
56. Which of the following is a common feature selection method? *
57. Which of the following is a measure of central tendency? *
58. Which of the following data types is commonly used for categorical features in Python? *
59. Which of the following statements is TRUE about one-hot encoding? *
60. Which of the following methods is commonly used to reduce the number of features? *
61. Which of the following metrics is suitable for evaluating a regression model? *
62. Which of the following techniques helps in reducing multicollinearity among features? *
63. Which of the following is commonly used to avoid overfitting in a decision tree model? *
64. Which of the following methods is used to balance an imbalanced dataset? *
65. Which of the following is a characteristic of a left-skewed distribution? *
66. Which of the following describes logistic regression? *
67. Which of the following is commonly used for feature scaling in machine learning? *
68. Which of the following techniques helps in reducing model variance? *
69. Which of the following methods can be used for clustering? *
70. Which of the following can be used to handle class imbalance in data? *
71. Which of the following is an ensemble learning technique? *
72. Which of the following is used to measure model accuracy for classification tasks? *
73. Which of the following describes K-fold cross-validation? *
74. Which of the following models is typically used for time series forecasting? *
75. Which of the following metrics evaluates the goodness of fit for a regression model? *
76. Which of the following models would be best suited to predict if a person has a disease (yes/no)? *
77. Which of the following describes the process of normalization? *
78. Which of the following is a purpose of one-hot encoding? *
79. Which of the following is NOT a commonly used metric for classification models? *
80. Which of the following best describes the purpose of regularization in machine learning? *
81. Which of the following is a method of evaluating the accuracy of a binary classifier? *
82. Which of the following algorithms is typically used for unsupervised learning tasks? *
83. Which of the following methods is used to handle missing data in a dataset? *
84. Which of the following statements about clustering is TRUE? *
85. Which of the following is an example of a supervised learning task? *
86. Which of the following describes the purpose of feature engineering? *
87. Which of the following is a common metric for evaluating classification models? *
88. Which of the following is commonly used to reduce the feature set in machine learning? *
89. Which of the following is a benefit of using a confusion matrix? *
90. Which of the following is an unsupervised learning method? *
91. Which of the following is a type of dimensionality reduction technique? *
92. Which of the following helps to identify multicollinearity in data? *
93. Which of the following is the role of a loss function in machine learning? *
94. Which of the following describes the concept of ensemble learning? *
95. Which of the following algorithms is commonly used for classification tasks? *
96. Which of the following statements is TRUE about overfitting? *
97. Which of the following metrics is commonly used to evaluate regression models? *
98. Which of the following is used for handling imbalanced datasets? *
99. Which of the following describes the purpose of cross-validation? *
100. Which of the following algorithms is a type of supervised learning? *
101. Which of the following describes a benefit of using batch gradient descent? *
102. Which of the following is a characteristic of decision trees? *
103. Which of the following describes the purpose of a test dataset? *
104. Which of the following is a commonly used distance metric in K-means clustering? *
105. Which of the following statements about regularization is FALSE? *
106. Which of the following describes the purpose of an ROC curve? *
107. Which of the following is a limitation of using K-means clustering? *
108. Which of the following methods helps reduce the dimensionality of data? *
109. Which of the following is a common technique for feature scaling? *
110. Which of the following is a common loss function for binary classification? *