Mock Exam Set 1 ADS Quiz

1. Scikit-learn pipelines are used *





2. Stratified splitting is used to *





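Study note for question 2: stratified splitting divides each class separately so the train and test sets keep roughly the original class proportions, which matters most for imbalanced labels. The sketch below is a minimal stdlib-only illustration, assuming labels are given as a plain list; the function name `stratified_split` is hypothetical. In practice scikit-learn's `train_test_split(..., stratify=y)` does this for you.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling the same fraction from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Imbalanced toy labels: 80 negatives, 20 positives
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]
print(len(test_idx), test_labels.count(1))  # 25 5
```

The test set keeps the 80/20 ratio (20 negatives, 5 positives); a purely random split could by chance contain very few positives.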
3. Which of the following is FALSE? *





4. Which of the following are the two ways a machine learning platform provides predictions? *





5. Shown below is an example of a data frame a data scientist can use to build a machine learning model *





6. Which of the following is the BEST definition of an outlier? It must be *





7. Which of the following models has the highest tendency to overfit by the nature of its design? *





8. Which of the following is TRUE about ridge regression? *





9. Which of the following are sampling techniques that can be used to resample imbalanced datasets? Choose all that apply *





10. Which of the following is a model that is insensitive to feature scaling? *





11. Which of the following is TRUE about the ROC curve? *





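Study note for question 11: the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies, and the area under it (AUC) is 0.5 for a random ranking and 1.0 for a perfect one. This is a minimal stdlib-only sketch; the helper names `roc_points` and `auc` are hypothetical, and scikit-learn's `roc_curve` and `roc_auc_score` are the usual library tools.

```python
def roc_points(y_true, scores):
    """Sweep the threshold over every distinct score, highest first,
    and return (FPR, TPR) pairs starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y_true, scores))
print(auc(roc_points(y_true, scores)))  # 0.75
```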
12. Which of the following is NOT considered a preprocessing task? *





13. Which of the following pairs of metrics is resistant to outliers? *





14. Which of the following is a method used for reporting a pattern underlying time series data? *





15. Which of the following is the difference between "union" and "union all"? *





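Study note for question 15: in SQL, `UNION` returns the distinct rows of the combined queries, while `UNION ALL` returns every row, duplicates included, and is usually cheaper because it skips de-duplication. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the tables `store_a` and `store_b` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_a (customer TEXT);
    CREATE TABLE store_b (customer TEXT);
    INSERT INTO store_a VALUES ('ana'), ('ben');
    INSERT INTO store_b VALUES ('ben'), ('cara');
""")

# UNION removes duplicate rows from the combined result
union_rows = conn.execute(
    "SELECT customer FROM store_a UNION SELECT customer FROM store_b ORDER BY customer"
).fetchall()

# UNION ALL keeps every row, so 'ben' appears twice
union_all_rows = conn.execute(
    "SELECT customer FROM store_a UNION ALL SELECT customer FROM store_b ORDER BY customer"
).fetchall()

print(union_rows)      # [('ana',), ('ben',), ('cara',)]
print(union_all_rows)  # [('ana',), ('ben',), ('ben',), ('cara',)]
```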
16. Which of the following are libraries in Python to create visualizations? *





17. R-squared is used for evaluating *





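Study note for question 17: R-squared measures how much of the variance in the target a regression model explains, computed as 1 - SS_res / SS_tot. A model that always predicts the mean scores 0, a perfect fit scores 1, and a model worse than the mean can score below 0. The function name `r_squared` is hypothetical; scikit-learn provides `r2_score`.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y_true, [3.0, 5.0, 7.0, 9.0]))  # 1.0 (perfect fit)
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))  # 0.0 (no better than predicting the mean)
print(r_squared(y_true, [9.0, 7.0, 5.0, 3.0]))  # -3.0 (worse than the mean)
```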
18. Which of the following is TRUE of forecasting sales for a retail business? *





19. Which of the following functions from the pandas module can read an Excel file (.xlsx) in Python? *





20. Which of the following formats cannot be read directly into data frames using pandas methods? *





21. Which of the following is TRUE about TF-IDF? *





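Study note for question 21: TF-IDF weights a term by how often it appears in a document (TF) times how rare it is across the corpus (IDF), so words that appear in every document get little or no weight. This minimal sketch uses the plain textbook form, TF = count / document length and IDF = log(N / document frequency), with naive whitespace tokenization; the function name `tf_idf` is hypothetical. Library implementations use other variants (for example, scikit-learn's `TfidfVectorizer` smooths the IDF by default), so their numbers differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (c / len(tokens)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tf_idf(docs)
print(w[0]["the"])                # 0.0: in every document, so IDF = log(3/3) = 0
print(w[1]["dog"] > w[1]["sat"])  # True: "dog" is rarer than "sat"
```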
22. Which of the following pairs of techniques helps to explain a model? *





23. Which of the following statements is TRUE about feature scaling? *





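Study note for question 23: the two most common feature scaling methods are min-max scaling, which maps values to [0, 1], and standardization, which gives mean 0 and standard deviation 1. Scaling matters for distance- and gradient-based models such as K-means, KNN, and regularized linear models, while tree-based models split on thresholds and are largely unaffected. This stdlib sketch uses hypothetical function names; scikit-learn's counterparts are `MinMaxScaler` and `StandardScaler`.

```python
import statistics

def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Z-score: (x - mean) / population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

incomes = [20_000, 40_000, 60_000, 80_000]
print(min_max_scale(incomes))  # [0.0, 0.333..., 0.666..., 1.0]
print(standardize(incomes))
```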
24. A data scientist has just implemented a churn model that predicts customers with a high probability of leaving based on their transaction data in the last six months. The company takes effective action with the test group. What outcome would demonstrate t





25. Which of the following metrics are used to evaluate regression models? *





26. Which of the following helps with avoiding overfitting? *





27. When identifying outliers, which of the following is the generally accepted number of standard deviations away from the mean at which a data scientist should remove them? *





28. Which of the following is a method that can be applied on a dataset involving 500 variables to avoid the curse of dimensionality? *





29. Which of the following plots describes relationships among several variables simultaneously? *





30. Which of the following is an advantage of gathering more training examples? *





31. Which of the following is a platform or technology that can be used to deploy ML models? *





32. If the following numbers are p-values, which of them is significant at the 95% confidence level? *





33. What is the difference between boosting and bagging? *





34. A model in production works by crawling the web and downloading HTML content. Which of the following is a disadvantage of the model? *





35. Which of the following is a term that can be used to represent missing data of type object in Python? *





36. Which of the following statements is FALSE about gradient descent? *





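Study note for question 36: gradient descent repeatedly moves the parameters a small step against the gradient of the loss, and the learning rate controls the step size. Too small a rate converges slowly, while too large a rate can overshoot and diverge. This sketch minimizes the one-dimensional function f(w) = (w - 3)^2, chosen purely for illustration; the function names are hypothetical.

```python
def gradient_descent(grad, w0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(n_steps):
        w -= learning_rate * grad(w)
    return w

def grad_f(w):
    """Gradient of f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2 * (w - 3)

print(gradient_descent(grad_f, w0=0.0))                     # close to 3.0
print(gradient_descent(grad_f, w0=0.0, learning_rate=1.1))  # diverges: step too large
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 per step; with 1.1 it grows by a factor of 1.2 and flips sign each step.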
37. Which of the following is a classification algorithm? *





38. Which of the following are the techniques or values that can be used to impute missing data in a predictor variable? *





39. Which of the following functions from scikit-learn is a feature scaling method? *





40. Which of the models shown is the best performing model? *





41. Which of the following data visualizations is useful for identifying outliers in the data? *





42. Which of the following are good practices to manage stakeholder expectations when working on a new data science project? *





43. What do residuals represent in the simple linear regression model? *





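Study note for question 43: in simple linear regression a residual is the observed value minus the value predicted by the fitted line, y_i - (b0 + b1 * x_i). This sketch fits the line with the closed-form ordinary least squares formulas on a small made-up dataset; the function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.5, 5.5, 8.0]
b0, b1 = fit_simple_linear_regression(x, y)
# Residual = observed value - value predicted by the fitted line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)     # about 0.25 and 1.9
print(residuals)  # about [-0.15, 0.45, -0.45, 0.15]
```

With an intercept in the model, the OLS residuals always sum to zero (up to floating-point error).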
44. Which of the following are popular metrics that can be used to measure user engagement? *





45. Which of the following is an algorithm that can handle values without any transformation? *





46. Which of the following is a characteristic of K-Means clustering? *





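Study note for question 46: K-means (Lloyd's algorithm) alternates two steps: assign each point to its nearest centroid, usually by Euclidean distance, then move each centroid to the mean of its assigned points. The number of clusters k must be chosen in advance, and results depend on initialization and feature scale. This stdlib sketch uses a hypothetical `kmeans` function and toy data with two well-separated groups; scikit-learn's `KMeans` is the usual library choice.

```python
import math
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Assign points to the nearest centroid, then recompute centroids as means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```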
47. Which of the following methods can be used to tune the hyperparameters of a machine learning model? *





48. Which of the following is the purpose of feature selection in machine learning? *





49. Which of the following is a common metric for evaluating clustering algorithms? *





50. Which of the following correctly describes principal component analysis (PCA)? *





51. Which of the following is NOT an advantage of using Random Forest models? *





52. Which of the following can reduce model overfitting? *





53. Which of the following is a tree-based ensemble algorithm? *





54. Which of the following is NOT true about cross-validation? *





55. Which of the following statements about SMOTE is FALSE? *





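Study note for question 55: SMOTE oversamples the minority class by creating synthetic points rather than duplicating existing ones. For a minority sample it picks one of its k nearest minority neighbours and places a new point at a random position on the line segment between them. This is a simplified stdlib sketch with a hypothetical `smote` function; the imbalanced-learn library provides a full implementation (`imblearn.over_sampling.SMOTE`). As with any resampling, it should be applied to the training data only, after the train/test split.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a random minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p != x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb, in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0), (2.5, 2.5)]
new_points = smote(minority, n_new=4)
print(new_points)
```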
56. Which of the following is a common feature selection method? *





57. Which of the following is a measure of central tendency? *





58. Which of the following data types is commonly used for categorical features in Python? *





59. Which of the following statements is TRUE about one-hot encoding? *





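Study note for question 59: one-hot encoding turns a nominal categorical feature into one binary column per category, so a model does not read a false order into arbitrary integer codes. The trade-off is that the number of columns grows with the number of distinct categories. This stdlib sketch uses a hypothetical `one_hot` function; `pandas.get_dummies` and scikit-learn's `OneHotEncoder` are the usual library tools.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector, one column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

categories, rows = one_hot(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```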
60. Which of the following methods is commonly used to reduce the number of features? *





61. Which of the following metrics is suitable for evaluating a regression model? *





62. Which of the following techniques helps in reducing multicollinearity among features? *





63. Which of the following is commonly used to avoid overfitting in a decision tree model? *





64. Which of the following methods is used to balance an imbalanced dataset? *





65. Which of the following is a characteristic of a left-skewed distribution? *





66. Which of the following describes logistic regression? *





67. Which of the following is commonly used for feature scaling in machine learning? *





68. Which of the following techniques helps in reducing model variance? *





69. Which of the following methods can be used for clustering? *





70. Which of the following can be used to handle class imbalance in data? *





71. Which of the following is an ensemble learning technique? *





72. Which of the following is used to measure model accuracy for classification tasks? *





73. Which of the following describes K-fold cross-validation? *





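Study note for question 73: K-fold cross-validation splits the data into k folds, trains on k - 1 of them and validates on the remaining one, and repeats so every fold is used for validation exactly once; the k scores are then averaged. This sketch only generates the index splits, without shuffling, using a hypothetical `k_fold_indices` function; scikit-learn's `KFold` (and `StratifiedKFold` for classification) is the usual tool.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k folds; every index is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(10, k=3):
    print(len(train_idx), test_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```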
74. Which of the following models is typically used for time series forecasting? *





75. Which of the following metrics evaluates the goodness of fit for a regression model? *





76. Which of the following models would be best suited to predict if a person has a disease (yes/no)? *





77. Which of the following describes the process of normalization? *





78. Which of the following is a purpose of one-hot encoding? *





79. Which of the following is NOT a commonly used metric for classification models? *





80. Which of the following best describes the purpose of regularization in machine learning? *





81. Which of the following is a method of evaluating the accuracy of a binary classifier? *





82. Which of the following algorithms is typically used for unsupervised learning tasks? *





83. Which of the following methods is used to handle missing data in a dataset? *





84. Which of the following statements about clustering is TRUE? *





85. Which of the following is an example of a supervised learning task? *





86. Which of the following describes the purpose of feature engineering? *





87. Which of the following is a common metric for evaluating classification models? *





88. Which of the following is commonly used to reduce the feature set in machine learning? *





89. Which of the following is a benefit of using a confusion matrix? *





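Study note for question 89: a confusion matrix breaks predictions into true positives, false positives, false negatives, and true negatives, which shows what kind of errors a classifier makes and lets you derive precision, recall, and accuracy. This stdlib sketch assumes binary labels with 1 as the positive class; the function name is hypothetical, and scikit-learn offers `sklearn.metrics.confusion_matrix`.

```python
def confusion_matrix(y_true, y_pred):
    """Count (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn)               # 2 1 1 4
print(precision, recall, accuracy)  # 0.666..., 0.666..., 0.75
```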
90. Which of the following is an unsupervised learning method? *





91. Which of the following is a type of dimensionality reduction technique? *





92. Which of the following helps to identify multicollinearity in data? *





93. Which of the following is the role of a loss function in machine learning? *





94. Which of the following describes the concept of ensemble learning? *





95. Which of the following algorithms is commonly used for classification tasks? *





96. Which of the following statements is TRUE about overfitting? *





97. Which of the following metrics is commonly used to evaluate regression models? *





98. Which of the following is used for handling imbalanced datasets? *





99. Which of the following describes the purpose of cross-validation? *





100. Which of the following algorithms is a type of supervised learning? *





101. Which of the following describes a benefit of using batch gradient descent? *





102. Which of the following is a characteristic of decision trees? *





103. Which of the following describes the purpose of a test dataset? *





104. Which of the following is a commonly used distance metric in K-means clustering? *





105. Which of the following statements about regularization is FALSE? *





106. Which of the following describes the purpose of an ROC curve? *





107. Which of the following is a limitation of using K-means clustering? *





108. Which of the following methods helps reduce the dimensionality of data? *





109. Which of the following is a common technique for feature scaling? *





110. Which of the following is a common loss function for binary classification? *