Mock Exam Set 1 ADS Quiz

1. Scikit-learn pipelines are used *

A: for ETL
B: for dimensionality reduction
C: to run transformers and estimators in one function
D: for online predictions

2. Stratified splitting is used to *

A: shuffle the dataset
B: have the same proportion of classes in training and test data
C: balance an imbalanced dataset
D: have different proportions of classes in training and test data

3. Which of the following is FALSE *

A: The range for RMSE is -100 to 100
B: RMSE does not always have a set range
C: R-squared has a range between 0 and 1
D: R-squared cannot determine whether the coefficient estimates are biased

4. Which of the following are the two ways a machine learning platform provides predictions? *

A: continuous predictions and targeted predictions
B: targeted predictions and batch predictions
C: batch predictions and online predictions
D: online predictions and probability predictions

5. Shown below is an example of a data frame a data scientist can use to build a machine learning model *

A: two objects, one timedelta[ns], one float64, and one bool
B: three objects, one datetime64, and one float64
C: two objects, one datetime64, one float64, and one bool
D: one object, one datetime64, one float64 and two bool

6. Which of the following is the BEST definition of an outlier? It must be *

A: more than 1.5 times IQR below Q1 or more than 1.5 times IQR above Q3
B: at least 1.5 standard deviations from the mean
C: a value in the upper quartile (Q3)
D: greater than one interquartile range (IQR)
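
For reference, the 1.5 times IQR rule mentioned in this question can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the helper name `iqr_outliers`, the sample data, and the choice of the "inclusive" quartile method are all assumptions made for the example, and other quartile conventions can give slightly different fences.

```python
import statistics


def iqr_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences.

    Hypothetical helper for illustration; quartiles use the
    'inclusive' method of statistics.quantiles (an assumption).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # below this counts as an outlier
    upper_fence = q3 + 1.5 * iqr  # above this counts as an outlier
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up sample with one obviously extreme value.
sample = [10, 12, 12, 13, 14, 15, 16, 18, 95]
outliers = iqr_outliers(sample)
print(outliers)
```

With this sample the quartiles are 12 and 16, so the fences sit at 6 and 22 and only the value 95 is flagged.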

7. Which of the following models has the highest tendency to overfit by the nature of its design? *

A: Logistic regression
B: Soft margin support vector machine
C: Random forest
D: Hard margin support vector machine

8. Which of the following is TRUE about ridge regression? *

A: Ridge regression is used for feature selection, compensating for overfitting, and smoothing
B: In ridge regression, as the regularization parameter increases, regression coefficients decrease
C: Ridge regression is used to obtain the subset of predictors that minimizes prediction error for a quantitative response variable
D: Ridge regression is used to obtain the subset of predictors that maximizes prediction error for a quantitative response variable

9. Which of the following are sampling techniques that can be used to resample imbalanced datasets? Choose all that apply *

A: Cross validation, regression, Random oversampling
B: Regression, Random oversampling, Random undersampling
C: Random oversampling, Random undersampling, SMOTE
D: SMOTE, cross validation, regression

10. Which of the following is a model that is insensitive to feature scaling? *

A: Random forest
B: Support-vector machines
C: KNN
D: Linear regression

11. Which of the following is TRUE about the ROC curve? *

A: Its x-axis represents the false positive rate (FPR) and its y-axis represents the true positive rate (TPR)
B: Its x-axis represents the false positive (FP) and its y-axis represents the true positive (TP)
C: It is the area below the AUC curve
D: It is used to visualize the performance of the binary classifier

12. Which of the following is NOT considered a preprocessing task? *

A: Tokenization
B: Removing stop words
C: Sentiment analysis
D: Stemming and lemmatization

13. Which of the following pairs of metrics is resistant to outliers? *

A: Median and interquartile range
B: Mean and interquartile range
C: Median and standard deviation
D: Mean and standard deviation

14. Which of the following is a method used for reporting a pattern underlying time-series data? *

A: Weighted arithmetic mean
B: Correlation
C: Seasonality
D: F1 score

15. Which of the following is the difference between "union" and "union all"? *

A: They are both for concatenating two or more queries, but "union" eliminates the duplications
B: They are both for concatenating two or more queries, but "union all" eliminates the duplications
C: While "union all" joins the tables based on selected columns, "union" joins the tables based on all the columns
D: While "union" joins the tables based on selected columns, "union all" joins the tables based on all the columns
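
For reference, the behavior of the two operators can be checked against an in-memory SQLite database. This is a small sketch only: the table names `a` and `b`, the column `x`, and the inserted values are hypothetical, and SQLite stands in for whichever SQL engine is actually in use.

```python
import sqlite3

# In-memory database with two tiny tables that share the value 2.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
    """
)

# UNION combines both result sets and drops duplicate rows.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()

# UNION ALL combines both result sets and keeps every row.
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()
conn.close()

print(union_rows)
print(union_all_rows)
```

The shared value 2 appears once in the `UNION` result and twice in the `UNION ALL` result.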

16. Which of the following are libraries in Python to create visualizations? *

A: ggplot2, matplotlib, and seaborn
B: ggplot2, matplotlib, and pandas
C: pandas, numpy, and seaborn
D: ggplot2, matplotlib, and numpy

17. R-squared is used for evaluating *

A: regression models
B: clustering models
C: correlation between features
D: classification models

18. Which of the following is TRUE of forecasting sales for a retail business? *

A: Historical data does not affect the prediction results
B: It can be solved using classification algorithms
C: Sales forecasting is a static problem
D: It can be solved using time-series approaches

19. Which of the following functions from the pandas module can read an Excel file (.xlsx) in Python? *

A: read_excel()
B: to_csv()
C: to_excel()
D: read_csv()

20. Which of the following formats cannot be read directly into data frames using pandas methods? *

A: json string
B: parquet file
C: xml file
D: pickle object

21. Which of the following is TRUE about TF-IDF? *

A: It assigns a higher score to stop words
B: It is inversely related to term frequency
C: It is inversely related to document frequency
D: It is directly related to document frequency

22. Which of the following pairs of techniques helps to explain a model? *

A: Generating large training data and having balanced datasets
B: Applying dimensionality reduction and reducing the complexity of the model
C: Understanding local interpretability and global interpretability
D: Identifying local minima and local maxima of a function

23. Which of the following statements is TRUE about feature scaling? *

A: Tree-based algorithms are the most sensitive to the scale of the features
B: All machine-learning algorithms are sensitive to the scale of the features
C: Tree-based algorithms are the least sensitive to the scale of the features
D: Distance-based algorithms are the least sensitive to the scale of the features

24. A data scientist has just implemented a churn model that predicts customers with a high probability of leaving based on their transaction data in the last six months. The company takes effective action with the test group. What outcome would demonstrate t

A: It is not possible to measure the performance of the model
B: The control group has a significantly lower average churn rate over the next thirty days than the test group
C: The test group's average churn rate over the next thirty days is not statistically significantly different from the control group's
D: The test group has a significantly lower average churn rate over the next thirty days than the control group

25. Which of the following metrics are used to evaluate regression models? *

A: F1 score, Accuracy, and R-squared score
B: Accuracy, R-squared score, and Median absolute error
C: R-squared score, Median absolute error, and Root-mean-square error
D: F1 score, Median absolute error, and Root-mean-squared error

26. Which of the following helps with avoiding overfitting? *

A: Use more features in the training data, Use cross-validation when training the data, and Use the model with the lowest training error
B: Use more features in the training data, Use cross-validation when training the data, and Use a linear model so it generalizes well
C: Use more features in the training data, Use the best hyperparameter value that produces a model with the lowest generalization error, and Use the model with the lowest training error
D: Use more features in the training data, Use the best hyperparameter value that produces a model with the lowest generalization error, and Use a linear model so it generalizes well

27. When identifying outliers, which of the following is the generally accepted number of standard deviations away from the mean at which a data scientist should remove them? *

A: 3
B: 6
C: 1
D: 0.5

28. Which of the following is a method that can be applied on a dataset involving 500 variables to avoid the curse of dimensionality? *

A: Stochastic gradient descent
B: Segmentation
C: Principal component analysis
D: Batch normalization

29. Which of the following plots describes relationships among several variables simultaneously? *

A: Violin plot
B: Pair plot
C: Scatter plot
D: Box plot

30. Which of the following is an advantage of gathering more training examples? *

A: It helps in generalizing the model
B: It saves on data storage costs
C: It helps identify common use cases
D: It takes less time to train the model

31. Which of the following is a platform or technology that can be used to deploy ML models? *

A: SageMaker
B: MongoDB
C: Jupyter notebooks
D: Django

32. If the following numbers represent the p-value, which of them is significant at 95%? *

A: 0.90
B: 0.98
C: 0.01
D: 0.06

33. What is the difference between boosting and bagging? *

A: Bagging and boosting are exactly the same sampling methods
B: Bagging is random sampling without replacement, whereas boosting is oversampling the data
C: Bagging is random sampling with replacement, whereas boosting is sampling with replacement of weighted data
D: Bagging is parallel and boosting is serial

34. A model in production works by crawling the web and downloading HTML content. Which of the following is a disadvantage of the model? *

A: HTML can be easily converted into JSON objects
B: The data is structured
C: If the structure of HTML content changes, the code would need to change as well
D: It requires advanced knowledge of HTML

35. Which of the following is a term that can be used to represent missing data of type object in Python? *

A: Any
B: All
C: None
D: Null

36. Which of the following statements is FALSE about gradient descent? *

A: It works by iteratively moving in the direction of steepest descent as defined by the negative of the gradient
B: It converges slower when features are on similar scales
C: It is used to find the local minimum of a differentiable function
D: It updates the parameters of the model

37. Which of the following is a classification algorithm? *

A: Hierarchical clustering
B: ARIMA
C: Linear regression
D: Logistic regression

38. Which of the following are the techniques or values that can be used to impute missing data in a predictor variable? *

A: Support vectors and clustering
B: Clustering and Mode
C: Mode and Mean
D: Mean and Neural Networks

39. Which of the following functions from scikit-learn is a feature scaling method? *

A: add_dummy_feature()
B: LabelEncoder()
C: MinMaxScaler()
D: RangeScaler()

40. Which of the models shown is the best performing model? *

A: XGBoost
B: Random Forest
C: Logistic Regression
D: Naive Bayes

41. Which of the following data visualizations is useful for identifying outliers in the data? *

A: Box plot
B: Pie chart
C: Heatmap
D: Bar chart

42. Which of the following are good practices to manage stakeholder expectations when working on a new data science project? *

A: Agree on all the deliverables before looking at the data
B: Assess the quality and completeness of the data
C: Clearly communicate it is not possible to proceed with the project without complete data
D: It updates the parameters of the model

43. What do residuals represent in the simple linear regression model? *

A: The difference between the actual Y values and the predicted Y values
B: The predicted value of Y for the average X value
C: The difference between the actual Y values and the mean of Y
D: The square root of the slope
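
For reference, residuals in simple linear regression can be computed by hand with ordinary least squares. The sketch below is illustrative only: the helper `fit_line` and the four data points are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept.

    Hypothetical helper for illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data points.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

slope, intercept = fit_line(xs, ys)

# Residual = actual y minus the y predicted by the fitted line.
predicted = [slope * x + intercept for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
print(residuals)
```

For a least-squares line that includes an intercept, the residuals sum to zero up to floating-point error, which is a quick sanity check on the fit.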

44. Which of the following are popular metrics that can be used to measure user engagement? *

A: Monthly active users and Churn rate
B: Cost per acquisition and Net promoter score
C: Net promoter score and Returning users
D: Returning users and Churn rate

45. Which of the following is an algorithm that can handle values without any transformation? *

A: XGBoost
B: K-means clustering
C: Decision tree
D: Linear regression

46. Which of the following is a characteristic of K-Means clustering? *

A: It groups data into clusters based on supervised learning
B: It finds clusters with varying sizes and shapes
C: It minimizes the sum of squared distances between points and the cluster centroid
D: It is primarily used for classification tasks

47. Which of the following methods can be used to tune the hyperparameters of a machine learning model? *

A: Cross-validation
B: Grid search
C: Regularization
D: Feature engineering

48. Which of the following is the purpose of feature selection in machine learning? *

A: To add more features to the dataset
B: To improve model interpretability and performance
C: To increase model complexity
D: To make the dataset larger

49. Which of the following is a common metric for evaluating clustering algorithms? *

A: Mean squared error (MSE)
B: Confusion matrix
C: Silhouette score
D: R-squared

50. Which of the following correctly describes principal component analysis (PCA)? *

A: PCA reduces dimensionality by increasing the number of features
B: PCA retains the most important features by removing redundancy
C: PCA increases interpretability by adding new variables
D: PCA is primarily used to create new target variables

51. Which of the following is NOT an advantage of using Random Forest models? *

A: High accuracy
B: Handles missing values well
C: Reduces overfitting
D: Highly interpretable

52. Which of the following can reduce model overfitting? *

A: Using a smaller training dataset
B: Increasing model complexity
C: Adding regularization
D: Reducing the amount of training data

53. Which of the following is a tree-based ensemble algorithm? *

A: K-Nearest Neighbors
B: Linear regression
C: Random forest
D: Logistic regression

54. Which of the following is NOT true about cross-validation? *

A: It helps to reduce overfitting
B: It provides an unbiased estimate of model performance
C: It is only used for model evaluation
D: It splits data into training and validation sets multiple times

55. Which of the following statements about SMOTE is FALSE? *

A: It is a technique for handling imbalanced data
B: It oversamples the minority class by creating synthetic samples
C: It reduces the number of instances in the majority class
D: It improves model performance in imbalanced datasets

56. Which of the following is a common feature selection method? *

A: K-Means clustering
B: Recursive Feature Elimination
C: Hyperparameter tuning
D: Feature engineering

57. Which of the following is a measure of central tendency? *

A: Variance
B: Standard deviation
C: Median
D: Range

58. Which of the following data types is commonly used for categorical features in Python? *

A: Int
B: Float
C: Boolean
D: Object

59. Which of the following statements is TRUE about one-hot encoding? *

A: It can be used on numerical data
B: It increases dimensionality
C: It combines multiple categories into one feature
D: It reduces the number of features

60. Which of the following methods is commonly used to reduce the number of features? *

A: SMOTE
B: One-hot encoding
C: Dimensionality reduction
D: Synthetic data generation

61. Which of the following metrics is suitable for evaluating a regression model? *

A: Accuracy
B: F1 Score
C: Mean Absolute Error (MAE)
D: Recall

62. Which of the following techniques helps in reducing multicollinearity among features? *

A: Adding more features
B: Feature selection
C: Increasing the sample size
D: Cross-validation

63. Which of the following is commonly used to avoid overfitting in a decision tree model? *

A: Pruning
B: Increasing tree depth
C: Adding more leaves
D: Increasing data points

64. Which of the following methods is used to balance an imbalanced dataset? *

A: Cross-validation
B: Grid search
C: Random undersampling
D: Hyperparameter tuning

65. Which of the following is a characteristic of a left-skewed distribution? *

A: The mean is greater than the median
B: The mean is less than the median
C: The mean and median are equal
D: There is no skewness
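
For reference, the relationship between mean and median in a skewed sample can be checked numerically. The data below are made up for illustration: most values cluster near 10 and a single low value forms a long left tail.

```python
import statistics

# Made-up sample with a long tail toward small values (left skew).
left_skewed = [1, 8, 9, 9, 10, 10, 10]

mean_value = statistics.fmean(left_skewed)
median_value = statistics.median(left_skewed)

# The low value in the tail pulls the mean down more than the median.
print(mean_value, median_value)
```

In this sample the tail drags the mean below the median, while the median stays at the middle observation.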

66. Which of the following describes logistic regression? *

A: It is used to predict continuous values
B: It provides a binary classification
C: It is an unsupervised learning algorithm
D: It minimizes the mean squared error

67. Which of the following is commonly used for feature scaling in machine learning? *

A: Box plot
B: StandardScaler
C: One-hot encoding
D: Dummy variables

68. Which of the following techniques helps in reducing model variance? *

A: Using a smaller dataset
B: Bagging
C: Increasing model complexity
D: Removing regularization

69. Which of the following methods can be used for clustering? *

A: K-means
B: Linear regression
C: Decision trees
D: Logistic regression

70. Which of the following can be used to handle class imbalance in data? *

A: Resampling
B: Dimensionality reduction
C: Cross-validation
D: Feature selection

71. Which of the following is an ensemble learning technique? *

A: K-means
B: Linear regression
C: Decision trees
D: Random forest

72. Which of the following is used to measure model accuracy for classification tasks? *

A: Root Mean Square Error
B: F1 Score
C: Mean Absolute Error
D: R-squared

73. Which of the following describes K-fold cross-validation? *

A: The dataset is split into two parts
B: It divides the dataset into K equally sized parts for training and validation
C: It repeats the training with all available data
D: It evaluates the model on the same data used for training
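
For reference, the index bookkeeping behind K-fold cross-validation can be sketched without any library. This is a simplified illustration under stated assumptions: the generator name `k_fold_indices` is hypothetical, there is no shuffling, and when the sample count does not divide evenly the first folds receive one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical helper for illustration; no shuffling is applied.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        # Everything outside the validation slice is used for training.
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 5))
for train, validation in folds:
    print(validation, train)
```

Each sample lands in exactly one validation fold, so every observation is used for validation once and for training in the remaining rounds.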

74. Which of the following models is typically used for time series forecasting? *

A: Logistic regression
B: K-means clustering
C: Decision tree
D: ARIMA

75. Which of the following metrics evaluates the goodness of fit for a regression model? *

A: Accuracy
B: F1 Score
C: R-squared
D: Recall

76. Which of the following models would be best suited to predict if a person has a disease (yes/no)? *

A: Linear regression
B: Logistic regression
C: Decision tree regression
D: K-means clustering

77. Which of the following describes the process of normalization? *

A: Scaling the data to have zero mean and unit variance
B: Scaling the data to fall within a specified range, such as [0,1]
C: Reducing the number of data points
D: Increasing the number of features
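
For reference, the two scaling schemes named in the options can be compared side by side. This is a minimal sketch: the helper names `min_max_scale` and `standardize` and the sample data are assumptions for illustration, and the population standard deviation is used for the z-scores.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so they fall within [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


# Made-up sample.
data = [2.0, 4.0, 6.0, 8.0]

scaled = min_max_scale(data)
zscores = standardize(data)
print(scaled)
print(zscores)
```

The min-max result is bounded by 0 and 1, whereas the standardized values are centered on zero and are not confined to a fixed range.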

78. Which of the following is a purpose of one-hot encoding? *

A: To scale numerical data
B: To transform categorical data into binary vectors
C: To reduce the number of features
D: To handle missing values

79. Which of the following is NOT a commonly used metric for classification models? *

A: Precision
B: Recall
C: Mean Squared Error (MSE)
D: F1 Score

80. Which of the following best describes the purpose of regularization in machine learning? *

A: To improve the interpretability of the model
B: To increase the model’s accuracy on the training set
C: To reduce overfitting by penalizing large coefficients
D: To decrease the amount of data needed

81. Which of the following is a method of evaluating the accuracy of a binary classifier? *

A: Mean Absolute Error
B: Area Under the ROC Curve (AUC)
C: R-squared
D: Silhouette score

82. Which of the following algorithms is typically used for unsupervised learning tasks? *

A: K-means clustering
B: Logistic regression
C: Decision tree
D: Random forest

83. Which of the following methods is used to handle missing data in a dataset? *

A: Normalization
B: Imputation
C: Feature scaling
D: Cross-validation

84. Which of the following statements about clustering is TRUE? *

A: Clustering requires labeled data
B: Clustering aims to predict future values
C: Clustering groups similar data points together
D: Clustering assigns data points into pre-defined groups

85. Which of the following is an example of a supervised learning task? *

A: Clustering customer segments
B: Identifying fraudulent transactions
C: Finding the optimal number of clusters
D: Reducing the dimensions of a dataset

86. Which of the following describes the purpose of feature engineering? *

A: To create new features to improve model performance
B: To remove outliers
C: To standardize the data
D: To decrease model accuracy

87. Which of the following is a common metric for evaluating classification models? *

A: Mean Absolute Error
B: R-squared
C: Confusion matrix
D: Silhouette score

88. Which of the following is commonly used to reduce the feature set in machine learning? *

A: Normalization
B: Feature selection
C: Cross-validation
D: Label encoding

89. Which of the following is a benefit of using a confusion matrix? *

A: It gives a single accuracy score for a model
B: It visualizes both true and false predictions
C: It shows the total number of correct predictions only
D: It helps identify clusters in data

90. Which of the following is an unsupervised learning method? *

A: K-means clustering
B: Logistic regression
C: Linear regression
D: Decision trees

91. Which of the following is a type of dimensionality reduction technique? *

A: Feature selection
B: Regularization
C: PCA (Principal Component Analysis)
D: Cross-validation

92. Which of the following helps to identify multicollinearity in data? *

A: R-squared
B: Confusion matrix
C: Variance Inflation Factor (VIF)
D: Silhouette score

93. Which of the following is the role of a loss function in machine learning? *

A: To decrease the number of features
B: To calculate error between predictions and actual values
C: To remove irrelevant data
D: To improve model interpretability

94. Which of the following describes the concept of ensemble learning? *

A: Using a single strong learner for predictions
B: Combining multiple weak learners to improve accuracy
C: Training a model on more data
D: Using fewer features for prediction

95. Which of the following algorithms is commonly used for classification tasks? *

A: Linear regression
B: K-means clustering
C: Logistic regression
D: PCA

96. Which of the following statements is TRUE about overfitting? *

A: Overfitting occurs when a model performs well on training data but poorly on unseen data
B: Overfitting leads to better performance on test data
C: Overfitting results in a model that generalizes well to new data
D: Overfitting only happens with small datasets

97. Which of the following metrics is commonly used to evaluate regression models? *

A: Accuracy
B: Mean Squared Error (MSE)
C: F1 Score
D: Confusion matrix

98. Which of the following is used for handling imbalanced datasets? *

A: Dimensionality reduction
B: SMOTE
C: One-hot encoding
D: Feature scaling

99. Which of the following describes the purpose of cross-validation? *

A: To test model performance on different subsets of data
B: To combine multiple models for improved accuracy
C: To reduce the number of features
D: To normalize the data

100. Which of the following algorithms is a type of supervised learning? *

A: K-means clustering
B: Decision trees
C: PCA
D: Spectral clustering

101. Which of the following describes a benefit of using batch gradient descent? *

A: Faster updates with each data point
B: Convergence with smaller memory usage
C: Requires fewer data points for convergence
D: Consistent updates across all data

102. Which of the following is a characteristic of decision trees? *

A: They are insensitive to feature scaling
B: They require data to be normalized
C: They cannot handle categorical features
D: They are less interpretable than neural networks

103. Which of the following describes the purpose of a test dataset? *

A: To train the model parameters
B: To validate hyperparameters
C: To evaluate the model’s generalization on unseen data
D: To increase model complexity

104. Which of the following is a commonly used distance metric in K-means clustering? *

A: Hamming distance
B: Manhattan distance
C: Euclidean distance
D: Jaccard distance

105. Which of the following statements about regularization is FALSE? *

A: Regularization reduces model complexity
B: Regularization penalizes large coefficients
C: Regularization increases model interpretability
D: Regularization increases training accuracy

106. Which of the following describes the purpose of an ROC curve? *

A: To compare actual values against predicted values
B: To show the trade-off between sensitivity and specificity
C: To evaluate the residuals in regression
D: To show the accuracy of a clustering model

107. Which of the following is a limitation of using K-means clustering? *

A: It works only on categorical data
B: It assumes spherical clusters of similar size
C: It doesn’t require scaling
D: It’s suitable for high-dimensional data

108. Which of the following methods helps reduce the dimensionality of data? *

A: One-hot encoding
B: Cross-validation
C: Principal Component Analysis (PCA)
D: Hyperparameter tuning

109. Which of the following is a common technique for feature scaling? *

A: Imputation
B: Standardization
C: Clustering
D: Dimensionality reduction

110. Which of the following is a common loss function for binary classification? *

A: Mean Squared Error
B: Log Loss (Cross-Entropy)
C: Hinge Loss
D: R-squared