ML MCQs Unit I-III
ML Unit I
1. What is Machine learning?
a) The autonomous acquisition of knowledge through the use of computer
programs
b) The autonomous acquisition of knowledge through the use of manual
programs
c) The selective acquisition of knowledge through the use of computer
programs
d) The selective acquisition of knowledge through the use of manual programs
2. Choose the options that are correct regarding machine learning (ML) and
artificial intelligence (AI):
(A) ML is an alternate way of programming intelligent machines.
(B) ML and AI have very different goals.
(C) ML is a set of techniques that turns a dataset into a software.
(D) AI is a software that can emulate the human mind.
3. Which of the following sentences is FALSE regarding regression?
(A) It relates inputs to outputs.
(B) It is used for prediction.
(C) It may be used for interpretation.
(D) It discovers causal relationships.
4. Choose the correct options: What does a simple linear regression analysis
examine?
a. The relationship between only two variables
b. The relationship between one dependent and one independent variable
c. The relationship between many variables
d. The relationship between two dependent and one independent variable
5. Which of the following is not an assumption for simple linear regression?
a. Normally distributed variables
b. Multicollinearity
c. Linear relationship
d. Constant variance
e. Normally distributed residuals
6. The simple linear regression equation can be written as ŷ = b0 + b1x. The
symbol ŷ represents the
A. average or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
7. The simple linear regression equation can be written as ŷ = b0 + b1x. The
term b0 represents the
A. estimated or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
8. In the simple linear regression equation written as ŷ = b0 + b1x, the term b1
represents the
A. estimated or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
9. In the simple linear regression equation written as ŷ = b0 + b1x, the symbol
x represents the
A. estimated or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
10. A regression between foot length (response variable in cm) and height
(explanatory variable in inches) for 33 students resulted in the following
regression equation: ŷ = 10.9 + 0.23x. One student in the sample was 73
inches tall with a foot length of 29 cm.
What is the predicted foot length for this student?
A. 17.57 cm
B. 27.69 cm
C. 29 cm
D. 33 cm
11. A regression between foot length (response variable in cm) and height
(explanatory variable in inches) for 33 students resulted in the following
regression equation: ŷ = 10.9 + 0.23x. One student in the sample was 73
inches tall with a foot length of 29 cm. What is the residual for this student?
A. 29 cm
B. 1.31 cm
C. 0.00 cm
D. -1.31 cm
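For questions 10 and 11, the answer follows from plugging the student's height into the fitted line; a minimal Python check (the variable names are mine, not from the question bank):

b0, b1 = 10.9, 0.23                   # fitted intercept and slope
height = 73                           # explanatory variable (inches)
foot_length = 29                      # observed response (cm)

predicted = b0 + b1 * height          # 10.9 + 0.23 * 73 = 27.69 cm (question 10)
residual = foot_length - predicted    # 29 - 27.69 = 1.31 cm (question 11)
print(predicted, residual)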
12. What is true about Machine Learning?
A. Machine Learning (ML) is a field of computer science
B. ML is a type of artificial intelligence that extracts patterns out of raw data
by using an algorithm or method.
C. The main focus of ML is to allow computer systems to learn from experience
without being explicitly programmed or requiring human intervention.
D. All of the above
13. ML is a field of AI consisting of learning algorithms that:
A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above
14. Which of the following is NOT supervised learning?
a) PCA
b) Decision Tree
c) Linear Regression
d) Naive Bayesian
15. Imagine you have 1000 input features and 1 target feature in a machine
learning problem. You have to select the 100 most important features based on
the relationship between the input features and the target feature.
Do you think this is an example of dimensionality reduction?
A. Yes
B. No
16. Dimensionality reduction algorithms are one of the possible ways to reduce
the computation time required to build a model.
A. TRUE
B. FALSE
17. Reducing the dimension of data will take less time to train a model.
A. TRUE
B. FALSE
18. PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
19. What is the purpose of performing cross-validation?
a. To assess the predictive performance of the models
b. To judge how the trained model performs outside the sample on test data
c. Both A and B
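As a practical illustration of question 19, here is a hedged scikit-learn sketch (the dataset and model are placeholders, not part of the question bank):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is held out once as unseen data,
# so the scores estimate predictive performance outside the training sample.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())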
20. Which of the following are feature selection techniques?
a. Filter
b. Wrapper
c. Embedded
d. PCA
21. __________ refers to a group of techniques for fitting and studying the
straight-line relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
ML Unit II
1. What does the least squares method do exactly?
a. Minimizes the distance between the data points
b. Finds the least problematic regression line
c. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the residual sum of squares
d. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the sum of residuals
2. Fit the straight line to the following data.
x: 1 2 3 4 5
y: 1 2 3 4 5
a) y=x
b) y=x+1
c) y=2x
d) y=2x+1
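A minimal least-squares fit for question 2, assuming NumPy is available (np.polyfit minimizes the residual sum of squares):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight-line fit
print(slope, intercept)  # ~1.0 and ~0.0, i.e. the fitted line is y = x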
3. The normal equations for a straight line y = ax + b are:
a) Σy = aΣx + nb and Σxy = aΣx² + bΣx
b) Σxy = aΣx + nb and Σy = aΣx² + bΣx
c) Σy = aΣx + nb and Σxy = aΣx² + bΣxy
d) Σy = aΣx + nb and Σx²y = aΣx² + bΣx
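For reference, the normal equations in option (a) come from minimizing the residual sum of squares S(a, b) = Σ(y − ax − b)²: setting ∂S/∂b = −2Σ(y − ax − b) = 0 gives Σy = aΣx + nb, and setting ∂S/∂a = −2Σx(y − ax − b) = 0 gives Σxy = aΣx² + bΣx.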
4. Fit the straight line curve to the following data.
x: 75 80 93 65 87 71 98 68 84 77
y: 82 78 86 72 91 80 95 72 89 74
a. y = 0.9288x + 7.78155
b. y = 7.78155x + 0.9288
c. y = 0.8288x + 6.78155
d. y = 6.78155x + 0.8288
5. Least Squares Estimation minimizes:
a. summation of squares of errors
b. summation of errors
c. summation of absolute values of errors
d. All
6. Parameter Estimation problem is about:
a. Identifying Input Parameters
b. Identifying Output Parameters
c. Identifying Model Parameters
d. All
7. ________ is a simple approach to supervised learning. It assumes that the
dependence of Y on X1, X2, ..., Xp is linear. (Unit I)
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
8. When there is more than one independent variable in the model, then the
linear model is termed as _______
a) Unimodal
b) Multiple model
c) Multiple Linear model
d) Multiple Logistic model
9. The parameter β0 is termed as intercept term and the parameter β1 is
termed as slope parameter. These parameters are usually called as _________
a) Regressionists
b) Coefficients
c) Regressive
d) Regression coefficients
10. The following is true of the k-NN algorithm:
1. It has slow training phase
2. It has a fast classification phase
3. Makes no assumptions about the data distribution
4. Produces a predictive model
11. The trade-off between overfitting and underfitting the training data is called
1. The bias-variance tradeoff
2. The residual sum of squares
3. The tradeoff curve
4. The null deviance
12. The following is not true of the naïve Bayes classifier:
1. It is easy to obtain the estimated probability for a prediction
2. It deals well with noisy and missing data
3. It is simple, fast and effective
4. It is ideal for data sets with a large number of numeric variables
13. Suppose 40% of spam messages contain the word “free”. 10% of messages
are spam. 10% of messages contain the word “free”. The probability that a
message is spam, given that it contains the word “free”, is
1. 0.5
2. 0.1
3. 0.2
4. 0.4
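Question 13 is a direct application of Bayes' theorem; a minimal Python check (variable names are mine):

p_free_given_spam = 0.40   # P(word "free" | spam)
p_spam = 0.10              # P(spam)
p_free = 0.10              # P(word "free")

# Bayes' theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)   # 0.4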
14. The following is one of the strengths of support vector machines (SVM):
1. Not very prone to overfitting
2. Not very prone to underfitting
3. Is generally quick to train
4. Easy to interpret
15. If we increase the k value in k-nearest neighbor, the model will _____ the
bias and ______ the variance.
a) Decrease, Decrease
b) Increase, Decrease
c) Decrease, Increase
d) Increase, Increase
16. As the number of training examples goes to infinity, your model trained on
that data will have:
a) Lower variance
b) Higher variance
c) Same variance
d) None of the above
17. Suppose we like to calculate P(H|E, F) and we have no conditional
independence information. Which of the following sets of numbers are
sufficient for the calculation?
a) P(E, F), P(H), P(E|H), P(F|H)
b) P(E, F), P(H), P(E, F|H)
c) P(H), P(E|H), P(F|H)
d) P(E, F), P(E|H), P(F|H)
18. Which of the following is/are true regarding an SVM?
a) For two dimensional data points, the separating hyperplane learnt by a linear
SVM will be a straight line.
b) In theory, a Gaussian kernel SVM cannot model any complex separating
hyperplane.
c) For every kernel function used in a SVM, one can obtain an equivalent closed
form basis expansion.
d) Overfitting in an SVM is not a function of number of support vectors.
19. Which of the following best describes what discriminative approaches try to
model? (w are the parameters in the model)
a) p(y|x, w)
b) p(y, x)
c) p(w|x, w)
d) None of the above
20. Which of the following distance metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
21. Which of the following option is true about k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
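Questions 20 and 21 are about k-NN in practice; a hedged scikit-learn sketch (the datasets, k, and metrics are placeholders) showing that the same algorithm is available for both classification and regression and accepts different distance metrics:

from sklearn.datasets import make_classification, make_regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

Xc, yc = make_classification(n_samples=100, n_features=5, random_state=0)
Xr, yr = make_regression(n_samples=100, n_features=5, random_state=0)

# Increasing n_neighbors (k) smooths the prediction: higher bias, lower variance.
clf = KNeighborsClassifier(n_neighbors=5, metric="manhattan").fit(Xc, yc)
reg = KNeighborsRegressor(n_neighbors=5, metric="minkowski").fit(Xr, yr)
print(clf.predict(Xc[:3]), reg.predict(Xr[:3]))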
22. The minimum time complexity for training an SVM is O(n²). According to this
fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
23. The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
24. Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE
25. SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
26. Suppose you are using an RBF kernel in SVM with a high gamma value. What
does this signify?
A) The model would consider even far away points from hyperplane for
modeling
B) The model would consider only the points close to the hyperplane for
modeling
C) The model would not be affected by distance of points from hyperplane for
modeling
D) None of the above
27. The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
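Questions 23, 26 and 27 refer to hyperparameters exposed by typical SVM implementations; a hedged scikit-learn sketch (the dataset and parameter values are placeholders):

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# C trades off misclassification of training points against a simpler
# (larger-margin) decision surface; a high gamma makes the RBF kernel
# consider only points close to the hyperplane when modelling.
model = SVC(kernel="rbf", C=1.0, gamma=0.5)
model.fit(X, y)
print(model.support_vectors_.shape)   # support vectors: the points closest to the boundary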
28. If I am using all features of my dataset and I achieve 100% accuracy on my
training set, but ~70% on validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
29. Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Question Context: 32 – 34
Suppose you have trained an SVM with a linear decision boundary. After training
the SVM, you correctly infer that your SVM model is underfitting.
32. Which of the following options would you be more likely to consider when
iterating the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
33. Suppose you gave the correct answer in the previous question. What do you
think is actually happening?
1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance
A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
34. In the above question, suppose you want to change one of the SVM's
hyperparameters so that the effect would be the same as in the previous
questions, i.e. the model will not underfit. What would you do?
A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these
35. We usually use feature normalization before using the Gaussian kernel in
SVM. What is true about feature normalization?
1. We do feature normalization so that the new feature will dominate the others
2. Sometimes, feature normalization is not feasible in case of categorical
variables
3. Feature normalization always helps when we use Gaussian kernel in SVM
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Question context: 39 – 40
Suppose you are using an SVM with a linear kernel of polynomial degree 2. Now
suppose you have applied this to the data and found that it perfectly fits the
data, that is, the training and testing accuracy is 100%.
39) Now think that you increase the complexity (or degree of the polynomial of
this kernel). What do you think will happen?
A) Increasing the complexity will overfit the data
B) Increasing the complexity will underfit the data
C) Nothing will happen since your model was already 100% accurate
D) None of these
40) In the previous question, after increasing the complexity you found that the
training accuracy was still 100%. According to you, what is the reason behind
that?
1. Since the data is fixed and we are fitting more polynomial terms or
parameters, the algorithm starts memorizing everything in the data
2. Since the data is fixed and SVM doesn't need to search in a big hypothesis space
A) 1
B) 2
C) 1 and 2
D) None of these
41) What is/are true about the kernel in SVM?
1. The kernel function maps low-dimensional data to a high-dimensional space
2. It's a similarity function
A) 1
B) 2
C) 1 and 2
42. Which of the following might be valid reasons for preferring an SVM over a
neural network?
(a) An SVM can automatically learn to apply a non-linear transformation on the
input space; a neural net cannot.
(b) An SVM can effectively map the data to an infinite-dimensional space; a
neural net cannot.
(c) An SVM should not get stuck in local minima, unlike a neural net.
(d) The transformed (basis function) representation constructed by an SVM is
usually easier to visualise/interpret than for a neural net.
43. Support vector machine (SVM) is a _________ classifier.
a. Discriminative
b. Generative
44) Which of the following distance metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
45) Which of the following option is true about k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
46. SVM can be used to solve ___________ problems.
a. Classification
b. Regression
c. Clustering
d. Both Classification and Regression
47. SVM is a ___________ learning algorithm
a. Supervised
b. Unsupervised
48. SVM is termed as a ________ classifier
a. Minimum margin
b. Maximum margin
49. The training examples closest to the separating hyperplane are called as
_______
a. Training vectors
b. Test vectors
c. Support vectors
50. What does the least squares method do exactly?
a. Minimizes the distance between the data points
b. Finds the least problematic regression line
c. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the residual sum of squares
d. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the sum of residuals
51. Which of the following is a disadvantage of decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
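Question 51 (and question 28 in this unit) points at the classic overfitting failure mode; a hedged scikit-learn sketch (data and depth values are placeholders) contrasting an unconstrained tree with a depth-limited one:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)           # grown until pure: prone to overfit
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The unpruned tree usually scores near 100% on training data but noticeably worse on held-out data.
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))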
ML Unit III
1. Principal component analysis (PCA) can be used with variables of any
mathematical type: quantitative, qualitative, or a mixture of these types.
True/False
2. Principal component analysis (PCA) requires quantitative multivariate
data.
True/False
3. The sum of the PCA eigenvalues is equal to the sum of the variances of
the variables.
True/False
4. The variables subjected to PCA must all have the same physical dimensions.
True/False
5. The most meaningful and interpretable principal components are those
that have the largest eigenvalues.
True/False
6. PCA is a technique for
a. Variance normalisation
b. Dimensionality reduction
c. Feature Extraction
d. Data augmentation
7. Which of the following is/are true about PCA?
a. PCA is an unsupervised method.
b. It searches for the directions that have the largest variance
c. Maximum number of principal components <= number of features
d. All principal components are orthogonal to each other
8. Which of the following is true about MDA?
a. It aims to minimize both distance between class and distance within class.
b. It aims to minimize the distance between class and maximize the distance
within class.
c. It aims to maximize the distance between class and minimize the distance
within class.
d. It aims to maximize both distance between class and distance within class.
9. What happens when you get features in lower dimensions using PCA?
a. The features must carry all information present in data
b. The features will lose interpretability
c. The features will still have interpretability
d. The features may not carry all information present in data.
10. PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
11. Which of the following is true about LDA?
A. LDA aims to maximize the distance between class and minimize the within class
distance
B. LDA aims to minimize both distance between class and distance within class
C. LDA aims to minimize the distance between class and maximize the distance
within class
D. LDA aims to maximize both distance between class and distance within class
12. In which of the following cases will LDA fail?
A. If the discriminatory information is not in the mean but in the variance of the data
B. If the discriminatory information is in the mean but not in the variance of the data
C. If the discriminatory information is in the mean and variance of the data
D. None of these
13. Which of the following comparison(s) are true about PCA and LDA?
1. Both LDA and PCA are linear transformation techniques
2. LDA is supervised whereas PCA is unsupervised
3. PCA maximizes the variance of the data, whereas LDA maximizes the
separation between different classes.
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. Only 3
E. 1, 2 and 3
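Questions 7, 10 and 13 describe how PCA and LDA behave in practice; a hedged scikit-learn sketch (the dataset is a placeholder) showing that LDA uses the class labels while PCA does not:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 4 features, 3 classes

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)        # unsupervised: y is ignored; directions of maximum variance
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised: maximizes class separation

print(X_pca.shape, X_lda.shape)          # both project the data to 2 dimensions
print(pca.explained_variance_ratio_)     # variance captured by each principal component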
14. What will happen when eigenvalues are roughly equal?
A. PCA will perform outstandingly
B. PCA will perform badly
C. Can’t Say
D. None of the above
15. PCA works better if there is?
1. A linear structure in the data
2. If the data lies on a curved surface and not on a flat surface
3. If variables are scaled in the same unit
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1 ,2 and 3
16. Which of the following option(s) is / are true?
1. You need to initialize parameters in PCA
2. You don't need to initialize parameters in PCA
3. PCA can be trapped into local minima problem
4. PCA can't be trapped into local minima problem
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
17. Which of the following option is true?
A. LDA explicitly attempts to model the difference between the classes of data. PCA
on the other hand does not take into account any difference in class.
B. Both attempt to model the difference between the classes of data.
C. PCA explicitly attempts to model the difference between the classes of data. LDA
on the other hand does not take into account any difference in class.
D. Both don’t attempt to model the difference between the classes of data.
18. The nature of the decision boundary is determined by
a. Decision Rule
b. Decision boundary
c. Discriminant function
d. None of the above
19. In Supervised learning, class labels of the training samples are
a. Known
b. Unknown
c. Doesn’t matter
d. Partially known
20. PCA is used for
a. Dimensionality Enhancement
b. Dimensionality Reduction
c. Both
d. None
21. PCA is used for
a. Supervised Classification
b. Unsupervised Classification
c. Semi-supervised Classification
d. Cannot be used for classification
22. The largest eigenvector gives the direction of the
a. Maximum scatter of the data
b. Minimum scatter of the data
c. No such information can be interpreted
d. Second largest eigenvector, which is in the same direction.
23. Which of the following is an unsupervised technique?
a. PCA
b. LDA
c. Bayes
d. None of the above
24. Linear Discriminant Analysis is
a. Unsupervised Learning
b. Supervised Learning
c. Semi-supervised Learning
d. None of the above
25. The following property of a within-class scatter matrix is a must for LDA:
a. Singular
b. Non-singular
c. Does not matter
d. Problem-specific
26. In Supervised learning, class labels of the training samples are
a. Known
b. Unknown
c. Doesn’t matter
d. Partially known
27. A method to estimate the parameters of a distribution is
a. Maximum Likelihood
b. Linear Programming
c. Dynamic Programming
d. Convex Optimization
28. Gaussian mixtures are also known as
a. Gaussian multiplication
b. Non-linear superposition of Gaussians
c. Linear superposition of Gaussians
d. None of the above
29. For Gaussian mixture models, parameters are estimated using a closed-form solution by
a. Expectation Minimization
b. Expectation Maximization
c. Maximum Likelihood
d. None of the above
30. Latent Variable in GMM is also known as:
a. Prior Probability
b. Posterior Probability
c. Responsibility
d. None of the above
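Questions 28-30 map onto what a typical GMM implementation exposes; a hedged scikit-learn sketch (the toy data is a placeholder) showing the EM fit and the per-sample responsibilities (the posterior over the latent component):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 1-D data drawn from two Gaussians (a linear superposition of Gaussians).
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0)   # fitted by Expectation-Maximization
gmm.fit(X)

responsibilities = gmm.predict_proba(X)   # posterior P(component | x), a.k.a. the responsibility
print(gmm.means_.ravel())                 # estimated component means, roughly -3 and 3
print(responsibilities[:3])               # each row sums to 1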
31. Which of the following statements are true regarding dimensionality
reduction?
A. The high dimensional data can always be reduced losslessly
B. Dimensionality reduction is always a non-linear transformation
C. Principal Component Analysis is a technique used to reduce dimensions
D. B and C
E. All of the above are true
32. The most popularly used dimensionality reduction algorithm is Principal
Component Analysis (PCA). Which of the following is/are true about PCA?
1. PCA is an unsupervised method
2. It searches for the directions in which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
E. 1,2 and 4
F. All of the above
33. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
c. Both A and B
d. None of the above options