ML MCQs Unit I-III
ML Unit I
1. What is Machine learning?
a) The autonomous acquisition of knowledge through the use of computer
programs
b) The autonomous acquisition of knowledge through the use of manual
programs
c) The selective acquisition of knowledge through the use of computer
programs
d) The selective acquisition of knowledge through the use of manual programs
2. Choose the options that are correct regarding machine learning (ML) and
artificial intelligence (AI):
(A) ML is an alternate way of programming intelligent machines.
(B) ML and AI have very different goals.
(C) ML is a set of techniques that turns a dataset into a software.
(D) AI is a software that can emulate the human mind.
3. Which of the following sentences is FALSE regarding regression?
(A) It relates inputs to outputs.
(B) It is used for prediction.
(C) It may be used for interpretation.
(D) It discovers causal relationships.
4. Choose the correct options: What does a simple linear regression analysis
examine?
a. The relationship between only two variables
b. The relationship between one dependent and one independent variable
c. The relationship between many variables
d. The relationship between two dependent and one independent variable
5. Which of the following is not an assumption for simple linear regression?
a. Normally distributed variables
b. Multicollinearity
c. Linear relationship
d. Constant variance
e. Normally distributed residuals
6. The simple linear regression equation can be written as ŷ = b0 + b1x. The
symbol ŷ represents the
A. average or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
7. The simple linear regression equation can be written as ŷ = b0 + b1x. The
term b0 represents the
A. estimated or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
8. In the simple linear regression equation written as ŷ = b0 + b1x, the term b1
represents the
A. estimated or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
9. In the simple linear regression equation written as ŷ = b0 + b1x, the symbol
x represents the
A. estimated or predicted response
B. estimated intercept
C. estimated slope
D. explanatory variable
10. A regression between foot length (response variable in cm) and height
(explanatory variable in inches) for 33 students resulted in the following
regression equation: ŷ = 10.9 + 0.23x. One student in the sample was 73
inches tall with a foot length of 29 cm.
What is the predicted foot length for this student?
A. 17.57 cm
B. 27.69 cm
C. 29 cm
D. 33 cm
11. A regression between foot length (response variable in cm) and height
(explanatory variable in inches) for 33 students resulted in the following
regression equation: ŷ = 10.9 + 0.23x. One student in the sample was 73
inches tall with a foot length of 29 cm. What is the residual for this student?
A. 29 cm
B. 1.31 cm
C. 0.00 cm
D. -1.31 cm
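For questions 10 and 11, the answer follows from plugging the student's height into the fitted line; a minimal Python check (the variable names are mine, not from the question bank):

b0, b1 = 10.9, 0.23                   # fitted intercept and slope
height = 73                           # explanatory variable (inches)
foot_length = 29                      # observed response (cm)

predicted = b0 + b1 * height          # 10.9 + 0.23 * 73 = 27.69 cm (question 10)
residual = foot_length - predicted    # 29 - 27.69 = 1.31 cm (question 11)
print(predicted, residual)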
12. What is true about Machine Learning?
A. Machine Learning (ML) is a field of computer science
B. ML is a type of artificial intelligence that extracts patterns out of raw data
by using an algorithm or method.
C. The main focus of ML is to allow computer systems to learn from experience
without being explicitly programmed or requiring human intervention.
D. All of the above
13. ML is a field of AI consisting of learning algorithms that:
A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above
14. Which of the following is NOT supervised learning?
a) PCA
b) Decision Tree
c) Linear Regression
d) Naive Bayesian
15. Imagine you have 1000 input features and 1 target feature in a machine
learning problem. You have to select the 100 most important features based on
the relationship between the input features and the target feature.
Do you think this is an example of dimensionality reduction?
A. Yes
B. No
16. Dimensionality reduction algorithms are one of the possible ways to reduce
the computation time required to build a model.
A. TRUE
B. FALSE
17. Reducing the dimension of data will take less time to train a model.
A. TRUE
B. FALSE
18. PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
19. What is the purpose of performing cross-validation?
a. To assess the predictive performance of the models
b. To judge how the trained model performs outside the sample on test data
c. Both A and B
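As a practical illustration of question 19, here is a hedged scikit-learn sketch (the dataset and model are placeholders, not part of the question bank):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is held out once as unseen data,
# so the scores estimate predictive performance outside the training sample.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())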
20. Which of the following are feature selection techniques?
a. Filter
b. Wrapper
c. Embedded
d. PCA
21. __________ refers to a group of techniques for fitting and studying the
straight-line relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
ML Unit II
1. What does the least squares method do exactly?
a. Minimizes the distance between the data points
b. Finds the least problematic regression line
c. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the residual sum of squares
d. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the sum of residuals
2. Fit the straight line to the following data.
x: 1 2 3 4 5
y: 1 2 3 4 5
a) y=x
b) y=x+1
c) y=2x
d) y=2x+1
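A minimal least-squares fit for question 2, assuming NumPy is available (np.polyfit minimizes the residual sum of squares):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight-line fit
print(slope, intercept)  # ~1.0 and ~0.0, i.e. the fitted line is y = x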
3. The normal equations for a straight line y = ax + b are:
a) Σy = aΣx + nb and Σxy = aΣx² + bΣx
b) Σxy = aΣx + nb and Σy = aΣx² + bΣx
c) Σy = aΣx + nb and Σxy = aΣx² + bΣxy
d) Σy = aΣx + nb and Σx²y = aΣx² + bΣx
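For reference, the normal equations in option (a) come from minimizing the residual sum of squares S(a, b) = Σ(y − ax − b)²: setting ∂S/∂b = −2Σ(y − ax − b) = 0 gives Σy = aΣx + nb, and setting ∂S/∂a = −2Σx(y − ax − b) = 0 gives Σxy = aΣx² + bΣx.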
4. Fit the straight line curve to the following data.
x: 75 80 93 65 87 71 98 68 84 77
y: 82 78 86 72 91 80 95 72 89 74
a. y = 0.9288x + 7.78155
b. y = 7.78155x + 0.9288
c. y = 0.8288x + 6.78155
d. y = 6.78155x + 0.8288
5. Least Squares Estimation minimizes:
a. summation of squares of errors
b. summation of errors
c. summation of absolute values of errors
d. All
6. Parameter Estimation problem is about:
a. Identifying Input Parameters
b. Identifying Output Parameters
c. Identifying Model Parameters
d. All
7. ________ is a simple approach to supervised learning. It assumes that the
dependence of Y on X1, X2, ..., Xp is linear. (Unit I)
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
8. When there is more than one independent variable in the model, then the
linear model is termed as _______
a) Unimodal
b) Multiple model
c) Multiple Linear model
d) Multiple Logistic model
9. The parameter β0 is termed as intercept term and the parameter β1 is
termed as slope parameter. These parameters are usually called as _________
a) Regressionists
b) Coefficients
c) Regressive
d) Regression coefficients
10. The following is true of the k-NN algorithm:
1. It has slow training phase
2. It has a fast classification phase
3. Makes no assumptions about the data distribution
4. Produces a predictive model
11. The trade-off between overfitting and underfitting the training data is called
1. The bias-variance tradeoff
2. The residual sum of squares
3. The tradeoff curve
4. The null deviance
12. The following is not true of the naïve Bayes classifier:
1. It is easy to obtain the estimated probability for a prediction
2. It deals well with noisy and missing data
3. It is simple, fast and effective
4. It is ideal for data sets with a large number of numeric variables
13. Suppose 40% of spam messages contain the word “free”. 10% of messages
are spam. 10% of messages contain the word “free”. The probability that a
message is spam, given that it contains the word “free”, is
1. 0.5
2. 0.1
3. 0.2
4. 0.4
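Question 13 is a direct application of Bayes' theorem; a minimal Python check (variable names are mine):

p_free_given_spam = 0.40   # P(word "free" | spam)
p_spam = 0.10              # P(spam)
p_free = 0.10              # P(word "free")

# Bayes' theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)   # 0.4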
14. The following is one of the strengths of support vector machines (SVM):
1. Not very prone to overfitting
2. Not very prone to underfitting
3. Is generally quick to train
4. Easy to interpret
15. If we increase the k value in k-nearest neighbor, the model will _____ the
bias and ______ the variance.
a) Decrease, Decrease
b) Increase, Decrease
c) Decrease, Increase
d) Increase, Increase
16. As the number of training examples goes to infinity, your model trained on
that data will have:
a) Lower variance
b) Higher variance
c) Same variance
d) None of the above
17. Suppose we like to calculate P(H|E, F) and we have no conditional
independence information. Which of the following sets of numbers are
sufficient for the calculation?
a) P(E, F), P(H), P(E|H), P(F|H)
b) P(E, F), P(H), P(E, F|H)
c) P(H), P(E|H), P(F|H)
d) P(E, F), P(E|H), P(F|H)
18. Which of the following is/are true regarding an SVM?
a) For two dimensional data points, the separating hyperplane learnt by a linear
SVM will be a straight line.
b) In theory, a Gaussian kernel SVM cannot model any complex separating
hyperplane.
c) For every kernel function used in a SVM, one can obtain an equivalent closed
form basis expansion.
d) Overfitting in an SVM is not a function of number of support vectors.
19. Which of the following best describes what discriminative approaches try to
model? (w are the parameters in the model)
a) p(y|x, w)
b) p(y, x)
c) p(w|x, w)
d) None of the above
20. Which of the following distance metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
21. Which of the following option is true about k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
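Questions 20 and 21 are about k-NN in practice; a hedged scikit-learn sketch (the datasets, k, and metrics are placeholders) showing that the same algorithm is available for both classification and regression and accepts different distance metrics:

from sklearn.datasets import make_classification, make_regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

Xc, yc = make_classification(n_samples=100, n_features=5, random_state=0)
Xr, yr = make_regression(n_samples=100, n_features=5, random_state=0)

# Increasing n_neighbors (k) smooths the prediction: higher bias, lower variance.
clf = KNeighborsClassifier(n_neighbors=5, metric="manhattan").fit(Xc, yc)
reg = KNeighborsRegressor(n_neighbors=5, metric="minkowski").fit(Xr, yr)
print(clf.predict(Xc[:3]), reg.predict(Xr[:3]))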
22. The minimum time complexity for training an SVM is O(n²). According to this
fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
23. The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
24. Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE
25. SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
26. Suppose you are using an RBF kernel in SVM with a high gamma value. What
does this signify?
A) The model would consider even far away points from hyperplane for
modeling
B) The model would consider only the points close to the hyperplane for
modeling
C) The model would not be affected by distance of points from hyperplane for
modeling
D) None of the above
27. The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
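Questions 23, 26 and 27 refer to hyperparameters exposed by typical SVM implementations; a hedged scikit-learn sketch (the dataset and parameter values are placeholders):

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# C trades off misclassification of training points against a simpler
# (larger-margin) decision surface; a high gamma makes the RBF kernel
# consider only points close to the hyperplane when modelling.
model = SVC(kernel="rbf", C=1.0, gamma=0.5)
model.fit(X, y)
print(model.support_vectors_.shape)   # support vectors: the points closest to the boundary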
28. If I am using all features of my dataset and I achieve 100% accuracy on my
training set, but ~70% on validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
29. Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Question Context: 32 – 34
Suppose you have trained an SVM with a linear decision boundary. After training
the SVM, you correctly infer that your SVM model is underfitting.
32. Which of the following options would you be more likely to consider when
iterating the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
33. Suppose you gave the correct answer in the previous question. What do you
think is actually happening?
1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance
A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
34. In the above question, suppose you want to change one of the SVM's
hyperparameters so that the effect would be the same as in the previous
questions, i.e. the model will not underfit. What would you do?
A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these
35. We usually use feature normalization before using the Gaussian kernel in
SVM. What is true about feature normalization?
1. We do feature normalization so that the new feature will dominate the others
2. Sometimes, feature normalization is not feasible in case of categorical
variables
3. Feature normalization always helps when we use Gaussian kernel in SVM
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Question context: 39 – 40
Suppose you are using an SVM with a linear kernel of polynomial degree 2. Now
suppose you have applied this to the data and found that it perfectly fits the
data, that is, the training and testing accuracy is 100%.
39) Now think that you increase the complexity (or degree of the polynomial of
this kernel). What do you think will happen?
A) Increasing the complexity will overfit the data
B) Increasing the complexity will underfit the data
C) Nothing will happen since your model was already 100% accurate
D) None of these
40) In the previous question, after increasing the complexity you found that the
training accuracy was still 100%. According to you, what is the reason behind
that?
1. Since the data is fixed and we are fitting more polynomial terms or
parameters, the algorithm starts memorizing everything in the data
2. Since the data is fixed and SVM doesn't need to search in a big hypothesis space
A) 1
B) 2
C) 1 and 2
D) None of these
41) What is/are true about the kernel in SVM?
1. The kernel function maps low-dimensional data to a high-dimensional space
2. It's a similarity function
A) 1
B) 2
C) 1 and 2
42. Which of the following might be valid reasons for preferring an SVM over a
neural network?
(a) An SVM can automatically learn to apply a non-linear transformation on the
input space; a neural net cannot.
(b) An SVM can effectively map the data to an infinite-dimensional space; a
neural net cannot.
(c) An SVM should not get stuck in local minima, unlike a neural net.
(d) The transformed (basis function) representation constructed by an SVM is
usually easier to visualise/interpret than for a neural net.
43. Support vector machine (SVM) is a _________ classifier.
a. Discriminative
b. Generative
44) Which of the following distance metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
45) Which of the following option is true about k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
46. SVM can be used to solve ___________ problems.
a. Classification
b. Regression
c. Clustering
d. Both Classification and Regression
47. SVM is a ___________ learning algorithm
a. Supervised
b. Unsupervised
48. SVM is termed as a ________ classifier
a. Minimum margin
b. Maximum margin
49. The training examples closest to the separating hyperplane are called as
_______
a. Training vectors
b. Test vectors
c. Support vectors
50. What does the least squares method do exactly?
a. Minimizes the distance between the data points
b. Finds the least problematic regression line
c. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the residual sum of squares
d. Finds those (best) values of the intercept and slope that provide us with the
smallest value of the sum of residuals
51. Which of the following is a disadvantage of decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
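Question 51 (and question 28 in this unit) points at the classic overfitting failure mode; a hedged scikit-learn sketch (data and depth values are placeholders) contrasting an unconstrained tree with a depth-limited one:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)           # grown until pure: prone to overfit
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The unpruned tree usually scores near 100% on training data but noticeably worse on held-out data.
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))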
ML Unit III
1. Principal component analysis (PCA) can be used with variables of any
mathematical type: quantitative, qualitative, or a mixture of these types.
True/False
2. Principal component analysis (PCA) requires quantitative multivariate
data.
True/False
3. The sum of the PCA eigenvalues is equal to the sum of the variances of
the variables.
True/False
4. The variables subjected to PCA must all have the same physical dimensions.
True/False
5. The most meaningful and interpretable principal components are those
that have the largest eigenvalues.
True/False
6. PCA is a technique for
a. Variance normalisation
b. Dimensionality reduction
c. Feature Extraction
d. Data augmentation
7. Which of the following is/are true about PCA?
a. PCA is an unsupervised method.
b. It searches for the directions that have the largest variance
c. Maximum number of principal components <= number of features
d. All principal components are orthogonal to each other
8. Which of the following is true about MDA?
a. It aims to minimize both distance between class and distance within class.
b. It aims to minimize the distance between class and maximize the distance
within class.
c. It aims to maximize the distance between class and minimize the distance
within class.
d. It aims to maximize both distance between class and distance within class.
9. What happens when you get features in lower dimensions using PCA?
a. The features must carry all information present in data
b. The features will lose interpretability
c. The features will still have interpretability
d. The features may not carry all information present in data.
10. PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
11. Which of the following is true about LDA?
A. LDA aims to maximize the distance between class and minimize the within class
distance
B. LDA aims to minimize both distance between class and distance within class
C. LDA aims to minimize the distance between class and maximize the distance
within class
D. LDA aims to maximize both distance between class and distance within class
12. In which of the following cases will LDA fail?
A. If the discriminatory information is not in the mean but in the variance of the data
B. If the discriminatory information is in the mean but not in the variance of the data
C. If the discriminatory information is in the mean and variance of the data
D. None of these
13. Which of the following comparison(s) are true about PCA and LDA?
1. Both LDA and PCA are linear transformation techniques
2. LDA is supervised whereas PCA is unsupervised
3. PCA maximizes the variance of the data, whereas LDA maximizes the
separation between different classes.
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. Only 3
E. 1, 2 and 3
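Questions 7, 10 and 13 describe how PCA and LDA behave in practice; a hedged scikit-learn sketch (the dataset is a placeholder) showing that LDA uses the class labels while PCA does not:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 4 features, 3 classes

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)        # unsupervised: y is ignored; directions of maximum variance
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised: maximizes class separation

print(X_pca.shape, X_lda.shape)          # both project the data to 2 dimensions
print(pca.explained_variance_ratio_)     # variance captured by each principal component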
14. What will happen when eigenvalues are roughly equal?
A. PCA will perform outstandingly
B. PCA will perform badly
C. Can’t Say
D. None of the above
15. PCA works better if there is?
1. A linear structure in the data
2. If the data lies on a curved surface and not on a flat surface
3. If variables are scaled in the same unit
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1 ,2 and 3
16. Which of the following option(s) is / are true?
1. You need to initialize parameters in PCA
2. You don't need to initialize parameters in PCA
3. PCA can be trapped into local minima problem
4. PCA can't be trapped into local minima problem
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
17. Which of the following option is true?
A. LDA explicitly attempts to model the difference between the classes of data. PCA
on the other hand does not take into account any difference in class.
B. Both attempt to model the difference between the classes of data.
C. PCA explicitly attempts to model the difference between the classes of data. LDA
on the other hand does not take into account any difference in class.
D. Both don’t attempt to model the difference between the classes of data.
18. The nature of the decision boundary is determined by
a. Decision Rule
b. Decision boundary
c. Discriminant function
d. None of the above
19. In Supervised learning, class labels of the training samples are
a. Known
b. Unknown
c. Doesn’t matter
d. Partially known
20. PCA is used for
a. Dimensionality Enhancement
b. Dimensionality Reduction
c. Both
d. None
21. PCA is used for
a. Supervised Classification
b. Unsupervised Classification
c. Semi-supervised Classification
d. Cannot be used for classification
22. The largest eigenvector gives the direction of the
a. Maximum scatter of the data
b. Minimum scatter of the data
c. No such information can be interpreted
d. Second largest eigenvector, which is in the same direction.
23. Which of the following is an unsupervised technique?
a. PCA
b. LDA
c. Bayes
d. None of the above
24. Linear Discriminant Analysis is
a. Unsupervised Learning
b. Supervised Learning
c. Semi-supervised Learning
d. None of the above
25. The following property of a within-class scatter matrix is a must for LDA:
a. Singular
b. Non-singular
c. Does not matter
d. Problem-specific
26. In Supervised learning, class labels of the training samples are
a. Known
b. Unknown
c. Doesn’t matter
d. Partially known
27. A method to estimate the parameters of a distribution is
a. Maximum Likelihood
b. Linear Programming
c. Dynamic Programming
d. Convex Optimization
28. Gaussian mixtures are also known as
a. Gaussian multiplication
b. Non-linear superposition of Gaussians
c. Linear superposition of Gaussians
d. None of the above
29. For Gaussian mixture models, parameters are estimated using a closed-form solution by
a. Expectation Minimization
b. Expectation Maximization
c. Maximum Likelihood
d. None of the above
30. Latent Variable in GMM is also known as:
a. Prior Probability
b. Posterior Probability
c. Responsibility
d. None of the above
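Questions 28-30 map onto what a typical GMM implementation exposes; a hedged scikit-learn sketch (the toy data is a placeholder) showing the EM fit and the per-sample responsibilities (the posterior over the latent component):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 1-D data drawn from two Gaussians (a linear superposition of Gaussians).
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0)   # fitted by Expectation-Maximization
gmm.fit(X)

responsibilities = gmm.predict_proba(X)   # posterior P(component | x), a.k.a. the responsibility
print(gmm.means_.ravel())                 # estimated component means, roughly -3 and 3
print(responsibilities[:3])               # each row sums to 1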
31. Which of the following statements are true regarding dimensionality
reduction?
A. The high dimensional data can always be reduced losslessly
B. Dimensionality reduction is always a non-linear transformation
C. Principal Component Analysis is a technique used to reduce dimensions
D. B and C
E. All of the above are true
32. The most popularly used dimensionality reduction algorithm is Principal
Component Analysis (PCA). Which of the following is/are true about PCA?
1. PCA is an unsupervised method
2. It searches for the directions in which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
E. 1,2 and 4
F. All of the above
33. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
c. Both A and B
d. None of the above options