Machine Learning (ML) MCQ Set 04
1. What is the Gini index?
gini index operates on categorical target variables
it is a measure of purity
gini index performs only binary splits
all (1, 2 and 3)
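For reference, the Gini index of a node with class proportions p_i is:

```latex
\mathrm{Gini} = 1 - \sum_i p_i^2
% 0 for a pure node; maximal when the classes are evenly mixed.
```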
2. Tree/rule-based classification algorithms generate a ... rule to perform the classification.
if-then
while
do while
switch
3. A Decision Tree is
a flow-chart
a structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label
both a & b
a class of instances
4. Which of the following is true about Manhattan distance?
it can be used for continuous variables
it can be used for categorical variables
it can be used for categorical as well as continuous
it can be used for constants
5. A company has built a kNN classifier that gets 100% accuracy on training data. When they deployed this model on the client side, it was found that the model is not at all accurate. Which of the following might have gone wrong? Note: The model was deployed successfully and no technical issues were found on the client side except the model performance
it is probably an overfitted model
it is probably an underfitted model
can’t say
wrong client data
6. Which statement is true about the K-Means algorithm? Select one:
the output attribute must be categorical
all attribute values must be categorical
all attributes must be numeric
attribute values may be either categorical or numeric
7. Which of the following can act as possible termination conditions in K-Means? 1. For a fixed number of iterations. 2. Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum. 3. Centroids do not change between successive iterations. 4. Terminate when RSS falls below a threshold.
1, 3 and 4
1, 2 and 3
1, 2 and 4
1, 2, 3 and 4
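To make the four conditions concrete, here is a minimal K-means sketch in Python with NumPy (all function and variable names are ours, not from any particular library), with each termination check marked:

```python
import numpy as np

def kmeans(X, k, max_iter=100, rss_tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Pick k distinct samples as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):                      # 1. fixed number of iterations
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and (new_labels == labels).all():
            break                                  # 2. assignments unchanged
        labels = new_labels
        # Assumes no cluster goes empty (fine for a sketch).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break                                  # 3. centroids unchanged
        centroids = new_centroids
        rss = ((X - centroids[labels]) ** 2).sum()
        if rss < rss_tol:
            break                                  # 4. RSS below a threshold
    return labels, centroids
```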
8. Which of the following statements are true about the k-NN algorithm? 1) k-NN performs much better if all of the data have the same scale 2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large 3) k-NN makes no assumptions about the functional form of the problem being solved
1 and 2
1 and 3
only 1
1, 2 and 3
9. In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
1 and 2
2 and 3
1, 2 and 3
1 and 3
10. This clustering algorithm terminates when the mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration. Select one:
k-means clustering
conceptual clustering
expectation maximization
agglomerative clustering
11. Which one of the following is the main reason for pruning a Decision Tree?
to save computing time during testing
to save space for storing the decision tree
to make the training set error smaller
to avoid overfitting the training set
12. You've just finished training a decision tree for spam classification, and it is getting abnormally bad performance on both your training and test sets. You know that your implementation has no bugs, so what could be causing the problem?
your decision trees are too shallow.
you need to increase the learning rate.
you are overfitting.
incorrect data
13. The K-means algorithm:
requires the dimension of the feature space to be no bigger than the number of samples
has the smallest value of the objective function when k = 1
minimizes the within class variance for a given number of clusters
converges to the global optimum if and only if the initial means are chosen as some of the samples themselves
14. Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering? 1. Single-link 2. Complete-link 3. Average-link
1 and 2
1 and 3
2 and 3
1, 2 and 3
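If SciPy is available, the three criteria correspond directly to the `single`, `complete`, and `average` methods of `scipy.cluster.hierarchy.linkage`; a minimal sketch on random data (illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))  # 10 random 2-D points

# All three dissimilarity metrics from the question are supported linkages.
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)  # (n-1) x 4 merge table
    print(method, Z.shape)
```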
15. In which of the following cases will K-Means clustering fail to give good results?
Data points with outliers
Data points with different densities
Data points with round shapes
Data points with non-convex shapes
16. Is hierarchical clustering slower than non-hierarchical clustering?
true
false
depends on data
cannot say
17. High entropy means that the partitions in classification are
pure
not pure
useful
useless
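For reference, the entropy of a partition with class proportions p_i is:

```latex
H = -\sum_i p_i \log_2 p_i
% H = 0 for a pure partition (a single class); H is maximal when all
% classes are equally likely, i.e. for a maximally impure partition.
```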
18. Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce clusters of many different sizes and shapes. Which of the following methods is the most appropriate?
decision trees
density-based clustering
model-based clustering
k-means clustering
19. The main disadvantage of maximum likelihood methods is that they are _____
mathematically less folded
mathematically less complex
computationally intense
20. The maximum likelihood method can be used to explore relationships among more diverse sequences, conditions that are not well handled by maximum parsimony methods.
true
false
21. Which statement is not true?
k-means clustering is a linear clustering algorithm
k-means clustering aims to partition n observations into k clusters
k-nearest neighbor is the same as k-means
k-means is sensitive to outliers
22. Why is feature scaling done before applying the K-Means algorithm?
in distance calculations it gives the same weight to all features
you always get the same clusters whether or not you use feature scaling
in manhattan distance it is an important step but in euclidean it is not
none of these
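As a quick illustration of the first option (a sketch with made-up numbers), the Euclidean distance between two points is dominated by the large-scale feature until the features are standardized:

```python
import numpy as np

# Two features on very different scales: income (in currency units) and age (years).
a = np.array([50_000.0, 25.0])
b = np.array([60_000.0, 65.0])

# Unscaled distance is dominated entirely by income.
print(np.linalg.norm(a - b))            # ~10000.08

# After standardizing each feature, both contribute comparably.
X = np.array([a, b])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.linalg.norm(Xs[0] - Xs[1]))    # ~2.83
```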
23. In Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as
a conditional probability
an a priori probability
a bidirectional probability
a posterior probability
24. The probability that a person owns a sports car given that they subscribe to an automotive magazine is 40%. We also know that 3% of the adult population subscribes to the automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to the automotive magazine is 30%. Use this information to compute the probability that a person subscribes to the automotive magazine given that they own a sports car.
0.0398
0.0389
0.0368
0.0396
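The computation is a direct application of Bayes' theorem; a quick check in Python (variable names are ours):

```python
# Given: P(car | subscribes) = 0.40, P(subscribes) = 0.03, P(car | no subscription) = 0.30
p_car_given_sub = 0.40
p_sub = 0.03
p_car_given_nosub = 0.30

# Total probability of owning a sports car.
p_car = p_car_given_sub * p_sub + p_car_given_nosub * (1 - p_sub)  # 0.303

# Bayes' theorem: P(subscribes | car) = P(car | subscribes) * P(subscribes) / P(car)
p_sub_given_car = p_car_given_sub * p_sub / p_car
print(round(p_sub_given_car, 4))  # 0.0396
```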
25. What is the naïve assumption in a Naïve Bayes Classifier?
all the classes are independent of each other
all the features of a class are independent of each other
the most probable feature for a class is the most important feature to be considered for classification
all the features of a class are conditionally dependent on each other
26. What is the actual number of independent parameters which need to be estimated in a p-dimensional Gaussian distribution model?
p
2p
p(p+1)/2
p(p+3)/2
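The count follows from adding the free parameters of the mean vector and the symmetric covariance matrix:

```latex
\underbrace{p}_{\text{mean } \mu}
+ \underbrace{\tfrac{p(p+1)}{2}}_{\text{symmetric covariance } \Sigma}
= \frac{2p + p(p+1)}{2}
= \frac{p(p+3)}{2}
```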
27. Select the correct answer for the following statements. 1. It is important to perform feature normalization before using the Gaussian kernel. 2. The maximum value of the Gaussian kernel is 1.
1 is true, 2 is false
1 is false, 2 is true
1 is true, 2 is true
1 is false, 2 is false
28. Consider the following dataset, where x, y, z are the features and T is the class (1/0). Classify the test data (0,0,1) as values of x, y, z respectively.
0
1
0.1
0.9
29. Which of the following statements about Naive Bayes is incorrect?
attributes are equally important.
attributes are statistically dependent on one another given the class value.
attributes are statistically independent of one another given the class value.
attributes can be nominal or numeric
30. How can the entries in the full joint probability distribution be calculated?
using variables
using information
both using variables & information
none of the mentioned
31. How many terms are required for building a Bayes model?
1
2
3
4
32. The skewness of the Normal distribution is ___________
negative
positive
0
undefined
33. The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
the attributes are not linearly related.
as the value of one attribute increases the value of the second attribute also increases
as the value of one attribute decreases the value of the second attribute increases
the attributes show a linear relationship
34. 8 observations are clustered into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations: C1: {(2,2), (4,4), (6,6)} C2: {(0,4), (4,0), (2,5)} C3: {(5,5), (9,9)} What will be the cluster centroids if you want to proceed to the second iteration?
c1: (4,4), c2: (2,2), c3: (7,7)
c1: (6,6), c2: (4,4), c3: (9,9)
c1: (2,2), c2: (0,0), c3: (5,5)
c1: (4,4), c2: (3,3), c3: (7,7)
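Centroid updates in K-means are just coordinate-wise means of the points assigned to each cluster; a quick sketch in plain Python with NumPy (the cluster contents are copied from the question):

```python
import numpy as np

# Observations after the first iteration, as given in the question.
clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0), (2, 5)],
    "C3": [(5, 5), (9, 9)],
}

# Each next-iteration centroid is the coordinate-wise mean of its cluster.
for name, points in clusters.items():
    print(name, np.mean(points, axis=0))
```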
35. In the Naive Bayes equation P(C|X) = (P(X|C) * P(C)) / P(X), which part represents the "likelihood"?
p(x|c)
p(c|x)
p(c)
p(x)
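Labelling each factor of the equation gives the standard terminology:

```latex
\underbrace{P(C \mid X)}_{\text{posterior}}
= \frac{\overbrace{P(X \mid C)}^{\text{likelihood}} \cdot \overbrace{P(C)}^{\text{prior}}}
       {\underbrace{P(X)}_{\text{evidence}}}
```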
36. Which of the following options is/are correct regarding the benefits of an ensemble model? 1. Better performance 2. Generalized models 3. Better interpretability
1 and 3
2 and 3
1, 2 and 3
1 and 2
37. What is back propagation?
it is another name given to the curvy function in the perceptron
it is the transmission of error back through the network to adjust the inputs
it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
none of the mentioned
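To make the distinction in the options concrete, here is a hedged one-neuron sketch (our own minimal example): the output error is propagated back to adjust the weights, while the inputs are never changed.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.5, -1.0])       # fixed input: backprop never changes this
y = 1.0                         # target output
w = rng.normal(size=2)          # trainable weights

for _ in range(100):
    y_hat = 1 / (1 + np.exp(-w @ x))        # forward pass (sigmoid neuron)
    error = y_hat - y                       # output error
    grad = error * y_hat * (1 - y_hat) * x  # error sent back through the neuron
    w -= 0.5 * grad                         # weights adjusted, so the network learns

print(float(y_hat))  # approaches 1.0 as training proceeds
```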
38. Which of the following is an application of NN (Neural Network)?
sales forecasting
data validation
risk management
All of the mentioned
39. Neural Networks are complex ______________ with many parameters.
linear functions
nonlinear functions
discrete functions
exponential functions
40. Having multiple perceptrons can actually solve the XOR problem satisfactorily: this is because each perceptron can partition off a linear part of the space itself, and they can then combine their results.
true – this works always, and these multiple perceptrons learn to classify even complex problems
false – perceptrons are mathematically incapable of solving linearly inseparable functions, no matter what you do
true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded
false – just having a single perceptron is enough
41. Which one of the following is not a major strength of the neural network approach?
neural network learning algorithms are guaranteed to converge to an optimal solution
neural networks work well with datasets containing noisy data
neural networks can be used for both supervised learning and unsupervised clustering
neural networks can be used for applications that require a time element to be included in the data
42. The network that involves backward links from output to the input and hidden layers is called
self organizing maps
perceptrons
recurrent neural network
multi-layered perceptron
43. Which of the following parameters can be tuned for finding a good ensemble model in bagging-based algorithms? 1. Max number of samples 2. Max features 3. Bootstrapping of samples 4. Bootstrapping of features
1
2
3 and 4
1, 2, 3 and 4
44. What is back propagation? a) It is another name given to the curvy function in the perceptron b) It is the transmission of error back through the network to adjust the inputs c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn d) None of the mentioned
a
b
c
b&c
45. What is the sequence of the following tasks in a perceptron? 1. Initialize weights of the perceptron randomly 2. Go to the next batch of the dataset 3. If the prediction does not match the output, change the weights 4. For a sample input, compute an output
1, 4, 3, 2
3, 1, 2, 4
4, 3, 2, 1
1, 2, 3, 4
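A minimal perceptron training loop (plain Python with NumPy; the tiny AND dataset is our own illustrative choice) makes the intended order concrete: weights are initialized once, then for each sample an output is computed, weights are changed on a mismatch, and training moves on to the next batch.

```python
import numpy as np

# Tiny illustrative dataset: logical AND, with a bias input appended.
data = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]

rng = np.random.default_rng(0)
w = rng.normal(size=3)                      # 1. initialize weights randomly

for epoch in range(10):                     # 2. (repeatedly) go to the next batch
    for x, y in data:
        x = np.asarray(x, dtype=float)
        y_hat = 1 if w @ x >= 0 else 0      # 4. for a sample input, compute an output
        if y_hat != y:                      # 3. if prediction does not match, change weights
            w += (y - y_hat) * x

print([1 if w @ np.asarray(x, float) >= 0 else 0 for x, _ in data])  # [0, 0, 0, 1]
```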
46. In which neural net architecture does weight sharing occur?
recurrent neural network
convolutional neural network
fully connected neural network
both a and b
47. Which of the following are correct statement(s) about stacking? 1. A machine learning model is trained on predictions of multiple machine learning models 2. A Logistic regression will definitely work better in the second stage as compared to other classification methods 3. First stage models are trained on full / partial feature space of training data
1 and 2
2 and 3
1 and 3
1, 2 and 3
48. The F-test
is an omnibus test
considers the reduction in error when moving from the complete model to the reduced model
considers the reduction in error when moving from the reduced model to the complete model
can only be conceptualized as a reduction in error
49. What is true about an ensemble classifier? 1. Classifiers that are more “sure” can vote with more conviction 2. Classifiers can be more “sure” about a particular part of the space 3. Most of the time, it performs better than a single classifier
1 and 2
1 and 3
2 and 3
all of the above
50. Which of the following options is/are correct regarding the benefits of an ensemble model? 1. Better performance 2. Generalized models 3. Better interpretability
1 and 3
2 and 3
1 and 2
1, 2 and 3
51. Which of the following can be true for selecting base learners for an ensemble? 1. Different learners can come from the same algorithm with different hyperparameters 2. Different learners can come from different algorithms 3. Different learners can come from different training spaces
1
2
1 and 3
1, 2 and 3
52. If you use an ensemble of different base models, is it necessary to tune the hyperparameters of all base models to improve the ensemble performance?
yes
no
can’t say
none of these
53. Generally, an ensemble method works better if the individual base models have ____________? Note: Suppose each individual base model has an accuracy greater than 50%.
less correlation among predictions
high correlation among predictions
correlation does not have any impact on ensemble output
none of the above
54. In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the above-discussed election procedure? Hint: Persons are like base models of an ensemble method.
bagging
boosting
a or b
none of these
55. Suppose there are 25 base classifiers, each with an error rate of e = 0.35. Suppose you are using averaging as the ensemble technique. What is the probability that the ensemble of the above 25 classifiers will make a wrong prediction? Note: All classifiers are independent of each other
0.05
0.06
0.07
0.09
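The number comes from a binomial tail: with independent classifiers and majority averaging, the ensemble is wrong when 13 or more of the 25 classifiers err. A quick check using only the standard library:

```python
from math import comb

n, e = 25, 0.35
# Majority vote of 25 independent classifiers is wrong when >= 13 err.
p_wrong = sum(comb(n, k) * e**k * (1 - e)**(n - k) for k in range(13, n + 1))
print(round(p_wrong, 2))  # 0.06
```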