R4RIN.com – MCQs, Mock Tests | Machine Learning (ML) MCQ Set 05

1. Which of the following parameters can be tuned to find a good ensemble model in bagging-based algorithms? 1. Max number of samples 2. Max features 3. Bootstrapping of samples 4. Bootstrapping of features
1 and 3
2 and 3
1 and 2
all of the above
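
To make the four knobs concrete, the sketch below draws the rows and columns that a single bagged base model would see. The parameter names echo scikit-learn's `BaggingClassifier` (`max_samples`, `max_features`, `bootstrap`, `bootstrap_features`), but `draw_bag` is a hypothetical helper written for illustration, and it takes absolute counts rather than fractions.

```python
import random


def draw_bag(n_samples, n_features, max_samples, max_features,
             bootstrap=True, bootstrap_features=False, seed=None):
    """Pick the row and column indices one bagged base model would train on.

    Without replacement, max_samples / max_features must not exceed the totals.
    """
    rng = random.Random(seed)

    def pick(population, k, with_replacement):
        if with_replacement:
            # Sampling with replacement: duplicate indices are allowed.
            return [rng.randrange(population) for _ in range(k)]
        # Sampling without replacement: every index appears at most once.
        return rng.sample(range(population), k)

    rows = pick(n_samples, max_samples, bootstrap)
    cols = pick(n_features, max_features, bootstrap_features)
    return rows, cols


# Ten bags: 80 of 100 rows (bootstrapped) and 3 of 5 features (no feature bootstrap).
bags = [draw_bag(100, 5, 80, 3, seed=i) for i in range(10)]
```

Because every bag is drawn independently, the base models can be trained in parallel; tuning these four values trades diversity between bags against the accuracy of each individual model.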

2. How is model capacity affected by the dropout rate (where model capacity means the ability of a neural network to approximate complex functions)?
model capacity increases with an increase in dropout rate
model capacity decreases with an increase in dropout rate
model capacity is not affected by an increase in dropout rate
none of these
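
A minimal sketch of inverted dropout in plain Python, assuming a single layer's activations are given as a list (the `dropout` function is illustrative, not a framework API). During training each unit is kept with probability 1 - rate and the survivors are rescaled, so a higher rate leaves fewer units active in each forward pass.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
units = [1.0] * 1000
for rate in (0.1, 0.5, 0.9):
    active = sum(1 for v in dropout(units, rate, rng) if v != 0.0)
    print(f"rate={rate}: {active} of {len(units)} units active")
```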

3. Suppose you want to apply a stepwise forward selection method for choosing the best models for an ensemble model. Which of the following is the correct order of the steps? Note: You have more than 1000 model predictions. 1. Add the models' predictions (in other words, take the average) one by one to the ensemble if they improve the metric on the validation set. 2. Start with an empty ensemble. 3. Return the ensemble from the nested set of ensembles that has the maximum performance on the validation set.
1-2-3
1-3-4
2-1-3
none of the above
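
The three steps can be sketched as greedy forward selection over a validation set. This assumes each model's validation predictions are stored as a list of numbers and uses mean squared error as the metric; `forward_select` and its helpers are hypothetical names. At each round it adds whichever remaining model gives the best averaged score, then returns the best ensemble from the nested sequence.

```python
def mse(preds, targets):
    """Mean squared error on the validation set (lower is better)."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def average(pred_lists):
    """Element-wise average of several models' prediction lists."""
    return [sum(col) / len(pred_lists) for col in zip(*pred_lists)]


def forward_select(model_preds, y_val):
    """Greedy forward ensemble selection; returns (member indices, validation MSE)."""
    ensemble = []                          # start with an empty ensemble
    nested = []                            # every intermediate ensemble and its score
    remaining = set(range(len(model_preds)))
    while remaining:
        # Try each remaining model and keep the one giving the best averaged score.
        best_i, best_score = None, float("inf")
        for i in sorted(remaining):
            score = mse(average([model_preds[j] for j in ensemble + [i]]), y_val)
            if score < best_score:
                best_i, best_score = i, score
        ensemble.append(best_i)
        remaining.remove(best_i)
        nested.append((list(ensemble), best_score))
    # Return the nested ensemble with the best validation score.
    return min(nested, key=lambda item: item[1])


y_val = [0.0, 0.0]
preds = [[1.0, -1.0], [-1.0, 1.0], [2.0, 2.0]]
print(forward_select(preds, y_val))  # → ([0, 1], 0.0)
```

With over 1000 candidate models, the number of score evaluations grows quadratically with the number of models; practical variants also often allow the same model to be added more than once, which this sketch does not.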

4. Below are two ensemble models: 1. E1(M1, M2, M3) and 2. E2(M4, M5, M6). Here, each Mx is an individual base model. Which ensemble are you more likely to choose, given the following conditions for E1 and E2? E1: Individual model accuracies are high, but the models are of the same type (in other words, less diverse). E2: Individual model accuracies are high, and the models are of different types (in other words, highly diverse).
E1
E2
either E1 or E2
none of these

5. Which of the following is true about bagging? 1. Bagging can be parallelized 2. The aim of bagging is to reduce bias, not variance 3. Bagging helps in reducing overfitting
1 and 2
2 and 3
1 and 3
all of these

6. Suppose you are using stacking with n different machine learning algorithms with k folds on the data. Which of the following is true about one-level (m base models + 1 stacker) stacking? Note: We are working on a binary classification problem. All base models are trained on all features. You are using k folds for the base models.
you will have only k features after the first stage
you will have only m features after the first stage
you will have k+m features after the first stage
you will have k*n features after the first stage

7. Which of the following is the difference between stacking and blending?
stacking has less stable CV compared to blending
in blending, you create out-of-fold predictions
stacking is simpler than blending
none of these

8. Which of the following can be one of the steps in stacking? 1. Divide the training data into k folds 2. Train k models, each on k-1 folds, and get the out-of-fold predictions for the remaining fold 3. Divide the test data set into k folds and get individual fold predictions from different algorithms
1 and 2
2 and 3
1 and 3
all of the above
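
Steps 1 and 2 (k folds, k models, out-of-fold predictions) can be sketched in stdlib Python. The base learners `fit_mean` and `fit_1nn` are hypothetical toy stand-ins for real models, the folds are contiguous with no shuffling, and a numeric target is assumed. Each base learner contributes one out-of-fold column, so m base models produce m meta-features for the stacker.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (no shuffling, for simplicity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(X, y, k, fit, predict):
    """Train k models, each on k-1 folds, and predict only the fold it never saw."""
    oof = [None] * len(X)
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in fold:
            oof[i] = predict(model, X[i])
    return oof


def stack_features(X, y, k, base_learners):
    """Build the stacker's input: one out-of-fold column per (fit, predict) pair."""
    columns = [out_of_fold_predictions(X, y, k, fit, predict)
               for fit, predict in base_learners]
    return [list(row) for row in zip(*columns)]


# Hypothetical toy base learners standing in for real models.
def fit_mean(X, y):
    return sum(y) / len(y)              # "model" = mean training target


def predict_mean(model, x):
    return model


def fit_1nn(X, y):
    return list(zip(X, y))              # "model" = stored training points


def predict_1nn(model, x):
    # Target of the nearest stored point (squared Euclidean distance).
    return min(model, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))[1]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]
meta = stack_features(X, y, 2, [(fit_mean, predict_mean), (fit_1nn, predict_1nn)])
print(meta)  # → [[2.5, 2.0], [2.5, 2.0], [0.5, 1.0], [0.5, 1.0]]
```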

9. Which of the following are advantages of stacking? 1. More robust model 2. Better prediction 3. Lower execution time
1 and 2
2 and 3
1 and 3
all of the above

10. Which of the following are correct statement(s) about stacking? 1. A machine learning model is trained on predictions of multiple machine learning models 2. A logistic regression will definitely work better in the second stage compared to other classification methods 3. First-stage models are trained on the full or a partial feature space of the training data
1 and 2
2 and 3
1 and 3
all of the above

11. Which of the following is true about weighted majority votes? 1. We want to give higher weights to better-performing models 2. Inferior models can overrule the best model if the collective weighted votes for the inferior models are higher than the vote of the best model 3. Voting is a special case of weighted voting
1 and 3
2 and 3
1 and 2
1, 2 and 3
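
A weighted majority vote can be sketched as summing each model's weight behind the label it predicts. `weighted_vote` is a hypothetical helper, and the weights are made-up numbers chosen to illustrate statements 2 and 3.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight (ties go to the first label seen)."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


votes = ["A", "B", "B"]  # the first model is the strongest one
print(weighted_vote(votes, [0.4, 0.3, 0.3]))  # → B  (0.6 for B overrules 0.4 for A)
print(weighted_vote(votes, [0.7, 0.2, 0.2]))  # → A  (0.7 for A beats 0.4 for B)
print(weighted_vote(votes, [1, 1, 1]))        # → B  (equal weights = plain majority vote)
```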

12. Which of the following is true about averaging ensemble?
it can only be used in classification problems
it can only be used in regression problems
it can be used in both classification as well as regression
none of these

13. How can we assign weights to the outputs of different models in an ensemble? 1. Use an algorithm to return the optimal weights 2. Choose the weights using cross-validation 3. Give higher weights to more accurate models
1 and 2
1 and 3
2 and 3
all of the above

14. Suppose you are given n predictions on test data by n different models (M1, M2, …, Mn), respectively. Which of the following methods can be used to combine the predictions of these models? Note: We are working on a regression problem. 1. Median 2. Product 3. Average 4. Weighted sum 5. Minimum and maximum 6. Generalized mean rule
1, 3 and 4
1, 3 and 6
1, 3, 4 and 6
all of the above
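
Several of the listed combiners can be sketched on a single test point. The three prediction values and the weights are made-up numbers, and `generalized_mean` and `weighted_sum` are hypothetical helpers; the median and average come from Python's `statistics` module.

```python
import statistics


def generalized_mean(values, p):
    """Power (generalized) mean: p=1 is the arithmetic mean, p=2 the quadratic mean.

    Assumes positive values and p != 0 so the power is well defined.
    """
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def weighted_sum(values, weights):
    """Weighted combination; the weights are assumed to sum to 1."""
    return sum(w * v for w, v in zip(weights, values))


# Predictions of three hypothetical models for one test point.
preds = [2.0, 3.0, 10.0]
print(statistics.median(preds))              # → 3.0
print(statistics.mean(preds))                # → 5.0
print(weighted_sum(preds, [0.5, 0.3, 0.2]))
print(generalized_mean(preds, 1))            # → 5.0
```

The median (3.0) is much less affected by the outlying prediction 10.0 than the average (5.0).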

15. In an election, N candidates are competing against each other and each person votes for one of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure described above? Hint: Voters are like the base models of an ensemble method.
bagging
boosting
a or b
None of these

16. Which of the following is NOT supervised learning?
PCA
decision tree
linear regression
naive Bayes

17. How can you avoid overfitting?
by using a lot of data
by using inductive machine learning
by using validation only
None of the above

18. What are the popular algorithms of Machine Learning?
decision trees and neural networks (back propagation)
probabilistic networks and nearest neighbor
support vector machines
all of the above

19. What is a training set?
a training set is used to test the accuracy of the hypotheses generated by the learner.
a set of data used to discover the potentially predictive relationship.
both a & b
none of the above

20. Common deep learning applications include
image classification, real-time visual tracking
autonomous car driving, logistic optimization
bioinformatics, speech recognition
All of the above

21. What is the function of supervised learning?
classification, time series prediction, string annotation
speech recognition, regression
both a & b
None of the above

22. Common unsupervised applications include
object segmentation
similarity detection
automatic labeling
All of the above

23. Reinforcement learning is particularly efficient when
the environment is not completely deterministic
it's often very dynamic
it's impossible to have a precise error measure
All of the above

24. If there is only a discrete number of possible outcomes (called categories), the process becomes a
regression
classification
model-free
categories

25. Which of the following are supervised learning applications?
spam detection, pattern detection, natural language processing
image classification, real-time visual tracking
autonomous car driving, logistic optimization
bioinformatics, speech recognition

26. During the last few years, many _____ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.
logical
classical
classification
None of the above

27. Which of the following sentences is correct?
machine learning relates with the study, design and
data mining can be defined as the process in which the
both a & b
none of the above

28. What is Overfitting in Machine learning?
overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship.
robots are programmed so that they can perform the task based on data they gather from sensors.
while involving the process of learning, overfitting occurs.
a set of data is used to discover the potentially predictive relationship

29. What is a test set?
a test set is used to test the accuracy of the hypotheses generated by the learner.
it is a set of data used to discover the potentially predictive relationship.
both a & b
none of the above

30. _____ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their values.
removing the whole line
creating sub-models to predict those features
using an automatic strategy to impute them according to the other known values
all of the above

31. It's possible to use a different placeholder through the parameter _____.
regression
classification
random_state
missing_values
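
Two of the strategies from question 30, dropping incomplete rows and filling gaps from the other known values, can be sketched with a configurable placeholder, in the spirit of the `missing_values` parameter of scikit-learn's `SimpleImputer`. The helpers here are hypothetical and assume a placeholder that compares equal to itself (such as `None` or `-1`); a NaN placeholder would need `math.isnan` instead.

```python
def impute_column_means(rows, missing_values=None):
    """Replace every `missing_values` placeholder with the mean of its column's known values.

    Assumes each column has at least one known value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        known = [row[j] for row in rows if row[j] != missing_values]
        means.append(sum(known) / len(known))
    return [[means[j] if row[j] == missing_values else row[j] for j in range(n_cols)]
            for row in rows]


def drop_incomplete_rows(rows, missing_values=None):
    """The simplest alternative: remove every row that contains a placeholder."""
    return [row for row in rows if missing_values not in row]


data = [[1.0, -1], [3.0, 4.0], [-1, 8.0]]     # -1 marks a missing value
print(impute_column_means(data, missing_values=-1))  # → [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
print(drop_incomplete_rows(data, missing_values=-1))  # → [[3.0, 4.0]]
```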

32. If you need a more powerful scaling feature, with superior control over outliers and the possibility to select a quantile range, there's also the class _____.
RobustScaler
DictVectorizer
LabelBinarizer
FeatureHasher
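
The idea behind a robust scaler can be sketched with the standard library: center on the median and divide by the interquartile range, so a single extreme value barely moves the scale. scikit-learn's `RobustScaler` does this per feature with a configurable `quantile_range` (default `(25.0, 75.0)`); `robust_scale` below is a hypothetical single-feature version fixed to that default range (it needs Python 3.8+ for `statistics.quantiles`).

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (25th to 75th percentile).

    Assumes the IQR is non-zero; quartiles use linear interpolation ("inclusive" method).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


print(robust_scale([1, 2, 3, 4, 100]))  # → [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier 100 ends up far from the rest, but it does not compress the other values the way min-max scaling would.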

33. scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply _____ to each element of a dataset.
max, l0 and l1 norms
max, l1 and l2 norms
max, l2 and l3 norms
max, l3 and l4 norms
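
Per-sample normalization can be sketched as dividing each row by one of three norms. `normalize_row` is a hypothetical helper that mirrors the idea behind scikit-learn's `Normalizer(norm=...)`, which accepts `'l1'`, `'l2'` and `'max'`; here the max norm uses the largest absolute value.

```python
import math


def normalize_row(row, norm="l2"):
    """Scale one sample so its chosen norm equals 1; all-zero rows are returned unchanged."""
    if norm == "l1":
        scale = sum(abs(v) for v in row)
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in row))
    elif norm == "max":
        scale = max(abs(v) for v in row)
    else:
        raise ValueError("norm must be 'l1', 'l2' or 'max'")
    if scale == 0:
        return list(row)
    return [v / scale for v in row]


for norm in ("max", "l1", "l2"):
    print(norm, normalize_row([3.0, -4.0], norm))
```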

34. There are also many univariate methods that can be used to select the best features according to specific criteria based on _____.
F-tests and p-values
chi-square
ANOVA
all of the above

35. Which of the following selects only a subset of features belonging to a certain percentile?
SelectPercentile
FeatureHasher
SelectKBest
all of the above
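
Percentile-based selection can be sketched by ranking precomputed univariate scores (for example F-test or chi-square statistics) and keeping the top fraction. `select_percentile` is a hypothetical rank-based simplification and the scores are made-up numbers; scikit-learn's `SelectPercentile` thresholds on a percentile of the scores and may handle ties and rounding differently.

```python
def select_percentile(scores, percentile):
    """Return the indices of the highest-scoring `percentile` percent of features.

    Keeps at least one feature; ties are broken by the original column order.
    """
    n_keep = max(1, int(len(scores) * percentile / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


scores = [0.1, 5.0, 2.0, 8.0]           # one univariate score per feature
print(select_percentile(scores, 50))    # → [1, 3]
```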

36. _____ performs a PCA with non-linearly separable data sets.
SparsePCA
KernelPCA
SVD
None of the mentioned

37. A feature F1 can take the values A, B, C, D, E and F, and represents the grades of students from a college. Which of the following statements is true in this case?
feature F1 is an example of a nominal variable.
feature F1 is an example of an ordinal variable.
it doesn't belong to either of the above categories.
both of these

38. What would you do in PCA to get the same projection as SVD?
transform data to zero mean
transform data to zero median
not possible
none of these

39. What are PCA, KPCA and ICA used for?
principal components analysis
kernel based principal component analysis
independent component analysis
all of the above

40. What are common feature selection methods in regression task?
correlation coefficient
greedy algorithms
all of the above
none of these
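
Ranking features by their correlation with the target, the first method listed above, can be sketched as follows. `pearson` and `rank_by_correlation` are hypothetical helpers, the data are made-up numbers, and a constant column is given a score of 0.0 because its correlation is undefined.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_by_correlation(columns, target):
    """Rank feature columns by |correlation| with the target, strongest first."""
    scores = [abs(pearson(col, target)) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order, scores


target = [1.0, 2.0, 3.0, 4.0]
columns = [[4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
           [1.0, 3.0, 2.0, 4.0],   # correlation 0.8
           [5.0, 5.0, 5.0, 5.0]]   # constant, carries no signal
order, scores = rank_by_correlation(columns, target)
print(order)  # → [0, 1, 2]
```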

41. The parameter _____ allows specifying the percentage of elements to put into the test/training set.
test_size
training_size
all of the above
none of these
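
The `test_size` idea can be sketched with a hypothetical `split_train_test` helper: a float is treated as a fraction of the data and an int as an absolute count, similar to how scikit-learn's `train_test_split` interprets `test_size`. Rounding the fractional count up is an assumption of this sketch.

```python
import math
import random


def split_train_test(data, test_size=0.25, seed=None):
    """Shuffle and hold out a test set; returns (train, test).

    A float test_size is a fraction of the data (rounded up here); an int is an absolute count.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    if isinstance(test_size, float):
        n_test = math.ceil(len(items) * test_size)
    else:
        n_test = int(test_size)
    return items[n_test:], items[:n_test]


train, test = split_train_test(range(10), test_size=0.25, seed=42)
print(len(train), len(test))  # → 7 3
```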

42. In many classification problems, the target _____ is made up of categorical labels which cannot immediately be processed by any algorithm.
random_state
dataset
test_size
all of the above

43. _____ adopts a dictionary-oriented approach, associating a progressive integer with each category label.
LabelEncoder class
LabelBinarizer class
DictVectorizer
FeatureHasher
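
A dictionary-oriented label encoding can be sketched in a few lines. The helper names are hypothetical; like scikit-learn's `LabelEncoder`, this sketch sorts the distinct labels before numbering them, so the integer codes do not depend on the order in which labels first appear.

```python
def fit_label_encoder(labels):
    """Map each distinct label to a progressive integer (labels sorted first)."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def transform(encoder, labels):
    return [encoder[label] for label in labels]


def inverse_transform(encoder, codes):
    decoder = {index: label for label, index in encoder.items()}
    return [decoder[code] for code in codes]


labels = ["red", "green", "red", "blue"]
encoder = fit_label_encoder(labels)
print(encoder)                      # → {'blue': 0, 'green': 1, 'red': 2}
print(transform(encoder, labels))   # → [2, 1, 2, 0]
```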

44. If a linear regression model fits perfectly, i.e., the training error is zero, then
test error is also always zero
test error is non-zero
we cannot comment on the test error
test error is equal to train error

45. Which of the following metrics can be used for evaluating regression models? i) R-squared ii) Adjusted R-squared iii) F-statistic iv) RMSE / MSE / MAE
ii and iv
i and ii
ii, iii and iv
i, ii, iii and iv
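
Metrics i, ii and iv can be computed directly from the residuals; the F-statistic is omitted to keep the sketch short. `regression_metrics` is a hypothetical helper, and the adjusted R-squared uses the usual correction with n samples and p predictors, 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """MSE, RMSE, MAE, R-squared and adjusted R-squared for one set of predictions.

    n_features is the number of predictors p; assumes n > p + 1 and a non-constant y_true.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": r2,
        "adjusted_R2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], n_features=1))
```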

46. In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
by 1
no change
by intercept
by its slope

47. The function used for linear regression in R is
lm(formula, data)
lr(formula, data)
lrm(formula, data)
regression.linear(formula, data)

48. In the syntax of the linear model lm(formula, data, ...), data refers to
matrix
vector
array
list

49. In the mathematical equation of linear regression $Y = \beta_1 + \beta_2 X + \varepsilon$, $(\beta_1, \beta_2)$ refers to
(x-intercept, slope)
(slope, x-intercept)
(y-intercept, slope)
(slope, y-intercept)

50. Which of the following methods do we use to find the best fit line for data in Linear Regression?
least square error
maximum likelihood
logarithmic loss
both a and b
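
A least-squares fit for the simple model Y = β1 + β2·X + ε can be sketched with the closed-form estimates β2 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and β1 = ȳ - β2·x̄. The function names and data are hypothetical; the last line also checks that raising X by one unit moves the prediction by the slope.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (beta1 = intercept, beta2 = slope) for Y = beta1 + beta2 * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta2 = sxy / sxx
    beta1 = my - beta2 * mx
    return beta1, beta2


def predict(beta1, beta2, x):
    return beta1 + beta2 * x


b1, b2 = fit_simple_ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b1, b2)                                       # → 1.0 2.0
print(predict(b1, b2, 6.0) - predict(b1, b2, 5.0))  # → 2.0
```

When the errors are assumed to be independent and Gaussian, these least-squares estimates coincide with the maximum likelihood estimates.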