R4RIN – MCQs, Mock Tests | Machine Learning (ML) MCQ Set 05

Which of the following parameters can be tuned to find a good ensemble model in bagging-based algorithms? 1. Max number of samples 2. Max features 3. Bootstrapping of samples 4. Bootstrapping of features

1. 1 and 3
2. 2 and 3
3. 1 and 2
4. all of above

✅ Correct Answer: 4
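
The four tunable settings in this question can be sketched in plain Python. The helper below, `draw_bag`, is hypothetical and not a library function; its parameter names are assumptions chosen to mirror the four options. It only draws the row and column indices that one bagged base model would be trained on.

```python
import random

def draw_bag(n_samples, n_features, max_samples, max_features,
             bootstrap=True, bootstrap_features=False, seed=0):
    """Pick the row and column indices one bagged base model would train on."""
    rng = random.Random(seed)
    rows = list(range(n_samples))
    cols = list(range(n_features))
    # Bootstrapping means sampling WITH replacement; otherwise sample without.
    if bootstrap:
        row_idx = [rng.choice(rows) for _ in range(max_samples)]
    else:
        row_idx = rng.sample(rows, max_samples)
    if bootstrap_features:
        col_idx = [rng.choice(cols) for _ in range(max_features)]
    else:
        col_idx = rng.sample(cols, max_features)
    return row_idx, col_idx

# 60 bootstrapped rows and 4 distinct feature columns for one base model.
row_idx, col_idx = draw_bag(n_samples=100, n_features=10,
                            max_samples=60, max_features=4)
```

Each base model in the ensemble would get its own draw (for example by varying `seed`), which is why all four settings influence how diverse the models are.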

How is the model capacity affected with dropout rate (where model capacity means the ability of a neural network to approximate complex functions)?

1. model capacity increases as the dropout rate increases
2. model capacity decreases as the dropout rate increases
3. model capacity is not affected by an increase in the dropout rate
4. none of these

✅ Correct Answer: 2

Suppose you want to apply a stepwise forward selection method to choose the best models for an ensemble. Which of the following is the correct order of the steps? Note: You have predictions from more than 1000 models. 1. Add the model predictions to the ensemble one by one (in other words, average them in), keeping each addition that improves the metric on the validation set. 2. Start with an empty ensemble. 3. Return the ensemble from the nested set of ensembles that has the maximum performance on the validation set.

1. 1-2-3
2. 1-3-4
3. 2-1-3
4. none of above

✅ Correct Answer: 3
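
The procedure in this question (start empty, add models greedily, return the best ensemble along the way) can be sketched as follows. This is a minimal illustration under assumptions: the metric is mean squared error, ensembling is a plain average, and the names `forward_select`, `model_preds` and the toy predictions are made up for the example. This variant adds the best remaining candidate in every round and keeps track of the best nested ensemble seen so far.

```python
def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def average(preds):
    """Element-wise mean of several prediction lists."""
    return [sum(col) / len(col) for col in zip(*preds)]

def forward_select(model_preds, y_val):
    """Greedy forward selection of models by validation MSE.

    model_preds maps a model name to its validation-set predictions.
    Returns the best ensemble (list of names) and its validation MSE.
    """
    ensemble = []                         # start with an empty ensemble
    best_ensemble, best_score = [], float("inf")
    remaining = set(model_preds)
    while remaining:
        # Try each remaining model and keep the one giving the lowest error.
        scored = []
        for name in sorted(remaining):
            preds = [model_preds[m] for m in ensemble + [name]]
            scored.append((mse(average(preds), y_val), name))
        score, name = min(scored)
        ensemble.append(name)
        remaining.remove(name)
        if score < best_score:
            best_score, best_ensemble = score, list(ensemble)
    # Return the best of the nested ensembles seen along the way.
    return best_ensemble, best_score

y_val = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "m_high": [1.5, 2.5, 3.5, 4.5],   # biased upward
    "m_low": [0.5, 1.5, 2.5, 3.5],    # biased downward
    "m_bad": [4.0, 3.0, 2.0, 1.0],    # anti-correlated with the target
}
best, score = forward_select(model_preds, y_val)
```

In this toy data the two biased models cancel each other out when averaged, so the selected ensemble excludes the anti-correlated model.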

Below are two ensemble models: 1. E1(M1, M2, M3) and 2. E2(M4, M5, M6), where Mx are the individual base models. Which of the following are you more likely to choose, given the following conditions for E1 and E2? E1: Individual model accuracies are high, but the models are of the same type (in other words, less diverse). E2: Individual model accuracies are high, and the models are of different types (in other words, highly diverse).

1. E1
2. E2
3. any of E1 and E2
4. none of these

✅ Correct Answer: 2

Which of the following is true about bagging? 1. Bagging can be parallel 2. The aim of bagging is to reduce bias not variance 3. Bagging helps in reducing overfitting

1. 1 and 2
2. 2 and 3
3. 1 and 3
4. all of these

✅ Correct Answer: 3
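
A small simulation can illustrate why averaging reduces variance. This is a sketch under a strong assumption: the base models are treated as independent noisy predictors of the same value, whereas real bagged models are correlated because their bootstrap samples overlap, so the reduction in practice is smaller. The functions and constants are made up for the example.

```python
import random
import statistics

rng = random.Random(42)

def noisy_model(true_value=5.0, noise=1.0):
    """Stand-in for the prediction of one high-variance base model."""
    return true_value + rng.gauss(0.0, noise)

def bagged_model(n_models=25):
    """Average of several independently drawn noisy predictions."""
    return statistics.fmean([noisy_model() for _ in range(n_models)])

single = [noisy_model() for _ in range(2000)]
bagged = [bagged_model() for _ in range(2000)]

var_single = statistics.variance(single)
var_bagged = statistics.variance(bagged)
```

Because each call to `noisy_model` does not depend on any other, the base models could also be built in parallel, which is the first statement in the question.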

Suppose you are using stacking with n different machine learning algorithms with k folds on data. Which of the following is true about one level (m base models + 1 stacker) stacking? Notes: We are working on a binary classification problem. All base models are trained on all features. You are using k folds for the base models.

1. you will have only k features after the first stage
2. you will have only m features after the first stage
3. you will have k+m features after the first stage
4. you will have k*n features after the first stage

✅ Correct Answer: 2

Which of the following is the difference between stacking and blending?

1. stacking has less stable cv compared to blending
2. in blending, you create out-of-fold predictions
3. stacking is simpler than blending
4. none of these

✅ Correct Answer: 4

Which of the following can be one of the steps in stacking? 1. Divide the training data into k folds 2. Train k models on each k-1 folds and get the out of fold predictions for remaining one fold 3. Divide the test data set in “k” folds and get individual fold predictions by different algorithms

1. 1 and 2
2. 2 and 3
3. 1 and 3
4. all of above

✅ Correct Answer: 1
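
Steps 1 and 2 can be sketched with two toy base models. Everything here is illustrative: `fit_mean` and `fit_threshold` are hypothetical stand-ins for real learners, and the folds are contiguous rather than shuffled. The result has one column per base model, which is also why the first stage of stacking yields m features.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(X, y):
    """Toy base model 1: always predicts the training mean of y."""
    mean = sum(y) / len(y)
    return lambda row: mean

def fit_threshold(X, y):
    """Toy base model 2: predicts 1.0 if the first feature exceeds its training mean."""
    cut = sum(row[0] for row in X) / len(X)
    return lambda row: 1.0 if row[0] > cut else 0.0

def out_of_fold_features(X, y, fitters, k):
    """Return an n_samples x m matrix of out-of-fold predictions."""
    n = len(X)
    meta = [[None] * len(fitters) for _ in range(n)]
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(X_tr, y_tr)          # trained on the other k-1 folds
            for i in held_out:
                meta[i][j] = model(X[i])     # predicted on the held-out fold
    return meta

X = [[0.1], [0.4], [0.35], [0.8], [0.9], [0.7]]
y = [0, 0, 0, 1, 1, 1]
meta = out_of_fold_features(X, y, [fit_mean, fit_threshold], k=3)
```

The stacker (second-stage model) would then be trained on `meta` against `y`.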

Which of the following are advantages of stacking? 1. More robust model 2. Better prediction 3. Lower time of execution

1. 1 and 2
2. 2 and 3
3. 1 and 3
4. all of the above

✅ Correct Answer: 1

Which of the following are correct statement(s) about stacking? 1. A machine learning model is trained on predictions of multiple machine learning models 2. A logistic regression will definitely work better in the second stage as compared to other classification methods 3. First stage models are trained on full / partial feature space of training data

1. 1 and 2
2. 2 and 3
3. 1 and 3
4. all of above

✅ Correct Answer: 3

Which of the following is true about weighted majority votes? 1. We want to give higher weights to better performing models 2. Inferior models can overrule the best model if the collective weighted vote of the inferior models is higher than that of the best model 3. Voting is a special case of weighted voting

1. 1 and 3
2. 2 and 3
3. 1 and 2
4. 1, 2 and 3

✅ Correct Answer: 4
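
All three statements can be checked with a few lines of Python. The function `weighted_vote` and the example weights are made up for illustration.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight.

    predictions: one predicted label per model.
    weights: one non-negative weight per model; None means equal weights,
    which reduces to plain majority voting.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# The best model (weight 0.4) says "spam", two weaker models say "ham".
overruled = weighted_vote(["spam", "ham", "ham"], [0.4, 0.35, 0.35])
# If the best model's weight exceeds the others combined, it wins.
dominant = weighted_vote(["spam", "ham", "ham"], [0.6, 0.2, 0.2])
# Equal weights: ordinary majority vote.
plain = weighted_vote(["spam", "ham", "ham"])
```

In the first call the two weaker models together outweigh the best one, so the ensemble answers "ham"; in the second the best model's weight is large enough to decide the vote alone.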

Which of the following is true about averaging ensemble?

1. it can only be used in classification problem
2. it can only be used in regression problem
3. it can be used in both classification as well as regression
4. none of these

✅ Correct Answer: 3

How can we assign weights to the outputs of different models in an ensemble? 1. Use an algorithm to return the optimal weights 2. Choose the weights using cross validation 3. Give high weights to more accurate models

1. 1 and 2
2. 1 and 3
3. 2 and 3
4. all of above

✅ Correct Answer: 4

Suppose you are given ‘n’ predictions on test data by ‘n’ different models (M1, M2, ..., Mn) respectively. Which of the following method(s) can be used to combine the predictions of these models? Note: We are working on a regression problem. 1. Median 2. Product 3. Average 4. Weighted sum 5. Minimum and Maximum 6. Generalized mean rule

1. 1, 3 and 4
2. 1,3 and 6
3. 1,3, 4 and 6
4. all of above

✅ Correct Answer: 4
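
Several of the listed combination rules are one-liners in Python. The predictions and weights below are hypothetical values for a single test row, and `generalized_mean` assumes positive inputs.

```python
import statistics

def generalized_mean(values, p):
    """Power mean of positive values; p=1 is the arithmetic mean."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def weighted_sum(values, weights):
    """Weighted combination; the weights are assumed to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical predictions from four models for one test row.
preds = [10.0, 12.0, 11.0, 15.0]

combined = {
    "median": statistics.median(preds),
    "average": statistics.fmean(preds),
    "weighted_sum": weighted_sum(preds, [0.4, 0.3, 0.2, 0.1]),
    "minimum": min(preds),
    "maximum": max(preds),
    "generalized_mean_p2": generalized_mean(preds, 2),
}
```

The median is less sensitive to the outlying prediction (15.0) than the average, which is one reason to prefer it when a few models are occasionally far off.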

In an election, N candidates are competing against each other and people are voting for one of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure discussed above? Hint: Persons are like base models of an ensemble method.

1. bagging
2. boosting
3. a or b
4. None of these

✅ Correct Answer: 1

Which of the following is NOT supervised learning?

1. pca
2. decision tree
3. linear regression
4. naive bayesian

✅ Correct Answer: 1

How can you avoid overfitting?

1. by using a lot of data
2. by using inductive machine learning
3. by using validation only
4. None of the above

✅ Correct Answer: 1

What are the popular algorithms of Machine Learning?

1. decision trees and neural networks (back propagation)
2. probabilistic networks and nearest neighbor
3. support vector machines
4. all

✅ Correct Answer: 4

What is Training set?

1. training set is used to test the accuracy of the hypotheses generated by the learner.
2. a set of data is used to discover the potentially predictive relationship.
3. both a & b
4. none of above

✅ Correct Answer: 2

Common deep learning applications include

1. image classification, real-time visual tracking
2. autonomous car driving, logistic optimization
3. bioinformatics, speech recognition
4. All of the above

✅ Correct Answer: 4

What is the function of supervised learning?

1. classifications, predict time series, annotate strings
2. speech recognition, regression
3. both a & b
4. None of the above

✅ Correct Answer: 3

Common unsupervised applications include

1. object segmentation
2. similarity detection
3. automatic labeling
4. All of the above

✅ Correct Answer: 4

Reinforcement learning is particularly efficient when

1. the environment is not completely deterministic
2. it's often very dynamic
3. it's impossible to have a precise error measure
4. All of the above

✅ Correct Answer: 4

If there is only a discrete number of possible outcomes (called categories), the process becomes a ______

1. regression
2. classification
3. modelfree
4. categories

✅ Correct Answer: 2

Which of the following are supervised learning applications

1. spam detection, pattern detection, natural language processing
2. image classification, real-time visual tracking
3. autonomous car driving, logistic optimization
4. bioinformatics, speech recognition

✅ Correct Answer: 1

During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.

1. logical
2. classical
3. classification
4. None of the above

✅ Correct Answer: 4

Which of the following sentences is correct?

1. machine learning relates with the study, design and
2. data mining can be defined as the process in which the
3. both a & b
4. none of the above

✅ Correct Answer: 3

What is Overfitting in Machine learning?

1. when a statistical model describes random error or noise instead of underlying relationship overfitting occurs.
2. robots are programmed so that they can perform the task based on data they gather from sensors.
3. overfitting occurs during the process of learning.
4. a set of data is used to discover the potentially predictive relationship

✅ Correct Answer: 1

What is Test set?

1. test set is used to test the accuracy of the hypotheses generated by the learner.
2. it is a set of data used to discover the potentially predictive relationship.
3. both a & b
4. none of above

✅ Correct Answer: 1

______ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value

1. removing the whole line
2. creating sub-model to predict those features
3. using an automatic strategy to input them according to the other known values
4. all above

✅ Correct Answer: 2

It's possible to use a different placeholder through the parameter ______.

1. regression
2. classification
3. random_state
4. missing_values

✅ Correct Answer: 4
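
The idea of a configurable placeholder for missing entries can be sketched without any library. The helper `impute_mean` below is hypothetical; only the parameter name `missing_values` is taken from the question, and scikit-learn's imputer classes accept an argument with that name. The sketch fills each missing entry with the mean of the observed values.

```python
import math

def impute_mean(column, missing_values=None):
    """Replace placeholder entries with the mean of the observed values.

    missing_values is the placeholder that marks a missing entry.
    NaN needs special handling because NaN != NaN.
    """
    def is_missing(v):
        if isinstance(missing_values, float) and math.isnan(missing_values):
            return isinstance(v, float) and math.isnan(v)
        return v == missing_values

    observed = [v for v in column if not is_missing(v)]
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in column]

# Default placeholder (None), a custom placeholder (-1), and NaN.
filled_none = impute_mean([1.0, None, 3.0])
filled_flag = impute_mean([2.0, -1, 4.0, -1], missing_values=-1)
filled_nan = impute_mean([1.0, float("nan"), 5.0], missing_values=float("nan"))
```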

If you need a more powerful scaling feature, with superior control over outliers and the possibility to select a quantile range, there's also the class ______

1. RobustScaler
2. DictVectorizer
3. LabelBinarizer
4. FeatureHasher

✅ Correct Answer: 1
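
What "control over outliers" and "quantile range" mean can be shown with a pure-Python sketch. This is not the scikit-learn class itself: `robust_scale` and `percentile` are hypothetical helpers that center a column on its median and divide by an inter-quantile range, assuming that range is non-zero.

```python
def percentile(sorted_values, q):
    """Percentile with linear interpolation between closest ranks (q in 0..100)."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def robust_scale(column, quantile_range=(25.0, 75.0)):
    """Center on the median and divide by the chosen inter-quantile range."""
    ordered = sorted(column)
    median = percentile(ordered, 50.0)
    q_low = percentile(ordered, quantile_range[0])
    q_high = percentile(ordered, quantile_range[1])
    spread = q_high - q_low          # assumed non-zero in this sketch
    return [(v - median) / spread for v in column]

# One extreme outlier (1000.0) barely moves the median or the quartiles.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
scaled = robust_scale(data)
```

With mean and standard deviation scaling, the single outlier would dominate both statistics and squash the other five values together; here they keep a usable spread.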

scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ______ to each element of a dataset

1. max, l0 and l1 norms
2. max, l1 and l2 norms
3. max, l2 and l3 norms
4. max, l3 and l4 norms

✅ Correct Answer: 2
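
The three norms differ only in how the size of a sample is measured. The function below is a hypothetical pure-Python sketch of per-sample normalization, not the scikit-learn class.

```python
import math

def normalize_sample(row, norm="l2"):
    """Scale one sample (row) so that its chosen norm equals 1."""
    if norm == "l1":
        size = sum(abs(v) for v in row)
    elif norm == "l2":
        size = math.sqrt(sum(v * v for v in row))
    elif norm == "max":
        size = max(abs(v) for v in row)
    else:
        raise ValueError("norm must be 'l1', 'l2' or 'max'")
    # An all-zero row has no direction to preserve; return it unchanged.
    if size == 0:
        return list(row)
    return [v / size for v in row]

row = [3.0, -4.0]
by_l1 = normalize_sample(row, "l1")    # divides by |3| + |-4| = 7
by_l2 = normalize_sample(row, "l2")    # divides by sqrt(9 + 16) = 5
by_max = normalize_sample(row, "max")  # divides by max(|3|, |-4|) = 4
```

Note that this works row by row (per sample), unlike scalers that work column by column (per feature).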

There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ______

1. f-tests and p-values
2. chi-square
3. anova
4. all above

✅ Correct Answer: 1

Which of the following selects only a subset of features belonging to a certain percentile?

1. SelectPercentile
2. FeatureHasher
3. SelectKBest
4. all above

✅ Correct Answer: 1

______ performs a PCA with non-linearly separable data sets.

1. SparsePCA
2. KernelPCA
3. SVD
4. None of the mentioned

✅ Correct Answer: 2

A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of students from a college. Which of the following statements is true in this case?

1. feature f1 is an example of nominal variable.
2. feature f1 is an example of ordinal variable.
3. it doesn't belong to any of the above categories.
4. both of these

✅ Correct Answer: 2

What would you do in PCA to get the same projection as SVD?

1. transform data to zero mean
2. transform data to zero median
3. not possible
4. none of these

✅ Correct Answer: 1
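
Why centering is enough can be sketched in a short derivation. The notation is assumed here for illustration: $X$ is the $n \times d$ data matrix, $\bar{x}$ its vector of column means, $X_c$ the centered data with thin SVD $U \Sigma V^{\top}$, and $C$ the sample covariance matrix.

```latex
X_c = X - \mathbf{1}\,\bar{x}^{\top}, \qquad X_c = U \Sigma V^{\top}

C = \frac{1}{n-1} X_c^{\top} X_c
  = \frac{1}{n-1} V \Sigma U^{\top} U \Sigma V^{\top}
  = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top}
```

The columns of $V$ are therefore eigenvectors of $C$, which are the principal directions, so projecting $X_c$ onto $V$ gives the same result either way. Without centering, $X^{\top} X$ is not proportional to the covariance matrix and the two projections generally differ.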

What do PCA, KPCA and ICA stand for?

1. principal components analysis
2. kernel based principal component analysis
3. independent component analysis
4. all above

✅ Correct Answer: 4

What are common feature selection methods in a regression task?

1. correlation coefficient
2. greedy algorithms
3. all above
4. none of these

✅ Correct Answer: 3

The parameter ______ allows specifying the percentage of elements to put into the test/training set

1. test_size
2. train_size
3. all above
4. none of these

✅ Correct Answer: 3
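
How a size parameter turns into a split can be sketched in a few lines. The function below is a hypothetical stand-in, not scikit-learn's implementation; in scikit-learn's `train_test_split` the corresponding parameters are named `test_size` and `train_size`, and the exact rounding may differ from this sketch.

```python
import random

def train_test_split_sketch(rows, test_size=0.25, seed=0):
    """Shuffle rows and hold out a test_size fraction of them."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    # First n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split_sketch(range(20), test_size=0.25)
```

Fixing the seed plays the role of a `random_state` argument: the same split is reproduced on every run.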

In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.

1. random_state
2. dataset
3. test_size
4. all above

✅ Correct Answer: 2

______ adopts a dictionary-oriented approach, associating to each category label a progressive integer number.

1. LabelEncoder class
2. LabelBinarizer class
3. DictVectorizer
4. FeatureHasher

✅ Correct Answer: 1
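
The dictionary-oriented approach described in the question amounts to a label-to-integer mapping. The helpers below are a hypothetical pure-Python sketch of that idea (assigning integers in sorted label order), not the scikit-learn class.

```python
def fit_label_encoder(labels):
    """Map each distinct label to a progressive integer (sorted order)."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}

def encode(labels, mapping):
    return [mapping[label] for label in labels]

def decode(codes, mapping):
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]

labels = ["Male", "Female", "Female", "Male"]
mapping = fit_label_encoder(labels)
codes = encode(labels, mapping)
restored = decode(codes, mapping)
```

One caveat of this encoding is that the integers imply an order (0 < 1 < 2) that nominal categories do not have, which is why one-hot style encodings are often preferred for input features.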

If a linear regression model fits perfectly, i.e., train error is zero, then

1. test error is also always zero
2. test error is non zero
3. couldn't comment on test error
4. test error is equal to train error

✅ Correct Answer: 3

Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE

1. ii and iv
2. i and ii
3. ii, iii and iv
4. i, ii, iii and iv

✅ Correct Answer: 4
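
Most of these metrics follow directly from the residuals. The sketch below computes them in plain Python; the function name, the toy data and the choice of one predictor are assumptions for the example, and the F statistic is left out.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Compute MSE, RMSE, MAE, R squared and adjusted R squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R squared penalizes extra predictors; needs n > n_features + 1.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae,
            "r2": r2, "adjusted_r2": adj_r2}

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred, n_features=1)
```

Adjusted R squared is never larger than R squared, and the gap grows as predictors are added without a matching gain in fit.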

In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, by how much will the output variable change?

1. by 1
2. no change
3. by intercept
4. by its slope

✅ Correct Answer: 4

Function used for linear regression in R is

1. lm(formula, data)
2. lr(formula, data)
3. lrm(formula, data)
4. regression.linear(formula, data)

✅ Correct Answer: 1

In syntax of linear model lm(formula,data,..), data refers to

1. matrix
2. vector
3. array
4. list

✅ Correct Answer: 4

In the mathematical equation of linear regression $Y = \beta_1 + \beta_2 X + \epsilon$, $(\beta_1, \beta_2)$ refers to

1. (x-intercept, slope)
2. (slope, x-intercept)
3. (y-intercept, slope)
4. (slope, y-intercept)

✅ Correct Answer: 3

Which of the following methods do we use to find the best fit line for data in Linear Regression?

1. least square error
2. maximum likelihood
3. logarithmic loss
4. both a and b

✅ Correct Answer: 1
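
For a single input variable, the least squares line has a closed-form solution. The sketch below uses made-up data and a hypothetical helper `fit_line`; it also confirms the earlier point that a one-unit change in the input changes the prediction by the slope.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_line(xs, ys)

def predict(x):
    return intercept + slope * x

# Raising x by one unit changes the prediction by the slope.
change = predict(3.0) - predict(2.0)
```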