You and your team need to process large datasets of images as fast as possible for a machine learning task. The project will also use a modular framework with extensible code and an active developer community. Which of the following would BEST meet your needs?
For a particular classification problem, you are tasked with determining the best algorithm among SVM, random forest, K-nearest neighbors, and a deep neural network. Each of the algorithms has similar accuracy on your data. The stakeholders indicate that they need a model that can convey each feature's relative contribution to the model's accuracy. Which is the best algorithm for this use case?
Which two of the following criteria are essential for machine learning models to achieve before deployment? (Select two.)
Which of the following is a common negative side effect of not using regularization?
Which of the following metrics is being captured when performing principal component analysis?
In which of the following scenarios is lasso regression preferable over ridge regression?
Your dependent variable Y is a count, ranging from 0 to infinity. Because Y is approximately log-normally distributed, you decide to log-transform the data prior to performing a linear regression.
What should you do before log-transforming Y?
A company is developing a merchandise sales application The product team uses training data to teach the AI model predicting sales, and discovers emergent bias. What caused the biased results?
Which of the following options is a correct approach for scheduling model retraining in a weather prediction application?
Your dependent variable data is a proportion. The observed range of your data is 0.01 to 0.99. The instrument used to generate the dependent variable data is known to generate low quality data for values close to 0 and close to 1. A colleague suggests performing a logit-transformation on the data prior to performing a linear regression. Which of the following is a concern with this approach?
Definition of logit-transformation
If p is the proportion: logit(p)=log(p/(l-p))
Which of the following approaches is best if a limited portion of your training data is labeled?
Which of the following items should be included in a handover to the end user to enable them to use and run a trained model on their own system? (Select three.)
Which of the following is the correct definition of the quality criteria that describes completeness?
A healthcare company experiences a cyberattack, where the hackers were able to reverse-engineer a dataset to break confidentiality.
Which of the following is TRUE regarding the dataset parameters?
Which of the following principles supports building an ML system with a Privacy by Design methodology?
Which two encodes can be used to transform categories data into numerical features? (Select two.)
When working with textual data and trying to classify text into different languages, which approach to representing features makes the most sense?
A data scientist is tasked to extract business intelligence from primary data captured from the public. Which of the following is the most important aspect that the scientist cannot forget to include?
A change in the relationship between the target variable and input features is
Which of the following is a type 1 error in statistical hypothesis testing?
You have a dataset with many features that you are using to classify a dependent variable. Because the sample size is small, you are worried about overfitting. Which algorithm is ideal to prevent overfitting?