AI organizations divide their work into data engineering, modeling, deployment, business analysis, and AI infrastructure. Each task requires specific skills and can be the focus of multiple roles. If you apply to a role that involves modeling, such as Machine Learning Engineer (MLE), Data Scientist (DS), Machine Learning Researcher (MLR), or Software Engineer-Machine Learning (SE-ML), you’ll often encounter the machine learning algorithms interview during the onsite round. You can learn more about these roles in our AI Career Pathways report and about other types of interviews in The Skills Boost.

I What to expect in the machine learning algorithms interview

The interviewer will try to uncover how deeply you understand (usually classic) machine learning algorithms. Here’s a list of interview questions Workera candidates have been asked onsite:

Derive the binary cross-entropy loss function.
How does Logistic Regression differ from Linear Regression?
What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?
Explain a classic machine learning algorithm from the following list: Linear Regression, Logistic Regression, Decision Trees, Random Forest, XGBoost, Support Vector Machines, K-means, K-Nearest Neighbors, Neural Networks, Principal Component Analysis, Naive Bayes Classifier, L1/L2 regularization, etc.
Why is the EM algorithm useful?
Why is the Naive Bayes classifier called Naive?
How does a discriminative model differ from a generative model?
In K-Nearest Neighbors, how does the value of K impact bias and variance?
In Support Vector Machines, what is the kernel trick?

II Interview tips

Every interview is an opportunity to show your skills and motivation for the role. Thus, it is important to prepare in advance. Here are useful rules of thumb to follow:

Listen to the hints given by your interviewer.

Example: You’re explaining PCA and state that “we should find the eigenvalues and eigenvectors of the data matrix X”. If your interviewer questions you with “are you sure?” or “can you interpret the eigenvalues of X?”, there is a high chance your answer is imprecise or wrong. You should react by reconsidering and talking through your answer. In this case, the interviewer expects you to introduce the covariance matrix of X and find its eigenvalues/eigenvectors.
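To make the PCA example concrete, here is a minimal NumPy sketch of the approach the interviewer is looking for: center the data, form the covariance matrix, and take its eigenvalues and eigenvectors. The function name `pca` and the toy data are assumptions made for illustration, not part of any particular library.

```python
import numpy as np

def pca(X, n_components):
    """Project X (m samples x n features) onto its top principal components."""
    # Center each feature; the covariance is defined around the mean.
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the features (n x n).
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    # eigh is intended for symmetric matrices; eigenvalues come back ascending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Keep the directions with the largest eigenvalues (largest variance).
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    return X_centered @ components, eigenvalues[order], components

# Hypothetical toy data: 200 samples, 3 correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing
Z, variances, components = pca(X, n_components=2)
```

Each eigenvalue of the covariance matrix equals the variance of the data along the corresponding principal component, which is the interpretation the interviewer's follow-up question points to.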

Don’t mention methods you’re not able to explain.

Example: You’re explaining logistic regression and state that “we’re using logistic regression for binary classification problems. For a multi-class problem we would use a softmax regression.” In this scenario, you can expect the interviewer to ask: “could you explain softmax regression?”
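If you do mention softmax regression, one way to show you can explain it is to relate it back to logistic regression. The sketch below, which assumes NumPy and uses illustrative function names, checks that a two-class softmax with one logit fixed at zero gives the same probability as the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # Subtracting the row-wise max avoids overflow and leaves the result unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Three example scores z for the positive class.
z = np.array([-2.0, 0.0, 1.5])
# Two-class logits: the negative class is fixed at 0, the positive class is z.
two_class_logits = np.stack([np.zeros_like(z), z], axis=1)
probs = softmax(two_class_logits)
# probs[:, 1] matches sigmoid(z) up to floating-point error.
```

In other words, softmax regression generalizes logistic regression to more than two classes, and the two coincide in the binary case.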

Write clearly, draw charts, and introduce a notation if necessary.

The interviewer will judge your scientific rigor.

Example: You’re asked to write the binary cross-entropy cost function. Instead of writing $\mathcal{J}= -[y\log\hat{y}+(1-y)\log(1-\hat{y})]$, write $\mathcal{J}(\hat{y}, y) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = - \frac{1}{m} \sum_{i=1}^m [y^{(i)}\log\hat{y}^{(i)}+(1-y^{(i)})\log(1-\hat{y}^{(i)})]$. This way, you’ll show that you understand cost functions, their arguments, and how they differ from loss functions.
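The distinction between the per-example loss $\mathcal{L}$ and the cost $\mathcal{J}$ in the formula above can also be sketched in a few lines of NumPy. The function names and the clipping constant `eps` are assumptions for illustration; clipping simply keeps the logarithm finite when a prediction is exactly 0 or 1.

```python
import numpy as np

def bce_loss(y_hat, y, eps=1e-12):
    """Per-example binary cross-entropy loss L(y_hat, y)."""
    # Clip predictions away from 0 and 1 so log() stays finite.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def bce_cost(y_hat, y):
    """Cost J: the average of the loss over the m training examples."""
    return np.mean(bce_loss(y_hat, y))

# Hypothetical labels and predicted probabilities for m = 4 examples.
y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.9, 0.1, 0.5, 0.5])

losses = bce_loss(y_hat, y)  # one value per example
cost = bce_cost(y_hat, y)    # a single scalar
```

Here `losses` has one entry per training example, while `cost` is a single number, mirroring the $\frac{1}{m}\sum_{i=1}^m$ in the formula.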

Interviewers will often ask you questions about methods they use at work.

Before going onsite, read online about the product the company is building and try to infer the methods they might be using.

Example: If you’re interviewing with a fraud detection team, you might want to learn about the methods to deal with imbalanced datasets, precision, recall, and F1 score before going onsite.
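To see why these metrics matter for a fraud detection team, consider the following sketch. It assumes NumPy and a made-up dataset of 1000 transactions with 10 fraudulent ones; the helper `precision_recall_f1` is written here for illustration. A model that never flags fraud reaches high accuracy yet has zero recall.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 score for binary labels (1 = positive)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)    # fraud correctly flagged
    fp = np.sum(~y_true & y_pred)   # legitimate transactions flagged
    fn = np.sum(y_true & ~y_pred)   # fraud that was missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

# Hypothetical imbalanced dataset: 10 fraudulent transactions out of 1000.
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# Baseline that predicts "not fraud" for everything.
always_legit = np.zeros(1000, dtype=int)
accuracy = np.mean(always_legit == y_true)
p0, r0, f0 = precision_recall_f1(y_true, always_legit)

# Hypothetical detector: flags 8 transactions, 6 of which are truly fraudulent.
detector = np.zeros(1000, dtype=int)
detector[:6] = 1
detector[10:12] = 1
p1, r1, f1_det = precision_recall_f1(y_true, detector)
```

The baseline scores 99% accuracy with precision, recall, and F1 all equal to zero, while the detector has precision 0.75 and recall 0.6. This is the kind of reasoning an interviewer on such a team is likely to probe.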

When you are not sure of your answer, be honest and say so.

Interviewers value honesty and penalize bluffing far more than lack of knowledge.

Example: Assume the interviewer asks you about Bayes error. You remember multiple concepts named after the statistician Thomas Bayes, including Bayes’ theorem and the Naive Bayes classifier, but not Bayes error. Rather than answering vaguely, you could say “I’m familiar with Bayes’ theorem and the Naive Bayes classifier, but I don’t think I’ve been exposed to Bayes error. Could you expand?” This allows the interviewer to add context on Bayes error and help you answer without prior knowledge of the subject, or to move on to the next question.

When out of ideas or stuck, think out loud rather than staying silent.

Talking through your thought process will help the interviewer correct you and point you in the right direction.

III Resources

The machine learning section of the Workera test is a great way to prepare for this interview. It’ll provide you with a personalized study plan which includes a list of your strengths and weaknesses, along with curated training material to prepare for interviews or transition in your career. Additionally, here’s a list of useful resources to prepare for the machine learning algorithms interview.

Stanford’s CS229 lecture notes
Machine Learning on Coursera

Machine learning engineers carry out data engineering, modeling, and deployment tasks. They demonstrate solid scientific and engineering skills (see Figure above). Communication skills requirements vary among teams.

Data scientists carry out data engineering, modeling, and business analysis tasks. They demonstrate solid scientific foundations as well as business acumen (see Figure above). Communication skills are usually required, but the level depends on the team.

Machine learning researchers carry out data engineering and modeling tasks. They demonstrate outstanding scientific skills (see Figure above). Communication skills requirements vary among teams.

People who have the title software engineer-machine learning carry out data engineering, modeling, deployment and AI infrastructure tasks. They demonstrate solid engineering skills and are developing scientific skills (see Figure above). Communication skills requirements vary among teams.

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. (Wikipedia)

Logistic regression is a machine learning model that uses a sigmoid function to model a binary dependent variable.

The loss computes a penalty for an inaccurate prediction on a single training example, while the cost is usually the average of the losses over the entire training set.

In machine learning, Bayes error is the lowest error rate any classifier can achieve on a given problem. It is analogous to the irreducible error.
