Interview resources: ML / Data Science / AI Research Engineer
A curated list of topics, resources and questions
Interviewing is a grueling process, especially during COVID. I recently interviewed with Microsoft (Data Scientist II), Amazon (Applied AI Scientist), and Apple (Software Development: Machine Learning).
Though the interviews differed a bit, the core questions asked were the same. During the process I curated this list, which should help you prepare for most ML interviews.
NOTE: this list is intended for last-minute revision, not for learning a topic from scratch.
Machine Learning
Linear and logistic regression - http://cs229.stanford.edu/notes2020spring/cs229-notes1.pdf
Naive Bayes - https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c
SVM / kernels - http://cs229.stanford.edu/notes2020fall/notes2020fall/cs229-notes3.pdf
Random forests, decision trees, boosting, bagging, XGBoost - StatQuest YouTube videos, e.g. https://www.youtube.com/watch?v=J4Wdy0Wc_xQ
EM algorithm - http://cs229.stanford.edu/notes2020spring/cs229-notes8.pdf
K-means - https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1
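Being able to write the K-means loop from scratch is a common live-coding ask. A minimal numpy sketch of Lloyd's algorithm (illustrative only; real code should use sklearn.cluster.KMeans, which also gets initialization right via k-means++):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points sampled from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = centroids.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centroids[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(np.random.default_rng(1).normal(size=(300, 2)), k=3)
```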
K-nearest neighbors - https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
Evaluation metrics (scroll to the definition section; you need to know the confusion matrix, precision, recall, type I and type II errors, FP rate, sensitivity) - https://en.wikipedia.org/wiki/Precision_and_recall
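These are worth being able to compute by hand from a confusion matrix. A quick sanity-check sketch (the toy labels below are made up):

```python
import numpy as np

# Binary confusion matrix convention: rows = actual, columns = predicted.
#              predicted 0   predicted 1
# actual 0         TN            FP
# actual 1         FN            TP
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))  # type I error
fn = np.sum((y_true == 1) & (y_pred == 0))  # type II error

precision = tp / (tp + fp)  # of predicted positives, how many are real
recall = tp / (tp + fn)     # a.k.a. sensitivity / true positive rate
fpr = fp / (fp + tn)        # false positive rate
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, fpr, f1)
```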
Regularization (L1, L2, why is L1 sparse?) - https://explained.ai/regularization/L1vsL2.html
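To see the sparsity claim concretely, a small sketch (assumes scikit-learn is installed; the alpha values are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Regression problem where only a few of the 20 features are informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

# L1 drives most coefficients exactly to zero; L2 only shrinks them.
print("L1 zero coefficients:", np.sum(lasso.coef_ == 0))
print("L2 zero coefficients:", np.sum(ridge.coef_ == 0))
```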
Bias-variance trade-off
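For squared loss with data y = f(x) + ε (noise variance σ², expectations over the draw of the training set), the decomposition worth having at your fingertips is:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

More flexible models (deeper trees, higher-degree polynomials) lower the bias but raise the variance; regularization trades in the other direction.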
Dimensionality reduction -
- PCA - deep learning book, chapter 2 (last pages): https://www.deeplearningbook.org/contents/linear_algebra.html (a minimal numpy sketch follows this list)
- t-SNE - https://distill.pub/2016/misread-tsne/
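As promised above, a minimal PCA sketch via SVD of the centered data matrix (equivalent to eigendecomposition of the covariance matrix; for real work use sklearn.decomposition.PCA):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions
    # (eigenvectors of the covariance matrix).
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S**2 / (len(X) - 1))[:n_components]
    return X_centered @ components.T, components, explained_variance

X = np.random.default_rng(0).normal(size=(100, 5))
Z, components, var = pca(X, n_components=2)  # Z has shape (100, 2)
```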
Deep Learning
The first thing I would suggest is going through the deeplearning.ai courses, which cover the basics. If you already publish or work in these areas, you can skip the videos and go through the following questions/resources:
- Know what K-fold cross-validation, dropout, batch norm, and early stopping are [including the difference between batch norm and layer norm; see the sketch after this list]
- Weight decay - https://www.coursera.org/lecture/deep-neural-network/regularization-Srsrc
- Calibration - https://arxiv.org/abs/1706.04599 (look up what calibration is and how the ECE score is computed)
- Transformer - "The Illustrated Transformer" by Jay Alammar (jalammar.github.io)
- Attention (multi-head, single-head) - "Visualizing a Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)" by Jay Alammar (jalammar.github.io); a minimal implementation sketch follows this list
- Different optimizers (the important ones are gradient descent, Adam, RMSprop, Adagrad, Adamax) - "An overview of gradient descent optimization algorithms" (ruder.io)
- Initialization - "Initializing neural networks" (deeplearning.ai)
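On the batch norm vs. layer norm question from the first bullet, a minimal numpy sketch of just the normalization step (the learnable scale and shift parameters, gamma and beta, are omitted):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(32, 64))  # (batch, features)
eps = 1e-5

# Batch norm: normalize each feature over the batch dimension.
# Statistics depend on the batch, so running averages are needed at test time.
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer norm: normalize each sample over the feature dimension.
# Statistics are per-example, so train and test behave identically.
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)
```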
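And for the attention bullet, a minimal single-head scaled dot-product attention sketch in numpy (the learned Q/K/V projection matrices are omitted; the random inputs are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

# Single head: one (seq_len, d_k) set of queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(10, 16)) for _ in range(3))
out = attention(Q, K, V)  # shape (10, 16)
# Multi-head attention runs h such heads on learned projections in
# parallel, concatenates their outputs, and applies a final projection.
```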
NLP
For NLP, Stanford CS224N ("NLP with Deep Learning", Winter 2019, starting with Lecture 1 on Introduction and Word Vectors, on YouTube) covers the basics of NLP with deep learning. This might cover three-quarters of the questions asked in an interview; the other questions are usually about more state-of-the-art models, since the interviewer wants to check how up to date you are.
- LSTM, GRU - "Understanding LSTM Networks" on colah's blog
- BERT - "A Visual Guide to Using BERT for the First Time" by Jay Alammar (jalammar.github.io); GPT - "GPT models explained. Open AI's GPT-1, GPT-2, GPT-3" (Walmart Global Tech Blog on medium.com); and you can read the respective papers
- Different types of embeddings: bag of words, TF-IDF, word2vec (skip-gram [how is it trained?]), and pre-trained embeddings (Google word2vec, Stanford GloVe, fastText, ELMo). You need to know how each incremental change improved on the last; a small bag-of-words vs. TF-IDF sketch follows this list.
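As promised in the embeddings bullet, a small sketch contrasting the two count-based representations (assumes scikit-learn; the toy corpus is made up):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]

# Bag of words: raw term counts, one column per vocabulary word.
bow = CountVectorizer().fit_transform(corpus)

# TF-IDF: counts reweighted so terms common to every document
# (like "the") contribute less than discriminative terms.
tfidf = TfidfVectorizer().fit_transform(corpus)

print(bow.toarray())
print(tfidf.toarray().round(2))
```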
Other topics
- Linear algebra - https://www.deeplearningbook.org/contents/linear_algebra.html
- Probability basics - http://www2.ece.rochester.edu/~gmateosb/ECE440/Slides/block_2_probability_review_part_a.pdf
- Stats - I had taken a graduate-level statistics class so I didn't need to brush this up, but Khan Academy (https://www.khanacademy.org/math/statistics-probability) is a very good source for learning the basics with examples.
These are the topics asked in all the interviews; beyond these, some questions were specific to the research I had done. There were also live coding rounds, covering both algorithms and NN models. Let me know if I missed something.