# Interview resources : ML/Data Science/AI Research Engineer

## A curated list of topics, resources and questions

Interviewing is a grueling process, especially during COVID. I recently interviewed with Microsoft (Data Scientist II), Amazon (Applied AI Scientist), and Apple (Software Development: Machine Learning).

Though these interviews differed a bit, the core questions asked were the same. During the process I curated this list, which should help you prepare for most ML interviews.

NOTE: This list is intended for last-minute revision.

# Machine Learning

**Linear, Logistic Regression** - http://cs229.stanford.edu/notes2020spring/cs229-notes1.pdf

**Naive Bayes**- https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c

**SVM / Kernel**- http://cs229.stanford.edu/notes2020fall/notes2020fall/cs229-notes3.pdf

**Random Forests, Decision Trees, Boosting, Bagging, XGBoost** - StatQuest YouTube videos: https://www.youtube.com/watch?v=J4Wdy0Wc_xQ

**EM Algorithm** - http://cs229.stanford.edu/notes2020spring/cs229-notes8.pdf

**K-means** - https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1

**K-nearest neighbors** - https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
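
KNN is another algorithm you may be asked to code on the spot. A minimal sketch of majority-vote classification (names are mine, plain Euclidean distance assumed):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Note there is no training step: KNN simply stores the data, which is the usual follow-up question.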

**Evaluation Metrics** (scroll to the definitions section; you need to know the confusion matrix, precision, recall, Type I and Type II errors, FP rate, sensitivity) - https://en.wikipedia.org/wiki/Precision_and_recall
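
As a quick refresher, the metrics above all come straight from the four confusion-matrix counts. A small sketch (function names are mine):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # Type I errors
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # Type II errors
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp)        # of everything predicted positive, how much was right

def recall(tp, fn):
    return tp / (tp + fn)        # a.k.a. sensitivity / true positive rate

def fp_rate(fp, tn):
    return fp / (fp + tn)        # false alarms over all actual negatives
```

For example, with `y_true = [1,1,1,0,0,0]` and `y_pred = [1,1,0,1,0,0]` you get TP=2, FP=1, FN=1, TN=2, so precision = recall = 2/3.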

**Regularization** (L1, L2, why is L1 sparse?) - https://explained.ai/regularization/L1vsL2.html
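
The "why is L1 sparse?" question can be answered in code: the proximal (soft-thresholding) step for an L1 penalty snaps small weights exactly to zero, while the L2 step only rescales them. A one-dimensional sketch (function names are mine, shown per coordinate for clarity):

```python
def prox_l1(w, lam):
    """Soft-thresholding: the closed-form update for an L1 penalty.
    Any weight with |w| <= lam is set exactly to zero -> sparsity."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def shrink_l2(w, lam):
    """Closed-form update for an L2 penalty: multiplicative shrinkage.
    The weight is scaled toward zero but never reaches exactly zero."""
    return w / (1.0 + lam)
```

So with `lam = 0.1`, a small weight like `0.05` becomes exactly `0.0` under L1 but stays nonzero (about `0.045`) under L2, which is the intuition interviewers look for.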

**Bias-Variance Trade-off**

**Dimensionality Reduction**

**PCA** - Deep Learning book, chapter 2 (last pages): https://www.deeplearningbook.org/contents/linear_algebra.html

**t-SNE** - https://distill.pub/2016/misread-tsne/
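
Being able to derive PCA from the covariance matrix is a frequent whiteboard question. A minimal NumPy sketch (names are mine; an SVD-based version is equivalent and more numerically stable):

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, components       # projected data, principal directions
```

On rank-1 data (points on a line), a single component reconstructs the data exactly, which is a handy sanity check.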

# Deep Learning

The first thing I would suggest is to go through all of the deeplearning.ai courses, which are fairly basic. If you already publish or work in these areas, you can skip the videos and go straight to the following questions/resources:

- Know what these are: k-fold cross validation, dropout, batch norm [difference between batch norm and layer norm], early stopping
- **Weight decay** - https://www.coursera.org/lecture/deep-neural-network/regularization-Srsrc
- **Calibration** - https://arxiv.org/abs/1706.04599 (look up what calibration and the ECE score are)
- **Transformer** - The Illustrated Transformer, Jay Alammar (jalammar.github.io)
- **Attention (multi-head, single-head)** - Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention), Jay Alammar (jalammar.github.io)
- **Optimizers** (important ones: gradient descent, Adam, RMSprop, Adagrad, Adamax) - An overview of gradient descent optimization algorithms (ruder.io)
- **Initialization** - Initializing neural networks, deeplearning.ai
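The batch norm vs. layer norm question above boils down to which axis you normalize over. A bare NumPy sketch (my own simplification: inference-style, without the learnable scale/shift parameters or running statistics):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature across the batch dimension (axis 0).
    Statistics depend on the batch, hence the train/inference distinction."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Normalize each sample across its features (axis 1).
    Independent of batch size, which is why transformers use it."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

After `batch_norm`, each column (feature) has roughly zero mean; after `layer_norm`, each row (sample) does.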

# NLP

For NLP, Stanford CS224N (NLP with Deep Learning, Winter 2019, Lecture 1 — Introduction and Word Vectors — is on YouTube) covers the basics of NLP with deep learning. This might cover three quarters of the questions asked in an interview. The remaining questions are usually about state-of-the-art models, since the interviewer wants to check how up to date you are.

**LSTM, GRU** - Understanding LSTM Networks, colah's blog

**BERT** - A Visual Guide to Using BERT for the First Time, Jay Alammar (jalammar.github.io)

**GPT** - GPT models explained: OpenAI's GPT-1, GPT-2, GPT-3, Walmart Global Tech Blog (medium.com)

- Different types of embeddings: bag of words, TF-IDF, word2vec (skip-gram [how is it trained?]), pre-trained (Google word2vec, Stanford GloVe, fastText, ELMo). You need to know how each incremental improvement came about.
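
The bag-of-words-to-TF-IDF step in that progression is easy to demonstrate from scratch. A stdlib-only sketch (names are mine; I use a simple `log(N/df) + 1` weighting, and note that libraries like scikit-learn use slightly different smoothing):

```python
import math
from collections import Counter

def tfidf(docs):
    """Bag-of-words counts reweighted by inverse document frequency,
    so words that appear in every document are downweighted."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log(n / df[t]) + 1 for t in df}
    # Term frequency (raw counts) times idf, one dict per document.
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
```

With `docs = ["the cat sat", "the dog sat", "the cat ran"]`, "the" occurs in every document so its weight stays at 1.0, while the rarer "dog" scores higher, which is exactly the intuition behind the TF-IDF improvement over raw bag of words.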

# Other topics

**Linear Algebra** - https://www.deeplearningbook.org/contents/linear_algebra.html

**Probability basics** - http://www2.ece.rochester.edu/~gmateosb/ECE440/Slides/block_2_probability_review_part_a.pdf

**Stats** - I had taken a graduate-level statistics class so I didn't need to brush this up, but Khan Academy (https://www.khanacademy.org/math/statistics-probability) is a very good source for learning the basics with examples.

These are the topics asked in all my interviews; obviously, some questions were specific to the research I had done. There were also live coding rounds, both for algorithms and for NN models. Let me know if I missed something.