July 19, 2026

4 min read

Machine Learning Scientist / Engineer Roadmap

This roadmap details the comprehensive journey to becoming a machine learning specialist, emphasising foundational skills, core algorithms, deep learning, MLOps, and the significance of responsible AI in production environments.

This is a complete path to becoming a machine learning specialist: the person who builds, trains, deploys, and maintains models that actually work in production. The roadmap goes well beyond fitting a model in a notebook. It takes you through the full breadth of modern machine learning: classical algorithms, deep learning, natural language processing, computer vision, big data tooling, MLOps, and the large language models reshaping the field today.

I'll be adding articles here as we go,
Andrei

Start With the Data Scientist Roadmap First

Before beginning this roadmap, you should complete the Data Scientist Roadmap. That earlier path gives you the mathematical and statistical foundations, the Python fluency, and the data handling skills that everything here depends on. Machine learning is applied statistics built on solid engineering, and trying to learn it without that groundwork leaves you running algorithms you cannot reason about. Finish the Data Scientist Roadmap, and you arrive here ready to build on a foundation that holds. Begin here without it, and you will spend half your time backfilling concepts you should already own.

The Curriculum

The roadmap is organised into stages that build on one another. It moves from core algorithms through deep learning and specialised domains, into the engineering and deployment skills that turn a model into a product, and finishes with generative AI and responsible, interpretable AI.

Core machine learning

Supervised learning with scikit-learn: https://datalad.co.uk/supervised-machine-learning-with-scikit-learn/
- Code along: https://datalad.co.uk/supervised-machine-learning-with-scikit-learn-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/supervised-machine-learning-with-scikit-learn-cheatsheet/
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-supervised-machine-learning/
Unsupervised learning techniques in Python: https://datalad.co.uk/unsupervised-machine-learning-clustering-dimensionality-reduction-and-topic-modeling/
- Code along: https://datalad.co.uk/unsupervised-machine-learning-10-code-along-examples/
- Cheatsheet:
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-unsupervised-learning-pca-and-k-means/
Building linear classifiers: https://datalad.co.uk/linear-classifiers-in-python/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-linear-classifiers-margins-and-the-svm/
Decision trees and tree-based models
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Gradient boosting with XGBoost: https://datalad.co.uk/xgboost-a-practical-guide-to-extreme-gradient-boosting/
- Code along: https://datalad.co.uk/xgboost-10-code-along-examples/
- Cheatsheet:
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-gradient-boosting-and-ensembles/
Clustering and cluster analysis: https://datalad.co.uk/cluster-analysis-in-python/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Reducing dimensionality: https://datalad.co.uk/dimensionality-reduction-in-python/
- Code along: https://datalad.co.uk/dimensionality-reduction-in-python-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/dimensionality-reduction-in-python-cheatsheet/
- Feynman Technique:
- Video:
Preparing data for machine learning: https://datalad.co.uk/feature-engineering-and-preprocessing-getting-data-ready-for-a-model/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Machine learning on time series: https://datalad.co.uk/machine-learning-for-time-series-data/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-time-series-ml/
Engineering features that improve models: https://datalad.co.uk/feature-engineering-in-python/
- Code along: https://datalad.co.uk/feature-engineering-in-python-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/feature-engineering-in-python-cheatsheet/
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-feature-engineering/
Validating model performance: https://datalad.co.uk/model-validation-in-machine-learning/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-model-validation/
Tuning hyperparameters: https://datalad.co.uk/hyperparameter-tuning-in-python/
- Code along: https://datalad.co.uk/hyperparameter-tuning-in-python-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/hyperparameter-tuning-in-python-cheatsheet/
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-hyperparameter-search/
Combining models with ensemble methods: https://datalad.co.uk/ensemble-methods-in-python/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:

Natural language processing

Working with text: NLP foundations
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Practical NLP using spaCy
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Turning text into features
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:

Deep learning

Getting started with PyTorch: https://datalad.co.uk/pytorch-deep-learning-fundamentals/
- Code along: https://datalad.co.uk/pytorch-deep-learning-fundamentals-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/pytorch-deep-learning-fundamentals-cheatsheet/
- Feynman Technique:
- Video:
Going deeper with PyTorch: https://datalad.co.uk/intermediate-pytorch-datasets-cnns-rnns-and-multi-branch-models/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Working with images: image processing
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Image models in PyTorch
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Text models in PyTorch
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Neural networks with Keras: https://datalad.co.uk/neural-networks-with-keras/
- Code along: https://datalad.co.uk/neural-networks-with-keras-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/neural-networks-with-keras-cheatsheet/
- Feynman Technique:
- Video:
- The math: https://datalad.co.uk/the-mathematics-behind-neural-networks/
Advanced neural networks in Keras: https://datalad.co.uk/advanced-neural-networks-in-keras-the-functional-api/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Building image models in Keras
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Sequence models and RNNs in Keras
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Understanding transformer models in PyTorch: https://datalad.co.uk/transformer-models-with-pytorch/
- Code along: https://datalad.co.uk/transformer-models-with-pytorch-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/transformer-models-with-pytorch-cheatsheet/
- Feynman Technique:
- Video:

Reinforcement learning

Reinforcement learning with Gymnasium
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Deep reinforcement learning
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Reinforcement learning from human feedback (RLHF)
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:

Big data and distributed computing

Getting started with PySpark
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Machine learning at scale with PySpark
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Getting started with Databricks
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Querying data with Databricks SQL
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Managing data in Databricks
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Visualising data in Databricks
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:

MLOps and engineering for ML

Command line essentials for ML
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Tracking experiments with MLflow
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Building data pipelines: ETL and ELT
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Validating data quality with Great Expectations
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Versioning data with DVC
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Monitoring models in production
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Containerising work with Docker
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
CI/CD pipelines for machine learning
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:

Large language models and generative AI

Getting started with Hugging Face: https://datalad.co.uk/working-with-hugging-face/
- Code along: https://datalad.co.uk/working-with-hugging-face-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/hugging-face-cheatsheet/
- Feynman Technique:
- Video:
Building with LLMs in Python: https://datalad.co.uk/introduction-to-llms-in-python/
- Code along: https://datalad.co.uk/introduction-to-llms-in-python-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/introduction-to-llms-in-python-cheatsheet/
- Feynman Technique:
- Video:
Multi-modal models on Hugging Face: https://datalad.co.uk/multi-modal-models-with-hugging-face/
- Code along: https://datalad.co.uk/multi-modal-models-with-hugging-face-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/multi-modal-models-with-hugging-face-cheatsheet/
- Feynman Technique:
- Video:
Building AI agents with smolagents
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:
Working with Llama 3: https://datalad.co.uk/working-with-llama-3/
- Code along: https://datalad.co.uk/working-with-llama-3-10-code-along-examples/
- Cheatsheet: https://datalad.co.uk/working-with-llama-3-cheatsheet/
- Feynman Technique:
- Video:
Optimising Llama 3 for your own tasks: https://datalad.co.uk/optimising-llama-3/
- Code along:
- Cheatsheet:
- Feynman Technique:
- Video:

Responsible and interpretable AI

Making models explainable

Code Along

Hyperparameter Tuning: https://datalad.co.uk/hyperparameter-tuning-start-to-finish-a-code-along/

How to Use This Roadmap

Work through the stages in order. The core machine learning stage gives you command of the classical algorithms and the workflow around them: preprocessing, feature engineering, validation, and tuning. From there, deep learning and the specialised domains of NLP, vision, and reinforcement learning expand what you can build. The big data and MLOps stages are what separate a model that works on your laptop from one that runs reliably in production, serving real users and staying healthy over time. And the final stages on large language models and explainable AI bring you to the frontier of the field.

A word on emphasis: it is tempting to rush toward the exciting topics, such as LLMs and deep learning, and skip the engineering. Resist that. The ability to deploy, monitor, version, and validate a model is what makes you an engineer rather than someone who fits models in notebooks. Those skills are also where much of the real demand sits. Give the MLOps stage the attention it deserves, and you will be the rare specialist who can take a model all the way from idea to production and keep it running.

See you soon.

Andrei

