This is a complete path to becoming a machine learning specialist: the person who builds, trains, deploys, and maintains models that actually work in production. This roadmap goes far beyond fitting a model in a notebook. It takes you through the full breadth of modern machine learning: classical algorithms, deep learning, natural language processing, computer vision, big data tooling, MLOps, and the large language models reshaping the field today.
I will be adding articles here as we go.
Andrei
Start With the Data Scientist Roadmap First
Before beginning this roadmap, you should complete the Data Scientist Roadmap. That earlier path gives you the mathematical and statistical foundations, the Python fluency, and the data handling skills that everything here depends on. Machine learning is applied statistics built on solid engineering, and trying to learn it without that groundwork leaves you running algorithms you cannot reason about. Finish the Data Scientist Roadmap, and you arrive here ready to build on a foundation that holds. Begin here without it, and you will spend half your time backfilling concepts you should already own.
The Curriculum
The roadmap is organised into stages that build on one another, moving from core algorithms through deep learning and specialised domains, into the engineering and deployment skills that turn a model into a product, and finishing with the cutting edge of generative AI.
Core machine learning
- Supervised learning with scikit-learn: https://datalad.co.uk/supervised-machine-learning-with-scikit-learn/
- Unsupervised learning techniques in Python: https://datalad.co.uk/unsupervised-machine-learning-clustering-dimensionality-reduction-and-topic-modeling/
- Building linear classifiers
- Decision trees and tree-based models
- Gradient boosting with XGBoost
- Clustering and cluster analysis
- Reducing dimensionality
- Preparing data for machine learning
- Machine learning on time series
- Engineering features that improve models
- Validating model performance
- Tuning hyperparameters
- Combining models with ensemble methods
Natural language processing
- Working with text: NLP foundations
- Practical NLP using spaCy
- Turning text into features
Deep learning
- Getting started with PyTorch
- Going deeper with PyTorch
- Working with images: image processing
- Image models in PyTorch
- Text models in PyTorch
- Neural networks with Keras
- Advanced neural networks in Keras
- Building image models in Keras
- Sequence models and RNNs in Keras
- Understanding transformer models in PyTorch
Reinforcement learning
- Reinforcement learning with Gymnasium
- Deep reinforcement learning
- Reinforcement learning from human feedback (RLHF)
Big data and distributed computing
- Getting started with PySpark
- Machine learning at scale with PySpark
- Getting started with Databricks
- Querying data with Databricks SQL
- Managing data in Databricks
- Visualising data in Databricks
MLOps and engineering for ML
- Command line essentials for ML
- Tracking experiments with MLflow
- Building data pipelines: ETL and ELT
- Validating data quality with Great Expectations
- Versioning data with DVC
- Monitoring models in production
- Containerising work with Docker
- CI/CD pipelines for machine learning
Large language models and generative AI
- Getting started with Hugging Face
- Building with LLMs in Python
- Multi-modal models on Hugging Face
- Building AI agents with smolagents
- Working with Llama 3
- Fine-tuning Llama 3 for your own tasks
Responsible and interpretable AI
- Making models explainable
How to Use This Roadmap
Work through the stages in order. The core machine learning stage gives you command of the classical algorithms and the workflow around them: preprocessing, feature engineering, validation, and tuning. From there, deep learning and the specialised domains of NLP, vision, and reinforcement learning expand what you can build. The big data and MLOps stages are what separate a model that works on your laptop from one that runs reliably in production, serving real users and staying healthy over time. And the final stages on large language models and explainable AI bring you to the frontier of the field.
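To make the core workflow concrete, here is a minimal sketch of preprocessing, validation, and tuning chained together in scikit-learn. It is an illustration rather than a recipe from any of the articles: the synthetic dataset, the choice of logistic regression, and the grid of `C` values are all assumptions made for the example, and feature engineering is left out to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out a test set that is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessing and model live in one pipeline, so the scaler is
# re-fitted on the training part of every cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Tuning: 5-fold cross-validation over a small, hypothetical grid.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Final check on the held-out data using the best configuration found.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline is the design choice worth noticing: it stops information from the validation folds leaking into preprocessing, which is the kind of detail the validation and tuning articles are meant to build intuition for.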
A word on emphasis: it is tempting to rush toward the exciting topics, LLMs and deep learning, and skip the engineering. Resist that. The ability to deploy, monitor, version, and validate a model is what makes you an engineer rather than someone who fits models in notebooks. Those skills are also where much of the real demand sits. Give the MLOps stage the attention it deserves, and you will be the rare specialist who can take a model all the way from idea to production and keep it running.
See you soon.