
Anatoly Makarevich
Senior Staff Data Scientist
"Full stack" data scientist, with a focus on econometric modelling using modern machine learning techniques.
I can lead projects throughout all stages: requirements discovery, EDA, POC, modelling and experimentation, MVP, optimization, deployment and monitoring.
I can work in multiple project roles: data scientist, technical leader, software developer, software architect, consultant. I prefer a mix of roles, especially technical.
I have also created courses, and I manage and mentor several other data scientists in my unit.
I am fluent in general and scientific Python (PyData stack, PySpark, Pydantic) and deliver high-quality software.
My main domain experience is in the Retail and Consumer Packaged Goods industries.
Skills
Data Science Core
Master
- Statistics
- Exploratory Data Analysis (EDA)
- Optimization
- Model Tuning
- Evaluation
Tabular Machine Learning
Master
- Data Preparation
- Regression
- Classification
- Clustering
- Boosting Models
Time Series Analysis
Master
- Forecasting
- Structural Modelling
- Econometrics
- TS Features
- Time-Dependent System Design
Interpretable ML
Advanced
- Bayesian Modelling
- Probabilistic Programming
- PyMC
- Black-Box Interpretability
- SHAP
- Causal ML
Deep Learning
Intermediate (Personal Projects)
- Natural Language Processing (NLP)
- Large Language Models (LLMs)
- Computer Vision (CV)
- Reinforcement Learning
Domain Expertise
Advanced
- Data Science Consulting
- Demand Planning
- Retail
- Consumer Goods (CPG)
Software Engineering
Advanced
- Python
- Software Architecture
- API Design
- Type Hinting
- Linting
- Automated Testing
- Pydantic
Data Engineering
Advanced
- Database Design
- SQL
- Pandas
- PySpark
- Polars
- Xarray
- Pandera
MLOps and ML Engineering
Advanced
- Model Lifecycle
- MLFlow
- Model Deployment
- Kedro
DevOps
Advanced
- Continuous Integration (CI)
- Continuous Delivery (CD)
- GitHub Actions
- Azure DevOps
- Docker
- Linux
Cloud
Intermediate
- Databricks Platform
- Microsoft Azure
- Google Cloud Platform
Other Languages
Beginner
- Julia ♥
- Rust
- C++
- R
- Java
- Web (JavaScript, HTML, CSS)
Work Experience (9)
Nov 2025 - Current
Senior Staff Data Scientist, Enterprise AI
GE HealthCare
Developing a world-class machine learning, forecasting and AI platform.
- Technical lead for forecasting teams.
- Technical co-lead for development of the ML Ops platform.
Jul 2024 - Oct 2025
Lead Data Scientist
EPAM Systems, Client: Multinational Consumer Goods Company
Development of end-to-end data products in Sales Excellence.
- Played a technical leadership role for data science and data software engineering.
- Conducted exploratory data analysis and modelling.
- Completed a pilot for in-store sales execution with two retailers.
- Created a platform for forecasting and econometric modelling (extensive Python library, CI/CD templates) to enable scale-up.
- Set up best practices for software development and data science lifecycles.
- Environment: Azure Databricks, Azure DevOps
Jun 2023 - Aug 2024
Lead Data Scientist
EPAM Systems, Client: Global CPG (Consumer Packaged Goods) Company
Creation of a global next-gen forecasting and replenishment platform.
- Co-led the technical development of the platform, responsible for the Data Science and Software Architecture streams.
- Co-developed architecture on top of Azure Databricks platforms.
- Developed APIs for core functionality, the ML library, and time-travel databases.
- Screened and interviewed candidates for both the core and market-specific teams.
- Handed off technical leadership responsibilities and switched to a technology and data science consulting role.
- Environment: Azure Databricks, GitHub Actions, Jira, Confluence
Aug 2021 - Jun 2023
Lead Data Scientist
EPAM Systems, Client: Global CPG (Consumer Packaged Goods) Company
Evolution of a simple predictive model for e-commerce into a platform for multi-channel sales forecasting.
- Created a Bayesian model for the e-commerce channel that outperformed the existing solution by at least 10% (real-world).
- Designed and implemented a new process for data gathering that allowed an additional 20-50% MAPE improvement.
- Designed the Python library architecture and, in collaboration with the architect, part of the architecture of a new forecasting platform.
- Implemented ~60% of total lines of code within core API code, reusable utilities, ML transformers and models, job and unit tests.
- Ran experiments and applied various ML models for forecasting and optimization tasks.
- Co-led decisions on data science and acted as "glue" between data science, data engineering and customer teams.
- Re-implemented CI/CD pipelines on GitHub Actions and Databricks.
- Environment: Azure Databricks, GitHub Actions, Jira, Confluence
Feb 2021 - Aug 2021
Senior Data Scientist
EPAM Systems, Client: Global CPG (Consumer Packaged Goods) Company
Development of predictive distribution models for countries in the APAC region.
- Created additional features and applied model ensembling.
- Achieved a 5% accuracy improvement over the current solution.
- Initiated refactoring of the existing solution to incorporate new features.
- Environment: Azure Databricks, Jira
Jul 2019 - Jan 2021
Senior Data Scientist
EPAM Systems, Client: Multinational Consumer Goods Manufacturer
Identification and estimation of Key Business Drivers (e.g. price elasticity, promotion, distribution).
- Won an inter-company "hackathon" using Bayesian statistical modelling.
- Created a model (PyMC) and backend (FastAPI) for a proof-of-concept web tool.
- Developed a large portion of the software stack, including data ingestion, ETL, modelling and pipelining.
- Implemented KBD (key business driver) "extraction" via sensitivity analysis and simulation.
- Created a model-based simulator for what-if analysis, comparison and explainability.
- Optimized and parallelized heavy computation (MCMC/NUTS within PyMC).
- Presented results to project leadership.
- Environment: Azure Cloud, Azure Databricks
Sep 2018 - Jun 2019
Data Scientist
EPAM Systems, Client: North American Retailer
Forecasting of consumer demand to direct replenishment and prevent stock-outs.
- Performed EDA, integrated external datasets (weather and demographics), and updated models for predicting sparse sales.
- Experimented with multiple modelling approaches, including price bucketing and hierarchical reconciliation.
- Built an explainable statistical optimization algorithm for product size distribution.
- Developed a data/ML pipeline library on top of Airflow and Google Cloud; our team deployed using this library.
- Environment: Google Compute Engine, Google Cloud Composer, GitLab
Aug 2017 - Aug 2018
Data Scientist
EPAM Systems, Internal Workforce Planning Team
Worked with the workforce planning team to forecast and optimize employee utilization/time allocation.
Jun 2017 - Aug 2017
Data Scientist Intern
EPAM Systems, Client: News Aggregator
Helped implement a model predicting risks associated with entities from text, using classical NLP features, SVM and XGBoost.
Volunteer
Nov 2020 - Current
Primary Developer
pydantic-yaml
YAML extension for the Pydantic library.
- Full library implementation with tests.
- CI/CD through GitHub Actions and ReadTheDocs
- Reviewed and approved contributions.
- >170k downloads/week and used in production-grade projects.
- See https://pypistats.org/packages/pydantic-yaml
Feb 2023 - Dec 2025
Primary Developer
pydantic-kedro
Extension to serialize and load arbitrary Pydantic models through Kedro datasets.
- Full library implementation with tests.
- CI/CD through GitHub Actions and ReadTheDocs
Oct 2021 - Oct 2025
Course Developer and Mentor
EPAM Systems
Beginner and Intermediate Course on Time Series Analysis
- Co-authored a course on Time Series Analysis, partially based on Hyndman's FPP3.
- Mentored multiple students and established data scientists.
Education (2)
2018 - 2020
Master's (MSc)
Mathematical Modelling and Data Analysis
Belarusian State University
2014 - 2018
Bachelor's (BSc)
Applied Mathematics and Computer Science
Belarusian State University
Publications
The Markov-switching Vector Autoregressive with eXogenous variables is a mathematical time series model. It is applied to the problem of the business cycle analysis and forecasting of the growth rates of the real GDP of the Belarusian economy based on the Economic Sentiment Index (ESI) as a leading indicator.
This paper presents a comparative analysis of the Hodrick-Prescott and Hamilton statistical filters with regard to the tasks of extraction of the long-term trend and cycle components of the macroeconomic time series. A comparative analysis of the turning points of the real GDP cycles and Economic Sentiment Indicator (ESI) of the Belarusian economy obtained for both of these filters is conducted.
Languages
English
Native Speaker
Russian
Native Speaker
Polish
Beginner
Interests
Homelab and DIY
- 3D printing
- CAD (Computer-Aided Design)
- Microcomputers (Raspberry Pi)
- Self-hosting
- Networking
Gaming
- Board Gaming
- Board Game Design (4+ prototypes)
- PC Games
Playing Instruments
- Piano
- Theremin
References
GDPR Consent
“I hereby give consent for my personal data included in the application to be processed for the purposes of the recruitment process in accordance with Art. 6 paragraph 1 letter a of the Regulation of the European Parliament and of the Council (EU) 2016/679 of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).”