Anatoly Makarevich

"Full stack" data scientist, with a focus on econometric modelling using modern machine learning techniques. I can lead projects throughout all stages: requirements discovery, EDA, POC, modelling and experimentation, MVP, optimization, deployment and monitoring. I can work in multiple project roles: data scientist, technical leader, software developer, software architect, consultant. I prefer a mix of roles, especially technical. I have also created courses and manage/mentor multiple other data scientists in my unit. I am fluent in general and scientific Python (PyData stack, PySpark, Pydantic) and deliver high-quality software. My main domain experience is in the Retail and Consumer Packaged Goods industries.

Skills

Data Science Core

Master

Statistics
Exploratory Data Analysis (EDA)
Optimization
Model Tuning
Evaluation

Tabular Machine Learning

Master

Data Preparation
Regression
Classification
Clustering
Boosting Models

Time Series Analysis

Master

Forecasting
Structural Modelling
Econometrics
TS Features
Time-Dependent System Design

Interpretable ML

Advanced

Bayesian Modelling
Probabilistic Programming
PyMC
Black-Box Interpretability
SHAP
Causal ML

Deep Learning

Intermediate (Personal Projects)

Natural Language Processing (NLP)
Large Language Models (LLMs)
Computer Vision (CV)
Reinforcement Learning

Domain Expertise

Advanced

Data Science Consulting
Demand Planning
Retail
Consumer Goods (CPG)

Software Engineering

Advanced

Python
Software Architecture
API Design
Type Hinting
Linting
Automated Testing
Pydantic

Data Engineering

Advanced

Database Design
SQL
Pandas
PySpark
Polars
Xarray
Pandera

MLOps and ML Engineering

Advanced

Model Lifecycle
MLflow
Model Deployment
Kedro

DevOps

Advanced

Continuous Integration (CI)
Continuous Delivery (CD)
GitHub Actions
Azure DevOps
Docker
Linux

Cloud

Intermediate

Databricks Platform
Microsoft Azure
Google Cloud Platform

Other Languages

Beginner

Julia ♥
Rust
C++
R
Java
Web (JavaScript, HTML, CSS)

Work Experience (8)

Jul 2024 - Current

Lead Data Scientist

EPAM Systems, Client: Multinational Consumer Goods Company

Krakow, PL https://www.epam.com/

Development of end-to-end data products in Sales Excellence.

Conducted exploratory data analysis and KPI value-sizing.
Set up best practices for software development and data science lifecycles.
Completed a pilot for in-store sales execution with two retailers.
Created a Python library/platform to enable scale-up to new retailers.
Currently scaling up, improving ML models, and playing a data science technical leadership role.
Environment: Azure Databricks, Azure DevOps (Repos, Pipelines, Artifacts, Issues)

Jun 2023 - Aug 2024

Lead Data Scientist

EPAM Systems, Client: Global CPG (Consumer Packaged Goods) Company

Krakow, PL

Creation of a global next-gen forecasting and replenishment platform.

Co-led the technical development of the platform, responsible for the Data Science and Software Architecture streams.
Co-designed the architecture on top of the Azure Databricks platform.
Developed APIs for the core functionality, the ML library and time-travel databases.
Screened and interviewed candidates for both the core and market-specific teams.
Handed off technical leadership responsibilities and switched to a technology and data science consulting role.
Environment: Azure Databricks, GitHub Actions, Jira, Confluence

Aug 2021 - Jun 2023

Lead Data Scientist

EPAM Systems, Client: Global CPG (Consumer Packaged Goods) Company

Krakow, PL

Evolution of a simple predictive model for e-commerce into a platform for multi-channel sales forecasting.

Created a Bayesian model for the e-commerce channel that outperformed the existing solution by at least 10% in real-world use.
Designed and implemented a new data-gathering process that enabled an additional 20-50% MAPE improvement.
Designed the Python library architecture and, in collaboration with the architect, part of the overall architecture of a new forecasting platform.
Implemented ~60% of the total lines of code, covering the core API, reusable utilities, ML transformers and models, job and unit tests.
Ran experiments and applied various ML models for forecasting and optimization tasks.
Co-led decisions on data science and acted as "glue" between data science, data engineering and customer teams.
Re-implemented CI/CD pipelines on GitHub Actions and Databricks.
Environment: Azure Databricks, GitHub Actions, Jira, Confluence

Feb 2021 - Aug 2021

Senior Data Scientist

EPAM Systems, Client: Global CPG (Consumer Packaged Goods) Company

Krakow, PL

Development of predictive distribution models for countries in the APAC region.

Created additional features and applied model ensembling.
Achieved a 5% accuracy improvement over the existing solution.
Initiated a refactoring of the existing solution to incorporate the new features.
Environment: Azure Databricks, Jira

Jul 2019 - Jan 2021

Senior Data Scientist

EPAM Systems, Client: Multinational Consumer Goods Manufacturer

Minsk, BY

Identification and estimation of Key Business Drivers (e.g. price elasticity, promotion, distribution).

Won an inter-company "hackathon" using Bayesian statistical modelling.
Created a model (PyMC) and backend (FastAPI) for a proof-of-concept web tool.
Developed a large portion of the software stack, including data ingestion, ETL, modelling and pipelining.
Implemented KBD (key business driver) "extraction" via sensitivity analysis and simulation.
Created a model-based simulator for what-if analysis, comparison and explainability.
Optimized and parallelized heavy computation (MCMC/NUTS within PyMC).
Presented results to project leadership.
Environment: Azure Cloud, Azure Databricks

Sep 2018 - Jun 2019

Data Scientist

EPAM Systems, Client: North American Retailer

Minsk, BY

Forecasting of consumer demand to direct replenishment and prevent stock-outs.

Performed EDA, integrated external datasets (weather and demographics), and updated models for predicting sparse sales.
Experimented with multiple modelling approaches, including price bucketing and hierarchical reconciliation.
Built an explainable statistical optimization algorithm for product size distribution.
Developed a data/ML pipeline library on top of Airflow and Google Cloud, which the team then used for deployment.
Environment: Google Compute Engine, Google Cloud Composer, GitLab

Aug 2017 - Aug 2018

Data Scientist

EPAM Systems, Internal Workforce Planning Team

Minsk, BY

Worked with the workforce planning team to forecast and optimize employee utilization and time allocation.

Jun 2017 - Aug 2017

Data Scientist Intern

EPAM Systems, Client: News Aggregator

Minsk, BY

Helped implement a model predicting risks associated with entities from text, using classical NLP features, SVM and XGBoost.

Volunteer

22 Nov 2020 - Current

Primary Developer

pydantic-yaml

https://pydantic-yaml.readthedocs.io/en/latest/

YAML extension for the Pydantic library.

Full library implementation with tests.
CI/CD through GitHub Actions and ReadTheDocs.
Reviewed and approved contributions.
Over 170k downloads/week and used in production-grade projects (see https://pypistats.org/packages/pydantic-yaml).

21 Feb 2023 - Current

Primary Developer

pydantic-kedro

https://pydantic-kedro.readthedocs.io/en/latest/

Extension to serialize and load arbitrary Pydantic models through Kedro datasets.

Full library implementation with tests.
CI/CD through GitHub Actions and ReadTheDocs.

3 Oct 2021 - Current

Course Developer and Mentor

EPAM Systems

https://www.epam.com/

Beginner and Intermediate Course on Time Series Analysis

Co-authored a course on Time Series Analysis, partially based on Hyndman and Athanasopoulos's "Forecasting: Principles and Practice" (FPP3).
Mentored multiple students and established data scientists.

Education (2)

2018 - 2020

Master's (MSc)

Mathematical Modelling and Data Analysis

Belarusian State University

2014 - 2018

Bachelor's (BSc)

Applied Mathematics and Computer Science

Belarusian State University

Publications

1 Apr 2020

MS-VARX Model and Its Use to Analyze the Business Cycle of the Belarusian Economy [RU]

The Markov-switching vector autoregressive model with exogenous variables (MS-VARX) is a time series model. It is applied to business cycle analysis and to forecasting the growth rate of real GDP of the Belarusian economy, with the Economic Sentiment Indicator (ESI) as a leading indicator.

1 Aug 2018

A Comparative Analysis of the Hodrick-Prescott and Hamilton Filters for Business Cycle Turning Points Estimation for the Belarusian Economy [RU]

This paper compares the Hodrick-Prescott and Hamilton statistical filters for extracting the long-term trend and cyclical components of macroeconomic time series. It then compares the turning points of the real GDP and Economic Sentiment Indicator (ESI) cycles of the Belarusian economy obtained with each filter.

Languages

English

Native Speaker

Russian

Native Speaker

Polish

Beginner

Interests

Homelab and DIY

3D printing
CAD (Computer-Aided Design)
Microcomputers (Raspberry Pi)
Self-hosting
Networking

Gaming

Board Gaming
Board Game Design (4+ prototypes)
PC Games

Playing Instruments

Piano
Theremin

References

GDPR Consent

“I hereby give consent for my personal data included in the application to be processed for the purposes of the recruitment process in accordance with Art. 6 paragraph 1 letter a of the Regulation of the European Parliament and of the Council (EU) 2016/679 of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).”