"Full stack" data scientist, with a focus on econometric modelling using modern machine learning techniques. I can lead projects throughout all stages: requirements discovery, EDA, POC, modelling and experimentation, MVP, optimization, deployment and monitoring. I can work in multiple project roles: data scientist, technical leader, software developer, software architect, consultant. I prefer a mix of roles, especially technical ones. I have also created courses, and I manage and mentor multiple other data scientists in my unit. I am fluent in general and scientific Python (PyData stack, PySpark, Pydantic) and deliver high-quality software. My main domain experience is in the Retail and Consumer Packaged Goods industries.

Skills

Data Science Core

Master
  • Statistics
  • Exploratory Data Analysis (EDA)
  • Optimization
  • Model Tuning
  • Evaluation

Tabular Machine Learning

Master
  • Data Preparation
  • Regression
  • Classification
  • Clustering
  • Boosting Models

Time Series Analysis

Master
  • Forecasting
  • Structural Modelling
  • Econometrics
  • TS Features
  • Time-Dependent System Design

Interpretable ML

Advanced
  • Bayesian Modelling
  • Probabilistic Programming
  • PyMC
  • Black-Box Interpretability
  • SHAP
  • Causal ML

Deep Learning

Intermediate (Personal Projects)
  • Natural Language Processing (NLP)
  • Large Language Models (LLMs)
  • Computer Vision (CV)
  • Reinforcement Learning

Domain Expertise

Advanced
  • Data Science Consulting
  • Demand Planning
  • Retail
  • Consumer Goods (CPG)

Software Engineering

Advanced
  • Python
  • Software Architecture
  • API Design
  • Type Hinting
  • Linting
  • Automated Testing
  • Pydantic

Data Engineering

Advanced
  • Database Design
  • SQL
  • Pandas
  • PySpark
  • Polars
  • Xarray
  • Pandera

MLOps and ML Engineering

Advanced
  • Model Lifecycle
  • MLflow
  • Model Deployment
  • Kedro

DevOps

Advanced
  • Continuous Integration (CI)
  • Continuous Delivery (CD)
  • GitHub Actions
  • Docker
  • Linux

Cloud

Intermediate
  • Databricks Platform
  • Microsoft Azure
  • Google Cloud Platform

Other Languages

Beginner
  • Julia ♥
  • Rust
  • C++
  • R
  • Java
  • Web (JavaScript, HTML, CSS)

Work Experience (7)

Jul 2024 - Current
Lead Data Scientist
EPAM Systems | Client: Multinational Consumer Goods Company
 Krakow, PL https://www.epam.com/
Development of end-to-end data products in Sales Excellence.
  • Conducting exploratory data analysis and KPI value-sizing.
  • Playing a data science and technical leadership role in a chaotic multi-vendor environment.
  • Setting up best practices for software development and data science lifecycles.
  • Environment: Azure Databricks, Azure DevOps
Jun 2023 - Aug 2024
Lead Data Scientist
EPAM Systems | Client: Global CPG (Consumer Packaged Goods) Company
 Krakow, PL
Creation of a global next-gen forecasting and replenishment platform.
  • Co-led the technical development of the platform, responsible for the Data Science and Software Architecture streams.
  • Co-developed the architecture on top of the Azure Databricks platform.
  • Developed APIs for core functionality, the ML library, and time-travel databases.
  • Screened and interviewed candidates for both the core and market-specific teams.
  • Handed off technical leadership responsibilities and switched to a technology and data science consulting role.
  • Environment: Azure Databricks, GitHub Actions, Jira, Confluence
Aug 2021 - Jun 2023
Lead Data Scientist
EPAM Systems | Client: Global CPG (Consumer Packaged Goods) Company
 Krakow, PL
Evolution of a simple predictive model for e-commerce into a platform for multi-channel sales forecasting.
  • Created a Bayesian model for the e-commerce channel that outperformed the existing solution by at least 10% (real-world).
  • Designed and implemented a new data-gathering process that allowed an additional 20-50% MAPE improvement.
  • Designed the Python library architecture and, in collaboration with the architect, part of the architecture of a new forecasting platform.
  • Implemented ~60% of total lines of code within core API code, reusable utilities, ML transformers and models, job and unit tests.
  • Ran experiments and applied various ML models for forecasting and optimization tasks.
  • Co-led decisions on data science and acted as "glue" between data science, data engineering and customer teams.
  • Re-implemented CI/CD pipelines on GitHub Actions and Databricks.
  • Environment: Azure Databricks, GitHub Actions, Jira, Confluence
Feb 2021 - Aug 2021
Senior Data Scientist
EPAM Systems | Client: Global CPG (Consumer Packaged Goods) Company
 Krakow, PL
Development of predictive distribution models for countries in the APAC region.
  • Created additional features and applied model ensembling.
  • Achieved a 5% accuracy improvement over the current solution.
  • Initiated refactoring of the existing solution to incorporate new features.
  • Environment: Azure Databricks, Jira
Jul 2019 - Jan 2021
Senior Data Scientist
EPAM Systems | Client: Multinational Consumer Goods Manufacturer
 Minsk, BY
Identification and estimation of Key Business Drivers (e.g. price elasticity, promotion, distribution).
  • Won an inter-company "hackathon" using Bayesian statistical modelling.
  • Created a model (PyMC) and backend (FastAPI) for a proof-of-concept web tool.
  • Developed a large portion of the software stack, including data ingestion, ETL, modelling and pipelining.
  • Implemented KBD (key business driver) "extraction" via sensitivity analysis and simulation.
  • Created a model-based simulator for what-if analysis, comparison and explainability.
  • Optimized and parallelized heavy computation (MCMC/NUTS within PyMC).
  • Presented results to project leadership.
  • Environment: Azure Cloud, Azure Databricks
Sep 2018 - Jun 2019
Data Scientist
EPAM Systems | Client: North American Retailer
 Minsk, BY
Forecasting of consumer demand to direct replenishment and prevent stock-outs.
  • Performed EDA, integrated external datasets (weather and demographics), and updated models for predicting sparse sales.
  • Experimented with multiple modelling approaches, including price bucketing and hierarchical reconciliation.
  • Built an explainable statistical optimization algorithm for product size distribution.
  • Developed a data/ML pipeline library on top of Airflow and Google Cloud; our team deployed using this library.
  • Environment: Google Compute Engine, Google Cloud Composer, GitLab
Aug 2017 - Aug 2018
Data Scientist
EPAM Systems | Internal Workforce Planning Team
 Minsk, BY
Worked with the workforce planning team to forecast employee utilization/time allocation.

Volunteer

11/22/2020 - Current
Primary Developer
pydantic-yaml
YAML extension for the Pydantic library.
  • Full library implementation with tests.
  • CI/CD through GitHub Actions and ReadTheDocs.
  • Reviewed and approved contributions.
  • ~90k downloads/week and used in production-grade projects.
2/21/2023 - Current
Primary Developer
pydantic-kedro
Extension to serialize and load arbitrary Pydantic models through Kedro datasets.
  • Full library implementation with tests.
  • CI/CD through GitHub Actions and ReadTheDocs.
- Current
Course Developer and Mentor
EPAM Systems
Beginner and Intermediate Course on Time Series Analysis
  • Co-authored a course on Time Series Analysis, partially based on Hyndman's FPP3.
  • Mentored multiple students and established data scientists.

Education (2)

2018 - 2020
Master's (MSc)
 Mathematical Modelling and Data Analysis
Belarusian State University
2014 - 2018
Bachelor's (BSc)
 Applied Mathematics and Computer Science
Belarusian State University

Publications

1 Apr 2020
The Markov-switching Vector Autoregressive model with eXogenous variables is a mathematical time series model. It is applied to the problem of business cycle analysis and forecasting of the growth rates of the real GDP of the Belarusian economy, using the Economic Sentiment Indicator (ESI) as a leading indicator.
1 Aug 2018
This paper presents a comparative analysis of the Hodrick-Prescott and Hamilton statistical filters for extracting the long-term trend and cycle components of macroeconomic time series. The turning points of the real GDP cycles and of the Economic Sentiment Indicator (ESI) of the Belarusian economy, as obtained with each filter, are compared.

Languages

English

Native Speaker

Russian

Native Speaker

Polish

Beginner

Interests

Homelab and DIY

  • 3D printing
  • CAD (Computer-Aided Design)
  • Microcomputers (Raspberry Pi)
  • Self-hosting
  • Networking

Gaming

  • Board Gaming
  • Board Game Design (4+ prototypes)
  • PC Games

Playing Instruments

  • Piano
  • Theremin

References

GDPR Consent

I hereby give consent for my personal data included in the application to be processed for the purposes of the recruitment process in accordance with Art. 6 paragraph 1 letter a of the Regulation of the European Parliament and of the Council (EU) 2016/679 of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).