Leonid (Leo) Pekelis, Data Scientist
San Francisco, California, US
SUMMARY
I'm an applied science leader with AI/ML, data stack, and full-stack development experience. I've managed research and data orgs, owned high-impact enterprise solutions, and prototyped novel research in production systems. I prefer working at places with a mission and engaging with the community.
EDUCATION
Stanford University 2010-12-31 — 2016-12-31
Doctor of Philosophy (PhD) - Statistics
Stanford University 2008-12-31 — 2010-12-31
MS - Statistics
Stanford University 2004-12-31 — 2008-12-31
BASH - Economics, Math
SKILLS
Technologies: Python, R, SQL, Javascript, React, Java, Matlab
Disciplines: Statistics, AI, Machine Learning, Economics
EXPERIENCE
Gradient | Chief Scientist 2024-01-01 — Present
https://gradient.ai/

Gradient automates complex data workflows for enterprise using AI. I lead the applied science team, where we develop and deploy prototypes using language models and agentic systems. I also engage with customers and the AI community through blog posts, webinars, and speaking events.

  • Leading and developing AI solutions and evals for $XMM of contracts
  • First 1M context length Llama-3 70B model. Downloaded by over 1K developers on HuggingFace. 4th place on RULER long context benchmark (at the time of publishing), https://x.com/Gradient_AI_/status/1786876543509434473
  • Finance Specific Large Language Model, https://gradient.ai/finance-whitepaper
Eppo | Technical Advisor 2021-06-01 — 2025-06-01
https://www.geteppo.com/

Eppo is an experimentation and feature management platform. I advised on statistics and ML research ideas, primarily around sequential testing, through its acquisition by Datadog. I also helped with early hiring and community engagement.

CloudTrucks | Head of Data 2020-04-01 — 2023-12-31
https://www.cloudtrucks.com/

CloudTrucks is a business management backend and marketplace for small freight trucking fleets. I was the first data hire, eventually leading the Data org. We owned analytics (+eng), data eng, data science, and machine learning. I also built and maintained data infrastructure and solutions for routing optimization, business intelligence, and spot market and contract pricing.

  • Managed a manager, tech leads and ICs. The data org grew to 10 people at its peak.
  • Led data from Seed stage to Series B.
  • Launched scheduling tool for drivers, https://www.freightwaves.com/news/cloudtrucks-launches-scheduling-tool-for-drivers
  • Managed freight recommender system initiative, https://www.cloudtrucks.com/blog-post/personalized-load-recommendation-system
  • Python (Django, Celery), SQL, Javascript (React), GCP (Airflow, BigQuery, Redis, Vertex AI)
Opendoor Labs Inc | Data Scientist 2017-01-31 — 2020-01-01
https://www.opendoor.com/

Opendoor is an online buyer and seller of residential real estate. I developed and deployed algorithms to optimize a range of business practices. Projects included: algorithmic list pricing of $X B of home inventory; risk + demand modeling of inventory; micro forecasts and counterfactual analysis; recommending homes to buyers; and reactive UIs for interpretable ML and expert belief updates.

  • Facilitating customization and proliferation of state models, Patent # 11556701 US
  • Updating projections using listing data, Patent # 11164199 US
  • Python, SQL, Javascript (React)
  • Gradient Boosted Trees, Survival Analysis, Causal Inference
Optimizely | Statistician 2015-01-31 — 2020-04-01
https://www.optimizely.com/

Optimizely provides A/B testing, personalization, and content management software as a service. I co-authored Stats Engine, a sequential testing procedure with multiple testing corrections for online evaluation of randomized controlled trials (RCTs), and wrote the production implementation to power the statistics of all commercial products ($XX MM ARR). I served as a statistical voice internally and externally through conferences, blog posts, and webinars. I was a consultant from 2017 to 2020.

  • Acceleration of A/B/n Testing under time-varying signals, CODE MIT 2018 accepted talk - speed up A/B/n testing through dynamic traffic allocation, while maintaining always valid inference
  • Implementing a reset policy during a sequential variation test of content, Patent # 9760471 US
  • Python, Java
Stanford University | Instructor 2014-09-21 — 2014-12-15

I taught Mathematics of Sports - STATS 50 @ Stanford. Topics include sports physics, optimal strategies, and statistics of analyzing outcomes and predictions.

Earnest Inc | Risk & Research 2013-09-30 — 2014-09-30

Earnest is a fintech lender. I designed the initial underwriting platform and process, and trained a team in its daily use.

Pixar Animation Studios | Research Intern 2013-07-01 — 2014-04-30

I developed an algorithm to efficiently importance sample light reflected from hair.

  • Statistical hair scattering model, Patent # 9905045 US
PUBLICATIONS
Training 1M Context Length Models on NVIDIA L40S GPUs 2024
Crusoe Energy
The Haystack Matters for NIAH Evals 2024
Gradient AI
Widening the Wingspan of Foundational Models in Finance 2024
Gradient AI
Building a Business Intelligence Backend 2021
https://www.cloudtrucks.com/blog-post/building-a-business-intelligence-backend
Taming missing features at serving time 2019
https://www.opendoor.com/w/blog/taming-missing-features-at-serving-time
Acceleration of A/B/n Testing Under Time-varying Signals 2018
Conference on Digital Experimentation
p-Hacking and False Discovery in A/B Testing 2018
SSRN
Peeking at a/b tests: Why it matters, and what to do about it 2017
ACM SIGKDD Conference on Knowledge Discovery and Data Mining
A Data-Driven Light Scattering Model for Hair 2015
Pixar Technical Memo #15-02
The brain stethoscope: A device that turns brain activity into sound 2015
Epilepsy & Behavior, Volume 46, 53 - 54
Survival benefit for adjuvant radiation therapy in minor salivary gland cancers 2015
Oral oncology 51.5
Deterministic matrices matching the compressed sensing phase transitions of gaussian random matrices 2012
Proceedings of the National Academy of Sciences, 110(4)
VOLUNTEERING
Speaker | SIGGRAPH 2024 - NVIDIA Presents Generative AI Day 2024-07-31 — 2024-07-31

How Gradient Extended Llama 3’s Context Length to 1M on Crusoe

Speaker | AI Engineer World's Fair 2024 2024-06-25 — 2024-06-27

Training Albatross: An Expert Finance LLM

Speaker & Panelist | AAAI 2024 Spring Symposium on Clinical Foundation Models @ Stanford 2024-03-25 — 2024-03-27

Using the Gradient AI Stack to Transform Healthcare Operations

Panelist | Statistics Empowering Data Science (SEEDS) Conference 2024 2024-01-12 — 2024-01-13

Industry Panel

Speaker | 7th International Business Analytics Conference (iBAC) @USC 2019-03-08 — 2019-03-08

A/B Testing: Examples of hypothesis testing in industry

Speaker | MCQMC 2016-08-14 — 2016-08-19

Data-driven light scattering models: An application to rendering hair

LANGUAGES
English (Native Speaker), Russian (Working Proficiency), Spanish (Working Proficiency)
INTERESTS
Sports [Climbing, Surfing]