Sergio Santiago

Background

About

Staff Site Reliability Engineer with 20+ years building mission-critical infrastructure at scale. Expert in cloud infrastructure modernization, Kubernetes platform engineering, and SRE best practices. Strategic leader focused on designing resilient systems, reducing operational complexity, and enabling teams through self-service infrastructure. Proven track record architecting systems supporting 100,000+ concurrent users while optimizing costs and driving organizational reliability culture.

Work Experience

Staff Site Reliability Engineer, Doctolib
Jan, 2024 - Present
Leading infrastructure modernization and reliability initiatives for Europe's largest health tech platform serving 100,000+ concurrent users across France, Germany, and Italy.
- Engineered an Argo Workflows ecosystem as a complete replacement for the legacy kubetoolbox, with production-grade automation (Azure AD SSO, GitHub notifications, ECR integration, event sensors, CLI tooling), enabling teams to automate complex operational tasks at scale
- Designed a configurable validation ecosystem on Datadog Deployment Gates, with 10+ dedicated monitors replacing static queries, improving canary rollout reliability and deployment confidence
- Co-authored the 'Paradigm Shift' strategy, moving from centralized to decentralized, library-based configuration, enabling team autonomy and reducing SRE review bottlenecks
- Implemented an EKS workload optimization strategy to reduce costs, redesigned monolith pod sizing, and optimized preproduction environments
- Advanced the production-grade canary deployment strategy and ArgoCD integration, strengthening release confidence and rollback capabilities
Senior Site Reliability Engineer, Templafy
Nov, 2021 - Mar, 2024 (2 years 4 months)
Architected and maintained highly available infrastructure for a content enablement platform while scaling operations and improving reliability culture.
- Designed monitoring, alerting, and incident response frameworks for 99.99% uptime targets
- Led disaster recovery planning and business continuity strategy
- Mentored engineering teams on SRE principles and operational excellence
- Owned capacity planning and cost optimization initiatives
Cloud Platform Engineer, The Adecco Group
Jun, 2021 - Nov, 2021 (5 months)
Subject matter expert for Kubernetes and CNCF technologies on the cloud platform team, supporting enterprise application deployments.
- Automated cloud infrastructure deployment using Azure DevOps and Terraform
- Designed CI/CD pipelines supporting agile development teams
- Containerized applications using Helm and Kubernetes
Azure Rapid Response Senior Engineer, Microsoft
Jun, 2019 - May, 2021 (1 year 11 months)
Supported critical enterprise customers and top startups on the Azure platform as a subject matter expert for Azure Core Platform domains.
- Specialized in Azure Compute, Linux, and Kubernetes (AKS)
- Provided end-to-end Azure solution support for enterprise customers and startups
- Member of global Azure Rapid Response team (EMEA, Americas, Asia)
Senior Telco NFV Engineer, VMware
Nov, 2016 - Mar, 2018 (1 year 5 months)
Provided carrier-grade support for Telco NFV platforms to communications service providers globally.
- Supported VMware vCloud NFV stack for CSP customers
- Expert in vSphere, vSAN, NSX, vCloud Director, and vRealize
- Delivered mission-critical support for telecommunications infrastructure

Skills

Cloud Infrastructure & Platforms
- AWS (EKS, RDS/Aurora, ElastiCache, MSK)
- Azure (AKS, App Services, Service Bus)
- Kubernetes
- Docker
- Multi-cloud architecture

Infrastructure-as-Code
- Terraform
- Helm
- ArgoCD
- Ruby DSL
- GitOps
- Ansible

Automation & Reliability Engineering
- Argo Workflows
- CI/CD Pipelines
- SRE Practices
- Incident Management
- Disaster Recovery
- Capacity Planning

Observability & Monitoring
- Datadog
- Prometheus
- Grafana
- Application Performance Monitoring
- Metrics & Alerting

Leadership & Strategy
- Team Leadership
- SRE Culture
- Infrastructure Modernization
- Cost Optimization
- Architectural Design

Virtualization & Networking
- VMware (vSphere, NSX, vCloud)
- Network Security
- Load Balancing
- DNS/DHCP

Education

Electronic Engineering and Telecommunications, Bachelor of Science, University of Aveiro
Jan, 1988 - Jan, 1994

Certificates

CKA: Certified Kubernetes Administrator, The Linux Foundation
Issued on: Mar 01, 2022
CKAD: Certified Kubernetes Application Developer, The Linux Foundation
Issued on: Mar 01, 2022
VMware VCAP6-DCV: Data Center Virtualization Deployment, VMware, Inc
Issued on: Jun 01, 2017
CCNA: Routing and Switching, Cisco
Issued on: Apr 01, 2014
ITIL v3: Foundation, ITIL
Issued on: Jun 01, 2011