Senior Staff Cloud Engineer, Ocado Group
Feb 2020 - Present
In 2020 I was promoted to a cross-team role at the staff engineer level. I report to the Engineering Manager for three AWS cloud teams. My time is split between a 'home team' and strategic initiatives that shape the business's use of AWS and support our growing number of developer platform teams. My remit is to set architectural principles through RFCs, identify potential risks and points of failure within the teams, and challenge local decisions when they stray from our high-level goals and culture.
Reduced costs and support effort by migrating our on-premises robotic control system to the cloud. Video: https://www.youtube.com/watch?v=hxgo_CdRF5k
Reduced latency, saved compute costs and improved the quality of the data behind our Ecommerce conversion metrics by implementing a bot control system using AWS Web Application Firewall (WAF). This required close collaboration with senior Ecommerce engineers to integrate it with the webshop front end, and it now blocks over 500k requests a day. Because latency and DDoS attacks are a major business risk, and a badly tuned rule can just as easily block legitimate users, I introduced guard rails around rule changes. Live production traffic was impossible to replicate in test environments, so I required every rule change to be analysed in 'count' mode before being applied, accepting slower releases as the trade-off.
Matured our security posture by helping launch a new team that implements security tooling and best practices on AWS across the whole organisation.
Maintain our security posture as the single-threaded owner of the company's overall AWS security architecture, spanning more than 150 AWS accounts and 17 regions, a seven-figure cloud spend, and the differing regulatory requirements of our 11 international partners.
Set our AWS identity strategy to implement change management policies, reduce the blast radius of our teams' permissions, and enforce separation of duties and production controls.
Helped teams address security faults by building a security issue management system that aggregates vulnerabilities and security findings in AWS Security Hub and raises JIRA tickets for the owning teams to resolve.
Improved reliability and reduced cost by migrating a business stream (~20 teams) from an on-premises Kubernetes platform to AWS.
Captured lessons from incidents by running post-incident retrospectives that examine how failures were handled and track the resulting improvements.
Align cloud teams' architectures by writing RFCs, taking part in design reviews, and contributing to technical governance for my area.
Help development teams evolve their tech stacks by evaluating newly released cloud products against our existing standards for maturity, stability and maintainability.
Drive PCI DSS 4.0 compliance for cloud teams by collecting and standardising the requirements, and the evidence of compliance, for processing card payments as both a merchant and a service provider, and by demonstrating that compliance to our third-party auditor. PCI DSS strongly influences our security architecture for handling sensitive data, systemic access control, least-privilege enforcement and maintaining non-repudiable audit trails. As such, many of these requirements must become product capabilities on our roadmap.