Joseph Ralph


Platform Reliability Engineer (SRE/DevOps)


Background


About


I am a highly enthusiastic Engineering Lead specialising in cloud architecture and DevOps practices.

I have been working as a DevOps engineer with a leadership mindset and attitude for the last 6 years, and have firmly demonstrated my technical knowledge and aptitude for leading engineers in each of my roles.

I have been fortunate to cover a wide range of technologies and processes in my roles, supporting varied business needs and coming up with further ideas to enhance the value the business provides. I am able to work with engineers at both a team and an individual level to help bring in a DevOps culture and foster positive improvements across the full engineering department, sometimes extending into data analytics and IT operations.

One of the major challenges businesses face in building a strong development environment and culture is that they don't focus on engineering needs and tend to drift heavily towards business objectives. I've worked hard on honing my ability to identify these areas and help the business and its engineers work together to create a culture that works for everyone and is built by everyone.

I consider myself a calm and logical thinker when it comes to solving problems. I enjoy enabling engineers and engineering teams to produce their absolute best work without compromise, while meeting business goals and providing value. I have a passion for working as part of a team and helping other team members grow with me, but I can also use my initiative and enjoy a good challenge.

Work Experience


  • Lead Platform Reliability Engineer

    Jul, 2021 - Present

    Once the PRE team started hiring new members, I was promoted to Lead Platform Reliability Engineer.

    Unlike my previous roles at Gymshark, this one heavily involved enhancing and enabling the rest of the engineering teams to do their work, as well as parachuting in to do any glue work where necessary.

    I worked extensively on understanding the business objectives, translating them into technical ideas, and presenting these to the team. Some technical decisions, such as the decision to refocus work, may not make sense without the business objectives behind them. I honed my ability to understand these decisions and worked with my team to ensure everyone understood what the business wanted and its goals. I also spent a lot of time identifying business value and discussing highly technical topics at a business level, providing a clear risk/reward analysis along with structured objectives and outcomes, and detailing the value and how and when it would be realised.

    Another part of this role was providing technical architecture guidance to all of the engineering teams in Gymshark, including my own. I have been involved in multiple cross-team projects, ranging from simple builds to entire ecommerce site rebuilds, and have provided multiple avenues of investigation for the teams to pursue, along with the knowledge and tools necessary to do so. This work enables individual engineering teams to make highly informed decisions on the best tech/approach for the task at hand, and gives stakeholders confidence that we have considered as much as we can and identified any potential blockers or issues that might come up.

    I also worked heavily with our engineering leads team to help influence change and encourage a DevOps culture and mindset across engineering. One of my big wins here was identifying the ownership of technical areas and how these areas are influenced by business decisions. To help with this, I came up with a Policies, Standards, & Guidelines structure that helped convert business Policies (such as GDPR, data retention, logging & alerting, and infrastructure identification/tagging) into more engineer-focused Standards and Guidelines. In this approach, Policies are the 'What We Have To Do', Standards are the 'How We Will Do It', and Guidelines are the 'How Can We Make It Better'.

    • Worked with our Shop team to deploy and optimise a rebuild of the entire Gymshark ecom website in both Kubernetes and Serverless, providing guidance and solutions to various challenges along the way; both infrastructures could handle well over 100,000 requests per second with almost no warm-up period

    • Restructured our AWS authentication setup to use the company-wide SSO through Okta in a fully automated way

    • Reorganised our AWS infrastructure and accounts under a single organisation with full auditing and logging capabilities

    • Created an automated sandbox provisioning setup to enable engineers to spin up their own sandbox accounts for proofs of concept and research

    • Hired more engineers for my team and supported them at various levels of progression

    • Helped hire junior engineers and ensured they received the mentoring and training they needed

    • Heavily refined my ability to translate highly technical, complex concepts and ideas into business-compatible values and objectives

    • Worked on business-level strategy for the PRE team and the projects we take on

    • Came up with our Policies/Standards/Guidelines document structure to provide both technical and non-technical overarching goals and general guidance that teams need to follow, or can follow, to meet business needs

  • Senior Platform Reliability Engineer

    Nov, 2020 - Jul, 2021 (8 months)

    After proposing and getting approval for a DevOps team within Gymshark, I was promoted to Senior Platform Reliability Engineer (I was the only engineer on the team at the time), and work started on setting up the team and situating it within the rest of the engineering department.

    During this time I was heavily focused on assisting teams with architectural decisions and design choices, as well as starting the very early stages of our Engineering Platform.

    I worked heavily on my own to provide a strong and stable Kubernetes cluster that offered teams a lot of out-of-the-box benefits, such as security, network segregation, autoscaling, automatic logging, tracing, and metrics (using Datadog integrated into our Kubernetes cluster). The goal of this initial cluster was to provide an alternative to the serverless architecture we had at the time. It was aimed mostly at our non-event-based workloads (such as scheduled tasks and web interfaces) but later expanded to encompass some of our event-driven workloads.

    Once this started to gain traction and people were using the setup, I set out to refine it and provide even more to the engineers. I strove to build an engineering platform that makes engineers' jobs less complex when it comes to security, networking, deployments, and monitoring/visibility. This sparked the initial phase of our engineering platform, which spans AWS, Kubernetes, Datadog, Codefresh, and the users' machines themselves.

    I worked hard on designing a Kubernetes cluster setup, along with a CI setup, that enabled engineers to deploy with confidence while still having the flexibility to do what they wanted without too much red tape. This was a continuous task, and it eventually became one of the core projects of the PRE team.

    • Became the first member of Gymshark's Platform Reliability Engineering (PRE) Team.

    • Built and presented the business case for PRE across the engineering department of Gymshark

    • Outlined the who, what, and why of PRE and DevOps

    • Started a DevOps community of practice

    • Rebuilt our entire Docker & Kubernetes offering in EKS to provide the level of resilience we require for high-volume and high-throughput sale periods

    • Built the documentation for our Engineering Platform Stack (Kubernetes, AWS, Docker, Codefresh, Datadog) to provide detail for engineers on how to get started with these services and how we configure them at Gymshark to meet best practice and business requirements

  • Senior Software Engineer

    Nov, 2020 - Jan, 2021 (2 months)

    My initial role at Gymshark was in the Software Engineering team. As soon as I joined, I took it upon myself to focus my efforts on the operational aspects of the team and, later on, of engineering as a whole.

    When I started, the deployment process was very manual: Lambda functions were deployed using the Serverless Framework, alongside a very small amount of Terraform, and engineers ran these tools on their local machines to deploy to staging or production. Within the first two months of joining, I had presented a business case for introducing CI/CD tooling and had researched some of the potential CI/CD providers we could choose from.

    We decided to go with a CI/CD provider called Codefresh. I championed this choice because of its flexibility. Unlike most CI/CD platforms, Codefresh does not need to be tied to a git repository and can be triggered by various other means (events, webhooks, API calls) as well as git actions. This gave the teams the flexibility to use CI/CD for more than just deploying code. I set up quite a few pipelines to run on a schedule to generate and cache Docker images, and to build multiple versions of tooling so that we were able to support the different versions engineers were using.

    During this time as a Senior Software Engineer, I was having talks with the Engineering Director about building out some kind of DevOps function. I had presented the idea for this, and some of the reasons why this would be a good idea, and got sign-off on this being a direction the company was interested in.

    Before officially moving to what would become the PRE Team, I took it upon myself to shift my focus to the AWS environments and the deployment processes. During this time, I worked a lot on ensuring that we were using the AWS accounts securely, enforcing things such as 2FA and password rotation.

    I also spent a lot of time on developing my own skills, taking an NLP (neuro-linguistic programming) course to help better understand and communicate with people and with the business.

    • Worked on a highly 'serverless' setup involving mostly Lambda, API Gateway, Kinesis, and Step Functions.

    • Introduced Docker & Kubernetes for processes that are less event/batch-process based, such as machine learning, cron-jobs, and internal web services

    • Helped introduce and start a DevOps culture.

    • Introduced Gymshark to CI/CD pipelines and their benefits. Set up these pipelines for multiple teams, and provided guidance to the teams so that they could own them.

    • Undertook in-depth NLP training to understand and adapt to different people: the approaches they take to life and how they understand it, how they react to different stimuli, and how to engage with all types of people effectively.

  • Lead DevOps Engineer

    Nov, 2016 - Jan, 2020 (3 years 1 month)

    PMConnect is a mobile payment services company providing payment and web content solutions to companies such as the NBA, WWE, Opta, and more. They provide Direct Carrier Billing services that enable users to pay for services with a single click using a mobile data connection.

    I started at PM Connect as a Senior Developer, brought on board to help with their existing PHP applications, but quickly found I could provide the company much more in a DevOps role. Upon starting at the company, I made it my mission to move their workflow and deployment structure from their “ssh + git pull” setup to something that enabled much easier growth and flexibility. In my first week I set up GitLab as our VCS and used its built-in pipelines to replace our manual deployment process. I then gradually moved this towards AWS ElasticBeanstalk deployments to give us a quick solution for autoscaling and rollbacks.

    Although ElasticBeanstalk is fairly old and slow, it worked as a stop-gap while I investigated and tested a deployment setup using Terraform and Packer (with Ansible). This setup was again a fairly quick win, but it was not the most cost-effective: some of our services were very small and needed little hardware to run, so we had under-utilised instances. After using the Terraform/Packer setup for around 6 months, my goal was to investigate containerisation and the benefits it could bring us. Our deployments took around 30 minutes per branch, so I wanted to improve this as much as possible. I settled on Nomad instead of Kubernetes as our orchestrator because it had a much smaller learning curve for our developers and the server setup was all done with a single binary, so it was very easy to get up and running. This system proved much more efficient than our previous setup, and we saw infrastructure cost savings and performance increases from using thin containers. We also got pipeline times down to around 10 minutes, including tests.

    We’ve been using Nomad in production for nearly a year and a half now, but as we grew the DevOps team we decided it was time to revisit Kubernetes, as I strongly believed it could provide an even more flexible and robust solution, as well as easier hiring thanks to its community adoption. We’ve been running a Kubernetes cluster alongside our Nomad cluster while providing self-created training workshops for developers to ease the onboarding process. We’re confident that we’re almost in a position to switch all of our production services over to Kubernetes without any downtime, and we’re happy that the Kubernetes cluster provides a solid level of security and compliance for our billing applications, as well as the flexibility to deploy quickly and reliably.

    • Introduced AWS ElasticBeanstalk to help move away from FTP-based deployments to a single server.

    • Introduced Load Balancing.

    • Introduced CI/CD pipelines to enable easy and repeatable deployments.

    • Set up and managed a Terraform-based infrastructure to replace ElasticBeanstalk and enable more custom deployment setups.

    • Migrated the existing Terraform infrastructure over to HashiCorp Nomad and Docker containers.

    • Migrated the existing Nomad setup over to Kubernetes.

    • Trained multiple dev teams on how to use Docker for local and production development.

    • Helped fill in for the CTO position for 6 months.

    • Worked with upper management in bringing in a DevOps culture to help remove the siloed way of thinking about work and projects.

    • Promoted project ownership by developers, where developers are responsible for the project from conception to deployment to long-term monitoring.

    • Managed Jira and Confluence, including their initial setup, for 3 development teams.

    • Assisted with the setup of SCRUM and Kanban development procedures.

  • Web Developer

    Jun, 2016 - Nov, 2016 (5 months)

    AdHere/JustSayPlease/Stickee is a creative technology company providing comparison services and technology services.
    I worked at AdHere for a short time in 2016, helping build and innovate on the price comparison software they produce. This primarily involved developing software to track the price of mobile devices, and investigating the use of machine learning and pattern matching to create a semi-automated way of grouping devices together so their prices could be compared. I also moved the application's infrastructure over to AWS and set up an Elasticsearch cluster to help with comparison.

    • Maintained an existing mobile phone deal comparison service that ran for various mobile phone providers enabling them to provide competitive pricing for their customers. This included companies such as Car Phone Warehouse.

    • Conceptualised and created a new data collection system to enable more efficient and rapid collection of details from various mobile phone retailers and contract providers.

    • Worked on machine learning to automatically identify which phones were being sold, so they could be grouped by phone and exact specification without much manual intervention. Creating a system that was around 80% accurate was a great achievement of mine, especially considering that many of the phones we collected data on were simply named 'iPhone' or 'iPhone 64GB'; the system worked out which version of the device was being sold from the name and specification information.

  • PHP Developer

    Jan, 2015 - Nov, 2016 (1 year 10 months)

    CHKS is a provider of healthcare intelligence and quality improvement services.
    While working for CHKS, I was responsible for leading a small team on a full redesign and redevelopment of their NHS constituency management applications. This involved deciphering a 10-year-old PHP application written by someone no longer at the company, and putting together a plan for building a replacement. I also frequently assisted the other development team with big data work on NHS patient data, including algorithms to analyse and compare this data against other hospitals. I was also responsible for setting up our on-prem server infrastructure.

    • Maintained a 10-year-old PHP application running on PHP 3.

    • Led the development and technology stack choices behind a replacement application for NHS constituency management. This involved a React frontend with a PHP backend. We also used MySQL and Elasticsearch for data storage and rapid analytics on membership/constituency data, providing information on the balance of ethnic groups and minorities that the NHS membership group needed to include to be representative.

  • Web Developer

    Jan, 2013 - Jan, 2015 (2 years)

    Bullivant Media are the owners of the Standard and Advertiser series of newspapers in the Midlands.

    • Worked on designing and redeveloping the entire series of websites.

    • Built a support ticketing system for the internal IT team.

    • Developed a careers/jobs listing system for job posting and advertising.

    • Developed a property sales site years ahead of On The Market, on a much smaller budget. The project received no direct marketing and ultimately failed.

  • Apprentice IT Technician

    Jan, 2011 - Jan, 2013 (2 years)

    RSA Academy Arrow Vale is a high school in Redditch.

    • Helped with the setup of a network revamp using Windows and Microsoft Systems Manager and associated tools.

    • Built a bespoke IT support ticketing system.

    • Assisted with day-to-day problem solving, helping teachers and students with IT issues.

    • Taught a few optional lessons to some students on IT and computer hardware.

Skills


  • Software Engineering

    GoLang

    PHP (Symfony, Laravel, Cake)

    Bash Scripting

    SQL

    JavaScript

    NodeJS

    React

    VueJS

    Lambda

    Multithreading/Parallel Processing

    Stack/Heap Analysis

    Testing

    Benchmarking

    Performance Analysis

    Git

    CI/CD

  • Data Engineering

    Elasticsearch

    MySQL

    PostgreSQL

    Redis

    AWS Glue

    AWS Athena

    Graph Databases + GraphQL

    NoSQL

  • DevOps/SRE

    Docker

    Efficient Containerisation

    Kubernetes Management

    Kubernetes Setup

    Database Clustering

    Linux (Alpine, Ubuntu, scratch)

    Windows Server management

    Basic Windows containers

    Basic ARM containers

    Network Security

    Application Security and WAF Setup

    Docker image security

    Security monitoring

    Docker security best practices

    Infrastructure Monitoring

    Terraform and CloudFormation

    Immutable Infrastructure

    CI/CD

  • Project Management

    SCRUM/Kanban process management

    JIRA

    Confluence

    Notion

    Requirement gathering

    Identifying business value

    Documenting challenges and process optimisation points

    Fixing problem causes not symptoms

    Clear communication of outcomes and values to stakeholders

  • Team Leadership/Technical Leadership

    Recruitment of Staff

    Constructive feedback

    NLP

    Business goal identification

    Setting team member targets

    Working to deadlines and restrictions

    Identifying business improvements

    Helping with agile processes

    Mentoring and support

    Translating technical knowledge and ideas to business actionable objectives/values/goals

  • Technical Architecture

    Serverless/Lambda Architecture

    Complex application architecture

    Development workflow architecture

    High performance architecture decision making

    Architecture security

    Microservices

    Nanoservices

    Containerisation

    Serverless

    FaaS

    GDPR Compliance

Education


  • IT, Software, Web & Telecoms, Apprenticeship, TDM (The Development Manager)

    Jan, 2011 - Jan, 2013

    Advanced Level Apprenticeship in IT Software, Web & Telecoms

    Intermediate Level Apprenticeship in IT Software, Web & Telecoms

    CompTIA Server+

    CompTIA Security+

    CompTIA Network+

  • Programming, Games Development, IT Hardware, College Study + Evening Courses, N.E.W College Bromsgrove

    Jan, 2009 - Jan, 2011

    BTEC Games Development (Course failed due to college server crash and loss of work)

    CompTIA A+

    Web Design

    Web Development

  • English, Maths, IT BTEC, Science, School, North Bromsgrove High School

    Jan, 2007 - Jan, 2009

Interests


  • DJ'ing

    Tech House, Techno, Trance

  • Gaming

    Strategy, City/Civilisation Builder Games, Virtual Reality, FPS Games, Roguelike/Roguelite

  • Food

    Pizza, Steak

  • Cars