Data Engineer, Pulsifi
Mar 2023 - Apr 2026 · 3 years 1 month
Supported a production Change Data Capture (CDC) streaming pipeline (Apache Beam/Dataflow) by provisioning and maintaining its cloud infrastructure, including Dataflow Flex Template deployments on Artifact Registry. Implemented and maintained validation of BigQuery primary and foreign key constraints across tables within the same dataset; because BigQuery does not enforce these constraints, the validation ensured the referential integrity that the query optimizer relies on for join optimizations, delivering a significant reduction in bytes scanned and query costs.
Built an end-to-end Cloud Function from scratch integrating GCP Pub/Sub, AWS SQS, and BigQuery to expose talent analytics usage data; provisioned multi-region infrastructure (Cloud Functions, IAM bindings, Pub/Sub push subscriptions) using Pulumi across sandbox, staging, and production environments.
Deployed a self-hosted Langfuse LLM observability platform on GKE, managing the full infrastructure provisioning: a GKE Autopilot cluster, Helm chart deployment, a Cloud SQL (PostgreSQL) backend, and Secret Manager integration, with environment secrets injected dynamically into Helm values for environment-aware deployments.
Orchestrated a Freshdesk ML pipeline end-to-end using Dagster, including asset dependency design, scheduling, SQL query refinements, and dynamic Pydantic-based environment configuration for accurate multi-environment pipeline runs.
Built a BigQuery data validation framework from scratch using Pandera, standardizing schema validation across datasets with multi-environment CI/CD support (sandbox, staging, production, and EU region); migrated the project to a Polylith monorepo architecture and standardized GCP client usage.
Designed and provisioned BigQuery datasets, materialized views, scheduled queries, and day-partitioned tables across multi-region deployments (SG and EU), supporting BI and analytics workloads for customer success, talent acquisition, and finance; managed fine-grained IAM access at dataset and table level for internal teams and service accounts.
Maintained a PostgreSQL-to-BigQuery data reconciliation function with OIDC authentication, EU region support, and unit test coverage; developed and maintained an internal back-office application on Cloud Run with role-based file export restrictions, centralized authorization config, and access control unit tests.
Standardized CI/CD pipeline patterns across the data engineering team; led the migration of multiple projects from Poetry to uv; adopted Pulumi as the primary IaC tool, replacing Terraform; implemented OIDC-based short-lived credential authentication; established semantic versioning and automated release workflows using python-semantic-release.