Anh Hoang Chu
Software Engineer & Data Engineer

About

I'm a Data Software Engineer who is passionate about working with data and using technology to bring data insights closer to business users. I have experience in data engineering, big data, data science, data warehousing, and back-end databases for web applications on GCP, Azure, and AWS. My tech stack includes Python, SQL, Linux, PySpark, Kafka, Airflow, Tableau, Kubernetes, BigQuery, Redshift, and Azure Synapse Analytics.

Work Experience

Seattle, Washington
February 2022 – January 2023
Software Engineer
Software Engineer building, configuring, and managing back-end infrastructure for Flip, a video-powered social-learning platform owned by Microsoft
Highlights
  • Led the data warehouse migration from AWS Redshift to a Synapse Data Lakehouse (DLH), from architecture design through production operation
  • Built and maintained batch and streaming pipelines from transactional databases and telemetry data into the Data Lakehouse
  • Provided a fast, stable, and consistent data platform on Azure for downstream analytics
  • Performed data transformation and analytics with Python and Azure Synapse Spark, change data capture (CDC) with Debezium, and streaming with Kafka and Azure Event Hubs
  • Resolved performance issues by optimizing data loading and table design, resulting in 4-5x faster queries
  • Ensured data quality and data security through data validation, data management, and monitoring best practices
  • Built and maintained a more reliable and consistent downstream sync from the DLH to the CRM system using a REST API
  • Ensured a highly available and performant application by maintaining multiple Azure cloud services, including storage, CI/CD, databases, data warehouse, and Kubernetes
Dallas, Texas
January 2020 – February 2022
Software Engineer
Software Engineer building an end-to-end analytical supply chain web application to track inventory and transportation from suppliers to stores in international markets
Highlights
  • Led a team of 4 developers in migrating an on-premises Teradata data warehouse to Google Cloud Platform for 10 markets using BigQuery, Dataproc, Python, PySpark, and Airflow
  • Continuously delivered new data features by analyzing and calculating supply chain metrics with SQL and Spark
  • Built and maintained ETL pipelines that load analytical datasets into MSSQL Server from multiple data sources: Teradata, BigQuery, Oracle Database, and Informix Database
  • Performed data validation and unit testing to ensure data quality
  • Improved application performance by 70% by implementing caching, indexing, and data aggregation in the database instead of in the back-end web service, which reduced the volume of data transferred over the network
  • Reduced development time and codebase complexity by 80% through code refactoring, SQL reformatting, Git, and a CI/CD pipeline
Dallas, Texas
October 2017 – January 2020
Tableau Developer
Gathering requirements and delivering analytical projects that provide data democratization for the healthcare account's IT service team
Highlights
  • Designed and distributed ~50 operations and financial KPI reports to executives and leaders using Excel, Tableau, SQL Server, and Alteryx, resulting in a 70% reduction in outstanding IT tickets
  • Reduced time to deliver data insights to the operations team by 90% with a new reporting process that converts existing ad-hoc Excel reports into interactive, dynamic Tableau dashboards
Dallas, Texas
March 2017 – August 2017
Business Analyst Intern
Worked with the Director of Business Value Delivery to create business insights and sales portfolios from operations and financial KPIs
Highlights
  • Collected, cleaned, and prepared data from the financial reports of 200 companies to calculate 20 business and supply chain KPIs, ranging from profitability to efficiency indicators: Profit Margin, Cash Conversion Cycle, DIO (Days Inventory Outstanding), DSO (Days Sales Outstanding), DPO (Days Payable Outstanding), etc.
  • Developed financial and supply chain KPI dashboards using Excel (VBA), PowerPoint, and QlikView for sales consultants in the USA and EU to showcase service quality and product offerings to potential and existing clients

Awards

  • January 2022

    Walmart

    Excellence Award

    Consistent and exemplary demonstration of aspiration with significant contributions to the business

  • August 2017

    UT Dallas

    Graduation with Highest Distinction

    Summa cum laude

  • April 2016

    UT Dallas

    First-place Winner in 2 Supply Chain Case Competitions

    Competed against 20 teams from Dallas-area universities in supply chain optimization case studies

  • April 2017

    APICS Houston

    Fourth-place APICS Terra Grande Competition

    Competed against teams from universities in North Texas in the Fresh Connection Simulation Competition

  • November 2016

    INFORMS, UT Dallas

    Third-place Operations Competition

    Forecasted 2-year demand for a small business using the Holt-Winters and ARIMA methods in R

  • July 2016

    UT Dallas

    Dean's List

    Demonstrated academic achievements through GPA and competitions

  • April 2016

    Hongkong and Shanghai Banking Corporation (HSBC) Vietnam

    Quarterly Sales Award

    Awarded quarterly to the top sales and service associates with the best sales results

Contact

Seattle, Washington, US
LinkedIn

Education

  • 2021 – 2022

    DeepLearning.AI

    Certification

    Neural Network & Deep Learning

    Courses
    • Deep Learning
    • Artificial Neural Network
    • Backpropagation
    • Python Programming
    • Neural Network Architecture
  • 2019 – 2020

    Udacity

    Certification

    Data Scientist NanoDegree

    Courses
    • Supervised Learning
    • Deep Learning
    • Unsupervised Learning
    • Software Engineering for Data Scientists
    • Data Engineering
    • Experiment Design & Recommendation
    • Data Science with Big Data
  • 2015 – 2017

    University of Texas at Dallas

    Master's

    Supply Chain Management

    GPA: 3.97

    Courses
    • Business Data Warehouse
    • Advanced Analytics with SAS I/II
    • Operations Management
    • Statistics
    • Prescriptive Analytics
  • 2015 – 2016

    Udacity

    Certification

    Data Analyst NanoDegree

    Courses
    • Descriptive & Inferential Statistics
    • Intro to Data Analysis
    • Data Wrangling
    • Intro to Machine Learning
    • Data Visualization in Tableau
    • Matrix Math & NumPy with Python

Skills

Data Platform Engineering Proficient
Data Pipeline Azure Synapse Analytics Airflow Hadoop/PySpark GCP BigQuery Azure Data Lake Teradata SQL Data Warehouse Kafka/CDC/Data Streaming Elasticsearch
Data Analytics/Data Science Proficient
Python Tableau Machine Learning NumPy Pandas scikit-learn PyTorch TensorFlow Jupyter Notebook PySpark
Cloud & DevOps Working
Docker Kubernetes Azure DevOps Bitbucket Linux/Shell Terraform
App Development Basic
HTML CSS JavaScript Hugo Django Firebase Node.js