Abraham Vargas
Tampa, US
abevargas1@outlook.com
813-370-7867

Active Top Secret Clearance with SCI access. Data Engineer with extensive expertise in cloud technologies, including AWS and Databricks. Skilled in designing and optimizing data workflows using Spark (PySpark, Spark SQL) and Delta Lake. Proficient in Python and SQL, delivering robust, scalable solutions for complex data challenges.

Education


University of South Florida
August 2012 — December 2014
Bachelor's Degree in Statistics, Minor in Technical Writing

Experience


Senior Data Engineer
March 2023 — February 2025
TG Federal

As Senior Data Engineer for Customs and Border Protection (CBP) under the United States Department of Homeland Security (DHS), spearheaded the optimization and modernization of data pipelines in Databricks on AWS. Leveraged Delta Lake, PySpark, and Spark SQL to implement scalable solutions for real-time and batch data processing, while streamlining legacy data workflows. Regularly collaborated with internal teams and external database administrators to ensure data quality, accessibility, and security across the organization.

  • Refactored an application to eliminate Delta Lake concurrency conflicts, cutting processing times from ~5 minutes to milliseconds.
  • Imported legacy data from Excel files into a streaming pipeline using medallion architecture for bronze, silver, and gold tables.
  • Implemented Spark streaming to optimize Kafka ingestion and streamline temporary storage with Parquet tables.
  • Migrated notebook code into a production-ready Python solution with testing and version control using GitLab.
  • Designed a scalable ID system to track records, solving duplicate conflicts and ensuring long-term data scalability.
  • Collaborated with Oracle DBAs to ensure secure and seamless data integration into Delta tables.
  • Troubleshot data conversion between Delta Lake and Cassandra tables, helping teams address technical challenges.
  • Taught teammates Databricks CLI and AWS CLI to enhance automation and reduce reliance on web APIs.
  • Generated Databricks usage reports, giving leadership insights into cluster performance and cost efficiency.
  • Resolved data discrepancies, maintaining high-quality datasets for shared use.
Senior Data Engineer
March 2020 — March 2023
Iron EagleX

Led data engineering initiatives for the Chief Digital and Artificial Intelligence Office (CDAO) at USSOCOM. Contributed to the team's adoption of Databricks on AWS by leveraging expertise in Spark, PySpark, and Spark SQL. Collaborated closely with data scientists to process and analyze classified data, while driving knowledge sharing, cloud migration, and operational efficiency. Supported mission-critical objectives through innovative workflows and scalable solutions.

  • Developed and optimized storage layouts for classified data by working with data scientists to ensure task-specific processing needs were met using bronze, silver, and gold table structures.
  • Converted legacy R and SQL Server code into PySpark and Spark SQL as part of an ongoing AWS cloud migration effort.
  • Traveled to sites to meet mechanics and pilots, gathering insights to improve data collection for predictive maintenance.
  • Standardized LaTeX usage for USSOCOM CDAO, creating an official template to document mathematical proofs and technical processes.
  • Gave periodic training sessions on Databricks, Spark, and best practices, helping teammates leverage cloud tools effectively.
  • Built a Databricks pipeline for flight data, reducing over a month of manual work to under two days.
  • Evaluated analytics products, writing reports on usability and feasibility for USSOCOM's unique operational needs.
  • Helped troubleshoot machine learning projects in Python, resolving technical challenges and optimizing code efficiency.
  • Created a local Spark environment for unit testing and debugging in Python, enabling workflows outside of Databricks.
  • Converted aircraft sensor data stored in MongoDB into Delta Tables, facilitating access for data scientists and enhancing data analysis capabilities.
Data Engineer
March 2017 — April 2019
The Nielsen Company

Designed and implemented scalable data solutions as a Data Engineer. Migrated legacy data systems to modern cloud environments, developed tools to automate critical processes, and optimized workflows to reduce operational bottlenecks. Collaborated with directors and data science teams to support efficient data storage, analysis, and quality assurance. Comfortable reading and understanding Java and C# code; experienced with Hadoop and MapReduce. Used SQLAlchemy to build Python applications backed by PostgreSQL and SQLite databases.

  • Developed pipelines to migrate data from legacy systems (flat files, mainframe, traditional databases) into cloud environments like Databricks, Hadoop, and Hive on AWS.
  • Reduced Data Science QA cycles from 15 days to under 2 hours by creating a Python package that leveraged Spark and Hive to automatically scan new data for issues.
  • Partnered with directors and data science teams to create and administer a PostgreSQL database, determining ongoing data requirements and streamlining data abstraction from mainframe and UNIX servers.
  • Automated data transformation using Python scripts, reducing processing time from over 2 days to less than 15 minutes.
  • Conducted data analysis to support business decisions.
  • Developed Tableau dashboards to leadership specifications.
  • Managed ETL processes, ensuring data integrity.
  • Collaborated with cross-functional teams and presented written findings to stakeholders.
Data Analyst
March 2015 — March 2017
The Nielsen Company

Promoted to develop Python applications that enhanced production initiatives and quality control. Primary duties included auditing existing SPSS code, researching fluctuations in trends, and translating new survey questions into Spanish. Proofread company-wide documentation. Analyzed mobile carrier and device data for market penetration and related metrics, reporting findings to product managers through Tableau and Excel visualizations.

  • Created a Python script that read Excel documents, performed all business-requirement checks, and summarized deficiencies, replacing a 16-hour manual process with a simple, accurate script run that finished in under a minute.
  • Developed a multi-language Python script that read Excel sheets, flagged records requiring further research, and generated formatted summaries and SPSS code for investigation, enabling analysts to produce Excel trend-break reports in under a minute with no manual steps.
Bilingual Survey Writer
March 2014 — March 2015
The Nielsen Company

Developed a web-based application using HTML/CSS, JavaScript, and MySQL to store and manage survey questions and responses, applying natural language processing (NLP) techniques to classify questions and answers into different contexts. Received an internal award for developing the application. Demonstrated strong teamwork and project management skills throughout the project.

  • Created a web-based application using HTML/CSS, JavaScript, and MySQL to manage survey questions and responses.
  • Utilized NLP techniques to classify survey questions and answers by context, categorizing verbs, nouns, and other elements.
  • Designed a MySQL database for easy retrieval of categorized survey components.
  • Received an internal award for developing the application.

Languages


Spanish:
Native or Bilingual Proficiency
English:
Native or Bilingual Proficiency
Portuguese:
Conversational Proficiency

Skills


Programming & Scripting:
Python (expert; primarily PySpark and object-oriented programming), Bash (automation for the Databricks CLI and AWS CLI), Scala (fundamental knowledge for Spark programming), SQL (Spark SQL, Oracle SQL, PostgreSQL administration, SQLAlchemy), Java and C# (proficient in reading and understanding both), C/C++ (experienced in reading C and C++ code; high-level understanding of pointers)
Data Engineering & Cloud Platforms:
Databricks (on AWS, with Delta Lake/Tables), Spark, ETL Pipelines, HiveQL, AWS, Hadoop, Relational Databases (PostgreSQL, SQL Server, Oracle), NoSQL Databases (MongoDB, Cassandra)
Web Development:
HTML/CSS, JavaScript, PHP
Technical Writing & Documentation:
Sphinx Documentation, Python Docstrings (PEP 257) for API documentation, LaTeX
Operating Systems:
GNU/Linux (Debian/Ubuntu, Red Hat/CentOS, Fedora), UNIX/BSD (FreeBSD, OpenBSD, Solaris)
Data Science & Machine Learning:
AI/ML Ops, Natural Language Processing (NLP), Predictive Maintenance, Data Analysis, Model Development and Deployment
Project Management & Collaboration:
Git (GitLab, GitHub), Jira, CI/CD (Jenkins)

Interests


Operating Systems:
Exploration and configuration of diverse operating systems, operating-system kernel concepts, headless server setups and applications
Networking & Security:
Secure communication protocols and Tor relay operation in support of global network privacy
Computing:
System Architecture, Computing History, Software Ecosystems