Bhavya Gada

Data Engineer

2+ years building high-throughput data solutions on GCP, AWS & Azure

About Me

🛠️ Behind the Data & AI 🧑‍💻

Hello! I'm Bhavya, a Data Engineer with 2+ years of experience building and operating high-throughput data solutions across GCP, AWS, and Azure. I specialize in BigQuery architecture, including partitioned/clustered tables, Stored Procedures, UDFs, and Nested/Repeated schemas, and I have a proven track record of re-architecting CDC ingestion and distributed processing to scale data volumes 10× while meeting strict accuracy, latency, and SLA requirements. I'm a certified Google Cloud Professional Data Engineer, Professional Cloud Architect, and AWS Solutions Architect, with deep expertise in ANSI SQL, Python, and SDLC-driven delivery.

🚀 Current: Data Engineer @ UPS (Contract) Jul 2024 - Present

  • Own and operate mission-critical BigQuery datasets supporting downstream financial and insurance reporting and audits, with strict accuracy and 99.9% SLA guarantees.
  • Implemented column-based partitioning and clustering on large analytical tables in BigQuery, reducing data scan volume by 70% and lowering query costs.
  • Designed scalable CDC-based ingestion pipelines; re-architected workflows to reliably process 10× higher data volumes.
  • Developed BigQuery Stored Procedures and JavaScript UDFs for deduplication, late-arriving updates, and incremental MERGE logic on partitioned fact tables.

BigQuery · Dataflow & Dataproc · Cloud Storage · Great Expectations
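
The deduplication and late-arriving-update handling described above can be sketched in plain Python. This is a minimal, hypothetical model of the incremental MERGE semantics (latest version per key wins, deletes tombstone the key); the field names `id`, `op`, and `updated_at` are illustrative assumptions, not the actual schema:

```python
def apply_cdc_batch(current, changes):
    """Merge a batch of CDC change records into the current table state.

    The newest update per key wins, which absorbs both duplicates and
    late-arriving (out-of-order) updates; a DELETE op removes the key.
    This mirrors the net effect of an incremental MERGE on a fact table.
    """
    merged = dict(current)
    # Apply changes oldest-first so the most recent version ends up last.
    for row in sorted(changes, key=lambda r: r["updated_at"]):
        key = row["id"]
        if row["op"] == "DELETE":
            merged.pop(key, None)
        elif key not in merged or row["updated_at"] >= merged[key]["updated_at"]:
            merged[key] = row
    return merged

# Hypothetical batch: a duplicate key with one late-arriving older update.
current = {1: {"id": 1, "op": "INSERT", "updated_at": "2024-01-01", "amount": 10}}
changes = [
    {"id": 1, "op": "UPDATE", "updated_at": "2024-01-03", "amount": 30},
    {"id": 1, "op": "UPDATE", "updated_at": "2024-01-02", "amount": 20},  # arrives late
    {"id": 2, "op": "INSERT", "updated_at": "2024-01-02", "amount": 5},
]
merged = apply_cdc_batch(current, changes)
```

In the real pipeline this logic lives in a BigQuery `MERGE ... WHEN MATCHED` statement so it runs where the data is; the Python version only makes the ordering rule explicit.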

🔒 AI/Privacy Data Engineer @ Ardent Privacy Jul 2023 - Jun 2024

  • Designed and operated compliance-centric data pipelines powering enterprise privacy operations for healthcare, finance, and government clients.
  • Modeled high-volume event data with BigQuery-native Nested and Repeated fields (STRUCTs/ARRAYs) to preserve source fidelity.
  • Built event-driven ingestion pipelines using Pub/Sub and GCS-native messaging patterns with exactly-once semantics.
  • Integrated Vertex AI-assisted data discovery and classification workflows using controlled prompts to accelerate DSAR processing.
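
To illustrate the Nested/Repeated modeling, here is a small Python sketch of what BigQuery's `UNNEST` does to an ARRAY-of-STRUCT field: one output row per array element, with the parent's scalar fields copied down. The `event_id`/`items` schema is an invented example, not the actual client data:

```python
def unnest(rows, repeated_field):
    """Flatten a repeated (ARRAY of STRUCT) field into one row per element,
    copying the parent's scalar fields -- the shape UNNEST produces in SQL."""
    flat = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != repeated_field}
        for item in row.get(repeated_field, []):
            flat.append({**parent, **item})
    return flat

# Hypothetical nested event records, one STRUCT array per event.
events = [
    {"event_id": "e1", "user": "u1",
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"event_id": "e2", "user": "u2",
     "items": [{"sku": "C", "qty": 5}]},
]
flat = unnest(events, "items")
```

Keeping the data nested in BigQuery preserves source fidelity and avoids a join; flattening happens only at query time, per the pattern above.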

🎓 Software Developer & Graduate Assistant @ UMBC Sep 2022 - Jun 2023

  • Developed backend integrations and data services with Python, SQL, and dbt on GCP, enabling real-time reporting and reducing data-pipeline latency for operational teams.
  • Implemented automated testing pipelines integrated into CI workflows to catch regressions before release.
  • Mentored graduate students on applied data analysis and machine learning projects using Looker dashboards and Kafka streaming.

💻 Data Engineer @ Virtuals Designs Apr 2020 - Aug 2022

  • Designed CDC-based replication pipelines and batch ETL workflows with Python, Docker, and GCP, processing over 1 million records daily.
  • Added data validation gates that improved data quality before load.
  • Optimized data models and query performance through partitioning, indexing, and materialized views, cutting latency by 40%.
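
A validation gate of the kind mentioned can be sketched as a pre-load filter: each record is checked against per-field rules, and only clean rows proceed to the warehouse while failures are quarantined with a reason. The rules and field names below are illustrative assumptions:

```python
def validation_gate(records, rules):
    """Split a batch into valid rows and quarantined rejects.

    `rules` maps field name -> predicate; a record is rejected if any
    predicate fails, and the failing fields are recorded for triage.
    """
    valid, rejected = [], []
    for rec in records:
        failed = [field for field, check in rules.items()
                  if not check(rec.get(field))]
        if failed:
            rejected.append({"record": rec, "failed": failed})
        else:
            valid.append(rec)
    return valid, rejected

# Hypothetical rules for an orders feed.
rules = {
    "order_id": lambda v: isinstance(v, int) and v > 0,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}
batch = [
    {"order_id": 1, "amount": 99.5},
    {"order_id": -3, "amount": 10.0},  # bad id -> quarantined
    {"order_id": 2, "amount": -1},     # negative amount -> quarantined
]
valid, rejected = validation_gate(batch, rules)
```

Quarantining with the failure reason, rather than dropping rows silently, is what turns a filter into a quality gate: the rejects become a reviewable dataset of their own.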

Tech Expertise

I deliver production-ready data engineering solutions with expertise across GCP, AWS, and Azure. Specialized in BigQuery, Redshift, and Synapse architecture with CDC-based ingestion pipelines. My focus is on building reliable, scalable data systems that support analytics, reporting, and compliance workloads.

GCP / BigQuery · AWS / Redshift · Azure / Synapse · SQL & Python · Airflow & dbt · Kafka & Pub/Sub · Databricks · Docker & K8s

🏆 Achievements & Certifications

  • 🥇 IIT Bombay eYRC Finalist: autonomous quadcopter rescue system
  • 🏆 Smart India Hackathon National Finalist: analytics dashboard for Adani Ports
  • ☁️ Google Cloud Professional Data Engineer & Professional Cloud Architect
  • 🧱 Databricks Certified Professional Data Engineer
  • ☁️ AWS Solutions Architect Professional
  • ⚙️ Certified Kubernetes Administrator (CKA)
  • 🔒 IAPP Certified Information Privacy Technologist (CIPT)

💬 Let's Connect!

I love connecting with engineers, students, and innovators! I share insights and provide advice on cloud architecture, data engineering, AI/ML, studying abroad, and career growth.

When I'm not coding or mentoring, you'll find me exploring new AI frameworks and staying at the cutting edge of responsible AI and MLOps.

  • 2+ Years: data engineering experience
  • 10× Scale: CDC throughput increase
  • Multi-Cloud: GCP, AWS & Azure
  • HIPAA/GDPR: privacy & compliance

Technical Skills

Data Warehouses & Lakes

BigQuery
Redshift
Synapse
Databricks
Snowflake
Delta Lake

Cloud Platforms

GCP
AWS
Azure
Vertex AI
SageMaker
Dataproc/EMR

Programming & Orchestration

Python
SQL (BigQuery/ANSI)
Java
Airflow
dbt
Kafka

Data Security & Compliance

HIPAA
GDPR
SOC 2
PHIPA
PCI DSS
Data Governance

Monitoring & Operations

Cloud Monitoring
Cloud Logging
Grafana
Prometheus
Great Expectations
Incident Response

Delivery & Infrastructure

CI/CD
Docker
GKE/Kubernetes
Agile/Scrum
Jira & Confluence
Databricks

Key Achievements

Delivering impactful solutions at scale

BigQuery Architecture & CDC Pipelines

Re-architected CDC ingestion workflows enabling reliable processing of 10× higher data volumes while remaining within SLA using BigQuery, Dataflow, and Dataproc

10× throughput increase
99.9% SLA compliance

BigQuery Optimization

Implemented column-based partitioning and clustering in BigQuery for large analytical tables, reducing data scan volume by 70% and lowering query costs for finance and audit workloads

70% scan reduction
Cost optimization

Privacy & Compliance Engineering

Designed compliance-centric pipelines with Vertex AI-assisted data discovery, data classification, masking, and pseudonymization aligned with HIPAA, GDPR, SOC 2, and PCI DSS

DSAR acceleration
Zero audit exceptions

BigQuery Stored Procedures & UDFs

Developed BigQuery Stored Procedures and JavaScript UDFs to handle deduplication, late-arriving updates, and incremental MERGE logic for partitioned fact tables

Automated ETL logic
Data consistency

Data Quality with Great Expectations

Implemented automated data quality validations using Great Expectations, preventing silent data corruption and downstream reporting defects in payroll-related datasets

Pre-release validation
Automated monitoring
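
To show the kind of check these validations perform, here is a dependency-free stand-in that mirrors the result shape of a Great Expectations expectation (a success flag plus an unexpected-value count); the payroll records and thresholds are invented for illustration, and the real pipeline uses the Great Expectations library itself:

```python
def expect_column_values_not_null(rows, column):
    """Expectation-style check: no NULLs in `column`."""
    unexpected = [r for r in rows if r.get(column) is None]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}

def expect_column_values_between(rows, column, min_v, max_v):
    """Expectation-style check: `column` in [min_v, max_v]; NULL counts as a failure."""
    unexpected = [r for r in rows
                  if r.get(column) is None or not (min_v <= r[column] <= max_v)]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}

# Hypothetical payroll batch with a missing value and an implausible total.
payroll = [
    {"employee_id": 1, "hours": 40},
    {"employee_id": 2, "hours": None},
    {"employee_id": 3, "hours": 168},
]
results = [
    expect_column_values_not_null(payroll, "employee_id"),
    expect_column_values_between(payroll, "hours", 0, 80),
]
batch_ok = all(r["success"] for r in results)  # gate the load on this flag
```

Wiring `batch_ok` into the pipeline as a hard gate is what prevents silent corruption: a failed expectation blocks the load instead of letting a bad batch flow into reporting.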

GCP Monitoring & Operations

Integrated GCP-native metrics and logs (Cloud Monitoring, Cloud Logging) to monitor pipeline health, execution latency, and data freshness for rapid root-cause analysis

Real-time observability
Incident response
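
The data-freshness side of this monitoring reduces to a simple comparison of each table's last successful load time against its freshness SLO. The sketch below assumes hypothetical table names and thresholds; in production the load timestamps would come from Cloud Monitoring metrics or pipeline metadata rather than a dict:

```python
from datetime import datetime, timedelta, timezone

def freshness_alerts(last_load, thresholds, now=None):
    """Return the tables whose data is staler than their freshness SLO.

    `last_load` maps table -> last successful load time (UTC);
    `thresholds` maps table -> maximum tolerated staleness.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        table for table, loaded_at in last_load.items()
        if now - loaded_at > thresholds[table]
    )

# Hypothetical snapshot: one table fresh, one breaching its 1-hour SLO.
now = datetime(2024, 7, 1, 12, 0, tzinfo=timezone.utc)
last_load = {
    "claims_fact": now - timedelta(minutes=20),
    "audit_log": now - timedelta(hours=3),
}
thresholds = {
    "claims_fact": timedelta(hours=1),
    "audit_log": timedelta(hours=1),
}
stale = freshness_alerts(last_load, thresholds, now=now)
```

Emitting the lag itself as a metric (rather than only the breach) is what makes root-cause analysis fast: a slowly growing lag points at backpressure, a sudden jump at a failed load.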

Education

Master of Science in Information Systems

University of Maryland Baltimore County

Get In Touch

Let's collaborate on your next big project

Ready to Build Something Amazing?

I'm always excited to discuss new opportunities, innovative projects, and ways to leverage data and AI to solve complex challenges.