Yashwanth Pandi

Data Engineer | Pipeline Architect | Infrastructure Enthusiast

I build automated systems that keep data high-quality, accessible, and ready to scale. Coming from a web development background, I focus on the “plumbing” of the data world: creating idempotent ETL pipelines, implementing data observability, and designing cost-effective cloud architectures. I believe a data platform is only as good as its worst failure, so I build for reliability.
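The idempotency mentioned above usually comes down to one pattern: overwrite the target partition instead of appending, so re-running a batch never duplicates rows. A minimal sketch, using a hypothetical in-memory `warehouse` dict as a stand-in for a partitioned table:

```python
# Idempotent load sketch: delete-then-insert (partition overwrite).
# `warehouse` and `load_partition` are illustrative names, not a real API.

def load_partition(warehouse: dict, partition_key: str, rows: list) -> None:
    """Replace the whole partition, never append to it."""
    warehouse[partition_key] = list(rows)

warehouse = {}
batch = [{"order_id": 1, "amount": 30}, {"order_id": 2, "amount": 45}]

load_partition(warehouse, "2024-01-15", batch)
load_partition(warehouse, "2024-01-15", batch)  # rerun: same state, no duplicates

assert len(warehouse["2024-01-15"]) == 2
```

The same idea maps onto real tooling as `INSERT OVERWRITE` in Spark/Hive or delete-insert transactions in a warehouse.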

Core Competencies

  • Data Modeling: Star/Snowflake Schema, Data Vault 2.0, Slowly Changing Dimensions (SCD).
  • Distributed Systems: Tuning Spark jobs, partitioning strategies, and handling skew.
  • Data Governance: Implementing data lineage, cataloging (Amundsen/DataHub), and RBAC.
  • DevOps for Data: CI/CD for SQL/Python, infrastructure as code, and containerization.
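The SCD technique listed under Data Modeling (Type 2, the most common variant) can be sketched in plain Python: when a tracked attribute changes, the current dimension row is expired and a new "current" row is inserted, preserving history. The row shape here (`key`, `attr`, `valid_from`, `valid_to`, `current`) is illustrative:

```python
from datetime import date

def scd2_upsert(dim: list, incoming: dict, today: date) -> None:
    """SCD Type 2: expire the current row on change, then insert a new version."""
    for row in dim:
        if row["key"] == incoming["key"] and row["current"]:
            if row["attr"] == incoming["attr"]:
                return  # no change: keep the existing current row
            row["valid_to"] = today   # close out the old version
            row["current"] = False
    dim.append({"key": incoming["key"], "attr": incoming["attr"],
                "valid_from": today, "valid_to": None, "current": True})

dim = []
scd2_upsert(dim, {"key": "cust-1", "attr": "Austin"}, date(2024, 1, 1))
scd2_upsert(dim, {"key": "cust-1", "attr": "Denver"}, date(2024, 6, 1))

# Two versions survive: the expired Austin row and the current Denver row.
assert [r["current"] for r in dim] == [False, True]
```

In practice the same logic is usually expressed as a `MERGE` statement or a dbt snapshot rather than hand-rolled loops.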

Tech Stack and Philosophy

Domain          Tools                      Philosophy
Orchestration   Airflow, Prefect, Cron     Avoid “Spaghetti DAGs”; build modular tasks.
Transformation  dbt, SQL, PySpark          Use version control for everything (Data-as-Code).
Storage         S3, Redshift, Snowflake    Schema-on-write for quality; schema-on-read for speed.
Infrastructure  Docker, Terraform, Git     Infrastructure should be reproducible and documented.
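The skew handling mentioned under Distributed Systems often means key salting: spreading one hot key across several buckets so a single partition does not absorb the whole group, then aggregating in two passes. A minimal sketch in plain Python (names are illustrative; in PySpark the same trick is appending a random bucket to the key before the shuffle):

```python
import random

N_BUCKETS = 4  # how many shards to split each key into

def salt(key: str) -> str:
    """Append a random bucket id so one hot key becomes N smaller keys."""
    return f"{key}_{random.randrange(N_BUCKETS)}"

# A skewed dataset: one key dominates.
events = [("hot_key", 1)] * 1000 + [("cold_key", 1)] * 10

# Pass 1: aggregate per salted key (this is the shuffle that now balances).
partial = {}
for key, value in events:
    sk = salt(key)
    partial[sk] = partial.get(sk, 0) + value

# Pass 2: strip the salt and merge the partial sums.
final = {}
for sk, value in partial.items():
    base = sk.rsplit("_", 1)[0]
    final[base] = final.get(base, 0) + value

assert final == {"hot_key": 1000, "cold_key": 10}
```

The cost is one extra aggregation stage; the benefit is that no single task holds all 1000 hot-key rows at once.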