# Yashwanth Pandi
Data Engineer | Pipeline Architect | Infrastructure Enthusiast
I build automated systems that ensure data is high-quality, accessible, and ready for scale. Transitioning from Web Development, I focus on the “plumbing” of the data world: creating idempotent ETL pipelines, implementing data observability, and designing cost-effective cloud architectures. I believe that a data platform is only as good as its worst failure—so I build for reliability.
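Idempotency is what makes those pipelines safe to rerun: executing the same load twice leaves the warehouse in the same state as executing it once. Below is a minimal sketch of the delete-then-insert pattern in Python, assuming a psycopg2 connection to the warehouse; the `sales` table and its columns are hypothetical.

```python
# Minimal idempotent-load sketch. Assumes a psycopg2 connection; the
# `sales` table and column names are hypothetical placeholders.
import psycopg2

def load_daily_sales(conn, run_date: str, rows: list[tuple]) -> None:
    """Replay-safe load: rerunning for the same run_date overwrites
    rather than duplicates."""
    with conn, conn.cursor() as cur:
        # 1. Clear the target partition first, so a rerun starts from a
        #    known-empty state instead of appending duplicates.
        cur.execute("DELETE FROM sales WHERE sale_date = %s", (run_date,))
        # 2. Insert the fresh batch; running the function twice with the
        #    same inputs yields exactly the same table contents.
        cur.executemany(
            "INSERT INTO sales (sale_date, sku, amount) VALUES (%s, %s, %s)",
            rows,
        )
    # Using the connection as a context manager commits on success and
    # rolls back on error, so a failed run never leaves a half-loaded
    # partition behind.
```

Because the delete and insert share one transaction, every rerun is all-or-nothing; a `MERGE`/upsert keyed on the natural key achieves the same property when deletes are too expensive.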
## Core Competencies
- Data Modeling: Star/Snowflake schemas, Data Vault 2.0, Slowly Changing Dimensions (SCD).
- Distributed Systems: Tuning Spark jobs, choosing partitioning strategies, and mitigating data skew (see the salting sketch after this list).
- Data Governance: Implementing data lineage, cataloging (Amundsen/DataHub), and role-based access control (RBAC).
- DevOps for Data: CI/CD for SQL/Python, infrastructure as code, and containerization.
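Of these, skew is the failure mode that most often breaks an otherwise healthy Spark job: one hot key funnels most of a join onto a single executor. A common fix is key salting, sketched below in PySpark; the `events` and `users` tables, the `user_id` key, and the bucket count are hypothetical.

```python
# Key-salting sketch for a skewed join. Table names, the join key, and
# SALT_BUCKETS are hypothetical; tune the bucket count to the skew.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

SALT_BUCKETS = 16

events = spark.table("events")  # large fact table, skewed on user_id
users = spark.table("users")    # dimension table to join against

# Attach a random salt to the skewed side so a single hot user_id is
# spread across SALT_BUCKETS shuffle partitions instead of one.
salted_events = events.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)

# Replicate the dimension side once per salt value so every salted key
# still finds its matching row.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
salted_users = users.crossJoin(salts)

# Join on (key, salt), then discard the salt.
joined = salted_events.join(salted_users, on=["user_id", "salt"]).drop("salt")
```

On Spark 3.x, adaptive query execution (`spark.sql.adaptive.skewJoin.enabled`) handles many of these cases automatically; manual salting remains useful when AQE cannot split the hot partition.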
## Tech Stack and Philosophy
| Domain | Tools | Philosophy |
|---|---|---|
| Orchestration | Airflow, Prefect, cron | Avoid “Spaghetti DAGs”; build modular tasks (see the sketch below the table). |
| Transformation | dbt, SQL, PySpark | Use version control for everything (Data-as-Code). |
| Storage | S3, Redshift, Snowflake | Schema-on-write for quality; schema-on-read for speed. |
| Infrastructure | Docker, Terraform, Git | Infrastructure should be reproducible and documented. |
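As a concrete example of the orchestration philosophy in the first row, here is a minimal Airflow 2.x TaskFlow sketch: each task does one thing, and the dependency graph falls out of the data flow rather than manual wiring. The DAG id and the task bodies are hypothetical stubs.

```python
# Modular-DAG sketch using Airflow's TaskFlow API. The DAG id and the
# extract/transform/load bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_sales():
    @task
    def extract() -> list[dict]:
        # Pull raw records from the source system (stubbed).
        return [{"sku": "A-1", "amount": 42.0}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # One responsibility per task: cleaning never hides inside
        # extract or load.
        return [r for r in records if r["amount"] > 0]

    @task
    def load(records: list[dict]) -> None:
        # Hand the cleaned batch to the warehouse loader (stubbed).
        print(f"loading {len(records)} rows")

    # Dependencies come from the data flow itself; no set_upstream
    # calls, no spaghetti.
    load(transform(extract()))

daily_sales()
```

Keeping each task a pure function over its inputs also makes the pipeline unit-testable outside Airflow, which pairs naturally with the CI/CD point above.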