# Yashwanth Pandi
Data Engineer | Pipeline Architect | Infrastructure Enthusiast
I build automated systems that ensure data is high-quality, accessible, and ready for scale. Transitioning from Web Development, I focus on the “plumbing” of the data world: creating idempotent ETL pipelines, implementing data observability, and designing cost-effective cloud architectures. I believe that a data platform is only as good as its worst failure—so I build for reliability.
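Idempotency is what makes those pipelines safe to rerun: executing the same load twice leaves the warehouse in the same state as executing it once. Below is a minimal sketch of the delete-then-insert pattern in Python, assuming a psycopg2 connection to the warehouse; the `sales` table and its columns are hypothetical.

```python
# Minimal idempotent-load sketch. Assumes a psycopg2 connection; the
# `sales` table and column names are hypothetical placeholders.
import psycopg2

def load_daily_sales(conn, run_date: str, rows: list[tuple]) -> None:
    """Replay-safe load: rerunning for the same run_date overwrites
    rather than duplicates."""
    with conn, conn.cursor() as cur:
        # 1. Clear the target partition first, so a rerun starts from a
        #    known-empty state instead of appending duplicates.
        cur.execute("DELETE FROM sales WHERE sale_date = %s", (run_date,))
        # 2. Insert the fresh batch; running the function twice with the
        #    same inputs yields exactly the same table contents.
        cur.executemany(
            "INSERT INTO sales (sale_date, sku, amount) VALUES (%s, %s, %s)",
            rows,
        )
    # Using the connection as a context manager commits on success and
    # rolls back on error, so a failed run never leaves a half-loaded
    # partition behind.
```

Because the delete and insert share one transaction, every rerun is all-or-nothing; a `MERGE`/upsert keyed on the natural key achieves the same property when deletes are too expensive.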
## Core Competencies
- Data Modeling: Star/Snowflake schemas, Data Vault 2.0, Slowly Changing Dimensions (SCD).
- Distributed Systems: Tuning Spark jobs, choosing partitioning strategies, and mitigating data skew (see the salting sketch after this list).
- Data Governance: Implementing data lineage, cataloging (Amundsen/DataHub), and role-based access control (RBAC).
- DevOps for Data: CI/CD for SQL/Python, infrastructure as code, and containerization.
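Of these, skew is the failure mode that most often breaks an otherwise healthy Spark job: one hot key funnels most of a join onto a single executor. A common fix is key salting, sketched below in PySpark; the `events` and `users` tables, the `user_id` key, and the bucket count are hypothetical.

```python
# Key-salting sketch for a skewed join. Table names, the join key, and
# SALT_BUCKETS are hypothetical; tune the bucket count to the skew.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

SALT_BUCKETS = 16

events = spark.table("events")  # large fact table, skewed on user_id
users = spark.table("users")    # dimension table to join against

# Attach a random salt to the skewed side so a single hot user_id is
# spread across SALT_BUCKETS shuffle partitions instead of one.
salted_events = events.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)

# Replicate the dimension side once per salt value so every salted key
# still finds its matching row.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
salted_users = users.crossJoin(salts)

# Join on (key, salt), then discard the salt.
joined = salted_events.join(salted_users, on=["user_id", "salt"]).drop("salt")
```

On Spark 3.x, adaptive query execution (`spark.sql.adaptive.skewJoin.enabled`) handles many of these cases automatically; manual salting remains useful when AQE cannot split the hot partition.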
## Tech Stack and Philosophy
| Domain | Tools | Philosophy |
|---|---|---|
| Orchestration | Airflow, Prefect, cron | Avoid “Spaghetti DAGs”; build modular tasks (see the sketch below the table). |
| Transformation | dbt, SQL, PySpark | Use version control for everything (Data-as-Code). |
| Storage | S3, Redshift, Snowflake | Schema-on-write for quality; schema-on-read for speed. |
| Infrastructure | Docker, Terraform, Git | Infrastructure should be reproducible and documented. |
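As a concrete example of the orchestration philosophy in the first row, here is a minimal Airflow 2.x TaskFlow sketch: each task does one thing, and the dependency graph falls out of the data flow rather than manual wiring. The DAG id and the task bodies are hypothetical stubs.

```python
# Modular-DAG sketch using Airflow's TaskFlow API. The DAG id and the
# extract/transform/load bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_sales():
    @task
    def extract() -> list[dict]:
        # Pull raw records from the source system (stubbed).
        return [{"sku": "A-1", "amount": 42.0}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # One responsibility per task: cleaning never hides inside
        # extract or load.
        return [r for r in records if r["amount"] > 0]

    @task
    def load(records: list[dict]) -> None:
        # Hand the cleaned batch to the warehouse loader (stubbed).
        print(f"loading {len(records)} rows")

    # Dependencies come from the data flow itself; no set_upstream
    # calls, no spaghetti.
    load(transform(extract()))

daily_sales()
```

Keeping each task a pure function over its inputs also makes the pipeline unit-testable outside Airflow, which pairs naturally with the CI/CD point above.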