5+ years transforming data chaos into actionable insights with Python, PySpark, Azure, and modern data stacks
I'm a data engineer based in Boston with a passion for building production-grade ETL systems that process millions of records efficiently. My background in Cloud Engineering gives me a strong foundation for building scalable, reliable data infrastructure.
I'm currently pursuing Data Engineer positions in the Boston area while building my professional brand through technical projects and LinkedIn content. I specialize in designing and implementing dimensional data models, optimizing query performance, and mentoring teams on modern data practices.
When I'm not engineering data pipelines, you'll find me training calisthenics at Franklin Park, exploring Afrobeats music, or planning my next travel adventure.
Expert in Python, SQL, PySpark, Airflow, and modern cloud platforms
Specializing in medallion architecture and dimensional modeling
Proficient with Azure services, containerization, and CI/CD pipelines
Building enterprise-grade systems with logging, monitoring, and error handling
Fully containerized streaming pipeline that provides near-real-time visibility into stock price movements. Streams data from the Alpha Vantage API through Kafka and Spark into PostgreSQL every 5 minutes. A live Grafana dashboard visualizes closing prices, price spreads, and ingestion metrics, and the entire 9-service stack launches with a single docker compose command.
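A minimal sketch of the Kafka-to-PostgreSQL leg of this pipeline, assuming a hypothetical stock_quotes topic, illustrative column names, and in-network service hostnames; the real stack also configures checkpointing and the Kafka/JDBC connector packages:

```python
# Sketch: consume stock quotes from Kafka with Spark Structured Streaming
# and append each micro-batch to PostgreSQL. Topic, schema, and hostnames
# are illustrative assumptions, not the production values.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("stock-stream").getOrCreate()

quote_schema = (
    StructType()
    .add("symbol", StringType())
    .add("close", DoubleType())
    .add("ts", TimestampType())
)

quotes = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "stock_quotes")
    .load()
    .select(F.from_json(F.col("value").cast("string"), quote_schema).alias("q"))
    .select("q.*")
)

def write_batch(df, batch_id):
    # Append the micro-batch to PostgreSQL over JDBC.
    (df.write.format("jdbc")
        .option("url", "jdbc:postgresql://postgres:5432/market")
        .option("dbtable", "quotes")
        .option("user", "etl")
        .option("password", "etl")
        .mode("append")
        .save())

quotes.writeStream.foreachBatch(write_batch).start().awaitTermination()
```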
Production-ready ETL pipeline transforming severely corrupted sales data into an analytics-ready format. Processed 37,432 records, reconstructed 1,385 missing SKUs with a 100% recovery rate, and implemented a star schema with 11 automated quality checks.
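Two of the automated quality checks, sketched with pandas and hypothetical column names; in the pipeline, 11 such checks gate the load into the star schema:

```python
# Sketch of two data quality gates run before loading the star schema.
# Column names ("sku", "product_key") are illustrative assumptions.
import pandas as pd

def check_sku_completeness(facts: pd.DataFrame) -> dict:
    """Fail the load if any row still lacks a SKU after reconstruction."""
    missing = int(facts["sku"].isna().sum())
    return {"check": "sku_completeness", "failures": missing, "passed": missing == 0}

def check_referential_integrity(facts: pd.DataFrame, dim_product: pd.DataFrame) -> dict:
    """Every product key in the fact table must exist in the product dimension."""
    orphans = int((~facts["product_key"].isin(dim_product["product_key"])).sum())
    return {"check": "fk_product", "failures": orphans, "passed": orphans == 0}
```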
Integrated pipeline combining Salesforce, Stripe, and Google Sheets data. Built automated workflows for data reconciliation and reporting, with error handling and retry logic for production reliability.
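The retry logic follows a standard exponential-backoff pattern; this is a hedged sketch, with fetch_invoices standing in as a hypothetical placeholder for the real Stripe and Salesforce calls:

```python
# Sketch: retry a flaky API call with exponential backoff before failing.
import functools
import logging
import time

def with_retries(max_attempts: int = 3, base_delay: float = 2.0):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error to the caller
                    delay = base_delay * 2 ** (attempt - 1)
                    logging.warning("attempt %d failed; retrying in %.1fs", attempt, delay)
                    time.sleep(delay)
        return wrapper
    return decorator

@with_retries(max_attempts=4)
def fetch_invoices(since):
    ...  # hypothetical placeholder for the real API call
```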
ETL system integrating NYC Citi Bike trip data with weather patterns. Built dimensional models for analysis and scheduled daily orchestration in Apache Airflow with automated data quality checks.
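A minimal sketch of the daily DAG, assuming Airflow 2.4+ and hypothetical task callables; the production version adds alerting and backfill handling:

```python
# Sketch: daily Airflow DAG fanning in trip and weather extracts, then
# building dimensions and running quality checks. Task bodies are stubs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_trips(**_): ...      # stub: pull Citi Bike trip data
def extract_weather(**_): ...    # stub: pull weather observations
def build_dimensions(**_): ...   # stub: load dimensional models
def run_quality_checks(**_): ... # stub: validate the day's load

with DAG(
    dag_id="citibike_weather_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    trips = PythonOperator(task_id="extract_trips", python_callable=extract_trips)
    weather = PythonOperator(task_id="extract_weather", python_callable=extract_weather)
    dims = PythonOperator(task_id="build_dimensions", python_callable=build_dimensions)
    checks = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)

    [trips, weather] >> dims >> checks
```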
Web scraping solution extracting company data from the UK government's business registry. Built robust parsing logic with error recovery and automated daily data collection with intelligent caching.
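A sketch of the fetch layer, assuming requests plus a simple on-disk cache keyed by URL hash; the registry endpoint and parsing logic are omitted:

```python
# Sketch: fetch a page once, serving repeat requests from disk.
# The cache directory and keying scheme are illustrative assumptions.
import hashlib
import pathlib

import requests

CACHE_DIR = pathlib.Path(".cache")
CACHE_DIR.mkdir(exist_ok=True)

def fetch(url: str, timeout: int = 30) -> str:
    """Return cached HTML when available; otherwise fetch, cache, and return it."""
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text()
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # error recovery: let the caller retry on HTTP failures
    path.write_text(resp.text)
    return resp.text
```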
Enterprise database system deployed on Google Cloud Platform. Designed a 7-table normalized schema (3NF), populated it with 21,000+ rows of synthetic data using Python and Faker, and wrote analytics queries answering key business questions.
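A hedged sketch of the synthetic-data step for one hypothetical table; the actual project generates 21,000+ rows across all 7 tables:

```python
# Sketch: generate synthetic customer rows with Faker for bulk insert.
# The customers table and its columns are illustrative assumptions.
from faker import Faker

fake = Faker()

def make_customers(n: int) -> list[tuple]:
    """Build n synthetic customer rows: (id, name, email, city, signup_date)."""
    return [
        (i, fake.name(), fake.email(), fake.city(), fake.date_this_decade())
        for i in range(1, n + 1)
    ]

rows = make_customers(3000)
# e.g. cursor.executemany("INSERT INTO customers VALUES (%s, %s, %s, %s, %s)", rows)
```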
I'm actively seeking Data Engineer positions in the Boston area. Whether you have a question or just want to say hi, feel free to reach out!