Hi, I'm
Data Engineer & Analytics Practitioner
Data Science graduate from the University of Maryland, College Park. Passionate about building production-ready data systems, from real-time pipelines and scalable cloud architectures to applied machine learning and generative AI.
I am a Data Science professional focused on data engineering and applied machine learning, with a background in Computer Science Engineering (AI & ML specialization) from SRM University and a Master's in Data Science from the University of Maryland. With hands-on experience in pipeline development and production ML workflows, I build efficient data systems and translate complex datasets into decisions that teams can act on.
My current research focus is Scalable Data Engineering and Pipeline Optimization, exploring how modern orchestration frameworks and cloud-native architectures can maximize throughput, reliability, and observability in large-scale data workflows.
I received a Best Paper Award at an international conference for research on dense CNN architectures for deepfake detection, and ranked in the top 10 out of 2,500 participants in a national Computer Vision hackathon during my undergraduate studies at SRM.
University of Maryland, College Park
SRM University
Efficient Deepfake Image Detection Using Dense CNN Architecture
College Park, MD, USA
A selection of data science and engineering work. View all on GitHub
Built a machine learning pipeline to predict patient/appointment no-show risk from EHR-style scheduling data using Random Forest and Logistic Regression models, enabling proactive scheduling interventions and reduced manual review effort.
Benchmarked a distributed AI orchestration system (Claude + MCP) across Google Colab, AWS EC2, and AWS ECS Fargate under 50–100 concurrent users. ECS Fargate reached 457 RPS using 50% fewer resources than EC2 (460 RPS), supporting serverless containers as a fit for scalable agentic AI.
Designed a closed-loop neuro-symbolic pipeline combining LLM fluency with deterministic entity verification. Achieved 100% preservation of critical medical entities (dosages, vitals) across 300 texts vs ~99.7% for pure neural baselines (BART, T5).
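The deterministic verification step can be illustrated with a minimal sketch. This is not the project's actual code: the regex patterns, the function names (`extract_entities`, `missing_entities`, `verify_or_fallback`), and the fall-back-to-source policy are all assumptions chosen for illustration. It shows one way a rewrite from a language model could be accepted only when every dosage and vital sign from the source text survives.

```python
import re

# Hypothetical patterns for "critical entities"; a real pipeline would
# likely use a richer medical vocabulary or a dedicated NER component.
ENTITY_PATTERNS = [
    r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|units)\b",  # dosages, e.g. "25 mg"
    r"\b\d{2,3}/\d{2,3}\s?mmHg\b",                  # blood pressure, e.g. "150/95 mmHg"
    r"\b\d+(?:\.\d+)?\s?bpm\b",                     # heart rate, e.g. "88 bpm"
]


def extract_entities(text):
    """Return the set of critical entities, normalized (no spaces, lowercase)."""
    found = set()
    for pattern in ENTITY_PATTERNS:
        for match in re.findall(pattern, text, flags=re.IGNORECASE):
            found.add(re.sub(r"\s+", "", match).lower())
    return found


def missing_entities(source, rewritten):
    """Entities present in the source but absent from the rewritten text."""
    return extract_entities(source) - extract_entities(rewritten)


def verify_or_fallback(source, rewritten):
    """Accept the rewrite only if no critical entity was lost (assumed policy)."""
    return rewritten if not missing_entities(source, rewritten) else source


source = "Give metoprolol 25 mg twice daily; BP was 150/95 mmHg and heart rate 88 bpm."
good = "Take 25mg of metoprolol two times a day. Blood pressure: 150/95 mmHg, pulse 88 bpm."
bad = "Take metoprolol two times a day. Blood pressure was high, pulse 88 bpm."

print(missing_entities(source, good))
print(sorted(missing_entities(source, bad)))
```

In a closed loop, a non-empty result from `missing_entities` could trigger a regeneration request instead of the simple fallback used here.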
Developed a binary deepfake image classifier using a Dense CNN architecture with data augmentation and dropout regularization. Used TensorFlow and OpenCV for preprocessing and feature extraction, and trained on 5 diverse datasets, achieving 97% accuracy on unseen data. Published at IRCCTSD 2024.
Built a real-time Bitcoin monitoring system using Python and CoinGecko API with automated data ingestion via Power Query. Delivered an interactive Power BI dashboard featuring 7/30/90-day moving averages, volatility metrics, and time series forecasting with scheduled auto-refresh.
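For readers unfamiliar with the metrics named above, here is a small standard-library sketch of a trailing moving average and a return-based volatility figure. The dashboard itself was built in Power BI, so this Python version is only a stand-in: the function names, the 3-day window, and the price series are synthetic assumptions, and the project may define volatility differently.

```python
from statistics import mean, stdev


def moving_average(prices, window):
    """Trailing simple moving average; None until a full window is available."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(mean(prices[i + 1 - window : i + 1]))
    return averages


def daily_returns(prices):
    """Simple day-over-day returns: (today - yesterday) / yesterday."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def volatility(prices):
    """Sample standard deviation of daily returns (one common definition)."""
    return stdev(daily_returns(prices))


# Synthetic closing prices, not real market data.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
ma3 = moving_average(prices, 3)
print(ma3)
print(volatility(prices))
```

The same `moving_average` call with `window` set to 7, 30, or 90 would correspond to the windows shown on the dashboard, given a long enough price history.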
Analyzed EV adoption patterns using GeoPandas and Scikit-learn to map infrastructure gaps across Washington state. Built Random Forest and Logistic Regression models achieving 98.42% CAFV eligibility accuracy and R²=0.918 for electric range prediction.
Performed end-to-end retail analytics on real-world e-commerce data using MySQL, covering complex multi-table joins, window functions, aggregations, and customer segmentation to surface actionable sales and product performance insights.
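As a rough illustration of the join, aggregation, and window-function pattern described above, the sketch below ranks customers by total spend. It runs against an in-memory SQLite database through Python's `sqlite3` module as a stand-in for MySQL, and the `customers` and `orders` tables, their columns, and the rows are all invented for the example. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 300.0);
""")

# Join orders to customers, total the spend per customer, then rank the
# totals with a window function evaluated after the GROUP BY.
QUERY = """
SELECT c.name,
       SUM(o.amount) AS total_spent,
       RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spend_rank
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY spend_rank
"""

rows = conn.execute(QUERY).fetchall()
for name, total_spent, spend_rank in rows:
    print(spend_rank, name, total_spent)
```

The rank column gives a simple basis for segmentation, for example by bucketing customers into tiers according to `spend_rank`.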
Architected a production-grade ETL pipeline on Snowflake following the Medallion architecture (Bronze → Silver → Gold). Implemented data quality checks, schema evolution, and query optimization using SQL and Snowflake-native features for scalable analytics.
Designed a scalable data lake pipeline on Databricks using Apache Spark and Delta Lake with Medallion architecture. Performed large-scale transformations via Spark SQL, enabling reliable, versioned, and query-optimized data for downstream analytics workloads.
Python
SQL
R
Bash
Scala
Apache Spark
dbt
Apache Airflow
Kafka
ETL/ELT Pipelines
Data Warehousing
Databricks
Snowflake
Hadoop
Amazon Web Services (AWS)
Amazon S3
AWS Glue
Amazon Redshift
AWS Step Functions
Amazon EMR
Amazon Athena
Amazon CloudWatch
Docker
CI/CD
Matplotlib
Seaborn
Tableau
Power BI
Amazon QuickSight
KPI Tracking
Regression
Hypothesis Testing
Excel
Scikit-learn
XGBoost
TensorFlow
PyTorch
Keras
SpaCy / NLP
Feature Engineering
Anomaly Detection
A/B Testing
LangChain
Hugging Face
MySQL
PostgreSQL
MongoDB
SQLite
NoSQL
Git / GitHub
Flask / FastAPI
Time Series Analysis
MLOps
Geospatial Analysis
Research Analyst · College Park, MD
Data Engineer Intern · Remote
Teaching Assistant · Data Science Research Program
Generative AI Intern · Chennai, India
Data Analyst · Remote, India
International Research Conference on Computing Technologies for Sustainable Development (IRCCTSD 2024)
Communications in Computer and Information Science, vol 2361. Springer, Cham.
Deepfakes, innovative manipulations of digital visual content using deep learning methods, have emerged as a significant threat, raising concerns about misinformation and privacy violations. Their influence spans social media, political discourse, and beyond, highlighting the urgent need for robust detection tools. This research addresses the deepfake threat through a thorough examination of binary classification techniques. Focused on distinguishing genuine from manipulated images, the study leverages diverse datasets to train and evaluate methods utilizing Convolutional Neural Networks (CNNs) with emphasis on spatial feature extraction. Experimental results demonstrate the model's effectiveness in detecting manipulated images across various scenarios, achieving 97% accuracy on unseen data.
"Efficient Deepfake Image Detection Using Dense CNN Architecture"
Ranked in the top 10 out of 2,500 participants in Proglint's Alliance University Computer Vision Hackathon.
Download my full resume to see my complete experience, skills, and qualifications.
Download Resume
I'm always open to interesting conversations, collaborations, or new opportunities.