Hi, I'm
Aspiring Data Engineer & Analyst
Master's student in Data Science at the University of Maryland, College Park. Passionate about turning raw data into real-world impact, from real-time pipelines to deep learning models and generative AI.
I'm a data science professional with a background in Computer Science Engineering (AI & ML specialization) from SRM University, now completing my Master's at the University of Maryland. With hands-on experience in data engineering, pipeline development, and analytics, I'm passionate about building efficient data systems and translating complex datasets into clear, actionable insights.
My current research focus is Scalable Data Engineering & Pipeline Optimization, exploring how modern orchestration frameworks and cloud-native architectures can maximize throughput, reliability, and observability in large-scale data workflows. I've won a Best Paper Award at an international conference and ranked in the top 10 out of 2,500 participants in a national Computer Vision hackathon.
University of Maryland, College Park
SRM University
Efficient Deepfake Image Detection Using Dense CNN Architecture
College Park, MD, USA
A selection of data science and engineering work. View all on GitHub
Benchmarked a distributed AI orchestration system (Claude + MCP) across Google Colab, AWS EC2, and AWS ECS Fargate under 50–100 concurrent users. ECS Fargate sustained 457 RPS, nearly matching EC2's 460 RPS, while using 50% fewer resources, validating serverless containers for scalable agentic AI.
Designed a closed-loop neuro-symbolic pipeline combining LLM fluency with deterministic entity verification. Achieved 100% preservation of critical medical entities (dosages, vitals) across 300 texts vs ~99.7% for pure neural baselines (BART, T5).
Developed a binary deepfake image classifier using a Dense CNN architecture with data augmentation and dropout regularization. Leveraged TensorFlow and OpenCV for preprocessing and feature extraction, and trained across 5 diverse datasets, achieving 97% accuracy on unseen data. Published at IRCCTSD 2024.
Built a real-time Bitcoin monitoring system using Python and the CoinGecko API, with automated data ingestion via Power Query. Delivered an interactive Power BI dashboard featuring 7/30/90-day moving averages, volatility metrics, and time series forecasting with scheduled auto-refresh.
Analyzed EV adoption patterns using GeoPandas and Scikit-learn to map infrastructure gaps across Washington state. Built Random Forest and Logistic Regression models achieving 98.42% CAFV eligibility accuracy and R²=0.918 for electric range prediction.
Performed end-to-end retail analytics on real-world e-commerce data using MySQL, covering complex multi-table joins, window functions, aggregations, and customer segmentation to surface actionable sales and product performance insights.
Architected a production-grade ETL pipeline on Snowflake following the Medallion architecture (Bronze → Silver → Gold). Implemented data quality checks, schema evolution, and query optimization using SQL and Snowflake-native features for scalable analytics.
Designed a scalable data lake pipeline on Databricks using Apache Spark and Delta Lake with Medallion architecture. Performed large-scale transformations via Spark SQL, enabling reliable, versioned, and query-optimized data for downstream analytics workloads.
Graduate Student Data Analyst · College Park, MD
Data Engineer Intern · Remote
Teaching Assistant · Data Science Research Program
Generative AI Intern · Chennai, India
Data Analyst · Remote, India
International Research Conference on Computing Technologies for Sustainable Development (IRCCTSD 2024)
Communications in Computer and Information Science, vol 2361. Springer, Cham.
Deepfakes, innovative manipulations of digital visual content using deep learning methods, have emerged as a significant threat, raising concerns about misinformation and privacy violations. Their influence spans social media, political discourse, and beyond, highlighting the urgent need for robust detection tools. This research addresses the deepfake threat through a thorough examination of binary classification techniques. Focused on distinguishing genuine from manipulated images, the study leverages diverse datasets to train and evaluate methods utilizing Convolutional Neural Networks (CNNs) with emphasis on spatial feature extraction. Experimental results demonstrate the model's effectiveness in detecting manipulated images across various scenarios, achieving 97% accuracy on unseen data.
"Efficient Deepfake Image Detection Using Dense CNN Architecture"
Ranked in the top 10 out of 2,500 participants in Proglint's Alliance University Computer Vision Hackathon.
Download my full resume to see my complete experience, skills, and qualifications.
Download Resume
I'm always open to interesting conversations, collaborations, or new opportunities.