Abhishek Rithik Origanti

About Me

I am a Data Science professional focused on data engineering and applied machine learning, with a background in Computer Science Engineering (AI & ML specialization) from SRM University and a Master's in Data Science from the University of Maryland. With hands-on experience in pipeline development and production ML workflows, I build efficient data systems and translate complex datasets into decisions that teams can act on.

My current research focus is Scalable Data Engineering and Pipeline Optimization, exploring how modern orchestration frameworks and cloud-native architectures can maximize throughput, reliability, and observability in large-scale data workflows.

I have received a Best Paper Award at an international conference for research on dense CNN architectures for deepfake detection, and ranked in the top 10 out of 2,500 participants in a national Computer Vision hackathon during my undergraduate studies at SRM.


Education

Master's in Data Science

University of Maryland, College Park

2024 – 2026

B.Tech — CS (AI & ML Specialization)

SRM University

2020 – 2024

Best Paper Award · IRCCTSD 2024

Efficient Deep Fake Image Detection Using Dense CNN Architecture

Location

College Park, MD, USA

Projects

A selection of data science and engineering work.

Appointment No-Show Prediction

Built a machine learning pipeline to predict patient/appointment no-show risk from EHR-style scheduling data using Random Forest and Logistic Regression models, enabling proactive scheduling interventions and reduced manual review effort.

Python · Scikit-learn · Random Forest · Pandas


TravelGenie - AI-Powered Travel Planning Ecosystem

Benchmarked a distributed AI orchestration system (Claude + MCP) across Google Colab, AWS EC2, and AWS ECS Fargate under 50–100 concurrent users. ECS Fargate sustained 457 RPS, nearly matching EC2's 460 RPS, while using 50% fewer resources, validating serverless containers for scalable agentic AI.

Claude AI · MCP · AWS ECS · AWS EC2 · Slack Bot · Python

SafeSim - Neuro-Symbolic Medical Text Simplification

Designed a closed-loop neuro-symbolic pipeline combining LLM fluency with deterministic entity verification. Achieved 100% preservation of critical medical entities (dosages, vitals) across 300 texts vs ~99.7% for pure neural baselines (BART, T5).

LLMs · BART · T5 · SpaCy · NLP · Python
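
One way to picture the deterministic verification step is a rule-based check that every dosage or vital sign in the source text survives in the simplified output. The sketch below is illustrative only and is not the project's code: the regex, the unit list, and the names `ENTITY_PATTERN`, `extract_entities`, and `missing_entities` are assumptions, and plain regular expressions stand in for whatever entity extraction SafeSim actually uses.

```python
import re

# Hypothetical pattern: a number (optionally decimal or a ratio such as 120/80)
# followed by a clinical unit. The unit list is an assumption for illustration.
ENTITY_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?(?:/\d+)?\s?(?:mg|mcg|ml|g|bpm|mmHg|%)(?![A-Za-z])",
    re.IGNORECASE,
)


def extract_entities(text):
    """Return the critical entities in `text`, lowercased with whitespace removed."""
    return {
        re.sub(r"\s+", "", match.group(0)).lower()
        for match in ENTITY_PATTERN.finditer(text)
    }


def missing_entities(source, simplified):
    """Return entities present in the source but absent from the simplified text."""
    return extract_entities(source) - extract_entities(simplified)


source = "Take 500 mg of amoxicillin twice daily; blood pressure was 120/80 mmHg."
faithful = "Take 500mg of amoxicillin two times a day. Your blood pressure was 120/80 mmHg."
lossy = "Take amoxicillin two times a day. Your blood pressure was 120/80 mmHg."

print(missing_entities(source, faithful))  # empty set: every entity preserved
print(missing_entities(source, lossy))     # the dropped dosage is reported
```

In a closed-loop arrangement, a non-empty result would presumably send the text back for another simplification pass rather than being accepted as-is.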

Best Paper Award

Efficient Deep Fake Image Detection Using Dense CNN Architecture

Developed a binary deepfake image classifier using a Dense CNN architecture with data augmentation and dropout regularization. Leveraged TensorFlow and OpenCV for preprocessing and feature extraction, and trained across 5 diverse datasets, achieving 97% accuracy on unseen data. Published at IRCCTSD 2024.

Keras · OpenCV · CNN · Deep Learning


Real-Time Bitcoin Data Pipeline & Power BI Dashboard

Built a real-time Bitcoin monitoring system using Python and CoinGecko API with automated data ingestion via Power Query. Delivered an interactive Power BI dashboard featuring 7/30/90-day moving averages, volatility metrics, and time series forecasting with scheduled auto-refresh.

Python · Power BI · Pandas · Time Series
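
The moving-average and volatility metrics mentioned above can be sketched in a few lines. This is a minimal standard-library illustration under stated assumptions, not the dashboard's actual logic: the function names are hypothetical, the price list is made-up sample data rather than CoinGecko output, and volatility is taken here to mean the sample standard deviation of daily returns.

```python
from statistics import mean, stdev


def moving_average(prices, window):
    """Trailing simple moving average; None until `window` prices are available."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(mean(prices[i + 1 - window : i + 1]))
    return averages


def daily_returns(prices):
    """Fractional change between consecutive daily closing prices."""
    return [(today - prev) / prev for prev, today in zip(prices, prices[1:])]


def volatility(prices):
    """Sample standard deviation of daily returns (one common definition)."""
    return stdev(daily_returns(prices))


# Made-up closing prices for illustration; a real series would be much longer.
closes = [10.0, 12.0, 14.0, 16.0]
print(moving_average(closes, 3))
print(volatility(closes))
```

The 7-, 30-, and 90-day averages would come from calling `moving_average` with `window` set to 7, 30, and 90 on a sufficiently long daily series.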

Predicting EV Trends & Charging Infrastructure

Analyzed EV adoption patterns using GeoPandas and Scikit-learn to map infrastructure gaps across Washington state. Built Random Forest and Logistic Regression models achieving 98.42% CAFV eligibility accuracy and R²=0.918 for electric range prediction.

Scikit-learn · GeoPandas · Seaborn · ML

Retail Data Analysis with SQL

Performed end-to-end retail analytics on real-world e-commerce data using MySQL, covering complex multi-table joins, window functions, aggregations, and customer segmentation to surface actionable sales and product performance insights.

SQL · MySQL · Data Analysis · ETL

End-to-End Data Pipeline with Snowflake

Architected a production-grade ETL pipeline on Snowflake following the Medallion architecture (Bronze → Silver → Gold). Implemented data quality checks, schema evolution, and query optimization using SQL and Snowflake-native features for scalable analytics.

Snowflake · SQL · Data Engineering · ETL

Scalable Pipelines with Databricks

Designed a scalable data lake pipeline on Databricks using Apache Spark and Delta Lake with Medallion architecture. Performed large-scale transformations via Spark SQL, enabling reliable, versioned, and query-optimized data for downstream analytics workloads.

Databricks · Apache Spark · Spark SQL · Delta Lake

Professional Experience

UMD Counseling Center

Research Analyst · College Park, MD

Sep 2024 – Present

Consolidated 4 to 5 campus data sources into a structured PostgreSQL and Python data model, validating 10+ years of FERPA and HIPAA-compliant records to enable demand modeling and operational reporting across institutional workflows.
Translated requirements from 40 to 50 clinicians into SQL, Excel, and Tableau dashboards, delivering quarterly utilization reports that improved scheduling efficiency by 25% and informed resource allocation decisions.
Trained Random Forest and Logistic Regression models on EHR-style appointment data to predict no-show risk, achieving 84.7% accuracy with ROC-AUC 0.81 and cutting manual effort by 30% across 3 automated workflows.
Applied NLP (TF-IDF, sentiment analysis) to 4,000+ student feedback responses and ran t-tests and chi-square analyses, surfacing service quality patterns presented via Power BI to the VP for accreditation reporting.

Python · MySQL · Tableau · Pandas

Interlinked Corp

Data Engineer Intern · Remote

May 2025 – Sep 2025

Architected a Medallion lakehouse on AWS S3 using Kafka for ingestion, PySpark for distributed transformation, and AWS Glue for orchestration, processing 2M+ daily records and cutting ETL latency by 85% (3 hrs to 25 min).
Established GitHub Actions CI/CD pipelines for nightly data validation runs, artifact versioning, and failure alerts, and cut p95 inference latency to under 350 ms by streamlining feature pipelines and adding prediction caching.
Benchmarked Redshift schema configurations including sort keys and distribution styles via A/B testing, improving dashboard query performance by 40% and accelerating business reporting turnaround for BI consumers.
Instrumented production pipeline monitoring via AWS CloudWatch and Airflow metrics across 5 DAGs, performing root cause analysis on incident signals to reduce unplanned downtime and maintain SLA compliance.

Python · Apache Airflow · PostgreSQL · GeoPandas · Docker

The Coding School

Teaching Assistant · Data Science Research Program

Jun 2025 – Aug 2025

Supported 20+ high school students through end-to-end data science research in Python, Scikit-learn, and TensorFlow, accompanying them from initial EDA through model evaluation and final presentations.
Wove Git/GitHub into the core curriculum and walked students through regression, classification, and clustering workflows, contributing to a 15% improvement in overall project quality by the end of the program.
Facilitated workshops on experiment design and error analysis, and ran mock client reviews modelled on consulting practice, helping students articulate their methodology with clarity and professional confidence.

Python · Scikit-learn · TensorFlow · Jupyter · Git

Open Weaver

Generative AI Intern · Chennai, India

Jul 2023 – Sep 2023

Conceived and delivered a voice-to-image generator using GANs and CLIP in TensorFlow, achieving 95% accuracy in mapping voice inputs to visual outputs across large and varied test datasets.
Reduced model training time by 30% through targeted hyperparameter tuning, regularization strategies, and GPU acceleration, while preserving 98% accuracy and cross-dataset scalability.
Assembled large-scale training datasets via web scraping with Beautiful Soup and maintained SQL/MySQL databases to keep AI pipelines reproducible, well-documented, and accessible to the broader team.

GANs · CLIP · TensorFlow · Hugging Face · Power BI

GANfinity.AI

Data Analyst · Remote, India

Dec 2022 – Jun 2023

Developed and deployed an AI-enabled Fintech B2B cloud application with ML-powered financial risk prediction, embedding anomaly detection models that elevated fraud detection capabilities by 90%.
Evaluated Gradient Boosting, XGBoost, and Random Forest classifiers against one another, then applied A/B testing on transaction flows to identify which approaches meaningfully improved conversion outcomes.
Revamped SQL-based ETL processes for centralized data management, tightening ingestion and transformation logic so downstream analytics teams consistently had clean, reliable data at their disposal.

XGBoost · Random Forest · SQL · Tableau · Python

Publications

Best Paper Award

Efficient Deep Fake Image Detection Using Dense CNN Architecture

International Research Conference on Computing Technologies for Sustainable Development (IRCCTSD 2024)
Communications in Computer and Information Science, vol 2361. Springer, Cham.

Deepfakes, innovative manipulations of digital visual content using deep learning methods, have emerged as a significant threat, raising concerns about misinformation and privacy violations. Their influence spans social media, political discourse, and beyond, highlighting the urgent need for robust detection tools. This research addresses the deepfake threat through a thorough examination of binary classification techniques. Focused on distinguishing genuine from manipulated images, the study leverages diverse datasets to train and evaluate methods utilizing Convolutional Neural Networks (CNNs) with emphasis on spatial feature extraction. Experimental results demonstrate the model's effectiveness in detecting manipulated images across various scenarios, achieving 97% accuracy on unseen data.

Skills

Programming Languages

Data Engineering

Cloud & Infrastructure

Data Visualization

Machine Learning & AI

Databases

Other Skills

Certifications & Awards

AWS Certified Solutions Architect – Associate

Oracle Database Foundations

MongoDB Basics

Neural Networks and Deep Learning

AI For Everyone

Machine Learning Introduction for Everyone

Introduction to Machine Learning

Awards & Recognition

Best Paper Award · IRCCTSD 2024

Top 10 · Proglint CV Hackathon 2023

Want to Know More?

Get In Touch

Email

LinkedIn

GitHub
