Hi, I'm Abhishek Rithik Origanti

Data Engineer & Analytics Practitioner

Data Science graduate from the University of Maryland, College Park. Passionate about building production-ready data systems, from real-time pipelines and scalable cloud architectures to applied machine learning and generative AI.

Abhishek Rithik Origanti
UMD · College Park

About Me

I am a Data Science professional focused on data engineering and applied machine learning, with a background in Computer Science Engineering (AI & ML specialization) from SRM University and a Master's in Data Science from the University of Maryland. With hands-on experience in pipeline development and production ML workflows, I build efficient data systems and translate complex datasets into decisions that teams can act on.

My current research focus is Scalable Data Engineering and Pipeline Optimization, exploring how modern orchestration frameworks and cloud-native architectures can maximize throughput, reliability, and observability in large-scale data workflows.

I received a Best Paper Award at an international conference for research on dense CNN architectures for deepfake detection, and during my undergraduate studies at SRM I ranked in the top 10 of 2,500 participants in a national Computer Vision hackathon.

University of Maryland
Master's in Data Science

University of Maryland, College Park

2024 – 2026

SRM University
B.Tech — CS (AI & ML Specialization)

SRM University

2020 – 2024

Best Paper Award · IRCCTSD 2024

Efficient Deepfake Image Detection Using Dense CNN Architecture

Location

College Park, MD, USA

Projects

A selection of data science and engineering work. View all on GitHub

Real-Time Bitcoin Data Pipeline & Power BI Dashboard

Built a real-time Bitcoin monitoring system using Python and CoinGecko API with automated data ingestion via Power Query. Delivered an interactive Power BI dashboard featuring 7/30/90-day moving averages, volatility metrics, and time series forecasting with scheduled auto-refresh.
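For readers unfamiliar with the metrics named above, here is a minimal sketch of trailing moving averages and rolling volatility in plain Python. It is an illustration only, not the project's code: the function names, the sample prices, and the choice of sample standard deviation of daily returns as the volatility measure are all assumptions.

```python
from statistics import mean, stdev


def moving_average(prices, window):
    """Trailing simple moving average; None until a full window is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(prices[i + 1 - window : i + 1]))
    return out


def daily_returns(prices):
    """Simple day-over-day returns: (today - yesterday) / yesterday."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def rolling_volatility(prices, window):
    """Sample standard deviation of daily returns over a trailing window (window >= 2)."""
    rets = daily_returns(prices)
    out = [None]  # the first price has no return
    for i in range(len(rets)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(rets[i + 1 - window : i + 1]))
    return out


# Hypothetical closing prices, one per day (made-up numbers).
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
ma3 = moving_average(prices, 3)
vol3 = rolling_volatility(prices, 3)
```

On a real daily series the same functions would be called with windows of 7, 30, and 90 to match the dashboard's averages.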

Python · Power BI · Pandas · Time Series

Predicting EV Trends & Charging Infrastructure

Analyzed EV adoption patterns using GeoPandas and Scikit-learn to map infrastructure gaps across Washington state. Built Random Forest and Logistic Regression models achieving 98.42% accuracy on CAFV eligibility classification and R² = 0.918 for electric range prediction.

Scikit-learn · GeoPandas · Seaborn · ML

Retail Data Analysis with SQL

Performed end-to-end retail analytics on real-world e-commerce data using MySQL, covering complex multi-table joins, window functions, aggregations, and customer segmentation to surface actionable sales and product performance insights.

SQL · MySQL · Data Analysis · ETL

End-to-End Data Pipeline with Snowflake

Architected a production-grade ETL pipeline on Snowflake following the Medallion architecture (Bronze → Silver → Gold). Implemented data quality checks, schema evolution, and query optimization using SQL and Snowflake-native features for scalable analytics.
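The Bronze → Silver → Gold flow can be sketched in a few lines of plain Python. This is a toy illustration of the layering idea, not the Snowflake implementation: the order records, field names, and the helpers `to_silver` and `to_gold` are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records kept as ingested, including duplicates and bad values.
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "20.50"},
    {"order_id": "1", "customer": " alice ", "amount": "20.50"},  # duplicate row
    {"order_id": "2", "customer": "Bob", "amount": "not-a-number"},  # fails quality check
    {"order_id": "3", "customer": "bob", "amount": "5.00"},
]


def to_silver(rows):
    """Silver: typed, validated, de-duplicated records."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # data quality check: drop rows with unparseable amounts
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        clean.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return clean


def to_gold(rows):
    """Gold: business-level aggregate, here total spend per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer reads only from the one before it, so a bad record can be traced back to the untouched Bronze data.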

Snowflake · SQL · Data Engineering · ETL

Scalable Pipelines with Databricks

Designed a scalable data lake pipeline on Databricks using Apache Spark and Delta Lake with Medallion architecture. Performed large-scale transformations via Spark SQL, enabling reliable, versioned, and query-optimized data for downstream analytics workloads.

Databricks · Apache Spark · Spark SQL · Delta Lake

Skills

Programming Languages
Python · SQL · R · Bash · Scala
Data Engineering
Apache Spark · dbt · Apache Airflow · Kafka · ETL/ELT Pipelines · Data Warehousing · Databricks · Snowflake · Hadoop
Cloud & Infrastructure
Amazon Web Services (AWS) · Amazon S3 · AWS Glue · Amazon Redshift · AWS Step Functions · Amazon EMR · Amazon Athena · Amazon CloudWatch · Docker · CI/CD
Data Visualization
Matplotlib · Seaborn · Tableau · Power BI · Amazon QuickSight · KPI Tracking · Regression · Hypothesis Testing · Excel
Machine Learning & AI
Scikit-learn · XGBoost · TensorFlow · PyTorch · Keras · SpaCy / NLP · Feature Engineering · Anomaly Detection · A/B Testing · LangChain · Hugging Face
Databases
MySQL · PostgreSQL · MongoDB · SQLite · NoSQL
Other Skills
Git / GitHub · Flask / FastAPI · Time Series Analysis · MLOps · Geospatial Analysis

Professional Experience

UMD Counseling Center

Research Analyst · College Park, MD

Sep 2024 – Present
  • Consolidated 4 to 5 campus data sources into a structured PostgreSQL and Python data model, validating 10+ years of FERPA and HIPAA-compliant records to enable demand modeling and operational reporting across institutional workflows.
  • Translated requirements from 40 to 50 clinicians into SQL, Excel, and Tableau dashboards, delivering quarterly utilization reports that improved scheduling efficiency by 25% and informed resource allocation decisions.
  • Trained Random Forest and Logistic Regression models on EHR-style appointment data to predict no-show risk, achieving 84.7% accuracy with ROC-AUC 0.81 and cutting manual effort by 30% across 3 automated workflows.
  • Applied NLP (TF-IDF, sentiment analysis) to 4,000+ student feedback responses and ran t-tests and chi-square analyses, surfacing service quality patterns presented via Power BI to the VP for accreditation reporting.
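For context on the two model metrics quoted above, the sketch below shows how accuracy and ROC-AUC are computed from labels and predicted scores. It uses made-up labels and scores, not the appointment data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical no-show labels (1 = no-show) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, while ROC-AUC measures how well the scores rank no-shows above attended appointments across all thresholds, which is why the two are reported together.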
Python · MySQL · Tableau · Pandas

Interlinked Corp

Data Engineer Intern · Remote

May 2025 – Sep 2025
  • Architected a Medallion lakehouse on AWS S3 using Kafka for ingestion, PySpark for distributed transformation, and AWS Glue for orchestration, processing 2M+ daily records and cutting ETL latency by 85% (3 hrs to 25 min).
  • Established GitHub Actions CI/CD pipelines for nightly data validation runs, artifact versioning, and failure alerts, and brought p95 inference latency under 350 ms by streamlining feature pipelines and adding prediction caching.
  • Benchmarked Redshift schema configurations including sort keys and distribution styles via A/B testing, improving dashboard query performance by 40% and accelerating business reporting turnaround for BI consumers.
  • Instrumented production pipeline monitoring via AWS CloudWatch and Airflow metrics across 5 DAGs, performing root cause analysis on incident signals to reduce unplanned downtime and maintain SLA compliance.
Python · Apache Airflow · PostgreSQL · GeoPandas · Docker

The Coding School

Teaching Assistant · Data Science Research Program

Jun 2025 – Aug 2025
  • Supported 20+ high school students through end-to-end data science research in Python, Scikit-learn, and TensorFlow, accompanying them from initial EDA through model evaluation and final presentations.
  • Wove Git/GitHub into the core curriculum and walked students through regression, classification, and clustering workflows, contributing to a 15% improvement in overall project quality by the end of the program.
  • Facilitated workshops on experiment design and error analysis, and ran mock client reviews modelled on consulting practice, helping students articulate their methodology with clarity and professional confidence.
Python · Scikit-learn · TensorFlow · Jupyter · Git

Open Weaver

Generative AI Intern · Chennai, India

Jul 2023 – Sep 2023
  • Conceived and delivered a voice-to-image generator using GANs and CLIP in TensorFlow, achieving 95% accuracy in mapping voice inputs to visual outputs across large and varied test datasets.
  • Reduced model training time by 30% through targeted hyperparameter tuning, regularization strategies, and GPU acceleration, while preserving 98% accuracy and cross-dataset scalability throughout.
  • Assembled large-scale training datasets via web scraping with Beautiful Soup and maintained SQL/MySQL databases to keep AI pipelines reproducible, well-documented, and accessible to the broader team.
GANs · CLIP · TensorFlow · Hugging Face · Power BI

GANfinity.AI

Data Analyst · Remote, India

Dec 2022 – Jun 2023
  • Developed and deployed an AI-enabled Fintech B2B cloud application with ML-powered financial risk prediction, embedding anomaly detection models that elevated fraud detection capabilities by 90%.
  • Evaluated Gradient Boosting, XGBoost, and Random Forest classifiers against one another, then applied A/B testing on transaction flows to identify which approaches meaningfully improved conversion outcomes.
  • Revamped SQL-based ETL processes for centralized data management, tightening ingestion and transformation logic so downstream analytics teams consistently had clean, reliable data at their disposal.
XGBoost · Random Forest · SQL · Tableau · Python

Publications

Best Paper Award

Efficient Deep Fake Image Detection Using Dense CNN Architecture

International Research Conference on Computing Technologies for Sustainable Development (IRCCTSD 2024)
Communications in Computer and Information Science, vol 2361. Springer, Cham.

Deepfakes, innovative manipulations of digital visual content using deep learning methods, have emerged as a significant threat, raising concerns about misinformation and privacy violations. Their influence spans social media, political discourse, and beyond, highlighting the urgent need for robust detection tools. This research addresses the deepfake threat through a thorough examination of binary classification techniques. Focused on distinguishing genuine from manipulated images, the study leverages diverse datasets to train and evaluate methods utilizing Convolutional Neural Networks (CNNs) with emphasis on spatial feature extraction. Experimental results demonstrate the model's effectiveness in detecting manipulated images across various scenarios, achieving 97% accuracy on unseen data.

Certifications & Awards

Awards & Recognition

Best Paper Award · IRCCTSD 2024

"Efficient Deepfake Image Detection Using Dense CNN Architecture"

Top 10 · Proglint CV Hackathon 2023

Ranked in the top 10 out of 2,500 participants in Proglint's Alliance University Computer Vision Hackathon.

Want to Know More?

Download my full resume to see my complete experience, skills, and qualifications.

Download Resume

Get In Touch

I'm always open to interesting conversations, collaborations, or new opportunities.