Ritvika Sonawane

I'm a Senior Associate Data Scientist at Bank of New York, where I was previously an Applied AI Intern; my work there on LLM prompt compression was accepted to the ICAIF Workshop on LLMs and Generative AI for Finance. I recently completed my Master of Science in Electrical and Computer Engineering (AI/ML Systems) at Carnegie Mellon University with a 4.00 GPA, and was a Graduate Fellow at the Carnegie Bosch Institute (Corporate Startup Lab). My research interests lie broadly in efficient machine learning and federated learning, and I worked in the LIONS research group led by Dr. Carlee Joe-Wong. I earned my B.Tech in Electrical and Electronics Engineering from NIT Andhra Pradesh, India. Prior to CMU, I was a Systems Engineer at Tata Consultancy Services (Research), where I designed novel ultra-wideband antennas and developed ML solutions to enhance the usability of Reconfigurable Intelligent Surfaces.


Experience

Bank of New York

Senior Associate Data Scientist · Mar 2026 – Present

  • Working in digital assets custody, architecting an enterprise-wide Digital Assets Agent — a one-stop platform for all digital assets knowledge at BNY — using multi-agent orchestration, hybrid RAG, and prompt compression.
  • Developed and shipped production APIs, resolved critical bugs, and delivered new features in digital assets custody systems.

Applied AI Intern · Jun 2025 – Aug 2025

  • Built a unified text compression pipeline reducing token usage by 70% while preserving semantic fidelity (0.97 embedding similarity). Results accepted as a demo paper at the ICAIF Workshop on LLMs and Generative AI for Finance.
  • Developed Eliza-Trace, an optimization tool that improved agent accuracy by 10% through custom fine-tuning (LoRA, Q-GaLore) and integration with Microsoft Trace.

Carnegie Mellon University — Carnegie Bosch Institute

Graduate Fellow · Aug 2025 – Dec 2025

  • Assessed market gaps in cyber-physical AI and spatial computing for industrial automation, identifying LLM-assisted CNC operations as the highest-impact, lowest-friction opportunity.
  • Developed a phased roadmap for deploying operator-facing LLM assistants on CNC lines targeting zero-downtime integration and measurable OEE improvement.

Carnegie Mellon University — Corporate Startup Lab

Research Strategy Development for Honda's HALO Project · Jan 2025 – Apr 2025

  • Identified a critical market gap in aeroacoustics R&D through 180+ stakeholder outreach and 14 expert interviews across new mobility platforms (EVs, eVTOLs, drones).
  • Designed a go-to-market strategy for the Aeroacoustics Innovation Consortium around Honda's HALO wind tunnel, including a tiered membership model ($100K–$1M/year) and a 15-year roadmap projecting $57M cumulative net income.

Tata Consultancy Services

Systems Engineer (Developer) · Jul 2021 – Nov 2023

  • Applied Python-based real-time optimization models to improve Antipodal Vivaldi Antenna bandwidth by 50% using NumPy, SciPy, and scikit-learn.
  • Built a full-stack web application with Flask, JavaScript/React, and REST APIs for real-time configuration of a Reconfigurable Intelligent Surface.

Publications

CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows

Joong Ho Choi, Jiayang Zhao, Jeel Shah, Ritvika Sonawane, Vedant Singh, Avani Appalla, Will Flanagan, Filipe Condessa
ICAIF Workshop on LLMs and Generative AI for Finance / [pdf]

Electronic beam-steering reflectarray antenna system with varactor diode embedded comb-shaped unit cell

Tapas Chakravarty, Poornima Surojia, Ritvika Sonawane, Sai Sarath Chandra Chaitanya Sayinedi, Meda Lakshmi Narayana, Soumya Chakravarty, Rowdra Ghatak
Patent Pending: US 2024/0372255A1 / [pdf]

Topologically modulated reflecting intelligent surfaces and method to enable sectoral area coverage under network applications

Amartya Banerjee, Soumya Chakravarty, Ritvika Sonawane, Poornima Surojia, Tapas Chakravarty, Rowdra Ghatak
Notice of Allowance received from the US Patent Office — Patent will be granted: US 2024/0364007A1 / [pdf]

Gradient Phase Profiled Reflecting Surface Design for Sectoral Sensing Application

Amartya Banerjee, Soumya Chakravarty, Ritvika Sonawane, Poornima Surojia, Tapas Chakravarty, Rowdra Ghatak
APSCON 2024 / [pdf]

Metasurface-Based Reconfigurable Intelligent Surface With Novel Comb-Shaped Unit Cell Design

Soumya Chakravarty, Poornima Surojia, Ritvika Sonawane, Tapas Chakravarty, Achanna Anil Kumar, Rowdra Ghatak
MAPCON 2023 / [pdf]


Projects

Movie Recommendation System: Machine Learning in Production

Led a team that built and deployed a personalized movie recommendation system (SVD, collaborative filtering, SQL, API, Docker), optimizing search relevance and user engagement. Implemented CI/CD pipelines, A/B testing, and model monitoring with MLflow and Grafana. Built scalable ML pipelines with Docker following MLOps best practices, with automated model updates deployed behind a load balancer, maintaining 100% uptime in production.

MHA-to-MLA Conversion for KV Cache Compression in Branchformer for ASR

Converted Branchformer ASR attention layers from MHA to MLA via SVD-based weight factorization and LoRA fine-tuning, achieving 50% KV cache reduction (70.31 MB → 35.16 MB) with no WER degradation on TED-LIUM2. Benchmarked MHA, MLA, and GQA compression strategies on LibriSpeech and TED-LIUM2, establishing MLA as the optimal memory-accuracy tradeoff for production ASR deployment.

Fine-Tuning LLaMA-2 with QLoRA

Fine-tuned LLaMA-2 with QLoRA, using 4-bit quantization to make model adaptation feasible on consumer hardware while preserving model performance and reducing memory usage and training time. Trained low-rank adapters for parameter-efficient fine-tuning, keeping computational overhead low while specializing the model for real-world NLP applications.

Search Ranking System for Streaming Content

Developed a scalable search ranking system for streaming content, optimizing query relevance using TF-IDF, BM25, and BERT embeddings. The system improves content discovery by combining collaborative filtering with content-based embeddings, yielding a 12% increase in NDCG. A Flask API serves ranked search results in real time.


Teaching Experience

Teaching Assistant for 17-685: Machine Learning in Production

Spring 2025, Carnegie Mellon University

Teaching Assistant for 18-667: Algorithms for Large-scale Distributed Machine Learning and Optimization

Fall 2024, Carnegie Mellon University

Teaching Assistant for 18-x13: Foundations of Computer Systems

Spring 2024, Carnegie Mellon University