AI/ML Engineer · Builder · Researcher  ·  Visakhapatnam, India

Bobbili Somanadh
Vasudevara Sasi Sundar

I build reliable LLM systems, run controlled AI experiments, and automate operations for accounting firms through FirmRunner.

15+ systems shipped · 17+ OSS contributions · BITS Pilani · Pre-seed founder

Research Focus

I focus on agentic AI reliability, reasoning quality, and behavior under real constraints.

1. Agentic AI systems and reasoning

2. How LLM agents perform multi-step tasks

3. Reliability and failure modes in agent workflows

4. Trade-offs between autonomy and control in AI systems

Current Work
Designing and evaluating LLM agents with controlled experiments
Research Style
Question-first workflows, baseline comparisons, and metrics
Education
BTech AIML, Year 3 of 4, CGPA 8.2 — BITS Pilani intern
Location
Visakhapatnam, Andhra Pradesh, India
Product

FirmRunner

AI operations platform that automates client intake, document follow-ups, and invoice reminders for accounting firms — replacing $90K/year in manual work.

Founder / CEO Pre-seed stage
What it automates
  • Client intake and onboarding workflows
  • Document request and follow-up sequences
  • Invoice reminders and payment tracking
  • Status updates to clients without manual emails
  • Internal task routing between team members
Research Experiments

Experiments

Controlled tests on how LLM agents reason, use tools, and fail under complexity.

Each experiment follows the same structure: Question → Setup → Result → Insight. Results are from early internal evaluations and should be treated as directional, not benchmark claims.

Evaluation style: baseline comparison · controlled inputs · measurable output

01
Tool-use vs Direct Reasoning
Question: Does tool use improve reasoning accuracy?
Result: Tool-enabled runs were more consistent on structured tasks but less adaptable on ambiguous prompts.
Insight: Tool use improves precision but can reduce flexibility.

02
Multi-agent vs Single-agent
Question: Do multiple agents improve task completion?
Result: Multi-agent setups handled complex workflows better but introduced latency and coordination failure modes.
Insight: Capability can rise faster than stability.

03
Prompt Depth vs Hallucination
Question: How does prompt depth affect hallucinations?
Result: Longer instruction chains increased drift and unsupported claims.
Insight: Instruction density has a tipping point.

04
Retrieval Strategy Comparison
Question: Which retrieval setup reduces drift?
Result: Hybrid retrieval produced more stable reasoning than naive top-k retrieval in document-heavy tasks.
Insight: Retrieval quality dominates downstream reasoning.

05
Autonomy vs Control
Question: How much control is optimal?
Result: Adding guardrails reduced invalid actions but slowed completion.
Insight: Reliability gains are usually paid for in speed.
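The retrieval comparison in experiment 04 can be sketched minimally. This is an illustrative hybrid ranker, not the evaluated implementation: it blends a keyword-overlap score (a stand-in for BM25/TF-IDF) with cosine similarity over embedding vectors, and `alpha=1.0` reduces it to naive dense top-k.

```python
from collections import Counter
import math

def keyword_score(query, doc):
    # Simple term-overlap score (a stand-in for BM25/TF-IDF).
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q) / max(len(query.split()), 1)

def dense_score(q_vec, d_vec):
    # Cosine similarity between (toy) embedding vectors.
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def hybrid_rank(query, q_vec, docs, alpha=0.5, k=3):
    # Blend dense and keyword scores; alpha=1.0 is naive dense top-k.
    scored = [
        (alpha * dense_score(q_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)[:k]]
```

In practice the keyword side would be BM25 over an index and the dense side a real embedding model; the blending step is the part the experiment varies.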
Research log: 5 current experiment tracks documented on this page
Failure notes: 3 recurring failure patterns observed across projects
Portfolio audit update: 4 project case studies rewritten with problem-to-insight format
View Experiment Code · Discuss Research
Project Cases

Projects rewritten as research case studies.

01
Python · LLMs · NLP · FastAPI
40% reduction in manual correction cycles
Procurement Agent System

Problem: Contract review quality varied across reviewers and document types.

Approach: LLM pipeline for field extraction and pricing anomaly checks.

Experimentation: Compared prompt styles and validation rules on representative procurement samples.

Results: Produced more consistent extraction outputs and reduced manual correction in repeated runs.

Insights: Post-processing constraints improved reliability more than prompt verbosity.

02
FastAPI · LangChain · Tool Use · Python
2× structured-task completion vs single-agent baseline
Multitool LLM Agent

Problem: Multi-step agent behavior is hard to evaluate without a controlled setup.

Approach: API-integrated agent chaining tools and reasoning steps via FastAPI.

Experimentation: Tested mixed task sets across structured and ambiguous categories.

Results: Better structured-task performance, with clear weaknesses on ambiguous tasks.

Insights: Tool selection policy is a major failure point.

03
scikit-learn · NLP · Flask · TF-IDF
Consistent ranking scores vs high manual reviewer variance
AI Resume Ranking System

Problem: Resume scoring changed across reviewers and sessions.

Approach: TF-IDF + ML ranking pipeline exposed through Flask API.

Experimentation: Evaluated multiple model variants on labeled resume sets.

Results: Improved ranking consistency versus manual-only review in internal evaluation.

Insights: Feature calibration matters more than model family choice.
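The approach above (TF-IDF features plus a similarity-based ranker) can be sketched with scikit-learn. This is a minimal illustration, not the project's pipeline; the Flask layer, labeled evaluation, and any learned re-ranking model are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_resumes(job_description, resumes):
    """Rank resumes by TF-IDF cosine similarity to a job description."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the job description and resumes together so they share a vocabulary.
    matrix = vectorizer.fit_transform([job_description] + resumes)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    # Return (resume, score) pairs, highest-scoring first.
    return sorted(zip(resumes, scores), key=lambda p: p[1], reverse=True)
```

A deterministic scorer like this is what makes the consistency comparison against manual review possible: the same inputs always produce the same ranking.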

04
Python · Analytics · Scikit-learn · Clustering
Fewer low-confidence cluster assignments vs manual baseline
User Profiling & Segmentation

Problem: Manual user segmentation produced unstable clusters.

Approach: Feature-engineering and clustering pipeline with stability scoring.

Experimentation: Benchmarked cluster quality across multiple feature sets on a medium-sized user dataset.

Results: Produced more stable clusters and fewer low-confidence assignments than the initial baseline.

Insights: Domain feature design outperformed algorithm switching.
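The "stability scoring" mentioned above can be approximated by refitting on random subsamples and measuring label agreement with the full-data clustering. A minimal sketch, assuming k-means and adjusted Rand index as the agreement measure (the actual pipeline may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability_score(X, n_clusters=3, n_rounds=10, frac=0.8, seed=0):
    """Estimate cluster stability: refit k-means on random subsamples and
    average label agreement (ARI) against the full-data clustering."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        sub = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X[idx])
        # ARI is permutation-invariant, so relabeled clusters still score 1.0.
        scores.append(adjusted_rand_score(base[idx], sub))
    return float(np.mean(scores))
```

Scores near 1.0 indicate clusters that survive resampling; scores that drop sharply flag the unstable segmentations the manual baseline suffered from.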

RAG · MCP · Evaluation · CI/CD
More on GitHub

Additional repositories include RAG systems, classifier pipelines, agent frameworks, and API services with repeatable evaluation workflows.

All Projects
Daily Thinking

Research Notes

Note 01
Apr 2026
Why LLMs fail in long workflows

Observed failure compounding in chained tasks. Error-recovery steps helped, but did not eliminate cascading errors. The pattern holds across 4+ project types tested.

Updated weekly
Read on Substack
Note 02
Mar 2026
Hallucination patterns in tool calls

Hallucinations rose when tool responses were sparse or under-specified. Schema-constrained outputs improved argument validity by reducing ambiguous inputs.
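The schema-constraint idea can be sketched as a pre-execution check on tool-call arguments. A minimal illustration only: the `INVOICE_SCHEMA` fields are hypothetical, and a production system would use a real JSON Schema validator rather than this hand-rolled check.

```python
def validate_tool_args(args, schema):
    """Check tool-call arguments against {field: (type, required)};
    return a list of violations (an empty list means the call is valid)."""
    errors = []
    for field, (ftype, required) in schema.items():
        if field not in args:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(args[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    # Reject fields the schema never declared.
    errors.extend(f"unexpected field: {f}" for f in args if f not in schema)
    return errors

# Hypothetical schema for an invoice-reminder tool.
INVOICE_SCHEMA = {"client_id": (str, True), "amount": (float, True), "note": (str, False)}
```

Rejecting a malformed call before execution is what reduces the ambiguous inputs the note describes: the model gets a concrete violation list to repair instead of a sparse tool response.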

Note 03
Feb 2026
Patterns in agent breakdown

Frequent breakdown chain: weak retrieval, premature tool selection, and overconfident final synthesis under uncertainty. Checkpoints help but add latency.

Active Research Themes
Reasoning Evaluation · Failure Taxonomy · Hallucination Analysis · Tool Selection Policy · RAG Reliability · Autonomy vs Control · Context Handling · Agent Coordination
High Signal

Failures and Learnings

Failure 01
Agent fails when context exceeds limit

What I tried: Chunking + retrieval.
What happened: Partial improvement only; long workflows still degraded.
Learning: Context handling is a core bottleneck.

Failure 02
Over-planning without execution

What I tried: Added stricter planning scaffolds.
What happened: More plan steps but weaker completion behavior.
Learning: Planning depth must be bounded by execution checkpoints.

Failure 03
Cross-agent coordination drift

What I tried: Multi-agent decomposition with shared memory.
What happened: Coverage improved, but contradictory outputs increased.
Learning: Coordination protocols matter as much as specialization.

Experience

Applied research and engineering experience

Dec 2025 – Present
Founder, GIANT / FirmRunner
GIANT (Pre-Seed Stage)

Led product and agent architecture through iterative experiments. Defined reliability, latency, and completion metrics before production integration.

2023 – Present
Independent AI/ML Developer
Self-directed — 15+ systems shipped

Delivered 15+ ML/LLM systems with explicit evaluation loops across RAG, tool-use agents, and function-calling workflows. Focused on reproducibility with Docker and CI/CD.

May 2025 – Jul 2025
AI/ML Intern
BITS Pilani, Hyderabad Campus

Built and optimized ML models for decision-support tasks under academic mentorship, applying systematic EDA, feature engineering, and cross-validation.

Feb 2025 – Mar 2025
AI Intern, TechSaksham
Microsoft and SAP via Edunet Foundation / AICTE

Designed and debugged an AI prototype, presented findings to Microsoft and SAP experts, and produced a research-backed technical proposal.

Feb 2025
Data Analyst (Job Simulation)
Deloitte

Completed data and forensic analysis simulation, delivering Tableau-based insight reporting and classification-driven conclusions.

Jan 2025 – May 2025
Undergraduate Research Contributor
PSCMR College of Engineering and Technology

Co-authored an applied AI/ML academic research paper. Led literature analysis, technical writing, and structured methodology sections. Demonstrated ability to operate at research depth while maintaining parallel engineering output.

Jun 2023 – Dec 2023
Full-Stack Development Trainee
MentorKart — Visakhapatnam

Completed full-stack engineering training and deployed API-connected projects with database integration.

Sep 2023 – Present
BTech, Artificial Intelligence and Machine Learning
PSCMR College of Engineering and Technology, Vijayawada

CGPA 8.2. Coursework in LLMs, agentic workflows, deep learning, and full-stack development. Running GIANT in parallel with the degree.

Methods and Tooling

Technical stack for research execution

Languages & Frameworks
  • Python (primary)
  • FastAPI, Flask
  • LangChain, LlamaIndex
  • REST API design
  • JavaScript / Node.js
ML & AI
  • LLM evaluation & reasoning
  • RAG and hybrid retrieval
  • NLP pipelines & embeddings
  • Scikit-learn, PyTorch
  • Prompt engineering & policy
Cloud & Infra
  • Google Cloud Platform
  • Docker & CI/CD
  • PostgreSQL, MongoDB
  • Function-calling systems
  • API integrations
Tools & Platforms
  • A/B experiment design
  • Metric instrumentation
  • Tableau, Excel
  • Git, GitHub Actions
  • Error taxonomy tracking
Certifications and Achievements

Credentials and technical milestones

5-Day AI Agents Intensive Course
Google
Google Cloud Security Summit
Google Cloud — Asia Pacific
Google Cloud Training
Google Cloud Platform — compute, deployment, orchestration
TechSaksham AI Internship
Microsoft and SAP via Edunet Foundation / AICTE
Data Analyst Job Simulation
Deloitte — Tableau, Excel classification
JPMorgan Quantitative Research Simulation
JPMorgan Chase
Python Programming Training
Externsclub / AICTE
Introduction to MongoDB
MongoDB University
AI Fluency Certification
Verified proficiency in core AI concepts
Hackathon Participant
Devnovate and GDG Build With AI
17+ Open-Source Contributions
ML and automation repositories on GitHub
GenAI Prototyping Group Lead
3-member team — experimentation and evaluation cycles
Get in Touch

Open to AI/ML roles and accounting-firm partnerships.

I welcome conversations with researchers, engineers, and teams working on LLM reliability, agent design, and evaluation infrastructure — or accounting firms looking to automate their operations with FirmRunner.