AI/ML Engineer · Builder · Researcher  ·  Visakhapatnam, India

Bobbili Somanadh
Vasudevara Sasi Sundar

I build reliable LLM systems, run controlled AI experiments, and automate operations for accounting firms through FirmRunner.

15+ systems shipped · 17+ OSS contributions · BITS Pilani · Pre-seed founder

Research Focus

I focus on agentic AI reliability, reasoning quality, and behavior under real constraints.

1. Agentic AI systems and reasoning

2. How LLM agents perform multi-step tasks

3. Reliability and failure modes in agent workflows

4. Trade-offs between autonomy and control in AI systems

Current Work
Designing and evaluating LLM agents with controlled experiments
Research Style
Question-first workflows, baseline comparisons, and metrics
Education
BTech AIML, Year 3 of 4, CGPA 8.2 — BITS Pilani intern
Location
Visakhapatnam, Andhra Pradesh, India
Product

FirmRunner

AI operations platform that automates client intake, document follow-ups, and invoice reminders for accounting firms — replacing $90K/year in manual work.

Founder / CEO Pre-seed stage
What it automates
  • Client intake and onboarding workflows
  • Document request and follow-up sequences
  • Invoice reminders and payment tracking
  • Status updates to clients without manual emails
  • Internal task routing between team members
Research Experiments

Experiments

Controlled tests on how LLM agents reason, use tools, and fail under complexity.

Each experiment follows the same structure: Question → Setup → Result → Insight. Results are from early internal evaluations and should be treated as directional, not benchmark claims.

Evaluation style: baseline comparison · controlled inputs · measurable output

01
Tool-use vs Direct Reasoning
Question: Does tool use improve reasoning accuracy?
Result: Tool-enabled runs were more consistent on structured tasks but less adaptable on ambiguous prompts.
Insight: Tool use improves precision but can reduce flexibility.

02
Multi-agent vs Single-agent
Question: Do multiple agents improve task completion?
Result: Multi-agent setups handled complex workflows better but introduced latency and coordination failure modes.
Insight: Capability can rise faster than stability.

03
Prompt Depth vs Hallucination
Question: How does prompt depth affect hallucinations?
Result: Longer instruction chains increased drift and unsupported claims.
Insight: Instruction density has a tipping point.

04
Retrieval Strategy Comparison
Question: Which retrieval setup reduces drift?
Result: Hybrid retrieval produced more stable reasoning than naive top-k retrieval in document-heavy tasks.
Insight: Retrieval quality dominates downstream reasoning.

05
Autonomy vs Control
Question: How much control is optimal?
Result: Adding guardrails reduced invalid actions but slowed completion.
Insight: Reliability gains are usually paid for in speed.
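The retrieval comparison in experiment 04 can be sketched minimally. This is an illustrative hybrid ranker, not the evaluated implementation: it blends a keyword-overlap score (a stand-in for BM25/TF-IDF) with cosine similarity over embedding vectors, and `alpha=1.0` reduces it to naive dense top-k.

```python
from collections import Counter
import math

def keyword_score(query, doc):
    # Simple term-overlap score (a stand-in for BM25/TF-IDF).
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q) / max(len(query.split()), 1)

def dense_score(q_vec, d_vec):
    # Cosine similarity between (toy) embedding vectors.
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def hybrid_rank(query, q_vec, docs, alpha=0.5, k=3):
    # Blend dense and keyword scores; alpha=1.0 is naive dense top-k.
    scored = [
        (alpha * dense_score(q_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)[:k]]
```

In practice the keyword side would be BM25 over an index and the dense side a real embedding model; the blending step is the part the experiment varies.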
Research log: 5 current experiment tracks documented on this page
Failure notes: 3 recurring failure patterns observed across projects
Portfolio audit update: 4 project case studies rewritten with problem-to-insight format
View Experiment Code · Discuss Research
Project Cases

Projects rewritten as research case studies.

01
Python · LLMs · NLP · FastAPI
40% reduction in manual correction cycles
Procurement Agent System

Problem: Contract review quality varied across reviewers and document types.

Approach: LLM pipeline for field extraction and pricing anomaly checks.

Experimentation: Compared prompt styles and validation rules on representative procurement samples.

Results: Produced more consistent extraction outputs and reduced manual correction in repeated runs.

Insights: Post-processing constraints improved reliability more than prompt verbosity.

02
FastAPI · LangChain · Tool Use · Python
2× structured-task completion vs single-agent baseline
Multitool LLM Agent

Problem: Multi-step agent behavior is hard to evaluate without a controlled setup.

Approach: API-integrated agent chaining tools and reasoning steps via FastAPI.

Experimentation: Tested mixed task sets across structured and ambiguous categories.

Results: Better structured-task performance, with clear weaknesses on ambiguous tasks.

Insights: Tool selection policy is a major failure point.

03
scikit-learn · NLP · Flask · TF-IDF
Consistent ranking scores vs high manual reviewer variance
AI Resume Ranking System

Problem: Resume scoring changed across reviewers and sessions.

Approach: TF-IDF + ML ranking pipeline exposed through Flask API.

Experimentation: Evaluated multiple model variants on labeled resume sets.

Results: Improved ranking consistency versus manual-only review in internal evaluation.

Insights: Feature calibration matters more than model family choice.
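The approach above (TF-IDF features plus a similarity-based ranker) can be sketched with scikit-learn. This is a minimal illustration, not the project's pipeline; the Flask layer, labeled evaluation, and any learned re-ranking model are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_resumes(job_description, resumes):
    """Rank resumes by TF-IDF cosine similarity to a job description."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the job description and resumes together so they share a vocabulary.
    matrix = vectorizer.fit_transform([job_description] + resumes)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    # Return (resume, score) pairs, highest-scoring first.
    return sorted(zip(resumes, scores), key=lambda p: p[1], reverse=True)
```

A deterministic scorer like this is what makes the consistency comparison against manual review possible: the same inputs always produce the same ranking.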

04
Python · Analytics · Scikit-learn · Clustering
Fewer low-confidence cluster assignments vs manual baseline
User Profiling & Segmentation

Problem: Manual user segmentation produced unstable clusters.

Approach: Feature-engineering and clustering pipeline with stability scoring.

Experimentation: Benchmarked cluster quality across multiple feature sets on a medium-sized user dataset.

Results: Produced more stable clusters and fewer low-confidence assignments than the initial baseline.

Insights: Domain feature design outperformed algorithm switching.
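The "stability scoring" mentioned above can be approximated by refitting on random subsamples and measuring label agreement with the full-data clustering. A minimal sketch, assuming k-means and adjusted Rand index as the agreement measure (the actual pipeline may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability_score(X, n_clusters=3, n_rounds=10, frac=0.8, seed=0):
    """Estimate cluster stability: refit k-means on random subsamples and
    average label agreement (ARI) against the full-data clustering."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        sub = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X[idx])
        # ARI is permutation-invariant, so relabeled clusters still score 1.0.
        scores.append(adjusted_rand_score(base[idx], sub))
    return float(np.mean(scores))
```

Scores near 1.0 indicate clusters that survive resampling; scores that drop sharply flag the unstable segmentations the manual baseline suffered from.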

RAG · MCP · Evaluation · CI/CD
More on GitHub

Additional repositories include RAG systems, classifier pipelines, agent frameworks, and API services with repeatable evaluation workflows.

All Projects
Daily Thinking

Research Notes

Note 01
Apr 2026
Why LLMs fail in long workflows

Observed failure compounding in chained tasks. Error-recovery steps helped, but did not eliminate cascading errors. The pattern holds across 4+ project types tested.

Updated weekly
Read on Substack
Note 02
Mar 2026
Hallucination patterns in tool calls

Hallucinations rose when tool responses were sparse or under-specified. Schema-constrained outputs improved argument validity by reducing ambiguous inputs.
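The schema-constraint idea can be sketched as a pre-execution check on tool-call arguments. A minimal illustration only: the `INVOICE_SCHEMA` fields are hypothetical, and a production system would use a real JSON Schema validator rather than this hand-rolled check.

```python
def validate_tool_args(args, schema):
    """Check tool-call arguments against {field: (type, required)};
    return a list of violations (an empty list means the call is valid)."""
    errors = []
    for field, (ftype, required) in schema.items():
        if field not in args:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(args[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    # Reject fields the schema never declared.
    errors.extend(f"unexpected field: {f}" for f in args if f not in schema)
    return errors

# Hypothetical schema for an invoice-reminder tool.
INVOICE_SCHEMA = {"client_id": (str, True), "amount": (float, True), "note": (str, False)}
```

Rejecting a malformed call before execution is what reduces the ambiguous inputs the note describes: the model gets a concrete violation list to repair instead of a sparse tool response.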

Note 03
Feb 2026
Patterns in agent breakdown

Frequent breakdown chain: weak retrieval, premature tool selection, and overconfident final synthesis under uncertainty. Checkpoints help but add latency.

Active Research Themes
Reasoning Evaluation · Failure Taxonomy · Hallucination Analysis · Tool Selection Policy · RAG Reliability · Autonomy vs Control · Context Handling · Agent Coordination
High Signal

Failures and Learnings

Failure 01
Agent fails when context exceeds limit

What I tried: Chunking + retrieval.
What happened: Partial improvement only; long workflows still degraded.
Learning: Context handling is a core bottleneck.

Failure 02
Over-planning without execution

What I tried: Added stricter planning scaffolds.
What happened: More plan steps but weaker completion behavior.
Learning: Planning depth must be bounded by execution checkpoints.

Failure 03
Cross-agent coordination drift

What I tried: Multi-agent decomposition with shared memory.
What happened: Coverage improved, but contradictory outputs increased.
Learning: Coordination protocols matter as much as specialization.

Experience

Applied research and engineering experience

Dec 2025 – Present
Founder, GIANT / FirmRunner
GIANT (Pre-Seed Stage)

Led product and agent architecture through iterative experiments. Defined reliability, latency, and completion metrics before production integration.

2023 – Present
Independent AI/ML Developer
Self-directed — 15+ systems shipped

Delivered 15+ ML/LLM systems with explicit evaluation loops across RAG, tool-use agents, and function-calling workflows. Focused on reproducibility with Docker and CI/CD.

May 2025 – Jul 2025
AI/ML Intern
BITS Pilani, Hyderabad Campus

Built and optimized ML models for decision-support tasks under academic mentorship, applying systematic EDA, feature engineering, and cross-validation.

Feb 2025 – Mar 2025
AI Intern, TechSaksham
Microsoft and SAP via Edunet Foundation / AICTE

Designed and debugged an AI prototype, presented findings to Microsoft and SAP experts, and produced a research-backed technical proposal.

Feb 2025
Data Analyst (Job Simulation)
Deloitte

Completed data and forensic analysis simulation, delivering Tableau-based insight reporting and classification-driven conclusions.

Jan 2025 – May 2025
Undergraduate Research Contributor
PSCMR College of Engineering and Technology

Co-authored an applied AI/ML academic research paper. Led literature analysis, technical writing, and structured methodology sections. Demonstrated ability to operate at research depth while maintaining parallel engineering output.

Jun 2023 – Dec 2023
Full-Stack Development Trainee
MentorKart — Visakhapatnam

Completed full-stack engineering training and deployed API-connected projects with database integration.

Sep 2023 – Present
BTech, Artificial Intelligence and Machine Learning
PSCMR College of Engineering and Technology, Vijayawada

CGPA 8.2. Coursework in LLMs, agentic workflows, deep learning, and full-stack development. Running GIANT in parallel with the degree.

Methods and Tooling

Technical stack for research execution

Languages & Frameworks
  • Python (primary)
  • FastAPI, Flask
  • LangChain, LlamaIndex
  • REST API design
  • JavaScript / Node.js
ML & AI
  • LLM evaluation & reasoning
  • RAG and hybrid retrieval
  • NLP pipelines & embeddings
  • Scikit-learn, PyTorch
  • Prompt engineering & policy
Cloud & Infra
  • Google Cloud Platform
  • Docker & CI/CD
  • PostgreSQL, MongoDB
  • Function-calling systems
  • API integrations
Tools & Platforms
  • A/B experiment design
  • Metric instrumentation
  • Tableau, Excel
  • Git, GitHub Actions
  • Error taxonomy tracking
Certifications and Achievements

Credentials and technical milestones

5-Day AI Agents Intensive Course
Google
Google Cloud Security Summit
Google Cloud — Asia Pacific
Google Cloud Training
Google Cloud Platform — compute, deployment, orchestration
TechSaksham AI Internship
Microsoft and SAP via Edunet Foundation / AICTE
Data Analyst Job Simulation
Deloitte — Tableau, Excel classification
JPMorgan Quantitative Research Simulation
JPMorgan Chase
Python Programming Training
Externsclub / AICTE
Introduction to MongoDB
MongoDB University
AI Fluency Certification
Verified proficiency in core AI concepts
Hackathon Participant
Devnovate and GDG Build With AI
17+ Open-Source Contributions
ML and automation repositories on GitHub
GenAI Prototyping Group Lead
3-member team — experimentation and evaluation cycles
Get in Touch

Open to AI/ML roles and accounting-firm partnerships.

I welcome conversations with researchers, engineers, and teams working on LLM reliability, agent design, and evaluation infrastructure — or accounting firms looking to automate their operations with FirmRunner.