 AI Engineer · Senior Data Scientist · Dallas, TX

I engineer AI that earns its place in production.

Eight years of shipping LLM, RAG, NLP, and MLOps systems into businesses that need them to work — quietly, accurately, and under audit.

neeraj.agarwala151@gmail.com
Now in Dallas
Currently shipping a vector-search retrieval layer
A short manifesto

I don't build AI to impress. I build it to do work no one else wants to do — protect sensitive records, surface revenue hiding in unstructured data, and make predictions operations teams actually act on.

The interesting question is never "can the model do it?" It's whether anyone uses what the model produces, whether the system holds up under audit, and whether the metric it moves is one the business already cares about. Those are the only three things that matter.

Privacy isn't a compliance checkbox. It's an architectural decision made on the first day. Models that handle people's data should be designed like banks are designed — trust is the product, and everything else follows.

Neeraj Agarwala · Dallas, TX
$200M+ in settlements analyzed & modeled
97.3% accuracy, BERT NER PII detection model
92% fewer PII exposure incidents
10+ ML & GenAI systems shipped to production
14d → 2h breach detection time via anomaly models
+31% revenue forecast accuracy lift
+18% prediction lift via federated learning
80% of manual PII review eliminated
01 / Principles

Six rules I engineer by.

These aren't slogans. They're the filter every system passes through before I'd put my name on it. Most failed AI projects I've seen broke one of these six rules first — usually rule five.

01
Compliance is architecture.
Privacy, access control, and auditability are not bolt-ons. They live in the system diagram alongside the model — or they don't live at all. The federated learning architecture I designed didn't add privacy to a model; it made the model possible because of privacy. Data never moved. That's the entire point.
02
Every model has a number.
A model that doesn't move a metric the business already tracks is a research project, not a product. Settlement success rate. PII exposure incidents. Churn. Forecast variance. Pick the number first. Build backwards. If you can't name the number, don't build the model.
03
Production is the only proof.
Notebooks don't ship. The bar is a system that survives real traffic, real edge cases, and the slow drift of real-world data, with monitoring honest enough to admit when it stops working. Ten production systems beat a hundred decks. Every time.
04
Ship the smallest useful version first.
A 70% solution shipping in week three beats a 95% solution presented in month six. The 70% solution tells you what 95% should actually look like — and half the time the answer is not what you thought. Iteration on real data is the only feedback loop that matters.
05
Talk to the people who use it.
Operations teams, analysts, marketing leads — they know the failure modes you'll never see in evaluation. They know the inputs that look fine and produce nonsense. Embedded work beats handoff every time. This is the rule most teams break first, and most don't recover.
06
Models drift. Keep them honest.
The world the model was trained on isn't the world the model lives in. Drift detection, calibration checks, and evaluation harnesses aren't optional infrastructure — they're the difference between a system that ages well and one that quietly poisons decisions for six months before anyone notices.
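One common way to make rule 06 concrete is a population stability index (PSI) check on a feature or score distribution. The snippet below is a minimal sketch, not the monitoring used in any system described here: the function name, the ten-bucket default, and the sample data are all assumptions chosen for illustration, and the 0.2 alert level is only a widely quoted rule of thumb.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """PSI of one numeric feature: reference (training-time) sample vs. live sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the reference range fall in the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share so an empty bucket does not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bucket_shares(reference)
    live_shares = bucket_shares(live)
    return sum(
        (lv - rf) * math.log(lv / rf) for rf, lv in zip(ref_shares, live_shares)
    )


reference = [i / 100 for i in range(100)]      # hypothetical training-time scores
shifted = [0.5 + i / 200 for i in range(100)]  # hypothetical live scores, drifted upward

print(population_stability_index(reference, reference))      # identical samples give 0.0
print(population_stability_index(reference, shifted) > 0.2)  # drifted sample exceeds the rule-of-thumb level
```

A check like this is cheap enough to run on every scoring batch, which is what lets drift show up on a dashboard instead of six months later.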
02 / About

I build AI systems that do things that matter — systems that protect millions of sensitive records, surface revenue hiding in unstructured data, and make predictions operations teams actually act on.

Over eight years, I've shipped production ML across the full stack: BERT-based NER pipelines that auto-tag PII at 97.3% accuracy across 5M+ records, LLM-powered RAG tools that cut manual acquisition effort for marketing teams, federated learning architectures that train across siloed data without moving a single sensitive record, and anomaly-detection systems that slashed breach detection time from 14 days to under 2 hours.

My engineering instinct is to build things that scale and comply — AWS SageMaker, Lambda, Step Functions, KMS. My product instinct is to connect model outputs to decisions that move numbers: churn, conversion, settlement success, cost. Compliance isn't a constraint — it's a design requirement.

I publish what I learn. The AI Career Radar series on LinkedIn — eight guides, 250+ pages — is the resource I wish I'd had when I started. If something in my head might save someone a week of debugging, it belongs in writing.

Based in: Dallas, Texas
Experience: 8+ years
Focus: LLMs · RAG · NLP · MLOps
Recognition: Outstanding · Making IT Happen Awards
Education: M.S. Business Analytics, UT Dallas
Languages: Python · SQL · PL/SQL · JavaScript
03 / Experience

Where I've shipped.

Data Scientist / AI Engineer
  • Own the full AI lifecycle — problem framing to production — having built and deployed 10+ ML & GenAI systems protecting sensitive data, optimizing lead channels, and improving settlement outcomes.
  • Architected a GPT-based PII redaction pipeline (Microsoft Presidio + AWS Bedrock, serverless + SageMaker) processing 50K+ documents/month — 92% fewer PII exposure incidents while keeping documents analytically usable.
  • Built a BERT-based NER model (spaCy) detecting SSNs, account & routing data across intake records — 97.3% accuracy, 80% less manual review, deployed via AWS Lambda into reporting dashboards.
  • Designed a TensorFlow Federated system training settlement-prediction models across 3 data centers with encrypted updates — +18% accuracy with zero transfer of 2M+ sensitive records.
  • Stood up an anomaly-detection system cutting breach-detection time from 14 days to under 2 hours.
  • Built Markov-chain attribution on 1.2M+ touchpoints (AWS Glue + TensorFlow), uncovering a 35% lift from underweighted channels.
  • Built survival-analysis pipelines that improved client retention 24% and revenue-forecast accuracy 31% across $200M+ in settlements.
Outstanding Award · Making IT Happen Award
Data Scientist
  • Built predictive models (ML with Python) that deepened customer relationships, strengthened longevity, and personalized interactions.
  • Ran n-gram analysis on Google AdWords data for Marketing — saved the company >$500K.
  • Cut operating cost >$100K by matching legacy products to new SKUs and optimizing on-hand inventory.
Analytics, Reporting Infrastructure
  • Optimized ETL stored procedures so reporting infrastructure stayed within SLA.
  • Built KPI dashboards across Oracle, SAP BW & QVDS; scheduled Qlik Sense QVD extracts.
  • Wrote PL/SQL stored procedures supporting cross-functional data integration.
Analytics Delivery Services
  • Missing-value imputation & outlier detection (Pandas, PySpark) to reduce customer complaints.
  • Real-time analysis & visualization (Hive, Zeppelin, SparkSQL, Seaborn, Plotly) to root-cause complaints.
  • Built CI/CD pipelines (Jenkins, GitHub, Docker, Maven, Ansible) to speed product releases.
Senior Technical Analyst → Technical Analyst
  • Recommended an inventory solution on a telecom project cutting processing time 20%; automated client-side validation in Advanced Excel for $15K annual savings.
  • Built custom Tableau / Excel dashboards; delivered Big Data initiatives on the Hadoop ecosystem.
  • Wrote PL/SQL procedures (MySQL, SSRS, VB6) improving performance 15% and meeting SLA response times.
Chapter II
Now, the work.
04 / Selected Work

Systems I'm proud of.

01 — Featured

GPT-Based PII Redaction Pipeline

A production GenAI pipeline combining Microsoft Presidio and AWS Bedrock, deployed via serverless orchestration and SageMaker, automatically redacting PII from 50K+ debt-settlement documents per month — keeping records analytically usable while sharply reducing exposure risk in a heavily regulated domain.

92% fewer PII exposure incidents · 50K+ docs/month
GPT · AWS Bedrock · Presidio · SageMaker · Serverless
02

Semantic Search & Vector Retrieval

A production RAG retrieval layer using sentence-transformer embeddings, an HNSW-indexed vector store, and hybrid (vector + BM25) search with a cross-encoder re-ranking pass over the top 50 candidates. Chunking tuned for legal–finance docs so boundaries don't break across clauses.

Powers the RAG that cut manual acquisition effort for marketing
RAG · HNSW · Sentence Embeddings · Hybrid Search · BM25 · Cross-encoder Re-ranking
03

BERT NER for PII Detection

A spaCy / BERT named-entity model detecting SSNs, account and routing numbers across intake records, deployed via AWS Lambda and feeding clean data straight to reporting dashboards.

97.3% accuracy · 80% less manual review
BERT · spaCy · AWS Lambda · NER
04

Federated Learning System

TensorFlow Federated training settlement-prediction models across 3 data centers with encrypted model updates and cloud key management — improving accuracy without ever moving 2M+ sensitive records.

+18% over single-source baselines
TF Federated · Privacy · KMS · Distributed ML
05

Anomaly Detection for Data Security

Real-time anomaly-detection layer over data-access patterns, surfacing exfiltration risk and integrity issues fast enough that the security team can act before damage compounds.

14 days → under 2 hours to detection
Anomaly Detection · Streaming · Security · Python
06

Survival Analysis & Retention

Survival-analysis pipelines modeling client lifetime and settlement risk across $200M+ in settlements — lifting retention and tightening revenue forecasts the business plans against.

+24% retention · +31% forecast accuracy
Survival Analysis · Forecasting · Python · Retention
07

Markov-Chain Attribution

Multi-touch attribution across 1.2M+ marketing touchpoints via AWS Glue ETL and TensorFlow sequence modeling — exposing channels the business had been underweighting.

35% lift uncovered from underweighted channels
Markov Chains · AWS Glue · TensorFlow · Attribution
05 / Writing & Thought Leadership

Teaching the field forward.

I publish deep interview guides on the systems I build with — RAG, embeddings, LangGraph memory, MCP, fine-tuning, AI governance. Each one is the resource I wish I'd had when I started.

Published guides: 8
Total pages: 250+
LinkedIn followers: 3.9K
G/01
30 Embeddings Interview Questions
NLP · Vectors · 32 pages
G/02
30 LangGraph Memory Questions
Agents · 32 pages
G/03
30 Fine-Tuning AI Models Questions
LoRA · DPO · RLHF · 35 pages
G/04
30 MCP Server Questions
Architecture · 32 pages
G/05
30 LangChain & LangGraph Questions
Orchestration · 32 pages
G/06
30 AI Governance Interview Questions
Compliance · 32 pages
G/07
30 AI Fundamentals — Helpful in 2026
For Everyone · 32 pages
G/08
AI Career Radar — Ongoing Series
Weekly · LinkedIn
06 / Currents

What I'm thinking about right now.

A field this fast moves in waves. These are the ones I'm sitting closest to in 2026 — the ones I think will reshape how AI gets built in production over the next eighteen months.

Active study

Agent memory at scale

How short-term checkpointing versus long-term stores in LangGraph change agent behaviour as conversations stretch from minutes into weeks. The boundary between RAM and disk for an agent isn't obvious — and getting it wrong is expensive.

Building toward

Privacy-preserving fine-tuning

DPO and LoRA on sensitive corpora without leaking training data. Federated learning solved the training-distribution problem; the open question now is what happens when the model itself memorises what it shouldn't.

Watching closely

MCP as the integration layer

If Model Context Protocol holds, the N×M tool-integration problem collapses to N+M. That's not a small refactor — it's a redesign of how every internal AI system at every enterprise gets wired together.
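The N×M to N+M claim is plain arithmetic, and a worked example shows the scale. The sketch below assumes a hypothetical estate of 6 AI applications and 20 internal tools; the function names and counts are illustrative only.

```python
def point_to_point_connectors(apps: int, tools: int) -> int:
    """Every application ships its own connector for every tool."""
    return apps * tools


def shared_protocol_adapters(apps: int, tools: int) -> int:
    """Each application and each tool implements the shared protocol once."""
    return apps + tools


apps, tools = 6, 20  # hypothetical counts
print(point_to_point_connectors(apps, tools))  # 120
print(shared_protocol_adapters(apps, tools))   # 26
```

Adding a 21st tool costs six new connectors in the first model and one adapter in the second, which is where the redesign pressure comes from.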

Daily decision

The vector-index trade-off

HNSW for recall, IVF-PQ for memory, flat for ground truth — and re-ranking layered on top of any of them. Picking the wrong index for the corpus shape is the silent killer of RAG quality. Most teams discover this six months too late.
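"Flat for ground truth" can be sketched in a few lines: run an exact brute-force search, then measure how much of its top-k an approximate index recovers. The code below is an illustration under stated assumptions. It uses pure Python and cosine similarity, the toy corpus is invented, and `approx_ids` stands in for whatever a real HNSW or IVF-PQ index would return.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_top_k(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Exact search: score every vector and keep the k most similar ids."""
    ranked = sorted(
        corpus, key=lambda doc_id: cosine(query, corpus[doc_id]), reverse=True
    )
    return ranked[:k]


def recall_vs_flat(approx_ids: Sequence[str], exact_ids: Sequence[str]) -> float:
    """Share of the exact top-k that the approximate index also returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy corpus of 2-d vectors (hypothetical).
corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
exact_ids = flat_top_k([1.0, 0.0], corpus, k=2)
approx_ids = ["a", "c"]  # stand-in for an approximate index's answer

print(exact_ids)                              # ['a', 'b']
print(recall_vs_flat(approx_ids, exact_ids))  # 0.5
```

Exact search is too slow to serve traffic on a large corpus, but running it offline on a query sample gives the baseline any index choice can be judged against.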

Persistent obsession

Retrieval evals, not vibes

RAG quality is still mostly judged by feel. The teams that turn precision-at-k, recall, faithfulness, and groundedness into routine dashboards — not occasional spot-checks — are the ones whose enterprise GenAI actually compounds over time.
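The two retrieval metrics named above are simple enough to compute on every evaluation run. This is a minimal sketch assuming a labelled set of relevant document ids per query; the ids and function names are hypothetical. Faithfulness and groundedness need a judge over the generated answer and are not covered here.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are labelled relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in set(retrieved[:k]) if doc_id in relevant)
    return hits / len(relevant)


retrieved = ["d1", "d7", "d3", "d9"]  # hypothetical ranked output for one query
relevant = {"d1", "d3", "d5"}         # hypothetical relevance labels

print(precision_at_k(retrieved, relevant, k=2))  # 0.5
print(recall_at_k(retrieved, relevant, k=4))     # two of three relevant ids found
```

Averaged over a fixed query set and tracked per release, these numbers are what turn a spot-check into a dashboard.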

Quietly maturing

Hybrid search beats pure vector

Pure semantic search misses exact-match queries — account numbers, statutes, SKUs, error codes — that BM25 finds trivially. The interesting work right now is in fusion: reciprocal rank, learned sparse encoders, cross-encoder re-ranking. The boring answer wins.
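Reciprocal rank fusion is the simplest of the fusion methods listed and fits in a dozen lines. The sketch below assumes two ranked id lists, one from BM25 and one from a vector index. The ids are invented, and `k=60` is the smoothing constant conventionally used with this method rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["s1", "s2", "s3"]    # hypothetical keyword results
vector_ranking = ["s2", "s4", "s1"]  # hypothetical embedding results

print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# ['s2', 's1', 's4', 's3']
```

Because it uses only ranks, the method needs no score normalisation between BM25 and cosine similarity, which is much of why it works as a default before a cross-encoder pass.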

07 / Toolkit

What I build with.

A LLM & Generative AI
LLMs (GPT, BERT) · RAG · AWS Bedrock · LangChain · LangGraph · NER · LLM Evaluation · Prompt Engineering · Fine-tuning (LoRA, DPO) · spaCy · Presidio
C ML, Stats & Modeling
PyTorch · TensorFlow · TF Federated · scikit-learn · Survival Analysis · Markov Chains · Anomaly Detection · Forecasting
D Cloud & MLOps
AWS SageMaker · Bedrock · Lambda · Step Functions · Glue · KMS · S3 · Docker · Jenkins · CI/CD · MLflow · Ansible
E Analytics & Experimentation
A/B Testing · Causal Inference · Attribution · KPI Definition · Cohort Analysis · PySpark · Hadoop
F Languages & Data
Python · SQL · PL/SQL · JavaScript · Java · Tableau · Power BI · Qlik · Oracle · MongoDB · SAP BW · ETL
08 / Education & Recognition

Where it started.

Education

The University of Texas at Dallas
M.S. Business Analytics — Data Science
2017 — 2019 · GPA 3.72
Teaching Assistant — Web Analytics, under Prof. Amit Mehra
B. M. S. College of Engineering
B.E. Computer Science
2011 — 2015 · GPA 3.8

Recognition

Outstanding Award
Landmark Management Group
Recent
Making IT Happen Award
Landmark Management Group
Recent
Best Application of Analytics
The University of Texas at Dallas
2019
CSI Certification — Online Examination System (Java)
Computer Society of India
2013
09 / Contact

Let's talk.

neeraj.agarwala151@gmail.com

If you're building AI infrastructure, operationalising LLMs, or need ML that holds up under compliance scrutiny — I'd be glad to hear about it.