Lekshmi Thulasidharan

Data Scientist | AI/ML/NLP Specialist

LinkedIn

About

PhD candidate with over 5 years of experience in data science, specializing in AI, machine learning, and natural language processing. Builds and deploys ML solutions with Python, TensorFlow, and Hugging Face, delivering production-ready tools and leading research initiatives.

Work Experience

Data Scientist Intern

Auxillium Health

Jun 2025 - Dec 2025

Remote, US

Developed and optimized AI-powered conversational agents for healthcare applications, focusing on robust data retrieval and model refinement.

  • Built a Retrieval-Augmented Generation (RAG)-based chatbot for Wound Tele.AI Pro, leveraging LlamaIndex, ChromaDB, and Hugging Face to provide evidence-backed answers to wound care queries.
  • Developed a semantic chunking and preprocessing pipeline for dense passage retrieval across PubMed articles, enhancing data preparation for AI models.
  • Evaluated 5+ embedding models and chunking strategies on 50+ wound-related queries, improving precision@5 by 17%.
  • Prototyped a local QA tool using Haystack, ChromaDB, Hugging Face, and Streamlit for efficient offline testing and user validation.
  • Collaborated with wound care experts to validate chatbot output and refine domain alignment for patient use cases.

Graduate Researcher

University of Wisconsin – Madison

Aug 2019 - Dec 2025

Madison, WI, US

Applied data science and statistical modeling to large-scale stellar datasets in astrophysics research.

  • Led an 8-member team to author a peer-reviewed study on vertical kinematics of the Milky Way, utilizing Gaia survey data.
  • Queried, cleaned, and integrated 500,000+ stellar records using SQL and Python; applied bootstrapping, correlation analysis, and hypothesis testing to extract vertical motion patterns.
  • Validated findings at greater than 3-sigma significance, completing the workflow end to end from data engineering to insight generation.

Summer Data Intern

Tata Institute of Fundamental Research

May 2018 - Aug 2018

Mumbai, Maharashtra, IN

Engineered features and applied machine learning models to classify particles in a large-scale simulated detector dataset.

  • Engineered features from a 500k-instance simulated Belle detector dataset, applying a Boosted Decision Tree model to classify low-momentum muons from background particles.
  • Achieved 85% accuracy in classifying low-momentum muons.

Education

Physics

University of Wisconsin – Madison

3.82/4.00 GPA

Sep 2019 - Dec 2025

Madison, WI, US

Courses

  • Foundations of Data Science
  • Data Management in Data Science
  • Theory and Methods of Mathematical Statistics

Projects

PathoPredictX-GC: Auditing and Interpretability Tool for Gastric Cancer Histopathology

May 2025

Developed and deployed an auditing and interpretability tool that uses deep learning to classify tumor microenvironment tissue images in gastric cancer histopathology, with the aim of improving diagnostic trust.

Cloud-Based ELT Pipeline and Trading Analytics

Jan 2025 - Apr 2025

Designed and implemented a cloud-based ELT pipeline that integrates diverse data sources into Snowflake for trading performance analytics.

Skills

Languages

  • Python
  • SQL

Frameworks & Libraries

  • Pandas
  • NumPy
  • Scikit-learn
  • TensorFlow
  • Hugging Face Transformers
  • Streamlit
  • LlamaIndex
  • Haystack
  • ChromaDB

ML & GenAI

  • Deep Learning
  • NLP
  • Retrieval-Augmented Generation (RAG)
  • LLM Fine-Tuning
  • Embeddings
  • Vector Search

Tools & Platforms

  • Docker
  • Google Cloud
  • Snowflake
  • dbt
  • Airbyte
  • Git
  • GitHub
  • Power BI (basic)

Data Handling & Modeling

  • Data Wrangling
  • Data Cleaning
  • Feature Engineering
  • ETL Pipelines
  • Statistical Modeling