Disease Prediction System

Aug 2023 to Feb 2024

Overview

This is an AI-powered health tool that approaches disease prediction from three angles: symptom-based prediction across 41 conditions, risk assessment for diabetes, heart disease, and stroke, and weather-driven outbreak alerts for dengue, malaria, and chikungunya.

You pick from 130+ symptoms and the system runs them through an ensemble of Random Forest and Gradient Boosting classifiers. It returns ranked predictions with confidence percentages, shows which symptoms matched, and explains each result in plain English. The health risk models take your clinical measurements and return clear risk levels with practical next steps.
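The symptom flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the function names, the four-symptom vocabulary, the condition names, and the probabilities are all hypothetical stand-ins for the real 130+ symptom list and the ensemble's output.

```python
from typing import Dict, List, Sequence, Tuple


def encode_symptoms(selected: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Turn selected symptom names into a 0/1 vector aligned with `vocabulary`."""
    chosen = set(selected)
    unknown = chosen - set(vocabulary)
    if unknown:
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    return [1 if name in chosen else 0 for name in vocabulary]


def rank_predictions(
    probabilities: Dict[str, float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Sort conditions by probability and report confidence as a percentage."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [(condition, round(p * 100, 1)) for condition, p in ranked[:top_k]]


# Hypothetical vocabulary; the real system offers 130+ symptoms.
VOCABULARY = ["fever", "headache", "joint_pain", "rash"]
vector = encode_symptoms(["fever", "rash"], VOCABULARY)

# Made-up probabilities standing in for the ensemble's output.
example_probabilities = {"condition_a": 0.62, "condition_b": 0.25, "condition_c": 0.13}
top = rank_predictions(example_probabilities, top_k=2)
```

Rejecting unknown symptom names up front keeps a typo from silently becoming an all-zero input, which matters when the vector is the only thing the classifiers see.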

Key Features

Symptom Checker: select from 130+ symptoms, get ranked predictions across 41 diseases with confidence scores
Health Risk Assessment: separate models for Type 2 Diabetes (90.5%), Heart Disease (88.8%), and Stroke (94.7%)
Weather Disease Alerts: predicts dengue, malaria, chikungunya risk from temperature, humidity, and rainfall
Explainable predictions that show which symptoms matched and why each condition was flagged
Risk factor identification with tailored health recommendations for each assessment
5 region types supported for weather alerts: Tropical, Subtropical, Temperate, Arid, Mediterranean
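For the Health Risk Assessment feature, one plausible way to turn a model probability into a "clear risk level" is simple thresholding. The sketch below assumes this approach; the cut-offs, the `RiskResult` type, and the `assess` function are hypothetical, since the source does not say how its risk levels are derived.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the project's actual thresholds are not documented.
MODERATE_THRESHOLD = 0.33
HIGH_THRESHOLD = 0.66


@dataclass(frozen=True)
class RiskResult:
    condition: str
    probability: float
    level: str


def assess(condition: str, probability: float) -> RiskResult:
    """Map a predicted probability onto a Low / Moderate / High label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= HIGH_THRESHOLD:
        level = "High"
    elif probability >= MODERATE_THRESHOLD:
        level = "Moderate"
    else:
        level = "Low"
    return RiskResult(condition, probability, level)


result = assess("stroke", 0.72)
```

Keeping the thresholds as named constants makes them easy to tune per condition if a single pair of cut-offs turns out to be too coarse.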

How It's Built

The system uses a layered architecture:

Frontend: React 18 SPA with TailwindCSS, Framer Motion animations, Recharts for data visualization, and React Select for symptom multi-select input
Backend: FastAPI with Pydantic validation and Uvicorn server. One router per feature domain (symptoms, risk, weather) with a clean service layer pattern
ML Pipeline: Ensemble of Random Forest + Gradient Boosting for symptom prediction (97.2% accuracy). Separate Random Forest models for diabetes, heart disease, and stroke risk. Multi-output Random Forest classifier for weather-disease correlations (93–95% accuracy)
Infrastructure: Models are trained during the Docker build, so no separate model storage is needed. Containerized with Docker Compose for local development
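The ML Pipeline entry does not say how the Random Forest and Gradient Boosting outputs are combined. One common option is soft voting, which averages the per-class probabilities of the two models (scikit-learn offers this through `VotingClassifier` with `voting="soft"`). The pure-Python sketch below shows only the combination step; `soft_vote`, its weight parameter, and the example probabilities are hypothetical.

```python
from typing import Dict


def soft_vote(
    rf_probs: Dict[str, float],
    gb_probs: Dict[str, float],
    rf_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of two models' class probabilities (soft voting)."""
    if rf_probs.keys() != gb_probs.keys():
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must be between 0 and 1")
    gb_weight = 1.0 - rf_weight
    return {
        label: rf_weight * rf_probs[label] + gb_weight * gb_probs[label]
        for label in rf_probs
    }


# Made-up per-class probabilities from the two classifiers.
combined = soft_vote(
    {"condition_a": 0.8, "condition_b": 0.2},
    {"condition_a": 0.6, "condition_b": 0.4},
)
```

Because each input distribution sums to 1 and the weights sum to 1, the combined scores also sum to 1 and can be shown directly as confidence percentages.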

Interesting Challenges

Ensemble Design: Combining multiple classifiers improved accuracy, but most gains came from dataset design, not algorithm tuning. Balancing synthetic data to mirror real-world medical distributions was the real engineering challenge
Multi-Output Classification: Weather-disease prediction required predicting three disease risks simultaneously. A multi-output Random Forest handled correlated outputs cleanly while maintaining 93–95% accuracy per disease
Explainability: Health predictions need transparency. Each result shows which specific inputs drove the prediction, building trust in the system's recommendations
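As a rough illustration of the explainability point, the sketch below reports which of the user's selected symptoms overlap with a condition's typical profile. It assumes a simple set-overlap explanation; the source does not describe its actual method, and the `DISEASE_SYMPTOMS` mapping, condition names, and `explain` function are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical symptom profiles; real profiles would come from the training data.
DISEASE_SYMPTOMS: Dict[str, Set[str]] = {
    "condition_a": {"fever", "headache", "rash"},
    "condition_b": {"cough", "fever"},
}


def explain(disease: str, selected: List[str]) -> Dict[str, object]:
    """List the selected symptoms that appear in the condition's profile."""
    profile = DISEASE_SYMPTOMS[disease]
    matched = sorted(profile & set(selected))
    summary = (
        f"{len(matched)} of {len(profile)} typical symptoms matched: "
        f"{', '.join(matched) or 'none'}"
    )
    return {"disease": disease, "matched": matched, "summary": summary}


explanation = explain("condition_a", ["fever", "rash", "cough"])
```

An overlap list like this is cheap to compute and easy to read, but it only describes the inputs; it does not measure how much each symptom actually influenced the model's score.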

Screenshots

Risk Dashboard

Prediction Charts