Disease Prediction System
Aug 2023 to Feb 2024
Overview
This is an AI-powered health tool that tackles disease prediction from three angles: predicting diseases from symptoms across 41 conditions; assessing risk for diabetes, heart disease, and stroke; and flagging weather-driven outbreaks of diseases such as dengue, malaria, and chikungunya.
You pick symptoms from a list of 130+, and the system runs them through an ensemble of Random Forest and Gradient Boosting classifiers. It returns ranked predictions with confidence percentages, shows which of your symptoms matched each condition, and explains the results in plain English. The health risk models take your clinical measurements and return clear risk levels with practical next steps.
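The write-up does not say how the two classifiers' outputs are combined. One common approach is soft voting: average each model's per-class probabilities, then rank the diseases by the averaged score. The sketch below assumes that approach with equal weights; the probability values are made up and stand in for each model's `predict_proba` output, and `soft_vote` and `rank_predictions` are hypothetical names, not the project's code.

```python
from typing import Dict, List, Optional, Tuple


def soft_vote(
    model_probs: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Weighted average of per-disease probabilities from several models (soft voting)."""
    if weights is None:
        weights = [1.0] * len(model_probs)  # assumption: equal weight per model
    total = sum(weights)
    combined: Dict[str, float] = {}
    for probs, w in zip(model_probs, weights):
        for disease, p in probs.items():
            combined[disease] = combined.get(disease, 0.0) + w * p / total
    return combined


def rank_predictions(combined: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Sort diseases by combined probability and express confidence as a percentage."""
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return [(disease, round(100 * p, 1)) for disease, p in ranked[:top_k]]


# Hypothetical outputs of the two models for one patient, keyed by disease name.
rf_probs = {"Malaria": 0.60, "Dengue": 0.30, "Typhoid": 0.10}
gb_probs = {"Malaria": 0.50, "Dengue": 0.40, "Typhoid": 0.10}

print(rank_predictions(soft_vote([rf_probs, gb_probs])))
# [('Malaria', 55.0), ('Dengue', 35.0), ('Typhoid', 10.0)]
```

Averaging probabilities rather than counting hard votes keeps a usable confidence score for every disease, which is what a ranked list with percentages needs.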
Key Features
- Symptom Checker: select from 130+ symptoms, get ranked predictions across 41 diseases with confidence scores
- Health Risk Assessment: separate models for Type 2 Diabetes (90.5% accuracy), Heart Disease (88.8%), and Stroke (94.7%)
- Weather Disease Alerts: predicts dengue, malaria, and chikungunya risk from temperature, humidity, and rainfall
- Explainable Predictions: each result shows which symptoms matched and why the condition was flagged
- Risk Factor Identification: each assessment identifies the contributing risk factors and gives tailored health recommendations
- Regional Coverage: weather alerts support 5 region types (Tropical, Subtropical, Temperate, Arid, Mediterranean)
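The description does not say how region type reaches the weather model. A common approach, assumed here, is to append a one-hot encoding of the region to the numeric weather readings so that a single classifier can learn region-specific patterns. The units (°C, %, mm), the region order, and the function name `encode_weather_features` are illustrative assumptions.

```python
from typing import List

# The five region types listed above; this fixed order defines the one-hot layout (assumed).
REGIONS = ["Tropical", "Subtropical", "Temperate", "Arid", "Mediterranean"]


def encode_weather_features(temperature_c: float, humidity_pct: float,
                            rainfall_mm: float, region: str) -> List[float]:
    """Build a flat feature vector: three weather readings followed by a one-hot region."""
    if region not in REGIONS:
        raise ValueError(f"Unknown region type: {region!r}")
    one_hot = [1.0 if r == region else 0.0 for r in REGIONS]
    return [temperature_c, humidity_pct, rainfall_mm] + one_hot


features = encode_weather_features(29.5, 82.0, 140.0, "Tropical")
print(features)  # [29.5, 82.0, 140.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```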
How It's Built
The system uses a layered architecture:
- Frontend: React 18 SPA with Tailwind CSS, Framer Motion animations, Recharts for data visualization, and React Select for symptom multi-select input
- Backend: FastAPI with Pydantic request validation, served by Uvicorn. Each feature domain (symptoms, risk, weather) has its own router, and the routers delegate the work to a separate service layer
- ML Pipeline: an ensemble of Random Forest and Gradient Boosting for symptom-based prediction (97.2% accuracy); separate Random Forest models for diabetes, heart disease, and stroke risk; and a multi-output Random Forest classifier for weather-disease correlations (93–95% accuracy per disease)
- Infrastructure: models are trained during the Docker build, so no separate model storage is needed. The stack is containerized with Docker Compose for local development
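Between the React Select input and the classifiers, the selected symptom names have to become a fixed-length feature vector. Below is a minimal sketch of that conversion. It assumes a binary (present/absent) encoding over an ordered symptom vocabulary. `SYMPTOM_VOCAB` is a short hypothetical stand-in for the full 130+ list, and `symptoms_to_vector` is an illustrative name, not the project's code.

```python
from typing import Iterable, List

# Hypothetical, truncated vocabulary; the real system lists 130+ symptoms in a fixed order.
SYMPTOM_VOCAB = ["itching", "skin_rash", "high_fever", "headache", "joint_pain", "vomiting"]
SYMPTOM_INDEX = {name: i for i, name in enumerate(SYMPTOM_VOCAB)}


def symptoms_to_vector(selected: Iterable[str]) -> List[int]:
    """Convert the user's multi-select choices into a fixed-length 0/1 feature vector."""
    vector = [0] * len(SYMPTOM_VOCAB)
    unknown = []
    for name in selected:
        # Normalize display labels such as "High Fever" to vocabulary keys like "high_fever".
        key = name.strip().lower().replace(" ", "_")
        if key in SYMPTOM_INDEX:
            vector[SYMPTOM_INDEX[key]] = 1
        else:
            unknown.append(name)
    if unknown:
        # Reject rather than silently ignore, so bad input surfaces as a validation error.
        raise ValueError(f"Unknown symptoms: {unknown}")
    return vector


print(symptoms_to_vector(["High Fever", "headache", "joint_pain"]))
# [0, 0, 1, 1, 1, 0]
```

The fixed vocabulary order matters: the vector positions must line up with the column order the models were trained on.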
Interesting Challenges
- Ensemble Design: combining multiple classifiers improved accuracy, but most of the gains came from dataset design rather than algorithm tuning. The real engineering challenge was balancing the synthetic training data so it mirrors real-world medical distributions
- Multi-Output Classification: Weather-disease prediction required predicting three disease risks simultaneously. A multi-output Random Forest handled correlated outputs cleanly while maintaining 93–95% accuracy per disease
- Explainability: Health predictions need transparency. Each result shows which specific inputs drove the prediction, building trust in the system's recommendations
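The write-up says each result shows which inputs drove it but does not specify the mechanism. One simple, model-agnostic way to produce the "which symptoms matched" part is a set intersection between the user's selections and a per-disease symptom profile. The sketch below assumes such profiles exist. `DISEASE_PROFILES` and `explain_match` are hypothetical, and this overlap is a plain-language summary, not a measure of feature importance inside the Random Forest or Gradient Boosting models.

```python
from typing import Dict, Set

# Hypothetical symptom profiles; a real system would derive these from its training data.
DISEASE_PROFILES: Dict[str, Set[str]] = {
    "Malaria": {"high_fever", "chills", "sweating", "headache", "vomiting"},
    "Dengue": {"high_fever", "joint_pain", "skin_rash", "headache", "pain_behind_the_eyes"},
}


def explain_match(disease: str, selected: Set[str]) -> Dict[str, object]:
    """List which selected symptoms overlap the disease profile and what share of it they cover."""
    profile = DISEASE_PROFILES[disease]
    matched = sorted(selected & profile)
    coverage = len(matched) / len(profile)
    return {
        "disease": disease,
        "matched": matched,
        "coverage_pct": round(100 * coverage, 1),
        "summary": f"{len(matched)} of {len(profile)} typical {disease} symptoms reported: "
                   + ", ".join(matched),
    }


user_symptoms = {"high_fever", "headache", "joint_pain"}
for name in ("Dengue", "Malaria"):
    print(explain_match(name, user_symptoms)["summary"])
# 3 of 5 typical Dengue symptoms reported: headache, high_fever, joint_pain
# 2 of 5 typical Malaria symptoms reported: headache, high_fever
```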