Disease Prediction System
Aug 2023 — Feb 2024
Overview
Disease Prediction System is an AI-powered health companion that approaches disease prediction from three angles: symptom-based prediction across 41 conditions, health risk assessment for diabetes, heart disease, and stroke, and weather-driven alerts for dengue, malaria, and chikungunya.
Users select from 130+ symptoms, and the system runs them through an ensemble of Random Forest and Gradient Boosting classifiers. Each prediction includes a ranked list of probable diseases with confidence percentages, the matching symptoms, and a plain-English description. The health risk models evaluate clinical measurements and return clear risk levels with actionable recommendations.
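The shape of a ranked prediction can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the `DISEASE_SYMPTOMS` map, the `rank_predictions` helper, the field names, and the probabilities are all hypothetical stand-ins for what the ensemble and its dataset would provide.

```python
from typing import Dict, List, Set

# Hypothetical disease -> symptom map. The real system covers 41 diseases
# and 130+ symptoms; these three entries are illustrative only.
DISEASE_SYMPTOMS: Dict[str, Set[str]] = {
    "Dengue": {"high_fever", "headache", "joint_pain", "skin_rash"},
    "Malaria": {"high_fever", "chills", "sweating", "headache"},
    "Common Cold": {"cough", "runny_nose", "sneezing", "headache"},
}


def rank_predictions(
    probabilities: Dict[str, float], selected: Set[str], top_k: int = 3
) -> List[dict]:
    """Return the top_k diseases by probability with confidence and matched symptoms."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    results = []
    for disease, prob in ranked[:top_k]:
        # Symptoms the user selected that are also associated with this disease.
        matched = sorted(selected & DISEASE_SYMPTOMS.get(disease, set()))
        results.append(
            {
                "disease": disease,
                "confidence_pct": round(prob * 100, 1),
                "matching_symptoms": matched,
            }
        )
    return results


# Made-up class probabilities, standing in for the ensemble's output.
example = rank_predictions(
    {"Dengue": 0.62, "Malaria": 0.30, "Common Cold": 0.08},
    selected={"high_fever", "headache", "joint_pain"},
    top_k=2,
)
```

Here `example` holds two entries, Dengue first and Malaria second, each with a confidence percentage and the subset of selected symptoms that matched.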
Key Features
- Symptom Checker — select from 130+ symptoms, get ranked predictions across 41 diseases with confidence scores
- Health Risk Assessment — separate models for Type 2 Diabetes (90.5%), Heart Disease (88.8%), and Stroke (94.7%)
- Weather Disease Alerts — predicts dengue, malaria, chikungunya risk from temperature, humidity, and rainfall
- Explainable predictions — shows which symptoms matched and why each condition was flagged
- Risk factor identification with tailored health recommendations for each assessment
- 5 region types supported for weather alerts: Tropical, Subtropical, Temperate, Arid, Mediterranean
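As a rough sketch of what a weather alert request might carry, the inputs named above (temperature, humidity, rainfall, region type) can be modeled with a small validated data structure. The class names, field names, units (°C, %, mm), bounds, and one-hot encoding below are assumptions made for illustration; only the five region types come from the project description.

```python
from dataclasses import dataclass
from enum import Enum


class RegionType(str, Enum):
    """The five region types supported for weather alerts."""
    TROPICAL = "Tropical"
    SUBTROPICAL = "Subtropical"
    TEMPERATE = "Temperate"
    ARID = "Arid"
    MEDITERRANEAN = "Mediterranean"


@dataclass(frozen=True)
class WeatherReading:
    # Units are assumed here: degrees Celsius, percent, millimetres.
    temperature_c: float
    humidity_pct: float
    rainfall_mm: float
    region: RegionType

    def __post_init__(self) -> None:
        # Assumed sanity bounds, not taken from the project.
        if not 0.0 <= self.humidity_pct <= 100.0:
            raise ValueError("humidity_pct must be between 0 and 100")
        if self.rainfall_mm < 0.0:
            raise ValueError("rainfall_mm cannot be negative")

    def to_features(self) -> list:
        """Numeric features followed by a one-hot encoding of the region."""
        one_hot = [1.0 if self.region is r else 0.0 for r in RegionType]
        return [self.temperature_c, self.humidity_pct, self.rainfall_mm, *one_hot]


reading = WeatherReading(31.5, 82.0, 120.0, RegionType("Tropical"))
features = reading.to_features()
```

Rejecting impossible readings before they reach a model keeps bad input from turning into a confident-looking alert.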
How It's Built
The system uses a layered architecture:
- Frontend: React 18 SPA with TailwindCSS, Framer Motion animations, Recharts for data visualization, and React Select for symptom multi-select input
- Backend: FastAPI with Pydantic validation, served by Uvicorn. One router per feature domain (symptoms, risk, weather), with a service layer behind the routers
- ML Pipeline: Ensemble of Random Forest (RF) + Gradient Boosting for symptom prediction (97.2% accuracy). Separate RF models for diabetes, heart disease, and stroke risk. Multi-output RF classifier for weather-disease correlations (93–95% accuracy)
- Infrastructure: Models are trained during the Docker build, so no separate model storage is needed. The stack is containerized with Docker Compose for local development
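A build-time training step for the symptom ensemble might look something like the sketch below. It assumes soft voting through scikit-learn's `VotingClassifier` and persistence with `joblib`. The write-up does not say how the two classifiers are combined or how models are serialized, so both choices are assumptions, as are the `train_symptom_model` helper, the file name, and the toy data.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)


def train_symptom_model(X, y, path):
    """Fit an RF + Gradient Boosting soft-voting ensemble and save it to path."""
    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ],
        voting="soft",  # average predicted class probabilities (assumed strategy)
    )
    ensemble.fit(X, y)
    joblib.dump(ensemble, path)
    return ensemble


# Toy binary symptom matrix: 120 "patients" x 10 symptoms, 4 synthetic classes.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(120, 10))
y_demo = X_demo[:, 0] + 2 * X_demo[:, 1]

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "symptom_model.joblib")
    model = train_symptom_model(X_demo, y_demo, model_path)
    reloaded = joblib.load(model_path)  # what the API process would do at startup

proba = reloaded.predict_proba(X_demo[:5])  # one probability per class, per row
```

Running a script like this from a Dockerfile build step would bake the fitted model into the image, which is one way to get the "no separate model storage" property described above.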
Interesting Challenges
- Ensemble Design: Combining multiple classifiers improved accuracy, but most gains came from dataset design, not algorithm tuning. Balancing synthetic data to mirror real-world medical distributions was the real engineering challenge
- Multi-Output Classification: Weather-disease prediction required predicting three disease risks simultaneously. A multi-output Random Forest handled correlated outputs cleanly while maintaining 93–95% accuracy per disease
- Explainability: Health predictions need transparency. Each result shows which specific inputs drove the prediction, building trust in the system's recommendations
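To make the multi-output idea concrete, here is a minimal sketch using scikit-learn, whose `RandomForestClassifier` accepts a two-dimensional label array and predicts all outputs at once. The synthetic data, the labeling rules, and the feature ranges are invented for the demo and do not reflect the project's dataset or any real epidemiology.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

DISEASES = ["dengue", "malaria", "chikungunya"]

# Synthetic weather features: temperature, humidity, rainfall (made-up ranges).
rng = np.random.default_rng(42)
n = 300
temperature = rng.uniform(10, 40, n)
humidity = rng.uniform(20, 100, n)
rainfall = rng.uniform(0, 300, n)
X = np.column_stack([temperature, humidity, rainfall])

# Synthetic labels from invented rules, one column per disease.
Y = np.column_stack(
    [
        (temperature > 25) & (humidity > 60),
        (rainfall > 150) & (temperature > 20),
        (temperature > 28) & (rainfall > 100),
    ]
).astype(int)

# A 2-D Y makes this a multi-output problem: one forest, three predictions.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, Y)

sample = np.array([[32.0, 85.0, 200.0]])
pred = model.predict(sample)  # shape (1, 3): one 0/1 flag per disease
risk = dict(zip(DISEASES, pred[0].astype(int).tolist()))
```

Because every tree splits on the same weather features for all three outputs, a single model can share structure across diseases whose risks move together, rather than training three unrelated classifiers.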