Overview

This project implements a model that predicts diseases and recommends traditional Chinese herbs from a patient's symptoms, following the paper "Herb recommendation system for Traditional Chinese Medicine".

Software Implementation

Our project is composed of two main parts: back-end data processing and front-end data visualization. The main components are described below.

Model training

The training data files are located under data/. The main training data set is data/HIS_Tuple.txt, in which each line has the form diseases symptoms herbs, and each of the three entries is a list of indexes separated by :. The mappings from indexes to the actual diseases, symptoms, and herbs are in dis_dct.txt, sym_dct.txt, and herb_dict.txt. We train our model with the EM algorithm given in the paper. The data preparation and training setup are as follows:

Since the data set contains more than 9,000 records and there are about 10,000 possible values for diseases, herbs, and symptoms, training on the full data is computationally too expensive. Instead, we keep the 92 most common diseases, the 500 most frequent symptoms and herbs, and the first 985 records as training data for our model. The training code is located at /src/TCM_Model_training.ipynb. We then store the model parameters in a pickle file under training_result/.
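The data preparation described above can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the function names (parse_record, top_k, reduce_dataset) are hypothetical, and it assumes that the three fields on each line are separated by whitespace, that "500 symptoms and herbs" means 500 of each, that frequencies are counted over the first 985 records, and that records left with an empty field after filtering are dropped.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

# One record: (disease indexes, symptom indexes, herb indexes).
Record = Tuple[List[int], List[int], List[int]]


def parse_record(line: str) -> Record:
    """Parse one line of the form 'diseases symptoms herbs'.

    Assumption: fields are whitespace-separated, and each field is a
    ':'-separated list of integer indexes.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    diseases, symptoms, herbs = (
        [int(index) for index in field.split(":") if index] for field in fields
    )
    return diseases, symptoms, herbs


def top_k(values: Iterable[int], k: int) -> Set[int]:
    """Return the k most frequent values."""
    return {value for value, _ in Counter(values).most_common(k)}


def reduce_dataset(
    records: Sequence[Record],
    n_records: int = 985,
    n_diseases: int = 92,
    n_symptoms: int = 500,
    n_herbs: int = 500,
) -> List[Record]:
    """Keep the first n_records and only the most frequent indexes per field."""
    subset = list(records[:n_records])
    limits = (n_diseases, n_symptoms, n_herbs)
    keep = [
        top_k((index for record in subset for index in record[field]), limit)
        for field, limit in enumerate(limits)
    ]
    reduced: List[Record] = []
    for record in subset:
        diseases, symptoms, herbs = (
            [index for index in record[field] if index in keep[field]]
            for field in range(3)
        )
        # Assumption: a record is only useful if every field is non-empty.
        if diseases and symptoms and herbs:
            reduced.append((diseases, symptoms, herbs))
    return reduced
```

With these helpers, the non-empty lines of data/HIS_Tuple.txt would each be passed through parse_record, and the resulting list through reduce_dataset, before training.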

Disease / Herb prediction

Using the trained model parameters, we can predict a patient's diseases from their symptoms. To do so, we use the method suggested in the paper: $\Pr(d \mid \{s_1, s_2, \dots, s_n\}) = c \cdot \Pr(s_1 \mid d) \cdots \Pr(s_n \mid d) \cdot \Pr(d)$, where $c$ is a normalizing constant that is the same for every disease and therefore does not affect the ranking. Given the symptoms, we score all possible diseases and return the 10 highest-ranked ones. For each disease, we recommend herbs based on our model parameter $\Pr(h \mid d)$. The recommendation function is implemented in src/server.py, which also serves as the server for our front-end visualization.
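A minimal sketch of this ranking step is shown below. It is not the code in src/server.py: the function names and the dictionary layout of the parameters are assumptions made for illustration. The sketch sums log-probabilities instead of multiplying probabilities, which gives the same ordering while avoiding numerical underflow, and it floors missing or zero probabilities at a small eps, which is an added assumption rather than something stated in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_diseases(
    symptoms: Sequence[int],
    prior: Dict[int, float],                     # Pr(d)
    sym_given_dis: Dict[int, Dict[int, float]],  # Pr(s | d), keyed by d then s
    top_n: int = 10,
    eps: float = 1e-12,
) -> List[Tuple[int, float]]:
    """Return the top_n (disease, log score) pairs, best first.

    The score is log Pr(d) + sum_i log Pr(s_i | d); the constant c is
    omitted because it is identical for every disease.
    """
    scores = {}
    for disease, p_d in prior.items():
        log_score = math.log(max(p_d, eps))
        for symptom in symptoms:
            p_s = sym_given_dis[disease].get(symptom, 0.0)
            log_score += math.log(max(p_s, eps))  # eps floor: an assumption
        scores[disease] = log_score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def recommend_herbs(
    disease: int,
    herb_given_dis: Dict[int, Dict[int, float]],  # Pr(h | d), keyed by d then h
    top_n: int = 4,
) -> List[int]:
    """Return the top_n herbs with the highest Pr(h | d) for one disease."""
    herbs = herb_given_dis[disease]
    return sorted(herbs, key=herbs.get, reverse=True)[:top_n]
```

For example, with two equally likely diseases, a symptom that is much more probable under the first disease ranks that disease first; adding a second symptom that strongly favours the other disease can reverse the order.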

Front End

We also implement a front-end user interface that interacts with the back-end server. Patients are prompted to enter their symptoms, and we then display the ten most probable diseases and four herbs for each disease.

To make our website more user-friendly, we add an auto-completion feature: users type a few characters, and the system suggests related symptoms. This is particularly useful when users are not sure of the exact description of their symptoms. The website initially displays three results and lets the user show more, up to ten diseases, or fewer.
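The auto-completion behaviour could be approximated by simple substring matching over the symptom vocabulary, as in the sketch below. This is a hypothetical illustration, not the project's front-end code: the function name suggest_symptoms, the preference for prefix matches, and the default limit are all assumptions.

```python
from typing import List, Sequence


def suggest_symptoms(
    query: str, vocabulary: Sequence[str], limit: int = 10
) -> List[str]:
    """Suggest symptom names containing the typed characters.

    Names that start with the query come first, followed by names that
    contain it elsewhere; vocabulary order is preserved within each group.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    prefix_matches = [
        name for name in vocabulary if name.lower().startswith(needle)
    ]
    inner_matches = [
        name
        for name in vocabulary
        if needle in name.lower() and not name.lower().startswith(needle)
    ]
    return (prefix_matches + inner_matches)[:limit]
```

Here vocabulary would be the list of symptom names loaded from sym_dct.txt.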