- Cirrhosis is chronic liver damage from a variety of causes, leading to scarring and liver failure.
- Hepatitis and chronic alcohol abuse are frequent causes.
- Liver damage caused by cirrhosis can't be undone, but further damage can be limited.
- Initially, patients may experience fatigue, weakness and weight loss.
- During later stages, patients may develop jaundice (yellowing of the skin), gastrointestinal bleeding, abdominal swelling and confusion.
- This data set contains 416 liver patient records and 167 non-liver-patient records collected from the north-east of Andhra Pradesh, India.
- The "Dataset" column is a class label used to divide the records into liver patients (liver disease) and non-patients (no disease).
- This data set contains 441 male patient records and 142 female patient records.
- The main aim of the project is to create an ANN model that classifies patients as infected or not infected based on various proteins in the blood.
- Using simple blood tests, we can predict whether a patient is infected or not.
- Import the Libraries.
- Read the Dataset.
- Check for null values; if there are any, fill them.
- Check for duplicated values; if there are any, remove them.
- Transform categorical values into numerical values.
- Check the correlation values for each feature.
- Drop uncorrelated features.
- Assign X and Y.
- Split Dataset into testing and training.
- Apply the MLP classifier and predict accuracy.
- Analyze the metrics.
- Predict for a given input.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("./Liver.csv")
df
df.info()
df.describe()
df.columns
df.isnull().sum()
# Fill missing ratios with the column mean
df['Albumin_and_Globulin_Ratio'] = df['Albumin_and_Globulin_Ratio'].fillna(df['Albumin_and_Globulin_Ratio'].mean())
df.isnull().sum()
print("Duplicate Values =",df.duplicated().sum())
df[df.duplicated()]
df=df.drop_duplicates()
df.duplicated().sum()
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df["Gender"] = le.fit_transform(df["Gender"])
df['Dataset']=df['Dataset'].map({1:1,2:0})
df
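Because `LabelEncoder` assigns integer codes in sorted order of the class labels, it helps to check which code each gender received before building inputs by hand. The following is a minimal sketch, assuming the `Gender` column holds the strings `Male` and `Female`; the toy list and the names `demo_le` and `gender_mapping` are illustrative and not taken from the dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Toy values standing in for the Gender column (assumed labels)
genders = ["Male", "Female", "Male", "Female"]

demo_le = LabelEncoder()
codes = demo_le.fit_transform(genders)

# classes_ is sorted, so position 0 holds the alphabetically first label
gender_mapping = {label: code for code, label in enumerate(demo_le.classes_)}
print(gender_mapping)
```

Under that assumption, `Female` is encoded as 0 and `Male` as 1.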
plt.figure(figsize=(10,5))
df.corr()['Dataset'].sort_values(ascending=False).plot(kind='bar',color='black')
plt.xticks(rotation=90)
plt.xlabel('Variables in the Data')
plt.ylabel('Correlation Values')
plt.show()
df = df.drop(["Total_Protiens","Albumin","Albumin_and_Globulin_Ratio"],axis=1)
df
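The columns dropped above were picked by reading the bar chart. One way to make such a selection reproducible is to threshold the absolute correlation with the target. This is only a sketch of that alternative rule, not necessarily the rule used above: the threshold `0.1`, the helper name `weakly_correlated`, and the toy frame are all assumptions for illustration.

```python
import pandas as pd

def weakly_correlated(frame, target, threshold=0.1):
    """Return feature names whose |Pearson correlation| with target is below threshold."""
    corr = frame.corr()[target].drop(target)
    return corr[corr.abs() < threshold].index.tolist()

# Toy data: 'signal' tracks the target, 'noise' does not
toy = pd.DataFrame({
    "signal": [1, 2, 3, 4, 5, 6],
    "noise":  [3, 1, 2, 2, 1, 3],
    "target": [0, 0, 0, 1, 1, 1],
})

to_drop = weakly_correlated(toy, "target")
print(to_drop)
```

On the real data this would be called as `weakly_correlated(df, 'Dataset')`, and the result should be reviewed before dropping anything.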
X = df.drop(['Dataset'], axis=1)
X
y = df['Dataset']
y
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30,random_state=101)
print("Training sample shape =",X_train.shape)
print("Testing sample shape =",X_test.shape)
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
from sklearn.neural_network import MLPClassifier
# One hidden layer with 8 units; (8,) is a tuple, whereas (8) is just the integer 8
reg = MLPClassifier(hidden_layer_sizes=(8,), learning_rate_init=0.0001, max_iter=10000)
reg.fit(X_train, y_train)
log_predicted= reg.predict(X_test)
print('Accuracy: \n', accuracy_score(y_test,log_predicted))
print('Confusion Matrix: \n', confusion_matrix(y_test,log_predicted))
sns.heatmap(confusion_matrix(y_test,log_predicted),annot=True,fmt="d")
print('Classification Report: \n', classification_report(y_test,log_predicted))
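To make the reported metrics concrete, the sketch below derives accuracy, precision, and recall by hand from a 2x2 confusion matrix laid out the way `confusion_matrix` returns it (rows are true classes, columns are predicted classes). The counts are made-up illustrative numbers, not the results of this model.

```python
import numpy as np

# Hypothetical counts in the layout [[TN, FP], [FN, TP]]
cm = np.array([[20, 30],
               [10, 110]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # fraction of all predictions that are correct
precision = tp / (tp + fp)        # of the predicted positives, how many are truly positive
recall = tp / (tp + fn)           # of the true positives, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision and recall for class 1 in the classification report are computed this way, which is why they can differ noticeably from the overall accuracy.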
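pred_0 = reg.predict([[25,0,0.1,0.1,44,4,8]])    # values within the normal levels
pred_1 = reg.predict([[50,1,5,1,200,50,50]])     # elevated values
# Report each sample separately; class 1 means liver disease
for name, pred in (("Sample 1", pred_0), ("Sample 2", pred_1)):
    if pred[0] == 1:
        print(f"{name}: Infected with Liver Cirrhosis")
    else:
        print(f"{name}: Not Infected with Liver Cirrhosis")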
Normal Levels
- Total bilirubin: 0.1 to 1.2 mg/dL
- Direct bilirubin: less than 0.3 mg/dL
- Alkaline_Phosphotase: 44 to 147 IU/L
- Alamine_Aminotransferase: 4 to 36 U/L
- Aspartate_Aminotransferase: 8 to 33 U/L
Input for pred_0
- Age = 25
- Gender = 0
- Total_Bilirubin = 0.1
- Direct_Bilirubin = 0.1
- Alkaline_Phosphotase = 44
- Alamine_Aminotransferase = 4
- Aspartate_Aminotransferase = 8
Input for pred_1
- Age = 50
- Gender = 1
- Total_Bilirubin = 5
- Direct_Bilirubin = 1
- Alkaline_Phosphotase = 200
- Alamine_Aminotransferase = 50
- Aspartate_Aminotransferase = 50
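The two inputs above can be compared against the normal levels with a small helper. This is a sketch: the `NORMAL_RANGES` table copies the ranges listed above, with 0 assumed as the lower bound for direct bilirubin and both bounds treated as inclusive, and the function name `out_of_range` is hypothetical.

```python
# Normal ranges as (low, high), copied from the "Normal Levels" list
NORMAL_RANGES = {
    "Total_Bilirubin": (0.1, 1.2),
    "Direct_Bilirubin": (0.0, 0.3),   # "less than 0.3"; lower bound of 0 is assumed
    "Alkaline_Phosphotase": (44, 147),
    "Alamine_Aminotransferase": (4, 36),
    "Aspartate_Aminotransferase": (8, 33),
}

def out_of_range(sample):
    """Return the names of measurements that fall outside their normal range."""
    return [
        name for name, (low, high) in NORMAL_RANGES.items()
        if not (low <= sample[name] <= high)
    ]

sample_1 = {"Total_Bilirubin": 0.1, "Direct_Bilirubin": 0.1, "Alkaline_Phosphotase": 44,
            "Alamine_Aminotransferase": 4, "Aspartate_Aminotransferase": 8}
sample_2 = {"Total_Bilirubin": 5, "Direct_Bilirubin": 1, "Alkaline_Phosphotase": 200,
            "Alamine_Aminotransferase": 50, "Aspartate_Aminotransferase": 50}

print(out_of_range(sample_1))   # first input: every value is within range
print(out_of_range(sample_2))   # second input: every value is elevated
```

The first input sits at the lower edge of every normal range, while the second exceeds the upper bound of all five measurements, so the two are intended as a healthy-looking and an abnormal-looking test case.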
- This model is helpful for predicting liver cirrhosis from a blood test alone.
- Usually, confirming the diagnosis involves an MRI or another scan.
- A blood-test-based prediction therefore makes screening more cost-effective.
- 75% is a good accuracy score, and it can be increased further by tuning hyperparameters and regularizing the ANN.
- These measures can be implemented in the next steps to make the model more accurate.
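As one possible way to carry out the tuning and regularization mentioned above, the sketch below searches over the L2 penalty `alpha` and the hidden layer size of an `MLPClassifier` with `GridSearchCV`, after standardizing the features. The synthetic data from `make_classification`, the grid values, and the use of `StandardScaler` are all assumptions for illustration; none of this was run on the liver dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 7 features, the same width as the feature matrix X
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=101
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=101
)

# Scaling inside the pipeline keeps each cross-validation fold free of leakage
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=2000, random_state=101)),
])

# Hypothetical grid: alpha controls L2 regularization strength
param_grid = {
    "mlp__alpha": [1e-4, 1e-2, 1.0],
    "mlp__hidden_layer_sizes": [(8,), (16,)],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_te, y_te))
```

To try this on the real data, `X_train`, `X_test`, `y_train`, and `y_test` from the split above would replace the synthetic arrays.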
Thus, an MLP is trained to classify whether or not a patient is infected with liver cirrhosis based on various blood test results, with nearly 75% (74.269%) accuracy. Refer to the Colab file HERE.