Prediction of Liver Cirrhosis

Project Description:

About Liver Cirrhosis

Liver cirrhosis is chronic liver damage from a variety of causes, leading to scarring and liver failure.
Hepatitis and chronic alcohol abuse are frequent causes.
Liver damage caused by cirrhosis can't be undone, but further damage can be limited.
Initially patients may experience fatigue, weakness and weight loss.
During later stages, patients may develop jaundice (yellowing of the skin), gastrointestinal bleeding, abdominal swelling and confusion.

About the dataset

This data set contains 416 liver patient records and 167 non-liver patient records collected from the north east of Andhra Pradesh, India.
The "Dataset" column is a class label used to divide groups into liver patient (liver disease) or not (no disease).
This data set contains 441 male patient records and 142 female patient records.
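
Because the two classes are imbalanced, it helps to know how accurate a trivial classifier would be if it always predicted "liver patient". The sketch below uses only the record counts quoted above; the variable names are illustrative, and the figure applies to the raw data before any duplicates are removed.

```python
# Record counts quoted in the dataset description
liver_patients = 416
non_liver_patients = 167
total = liver_patients + non_liver_patients

# Accuracy of always predicting the majority class (liver patient)
baseline_accuracy = liver_patients / total
print(f"Total records: {total}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Any model accuracy is worth reading against this baseline rather than against 50%.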

Goal of the Project

The main aim of the project is to create an ANN model that classifies patients as infected or not infected based on various proteins in the blood.
Using simple blood tests, we can predict whether a patient is infected or not.

Algorithm:

Import the Libraries.
Read the Dataset.
Check for null values; if there are any, fill them.
Check for duplicated values; if there are any, remove them.
Transform categorical values into numerical values.
Check correlation values for each feature.
Drop uncorrelated features.
Assign X and Y.
Split the dataset into training and testing sets.
Apply the MLP Classifier and predict accuracy.
Analyze the metrics.
Predict for a given input.

Program:

Import the Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Read & Basic info about Dataset

df = pd.read_csv("./Liver.csv")
df
df.info()
df.describe()
df.columns

Check for Null Values & Remove them

df.isnull().sum()
# Fill missing ratios with the column mean
df['Albumin_and_Globulin_Ratio'] = df['Albumin_and_Globulin_Ratio'].fillna(
    df['Albumin_and_Globulin_Ratio'].mean())
df.isnull().sum()
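
To see what mean imputation does in isolation, here is a minimal sketch on a toy column. The data and the column name `ratio` are made up for illustration and do not come from Liver.csv.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (illustrative data only)
toy = pd.DataFrame({"ratio": [0.8, 1.2, np.nan, 1.0]})

# Series.mean() skips NaN by default, so the fill value is
# the average of the three observed entries
fill_value = toy["ratio"].mean()
toy["ratio"] = toy["ratio"].fillna(fill_value)
print(toy["ratio"].tolist())
```

The missing entry takes the average of the observed values, so the column mean itself is unchanged by the fill.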

Check for Duplicate Values & Remove them

print("Duplicate Values =",df.duplicated().sum())
df[df.duplicated()]
df=df.drop_duplicates()
df.duplicated().sum()

Encode Values

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df["Gender"] = le.fit_transform(df["Gender"])

df['Dataset']=df['Dataset'].map({1:1,2:0})
df
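
The integer codes that LabelEncoder assigns matter later, when custom inputs pass Gender as 0 or 1. The sketch below assumes the Gender column holds the strings "Male" and "Female"; the sample list and the name `le_demo` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values; assumes Gender is stored as "Male"/"Female"
genders = ["Male", "Female", "Female", "Male"]

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(genders)

# classes_ is sorted alphabetically, so "Female" gets 0 and "Male" gets 1
mapping = {label: int(code) for code, label in enumerate(le_demo.classes_)}
print(mapping)
```

Under that assumption, Gender = 0 means female and Gender = 1 means male.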

Correlation Values

plt.figure(figsize=(10,5))
df.corr()['Dataset'].sort_values(ascending=False).plot(kind='bar',color='black')
plt.xticks(rotation=90)
plt.xlabel('Variables in the Data')
plt.ylabel('Correlation Values')
plt.show()

df = df.drop(["Total_Protiens","Albumin","Albumin_and_Globulin_Ratio"],axis=1)
df
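
The columns above were dropped after reading the bar chart by eye. As a possible alternative, weakly correlated columns could be picked with an explicit threshold. This is a sketch on toy data; the threshold of 0.1 and the column names are hypothetical and are not the values used in this project.

```python
import pandas as pd

# Toy data (illustrative): "a" tracks the target, "b" does not
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, -1, 0, 0, 1, -1],
    "target": [0, 0, 0, 1, 1, 1],
})

threshold = 0.1  # hypothetical cut-off, chosen for illustration only

# Correlation of every feature with the target, excluding the target itself
corr = toy.corr()["target"].drop("target")

# Features whose absolute correlation falls below the threshold
weak = corr[corr.abs() < threshold].index.tolist()
reduced = toy.drop(columns=weak)
print("Dropped:", weak)
```

Using the absolute value keeps strongly negative correlations, which are as informative as positive ones.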

Assigning X and Y

X = df.drop(['Dataset'], axis=1)
X
y = df['Dataset']
y

Splitting Dataset

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30,random_state=101)
print("Training sample shape =",X_train.shape)
print("Testing sample shape =",X_test.shape)
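
Since the classes are imbalanced, one option is a stratified split, which keeps the class ratio the same in the training and testing sets. The sketch below uses synthetic stand-in data; `stratify` is a real `train_test_split` parameter, but applying it here is a suggestion and not what the code above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 70 positives and 30 negatives
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 70 + [0] * 30)

# stratify=y_demo preserves the 70/30 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=101, stratify=y_demo
)
print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```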

Creating MLP

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
from sklearn.neural_network import MLPClassifier

# (8,) is a one-element tuple: a single hidden layer with 8 neurons
reg = MLPClassifier(hidden_layer_sizes=(8,), learning_rate_init=0.0001, max_iter=10000)
reg.fit(X_train, y_train)

log_predicted= reg.predict(X_test)
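
MLPs are generally sensitive to feature scale, and the blood test features here span very different ranges. A possible refinement, not part of the code above, is to standardize the inputs in a pipeline. The sketch below runs on synthetic data generated with `make_classification` as a stand-in for the 7 features in X; the name `model` is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the liver data (7 features, like X)
X_demo, y_demo = make_classification(n_samples=200, n_features=7, random_state=0)

# Scale each feature to zero mean and unit variance before the MLP
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
)
model.fit(X_demo, y_demo)
print("Training accuracy:", model.score(X_demo, y_demo))
```

Wrapping the scaler in the pipeline means it is fitted on the training data only and then reused unchanged at prediction time.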

Testing Metrics

print('Accuracy: \n', accuracy_score(y_test,log_predicted))
print('Confusion Matrix: \n', confusion_matrix(y_test,log_predicted))
sns.heatmap(confusion_matrix(y_test,log_predicted),annot=True,fmt="d")
print('Classification Report: \n', classification_report(y_test,log_predicted))
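
To make the link between the confusion matrix and the accuracy score explicit, here is a small worked example. The labels are made up for illustration and are not results from the model above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative labels, not output from the trained model
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_hat = [1, 1, 0, 0, 1, 1, 0, 1]

# For labels (0, 1), rows are true classes and columns are predictions,
# so ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# Accuracy is the share of correct predictions: the matrix diagonal over the total
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, manual_accuracy)
```

Here 6 of the 8 predictions are correct, so the accuracy is 0.75, matching `accuracy_score`.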

Testing Custom Inputs

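# Feature order: Age, Gender, Total_Bilirubin, Direct_Bilirubin,
# Alkaline_Phosphotase, Alamine_Aminotransferase, Aspartate_Aminotransferase
pred_0 = reg.predict([[25,0,0.1,0.1,44,4,8]])
pred_1 = reg.predict([[50,1,5,1,200,50,50]])
# Report each test input separately instead of combining them with "or"
for name, pred in (("Test 1", pred_0), ("Test 2", pred_1)):
  if pred[0] == 1:
    print(name, "- Infected with Liver Cirrhosis")
  else:
    print(name, "- Not Infected with Liver Cirrhosis")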

Output:

Read & Basic info about Dataset

Dataset

Info

Description

Columns

Check for Null Values & Remove them

Null Value - Before Removing

Null Value - After Removing

Check for Duplicate Values & Remove them

Total Duplicate Values

Duplicate Values - After Removing

Encode Values

After Encoding

Correlation Values

Correlation

Dataset after dropping uncorrelated values

Splitting Dataset

Training and testing size

Testing Metrics

Accuracy

Confusion Matrix

Classification Report

Testing Custom Inputs

Normal Levels

Total bilirubin: 0.1 to 1.2 mg/dL
Direct bilirubin: less than 0.3 mg/dL
Alkaline_Phosphotase: 44 to 147 international units per liter (IU/L)
Alamine_Aminotransferase: 4 to 36 U/L
Aspartate_Aminotransferase: 8 to 33 U/L

Test 1

Age = 25
Gender = 0
Total_Bilirubin = 0.1
Direct_Bilirubin = 0.1
Alkaline_Phosphotase = 44
Alamine_Aminotransferase = 4
Aspartate_Aminotransferase = 8

Test 2

Age = 50
Gender = 1
Total_Bilirubin = 5
Direct_Bilirubin = 1
Alkaline_Phosphotase = 200
Alamine_Aminotransferase = 50
Aspartate_Aminotransferase = 50
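
The two test inputs can be checked against the normal levels listed above. The helper below is a hypothetical sketch: `NORMAL_RANGES` and `out_of_range` are illustrative names, the bounds are treated as inclusive, and since direct bilirubin is given only as "less than 0.3", a lower bound of 0.0 is assumed.

```python
# Reference ranges copied from the Normal Levels list; keys match the dataset columns.
# The 0.0 lower bound for Direct_Bilirubin is an assumption.
NORMAL_RANGES = {
    "Total_Bilirubin": (0.1, 1.2),
    "Direct_Bilirubin": (0.0, 0.3),
    "Alkaline_Phosphotase": (44, 147),
    "Alamine_Aminotransferase": (4, 36),
    "Aspartate_Aminotransferase": (8, 33),
}

def out_of_range(sample):
    """Return the names of measurements that fall outside their normal range."""
    return [
        name for name, (low, high) in NORMAL_RANGES.items()
        if name in sample and not (low <= sample[name] <= high)
    ]

test_1 = {"Total_Bilirubin": 0.1, "Direct_Bilirubin": 0.1,
          "Alkaline_Phosphotase": 44, "Alamine_Aminotransferase": 4,
          "Aspartate_Aminotransferase": 8}
test_2 = {"Total_Bilirubin": 5, "Direct_Bilirubin": 1,
          "Alkaline_Phosphotase": 200, "Alamine_Aminotransferase": 50,
          "Aspartate_Aminotransferase": 50}

print("Test 1 out of range:", out_of_range(test_1))
print("Test 2 out of range:", out_of_range(test_2))
```

Test 1 sits at the lower edge of every normal range, while every measurement in Test 2 exceeds its upper limit, which is why the two inputs serve as a healthy and an unhealthy example.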

Advantage:

This model helps predict liver cirrhosis from a blood test alone.
Diagnosis usually involves an MRI or another scan for confirmation.
A blood-test-based prediction therefore makes screening more cost-effective.
The model reaches about 75% accuracy, which can be increased further by tuning hyperparameters and regularizing the ANN.
These measures can be implemented in the next steps to make the model more accurate.
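
One way the tuning and regularization step might look is a small grid search over the L2 penalty `alpha` and the hidden layer size. This is a sketch on synthetic data, not a result from this project; the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the liver data (7 features)
X_demo, y_demo = make_classification(n_samples=200, n_features=7, random_state=0)

# Hypothetical search space: alpha is the L2 regularization strength
param_grid = {
    "alpha": [1e-4, 1e-2],
    "hidden_layer_sizes": [(8,), (16,)],
}

# 3-fold cross-validation over every combination in the grid
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

Cross-validated scores give a less optimistic estimate than a single train/test split, which matters on a dataset of a few hundred rows.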

Result:

Thus, an MLP is trained to classify whether a patient is infected with liver cirrhosis or not, based on various blood test results, with nearly 75% (74.269%) accuracy. Refer to the Colab file HERE.

A Project By:

Shafeeq Ahamed.S - 212221230092

Sanjay Kumar.S.S - 212221240048