pd.DataFrame({
    "Missing values in A/G Ratio": [ag_ratio_missing]
})

| | Missing values in A/G Ratio |
|---|---|
| 0 | 4 |
This project analyzes liver disease risk factors using the Indian Liver Patient Dataset (ILPD). The notebook focuses on cleaning, age and protein patterns, gender differences, and hypothesis tests. The interactive app at the end is a demo only, not a clinical model.
The ILPD contains 583 patient records collected from Andhra Pradesh, India. Each record is labeled as either healthy or diagnosed with liver disease.
The notebook found 4 missing values in the A/G Ratio column and removed those rows before analysis.
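A minimal sketch of that cleaning step, assuming the data sits in a pandas DataFrame. The column name `Albumin_and_Globulin_Ratio` and the toy values are assumptions for illustration, not the notebook's actual data; only the variable name `ag_ratio_missing` comes from the notebook output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the ILPD table; values are made up for illustration.
df = pd.DataFrame({
    "Age": [65, 62, 58, 72, 46, 26],
    "Albumin_and_Globulin_Ratio": [0.90, np.nan, 0.89, 1.00, np.nan, 1.30],
})

# Count missing entries in the ratio column.
ag_ratio_missing = int(df["Albumin_and_Globulin_Ratio"].isna().sum())

# Drop only the rows where that column is missing, keeping all other rows.
df_clean = df.dropna(subset=["Albumin_and_Globulin_Ratio"]).reset_index(drop=True)

print(ag_ratio_missing)
print(len(df_clean))
```

Passing `subset=` limits the drop to rows missing this one column, so rows with gaps elsewhere (if any) are left alone.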
We first analyzed the distribution of patient ages.
Most patients are middle-aged, clustering between 30 and 60 years old. The average age is about 44.7, the median is 45, and the full range in the dataset is 4 to 90.
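Summary figures like these can be computed along the following lines. This is a sketch on made-up ages, not the ILPD values, and the 30 to 60 band is taken from the sentence above.

```python
import pandas as pd

# Made-up ages for illustration; the notebook would use the dataset's age column.
ages = pd.Series([4, 18, 32, 40, 45, 50, 58, 60, 90], name="Age")

summary = {
    "mean": ages.mean(),
    "median": ages.median(),
    "min": ages.min(),
    "max": ages.max(),
    # Share of patients aged 30 to 60, both ends included.
    "share_30_to_60": ages.between(30, 60).mean(),
}
print(summary)
```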
We compared Total Proteins (TP) against Albumin (ALB) to see how closely they move together.
Total Proteins and Albumin levels move closely together: when one is higher, the other tends to be higher too. The correlation coefficient is 0.78, where 1.0 would be a perfect positive relationship. Since the liver produces Albumin, this tight relationship suggests it may be a useful early signal of liver health.
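A correlation of this kind can be computed as sketched below. The source does not say which coefficient was used; this sketch assumes Pearson's r, which is the pandas default. The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration; column names are assumptions.
df = pd.DataFrame({
    "Total_Proteins": [6.8, 7.5, 7.0, 6.0, 5.5, 8.0],
    "Albumin":        [3.3, 3.2, 3.3, 3.0, 2.4, 4.4],
})

# Series.corr computes the Pearson correlation coefficient by default.
r = df["Total_Proteins"].corr(df["Albumin"])
print(round(r, 2))
```

A value near 1 means the two measurements rise and fall together; a value near 0 means no linear relationship.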
We filtered the data to patients under 60 and calculated the share diagnosed with liver disease for each gender.
In this dataset, about 27% of men under 60 and 34% of women under 60 were diagnosed with liver disease. Women under 60 show the higher rate in this sample.
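The filter-then-compare step might look like the sketch below. The column names, the 1/0 disease coding, and every row are assumptions made up for illustration.

```python
import pandas as pd

# Made-up rows; column names and the 1 = disease / 0 = no disease coding are assumptions.
df = pd.DataFrame({
    "Age":     [25, 40, 59, 61, 33, 45, 70, 52],
    "Gender":  ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "Disease": [1, 0, 0, 1, 1, 1, 1, 0],
})

# Keep only patients strictly under 60.
under_60 = df[df["Age"] < 60]

# The mean of a 0/1 column is the proportion of 1s within each group.
rate_by_gender = under_60.groupby("Gender")["Disease"].mean()
print(rate_by_gender)
```

One thing worth checking before a step like this: the raw ILPD label column is commonly distributed with a 1/2 coding rather than 1/0, so it may need recoding first, otherwise the group means are not disease rates.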
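To check whether the patterns above could simply be due to chance, the notebook ran three statistical tests. Think of a p-value as a “could this be a fluke?” score: it estimates how likely a pattern at least this strong would be if chance alone were at work. By convention, a p-value below 0.05 is treated as evidence that the finding is unlikely to be a fluke.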
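This section does not name the three tests, so the sketch below is not a reproduction of them. It illustrates the “could this be a fluke?” idea with a permutation test, a swapped-in technique chosen because it needs only the standard library. The function name `permutation_p_value` and all data are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (rates for 0/1 data)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under "no real difference" any split is equally likely.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate from being exactly 0.
    return (extreme + 1) / (n_permutations + 1)

# Made-up 0/1 outcomes (1 = disease) for hypothetical groups.
group_a = [1] * 30 + [0] * 70   # 30% rate
group_b = [1] * 32 + [0] * 68   # 32% rate, close to group_a
group_c = [1] * 70 + [0] * 30   # 70% rate, far from group_a

p_similar = permutation_p_value(group_a, group_b)
p_different = permutation_p_value(group_a, group_c)
print(p_similar, p_different)
```

The small gap between `group_a` and `group_b` shows up often in shuffled data, so its p-value is large. The large gap to `group_c` almost never appears by shuffling alone, so its p-value is small.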
Note: This is a demo app. It uses a synthetic training set inside the app code, so it should not be read as a real ILPD-based medical predictor.
#| standalone: true
#| components: [viewer]
#| viewerHeight: 650
from shiny import App, render, ui, reactive
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Define the user interface
app_ui = ui.page_fluid(
    ui.h2("Liver Disease Risk Demo"),
    ui.layout_sidebar(
        ui.sidebar(
            ui.h4("Patient Health Metrics"),
            ui.input_select(
                "preset",
                "🧪 Choose a Patient Profile:",
                {"custom": "Custom", "healthy": "Healthy Adult", "high_risk": "High-Risk Patient"},
            ),
            ui.input_slider("age", "Age", min=4, max=90, value=30),
            ui.input_select("gender", "Gender", {"1": "Male", "0": "Female"}),
            ui.input_slider("tb", "Total Bilirubin", min=0.4, max=75.0, value=0.8, step=0.1),
            ui.input_slider("alkphos", "Alkaline Phosphatase", min=63, max=2110, value=150),
            ui.input_slider("sgpt", "Alanine Aminotransferase (SGPT)", min=10, max=2000, value=25),
            ui.input_action_button("predict", "Calculate Risk Score", class_="btn-primary"),
        ),
        ui.card(
            ui.h3("Prediction Result:"),
            ui.output_ui("prediction_result"),
        ),
    ),
)


def server(input, output, session):
    @reactive.Calc
    def train_model():
        # Synthetic training data, not the ILPD: uniform random values scaled per column.
        # Columns: age, gender (continuous 0-1 here), total bilirubin,
        # alkaline phosphatase, SGPT.
        np.random.seed(42)
        X = np.random.rand(500, 5) * [90, 1, 10, 500, 100]
        # Hand-written labeling rule: any one elevated marker, or older age
        # combined with moderately elevated bilirubin, counts as "disease".
        y = (
            (X[:, 2] > 2.5)
            | (X[:, 3] > 250)
            | (X[:, 4] > 60)
            | ((X[:, 0] > 50) & (X[:, 2] > 1.5))
        )
        y = y.astype(int)
        clf = RandomForestClassifier(
            n_estimators=50, max_depth=5, class_weight="balanced", random_state=42
        )
        clf.fit(X, y)
        return clf

    @reactive.Effect
    @reactive.event(input.preset)
    def update_sliders():
        # Fill the inputs from the chosen preset; "custom" leaves them untouched.
        p = input.preset()
        if p == "healthy":
            ui.update_slider("age", value=32)
            ui.update_select("gender", selected="0")
            ui.update_slider("tb", value=0.8)
            ui.update_slider("alkphos", value=160)
            ui.update_slider("sgpt", value=22)
        elif p == "high_risk":
            ui.update_slider("age", value=65)
            ui.update_select("gender", selected="1")
            ui.update_slider("tb", value=7.5)
            ui.update_slider("alkphos", value=480)
            ui.update_slider("sgpt", value=85)

    @render.ui
    @reactive.event(input.predict)
    def prediction_result():
        clf = train_model()
        # Feature order must match the training columns above.
        features = [[
            input.age(),
            int(input.gender()),
            input.tb(),
            input.alkphos(),
            input.sgpt(),
        ]]
        # Probability of the positive ("disease") class.
        prob = clf.predict_proba(features)[0][1]

        # Flag inputs above the fixed cutoffs used in this demo.
        reasons = []
        if input.tb() > 1.2:
            reasons.append(f"**Elevated Total Bilirubin** ({input.tb()} > 1.2 mg/dL)")
        if input.alkphos() > 147:
            reasons.append(f"**Elevated Alkaline Phosphatase** ({input.alkphos()} > 147 IU/L)")
        if input.sgpt() > 56:
            reasons.append(f"**Elevated SGPT** ({input.sgpt()} > 56 U/L)")

        if prob > 0.5:
            msg = f"#### ⚠️ High Likelihood of Liver Disease (Risk Score: {prob:.2%})\n\n"
            if reasons:
                msg += "**Primary Risk Factors Identified:**\n\n- " + "\n- ".join(reasons)
        else:
            msg = f"#### ✅ Low Likelihood of Liver Disease (Risk Score: {prob:.2%})\n\n"
            if reasons:
                msg += "*(Note: Some values are outside optimal ranges:)*\n\n- " + "\n- ".join(reasons)
        return ui.markdown(msg)


app = App(app_ui, server)