
1. Business Context

In the highly competitive banking sector, customer churn acts as a silent drain on profitability. Financial institutions frequently operate under a purely reactive mindset, triggering the retention team only when the customer has already requested account closure. At that point the decision is almost irreversible and the operational cost is extremely high.

The core issue was the lack of predictive visibility. Legacy reports were limited to descriptive statistics (showing who had already canceled) and failed to identify customers at imminent risk. Consequently, cross-sell campaigns and retention calls were launched indiscriminately, inflating Customer Acquisition and Retention Costs (CAC/CRC) while failing to protect revenue from the accounts that weigh most on the company's P&L.

2. Strategic Objectives

The design of this solution went beyond merely training algorithms; it required translating complex mathematical outputs into executive balance sheet metrics:

Predictive Intelligence (ML): Replace commercial guesswork with a statistical model capable of mapping the behavior of 10,000 account holders and estimating the probability of churn for each active individual.
Revenue Protection (Finance): Calculate and expose the Estimated Revenue at Risk metric in Power BI, directly linking retention efforts to the protection of the institution's Bottom Line.
Surgical Tactical Targeting: Deliver a daily, actionable matrix (Actionable Risk List) to operations, prioritizing telemarketing approaches based on the statistical risk of each customer.

3. The Intelligence Engine: Python & Machine Learning

The raw transactional database comprised 10,000 historical records, containing deep demographic attributes (Age, Geography, Gender, Tenure) blended with crucial banking relationship metrics (Retained Balance, Number of Products, Credit Score, Active Member Status).

Exploration & Feature Engineering (EDA/ETL): Data processing was conducted in a Python environment to ensure analytical integrity. During the Exploratory Data Analysis, I identified a class imbalance native to the problem: the historical churn rate hovered around 20%, so roughly four out of five records belong to the retained class. For training, I applied One-Hot Encoding to the nominal variables (Geography and Gender), scaled the financial magnitude variables (although tree-based models are largely insensitive to feature scale), and split the data into stratified training and testing samples. The predictive core is a Random Forest Classifier, an ensemble learning algorithm that captures non-linear interactions between features and, by averaging many decorrelated trees, is less prone to overfitting than a single decision tree.

Python

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score

# 1. Loading & Categorical Pre-processing
df = pd.read_csv('bank_customers.csv')
df = pd.get_dummies(df, columns=['Geography', 'Gender'], drop_first=True)

X = df.drop(['Churn', 'CustomerId', 'Surname'], axis=1)
y = df['Churn']

# 2. Train-Test Split with Stratification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 3. Training: Mitigating Churn imbalance with class_weight
# 'balanced' weights each class inversely to its frequency, so errors on the
# minority (churn) class, including False Negatives, cost more during training
rf_model = RandomForestClassifier(
    n_estimators=200, 
    max_depth=12, 
    class_weight='balanced', 
    random_state=42
)
rf_model.fit(X_train, y_train)

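# 4. Model Evaluation
y_pred = rf_model.predict(X_test)  # default 0.50 cut-off
y_proba = rf_model.predict_proba(X_test)[:, 1]
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_proba):.4f}")
print(classification_report(y_test, y_pred))
# Also report precision/recall at the 30% operating threshold applied in step 5
print(classification_report(y_test, (y_proba >= 0.30).astype(int)))

# 5. Applying the Strategic Threshold (30%) & Enriching the Base
# Probabilidade_Churn = churn probability. Note: X includes the training rows,
# so their scores are optimistic; only the held-out metrics above are unbiased.
df['Probabilidade_Churn'] = rf_model.predict_proba(X)[:, 1]

# Executive Calibration: Instead of assuming propensity only at > 50%,
# we force the early warning signal at >= 30% for timely commercial action.
# Risco_Identificado = identified-risk flag (1 = at or above the threshold).
df['Risco_Identificado'] = np.where(df['Probabilidade_Churn'] >= 0.30, 1, 0)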

# Exporting the final Predictive Dataset for ingestion into Power BI
df.to_csv('Treated_Bank_Churn_For_PBI.csv', index=False)

Risk Calibration: As detailed in the code block, setting the Decision Threshold to 30% was a deliberate executive decision. Lowering the cut-off from the default 50% raises the model's Recall, capturing more true churners in the primary alert zone, at the cost of lower Precision (more false alarms for the retention team). That trade-off was accepted so the commercial team has the necessary lead time to act before the cancellation decision becomes effective.
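To make the threshold trade-off concrete, here is a minimal sketch that computes precision and recall at the 50% and 30% cut-offs. The labels and scores are toy values invented for illustration, not project data, and `precision_recall_at` is a hypothetical helper rather than part of the original pipeline.

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when scores >= threshold are flagged as churn."""
    flagged = [score >= threshold for score in y_score]
    tp = sum(1 for f, t in zip(flagged, y_true) if f and t == 1)
    fp = sum(1 for f, t in zip(flagged, y_true) if f and t == 0)
    fn = sum(1 for f, t in zip(flagged, y_true) if not f and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (1 = churned) and model scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.82, 0.55, 0.41, 0.33, 0.45, 0.31, 0.22, 0.15, 0.10, 0.05]

for threshold in (0.50, 0.30):
    precision, recall = precision_recall_at(y_true, y_score, threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On these toy values the lower threshold catches every churner but also flags two customers who stayed, which is the pattern the retention team should expect: more calls, fewer missed cancellations.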

4. Analytical Architecture & Advanced DAX in Power BI

All the predictive math processed in Python was channeled into a bilingual Executive Dashboard in Power BI, focused on C-Level interactivity and Data Storytelling:

Historical Diagnosis: A retroactive page proving the empirical correlation of engagement with churn, demonstrating to the board that inactive customers have a significantly higher attrition rate (2x) compared to active ones.
Customer Risk Matrix (Complex Scatter Plot): I used AVERAGE() aggregations on the categorical axes to work around the scatter chart's overlapping limits. By tying opacity and marker size to the Random Forest probability, the chart behaves like a dense "Heat/Density Map" that highlights how churn risk concentrates at the extremes of the number of banking products held.
Risk Funnel: Classification coded via DAX, segmenting the portfolio by predicted probability into "Safe", "Warning", and "Critical" statuses.
Actionable List: A matrix tactically positioned in the footer acting as a daily operational CRM agenda for the commercial team. It is ordered by the Predicted Risk Score, indicating exactly which customers to contact immediately.
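The Actionable List described above could also be prototyped in pandas before it reaches Power BI. This is a sketch under assumptions: `build_actionable_list` and its `top_n` parameter are hypothetical, and the rows are toy values; only the `CustomerId` and `Probabilidade_Churn` column names come from the pipeline.

```python
import pandas as pd


def build_actionable_list(scored: pd.DataFrame,
                          threshold: float = 0.30,
                          top_n: int = 50) -> pd.DataFrame:
    """Return customers at or above the threshold, highest risk first."""
    at_risk = scored[scored["Probabilidade_Churn"] >= threshold]
    return (
        at_risk.sort_values("Probabilidade_Churn", ascending=False)
        .head(top_n)
        .reset_index(drop=True)
    )


# Toy scored customers, for illustration only.
scored = pd.DataFrame({
    "CustomerId": [101, 102, 103, 104, 105],
    "Probabilidade_Churn": [0.12, 0.81, 0.35, 0.29, 0.64],
})

call_list = build_actionable_list(scored, top_n=3)
print(call_list)
```

Capping the list with `top_n` keeps the daily agenda within what the commercial team can realistically call.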

DAX

// DAX consumes the Python mathematical output to generate a bilingual semantic grouping
Risk Level | Nível de Risco = 
SWITCH(
    TRUE(),
    'Treated_Bank_Churn_For_PBI'[Probabilidade_Churn] >= 0.70, "🔴 1. Critical | Crítico (≥70%)",
    'Treated_Bank_Churn_For_PBI'[Probabilidade_Churn] >= 0.30, "🟡 2. Warning | Alerta (30% - 70%)",
    "🟢 3. Safe | Seguro (<30%)"
)
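For validating the banding outside Power BI, the same cut-offs can be mirrored in Python. This is a sketch, not part of the original project: `risk_level` is a hypothetical helper and the labels are simplified to English only.

```python
def risk_level(probability: float) -> str:
    """Mirror the DAX SWITCH: conditions are checked from highest band down."""
    if probability >= 0.70:
        return "Critical"
    if probability >= 0.30:
        return "Warning"
    return "Safe"


# Boundary values are included to show which band each cut-off falls into.
for p in (0.05, 0.2999, 0.30, 0.69, 0.70, 0.95):
    print(f"{p:.4f} -> {risk_level(p)}")
```

As in the DAX version, the order of the checks matters: testing the 0.30 condition first would classify every critical customer as "Warning".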

Integrated Executive Perspectives

5. Hidden Behavioral Patterns (Clusters)

The spatial analysis and dispersion plotting revealed invaluable business phenomena:

The Operational Trap: The modeling uncovered a critical cluster of customers concentrated in the Germany base, aged around 45 to 60, holding a single product from the service basket.
The Safe Zone: Counterintuitively, the model showed that customers holding exactly two banking products are strongly anchored to the bank and retain at a much higher rate, regardless of the volatility in their account balance.
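Segment patterns like these can be cross-checked with a plain group-by on the historical labels. The sketch below uses toy rows invented for illustration; `NumOfProducts` is an assumed column name (the source only mentions "Number of Products"), while `Geography` and `Churn` appear in the training code.

```python
import pandas as pd

# Toy records, for illustration only.
toy = pd.DataFrame({
    "Geography": ["Germany", "Germany", "Germany", "France",
                  "France", "Spain", "Spain", "Germany"],
    "NumOfProducts": [1, 1, 2, 1, 2, 2, 1, 2],
    "Churn": [1, 1, 0, 0, 0, 0, 1, 0],
})

# Observed churn rate and segment size per (Geography, NumOfProducts) cell.
segment_rates = (
    toy.groupby(["Geography", "NumOfProducts"])["Churn"]
    .agg(churn_rate="mean", customers="size")
    .reset_index()
    .sort_values("churn_rate", ascending=False)
)
print(segment_rates)
```

Reporting the segment size next to the rate helps avoid over-reading a high churn rate that rests on only a handful of customers.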

6. Accounting & Financial Impact (P&L)

Drawing from my 15+ years in managerial accounting, I attest that generic vanity metric dashboards rarely justify a data budget at the boardroom level.

The structural differentiator of this tool is the line item of BRL 188 Million measured as Estimated Revenue at Risk. Monitoring this proxy helps anticipate provisions and protect the final Net Income. Retaining a customer already flagged as at risk generally costs far less than capturing a new one through performance media.
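The exact formula behind the Estimated Revenue at Risk figure is not given here, so the following is only one plausible way such a metric could be built: a probability-weighted sum over flagged customers. The `annual_revenue` field, the helper name, and every number are hypothetical and do not reproduce the BRL 188 Million result.

```python
def estimated_revenue_at_risk(customers, threshold=0.30):
    """Sum of churn probability x annual revenue for customers at or above the threshold.

    Assumption: each customer carries a hypothetical 'annual_revenue' value.
    """
    return sum(
        c["Probabilidade_Churn"] * c["annual_revenue"]
        for c in customers
        if c["Probabilidade_Churn"] >= threshold
    )


# Toy customers, for illustration only.
customers = [
    {"CustomerId": 1, "Probabilidade_Churn": 0.80, "annual_revenue": 1000.0},
    {"CustomerId": 2, "Probabilidade_Churn": 0.40, "annual_revenue": 500.0},
    {"CustomerId": 3, "Probabilidade_Churn": 0.10, "annual_revenue": 2000.0},
]

print(f"Estimated revenue at risk: {estimated_revenue_at_risk(customers):.2f}")
```

Weighting by probability, rather than summing the full revenue of every flagged account, keeps the metric from overstating the exposure of customers who sit just above the 30% line.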

7. Corporate Governance & Compliance

Employing predictive algorithms that score human behavior in banking touches on constitutional limits and on strict regulations from the Central Bank and data protection laws (LGPD/GDPR). The modeling excludes direct identifiers such as CustomerId and Surname from the feature set and proactively rejects excessively invasive or prejudicial characteristics when generating the score, so that the processing of transactional data remains anonymized.

Analytical Disclaimer: The development of this prediction falls under the rigorous field of Legal Analytics and fiduciary operational risk mitigation. These evaluations expose data-driven managerial methods and do not constitute legal advice.

8. Data Quality & Analytical Limitations

The dataset presented excellent demographic robustness. However, an inherent limitation of a static snapshot is the lack of timestamped transactional data (e.g., daily app login frequency). Inactivity, encoded as a binary flag in the IsActiveMember column, proved to be highly predictive; in a future iteration, ingesting event-level activity data through APIs would give the model much finer granularity for issuing alerts in the Predictive Dashboard.
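A quick way to sanity-check how strongly the binary activity flag separates churners is to compare observed churn rates between the two groups. The rows below are toy values invented for illustration, so the resulting ratio says nothing about the real portfolio.

```python
import pandas as pd

# Toy records, for illustration only (1 = active member, 1 = churned).
toy = pd.DataFrame({
    "IsActiveMember": [0, 0, 0, 0, 1, 1, 1, 1],
    "Churn": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 label per group is the group's churn rate.
churn_by_activity = toy.groupby("IsActiveMember")["Churn"].mean()
inactive_to_active_ratio = churn_by_activity.loc[0] / churn_by_activity.loc[1]

print(churn_by_activity)
print(f"Inactive customers churn {inactive_to_active_ratio:.1f}x as often in this toy sample")
```

With timestamped activity data, the same comparison could be repeated on finer features such as days since last login, which is the granularity the binary flag cannot provide.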

9. Impact & Strategic Recommendations

The project consolidated prescriptive commercial action fronts capable of reversing the current projected bank Churn trend:

Risk Inactivation

Run automated incentive pipelines (cashback, temporary free emergency limits) aimed primarily at re-engaging accounts flagged as "Inactive" and holding a zero balance.

Cross-Sell as Retention

End generic email marketing blasts. The budget pivots to moving customers in the "1 Product Cluster" toward acquiring a 2nd product through fee exemptions, anchoring them in the "Safe Zone" identified by the model.
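Both recommendations translate into simple target lists. The sketch below shows one way to derive them with pandas; `Balance` and `NumOfProducts` are assumed column names, the variable names are hypothetical, and the rows are toy values.

```python
import pandas as pd

# Toy portfolio, for illustration only.
portfolio = pd.DataFrame({
    "CustomerId": [201, 202, 203, 204, 205],
    "IsActiveMember": [0, 0, 1, 1, 0],
    "Balance": [0.0, 1500.0, 0.0, 320.0, 0.0],
    "NumOfProducts": [1, 2, 1, 1, 3],
})

# Campaign 1 (Risk Inactivation): inactive accounts holding a zero balance.
reactivation_targets = portfolio[
    (portfolio["IsActiveMember"] == 0) & (portfolio["Balance"] == 0)
]

# Campaign 2 (Cross-Sell as Retention): customers holding a single product.
cross_sell_targets = portfolio[portfolio["NumOfProducts"] == 1]

print(reactivation_targets["CustomerId"].tolist())
print(cross_sell_targets["CustomerId"].tolist())
```

A customer can land in both lists, so in practice the two campaigns would need a rule for which offer takes priority.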

10. Conclusion

This Machine Learning dashboard goes beyond the aesthetics of a typical BI report. Clean Python code coupled with Power BI's analytical flexibility shows that true seniority lies in translating machine learning outputs into measures that sustain corporate P&L.

By unifying Data Science, predictive data modeling, and the critical foundation matured over my 15+ years of experience in Accounting and Corporate Law, the project converted a static file of "canceled customers" into a proactive tool for net capital defense.

Bank Churn Prediction with AI: Machine Learning & Revenue Protection in Power BI

How I engineered a Random Forest predictive model in Python to mitigate bank churn, translating complex mathematical outputs into a bilingual Executive Dashboard in Power BI focused on bottom-line protection.

Predictive Dashboard: Bank Churn Prediction & Risk Clusters

Tech Stack Used in this Project