apidays LIVE Hong Kong 2021 - API Ecosystem & Data Interchange
August 25 & 26, 2021
Federated Learning for Banking
Isaac Wong, AI Solution Architect at WeBank
4. WeBank: China’s 1st Digital Bank
• Founded in 2014 as China’s first digital bank
• Driven by technology and innovation
• IT staff: >56%
• Peak daily transactions: 750mn (high-concurrency transaction processing)
• Customers served by WeBank: 270+mn individuals and 1.8+mn SMEs
5. Value of FinTech: Optimize (Efficiency * UX * Scale) / (Cost * Risk)
Artificial Intelligence
• Chatbot: handles 98% of inquiries
• Remote KYC: false acceptance rate (FAR) of ~1 in a million
• Quality Control: 100% coverage rate
• FL for Credit Scoring: AUC increased by 12%, cost reduced by 5%-10%
• AI Risk Mgmt.: diverse alternative data sources (e.g. satellite, GPS, sentiment)
Cloud Computing
• ARM Infrastructure: stable and secure
• Distributed Architecture: 12k standardized servers
Big Data
• Precision Marketing: customer acquisition cost reduced by 93%
• Smart Risk Mgmt.: 100K+ variables; 387 models of 44 types
Blockchain
• Supplier of China’s Blockchain-based Service Network
• A/C Mgmt. & Reconciliation: financial-grade, 170 million+ transactions w/o error
• Arbitration Chain: stored 3 billion+ records; dispute resolution reduced from 6+ months to ~7 days
• Supply Chain Finance Platform: served over 10k companies
• Mainland-Macau Health Code: improved cross-border travel efficiency
• Macau Smart City: improved government service efficiency by 50%
• Copyright Platform: stored over 5 million press releases
Leading Technological Capabilities: ABCD
7. Hype Cycle for Privacy, 2021
• Hype cycle: in H1 2020, researchers published more than 1,000 papers on FL, compared with just 180 papers in total in 2018.
• Google searches for the term are surging.
8. Gartner Top Strategic Technology Trends for 2021
• Privacy is becoming a bigger issue, and new regulations will force organizations to be more concerned about privacy protection.
• Gartner believes that by 2025, half of large organizations will implement privacy-enhancing computation (PEC) for multiparty data analytics use cases.
10. Limits of Traditional ML
Current Challenges
Ways Blocked Between Collaborators (e.g. Bank A and Social Media B)
• Financial data include credit reports, transaction history, fraud detection, etc.
• User data include user portraits, activity history, interest labels, consumption habits, etc.
• Companies cannot buy data directly under increasingly restrictive laws.
• Further audit and privacy concerns make companies unwilling to collaborate.
Unresolved Issue
• Buying Data - ILLEGAL: directly buying data from 3rd-party companies violates privacy and is being banned around the world.
• Using Desensitization Data - INEFFECTIVE: desensitized data exchanged between corporations cannot guarantee the outcome or performance of modeling.
• Combining Results - RISKY: when results from models built separately on different data sources are used, each company bears the risk of the results itself.
Unwillingness of Data Sharing within Departments/Subsidiaries
• The Data Platform asks for data; the Consumers Dept, SME Dept and Corporate Dept say no.
• The parent company finds it hard to build a universal data platform.
Suffocation of Data Collaboration limits the Effectiveness of ML
11. FL resolves Limitations of Traditional ML
FL deployed for a Single Financial Services Institute (FSI)
• FSI A: Dept/Subs 1, Dept/Subs 2, …, Dept/Subs N
• Data Partner 1, Data Partner 2, …, Data Partner N
FL Network deployed for Multiple FSI
• Operator
• FSI A, FSI B, …, FSI N
• Data Provider 1, Data Provider 2, …, Data Provider N
• FL is a distributed machine learning framework that helps multiple parties (e.g. multiple departments / subsidiaries / organizations) build models effectively and collaboratively, in compliance with user privacy and data security rules as well as government policies and regulations.
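The idea of building one model from several parties without pooling their raw data can be sketched with a weighted average of locally trained parameters. This is a minimal illustration only: the slides do not name an aggregation rule, so federated averaging is assumed here, and the function name, weights and sample counts are hypothetical.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Average per-party model weights, weighted by each party's sample count.

    Only model parameters are exchanged; raw records stay with each party.
    """
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(dim)
    ]


# Hypothetical example: two departments train the same two-parameter model locally.
dept_weights = [[0.2, 1.0], [0.4, 3.0]]
dept_sizes = [100, 300]
global_weights = federated_average(dept_weights, dept_sizes)
print(global_weights)
```

In a real deployment the exchanged parameters would additionally be protected (for example by encryption or secure aggregation); that layer is omitted here.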
12. Categorization of FL
Horizontal FL
• Large overlap of features of the two data sets
• “Aggregate” IDs (samples)
Vertical FL
• Large overlap of sample IDs (users) of the two data sets
• “Aggregate” features
[Figure: Horizontal Federated Learning aggregates samples; Vertical Federated Learning aggregates features. Panels show samples, features and labels for data from A, B and C.]
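The two categories differ only in which axis the parties' data sets share. The sketch below shows the logical view on toy data: stacking rows for the horizontal case and joining columns on overlapping IDs for the vertical case. All party names, IDs and feature names are hypothetical, and the data are merged in plain text purely to illustrate the partitioning; in actual federated learning the raw records never leave each party.

```python
def feature_names(data):
    """Collect every feature name that appears in a party's records."""
    return set().union(*(row.keys() for row in data.values()))


def horizontal_view(a, b):
    """Same features, different samples: stack rows over the shared features."""
    shared = sorted(feature_names(a) & feature_names(b))
    merged = {}
    for party in (a, b):
        for sample_id, row in party.items():
            merged[sample_id] = {f: row[f] for f in shared}
    return merged


def vertical_view(a, b):
    """Same samples, different features: join columns on overlapping IDs."""
    return {sid: {**a[sid], **b[sid]} for sid in sorted(a.keys() & b.keys())}


# Hypothetical toy data keyed by sample ID.
party_a = {"u1": {"age": 30, "income": 5000}, "u2": {"age": 45, "income": 7000}}
party_b_same_features = {"u3": {"age": 52, "income": 6500}}
party_b_same_users = {"u1": {"open_lines": 3}, "u2": {"open_lines": 8},
                      "u9": {"open_lines": 1}}

horizontal = horizontal_view(party_a, party_b_same_features)
vertical = vertical_view(party_a, party_b_same_users)
print(len(horizontal), "samples after horizontal aggregation")
print(vertical)
```

The horizontal view grows the number of samples while keeping the feature set fixed; the vertical view keeps only the overlapping users but widens each record.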
14. WeBank serves more SMEs with FL technology
01 Challenge: Limited SME data
Difficult for WeBank to achieve large-scale growth in its SME credit business.
02 Goal: Achieve inclusive finance
WeBank hopes to serve more SMEs that are underserved by financial services.
03 Result
No. of customers: SME 1.88 Mn+
FL technology
• WeBank: ID, Y (Overdue); loan size: 2Bn+; no. of customers: 300K+
• A large invoicing Co.: ID, X (invoice data for SMEs)
15. Strengthen Anti-Money Laundering (AML) in Banking Industry
Horizontal FL: Bank 1, Bank 2, Bank 3
• Bank transaction data: transfers, payments, ...
• Expand AML samples through horizontal FL and build a baseline AML model
Vertical FL: banks and an Internet company
• Mobile payment and geolocation data: e-commerce shopping, map tracks, ...
• Expand the dimension of customer characteristics through vertical FL to further optimize the model effect
• Due to data security requirements, financial institutions such as banks and insurance companies model their data locally.
• With FL, the models of various institutions can be combined to break the barriers between data and improve the accuracy of the AML system and the efficiency of reviewers.
16. Enhance bank credit risk control capabilities
01 Challenge: Unresolved data privacy challenge
The lack of privacy protection mechanisms for data prohibits the use of external data.
02 Goal: Ramping up risk control capability
The bank aims to improve its credit risk control capabilities by leveraging external data to fulfill regulatory requirements.
03 Result
Enhanced the retail credit model while fulfilling the regulatory requirements for self-built risk control.
FL Technology
• A bank: ID, Y (Overdue); loan size: 10Bn+; no. of customers: 1Mn+
• A large Internet Co.: ID, X (Internet behavior data)
17. Optimize Pricing for insurance industry
Horizontal FL: Insurance company 1, Insurance company 2, ..., Insurance company N
• Insurance company data: underwriting data, claim data, Internet of Vehicles data, ...
Vertical FL: insurance companies and an Internet company
• Internet behavior data: trip data, consumption data, information preference, driving violation data, ...
Assist reinsurance companies in establishing auto insurance pricing models for insurance companies:
• Vertical federation introduces and mines Internet big data "from the human factor".
• Horizontal federation expands the scale of the insurer’s traditional factor data set, enhancing risk analysis of car owners.
19. FL Illustration: Give Me Some Credit (GMSC)
• Use a public dataset:
https://www.kaggle.com/c/GiveMeSomeCredit/data
• Credit scoring algorithms predicting the probability
that somebody will experience financial distress in
the next two years.
• Public Best AUC – 0.86390
Data Summary
Data Set Name: Give Me Some Credit
No. Records: 150K
Target Variable: With Default(Y/N)
Explanatory Variables: 10
1. Age
2. Debt Ratio
3. Monthly Income
4. No. Time 30-59 Days Past Due Not Worse
5. No. Time 60-89 Days Past Due Not Worse
6. No. Times 90 Days Late
7. Revolving Utilization Of Unsecured Lines
8. No. Open Credit Lines And Loans
9. No. Real Estate Loans Or Lines
10. No. Dependents
20. FL Illustration Scope
Binary Classification for Credit Default Prediction
Stages: Data, Preprocessing, Modelling
Steps shown: Read Data, Intersect, Participate
POC Data (X: Explanatory Variables; Y: Target Variable)
Party A Data
• 150K Records (ID1, ID2, ID3, ..., ID 150K)
• 6 X
• Y
• Data attributes: (1) Debt Ratio; (2) Monthly Income; (3) No. Time 30-59 Days Past Due Not Worse; (4) No. Time 60-89 Days Past Due Not Worse; (5) No. Times 90 Days Late; (6) Age
Party B Data
• 150K Records
• 4 X
• Data attributes: (1) No. Dependents; (2) No. Open Credit Lines And Loans; (3) No. Real Estate Loans Or Lines; (4) Revolving Utilization Of Unsecured Lines
Machine Learning Model
• Binary classification: credit default
• Predict credit default
• Train/validation ratio: 8:2
Algorithms: Tree, LR, SBT, SVM, NN
Algorithms Demonstrated
• Secure Gradient Boost (SBT)
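The preprocessing described on this slide (two parties holding different columns for the same IDs, an intersection step, and an 8:2 train/validation split) can be sketched as follows. This is a plain-text illustration under stated assumptions: the function names, the ID format and the seed are hypothetical, and production systems typically use a privacy-preserving intersection rather than exchanging raw IDs.

```python
import random

# Feature split as listed on the slide.
PARTY_A_FEATURES = [
    "Debt Ratio", "Monthly Income",
    "No. Time 30-59 Days Past Due Not Worse",
    "No. Time 60-89 Days Past Due Not Worse",
    "No. Times 90 Days Late", "Age",
]
PARTY_B_FEATURES = [
    "No. Dependents", "No. Open Credit Lines And Loans",
    "No. Real Estate Loans Or Lines",
    "Revolving Utilization Of Unsecured Lines",
]


def intersect_ids(ids_a, ids_b):
    """Return the sample IDs held by both parties (plain-text stand-in)."""
    return sorted(set(ids_a) & set(ids_b))


def train_validation_split(ids, train_ratio=0.8, seed=0):
    """Shuffle the aligned IDs and split them 8:2 by default."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy IDs; the slide's data set has 150K records per party.
ids_a = [f"ID{i}" for i in range(1, 11)]   # ID1 .. ID10
ids_b = [f"ID{i}" for i in range(3, 13)]   # ID3 .. ID12
common = intersect_ids(ids_a, ids_b)
train_ids, validation_ids = train_validation_split(common)
print(len(common), len(train_ids), len(validation_ids))
```

Only the aligned IDs are shared between the two sides in this sketch; each party would keep its own feature columns for those IDs.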
21. FL with Federated AI Technology Enabler (FATE) Enterprise Version
FATE supports the Secure Boosting Tree (SBT) algorithm, which uses homomorphic encryption and is similar to XGBoost/GBDT.
[Screenshots: algorithm configuration, training process, modelling dashboard]
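The property that makes homomorphic encryption useful for boosted trees is that encrypted values can be summed without being decrypted, so one party can aggregate another party's gradient statistics without seeing them. The slide does not name the encryption scheme; the sketch below assumes an additively homomorphic scheme (Paillier) with deliberately tiny, insecure toy primes, and all function names and values are hypothetical. It is not FATE's implementation.

```python
import math
import random


def generate_keys(p=293, q=433):
    """Toy Paillier key generation; real keys use primes of 1024+ bits."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt a non-negative integer m < n under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With generator g = n + 1, g**m mod n^2 equals 1 + m*n.
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


# Hypothetical gradients, scaled to integers before encryption.
n, private_key = generate_keys()
scaled_gradients = [125, 340, 35]
ciphertexts = [encrypt(n, g) for g in scaled_gradients]
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(n, encrypted_sum, c)
decrypted_sum = decrypt(n, private_key, encrypted_sum)
print(decrypted_sum)
```

Only the key holder can read the aggregated value; the party doing the summation handles ciphertexts throughout.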
22. Training Result Comparison
Partial Local Result
• Data: Bank Data (150k IDs; 6 X | 1 Y (label))
• Model: Local ML, XGBoost (iter: 70, max_depth: 5)
• Train AUC 0.821; Validation AUC 0.802
FL Result
• Data: Bank Data (150k IDs; 6 X | 1 Y (label)) and 3rd Party Data (150k IDs; 4 X)
• Model: Federated ML, Secure Boosting Tree (iter: 50, max_depth: 5)
• Train AUC 0.879; Validation AUC 0.862 (val +7.5% improvement over the partial local result)
VS
Centralized Result
• Data: All Data (150k IDs; 10 X | 1 Y (label))
• Model: Centralized ML, XGBoost (iter: 70, max_depth: 5)
• Train AUC 0.878; Validation AUC 0.862
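The +7.5% figure can be checked from the validation AUCs reported on this slide: it is the relative gain of the federated model over the bank-only model. The short calculation below uses only those reported numbers; the variable names are illustrative.

```python
# Validation AUCs as reported on the slide.
local_val_auc = 0.802        # bank data only, XGBoost
fl_val_auc = 0.862           # federated, Secure Boosting Tree
centralized_val_auc = 0.862  # all data pooled, XGBoost

# Relative improvement of the federated model over the local model.
relative_gain = (fl_val_auc - local_val_auc) / local_val_auc
# Gap between the federated model and the centralized baseline.
gap_to_centralized = centralized_val_auc - fl_val_auc

print(f"Relative validation AUC gain: {relative_gain:.1%}")
print(f"Gap to centralized baseline: {gap_to_centralized:.3f}")
```

On these figures the federated model matches the centralized validation AUC, which is the comparison the slide is drawing.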
23. FATE Credentials
Standard
• WeBank led the “Federated Learning Architecture” standard in AIOSS.
• WeBank is the founding member of IEEE P3652.1 and is pushing an IEEE standard for federated learning applications.
Publication
• WeBank published the first book, “Federated Learning”.
Award & Certification
• Vision FL won the AAAI-20 award for Best Industrial Application.
• The FATE platform received certifications from CAICT in both the FL and MPC compliance tests.
Recommendation
• HKMA encourages banks in Hong Kong to co-create a digital framework on advanced technologies such as federated learning.
• WeBank introduced federated learning cases to PBOC Shenzhen, and the regulator encouraged use cases of this technology for anti-money laundering.