The document discusses using open data and advanced analytics for financial innovations in developing countries. It provides examples of how open data from sources like national statistics agencies can be used to analyze trends and provide insights. Advanced techniques like machine learning and predictive modeling are described as ways to generate individual and company credit profiles, rank entities, and help automate audits and predictions. The document argues that scaling such innovations using multiple data sources could significantly benefit developing world citizens and economies.
Open Data for Financial Innovations in the Developing World
1. OPEN DATA FOR
FINANCIAL INNOVATIONS IN THE
DEVELOPING WORLD
DR. BIPLAV SRIVASTAVA
ACM Distinguished Scientist, ACM Distinguished Speaker
Senior Researcher and Master Inventor, IBM Research – India
Talk at IDRBT Doctoral Consortium, Hyderabad, 11 Dec 2015
2. Why This Talk? Main Messages
— Financial Innovations are key for a developing country like
India to provide better opportunities to its citizens
¡ Impacts not only finance (Banking, Insurance, …)
¡ But all other areas of a society (Healthcare, Transportation, Industry)
— Innovations depend on data, analysis and timely access
— Open data is often the most promising source to start
making quick impact
— Eventual aim should be to scale innovations with other data
sources and reach people seamlessly at production scale
3. Actions to Take
Tutorial on 27 July 2015 @ IJCAI 2015
— Join: “AI in India” google group –
¡ https://groups.google.com/forum/#!forum/ai-in-india
— Participate in the machine learning competition on
using open data in the health area (disease, finance, …)
¡ Start: https://www.facebook.com/dataview2016
¡ Competition page: http://gator3080.hostgator.com/~sigdata//comad2016/data_challenge_competition.html
¡ Data and insights sought: http://gator3080.hostgator.com/~sigdata//comad2016/data_sources.html
6. Complexity and Innovation
— Complexity
¡ Many countries: 28 in EU, 19 use Euro
¡ Changes within Europe; successor states of the former Yugoslavia
kept separating during 2004-2010 (Montenegro in 2006, Kosovo in 2008)
¡ There have been continuous currency changes since 1999 when
Euro was introduced; since 2001, Cyprus, Slovenia, Malta,
Slovakia … have joined or changed currency
— Innovation
¡ Linked data to represent data, metadata and relationships
¡ Contextual and holistic visualization
7. Indian Reality – Kingfisher Airline Case
— A two-term Rajya Sabha MP
¡ Heading company and taking loans from banks
¡ Leading airline to collapse
¡ Delaying repayment
— The airline (company)
¡ Not paying employees and vendors
¡ Not even paying income tax deducted from employees
— Consequence
¡ Airline collapses leading to loss to travellers and employees
¡ Banks suffer heavy losses
¡ Little impact on company leader
9. Reality in a Developing Country
— In private sector, hard to know about genuineness of
¡ Individuals and companies
¡ Their needs and expenses
— In government sector, hard to know about
¡ Spending – budgeted and actuals
¡ Effectiveness of their spending
¡ Benchmarking with best practices, e.g., return on investment
— Consequence
¡ Few loans available to the needy
¡ High non-performing assets (NPAs) of banks
¡ Lower performance of markets since investors stay away
¡ Lower country growth, high unemployment and poverty
10. Resources for Finding Out About a Person
— Public encyclopedia: Wikipedia
¡ Example: http://en.wikipedia.org/wiki/Vijay_Mallya
— Specialized databases
¡ Indianboards: http://indianboards.com/pages/index.aspx
÷ Example: Infosys (http://indianboards.com/pages/companyprofile.aspx?code=C0000604)
¡ US CEOs: http://ceo.com
¡ Forbes profile:
÷ Example: http://www.forbes.com/profile/ginni-rometty/
11. Resources for Finding Out About a Company
— Market regulators
¡ SEC (USA): Edgar filings -
http://www.sec.gov/edgar/searchedgar/companysearch.html
¡ Ministry of Corporate Affairs (MCA) database:
http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=31
— Private market intelligence companies
¡ EMIS:
÷ Example: http://www.securities.com/php/company-profile/KR/Samsung_Electronics_CoLtd_en_1651328.html
12. Snapshot: Financial Innovations Needed
for Developing Countries
— [Individuals] Data-based generation of
¡ Credit profile of individuals
¡ Criminal profile of individuals
— [Entities] Data-based generation of
¡ Credit profile of legal entities – Companies, NGOs
¡ Ranking of companies in an industry
— [Governments] Data-driven automatic
¡ Audit of government programs for effectiveness
¡ Ranking of cities, state governments
¡ Corruption assessment
— Prediction of
¡ Stocks
¡ Initial public offers (IPOs)
¡ Tax collection
13. Outline
— Motivating Examples
— Open Data
— Analytical Techniques
— Discussion
¡ Pattern in Building Usable Systems
¡ Smart City – What to Solve?
¡ Call to Action
15. Open Data
— Open data is the notion that data should not
be hidden, but made available to everyone.
The idea is not new.
— Scientific publications follow this: “standing
on the shoulders of giants”
¡ Science stands for repeatability of results and
hence, sharing
¡ The scientific community asserts that open
data leads to increased pace of discovery.
(See: Ray P. Norris, How to Make the Dream Come True: The Astronomers' Data Manifesto, At
http://www.jstage.jst.go.jp/article/dsj/6/0/6_S116/_article, Accessed 2 Apr, 2012)
— Governments are the new source for open
data
¡ Data.gov efforts world-wide; 400+
governmental bodies, including 20+ national
agencies, including India, have opened data
¡ In India, an additional movement is the “Right to
Information Act”
16. Not to Be Confused With Orthogonal Trend – Big Data
— Volume
— Variety
— Velocity
— Veracity
— …
Cartoon critical of big data application,
by T. Gregorius.
http://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Big_data_cartoon_t_gregorius.jpg/220px-Big_data_cartoon_t_gregorius.jpg
17. 400+ Data Catalogs of Public Data
As on 21 July 2015
18. Data.gov (USA)
As on 16 June 2015
Talk at IEEE Bangalore Workshop, Technologies for Planning and Acting in Real World Systems
19. City Level – Chicago, USA
As on 16 June 2015
21. Peek into the Future - Amsterdam
http://citydashboard.waag.org/
22. Illustration of Levels
Source: http://5stardata.info/
Does Opening Data Make It Reusable? No
[Figure: the five levels of the 5-star open data scheme, numbered 1 to 5]
23. India: Right to Information Act
— Any citizen “may request information from a "public
authority" (a body of Government or "instrumentality of State")
which is required to reply expeditiously or within thirty days.”
¡ Passed by Parliament on 15 June 2005 and came fully into force on 13
October 2005. Citation Act No. 22 of 2005
— Lauded and reviled
¡ Brought transparency
¡ Also,
÷ Increased bureaucracy
÷ Shortcomings in preventing corruption
— More information
¡ http://en.wikipedia.org/wiki/Right_to_Information_Act
¡ http://rti.gov.in
24. Data Quality in Public Data in India
— Right to Information
¡ Not even 1*
¡ Information available to requester, but no one else
— Data.gov.in
¡ 2-3*
¡ Available in CSV, etc., but not uniquely referenceable
— Open data movements are moving to linked data
form for semantics
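The star ratings above can be made concrete with a small sketch. It assumes the five cumulative criteria of the 5-star scheme (open licence, structured data, non-proprietary format, URIs, links to other data); the `Dataset` fields and the two example datasets are hypothetical names chosen for illustration, not part of any portal's API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist for one published dataset."""
    open_license: bool      # on the web under an open licence
    machine_readable: bool  # structured data rather than, say, a scanned image
    open_format: bool       # non-proprietary format such as CSV
    uses_uris: bool         # items are identified by URIs and can be referenced
    linked: bool            # links out to other data for context


def star_rating(d: Dataset) -> int:
    """Count stars; each star requires all the previous ones."""
    stars = 0
    for ok in (d.open_license, d.machine_readable, d.open_format,
               d.uses_uris, d.linked):
        if not ok:
            break
        stars += 1
    return stars


# Assumed examples matching the slide: an RTI reply reaches only the requester,
# while a CSV file on a portal is open and structured but has no URIs.
rti_reply = Dataset(False, False, False, False, False)
csv_on_portal = Dataset(True, True, True, False, False)

print(star_rating(rti_reply), star_rating(csv_on_portal))
```

Under these assumptions the RTI reply scores no stars and the CSV file scores three, which is in line with the "not even 1*" and "2-3*" assessments above.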
25. Semantics for Published Data
Classify data in public domain. Use schema.org as illustration.
¡ Select an area (e.g., food, news events, crime, customs, diseases, …)
¡ Build + disseminate the catalog tags via a website
¡ Encourage publishers to use meta-data tags and enable search
[Figure: ontology spectrum, from lightweight to formal: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, Inverse, part-of…; General logical constraints]
Credits:
Ontologies Come of Age McGuinness, 2001
From AAAI Panel 99 – McGuinness, Welty, Uschold, Gruninger, Lehmann
Plus basis of Ontologies Come of Age – McGuinness, 2003
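As an illustration of tagging published data with schema.org, the sketch below describes one of the data.gov.in datasets listed later in this talk as a schema.org `Dataset` in JSON-LD. The property values (name, keywords, coverage) are assumptions written for this example, not metadata taken from the portal.

```python
import json

# A schema.org "Dataset" description; all values are illustrative assumptions.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Number of cases and deaths due to diseases",
    "description": (
        "All India (2000 to 2011) and state-wise (2010 and 2011) number of "
        "cases and deaths due to specified diseases."
    ),
    "url": "http://data.gov.in/catalog/number-cases-and-deaths-due-diseases",
    "keywords": ["health", "diseases", "India"],
    "spatialCoverage": {"@type": "Place", "name": "India"},
    "temporalCoverage": "2000/2011",  # ISO 8601 interval
    "distribution": {"@type": "DataDownload", "encodingFormat": "text/csv"},
}

# Serialized form that a publisher could embed in the dataset's web page.
as_json_ld = json.dumps(dataset, indent=2)
print(as_json_ld)
```

A search engine or catalog that understands schema.org can then index the dataset by area, period and topic instead of relying on free text alone.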
26. Still Confused on Semantics? Start with Linked Data Glossary
27. Open Data References
— Concept
¡ Open Data, At http://en.wikipedia.org/wiki/Open_data,
¡ Open 311, At http://open311.org/
¡ Catalog of Open Data, At http://datacatalogs.org/dataset
¡ Data City Exchange: http://www.imperial.ac.uk/digital-city-exchange
— India specific
¡ Open data report in India, At http://cis-india.org/openness/publications/ogd-report
— Standards
¡ W3C, At http://www.w3.org/2011/gld/
¡ 5 Star Linked Data ratings, At http://www.w3.org/DesignIssues/LinkedData.html
— Applications and ecosystems
¡ Introduction to Corruption, Youth for Governance, Distance Learning Program, Module 3, World Bank
Publication. Accessed on June 15th 2011, At
http://info.worldbank.org/etools/docs/library/35970/mod03.pdf
¡ Dublinked, At http://dublinked.ie
29. Advanced AI Techniques (Analytics) like Planning & Machine Learning
make use of data and models to provide insight to guide decisions
[Figure: data and models feed analytics, which produces insight that guides decisions]
Data sources: business automation; instrumentation; sensors; Web 2.0
Expert knowledge, “real world physics”
Model: a mathematical or algorithmic representation of reality intended to explain or predict some aspect of it
Decision: executed automatically or by people
30. Example: Talks
— Are they useful? (Descriptive)
¡ Answering needs an assessment about the event
— If it happens next time, how many will attend?
(Predictive)
¡ Above + Answering needs an assessment about unknowns
(e.g., future)
— Should you attend? (Prescriptive)
¡ Above + Answering needs understanding the goals and current
status of the individual
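The three questions above can be sketched on toy numbers. The attendance figures, the room capacity and the decision rule are hypothetical; the predictive step is an ordinary least-squares trend line, used here only as the simplest possible predictive model.

```python
from statistics import mean

# Hypothetical attendance at four past editions of a talk series.
attendance = [40, 55, 65, 80]

# Descriptive: summarize what happened.
average = mean(attendance)

# Predictive: fit a least-squares line through (edition index, attendance)
# and extrapolate it to the next edition.
n = len(attendance)
xs = range(n)
x_bar, y_bar = mean(xs), mean(attendance)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, attendance))
         / sum((x - x_bar) ** 2 for x in xs))
forecast = y_bar + slope * (n - x_bar)


# Prescriptive: combine the forecast with the individual's goals and status.
def should_attend(forecast: float, room_capacity: int,
                  topic_is_relevant: bool, is_free_that_day: bool) -> bool:
    """Hypothetical rule: go if the talk matters to you, you are free,
    and the forecast crowd still fits in the room."""
    return topic_is_relevant and is_free_that_day and forecast <= room_capacity


print(average, forecast, should_attend(forecast, 100, True, True))
```

The point of the sketch is the layering: the predictive step reuses the descriptive data, and the prescriptive step needs inputs (goals, availability) that no amount of historical data provides.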
31. Analytics Landscape
[Figure: analytics techniques plotted along two axes, degree of complexity and competitive advantage]
— Descriptive
¡ Standard reporting: What happened?
¡ Ad hoc reporting: How many, how often, where?
¡ Query/drill down: What exactly is the problem?
¡ Alerts: What actions are needed?
— Predictive
¡ Simulation: What could happen…?
¡ Forecasting: What if these trends continue?
¡ Predictive modeling: What will happen next if?
— Prescriptive
¡ Optimization: How can we achieve the best outcome?
¡ Stochastic optimization: How can we achieve the best outcome including the effects of variability?
Based on: Competing on Analytics, Davenport and Harris, 2007
32. ML References
— WEKA
¡ Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
¡ WEKA Tutorial:
÷ Machine Learning with WEKA: A presentation demonstrating all graphical user interfaces (GUI) in Weka.
÷ A presentation which explains how to use Weka for exploratory data mining.
¡ WEKA Data Mining Book:
÷ Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
÷ http://www.cs.waikato.ac.nz/ml/weka/book.html
¡ WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page
— Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.
— http://www.kdnuggets.com/2015/03/machine-learning-table-elements.html
33. Discussion: A Pattern in Building Usable Systems
34. Recap of Key Points from Finance Scenarios
— Very difficult to find reliable information about persons,
companies and states
— This is leading to wastage, e.g., non-performing
assets in banking system
— Outside finance: wastage in public spending
(healthcare, transportation, industrial
production, …), business and individual spending
— Information technology (IT) and financial
innovations are needed, especially in developing
countries
35. Real-World Applications of ICT Follow a Pattern
— Value (from Actions, Decisions) – Providing
benefits that matter to the people most in need, in a
timely and cost-efficient manner. Going beyond
technology to process and people aspects.
— Data + Insights – Available, Consumable with
Semantics, Visualization / Analysis
— Access – Apps (Applications), Usability – Human
Computer Interface, Application Programming
Interfaces (APIs)
36. Example – Financial Innovations
— Decision Value – To individuals, businesses, government
institutions
¡ Individuals Examples – Which person to financially trust? Which bank to trust?
¡ Govt Examples – Which company to give contracts to?
¡ Business Examples – Which companies and individuals to give credit to? What
discounts to give?
— Data – Quantitative as well as qualitative
¡ Open data
¡ Social data
¡ Transactional data
— Access –
¡ Today, little reliable information
Key Idea: Can we make insights available when needed and help
people make better decisions?
37. Example – Public Health Innovations
— Decision Value – To individuals, businesses, government
institutions
¡ Individuals Examples – Which doctor should I go to? Which hospital should I go to?
Which health policies should I take?
¡ Govt Examples – What diseases should be of focus? Which hospitals should be
given grants? Which health programs should be discontinued?
— Data – Quantitative as well as qualitative
¡ Past incidents – Cases, deaths, spending
¡ Health trends – vaccines, epidemics, health instruments
¡ Financial trends – insurance, policies, social behaviors
— Access –
¡ Today, little, and that too in health / technical jargon
¡ In pdf documents, website
Key Idea: Can we make insights available when needed and help
people make better decisions?
38. DataView 2016
Tutorial on 27 July 2015 @ IJCAI 2015
Data and insights sought:
http://gator3080.hostgator.com/~sigdata//comad2016/data_sources.html
Insights sought
1. What diseases are most prevalent in a given area (e.g., state, district, city, by keyword)?
2. Which diseases have been better controlled than others in India? What states have done better than others? Are there approaches which have worked for controlling / reducing instances of
diseases better than others?
3. How much money has been allocated to tackle specific diseases compared to others? Which regions do better than others in controlling diseases relative to money spent?
4. Is there a relationship between water-borne diseases and water pollution?
Datasets
Health
• H-DS-1: http://data.gov.in/catalog/number-cases-and-deaths-due-diseases , All India (from 2000 to 2011) and State-wise (2010 and 2011) number of cases and deaths due to specified
diseases (Acute Diarrhoeal Diseases, Malaria, Acute Respiratory Infection, Japanese Encephalitis, Viral Hepatitis).
• H-DS-2: http://data.gov.in/catalog/cases-and-deaths-due-kala-azar , Cases and Deaths due to the illness Kala-Azar in Bihar, West Bengal and Country during the years 1996 till 2000.
• H-DS-3: https://data.gov.in/catalog/cases-and-deaths-due-japanese-encephalitis-and-dengue-dhf-during-tenth-plan, cases and deaths due to Japanese Encephalitis and Dengue / DHF
during Tenth Plan.
• H-DS-4: https://data.gov.in/catalog/water-quality-affected-habitations, Water Quality Affected Habitations
• H-DS-5: Hospital Directory with Geo Code as on September 2015, https://data.gov.in/catalog/hospital-directory-national-health-portal
Expenditure
• F-DS-1: https://data.gov.in/catalog/outlays-and-expenditure-aids-control-programme-during-ninth-plan, outlays and expenditure of AIDS Control Programme during Ninth Plan.
• F-DS-2: https://data.gov.in/catalog/public-sector-outlaysexpenditure-during-eleventh-five-year-plan, public sector outlays and expenditures during Eleventh Five Year Plan (2007-12) under
various Heads of Development (Rs. Crore).
• F-DS-3: http://data.gov.in/catalog/outlays-department-health-agreed-planning-commission-during-tenth-plan , data related to 9th Plan Allocation, 9th Plan Anticipated Expenditure, 10th
Plan Allocation as Agreed by Planning Commission.
• F-DS-4: https://data.gov.in/catalog/percentage-share-household-expenditure-health-and-drugs-various-states-during-eleventh-five, data related to percentage share of household
expenditure on health and drugs in various states during Eleventh Five Year Plan.
• F-DS-5: https://data.gov.in/catalog/state-wise-plan-outlays-and-expenditure, table provides state-wise plan outlays and expenditure during 2011-2012.
• F-DS-6: https://data.gov.in/catalog/outlay-tenth-plan-tenth-plan-sum-annual-outlay-and-tenth-plan-actual-expenditure-department, data related to Outlay Tenth Plan, Tenth Plan
(2002-07) sum of Annual Outlay and Tenth Plan (2002-07) Actual Expenditure for Department of Health and Family Welfare.
Water Quality
• W-DS-1: https://data.gov.in/catalog/status-water-quality-india-2012, http://data.gov.in/catalog/number-cases-and-deaths-due-diseases , status of Water Quality in India in 2012
• W-DS-2: https://data.gov.in/catalog/status-water-quality-india-2008-and-2011, status of Water Quality in India - 2008 and 2011
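A minimal sketch of how the first insight (most prevalent disease per area) might be computed once a cases-and-deaths file is downloaded. The column names and all numbers below are made up for illustration; the actual data.gov.in files may use a different layout and would need cleaning first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in CSV form; states "A" and "B" and all counts are invented.
SAMPLE = """state,disease,cases,deaths
A,Malaria,1200,12
A,Acute Diarrhoeal Diseases,5400,9
B,Malaria,300,6
B,Viral Hepatitis,150,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def most_prevalent(rows):
    """Return {state: (disease, cases)} for the disease with most cases."""
    best = {}
    for r in rows:
        cases = int(r["cases"])
        if r["state"] not in best or cases > best[r["state"]][1]:
            best[r["state"]] = (r["disease"], cases)
    return best


def case_fatality(rows):
    """Return {disease: deaths / cases}, summed over all states."""
    totals = defaultdict(lambda: [0, 0])  # disease -> [cases, deaths]
    for r in rows:
        totals[r["disease"]][0] += int(r["cases"])
        totals[r["disease"]][1] += int(r["deaths"])
    return {d: deaths / cases for d, (cases, deaths) in totals.items() if cases}


print(most_prevalent(rows))
print(case_fatality(rows))
```

The same grouping logic extends to districts or cities if the source file carries those columns.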
39. Example – River Water Pollution
— Decision Value – To individuals, businesses, government
institutions
¡ Individuals Examples – Can I take a bath without getting sick? What crops
should I grow? What water should I drink and pay for?
¡ Govt Examples – How should govt spend money on sewage treatment for
maximum disease reduction? How should it inspect industries?
— Data – Quantitative as well as qualitative
¡ Dissolved oxygen,
¡ pH,
¡ … 30+ measurable quantities of interest
— Access –
¡ Today, little, and that too in water technical jargon
¡ In pdf documents, website
Key Idea: Can we make insights available when needed and help
people make better decisions?
41. What is a Smart City?
Smart city can mean one or more of the following:
— As a resource optimization objective, it is to know and manage a
city's resources using data.
— As a caring objective, it is about improving the standard of living of citizens
through health, safety and similar indices and programs.
— As a vitality objective, it is about generating employment and achieving
sustainable growth.
A city's leadership can choose among these or define its own objective(s)
and manage with measurements to proactively achieve them
See other FAQs at: https://sites.google.com/site/biplavsrivastava/research-1/intelligent-systems/scfaqs
42. Smarter Cities solution paths leverage a similar approach
[Figure: unique value realized grows with the use of Smarter Cities capabilities, in three stages]
— Stage 1, Manage Data: Integrate service information to improve department operations
— Stage 2, Analyze Patterns: Develop integrated view to improve outcomes and compliance
— Stage 3, Optimize Outcomes: Leverage end-to-end case management to optimize service delivery
— Benefits shown
¡ ↑ Improve service levels
¡ ↓ Reduce fraud and abuse
¡ ↑ Focus on the citizen
¡ ↑ Savings from overpayment
¡ ↑ Assistance with compliance
¡ ↑ Integrated case management
¡ ↑ Automation of citizen support
¡ ↓ Reduce operating costs
43. India’s 100 Smart Cities
Details: https://sites.google.com/site/biplavsrivastava/smart-cities-in-india
44. Comments on India’s 100 City Plans
— A much-needed, much-delayed, start
¡ JNNURM and earlier initiatives did not show impact
— However, the selection criteria were non-technical
¡ Focus was on funding feasibility (center-state) and administrative
considerations
¡ No commitment on measurable improvement of any metric in any
city domain
— Opportunity to impact India’s transformation
(theoretically)
¡ However, environment to try out India-specific, new innovations
needs to be created
¡ Focus has to be on improvement metrics; accountability for money
spent; quality outcomes
45. Discussion: Call for Action
46. Smart City Challenges
— From resource angle, decrease waste/ inefficiency while
improving service delivery to citizens
— Problems are old but accentuated today by population
growth and shrinking resources
— Open Data and effective analytical methods hold
promise
— Challenges
¡ Provide value quickly
¡ Use value synergies from different domains (e.g., finance, health,
environment, traffic, corruption …)
¡ Grow to scale
47. Common Descriptive Analytics Patterns,
Accelerated with Open Data
— Correlation of outcomes, across
¡ Data sources in same domain
¡ Different domains
— Return on investment analysis
¡ Money invested vs. metrics that measure improvement in the
domain
¡ Comparison of performance with history
¡ Comparison of performance with other regions
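Both patterns above reduce to a few lines once the data is tabulated. The sketch below uses invented figures for three regions (spending in Rs. crore and disease cases before and after a program); "cases averted per crore" is one assumed improvement metric among many possible ones.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical records: region -> (spend in Rs. crore, cases before, cases after).
regions = {
    "Region A": (10.0, 1000, 700),
    "Region B": (20.0, 1000, 800),
    "Region C": (5.0, 1000, 900),
}


def cases_averted_per_crore(spend, before, after):
    """Assumed return-on-investment metric: improvement per unit of money."""
    return (before - after) / spend


# Comparison across regions: rank by improvement relative to money spent.
ranking = sorted(regions,
                 key=lambda r: cases_averted_per_crore(*regions[r]),
                 reverse=True)

# Correlation of outcomes: does more spending go with more cases averted?
spend = [v[0] for v in regions.values()]
averted = [v[1] - v[2] for v in regions.values()]
r = pearson(spend, averted)

print(ranking, round(r, 3))
```

With only three invented regions the correlation value itself means little; the sketch shows the shape of the analysis, not a finding.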
48. Employing All Data – Data Fusion
— Open Data is one source
¡ Often easiest to get but with issues (e.g., at aggregate level, with gaps,
imprecise semantics)
— Social data is another promising source
¡ People are anyway generating it (People-as-sensors)
¡ However, social sites have varying data reuse permissions,
license costs, access limits
¡ Big data techniques already being used here
— Use sensor data if available
¡ Internet of Things (IoT) and big data techniques are relevant
¡ Most prevalent in health, environment and transportation
— Key is to release the fused data also for reuse
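A minimal sketch of fusing an aggregate open dataset with finer-grained sensor readings, keyed on district name. The districts, field names and values are hypothetical; real fusion would also have to reconcile naming, time periods and units across sources.

```python
from statistics import mean

# Hypothetical open data: one aggregate figure per district.
open_data = {
    "District X": {"reported_cases": 120},
    "District Y": {"reported_cases": 45},
}

# Hypothetical sensor readings: (district, dissolved oxygen in mg/L).
sensor_readings = [
    ("District X", 6.1),
    ("District X", 5.7),
    ("District Z", 7.9),
]


def fuse(open_data, sensor_readings):
    """Join both sources on district; gaps in either source stay as None."""
    by_district = {}
    for district, value in sensor_readings:
        by_district.setdefault(district, []).append(value)

    fused = {}
    for district in sorted(set(open_data) | set(by_district)):
        readings = by_district.get(district)
        fused[district] = {
            "reported_cases": open_data.get(district, {}).get("reported_cases"),
            "mean_dissolved_oxygen": mean(readings) if readings else None,
        }
    return fused


fused = fuse(open_data, sensor_readings)
for district, record in fused.items():
    print(district, record)
```

Keeping the gaps explicit, rather than dropping incomplete districts, makes the fused table honest about coverage when it is released for reuse.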
49. Building Community for Innovations
— Multi-disciplinary
¡ In AI
¡ In Computer Science
¡ In science: domain (finance, health, transport, …), techniques (CS,
engg.) and evaluation (public policy, …)
— Multi-stakeholder
¡ Citizens
¡ Government
¡ Academia
¡ Business/ Industry
¡ Non-profits, …
— Getting to scale is key
50. Main Messages
— Financial Innovations are key for a developing country like
India to provide better opportunities to its citizens
¡ Impacts not only finance (Banking, Insurance, …)
¡ But all other areas of a society (Healthcare, Transportation, Industry)
— Innovations depend on data, analysis and timely access
— Open data is often the most promising source to start
making quick impact
— Eventual aim should be to scale innovations with other data
sources and reach people seamlessly at production scale