Digital Enterprise Research Institute                                                            www.deri.ie




Challenges Ahead for Converging Financial Data

Edward Curry (1), Andreas Harth (2), Sean O’Riain (1)

(1) DERI, NUI Galway, Ireland
(2) Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)




W3C Workshop on Improving Access to
Financial Data on the Web

October 2009, Arlington, Virginia USA
© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Agenda

  • Motivation - Financial Data Ecosystem
      ◦ Data Providers
      ◦ Data Formats
      ◦ Data Consumers
  • Converging Financial Data from Multiple Sources
      ◦ Entity Centric Approach
      ◦ Architecture
      ◦ Identity Mismatch
      ◦ Data Query
  • Data Integration Challenges
  • Recommendations

Financial Data Ecosystem

[Diagram: Information Providers | ? | Information Consumers]

Financial Information Providers

  • Individuals: e.g. CEOs reporting equity sale
  • Companies: e.g. 10-K filing
  • NGOs: e.g. sector-wide lobbying groups
  • Government: e.g. regulators, central banks, statistics offices
  • Worldwide organisations: UN, OECD
  • Academics: various economists, public policy
  • ...

  • Publicly available datasets, purchased datasets or in-house sources

Various Data Formats

  • Unstructured Text
      ◦ News articles, press releases, raw transcripts of investor calls
  • Hypertext
      ◦ Corporate websites, government websites, ...
  • Spreadsheets, et al.
      ◦ CSV files, Word documents, PDF, PowerPoint, ...
  • Structured Data
      ◦ XML, XBRL, CSV, SDMX, ...
  • Graph Structured Data in RDF
      ◦ DBpedia, CrunchBase, RSS-CB, ...

Financial Information Consumers

  • Competitive Analysis
      ◦ Mash-up of financial figures and analyst commentary for decision support

  • Regulatory Compliance
      ◦ Forensic Economics
      ◦ Spotting patterns or conditions that support fraud or money laundering

  • Investment Analysis
      ◦ Individual/Institutional investors
      ◦ Transparent fund comparisons
      ◦ Evaluate potential fund return

Goal

  • Integrate data for:
      ◦ Central access
      ◦ Cross-document analysis

  • Our group works in data integration and has applied its approach to pilots in the financial services industry

  • Report on experiences and lessons learned

Converging Financial Data from Multiple Sources

  • Provide common data platform for search, browsing, analysis, and interactive visualisations across sources

  • Entity centric approach
      ◦ Single data view allowing information filtering and cross analysis
      ◦ Consolidate data into coherent graph 'mashed up' from potentially thousands of sources

  • Key challenge is semantic integration of structured and unstructured data from the open Web and internal corporate data sources

Converging Financial Data from Multiple Sources

  • Large graph of RDF entities

  • Entities typed according to what they describe
      ◦ People, locations, organizations, publications as well as documents
      ◦ Inter-relations and structured descriptions of entities

  • Entities have specified relations to other entities
      ◦ People can work for companies, people know other people, people author documents, organisations are based in locations, and so on

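The typed entities and relations listed above can be sketched in a few lines. This is only an illustration: plain Python tuples stand in for RDF triples, and every identifier and predicate name (`ex:alice`, `ex:worksFor`, and so on) is a made-up assumption rather than part of the platform described in the talk.

```python
# Illustrative only: entity names and predicates below are hypothetical.
from collections import defaultdict

# Each triple is (subject, predicate, object), mirroring an RDF statement.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Organisation"),
    ("ex:galway", "rdf:type", "ex:Location"),
    ("ex:report1", "rdf:type", "ex:Document"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:authorOf", "ex:report1"),
    ("ex:acme", "ex:basedIn", "ex:galway"),
]


def entities_by_type(triples):
    """Group entity identifiers by their rdf:type."""
    grouped = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            grouped[o].add(s)
    return dict(grouped)


def relations_of(triples, entity):
    """Return the non-type (predicate, object) pairs for one entity."""
    return [(p, o) for s, p, o in triples if s == entity and p != "rdf:type"]


BY_TYPE = entities_by_type(TRIPLES)
ALICE_RELATIONS = relations_of(TRIPLES, "ex:alice")
```

Keeping types and relations in the same triple list is what lets one graph hold people, organisations, locations and documents side by side.
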
Data Integration Approach

  • Lifting data sources to a common format, in our case RDF (Resource Description Framework)

  • Integrating the disparate datasets into a holistic dataset by aligning entities and concepts

  • Run domain/task specific analysis algorithms on integrated data

  • Interactive browsing and exploration of integrated data or results of algorithmic analysis

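The first step, lifting a source to a common triple format, might look roughly like the sketch below for a CSV extract. The column names, the sample rows, the `ex:` predicate prefix and the `http://example.org/company/` namespace are all assumptions made for illustration; a real pipeline would map columns to an agreed vocabulary.

```python
import csv
import io

# Hypothetical input: a tiny CSV extract with made-up column names and values.
CSV_TEXT = """company_id,name,country
c1,Example Corp,IE
c2,Sample Ltd,US
"""

BASE = "http://example.org/company/"  # placeholder namespace (assumption)


def lift_csv_to_triples(text, id_column, base=BASE):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row[id_column]
        for column, value in row.items():
            # The id column becomes the subject; empty cells are skipped.
            if column != id_column and value:
                triples.append((subject, "ex:" + column, value))
    return triples


LIFTED = lift_csv_to_triples(CSV_TEXT, "company_id")
```

Once every source is expressed this way, the alignment step only has to reconcile subjects and predicates, not file formats.
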
[Figure: Data Integration Approach]

[Figure: Architecture]

Identity Mismatch

  • Need to connect sources that may describe the same entity

  • Case studies analyzing connections between people and organizations
      ◦ SEC filings (Form 4) identified 69K people connected to 80K organizations
      ◦ Same analysis on a database describing companies produced 122K people connected to 140K organizations
      ◦ Data needed to be enriched and interlinked using entity consolidation (a.k.a. object consolidation) to avoid having the knowledge split over numerous instances
      ◦ Ontology-based disambiguation

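One common way to implement entity consolidation is to merge identifiers that are known to be equivalent into a single canonical one. The sketch below uses a union-find structure for that; it assumes the equivalence pairs have already been discovered (for example by a disambiguation step), and the identifiers are hypothetical. It is not presented as the method used in the case studies.

```python
def consolidate(same_as_pairs):
    """Map every identifier to one canonical representative (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller id so results are deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {x: find(x) for x in parent}


# Hypothetical identifiers for the same person/company in different sources.
PAIRS = [
    ("sec:person/123", "db:person/jane-doe"),
    ("db:person/jane-doe", "web:jane_doe"),
    ("sec:org/9", "db:org/example-corp"),
]
CANONICAL = consolidate(PAIRS)
```

After consolidation, statements about any of the merged identifiers can be rewritten to the canonical one, so knowledge about an entity is no longer split over several instances.
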
Data Query

  • SPARQL, the RDF query language, allows questions such as the following to be asked:

      ◦ What do the companies ‘Microsoft’ and ‘IBM’ have in common?

      ◦ What competitors of ‘HP’ are in ‘Arlington’?

      ◦ What’s the relationship between ‘Microsoft’ and ‘IBM’?

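To make the first question concrete, here is a rough sketch of how "what do two companies have in common?" reduces to matching the same predicate and object for both subjects. The SPARQL text is kept as a string for reference and is not executed; the `ex:` namespace and all facts are invented for illustration and do not come from the integrated dataset.

```python
# A SPARQL formulation of the question, shown for reference only.
# The ex: namespace is a placeholder assumption.
SPARQL_IN_COMMON = """
PREFIX ex: <http://example.org/>
SELECT ?p ?o WHERE {
  ex:Microsoft ?p ?o .
  ex:IBM ?p ?o .
}
"""

# Hypothetical facts used to mimic the query in plain Python.
FACTS = [
    ("ex:Microsoft", "ex:sector", "ex:Software"),
    ("ex:IBM", "ex:sector", "ex:Software"),
    ("ex:Microsoft", "ex:listedOn", "ex:ExchangeA"),
    ("ex:IBM", "ex:listedOn", "ex:ExchangeB"),
    ("ex:Microsoft", "ex:competitorOf", "ex:HP"),
    ("ex:IBM", "ex:competitorOf", "ex:HP"),
]


def in_common(facts, a, b):
    """Return the (predicate, object) pairs asserted for both entities."""
    def described(entity):
        return {(p, o) for s, p, o in facts if s == entity}
    return sorted(described(a) & described(b))


SHARED = in_common(FACTS, "ex:Microsoft", "ex:IBM")
```

The set intersection plays the role of the two triple patterns sharing the variables `?p` and `?o` in the query.
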
Data Integration Challenges

  • Text/Data Mismatch
      ◦ Human language is often ambiguous
      ◦ The same company might be referred to in several variations (e.g. IBM, International Business Machines, Big Blue)
      ◦ Ambiguity makes cross-linking with structured data difficult
  • Object Identity and Separate Schema
      ◦ Sources differ in how they state the same fact
      ◦ Differences on the level of individual objects and schema
      ◦ SEC uses the Central Index Key (CIK) to identify people (CEOs, CFOs), companies, and financial instruments
      ◦ DBpedia uses URIs to identify the same entities
      ◦ Methods have to be in place for reconciling different representations of objects and schema

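A minimal sketch of reconciling name variants and source-specific identifiers, assuming a hand-maintained alias table and identifier mapping. The canonical id `ex:IBM` and the CIK value are placeholders chosen for illustration (the CIK was not looked up), and real systems would need far richer matching than exact lookup after normalisation.

```python
import re

# Hypothetical alias table: normalised name variant -> canonical id.
ALIASES = {
    "ibm": "ex:IBM",
    "international business machines": "ex:IBM",
    "big blue": "ex:IBM",
}

# Hypothetical identifier mappings between schemes.
ID_MAPPINGS = {
    "cik:0000000000": "ex:IBM",  # placeholder CIK, not a real lookup
    "http://dbpedia.org/resource/IBM": "ex:IBM",
}


def normalise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def resolve(mention):
    """Resolve a text mention or source identifier to a canonical id, or None."""
    if mention in ID_MAPPINGS:
        return ID_MAPPINGS[mention]
    return ALIASES.get(normalise(mention))
```

For example, `resolve("Big Blue")` and `resolve("http://dbpedia.org/resource/IBM")` both yield the same canonical id, while an unknown name yields `None` and can be queued for manual review.
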
Data Integration Challenges

  • Abstraction Levels (Data Context)
      ◦ Financial data sources provide data at incompatible levels of abstraction
      ◦ Classify data in taxonomies pertinent to a certain sector
      ◦ Differences in legislation on book-keeping (e.g. indicators from European regulators may not be directly comparable with indicators from US-based regulators)
      ◦ Differences in geographic aggregation (e.g. region data from one source and country-level data from another; IBM Ireland Ltd, IBM Europe, IBM Global, ...)

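The geographic aggregation mismatch can be illustrated by rolling region-level figures up to country level before comparing them with a country-level source. The hierarchy, the figures and the variable names below are invented for the example.

```python
from collections import defaultdict

# Hypothetical region-to-country hierarchy and figures, for illustration only.
REGION_TO_COUNTRY = {
    "Connacht": "IE",
    "Leinster": "IE",
    "Virginia": "US",
}
REGION_REVENUE = [("Connacht", 10.0), ("Leinster", 25.0), ("Virginia", 40.0)]
COUNTRY_REVENUE = [("IE", 30.0), ("US", 45.0)]  # from a second source


def roll_up(region_rows, hierarchy):
    """Sum region-level values into their parent country."""
    totals = defaultdict(float)
    for region, value in region_rows:
        totals[hierarchy[region]] += value
    return dict(totals)


ROLLED = roll_up(REGION_REVENUE, REGION_TO_COUNTRY)

# Per-country difference between the country-level source and the roll-up.
GAPS = {country: value - ROLLED.get(country, 0.0)
        for country, value in COUNTRY_REVENUE}
```

Non-zero gaps do not say which source is right; they only flag where the two sources disagree once brought to the same level of abstraction.
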
Data Integration Challenges

  • Data Quality
      ◦ General challenge when integrating data from multiple sources
      ◦ Errors in sign (positive/negative), amounts, labelling, and classification can seriously impede the utility of systems operating on such data
      ◦ Combining erroneous data aggravates the problem
      ◦ Within an open environment the data aggregator has little or no influence on the data publisher
      ◦ Challenge for data publishers/consumers to coordinate to fix problems in data or blacklist sites providing unreliable data

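A small sketch of the kind of automated checks an aggregator could run before merging records: sign, numeric amount, label presence, and a source blacklist. The record layout, the rule that `Revenue` and `Assets` should be non-negative, and the host names are assumptions for illustration; real filings need domain-specific rules.

```python
# Hypothetical blacklist of sources known to publish unreliable data.
BLACKLIST = {"unreliable.example.org"}

# Hypothetical concepts that are expected to be non-negative.
NON_NEGATIVE = {"Revenue", "Assets"}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if record.get("source") in BLACKLIST:
        problems.append("blacklisted source")
    value = record.get("value")
    if not isinstance(value, (int, float)):
        problems.append("amount is not numeric")
    elif record.get("concept") in NON_NEGATIVE and value < 0:
        problems.append("unexpected negative sign")
    if not record.get("label"):
        problems.append("missing label")
    return problems


RECORDS = [
    {"source": "good.example.org", "concept": "Revenue",
     "label": "Revenue", "value": 120.0},
    {"source": "good.example.org", "concept": "Revenue",
     "label": "Revenue", "value": -120.0},
    {"source": "unreliable.example.org", "concept": "Assets",
     "label": "", "value": "n/a"},
]
REPORT = [check_record(r) for r in RECORDS]
```

Collecting problems per record, rather than rejecting on the first failure, gives publishers and consumers something concrete to coordinate on.
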
Recommendations

  • Agree on an approach to the specification and use of common identifiers, or at least their mappings

  • Adhering to a common publishing method reduces integration effort and facilitates data reuse
      ◦ Linked Data principles

  • Convergence between data providers requires coordination and time
      ◦ No need for “Big Bang” integration
      ◦ Follow a pay-as-you-go iterative approach to integration

Thank you for listening

E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.

Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Kürzlich hochgeladen (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Challenges Ahead for Converging Financial Data

  • 1. Challenges Ahead For Converging Financial Data
      Digital Enterprise Research Institute, www.deri.ie
      Edward Curry (1), Andreas Harth (2), Sean O’Riain (1)
      (1) DERI, NUI Galway, Ireland
      (2) Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)
      W3C Workshop on Improving Access to Financial Data on the Web, October 2009, Arlington, Virginia, USA
      © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  • 2. Agenda
      • Motivation - Financial Data Ecosystem
          • Data Providers
          • Data Formats
          • Data Consumers
      • Converging Financial Data from Multiple Sources
          • Entity Centric Approach
          • Architecture
          • Identity Mismatch
          • Data Query
      • Data Integration Challenges
      • Recommendations
  • 3. Financial Data Ecosystem
      [Diagram: Information Providers and Information Consumers, with an open question mark]
  • 4. Financial Information Providers
      • Individuals: e.g. CEOs reporting equity sale
      • Companies: e.g. 10-K filing
      • NGOs: e.g. sector-wide lobbying groups
      • Government: e.g. regulators, central banks, statistics offices
      • Worldwide organisations: UN, OECD
      • Academics: various economists, public policy
      • ...
      • Publicly available datasets, purchased datasets or in-house sources
  • 5. Various Data Formats
      • Unstructured Text
          • News articles, press releases, raw transcripts of investor calls
      • Hypertext
          • Corporate websites, government websites, ...
      • Spreadsheets, et al.
          • CSV files, Word documents, PDF, PowerPoint, ...
      • Structured Data
          • XML, XBRL, CSV, SDMX, ...
      • Graph Structured Data in RDF
          • DBpedia, CrunchBase, RSS-CB, ...
  • 6. Financial Information Consumers
      • Competitive Analysis
          • Mash-up of financial figures and analyst commentary for decision support
      • Regulatory Compliance
          • Forensic Economics
          • Spotting patterns or conditions that support fraud or money laundering
      • Investment Analysis
          • Individual/Institutional investors
          • Transparent fund comparisons
          • Evaluate potential fund return
  • 7. Goal
      • Integrate data for:
          • Central access
          • Cross-document analysis
      • Our group works in data integration and has applied its approach to pilots in the financial services industry
      • Report on experiences and lessons learned
  • 8. Converging Financial Data from Multiple Sources
      • Provide a common data platform for search, browsing, analysis, and interactive visualisations across sources
      • Entity centric approach
          • Single data view allowing information filtering and cross analysis
          • Consolidate data into a coherent graph 'mashed up' from potentially thousands of sources
      • Key challenge is semantic integration of structured and unstructured data from the open Web and internal corporate data sources
  • 9. Converging Financial Data from Multiple Sources
      • Large graph of RDF entities
      • Entities typed according to what they describe
          • People, locations, organizations, publications as well as documents
          • Inter-relations and structured descriptions of entities
      • Entities have specified relations to other entities
          • People can work for companies, people know other people, people author documents, organisations are based in locations, and so on
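The typed entity graph described on this slide can be sketched with plain subject-predicate-object tuples. This is only an illustration: the `ex:` names, the sample entities, and the helper functions are hypothetical and are not taken from the presentation or from any real vocabulary.

```python
from collections import defaultdict

# Hypothetical triples; "ex:" is an illustrative prefix, not a real vocabulary.
TRIPLES = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Organisation"),
    ("ex:galway", "rdf:type", "ex:Location"),
    ("ex:report1", "rdf:type", "ex:Document"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:authorOf", "ex:report1"),
    ("ex:acme", "ex:basedIn", "ex:galway"),
}


def entities_of_type(triples, type_uri):
    """Return all subjects that are typed with type_uri, sorted by name."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == type_uri)


def outgoing(triples, subject):
    """Group the objects of every triple about `subject` by predicate."""
    out = defaultdict(set)
    for s, p, o in triples:
        if s == subject:
            out[p].add(o)
    return dict(out)


people = entities_of_type(TRIPLES, "ex:Person")
alice = outgoing(TRIPLES, "ex:alice")
```

Keeping types and relations in the same triple set is what lets one view combine "what kind of thing is this" with "what is it connected to".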
  • 10. Data Integration Approach
      • Lift data sources to a common format, in our case RDF (Resource Description Framework)
      • Integrate the disparate datasets into a holistic dataset by aligning entities and concepts
      • Run domain/task specific analysis algorithms on the integrated data
      • Interactive browsing and exploration of the integrated data or the results of algorithmic analysis
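The first step above, lifting a source to a common triple format, might look roughly like the following for a CSV file. The column names, the sample rows, and the `http://example.org/company/` namespace are assumptions made for the sketch; a real pipeline would emit proper RDF terms rather than Python tuples.

```python
import csv
import io

# Hypothetical CSV source; column names and values are invented.
CSV_TEXT = """company_id,name,country
c1,Example Corp,IE
c2,Sample Ltd,US
"""

BASE = "http://example.org/company/"  # assumed namespace for minted identifiers


def lift_csv(text, id_column, base=BASE):
    """Turn each CSV row into (subject, predicate, value) triples.

    The id column mints the subject identifier; every other non-empty
    column becomes one triple about that subject.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row[id_column]
        for column, value in row.items():
            if column != id_column and value:
                triples.append((subject, "ex:" + column, value))
    return triples


triples = lift_csv(CSV_TEXT, "company_id")
```

Once every source is expressed this way, the later alignment and analysis steps can treat all of them uniformly.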
  • 11. Data Integration Approach
  • 13. Identity Mismatch
      • Need to connect sources that may describe the same data on a particular entity
      • Case studies analyzing connections between people and organizations
          • SEC filings (Form 4) identified 69K people connected to 80K organizations
          • Same analysis on a database describing companies produced 122K people connected to 140K organizations
          • Data needed to be enriched and interlinked using entity consolidation (a.k.a. object consolidation) to avoid having the knowledge split over numerous instances
          • Ontology-based disambiguation
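One common way to implement entity consolidation is to merge records that share the value of an identifying property, using a union-find structure. The sketch below assumes that approach; it is not necessarily the method used in the case studies, and the records, property names, and the choice of `cik` and `homepage` as identifying keys are hypothetical.

```python
from collections import defaultdict

# Hypothetical records from different sources: (local id, properties).
RECORDS = [
    ("sec:p1", {"cik": "0000001", "name": "Jane Doe"}),
    ("db:p9", {"cik": "0000001", "homepage": "http://example.org/jane"}),
    ("web:p3", {"homepage": "http://example.org/jane"}),
    ("sec:p2", {"cik": "0000002", "name": "John Roe"}),
]
# Assumption: these properties are unique per real-world entity.
IDENTIFYING = ("cik", "homepage")


def consolidate(records, identifying=IDENTIFYING):
    """Cluster record ids that share any identifying property value."""
    parent = {rid: rid for rid, _ in records}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    seen = {}  # (property, value) -> first record id carrying it
    for rid, props in records:
        for key in identifying:
            if key in props:
                first = seen.setdefault((key, props[key]), rid)
                union(first, rid)

    clusters = defaultdict(set)
    for rid, _ in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


clusters = consolidate(RECORDS)
```

Note that `sec:p1` and `web:p3` share no property directly; they end up in one cluster only because `db:p9` links them, which is the transitive effect consolidation relies on.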
  • 14. Data Query
      • SPARQL, a semantic query language for RDF, allows questions such as:
          • What do the companies ‘Microsoft’ and ‘IBM’ have in common?
          • What competitors of ‘HP’ are in ‘Arlington’?
          • What’s the relationship between ‘Microsoft’ and ‘IBM’?
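To make the first two questions concrete without a triple store, the sketch below answers them over an in-memory set of triples. The company names, predicates, and data are invented placeholders; in SPARQL the first question would correspond roughly to a pattern that binds the same predicate and object for both subjects, as noted in the comment.

```python
# Hypothetical data; the "ex:" predicates and entities are placeholders.
TRIPLES = {
    ("ex:CompanyA", "ex:sector", "ex:Software"),
    ("ex:CompanyA", "ex:listedOn", "ex:ExchangeX"),
    ("ex:CompanyA", "ex:basedIn", "ex:CityOne"),
    ("ex:CompanyB", "ex:sector", "ex:Software"),
    ("ex:CompanyB", "ex:listedOn", "ex:ExchangeY"),
    ("ex:CompanyB", "ex:basedIn", "ex:CityOne"),
    ("ex:CompanyC", "ex:competitorOf", "ex:CompanyA"),
    ("ex:CompanyC", "ex:basedIn", "ex:CityTwo"),
}


def in_common(triples, a, b):
    """Return the (predicate, object) pairs asserted for both a and b.

    Roughly the SPARQL pattern:
        SELECT ?p ?o WHERE { ex:CompanyA ?p ?o . ex:CompanyB ?p ?o . }
    """
    facts_a = {(p, o) for s, p, o in triples if s == a}
    facts_b = {(p, o) for s, p, o in triples if s == b}
    return sorted(facts_a & facts_b)


def competitors_in(triples, company, city):
    """Return competitors of `company` that are based in `city`."""
    rivals = {s for s, p, o in triples if p == "ex:competitorOf" and o == company}
    return sorted(r for r in rivals if (r, "ex:basedIn", city) in triples)


shared = in_common(TRIPLES, "ex:CompanyA", "ex:CompanyB")
rivals = competitors_in(TRIPLES, "ex:CompanyA", "ex:CityTwo")
```

Both functions depend on the two companies being described with the same identifiers and predicates, which is why the identity and schema issues come before querying.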
  • 15. Data Integration Challenges
      • Text/Data Mismatch
          • Human language is often ambiguous
          • The same company might be referred to in several variations (e.g. IBM, International Business Machines, Big Blue)
          • Ambiguity makes cross-linking with structured data difficult
      • Object Identity and Separate Schema
          • Sources differ in how they state the same fact
          • Differences at the level of individual objects and of schema
          • The SEC uses the Central Index Key (CIK) to identify people (CEOs, CFOs), companies, and financial instruments
          • DBpedia uses URIs to identify the same entities
          • Methods have to be in place for reconciling different representations of objects and schema
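The name-variation problem above can be illustrated with a small alias table that maps surface forms to one identifier. This is a deliberately naive sketch: the `ex:IBM` identifier and the matching rule are assumptions, and real text would need proper disambiguation rather than exact phrase lookup.

```python
import re

# Alias table built from the variations named on the slide; "ex:IBM" is a
# hypothetical canonical identifier.
ALIASES = {
    "ibm": "ex:IBM",
    "international business machines": "ex:IBM",
    "big blue": "ex:IBM",
}


def normalise(text):
    """Lower-case the text and collapse punctuation into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link_mentions(sentence, aliases=ALIASES):
    """Return the identifiers whose aliases occur as whole phrases."""
    padded = " " + normalise(sentence) + " "
    found = set()
    for alias, uri in aliases.items():
        # Padding with spaces prevents matches inside longer words.
        if " " + alias + " " in padded:
            found.add(uri)
    return sorted(found)


first = link_mentions("Big Blue reported earnings.")
second = link_mentions("International Business Machines (IBM) filed a 10-K")
third = link_mentions("Nimble startups")
```

The third call returns nothing even though "nimble" contains the letters "ibm", which shows why matching on whole phrases matters even in a toy linker.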
  • 16. Data Integration Challenges
      • Abstraction Levels (Data Context)
          • Financial data sources provide data at incompatible levels of abstraction
          • Data are classified in taxonomies pertinent to a certain sector
          • Differences in legislation on book-keeping (e.g. indicators from European regulators may not be directly comparable with indicators from US-based regulators)
          • Differences in geographic aggregation (e.g. regional data from one source and country-level data from another; IBM Ireland Ltd, IBM Europe, IBM Global, ...)
  • 17. Data Integration Challenges
      • Data Quality
          • General challenge when integrating data from multiple sources
          • Errors in signage, amounts, labelling, and classification can seriously impede the utility of systems operating on such data
          • Combining erroneous data aggravates the problem
          • Within an open environment the data aggregator has little or no influence on the data publisher
          • Challenge for data publishers/consumers to coordinate to fix problems in data or blacklist sites providing unreliable data
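A minimal sketch of how an aggregator might check incoming facts and flag unreliable sources follows. The facts, the rule that revenue and assets should not be negative, and the 50% error-rate threshold are all assumptions chosen for illustration, not rules from the presentation.

```python
from collections import Counter

# Hypothetical facts: (source, concept label, amount).
FACTS = [
    ("source-a", "Revenue", 1200.0),
    ("source-a", "Assets", 5000.0),
    ("source-b", "Revenue", -300.0),  # suspicious sign
    ("source-b", "", 40.0),           # missing label
    ("source-b", "Assets", 900.0),
]
NON_NEGATIVE = {"Revenue", "Assets"}  # assumed rule for this sketch


def check(fact):
    """Return a list of problems found in one fact (empty if none)."""
    _, concept, amount = fact
    problems = []
    if not concept:
        problems.append("missing label")
    if concept in NON_NEGATIVE and amount < 0:
        problems.append("unexpected negative amount")
    return problems


def unreliable_sources(facts, max_error_rate=0.5):
    """Return sources whose share of problematic facts exceeds the threshold."""
    total, bad = Counter(), Counter()
    for fact in facts:
        total[fact[0]] += 1
        if check(fact):
            bad[fact[0]] += 1
    return sorted(s for s in total if bad[s] / total[s] > max_error_rate)


flagged = unreliable_sources(FACTS)
```

Because the aggregator cannot fix the publisher's data, the sketch only reports and flags; whether to blacklist a flagged source remains a coordination decision between publishers and consumers.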
  • 18. Recommendations
      • Agree on an approach to the specification and use of common identifiers, or at least their mappings
      • Adhering to a common publishing method reduces integration effort and facilitates data reuse
          • Linked Data principles
      • Convergence between data providers requires coordination and time
          • No need for “Big Bang” integration
          • Follow a pay-as-you-go iterative approach to integration
  • 19. Thank you for listening
      E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.