#datapointlive
The Human Algorithm: Automating Startup Data Collection at Mattermark
Sarah Catanzaro, Head of Data at Mattermark
@sarahcat21
#DPL15 | @sarahcat21
Mattermark
Mattermark is a deal intelligence platform and private company database used by
● investors
● business and corporate development
● sales

THE CHALLENGE
Scale + Information Overload + Stealth

Scale
Over 125 million private companies in the world (only about 45.5 thousand public).

Information overload

Stealth
● Private companies do not have strong incentives (e.g. legal obligations) to share data. Many may have competitive incentives to obfuscate information.
● Investors may request non-disclosure.

Mattermark’s Solution

Software-oriented approach
● A must, due to the scale of our dataset
○ 1.3 million companies
○ 16.5k investors
○ 110k funding events
● Leverage a lean data team

Data collection strategy
● Web scraping
● Machine learning
● Direct submission
● Manual data entry

The “Human Algorithm”

Investors ask questions like:
● What startups might raise capital in the next 6 months?
● What startups is Stephanie Palmeri investing in?

Our data analysts seek to understand:
● Why does this question matter?
● What data is required to answer this question?
● Where can this data be accessed?

Next, data analysts:
1. Define repeatable processes for data collection.
2. Determine whether processes can be replicated through web scraping and/or machine learning algorithms to collect data at scale.
3. Write functional specifications, reviewed by sales and engineering team members.

Next, web and/or machine learning engineers:
1. Write dev designs, reviewed by data analysts.
2. Upon implementation and marketing release, this data becomes available to customers.
3. New questions arise and the cycle starts again.

Funding Automation

Investors ask questions like:
● How much funding has a company already raised?
● Who were the investors in each of those rounds?

Problems with existing sources
Existing sources rely on wiki-style data collection (the credibility of sources cannot be confirmed)
News reports are better, but:
● facts are harder to extract
● different sources report different figures

Solution: funding automation
A new framework for collecting and synthesizing funding data.
1. News article fact extraction (machine learning)
2. Funding override system (web engineering)
3. Funding confirmation email campaign (marketing)

1. News article fact extraction
Crawl RSS feeds, extract data from stories (title, text, links, etc.)
● 750+ sources
● 5,000 - 10,000 articles
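The crawl step above can be sketched roughly as follows. This is a minimal illustration, not Mattermark's crawler: the feed content, the field names, and the `parse_feed` helper are all assumptions made for the example, and a real crawler would download each feed over HTTP before parsing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real crawler would fetch this from each source.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Tech News</title>
    <item>
      <title>Acme raises $5M Series A</title>
      <link>https://example.com/acme-series-a</link>
      <description>Acme announced a $5M Series A led by Example Ventures.</description>
    </item>
    <item>
      <title>Ten tips for better stand-ups</title>
      <link>https://example.com/stand-ups</link>
      <description>How to keep daily meetings short.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> with its title, link, and text."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        stories.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return stories


stories = parse_feed(SAMPLE_FEED)
for story in stories:
    print(story["title"], "->", story["link"])
```

Each story dict can then be handed to the later stages (classification, sentence filtering, fact extraction) without those stages needing to know which feed it came from.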

1. News article fact extraction
Classify stories about funding
● 250 articles/day
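The slides do not say which classifier is used. As a stand-in, the sketch below trains a tiny multinomial naive Bayes model with add-one smoothing on a handful of made-up headlines; the class name, training data, and labels are hypothetical and only illustrate the idea of separating funding stories from everything else.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$]+", text.lower())


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training headlines, for illustration only.
TRAIN = [
    ("Acme raises $5M Series A led by Example Ventures", "funding"),
    ("Startup Beta closes $12M seed round from investors", "funding"),
    ("Gamma secures funding in Series B round", "funding"),
    ("Acme launches new mobile app for teams", "other"),
    ("Beta hires new chief marketing officer", "other"),
    ("Gamma opens office in Berlin", "other"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("Delta raises $3M seed round"))
print(model.predict("Epsilon opens new office"))
```

With real data, the training set would be much larger and the model would likely be evaluated on held-out articles before being trusted to filter thousands of stories a day.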

1. News article fact extraction
● Identify sentences containing information about investors, amount, and/or series
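One simple way to approximate this step is pattern matching on each sentence. The sketch below is an assumption about how such a filter could look, not the method from the talk: the regular expressions, the cue phrases, and the `funding_sentences` helper are all illustrative.

```python
import re

# Illustrative patterns; real articles would need a much broader set.
AMOUNT = re.compile(r"\$\s?\d+(?:\.\d+)?\s?(?:[kKmMbB]|thousand|million|billion)\b")
SERIES = re.compile(r"\b(?:seed|series\s+[a-z])\b", re.IGNORECASE)
INVESTORS = re.compile(
    r"\b(?:led by|backed by|investors include|participation from)\b",
    re.IGNORECASE,
)

PATTERNS = {"amount": AMOUNT, "series": SERIES, "investors": INVESTORS}


def funding_sentences(text):
    """Return (sentence, tags) pairs for sentences that mention funding facts."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        tags = {name for name, pattern in PATTERNS.items() if pattern.search(sentence)}
        if tags:
            hits.append((sentence, tags))
    return hits


ARTICLE = (
    "Acme makes scheduling software. "
    "The company raised $5M in a Series A round led by Example Ventures. "
    "It plans to hire engineers."
)

hits = funding_sentences(ARTICLE)
for sentence, tags in hits:
    print(sorted(tags), sentence)
```

Narrowing an article down to the few sentences that carry funding facts keeps the later extraction step focused on text that is likely to contain an amount, a series, or investor names.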

1. News article fact extraction
● Extract facts
● Match companies and investors to entities in our database
○ 30% of extracted articles are entered automatically
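Matching an extracted name to a database entity could look something like the sketch below. The database contents, the suffix list, the 0.85 cutoff, and the `match_entity` helper are all hypothetical. Returning `None` for weak matches is one assumed way to decide which records are entered automatically and which go to a person for review.

```python
import difflib
import re

# Hypothetical entity table: id -> canonical name.
DATABASE = {1: "Acme Inc.", 2: "Example Ventures", 3: "Globex Corporation"}

# Common legal suffixes to ignore when comparing names (illustrative list).
SUFFIXES = re.compile(r"\b(?:inc|llc|ltd|corp|co)\.?$", re.IGNORECASE)


def normalize(name):
    """Lowercase, drop a trailing legal suffix and punctuation, squeeze spaces."""
    name = SUFFIXES.sub("", name.strip().lower())
    name = re.sub(r"[^a-z0-9 ]", "", name)
    return " ".join(name.split())


def match_entity(name, database, cutoff=0.85):
    """Return the id of the best-matching entity, or None if nothing is close."""
    index = {normalize(value): key for key, value in database.items()}
    key = normalize(name)
    if key in index:
        return index[key]
    # Fall back to fuzzy matching on the normalized names.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


print(match_entity("ACME, Inc", DATABASE))
print(match_entity("Exampl Ventures", DATABASE))
print(match_entity("Initech", DATABASE))
```

A stricter cutoff lowers the share of records that can be entered without review but reduces the risk of attaching a funding round to the wrong company.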

2. Funding override system
● Identify reports about the same funding event
● Combine information from multiple reports using the Wongi rules engine
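The two bullets above can be illustrated with a plain Python sketch. It does not use Wongi, and the rules shown here are assumptions rather than Mattermark's: reports are grouped by company and series, the most frequently reported amount wins, and investor lists are merged. The report fields and the `merge_reports` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up reports of funding events from different sources.
REPORTS = [
    {"company": "Acme", "series": "A", "amount": 5_000_000,
     "investors": ["Example Ventures"], "source": "news-1"},
    {"company": "Acme", "series": "A", "amount": 5_000_000,
     "investors": ["Example Ventures", "Sample Capital"], "source": "news-2"},
    {"company": "Acme", "series": "A", "amount": 5_500_000,
     "investors": [], "source": "news-3"},
    {"company": "Globex", "series": "Seed", "amount": 1_000_000,
     "investors": ["Sample Capital"], "source": "news-1"},
]


def merge_reports(reports):
    """Group reports describing the same event and combine their facts."""
    groups = defaultdict(list)
    for report in reports:
        # Assumed rule: same company and series means same funding event.
        groups[(report["company"], report["series"])].append(report)

    events = []
    for (company, series), group in groups.items():
        # Assumed rule: the most frequently reported amount wins.
        amount = Counter(r["amount"] for r in group).most_common(1)[0][0]
        # Assumed rule: keep every investor named by any source.
        investors = sorted({name for r in group for name in r["investors"]})
        events.append({
            "company": company,
            "series": series,
            "amount": amount,
            "investors": investors,
            "sources": [r["source"] for r in group],
        })
    return events


events = merge_reports(REPORTS)
for event in events:
    print(event)
```

A rules engine makes this kind of logic declarative, so analysts can change which source or value takes precedence without rewriting the merging code.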

3. Funding confirmation email campaign
Use CRM and HubSpot to automatically send emails to founders after equity financing.

What We Learned

Where we struggled
Our initial implementation of a funding override system was inefficient. Why?
Because our data analysts and developers were not aligned on functional requirements.

Solution
● Analysts must work closely with developers
○ Pre-spec check-ins
○ Analysts review dev designs to ensure that the system design addresses the use case.
● Analysts must avoid being prescriptive
● Analysts must understand data mining and machine learning concepts

Where we succeeded
Implementation of news article fact extraction was successful. Why?
Because data analysts and developers worked as service providers to each other.

How We Did It

1. Tighter Analyst + Dev Communication
Tiger teams: 1 ML developer, 1 web/infrastructure developer, 1 data analyst, 1 project lead
Define milestones & hold daily stand-ups.

3. Track II interactions reinforce the symbiotic relationship
● Devs lead Python learning group
● Data analysts hold seminars on topics like admin tooling and alternative assets

Thank You!

Weitere ähnliche Inhalte

Was ist angesagt?

Mattermark 1st Series A Deck
Mattermark 1st Series A DeckMattermark 1st Series A Deck
Mattermark 1st Series A DeckDanielle Morrill
 
Linio IR Deck - May 2014
Linio IR Deck - May 2014Linio IR Deck - May 2014
Linio IR Deck - May 2014SYGroup
 
Marko Savic - MarTech and the buyer journey
Marko Savic - MarTech and the buyer journeyMarko Savic - MarTech and the buyer journey
Marko Savic - MarTech and the buyer journeyFunnelCake
 
Bookkeeping executives mailing database
Bookkeeping executives mailing databaseBookkeeping executives mailing database
Bookkeeping executives mailing databaseGlobal B2B Contacts
 
Making Your Site Vendor Agnostic via a Modern Data Layer
Making Your Site Vendor Agnostic via a Modern Data LayerMaking Your Site Vendor Agnostic via a Modern Data Layer
Making Your Site Vendor Agnostic via a Modern Data LayerEnsighten
 
10 Analytics Dashboards To Monitor Your Business
10 Analytics Dashboards To Monitor Your Business10 Analytics Dashboards To Monitor Your Business
10 Analytics Dashboards To Monitor Your BusinessBeeckon
 
Steve Lok - SUPERNOVA: Centralised Data Platforms (CDPs) blow sh*t up at The...
Steve Lok - SUPERNOVA:  Centralised Data Platforms (CDPs) blow sh*t up at The...Steve Lok - SUPERNOVA:  Centralised Data Platforms (CDPs) blow sh*t up at The...
Steve Lok - SUPERNOVA: Centralised Data Platforms (CDPs) blow sh*t up at The...Martech Alliance
 
Scott Brinker - Navigating the Marketing Technology landscape
Scott Brinker - Navigating the Marketing Technology landscapeScott Brinker - Navigating the Marketing Technology landscape
Scott Brinker - Navigating the Marketing Technology landscapeAvaus
 
InnerTrends - Batch 25 Demo Day
InnerTrends - Batch 25 Demo DayInnerTrends - Batch 25 Demo Day
InnerTrends - Batch 25 Demo Day500 Startups
 
Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14
Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14
Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14Charlene Dipaola
 
Bookkeeping executives mailing list
Bookkeeping executives mailing listBookkeeping executives mailing list
Bookkeeping executives mailing listGlobal B2B Contacts
 
Fix, don't stitch: be a steward of your marketing data
Fix, don't stitch: be a steward of your marketing dataFix, don't stitch: be a steward of your marketing data
Fix, don't stitch: be a steward of your marketing dataMAD//Fest London
 
"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd
"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd
"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowdsaastr
 
Fixing marketing data: how to achieve success in a data-driven world
Fixing marketing data: how to achieve success in a data-driven worldFixing marketing data: how to achieve success in a data-driven world
Fixing marketing data: how to achieve success in a data-driven worldMAD//Fest London
 
Miscellaneous retail executives mailing database
Miscellaneous retail executives mailing databaseMiscellaneous retail executives mailing database
Miscellaneous retail executives mailing databaseGlobal B2B Contacts
 
Miscellaneous retail executives mailing database
Miscellaneous retail executives mailing databaseMiscellaneous retail executives mailing database
Miscellaneous retail executives mailing databaseGlobal B2B Contacts
 
Aisling McKeod- Talent Development in the Digital Age
Aisling McKeod- Talent Development in the Digital AgeAisling McKeod- Talent Development in the Digital Age
Aisling McKeod- Talent Development in the Digital AgeMartech Alliance
 

Was ist angesagt? (20)

Jacqueline Urick - Advanced Search Summit Napa 2021
Jacqueline Urick - Advanced Search Summit Napa 2021Jacqueline Urick - Advanced Search Summit Napa 2021
Jacqueline Urick - Advanced Search Summit Napa 2021
 
Mattermark 1st Series A Deck
Mattermark 1st Series A DeckMattermark 1st Series A Deck
Mattermark 1st Series A Deck
 
Linio IR Deck - May 2014
Linio IR Deck - May 2014Linio IR Deck - May 2014
Linio IR Deck - May 2014
 
Marko Savic - MarTech and the buyer journey
Marko Savic - MarTech and the buyer journeyMarko Savic - MarTech and the buyer journey
Marko Savic - MarTech and the buyer journey
 
Bookkeeping executives mailing database
Bookkeeping executives mailing databaseBookkeeping executives mailing database
Bookkeeping executives mailing database
 
ITAC Presentation
ITAC PresentationITAC Presentation
ITAC Presentation
 
Making Your Site Vendor Agnostic via a Modern Data Layer
Making Your Site Vendor Agnostic via a Modern Data LayerMaking Your Site Vendor Agnostic via a Modern Data Layer
Making Your Site Vendor Agnostic via a Modern Data Layer
 
10 Analytics Dashboards To Monitor Your Business
10 Analytics Dashboards To Monitor Your Business10 Analytics Dashboards To Monitor Your Business
10 Analytics Dashboards To Monitor Your Business
 
Steve Lok - SUPERNOVA: Centralised Data Platforms (CDPs) blow sh*t up at The...
Steve Lok - SUPERNOVA:  Centralised Data Platforms (CDPs) blow sh*t up at The...Steve Lok - SUPERNOVA:  Centralised Data Platforms (CDPs) blow sh*t up at The...
Steve Lok - SUPERNOVA: Centralised Data Platforms (CDPs) blow sh*t up at The...
 
Scott Brinker - Navigating the Marketing Technology landscape
Scott Brinker - Navigating the Marketing Technology landscapeScott Brinker - Navigating the Marketing Technology landscape
Scott Brinker - Navigating the Marketing Technology landscape
 
InnerTrends - Batch 25 Demo Day
InnerTrends - Batch 25 Demo DayInnerTrends - Batch 25 Demo Day
InnerTrends - Batch 25 Demo Day
 
Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14
Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14
Return on Content: Data Driven Insights for Publishers eMetrics Toronto 14
 
Bookkeeping executives mailing list
Bookkeeping executives mailing listBookkeeping executives mailing list
Bookkeeping executives mailing list
 
Fix, don't stitch: be a steward of your marketing data
Fix, don't stitch: be a steward of your marketing dataFix, don't stitch: be a steward of your marketing data
Fix, don't stitch: be a steward of your marketing data
 
"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd
"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd
"The “Dos & Don’ts” of Building Winning SaaS Companies with G2 Crowd
 
Machinery mailing database
Machinery mailing databaseMachinery mailing database
Machinery mailing database
 
Fixing marketing data: how to achieve success in a data-driven world
Fixing marketing data: how to achieve success in a data-driven worldFixing marketing data: how to achieve success in a data-driven world
Fixing marketing data: how to achieve success in a data-driven world
 
Miscellaneous retail executives mailing database
Miscellaneous retail executives mailing databaseMiscellaneous retail executives mailing database
Miscellaneous retail executives mailing database
 
Miscellaneous retail executives mailing database
Miscellaneous retail executives mailing databaseMiscellaneous retail executives mailing database
Miscellaneous retail executives mailing database
 
Aisling McKeod- Talent Development in the Digital Age
Aisling McKeod- Talent Development in the Digital AgeAisling McKeod- Talent Development in the Digital Age
Aisling McKeod- Talent Development in the Digital Age
 

Andere mochten auch

Hustle Con: Prototyping Mattermark with Danielle Morrill, founder of Mattermark
Hustle Con: Prototyping Mattermark with Danielle Morrill, founder of MattermarkHustle Con: Prototyping Mattermark with Danielle Morrill, founder of Mattermark
Hustle Con: Prototyping Mattermark with Danielle Morrill, founder of MattermarkSam Parr
 
Pandoland 2015: Q1-Q2 State of Startups | Mattermark
Pandoland 2015: Q1-Q2 State of Startups | MattermarkPandoland 2015: Q1-Q2 State of Startups | Mattermark
Pandoland 2015: Q1-Q2 State of Startups | MattermarkMattermark
 
The Value Proposition Canvas 워크샵 강의안
The Value Proposition Canvas 워크샵 강의안The Value Proposition Canvas 워크샵 강의안
The Value Proposition Canvas 워크샵 강의안Jung Soo Kim
 
Customers' Job To Be Done
Customers' Job To Be DoneCustomers' Job To Be Done
Customers' Job To Be DoneINNODYN
 
Designing products against customer jobs
Designing products against customer jobsDesigning products against customer jobs
Designing products against customer jobsMartin Jordan
 
Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...
Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...
Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...Martin Jordan
 
Customer Experience in digital identification
Customer Experience in digital identificationCustomer Experience in digital identification
Customer Experience in digital identificationPieter Baert
 
Making jobs-to-be-done actionable / Service Design Drinks
Making jobs-to-be-done actionable / Service Design DrinksMaking jobs-to-be-done actionable / Service Design Drinks
Making jobs-to-be-done actionable / Service Design DrinksService Design Berlin
 
[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...
[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...
[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...500 Startups
 
How to Create a Strong Value Proposition Design for B2B - It's all about the ...
How to Create a Strong Value Proposition Design for B2B - It's all about the ...How to Create a Strong Value Proposition Design for B2B - It's all about the ...
How to Create a Strong Value Proposition Design for B2B - It's all about the ...Daniel Nilsson
 
Mattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A DeckMattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A DeckDanielle Morrill
 
Value Proposition Design
Value Proposition DesignValue Proposition Design
Value Proposition DesignYves Pigneur
 
The State of Sales & Marketing at the 50 Fastest-Growing B2B Companies
The State of Sales & Marketing at the 50 Fastest-Growing B2B CompaniesThe State of Sales & Marketing at the 50 Fastest-Growing B2B Companies
The State of Sales & Marketing at the 50 Fastest-Growing B2B CompaniesMattermark
 

Andere mochten auch (13)

Hustle Con: Prototyping Mattermark with Danielle Morrill, founder of Mattermark
Hustle Con: Prototyping Mattermark with Danielle Morrill, founder of MattermarkHustle Con: Prototyping Mattermark with Danielle Morrill, founder of Mattermark
Hustle Con: Prototyping Mattermark with Danielle Morrill, founder of Mattermark
 
Pandoland 2015: Q1-Q2 State of Startups | Mattermark
Pandoland 2015: Q1-Q2 State of Startups | MattermarkPandoland 2015: Q1-Q2 State of Startups | Mattermark
Pandoland 2015: Q1-Q2 State of Startups | Mattermark
 
The Value Proposition Canvas 워크샵 강의안
The Value Proposition Canvas 워크샵 강의안The Value Proposition Canvas 워크샵 강의안
The Value Proposition Canvas 워크샵 강의안
 
Customers' Job To Be Done
Customers' Job To Be DoneCustomers' Job To Be Done
Customers' Job To Be Done
 
Designing products against customer jobs
Designing products against customer jobsDesigning products against customer jobs
Designing products against customer jobs
 
Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...
Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...
Integrating JTBD into existing tools & frameworks / Jobs-to-be-Done Meetup Be...
 
Customer Experience in digital identification
Customer Experience in digital identificationCustomer Experience in digital identification
Customer Experience in digital identification
 
Making jobs-to-be-done actionable / Service Design Drinks
Making jobs-to-be-done actionable / Service Design DrinksMaking jobs-to-be-done actionable / Service Design Drinks
Making jobs-to-be-done actionable / Service Design Drinks
 
[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...
[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...
[PreMoney SF 2015] CB Insights >> "Venture-nomics: A Quantitative Look At Bub...
 
How to Create a Strong Value Proposition Design for B2B - It's all about the ...
How to Create a Strong Value Proposition Design for B2B - It's all about the ...How to Create a Strong Value Proposition Design for B2B - It's all about the ...
How to Create a Strong Value Proposition Design for B2B - It's all about the ...
 
Mattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A DeckMattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A Deck
 
Value Proposition Design
Value Proposition DesignValue Proposition Design
Value Proposition Design
 
The State of Sales & Marketing at the 50 Fastest-Growing B2B Companies
The State of Sales & Marketing at the 50 Fastest-Growing B2B CompaniesThe State of Sales & Marketing at the 50 Fastest-Growing B2B Companies
The State of Sales & Marketing at the 50 Fastest-Growing B2B Companies
 

Ähnlich wie The Human Algorithm: Automating Startup Data Collection at Mattermark

Big Data and Marketing: Data Activation and Management
Big Data and Marketing: Data Activation and ManagementBig Data and Marketing: Data Activation and Management
Big Data and Marketing: Data Activation and ManagementConor Duke
 
Architecting for Analytics, Aaron Crear
Architecting for Analytics, Aaron CrearArchitecting for Analytics, Aaron Crear
Architecting for Analytics, Aaron CrearCzechDreamin
 
From IoT to IoTA
From IoT to IoTAFrom IoT to IoTA
From IoT to IoTAStriim
 
Crawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopCrawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopInside Analysis
 
The Bigger Picture: New Opportunities for the Modern Enterprise
The Bigger Picture: New Opportunities for the Modern EnterpriseThe Bigger Picture: New Opportunities for the Modern Enterprise
The Bigger Picture: New Opportunities for the Modern EnterpriseInside Analysis
 
Analytics trends report 2017
Analytics trends report 2017Analytics trends report 2017
Analytics trends report 2017Robert Sibo
 
Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...
Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...
Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...DATAVERSITY
 
The Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesThe Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesDataStax
 
Five Trends in Real Time Applications
Five Trends in Real Time ApplicationsFive Trends in Real Time Applications
Five Trends in Real Time Applicationsconfluent
 
Analytics: What is it really and how can it help my organization?
Analytics: What is it really and how can it help my organization?Analytics: What is it really and how can it help my organization?
Analytics: What is it really and how can it help my organization?SAS Canada
 
How Deloitte Uses AI to Simplify Reporting and Increase Value
How Deloitte Uses AI to Simplify Reporting and Increase ValueHow Deloitte Uses AI to Simplify Reporting and Increase Value
How Deloitte Uses AI to Simplify Reporting and Increase ValueAmazon Web Services
 
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...Databricks
 
Tableau Conference 2014: How One Agency Evolved from Vendor to Strategic Partner
Tableau Conference 2014: How One Agency Evolved from Vendor to Strategic PartnerTableau Conference 2014: How One Agency Evolved from Vendor to Strategic Partner
Tableau Conference 2014: How One Agency Evolved from Vendor to Strategic PartnerSIGMA Marketing Insights
 
Using Web Data for Finance
Using Web Data for FinanceUsing Web Data for Finance
Using Web Data for FinanceScrapinghub
 
Strategic CIOs: What Comes After the Cloud
Strategic CIOs: What Comes After the CloudStrategic CIOs: What Comes After the Cloud
Strategic CIOs: What Comes After the CloudSAP Ariba
 
Big Data & New Media
Big Data & New MediaBig Data & New Media
Big Data & New MediaTara Fusco
 
How to setup Big Data Company in India or data analytics Company
How to setup Big Data Company in India or data analytics  CompanyHow to setup Big Data Company in India or data analytics  Company
How to setup Big Data Company in India or data analytics Companystartupscratch
 
Graph+AI for Fin. Services
Graph+AI for Fin. ServicesGraph+AI for Fin. Services
Graph+AI for Fin. ServicesTigerGraph
 
Pitch Deck Teardown: Scalestack's $1M AI sales tech Seed deck
Pitch Deck Teardown: Scalestack's $1M AI sales tech Seed deckPitch Deck Teardown: Scalestack's $1M AI sales tech Seed deck
Pitch Deck Teardown: Scalestack's $1M AI sales tech Seed deckHajeJanKamps
 
Synthetic VIX Data Generation Using ML Techniques
Synthetic VIX Data Generation Using ML TechniquesSynthetic VIX Data Generation Using ML Techniques
Synthetic VIX Data Generation Using ML TechniquesQuantUniversity
 

Ähnlich wie The Human Algorithm: Automating Startup Data Collection at Mattermark (20)

Big Data and Marketing: Data Activation and Management
Big Data and Marketing: Data Activation and ManagementBig Data and Marketing: Data Activation and Management
Big Data and Marketing: Data Activation and Management
 
Architecting for Analytics, Aaron Crear
Architecting for Analytics, Aaron CrearArchitecting for Analytics, Aaron Crear
Architecting for Analytics, Aaron Crear
 
From IoT to IoTA
From IoT to IoTAFrom IoT to IoTA
From IoT to IoTA
 
Crawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopCrawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with Hadoop
 
The Bigger Picture: New Opportunities for the Modern Enterprise
The Bigger Picture: New Opportunities for the Modern EnterpriseThe Bigger Picture: New Opportunities for the Modern Enterprise
The Bigger Picture: New Opportunities for the Modern Enterprise
 
Analytics trends report 2017
Analytics trends report 2017Analytics trends report 2017
Analytics trends report 2017
 
Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...
Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...
Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic...
 
The Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesThe Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial Services
 
Five Trends in Real Time Applications
Five Trends in Real Time ApplicationsFive Trends in Real Time Applications
Five Trends in Real Time Applications
 
Analytics: What is it really and how can it help my organization?
Analytics: What is it really and how can it help my organization?Analytics: What is it really and how can it help my organization?
Analytics: What is it really and how can it help my organization?
 
How Deloitte Uses AI to Simplify Reporting and Increase Value
How Deloitte Uses AI to Simplify Reporting and Increase ValueHow Deloitte Uses AI to Simplify Reporting and Increase Value
How Deloitte Uses AI to Simplify Reporting and Increase Value
 
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
 
Tableau Conference 2014: How One Agency Evolved from Vendor to Strategic Partner
Tableau Conference 2014: How One Agency Evolved from Vendor to Strategic PartnerTableau Conference 2014: How One Agency Evolved from Vendor to Strategic Partner
Tableau Conference 2014: How One Agency Evolved from Vendor to Strategic Partner
 
Using Web Data for Finance
Using Web Data for FinanceUsing Web Data for Finance
Using Web Data for Finance
 
Strategic CIOs: What Comes After the Cloud
Strategic CIOs: What Comes After the CloudStrategic CIOs: What Comes After the Cloud
Strategic CIOs: What Comes After the Cloud
 
Big Data & New Media
Big Data & New MediaBig Data & New Media
Big Data & New Media
 
How to setup Big Data Company in India or data analytics Company
How to setup Big Data Company in India or data analytics  CompanyHow to setup Big Data Company in India or data analytics  Company
How to setup Big Data Company in India or data analytics Company
 
Graph+AI for Fin. Services
Graph+AI for Fin. ServicesGraph+AI for Fin. Services
Graph+AI for Fin. Services
 
Pitch Deck Teardown: Scalestack's $1M AI sales tech Seed deck
Pitch Deck Teardown: Scalestack's $1M AI sales tech Seed deckPitch Deck Teardown: Scalestack's $1M AI sales tech Seed deck
Pitch Deck Teardown: Scalestack's $1M AI sales tech Seed deck
 
Synthetic VIX Data Generation Using ML Techniques
Synthetic VIX Data Generation Using ML TechniquesSynthetic VIX Data Generation Using ML Techniques
Synthetic VIX Data Generation Using ML Techniques
 

Mehr von Janessa Lantz

From Question to Action
From Question to ActionFrom Question to Action
From Question to ActionJanessa Lantz
 
Analyzing Mixpanel Data with SQL
Analyzing Mixpanel Data with SQLAnalyzing Mixpanel Data with SQL
Analyzing Mixpanel Data with SQLJanessa Lantz
 
Optimizing Customer Support
Optimizing Customer SupportOptimizing Customer Support
Optimizing Customer SupportJanessa Lantz
 
Analyzing ROI Using Your Facebook and Adwords Data
Analyzing ROI Using Your Facebook and Adwords DataAnalyzing ROI Using Your Facebook and Adwords Data
Analyzing ROI Using Your Facebook and Adwords DataJanessa Lantz
 
How to Find the Customer Retention Secrets Hiding in Your Data
How to Find the Customer Retention Secrets Hiding in Your DataHow to Find the Customer Retention Secrets Hiding in Your Data
How to Find the Customer Retention Secrets Hiding in Your DataJanessa Lantz
 
How to Use Feedback Surveys to Improve Customer Retention
How to Use Feedback Surveys to Improve Customer RetentionHow to Use Feedback Surveys to Improve Customer Retention
How to Use Feedback Surveys to Improve Customer RetentionJanessa Lantz
 
Shopify and rjmetrics 2.25.16
Shopify and rjmetrics 2.25.16Shopify and rjmetrics 2.25.16
Shopify and rjmetrics 2.25.16Janessa Lantz
 
The Ultimate 30-Minute Guide to SaaS Analytics
The Ultimate 30-Minute Guide to SaaS AnalyticsThe Ultimate 30-Minute Guide to SaaS Analytics
The Ultimate 30-Minute Guide to SaaS AnalyticsJanessa Lantz
 
Using Benchmark Data to Improve Performance
Using Benchmark Data to Improve PerformanceUsing Benchmark Data to Improve Performance
Using Benchmark Data to Improve PerformanceJanessa Lantz
 
Jumpstart Your Momentum
Jumpstart Your MomentumJumpstart Your Momentum
Jumpstart Your MomentumJanessa Lantz
 
How to Build a Data-Driven Company: From Infrastructure to Insights
How to Build a Data-Driven Company: From Infrastructure to InsightsHow to Build a Data-Driven Company: From Infrastructure to Insights
How to Build a Data-Driven Company: From Infrastructure to InsightsJanessa Lantz
 
Logos, Brand and Underpants: One Startup's Journey to Finding Their Visual Id...
Logos, Brand and Underpants: One Startup's Journey to Finding Their Visual Id...Logos, Brand and Underpants: One Startup's Journey to Finding Their Visual Id...
Logos, Brand and Underpants: One Startup's Journey to Finding Their Visual Id...Janessa Lantz
 
How to Analyze Your Marketing Funnel Using Pardot + RJMetrics
How to Analyze Your Marketing Funnel Using Pardot + RJMetricsHow to Analyze Your Marketing Funnel Using Pardot + RJMetrics
How to Analyze Your Marketing Funnel Using Pardot + RJMetricsJanessa Lantz
 
The Insider’s Guide to Increasing Ecommerce Customer Lifetime Value
The Insider’s Guide to Increasing Ecommerce Customer Lifetime ValueThe Insider’s Guide to Increasing Ecommerce Customer Lifetime Value
The Insider’s Guide to Increasing Ecommerce Customer Lifetime ValueJanessa Lantz
 
Two Founders Share How Startups Can Reach a Massive Audience
Two Founders Share How Startups Can Reach a Massive AudienceTwo Founders Share How Startups Can Reach a Massive Audience
Two Founders Share How Startups Can Reach a Massive AudienceJanessa Lantz
 
Evaluating SaaS Startups: The Investor's Perspective
Evaluating SaaS Startups: The Investor's PerspectiveEvaluating SaaS Startups: The Investor's Perspective
Evaluating SaaS Startups: The Investor's PerspectiveJanessa Lantz
 
How to Build a $24 Million Ecommerce Company in 2 Years
How to Build a $24 Million Ecommerce Company in 2 YearsHow to Build a $24 Million Ecommerce Company in 2 Years
How to Build a $24 Million Ecommerce Company in 2 YearsJanessa Lantz
 
How to 2X Your Paid Search ROI Without More Conversions
How to 2X Your Paid Search ROI Without More ConversionsHow to 2X Your Paid Search ROI Without More Conversions
How to 2X Your Paid Search ROI Without More ConversionsJanessa Lantz
 
The Growth Hacking Skill No One's Talking About
The Growth Hacking Skill No One's Talking AboutThe Growth Hacking Skill No One's Talking About
The Growth Hacking Skill No One's Talking AboutJanessa Lantz
 

The Human Algorithm: Automating Startup Data Collection at Mattermark

  • 1. #datapointlive The Human Algorithm: Automating Startup Data Collection at Mattermark Sarah Catanzaro, Head of Data at Mattermark @sarahcat21
  • 2. #DPL15 | @sarahcat21 Mattermark is a deal intelligence platform and private company database used by
      ● investors
      ● business and corporate development
      ● sales
  • 3. THE CHALLENGE: Scale + Information Overload + Stealth
  • 4. Scale: Over 125 million private companies in the world (only about 45.5 thousand public).
  • 6. Stealth
      ● Private companies do not have strong incentives (e.g. legal obligations) to share data. Many may have competitive incentives to obfuscate information.
      ● Investors may request non-disclosure.
  • 8. Software-oriented approach
      ● A must, due to the scale of our dataset
          ○ 1.3 million companies
          ○ 16.5k investors
          ○ 110k funding events
      ● Leverage a lean data team
  • 9. Data collection strategy
      ● Web scraping
      ● Machine learning
      ● Direct submission
      ● Manual data entry
  • 10. The “Human Algorithm”
  • 11. Investors ask questions like:
      ● What start-ups might raise capital in the next 6 months?
      ● What startups is Stephanie Palmeri investing in?
  • 12. Our data analysts seek to understand:
      ● Why does this question matter?
      ● What data is required to answer this question?
      ● Where can this data be accessed?
  • 13. Next, data analysts:
      1. Define repeatable processes for data collection.
      2. Determine whether processes can be replicated through web scraping and/or machine learning algorithms to collect data at scale.
      3. Write functional specifications, reviewed by sales and engineering team members.
  • 14. Next, web and/or machine learning engineers:
      1. Write dev designs, reviewed by data analysts.
      2. Upon implementation and marketing release, this data becomes available to customers.
      3. New questions arise and the cycle starts again.
  • 16. Investors ask questions like:
      ● How much funding has a company already raised?
      ● Who were the investors at each of those rounds?
  • 17. Problems with existing sources
      ● Many rely on wiki-style data collection, so the credibility of sources cannot be confirmed.
      ● News reports are better, but:
          ○ facts are harder to extricate
          ○ different sources report different figures
  • 18. Solution: funding automation. A new framework for collecting and synthesizing funding data:
      1. News article fact extraction (machine learning)
      2. Funding override system (web engineering)
      3. Funding confirmation email campaign (marketing)
  • 19. News article fact extraction (step 1): crawl RSS feeds and extract data from stories (title, text, links, etc.)
      ● 750+ sources
      ● 5,000-10,000 articles
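The crawl step on this slide can be sketched as follows. This is a minimal illustration rather than Mattermark's crawler: it parses an RSS 2.0 document that is already in memory and pulls out the title, link, and description of each story. The feed content, the `parse_feed` name, and the field names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content standing in for one of the crawled sources.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Startup News</title>
    <item>
      <title>Acme Robotics raises $12M Series A</title>
      <link>https://example.com/acme-series-a</link>
      <description>Acme Robotics announced a $12M Series A led by Example Ventures.</description>
    </item>
    <item>
      <title>Ten tips for better stand-ups</title>
      <link>https://example.com/stand-ups</link>
      <description>How to keep daily meetings short.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> with its title, link, and text (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        stories.append({
            # findtext returns None when the child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return stories


stories = parse_feed(SAMPLE_FEED)
for story in stories:
    print(story["title"], "->", story["link"])
```

A real crawler would also fetch each feed over HTTP, follow the links to the full article text, and skip stories it has already seen; those parts are left out here.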
  • 20. News article fact extraction (step 1): classify stories about funding
      ● 250 articles/day
  • 21. News article fact extraction (step 1): identify sentences containing information about investors, amount, and/or series
  • 22. News article fact extraction (step 1)
      ● Extract facts
      ● Match companies and investors to entities in our database
          ○ 30% of extracted articles are entered automatically
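The sentence identification, fact extraction, and entity matching steps can be illustrated with a small sketch. The slides describe a machine learning approach; the code below swaps in plain regular expressions and a dictionary lookup purely to show the shape of the data flow. The patterns, the `KNOWN_ENTITIES` table, and every function name are hypothetical.

```python
import re

# Hypothetical patterns; the slides describe machine learning, not regexes.
AMOUNT_RE = re.compile(r"\$(\d+(?:\.\d+)?)\s*(million|billion|[MBK])\b", re.IGNORECASE)
SERIES_RE = re.compile(r"\b(seed|series\s+[A-F])\b", re.IGNORECASE)
INVESTOR_RE = re.compile(r"\b(?:led by|from)\s+([A-Z][\w&]*(?:\s+[A-Z][\w&]*)*)")

MULTIPLIERS = {"k": 1e3, "m": 1e6, "million": 1e6, "b": 1e9, "billion": 1e9}

# Hypothetical slice of an entity database: normalized name -> entity id.
KNOWN_ENTITIES = {"acme robotics": 101, "example ventures": 202}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows ., !, or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_facts(sentence):
    """Return whichever of amount, series, and investor the sentence mentions."""
    facts = {}
    m = AMOUNT_RE.search(sentence)
    if m:
        facts["amount_usd"] = float(m.group(1)) * MULTIPLIERS[m.group(2).lower()]
    m = SERIES_RE.search(sentence)
    if m:
        facts["series"] = " ".join(m.group(1).split()).title()
    m = INVESTOR_RE.search(sentence)
    if m:
        facts["investor"] = m.group(1)
    return facts


def match_entity(name):
    """Look a name up in the entity table; None means no match."""
    return KNOWN_ENTITIES.get(name.strip().lower())


article = ("Acme Robotics makes warehouse robots. "
           "The company announced a $12M Series A led by Example Ventures. "
           "It plans to hire.")

# Keep only the sentences that carry at least one funding fact.
funding_sentences = [(s, extract_facts(s)) for s in split_sentences(article)]
funding_sentences = [(s, f) for s, f in funding_sentences if f]

for sentence, facts in funding_sentences:
    print(facts, "investor id:", match_entity(facts.get("investor", "")))
```

In this sketch an article whose investor or company name is missing from the table returns `None` from `match_entity`, which is the kind of case a pipeline would route to manual review instead of entering automatically.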
  • 23. Funding override system (step 2)
      ● Identify reports about the same funding event
      ● Combine information from multiple reports using the Wongi rules engine
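The two override steps, grouping reports about the same funding event and then combining them, can be sketched in a few lines. The slide names the wongi rules engine; the code below does not use it and instead writes the merge rules as ordinary Python. The report schema, the `trust` score, the 30-day window, and the specific rules are all assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical reports extracted from different articles.
REPORTS = [
    {"company_id": 101, "series": "Series A", "amount_usd": 12_000_000,
     "investors": {"Example Ventures"}, "date": date(2015, 6, 1), "trust": 2},
    {"company_id": 101, "series": "Series A", "amount_usd": 12_500_000,
     "investors": {"Example Ventures", "Sample Capital"},
     "date": date(2015, 6, 3), "trust": 3},
    {"company_id": 101, "series": "Series B", "amount_usd": 40_000_000,
     "investors": {"Sample Capital"}, "date": date(2016, 9, 10), "trust": 1},
]

# Assumption: same company and series reported within 30 days is one event.
WINDOW_DAYS = 30


def same_event(a, b):
    return (a["company_id"] == b["company_id"]
            and a["series"] == b["series"]
            and abs((a["date"] - b["date"]).days) <= WINDOW_DAYS)


def group_reports(reports):
    """Walk reports in date order, attaching each to the first matching group."""
    groups = []
    for report in sorted(reports, key=lambda r: r["date"]):
        for group in groups:
            if same_event(group[-1], report):
                group.append(report)
                break
        else:
            groups.append([report])
    return groups


def merge(group):
    """Combine one group of reports into a single funding event."""
    best = max(group, key=lambda r: r["trust"])
    return {
        "company_id": best["company_id"],
        "series": best["series"],
        "amount_usd": best["amount_usd"],  # rule: most trusted source wins
        "investors": set().union(*(r["investors"] for r in group)),  # rule: union
        "date": min(r["date"] for r in group),  # rule: earliest report date
        "sources": len(group),
    }


events = [merge(g) for g in group_reports(REPORTS)]
for event in events:
    print(event["series"], event["amount_usd"], sorted(event["investors"]))
```

Keeping each rule as one line in `merge` makes it easy for analysts and developers to review the rules together, which is the alignment problem the later slides describe.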
  • 24. Funding confirmation email campaign (step 3): use CRM and HubSpot to automatically send emails to founders after equity financing.
  • 26. Where we struggled: Our initial implementation of a funding override system was inefficient. Why? Because our data analysts and developers were not aligned on functional requirements.
  • 27. Solution
      ● Analysts must work closely with developers
          ○ Pre-spec check-ins
          ○ Analysts review dev designs to ensure that the system design addresses the use case.
      ● Analysts must avoid being prescriptive
      ● Analysts must understand data mining and machine learning concepts
  • 28. Where we succeeded: Implementation of news article fact extraction was successful. Why? Because data analysts and developers worked as service providers to each other.
  • 30. 1. Tighter Analyst + Dev Communication. Tiger teams: 1 ML developer, 1 web/infrastructure developer, 1 data analyst, 1 project lead. Define milestones and hold daily stand-ups.
  • 31. 3. Track II interactions reinforce the symbiotic relationship
      ● Devs lead a Python learning group
      ● Data analysts hold seminars on topics like admin tooling and alternative assets