Exploring New York Neighborhoods
for the best Italian Restaurants
Using Data Analytics
(The Battle of Neighborhoods)
CHIBUIKE OSIGWE
Preface
As part of the IBM Data Science professional program capstone project, we worked
with real datasets to experience what a data scientist goes through in practice. The
main objectives of this project were to define a business problem, find data on the
web, and use Foursquare location data to compare neighborhoods of New York and
determine which neighborhood is suitable for starting a new restaurant business.
This report goes through the process step by step, from problem design and data
preparation to the final analysis, and closes with a conclusion that business
stakeholders can use to inform their decisions.
Content
Preface
Introduction
  1.1 Background
  1.2 Problem
  1.3 Target Audience
Data Acquisition and Methodology
  2.1 Data Source
  2.2 Methodology
Exploratory Data Analysis
  3.1 Number of Neighborhoods
  3.2 Italian Restaurants Per Borough
  3.3 Italian Restaurants Per Neighborhood
Conclusion and Recommendation
  4.1 Recommendation and Discussion
  4.2 Conclusion
Introduction
1.1 Background
New York City (NYC), often called the City of New York or simply New York (NY), is
the most populous city in the United States. With an estimated 2018 population of
8,398,748 distributed over about 302.6 square miles (784 km²), New York is also the
most densely populated major city in the United States.[10] Located at the southern
tip of the U.S. state of New York, the city is the center of the New York
metropolitan area, the largest metropolitan area in the world by urban landmass.[11]
With almost 20 million people in its metropolitan statistical area and approximately
23 million in its combined statistical area, it is one of the world's most populous
megacities. New York City has been described as the cultural, financial, and media
capital of the world, significantly influencing commerce,[12] entertainment,
research, technology, education, politics, tourism, art, fashion, and sports. Home to
the headquarters of the United Nations,[13] New York is an important center for
international diplomacy.[14][15]
Situated on one of the world's largest natural harbors, New York City is composed
of five boroughs, each of which is a county of the State of New York.[16] The five
boroughs (Brooklyn, Queens, Manhattan, the Bronx, and Staten Island) were
consolidated into a single city in 1898.[17] The city and its metropolitan area
constitute the premier gateway for legal immigration to the United States. As many
as 800 languages are spoken in New York,[18] making it the most linguistically
diverse city in the world. New York is home to more than 3.2 million residents born
outside the United States,[19] the largest foreign-born population of any city in the
world as of 2016.[20][21] As of 2019, the New York metropolitan area is estimated to
produce a gross metropolitan product (GMP) of $2.0 trillion. If greater New York
City were a sovereign state, it would have the 12th highest GDP in the world.[22]
New York is home to the highest number of billionaires of any city in the world.
Figure 1: A Typical Italian Restaurant
1.2 Problem
This final project explores the best locations for Italian restaurants throughout
New York City. Food Business News stated that worldwide pasta sales were up for the
second year in a row, with the United States holding the largest market (Donley,
2018). New York is a major metropolitan area with more than 8.4 million people
living within city limits (Quick Facts, 2018). Most Italian immigration into the
United States occurred during the late 19th and early 20th centuries, with over two
million immigrants arriving between 1900 and 1910. Italian families first settled in
the Little Italy neighborhood around Mulberry Street, which has continued to thrive
ever since. Italians have historically been one of the largest immigrant groups in
the United States, and with almost 100,000 Manhattan inhabitants reporting Italian
ancestry, the need to find and enjoy Italian cuisine is on the rise. This report
explores which neighborhoods and boroughs of New York City have the most, as well
as the best, Italian restaurants. Additionally, I will attempt to answer the
questions “Where should I open an Italian restaurant?” and “Where should I stay if I
want great Italian food?”
1.3 Target Audience
Who will be most interested in this project, and which clients or groups of people
will benefit?
1. Business people who want to invest in or open an Italian restaurant in New
York. This analysis can serve as a guide for starting or expanding
restaurants that target the Italian crowd.
2. Freelancers who would like to run their own restaurant as a side business. This
analysis gives an idea of how beneficial opening a restaurant can be and of
the pros and cons of the business.
3. Members of the Italian community who want to find neighborhoods with many
Italian restaurant options.
4. Business analysts or data scientists who wish to analyze the neighborhoods
of New York using exploratory data analysis and other statistical and machine
learning techniques: obtaining the necessary data, processing it, and finally
telling a story with it.
Data Acquisition and Methodology
2.1 Data Source
To answer the above questions, the following data are required: New York City
neighborhoods and boroughs, including boundaries, latitudes, and longitudes;
restaurants; and restaurant ratings and tips.
• New York City data containing the neighborhoods and boroughs, with latitudes
and longitudes, will be obtained from https://cocl.us/new_york_dataset
• New York City data containing neighborhood boundaries will be obtained from
https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
• All data related to the locations and quality of Italian restaurants will be
obtained from the Foursquare API, accessed via the requests library in Python.
2.2 Methodology
Data will be collected from https://cocl.us/new_york_dataset, then cleaned and
processed into a data frame. Foursquare will be used to locate all venues, which
will then be filtered down to Italian restaurants. Ratings, tips, and likes by users
will be counted and added to the data frame, and the data will be sorted by these
rankings. Finally, the data will be assessed visually using plots from various
Python libraries.
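The first step, turning the neighborhood dataset into rows for a data frame, could look roughly like the sketch below. It assumes the downloaded file is GeoJSON-like, with a "features" list whose entries carry "properties" (borough, name) and "geometry" (coordinates in longitude, latitude order); this layout is an assumption and is not confirmed by this report. The sample record set and its approximate coordinates are placeholders for illustration only.

```python
import json

# Tiny stand-in for the downloaded file. The real dataset is ASSUMED to share
# this layout; the coordinates are rough placeholders, not authoritative values.
SAMPLE = json.loads("""
{"features": [
  {"properties": {"borough": "Bronx", "name": "Belmont"},
   "geometry": {"coordinates": [-73.888, 40.857]}},
  {"properties": {"borough": "Manhattan", "name": "Lenox Hill"},
   "geometry": {"coordinates": [-73.959, 40.768]}}
]}
""")


def neighborhoods_to_rows(dataset):
    """Flatten each feature into a row with borough, name, latitude, longitude."""
    rows = []
    for feature in dataset["features"]:
        # GeoJSON stores points as [longitude, latitude]
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows


rows = neighborhoods_to_rows(SAMPLE)
for row in rows:
    print(row)
```

A list of dictionaries like this can be handed directly to a data frame constructor for the later cleaning and sorting steps.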
Exploratory Data Analysis
3.1 Number of Neighborhoods
The Foursquare API is a widely used online service that many developers and
applications, such as Uber, rely on. In this project I used it to retrieve
information about the places in the neighborhoods of New York. The API returns a
JSON response, which needs to be turned into a data frame. I requested up to 100
popular spots for each neighborhood within a radius of 1 km.
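A minimal sketch of this request-and-flatten step follows. It assumes the legacy Foursquare v2 "venues/explore" endpoint with `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters, and a response nested as `response.groups[].items[].venue`; both are assumptions that should be checked against the current Foursquare documentation. The helper names, the credentials, the version date, and the sample payload are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode

# ASSUMED legacy Foursquare v2 endpoint; verify against current API docs.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=1000, limit=100):
    """Build a URL asking for up to `limit` venues within `radius` metres."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date (placeholder value)
        "ll": f"{lat},{lng}",   # latitude,longitude of the neighborhood
        "radius": radius,       # 1000 m = 1 km, as in the text
        "limit": limit,         # 100 venues per neighborhood, as in the text
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def italian_venues(payload):
    """Flatten an explore-style payload and keep Italian restaurants only."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            category = categories[0]["name"] if categories else None
            if category == "Italian Restaurant":
                rows.append({"ID": venue["id"],
                             "Name": venue["name"],
                             "Category": category})
    return rows


# Hypothetical payload shaped like the assumed response; not real API output.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"id": "a1", "name": "Trattoria Example",
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"id": "b2", "name": "Example Deli",
               "categories": [{"name": "Deli / Bodega"}]}},
]}]}}

url = build_explore_url(40.857, -73.888, "YOUR_ID", "YOUR_SECRET")
venues = italian_venues(sample_payload)
print(url)
print(venues)
```

In practice the URL would be fetched with the requests library and the decoded JSON passed to the flattening function, once per neighborhood.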
Figure 2 shows that Manhattan has the fewest neighborhoods, while Queens has the
most. Brooklyn and Staten Island are roughly on par with each other.
Using the Folium package, the coordinates of the neighborhoods belonging to the
five boroughs were plotted on a map, shown in Figure 3.
3.2 Italian Restaurants Per Borough
A total of 233 Italian restaurants were returned by the analysis, each belonging to
a particular borough and neighborhood.
Figure 2: Neighborhoods per Borough
Figure 3: A Snapshot of the Boroughs and Neighborhoods around New York
Figure 4: Italian Restaurants Per Borough
Figure 4 shows that Manhattan has the highest number of Italian restaurants despite
having the fewest neighborhoods, with up to 100 Italian restaurants in the borough.
Queens has the fewest, with a total of 20. Brooklyn and Staten Island are again
almost on par with each other.
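A per-borough tally like the one behind this chart can be sketched with a simple counter. The rows below are placeholder (borough, neighborhood) pairs, one per restaurant, made up for illustration; they are not the report's data, and "Astoria" is only a stand-in name.

```python
from collections import Counter

# Placeholder rows, one per Italian restaurant; NOT the report's results.
restaurants = [
    ("Manhattan", "Greenwich Village"),
    ("Manhattan", "West Village"),
    ("Manhattan", "Lenox Hill"),
    ("Bronx", "Belmont"),
    ("Bronx", "Belmont"),
    ("Queens", "Astoria"),
]

# Count restaurants by borough, ignoring the neighborhood part of each pair
per_borough = Counter(borough for borough, _ in restaurants)

# most_common() returns (borough, count) pairs sorted by count, highest first
for borough, count in per_borough.most_common():
    print(f"{borough}: {count}")
```

The sorted pairs are what a bar chart per borough would be drawn from.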
Figure 5: A picture of the Neighborhoods and Boroughs showing the total number
of Italian restaurants
Figure 6: Italian Restaurants Per Neighborhood
This shows that Manhattan accounts for the highest number of Italian restaurants
despite having the smallest number of neighborhoods. Figure 5 shows the returned
values for the total number of Italian restaurants.
3.3 Italian Restaurants Per Neighborhood
Figure 6 shows that the neighborhood of Belmont has the highest number of Italian
restaurants, with over 16. It is followed by Greenwich Village and West Village,
down to Lenox Hill, which has the lowest count among the neighborhoods shown. The
distribution of counts is highly skewed: a few neighborhoods hold many Italian
restaurants, while the rest are dispersed thinly across the other neighborhoods.
Figure 7 shows that Belmont belongs to the Bronx. The Bronx therefore contains the
single neighborhood with the most Italian restaurants.
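Finding the top neighborhood and the borough it belongs to can be sketched as below. The rows are again placeholder data, not the report's results. Counting by the (borough, neighborhood) pair rather than by name alone is a design choice made here because a neighborhood name may appear in more than one borough.

```python
from collections import Counter

# Placeholder rows, one per Italian restaurant; NOT the report's results.
restaurants = [
    ("Bronx", "Belmont"),
    ("Bronx", "Belmont"),
    ("Bronx", "Belmont"),
    ("Manhattan", "Greenwich Village"),
    ("Manhattan", "Greenwich Village"),
    ("Manhattan", "Lenox Hill"),
]

# Key on the full pair so the borough travels with the neighborhood count
per_neighborhood = Counter(restaurants)

# The first entry of most_common() is the pair with the highest count
(top_borough, top_neighborhood), top_count = per_neighborhood.most_common(1)[0]
print(f"{top_neighborhood} ({top_borough}): {top_count} Italian restaurants")
```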
Figure 7: Belmont Neighborhood
Figure 8: Map Showing the Restaurant Density of the Neighborhoods and Boroughs
The map shows dense clustering of Italian restaurants around Manhattan, including
Lenox Hill.
Conclusion and Recommendation
4.1 Recommendation and Discussion
Queens and the Bronx have the fewest Italian restaurants per borough. Notably,
however, Belmont in the Bronx is the neighborhood with the most Italian restaurants
in all of NYC. Although Manhattan has the fewest neighborhoods of the five
boroughs, it has the most Italian restaurants. Based on this information, I would
state that Manhattan and Queens are the best locations for Italian cuisine in NYC.
To have the best shot at success, I would open an Italian restaurant in Queens:
it has many neighborhoods and the fewest Italian restaurants, so competition
should be lighter than in other boroughs.
According to this analysis, Queens will provide the least competition for a new
Italian restaurant, as few Italian restaurants are spread across the borough and
some neighborhoods have none. The population distribution also suggests that the
borough is densely populated with Italian residents, which should help a new
restaurant by raising the likelihood of customer visits. This region could therefore
be a good place to start a quality Italian restaurant.
This analysis has some drawbacks: the clustering is based solely on data obtained
from the Foursquare API, and the data on the Italian population distribution in
each neighborhood comes from the 2016 census, which is not up to date. There is
thus a gap of around three years in the population distribution data. Although
there are many areas where the analysis can be improved, it has provided some good
insights, preliminary information on possibilities, and a head start on this
business problem.
4.2 Conclusion
To conclude, this project gave us a chance to solve a business problem the way a
real-life data scientist would. We used many Python libraries to fetch the data,
manipulate its contents, and analyze and visualize the datasets. We used the
Foursquare API to explore venues in the neighborhoods of New York and gathered a
good amount of data from online sources. We also applied visualization techniques
to gain insights and used Folium to display the results on a map.
The drawbacks noted above show that this analysis can be improved with more data
and simpler code. The same approach can be used to analyze other scenarios, such
as opening a restaurant with a different cuisine or opening a new gym. I hope this
project serves as initial guidance for taking on more complex real-life challenges
using data science.
Find the code for this analysis on GitHub.
Find me on LinkedIn!

  • 6. metropolitan area is estimated to produce a gross metropolitan product (GMP) of $2.0 trillion. If greater New York City were a sovereign state, it would have the 12th highest GDP in the world.[22] New York is home to the highest number of billionaires of any city in the world. Figure 1: A Typical Italian Restaurant 1.2 Problem This final project explores the best locations for Italian restaurants throughout the city of New York. Food Business News stated that worldwide pasta sales were up for the second year in a row, with the United States holding the largest market (Donley, 2018). New York is a major metropolitan area with more than 8.4 million people living within city limits (Quick Facts, 2018). Most of the Italian immigration
  • 7. into the United States occurred during the late 19th and early 20th centuries, with over two million immigrants arriving between 1900 and 1910. Italian families first settled in the Little Italy neighborhood around Mulberry Street, which has continued to thrive ever since. Italians form one of the largest immigrant communities in the United States, and with almost 100,000 Manhattan inhabitants reporting Italian ancestry, the demand for Italian cuisine is on the rise. This report explores which neighborhoods and boroughs of New York City have the most as well as the best Italian restaurants. Additionally, I will attempt to answer the questions “Where should I open an Italian restaurant?” and “Where should I stay if I want great Italian food?” 1.3 Target Audience Who will be most interested in this project, and which clients or groups of people will benefit? 1. Business people who want to invest in or open an Italian restaurant in New York. This analysis will serve as a guide to starting or expanding restaurants that target the Italian crowd. 2. Freelancers who would love to run their own restaurant as a side business. This analysis will give an idea of how beneficial it is to open a restaurant and what the pros and cons of this business are. 3. Members of the Italian community who want to find neighborhoods with many options for Italian restaurants. 4. Business analysts or data scientists who wish to analyze the neighborhoods of New York using exploratory data analysis and other statistical and machine learning techniques to obtain the necessary data, perform operations on it, and finally tell a story with it.
  • 8. Data Acquisition and Methodology 2.1 Data Source To answer the above questions, the following data are required: New York City neighborhoods and boroughs, including their boundaries, latitudes, and longitudes, as well as restaurants and their ratings and tips. 1. New York City data containing the neighborhoods and boroughs, latitudes, and longitudes will be obtained from the data source: https://cocl.us/new_york_dataset 2. New York City data containing neighborhood boundaries will be obtained from the data source: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm 3. All data related to the locations and quality of Italian restaurants will be obtained from the Foursquare API, accessed through the Requests library in Python. 2.2 Methodology Data will be collected from https://cocl.us/new_york_dataset, then cleaned and processed into a data frame. Foursquare will be used to locate all venues, which will then be filtered for Italian restaurants. Ratings, tips, and likes by users will be counted and added to the data frame. The data will be sorted based on rankings. Finally, the data will be assessed visually using plots from various Python libraries.
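The first step of the methodology, turning the neighborhood dataset into tabular rows, can be sketched as follows. This is a minimal sketch, assuming the file behind https://cocl.us/new_york_dataset is a GeoJSON-style document whose `features` entries carry `properties.borough`, `properties.name`, and a `geometry.coordinates` pair in `[longitude, latitude]` order. That layout and the helper name `neighborhood_rows` are assumptions, not details given in this report, and the sample record is made up for illustration.

```python
import json


def neighborhood_rows(geojson):
    """Flatten a GeoJSON-style dict into one row per neighborhood.

    Assumes each feature has properties.borough, properties.name and a
    point geometry stored as [longitude, latitude] (an assumed layout).
    """
    rows = []
    for feature in geojson.get("features", []):
        props = feature["properties"]
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append(
            {
                "Borough": props["borough"],
                "Neighborhood": props["name"],
                "Latitude": lat,
                "Longitude": lon,
            }
        )
    return rows


# Made-up sample in the assumed layout; the real file would be parsed with
# json.load() after downloading it.
sample = json.loads(
    """
    {"features": [
      {"properties": {"borough": "Bronx", "name": "Belmont"},
       "geometry": {"type": "Point", "coordinates": [-73.888, 40.857]}}
    ]}
    """
)
rows = neighborhood_rows(sample)
print(rows)
```

Passing `rows` to `pandas.DataFrame(rows)` would give the kind of data frame the methodology refers to.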
  • 9. Exploratory Data Analysis 3.1 Number of Neighborhoods The Foursquare API is a widely used online service, relied on by many developers and by applications such as Uber. In this project I used it to retrieve information about the places in the neighborhoods of New York. The API returns JSON, which needs to be converted into a data frame. I requested up to 100 popular spots for each neighborhood within a radius of 1 km. Figure 2 below shows that Manhattan has the fewest neighborhoods, while Queens has the most. Brooklyn and Staten Island appear to be roughly on par, so the two boroughs are closely matched on this measure. Using the Folium package, the coordinates of the neighborhoods belonging to the five boroughs were plotted on a map, shown in Figure 3. 3.2 Italian Restaurants Per Borough A total of 233 restaurants were returned from the analysis, each belonging to a particular borough and neighborhood.
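The venue lookup described in section 3.1 can be sketched as below. This is a hedged sketch, not the author's code: it assumes the Foursquare v2 `venues/explore` endpoint, a response shaped as `response.groups[0].items[*].venue`, and a category literally named "Italian Restaurant". The helper names, the placeholder credentials, and the version date are hypothetical; only the 1 km radius and the limit of 100 come from the text. No request is sent here; the filter runs on a made-up response.

```python
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # assumed endpoint


def explore_params(lat, lng, client_id, client_secret,
                   radius_m=1000, limit=100, version="20180605"):
    """Build query parameters for one neighborhood (1 km radius, 100 venues)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # API version date; this value is only a placeholder
        "ll": f"{lat},{lng}",
        "radius": radius_m,
        "limit": limit,
    }


def italian_restaurants(payload, category="Italian Restaurant"):
    """Return (name, lat, lng) for venues whose category matches."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    found = []
    for item in items:
        venue = item["venue"]
        names = [c["name"] for c in venue.get("categories", [])]
        if category in names:
            loc = venue["location"]
            found.append((venue["name"], loc["lat"], loc["lng"]))
    return found


# With the Requests library the call would look like:
#   payload = requests.get(EXPLORE_URL, params=explore_params(...)).json()
# Here a made-up payload stands in for the API response.
fake_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Trattoria Example",
                   "categories": [{"name": "Italian Restaurant"}],
                   "location": {"lat": 40.857, "lng": -73.888}}},
        {"venue": {"name": "Example Diner",
                   "categories": [{"name": "Diner"}],
                   "location": {"lat": 40.858, "lng": -73.889}}},
    ]}]}
}
params = explore_params(40.857, -73.888, "YOUR_ID", "YOUR_SECRET")
matches = italian_restaurants(fake_payload)
print(matches)
```

Keeping the parameter builder separate from the filter means the filtering logic can be checked without network access or real credentials.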
  • 10. Figure 2: Neighborhoods per borough Figure 3: A Snapshot of the Boroughs and Neighborhoods around New York
  • 11. Figure 4: Italian Restaurants Per Borough Figure 4 above shows that Manhattan has the highest number of Italian restaurants despite having the fewest neighborhoods, with up to 100 Italian restaurants in the borough. Queens has the fewest, with a total of 20. Brooklyn and Staten Island are again almost on par, indicating a similar level of competition in the two boroughs.
  • 12. Figure 5: A picture of the Neighborhoods and Boroughs showing the total number of Italian restaurants Figure 6: Italian Restaurants Per Neighborhood
  • 13. This shows that Manhattan accounts for the highest number of Italian restaurants despite having the smallest number of neighborhoods. Figure 5 shows the total number of Italian restaurants. 3.3 Italian Restaurants Per Neighborhood Figure 6 shows that the neighborhood of Belmont has the highest number of Italian restaurants, with more than 16. It is followed by Greenwich Village and then West Village, down to Lenox Hill, which has the lowest count among those shown. The distribution of restaurant counts is highly skewed, showing that Italian restaurants are unevenly dispersed across the neighborhoods. Figure 7 shows that Belmont belongs to the Bronx. This means that the Bronx contains the single neighborhood with the most Italian restaurants.
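The per-borough and per-neighborhood counts discussed above amount to a group-and-count over the list of returned restaurants. A minimal sketch with the standard library follows, using a handful of made-up records rather than the 233 restaurants from the analysis; the tuple layout and variable names are assumptions.

```python
from collections import Counter

# Made-up (borough, neighborhood) pairs, one per Italian restaurant found.
restaurants = [
    ("Bronx", "Belmont"),
    ("Bronx", "Belmont"),
    ("Bronx", "Belmont"),
    ("Manhattan", "Greenwich Village"),
    ("Manhattan", "Greenwich Village"),
    ("Manhattan", "West Village"),
    ("Manhattan", "Lenox Hill"),
    ("Queens", "Astoria"),
]

per_borough = Counter(borough for borough, _ in restaurants)
per_neighborhood = Counter(restaurants)  # keyed by (borough, neighborhood)

# Highest-count neighborhood and the borough it belongs to.
(top_borough, top_neighborhood), top_count = per_neighborhood.most_common(1)[0]

print(per_borough.most_common())
print(top_neighborhood, top_borough, top_count)
```

Even in this toy sample, the borough with the most restaurants overall and the borough containing the top neighborhood differ, which is the same pattern the report describes for Manhattan and Belmont.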
  • 14. Figure 7: Belmont Neighborhood
  • 15. Figure 8: Map Showing the Restaurant Density of the Neighborhoods and Boroughs The map shows dense clusters of Italian restaurants in Manhattan and around Lenox Hill, judging from their locations.
  • 16. Conclusion and Recommendation 4.1 Recommendation and Discussion Queens and the Bronx have the fewest Italian restaurants per borough. Notably, however, Belmont in the Bronx is the neighborhood with the most Italian restaurants in all of NYC. Despite having the fewest neighborhoods of the five boroughs, Manhattan has the most Italian restaurants. Based on this information, I would state that Manhattan and Queens are the best locations for Italian cuisine in NYC. To have the best chance of success, I would open an Italian restaurant in Queens. Queens has many neighborhoods and the fewest Italian restaurants, so according to this analysis it offers a new Italian restaurant less competition than any other borough: most of its neighborhoods have very few Italian restaurants, and some have none at all. The population distribution also suggests that the area has a sizeable Italian community, which should increase the likelihood of customer visits. This region could therefore be a very good place to start a quality Italian restaurant. This analysis has some drawbacks: the clustering is based only on data obtained from the Foursquare API, and the data on the Italian population distribution in each neighborhood comes from the 2016 census, which is not up to date. There is thus a gap of around 3 years in the population distribution data. Although there are many areas where the analysis could be improved, it has provided some good insights, preliminary information on the possibilities, and a head start on this business problem by laying the stepping stones properly.
  • 17. 4.2 Conclusion To conclude, this project gave us a chance to solve a business problem the way a real-life data scientist would. We used several Python libraries to fetch the data, manipulate its contents, and analyze and visualize the datasets. We used the Foursquare API to explore the venues in the neighborhoods of New York and gathered a good amount of supporting data from online sources. We also applied visualization techniques to gain insights and used Folium to display the results on a map. The drawbacks and areas for improvement noted above show that this analysis could be improved further with more data and simpler code. The same approach can be used to analyze similar scenarios, such as opening a restaurant with a different cuisine or opening a new gym. I hope this project serves as initial guidance for taking on more complex real-life challenges using data science. Find the code for this analysis on GitHub. Find me on LinkedIn!