This data analytics report explores New York neighborhoods for Italian restaurants. It is the Battle of Neighborhoods, where some neighborhoods win while others lose. The analysis was done in Python using libraries such as Pandas, together with the Foursquare API. The report forms part of the IBM Capstone Project for the "Applied Data Science" Specialization.
Exploring New York Neighborhoods for the best Italian Restaurants (The Battle of Neighborhoods)
Exploring New York Neighborhoods
for the best Italian Restaurants
Using Data Analytics
(The Battle of Neighborhoods)
CHIBUIKE OSIGWE
Preface
As part of the Capstone Project of the IBM Data Science professional program, we
worked with real datasets to experience what a data scientist goes through in real
life. The main objectives of this project were to define a business problem, look for
data on the web, and use Foursquare location data to compare different
neighborhoods of New York to determine which neighborhood is suitable for starting
a new restaurant business. In this project, we go through the whole process step by
step, from problem design and data preparation to final analysis, and finally
provide a conclusion that business stakeholders can use to make their decisions.
Content
Preface
Content
Introduction
1.1 Background
1.2 Problem
1.3 Target Audience
Data Acquisition and Methodology
2.1 Data Source
2.2 Methodology
Exploratory Data Analysis
3.1 Number of Neighborhoods
3.2 Italian Restaurants Per Borough
3.3 Italian Restaurants Per Neighborhood
Conclusion and Recommendation
4.1 Recommendation and Discussion
4.2 Conclusion
Introduction
1.1 Background
New York City (NYC), often called the City of New York or simply New York (NY),
is the most populous city in the United States. With an estimated 2018 population
of 8,398,748 distributed over about 302.6 square miles (784 km²), New York is also
the most densely populated major city in the United States.[10] Located at the
southern tip of the U.S. state of New York, the city is the center of the New York
metropolitan area, the largest metropolitan area in the world by urban
landmass.[11] With almost 20 million people in its metropolitan statistical area
and approximately 23 million in its combined statistical area, it is one of the
world's most populous megacities. New York City has been described as the
cultural, financial, and media capital of the world, significantly influencing
commerce,[12] entertainment, research, technology, education, politics, tourism,
art, fashion, and sports. Home to the headquarters of the United Nations,[13] New
York is an important center for international diplomacy.[14][15]
Situated on one of the world's largest natural harbors, New York City is composed
of five boroughs, each of which is a county of the State of New York.[16] The five
boroughs–Brooklyn, Queens, Manhattan, the Bronx, and Staten Island–were
consolidated into a single city in 1898.[17] The city and its metropolitan area
constitute the premier gateway for legal immigration to the United States. As many
as 800 languages are spoken in New York,[18] making it the most linguistically
diverse city in the world. New York is home to more than 3.2 million residents born
outside the United States,[19] the largest foreign-born population of any city in
the world as of 2016.[20][21] As of 2019, the New York metropolitan area is
estimated to produce a gross metropolitan product (GMP) of $2.0 trillion. If
greater New York City were a sovereign state, it would have the 12th highest GDP in
the world.[22] New York is home to the highest number of billionaires of any city
in the world.
Figure 1: A Typical Italian Restaurant
1.2 Problem
This final project explores the best locations for Italian restaurants throughout the
city of New York. Food Business News stated that worldwide pasta sales were up
for the second year in a row, with the United States holding the largest market
(Donley, 2018). New York is a major metropolitan area with more than 8.4 million
people (Quick Facts, 2018) living within city limits. Most of the Italian immigration
into the United States occurred during the late 19th and early 20th centuries, with
over two million immigrants arriving between 1900 and 1910. Italian families first
settled in the Little Italy neighborhood around Mulberry Street, which has continued
to thrive ever since. Italians account for one of the largest immigrant groups in the
United States, and with almost 100,000 Manhattan inhabitants reporting Italian
ancestry, the demand for Italian cuisine is on the rise. This report explores which
neighborhoods and boroughs of New York City have the most as well as the best
Italian restaurants. Additionally, I will attempt to answer the questions “Where
should I open an Italian restaurant?” and “Where should I stay if I want great
Italian food?”
1.3 Target Audience
Who will be most interested in this project? What types of clients or groups of
people will benefit?
1. Business people who want to invest in or open an Italian restaurant in New
York. This analysis will be a comprehensive guide to starting or expanding
restaurants targeting the Italian crowd.
2. Freelancers who would love to have their own restaurant as a side business. This
analysis will give an idea of how beneficial it is to open a restaurant and what
the pros and cons of this business are.
3. The Italian crowd who want to find neighborhoods with lots of options for Italian
restaurants.
4. Business analysts or data scientists who wish to analyze the neighborhoods
of New York using exploratory data analysis and other statistical and machine
learning techniques to obtain all the necessary data, perform some operations
on it, and finally be able to tell a story out of it.
Data Acquisition and Methodology
2.1 Data Source
To answer the above questions, data is required on New York City neighborhoods and
boroughs, including boundaries, latitudes, and longitudes, as well as on restaurants
and their ratings and tips.
New York City data containing the neighborhoods and boroughs, latitudes,
and longitudes will be obtained from the data
source: https://cocl.us/new_york_dataset
New York City data containing neighborhood boundaries will be obtained
from the data source:
https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
All data related to locations and quality of Italian restaurants will be
obtained via the Foursquare API, accessed through the requests library in Python.
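As a minimal sketch of the first step, the neighborhood file can be flattened into one record per neighborhood. The layout assumed here (a GeoJSON-style `features` list with `properties.borough`, `properties.name`, and `geometry.coordinates`) is an assumption about the dataset, and the inline sample with approximate coordinates stands in for the downloaded file.

```python
import json

# Tiny inline sample standing in for the downloaded file.
# The structure and the approximate coordinates are assumptions for illustration.
SAMPLE = json.loads("""
{"features": [
  {"properties": {"borough": "Bronx", "name": "Belmont"},
   "geometry": {"coordinates": [-73.888, 40.857]}},
  {"properties": {"borough": "Manhattan", "name": "Lenox Hill"},
   "geometry": {"coordinates": [-73.959, 40.768]}}
]}
""")


def neighborhoods_to_rows(data):
    """Flatten GeoJSON-style features into one dict per neighborhood."""
    rows = []
    for feature in data["features"]:
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows


rows = neighborhoods_to_rows(SAMPLE)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` to obtain the data frame used in the rest of the analysis.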
2.2 Methodology
Data will be collected from https://cocl.us/new_york_dataset, then cleaned and
processed into a data frame. Foursquare will be used to locate all venues, which will
then be filtered to Italian restaurants. Ratings, tips, and likes by users will be
counted and added to the data frame. Data will be sorted based on rankings. Finally,
the data will be visually assessed using plots from various Python libraries.
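The filter-and-rank step described above can be sketched as follows. The venue records, their field names (`category`, `rating`), and the values are hypothetical placeholders, not results from the analysis.

```python
# Hypothetical venue records; names, fields, and ratings are illustrative only.
venues = [
    {"name": "Venue A", "category": "Italian Restaurant", "rating": 8.1},
    {"name": "Venue B", "category": "Coffee Shop", "rating": 9.0},
    {"name": "Venue C", "category": "Italian Restaurant", "rating": 8.7},
    {"name": "Venue D", "category": "Italian Restaurant", "rating": None},
]


def rank_italian(all_venues):
    """Keep Italian restaurants and sort them by rating, best first."""
    italian = [v for v in all_venues if v["category"] == "Italian Restaurant"]
    # Unrated venues sort last; rated venues sort by descending rating.
    return sorted(
        italian,
        key=lambda v: (v["rating"] is None, -(v["rating"] or 0)),
    )


ranked = rank_italian(venues)
print([v["name"] for v in ranked])
```

Placing unrated venues last is a design choice for this sketch; dropping them instead would be equally reasonable.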
Exploratory Data Analysis
3.1 Number of Neighborhoods
The Foursquare API is a widely used online service, relied on by many developers and
by applications such as Uber. In this project I have used it to retrieve information
about the places present in the neighborhoods of New York. The API returns a JSON
response, which we need to turn into a data frame. Here I have requested up to 100
popular spots for each neighborhood within a radius of 1 km.
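A rough sketch of such a request is shown below. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `response.groups[0].items` payload shape; the credentials, version date, coordinates, and sample payload are placeholders, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed legacy v2 endpoint; newer Foursquare API versions differ.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=1000, limit=100):
    """Build a request URL for up to `limit` venues within `radius` meters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (placeholder value)
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighborhood
        "radius": radius,      # 1 km, as stated in the text
        "limit": limit,        # 100 venues, as stated in the text
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def flatten_items(payload):
    """Return (venue name, category name) pairs from an explore-style payload."""
    items = payload["response"]["groups"][0]["items"]
    return [
        (item["venue"]["name"], item["venue"]["categories"][0]["name"])
        for item in items
    ]


url = build_explore_url(40.857, -73.888, "YOUR_ID", "YOUR_SECRET")

# Hand-written payload mimicking the assumed response shape.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A",
               "categories": [{"name": "Italian Restaurant"}]}},
]}]}}
pairs = flatten_items(sample_payload)
print(url)
print(pairs)
```

In practice the URL would be fetched with `requests.get(url).json()` and the flattened pairs loaded into a data frame.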
From Figure 2 below, it can be seen that Manhattan has the lowest number of
neighborhoods while Queens has the highest. Brooklyn and Staten Island appear to be
on par, which shows a degree of competition between the two boroughs.
Using the Folium package, the coordinates of the neighborhoods belonging to the five
boroughs were plotted on a map after being retrieved. This can be found in Figure 3.
3.2 Italian Restaurants Per Borough
A total of 233 Italian restaurants were returned by the analysis, each belonging to
a particular borough and neighborhood.
Figure 2: Neighborhoods per Borough
Figure 3: A Snapshot of the Boroughs and Neighborhoods around New York
Figure 4: Italian Restaurants Per Borough
From Figure 4 above, it can be deduced that Manhattan has the highest number of
Italian restaurants despite having the fewest neighborhoods, with up to 100 Italian
restaurants in the borough. Queens has the fewest, with a total of 20. Additionally,
Brooklyn and Staten Island are almost on par, showing strong competition between
the two.
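The per-borough totals behind such a chart can be computed with a simple tally. This is a sketch on made-up rows, assuming each restaurant is stored as a (borough, neighborhood) pair; the counts are illustrative and not the figures reported above.

```python
from collections import Counter

# Illustrative (borough, neighborhood) rows; not the real result set.
restaurants = [
    ("Manhattan", "Greenwich Village"),
    ("Manhattan", "West Village"),
    ("Manhattan", "Lenox Hill"),
    ("Bronx", "Belmont"),
    ("Bronx", "Belmont"),
    ("Queens", "Example Neighborhood"),
]

# Count restaurants per borough and list boroughs from most to fewest.
per_borough = Counter(borough for borough, _ in restaurants)
ranked_boroughs = per_borough.most_common()

for borough, count in ranked_boroughs:
    print(f"{borough}: {count}")
```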
Figure 5: A picture of the Neighborhoods and Boroughs showing the total number
of Italian restaurants
Figure 6: Italian Restaurants Per Neighborhood
This shows that Manhattan accounts for the highest number of Italian restaurants
despite having the smallest number of neighborhoods. Figure 5 shows the returned
totals of Italian restaurants.
3.3 Italian Restaurants Per Neighborhood
From Figure 6, it can be deduced that the neighborhood of Belmont has the highest
number of Italian restaurants, with more than 16. It is followed by Greenwich
Village, then West Village, down to Lenox Hill, which has the lowest. The
distribution of Italian restaurant counts is highly skewed, showing that the
restaurants are spread unevenly across the neighborhoods.
From Figure 6, it is evident that the Belmont neighborhood belongs to the Bronx.
This means that the Bronx contains the single neighborhood with the highest number
of Italian restaurants.
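Finding the leading neighborhood and the borough it belongs to amounts to a tally followed by a lookup. The sketch below assumes a hypothetical neighborhood-to-borough mapping and uses made-up counts.

```python
from collections import Counter

# Hypothetical lookup table built from the neighborhood dataset.
neighborhood_to_borough = {
    "Belmont": "Bronx",
    "Greenwich Village": "Manhattan",
    "West Village": "Manhattan",
}

# One entry per Italian restaurant; the counts are illustrative only.
restaurant_neighborhoods = [
    "Belmont", "Belmont", "Belmont",
    "Greenwich Village", "Greenwich Village",
    "West Village",
]

counts = Counter(restaurant_neighborhoods)
top_neighborhood, top_count = counts.most_common(1)[0]
top_borough = neighborhood_to_borough[top_neighborhood]

print(f"{top_neighborhood} ({top_borough}): {top_count} restaurants")
```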
Figure 8: Map Showing the Restaurant Density of the Neighborhoods and Boroughs
The map shows dense clusters of restaurants around Manhattan and Lenox Hill,
judging from their locations.
Conclusion and Recommendation
4.1 Recommendation and Discussion
Queens and The Bronx have the fewest Italian restaurants per borough.
However, of note, Belmont in The Bronx is the neighborhood with the most Italian
restaurants in all of NYC. Despite having the fewest neighborhoods of all five
boroughs, Manhattan has the most Italian restaurants. Based on this information, I
would state that Manhattan and Queens are the best locations for Italian cuisine in
NYC. To have the best shot at success, I would open an Italian restaurant in Queens.
Queens has many neighborhoods and the fewest Italian restaurants, making
competition easier than in other boroughs.
According to this analysis, Queens will provide the least competition for a new
Italian restaurant, as Italian restaurants are sparse there and a few neighborhoods
have none. Also, judging from the population distribution, the borough appears to be
densely populated with people of Italian descent, which helps a new restaurant by
raising the likelihood of customer visits. Therefore, this region could be a very
good place for starting quality Italian restaurants.
Some drawbacks of this analysis are that it is based only on data obtained from the
Foursquare API, and that the data on the Italian population distribution in each
neighborhood is based on the 2016 census, which is not up to date. Thus, there is a
gap of around 3 years in the population distribution data. Even though there are
many areas where it can be improved, this analysis has provided some good insights,
preliminary information on possibilities, and a head start on this business problem
by laying the groundwork properly.
4.2 Conclusion
Finally, to conclude this project, we have had a chance to solve a business problem
the way a real-life data scientist would. We used many Python libraries to fetch the
data, manipulate its contents, and analyze and visualize the datasets. We made use
of the Foursquare API to explore the venues in the neighborhoods of New York and
obtained a good amount of data from online sources. We also applied visualization
techniques to gain insights and used Folium to plot the results on a map.
The drawbacks and areas for improvement show that this analysis can be further
improved with the help of more data and simpler code. Similarly, we can use this
project to analyze other scenarios, such as opening a restaurant with a different
cuisine or opening a new gym. I hope that this project serves as initial guidance
for taking on more complex real-life challenges using data science.
Find the code for this analysis on GitHub.
Find me on LinkedIn!