Chengran Ouyang
2019-04-24
Requirement:
Download the data, load it in PyCharm, and provide an initial overview.
Visualize the locations of the car accidents.
Draw insights from the dataset (e.g. location, time of day).
Take weather data into consideration.
This time, I will use both R and Python to perform the analysis and present the results in R Markdown (R) and a Jupyter notebook (Python). The analysis follows a standard data science framework and answers the questions above. I also extend its scope to look for additional insights and explain my code in detail.
The dataset was obtained from Kaggle's open datasets. The data is available via NYC hourly car accidents 2013-2016.
import os
import folium
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from branca.colormap import LinearColormap
from folium.plugins import FastMarkerCluster, MarkerCluster
from IPython.display import HTML, display
sns.set()  # apply seaborn's default plot styling
colli = pd.read_csv("input/NYPD_Motor_Vehicle_Collisions.csv")
display(colli.head(5))
display(colli.tail(5))
print('Number of rows: ', colli.shape[0])
print('Number of columns: ', colli.shape[1])
colli.describe()
colli.info()
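As part of the initial overview, it can also help to see how many values are missing in each column. The following is a minimal sketch, not part of the original analysis: `missing_summary` is a hypothetical helper, and the toy frame with made-up values stands in for `colli`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for `colli`; the values are made up.
toy = pd.DataFrame({
    "LATITUDE": [40.7, np.nan, 40.6, np.nan],
    "LONGITUDE": [-73.9, np.nan, -74.0, -73.8],
    "BOROUGH": ["QUEENS", None, "BROOKLYN", "BRONX"],
})

summary = missing_summary(toy)
print(summary)
```

On the real data, the call would simply be `missing_summary(colli)`.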
As the value ranges in the summary above show, some latitude and longitude entries fall far outside any plausible range for New York City and need to be corrected or removed.
rev_colli = colli[(colli['LATITUDE']>0)&(colli['LONGITUDE']>-75)&(colli['LONGITUDE']<-72)]
After removing the rows with incorrect latitude and longitude values, I examine the data again, and the coordinates now show a more plausible range.
display(rev_colli.describe())
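The filter above only bounds latitude from below. A stricter alternative is to keep only rows inside an explicit bounding box. The sketch below assumes a rough box around New York City. The `NYC_BOUNDS` values and the `within_bounds` helper are my own illustrative choices, not part of the dataset or the original analysis, and the toy frame uses made-up coordinates.

```python
import numpy as np
import pandas as pd

# Assumed rough bounding box around New York City; adjust as needed.
NYC_BOUNDS = {"lat_min": 40.4, "lat_max": 41.0, "lon_min": -74.3, "lon_max": -73.6}


def within_bounds(df: pd.DataFrame, bounds: dict = NYC_BOUNDS) -> pd.DataFrame:
    """Keep rows whose coordinates fall inside the bounding box.

    Rows with missing coordinates are dropped, because `between`
    evaluates to False for NaN.
    """
    lat_ok = df["LATITUDE"].between(bounds["lat_min"], bounds["lat_max"])
    lon_ok = df["LONGITUDE"].between(bounds["lon_min"], bounds["lon_max"])
    return df[lat_ok & lon_ok]


# Toy frame with one valid row and three problematic ones (made-up values).
toy = pd.DataFrame({
    "LATITUDE": [40.7, 0.0, 40.8, np.nan],
    "LONGITUDE": [-73.9, 0.0, -201.2, np.nan],
})

cleaned = within_bounds(toy)
dropped = len(toy) - len(cleaned)
print(f"kept {len(cleaned)} rows, dropped {dropped}")
```

Reporting how many rows were dropped makes it easy to check that the filter is not discarding a large share of the data.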
rev1_colli = rev_colli.sample(2000)
lat = rev1_colli['LATITUDE'].tolist()
lon = rev1_colli['LONGITUDE'].tolist()
locations = list(zip(lat, lon))
map1 = folium.Map(location=[40.723205, -73.923021], zoom_start=11.5)
#FastMarkerCluster(data=locations).add_to(map1)
# icon = folium.Icon(icon='car'))
marker_cluster = MarkerCluster().add_to(map1)
for point in range(len(locations)):
    folium.Marker(
        location=locations[point],
        #popup='Add popup text here.',
        icon=folium.Icon(color='red', icon='remove-sign'),
    ).add_to(marker_cluster)
map1.save("map1.html")
map1
display(rev_colli[rev_colli['NUMBER OF MOTORIST INJURED']==43].iloc[:,0:10])
display(rev_colli[rev_colli['NUMBER OF MOTORIST INJURED']==43].iloc[:,10:20])
display(rev_colli[rev_colli['NUMBER OF MOTORIST INJURED']==43].iloc[:,20:29])
display(rev_colli[rev_colli['NUMBER OF PERSONS KILLED']==5].iloc[:,0:10])
display(rev_colli[rev_colli['NUMBER OF PERSONS KILLED']==5].iloc[:,10:20])
display(rev_colli[rev_colli['NUMBER OF PERSONS KILLED']==5].iloc[:,20:29])
display(rev_colli[rev_colli['NUMBER OF PEDESTRIANS INJURED']==28].iloc[:,0:10])
display(rev_colli[rev_colli['NUMBER OF PEDESTRIANS INJURED']==28].iloc[:,10:20])
display(rev_colli[rev_colli['NUMBER OF PEDESTRIANS INJURED']==28].iloc[:,20:29])
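The nine calls above repeat the same pattern: select the rows with an extreme value, then show the columns ten at a time so the wide table stays readable. A small helper can express that once. This is a sketch only: `column_chunks` is a hypothetical name, and the 25-column toy frame stands in for `rev_colli`.

```python
import pandas as pd


def column_chunks(df: pd.DataFrame, size: int = 10):
    """Yield consecutive slices of `df`, each at most `size` columns wide."""
    for start in range(0, df.shape[1], size):
        yield df.iloc[:, start:start + size]


# Toy frame with 25 columns standing in for `rev_colli` (made-up values).
toy = pd.DataFrame(
    [[i * 25 + j for j in range(25)] for i in range(3)],
    columns=[f"col{j}" for j in range(25)],
)

# Select the rows of interest, then print them chunk by chunk.
extreme = toy[toy["col0"] == 25]
widths = [chunk.shape[1] for chunk in column_chunks(extreme)]
for chunk in column_chunks(extreme):
    print(chunk.to_string())
```

In the notebook, `print(chunk.to_string())` would be replaced by `display(chunk)`, and the selection by, for example, `rev_colli[rev_colli['NUMBER OF PERSONS KILLED']==5]`.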
pd.read_csv('input/weather_data_nyc_centralpark_2016.csv')
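To take the weather into consideration, one option is to aggregate the collisions to daily counts and join them to the daily weather records by date. The sketch below uses toy data. The column names `DATE`, `date`, and `precipitation`, as well as the date formats, are assumptions for illustration and should be checked against the actual files before use.

```python
import pandas as pd

# Toy collision records: one row per accident (made-up dates).
collisions = pd.DataFrame({
    "DATE": ["01/01/2016", "01/01/2016", "01/02/2016",
             "01/03/2016", "01/03/2016", "01/03/2016"],
})

# Toy daily weather records (hypothetical column names and values).
weather = pd.DataFrame({
    "date": ["2016-01-01", "2016-01-02", "2016-01-03"],
    "precipitation": [0.0, 0.1, 1.2],
})

# Parse both date columns so the join keys share the same dtype.
collisions["DATE"] = pd.to_datetime(collisions["DATE"], format="%m/%d/%Y")
weather["date"] = pd.to_datetime(weather["date"], format="%Y-%m-%d")

# Count accidents per day.
daily = collisions.groupby("DATE").size().rename("accidents").reset_index()

# A left join keeps every day that has accidents, even without a weather record.
merged = daily.merge(weather, left_on="DATE", right_on="date", how="left")
merged = merged.drop(columns="date")
print(merged)
```

With the merged frame, daily accident counts can be compared against precipitation or temperature, for instance with a scatter plot.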