DIY - Use Instagram data to plan your next vacation
September 04, 2015
When I travel, I always want to get the most out of the city I'm visiting. One way is to talk to local people and ask which spots you shouldn't miss. But I wish I could have the point of view of past visitors: Which places did they enjoy the most? Where is the best spot to watch the sunset? Where can you get the best selfies? I also want to have a look at some beaches before choosing one, or get an idea of what some areas look like. The idea here is to ask Instagram about the popular spots nearby.
After looking into the Instagram API and playing around with it, I came up with the following script.
import os
import json
from collections import Counter

import pandas as pd
from instagram.client import InstagramAPI

INSTAGRAM_ACCESS_TOKEN = ''
INSTAGRAM_CLIENT_ID = ''
INSTAGRAM_CLIENT_SECRET = ''

api = InstagramAPI(access_token=INSTAGRAM_ACCESS_TOKEN,
                   client_id=INSTAGRAM_CLIENT_ID,
                   client_secret=INSTAGRAM_CLIENT_SECRET)

def getNbLikes(listMedia):
    # Average number of likes per picture (0 if there are no pictures)
    likes = 0
    count = 0
    for media in listMedia:
        likes = likes + media.like_count
        count = count + 1
    if count > 0:
        return likes/count
    else:
        return 0

def getTags(listMedia):
    # Count how often each tag appears across the pictures
    tags = []
    for media in listMedia:
        for mediaTag in media.tags:
            tags.append(mediaTag.name)
    return Counter(tags)

def getMedia(locationId):
    # The call returns a pair; the first element is the list of media
    medias = api.location_recent_media(location_id=locationId)
    return medias[0]

bestLocations = []
latD = 48.858844
lonD = 2.294351

# Walk a 20 x 20 grid of points around the starting coordinates
for x in range(-10, 10):
    for z in range(-10, 10):
        print(x, z)
        locations = api.location_search(lat=latD + x*0.001, lng=lonD + z*0.001)
        for location in locations:
            likes = 0
            # Skip locations we have already collected
            if not any(d['name'] == location.name for d in bestLocations):
                images = getMedia(location.id)
                likes = getNbLikes(images)
                tags = getTags(images)
                if len(images) > 0:
                    bestLocations.append(dict(name=location.name,
                                              latitude=location.point.latitude,
                                              longitude=location.point.longitude,
                                              likes=likes,
                                              tags=tags,
                                              id=location.id,
                                              nbrImages=len(images)))

finalData = pd.DataFrame.from_dict(bestLocations)
finalData.to_csv('instadata.csv', sep='\t', encoding='utf-8')

We first query for the locations around the coordinates of the place we want to know more about. Then, for each location, we query its recent photos and record the average number of likes per picture and the number of pictures.
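To get a feel for how much ground the search covers, here is a small offline sketch of the grid that the two nested loops walk through. It makes no API calls. The helper names `grid_points` and `step_in_meters` are made up for this illustration, and the 111.32 km per degree of latitude figure is a rough spherical-Earth approximation.

```python
import math

def grid_points(lat, lng, steps=10, step_deg=0.001):
    """Return the (lat, lng) pairs visited by two nested range(-steps, steps) loops."""
    return [(lat + x * step_deg, lng + z * step_deg)
            for x in range(-steps, steps)
            for z in range(-steps, steps)]

def step_in_meters(lat, step_deg=0.001):
    """Approximate size of one grid step as (north-south, east-west) in meters."""
    km_per_degree = 111.32  # assumption: rough value for one degree of latitude
    north_south = step_deg * km_per_degree * 1000
    # Degrees of longitude shrink with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west

points = grid_points(48.858844, 2.294351)
print(len(points))                # number of location searches the script makes
print(step_in_meters(48.858844))  # approximate step size in meters
```

With the values from the script, that is 400 location searches, each about 111 m apart going north-south and somewhat less going east-west. Changing `steps` or `step_deg` is an easy way to trade coverage against the number of requests.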
Note: You need to replace the access token, client ID, and client secret with the values you get from Instagram.
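If you would rather not paste the values into the script, one option is to read them from environment variables. This is only a sketch: the helper `load_credentials` is hypothetical, and the variable names simply reuse the constant names from the script.

```python
import os

def load_credentials(environ=os.environ):
    """Read the three Instagram settings from environment variables.

    The variable names are an assumption; they mirror the constants in the script.
    """
    names = ("INSTAGRAM_ACCESS_TOKEN",
             "INSTAGRAM_CLIENT_ID",
             "INSTAGRAM_CLIENT_SECRET")
    # Fail early with a readable message if something is missing or empty
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

The returned dictionary can then be used in place of the three constants when creating the `InstagramAPI` client.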
After running the previous script for some time, we get a nice dataset that we can analyze with pandas.
> import pandas as pd
> df = pd.read_csv("instadata.csv", sep='\t')
> df.head(10)
This is the head of the DataFrame displayed in IPython Notebook.
Now let's see which spots have the most likes per picture and which ones have the most pictures:
gr = df.groupby('name').sum()
After dropping the useless columns:
my_plot = gr.head(30).sort_values(by='likes', ascending=False).plot(kind='bar', figsize=[15,5])
This script could be improved with some text mining on the names to combine similar results (you can see that we have multiple results for the Eiffel Tower).
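As a starting point for combining similar names, here is a rough sketch that uses only the standard library. It normalizes each name (removes accents, lowercases, drops punctuation, sorts the words) and then greedily groups names whose normalized forms are close according to `difflib.SequenceMatcher`. The function names and the 0.8 threshold are arbitrary choices for this illustration, not something tuned on the real data.

```python
import re
import unicodedata
from difflib import SequenceMatcher

def normalize(name):
    """Strip accents, lowercase, drop punctuation, and sort the words."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(c for c in text if not unicodedata.combining(c))
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(words))

def group_similar(names, threshold=0.8):
    """Greedily group names whose normalized forms look alike.

    Each name is compared with the first member of every existing group;
    the threshold of 0.8 is an assumption and will need adjusting.
    """
    groups = []  # list of (normalized key of first member, [original names])
    for name in names:
        key = normalize(name)
        for group_key, members in groups:
            if SequenceMatcher(None, key, group_key).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing group was close enough, so start a new one
            groups.append((key, [name]))
    return [members for _, members in groups]
```

Once the groups are known, each name in the DataFrame could be replaced by the first name of its group before calling `groupby('name')`. A greedy pass like this depends on the order of the names, so it is worth checking the groups by eye.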
The next step is to visualize the nearby pictures.
I put together a little AngularJS application where we can select a location and see a list of pictures for it.
I'll put the code online when I have more time.
Please let me know in the comments if you have any ideas for improvement!