@hassenchaieb

DIY - Use Instagram data to plan your next vacation

September 04, 2015

When I travel, I always want to get the most of the city I'm visiting. One way is to talk to local people and get advices about which spots you shouldn't miss. But I wish I could have the point of view of the past visitors... Which places did they enjoy the most ? Where is the best spot to watch the sunset ? The best selfies you can get ? I also want to have a look of some beaches before choosing one or getting an idea about what some areas looks like... The idea here is to ask Instagram about the nearby popular spots.

After looking into the Instagram API and playing around with it, I came up with the following script.

import os import json from collections import Counter import pandas as pd from instagram.client import InstagramAPI

INSTAGRAM_ACCESS_TOKEN = INSTAGRAM_CLIENT_ID = INSTAGRAM_CLIENT_SECRET =

api = InstagramAPI(access_token=INSTAGRAM_ACCESS_TOKEN, client_id=INSTAGRAM_CLIENT_ID,client_secret=INSTAGRAM_CLIENT_SECRET)

def getNbLikes(listMedia): likes =0 count =0 for media in listMedia: likes = likes + media.like_count count = count + 1 if count > 0: return likes/count else: return 0

def getTags(listMedia): tags = [] for media in listMedia: for mediaTag in media.tags: tags.append(mediaTag.name) return Counter(tags)

def getMedia(locationId): medias = api.location_recent_media(location_id=locationId) return medias[0]

bestLocations = []; latD=48.858844 lonD=2.294351

for x in range(-10, 10): for z in range(-10,10): print(x,z) locations = api.location_search(lat=48.858844+x0.001, lng=2.294351+z0.001) for location in locations: likes = 0 if not any(d[’name’] == location.name for d in bestLocations): images = getMedia(location.id) likes = getNbLikes(images) tags = getTags(images) if len(images)>0 : bestLocations.append(dict(name=location.name,latitude=location.point.latitude,longitude=location.point.longitude,likes=likes,tags=tags,id=location.id,nbrImages=len(images)))

finalData = pd.DataFrame.from_dict(bestLocations) finalData.to_csv(instadata.csv, sep=\t, encoding=utf-8)

We first query for the locations around the coordinates of the location we wish to know more about and then we query for the photos of each location and get the number of likes and the number of pictures for it.

Note: You need to replace the Access Tokens and Client ID with the values you get from Instagramhere.

After running the previous script for some time, we get a nice dataset that we can analyze with pandas.

> import pandas as pd > df = pd.read_csv("instadata.csv",sep='\t') > df.head(10)

This is the head of the dataFrame displayed in iPython Notebook.

Screenshot from 2015-09-04
22:28:44

Now let's see which spots have the most likes per picture and which ones have the most pictures

gr = df.groupby('name').sum()

After dropping the useless columns

my_plot = gr.head(30).sort(columns='likes',ascending=False).plot(kind='bar',figsize=[15,5])

index

This script could be improved with some text mining on the names, to combine the similar results. (You can see that we have multiple results for the Eiffel tower).

The next step is to visualize the nearby pictures.
I put up a little angularJS application where we can select a location and see a list of pictures per location.
I'll put the code online when I have more time.

Screenshot from 2015-09-04
23:59:36

Please let me know in the comments if you have any improvement ideas !

Discuss on Twitter

Join the newsletter

I write about cloud and software architectures.


Hi, I'm Hassen. I'm a Product engineer based in Paris 🇫🇷. I'm currently building data products at YOOI.