How to scrape Instagram images and quickly download photos
Apr 26, 2015
This post shows how to scrape popular Instagram photos with the Instagram API and PHP, and how to download them quickly and in parallel using Redis and cURL.
First, register as a developer at Instagram (if you haven’t already) here: https://instagram.com/developer/register/. Then create an application and get your public and secret keys (the client ID and client secret).
Next, create a directory for your project and install the Instagram API wrapper for PHP via Composer:
composer require cosenary/instagram
Run the bundled example (/vendor/cosenary/instagram/example/index.php) to obtain your access token.
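Getting the token through the example means going through Instagram's OAuth login. To show what that involves, here is a minimal sketch of the two requests in the server-side ("code") flow as documented for the v1 API. The helper names are hypothetical, and the endpoint URLs and field names should be treated as assumptions to check against the API documentation you are working with; nothing is sent over the network here.

```php
<?php
// Hypothetical helpers illustrating Instagram's v1 server-side OAuth flow.
// Endpoint URLs and parameter names are assumptions based on the v1 docs.

function instagramAuthorizeUrl(string $clientId, string $redirectUri): string
{
    // Step 1: send the user's browser here; after approval Instagram
    // redirects back to $redirectUri with ?code=... appended.
    return 'https://api.instagram.com/oauth/authorize/?' . http_build_query([
        'client_id'     => $clientId,
        'redirect_uri'  => $redirectUri,
        'response_type' => 'code',
    ]);
}

function instagramTokenRequestFields(string $clientId, string $clientSecret, string $redirectUri, string $code): array
{
    // Step 2: POST these fields to https://api.instagram.com/oauth/access_token;
    // the JSON response carries the access token that scraper.php needs.
    return [
        'client_id'     => $clientId,
        'client_secret' => $clientSecret,
        'grant_type'    => 'authorization_code',
        'redirect_uri'  => $redirectUri,
        'code'          => $code,
    ];
}

echo instagramAuthorizeUrl('YOUR_APP_KEY', 'http://localhost/callback'), PHP_EOL;
```

The redirect URI is URL-encoded by http_build_query, and it should match the redirect URI registered for your application.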
Then create scraper.php:
<?php
require_once 'vendor/autoload.php';

use MetzWeb\Instagram\Instagram;

date_default_timezone_set('UTC');

// Connect to a local Redis server (requires the phpredis extension)
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$instagram = new Instagram(array(
    'apiKey'    => 'YOUR_APP_KEY',
    'apiSecret' => 'YOUR_APP_SECRET',
));
$accessToken = 'YOUR_ACCESS_TOKEN';
$instagram->setAccessToken($accessToken);

// Fetch the current popular media and queue every image (videos are skipped)
$search = $instagram->getPopularMedia();
$data = $search->data;
foreach ($data as $d) {
    if ($d->type == 'image') {
        $item = array(
            'images'       => $d->images,
            'caption'      => $d->caption,
            'created_time' => $d->created_time,
            'id'           => $d->id,
            'filename'     => $d->id . '.jpg', // the media id doubles as a unique filename
        );
        // Serialize the item so it can be stored as a Redis string
        $redis->lPush('photo:queue', serialize($item));
    }
}
This fetches the current popular photos from Instagram and pushes each image (videos are skipped) onto a Redis list, photo:queue, to be downloaded later.
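To check the filtering and serialization logic without an access token or a running Redis server, the loop in scraper.php can be pulled into a plain function and fed a hand-made sample. buildQueueItems is a hypothetical helper, and the JSON below only mimics the shape of the API's data array (type, id, caption, created_time, images); it is not real API output.

```php
<?php
// Hypothetical pure-PHP version of the scraper loop: keep image posts only
// and build the array that scraper.php serializes onto photo:queue.

function buildQueueItems(array $media): array
{
    $items = [];
    foreach ($media as $d) {
        if ($d->type !== 'image') {
            continue; // skip videos and anything else that is not a photo
        }
        $items[] = [
            'images'       => $d->images,
            'caption'      => $d->caption,
            'created_time' => $d->created_time,
            'id'           => $d->id,
            'filename'     => $d->id . '.jpg', // media id used as the filename
        ];
    }
    return $items;
}

// Hand-written sample shaped like the API's "data" array (assumed, not real output).
$json = '[
  {"type": "image", "id": "111_1", "caption": null, "created_time": "1430000000",
   "images": {"standard_resolution": {"url": "http://example.com/a.jpg"}}},
  {"type": "video", "id": "222_2", "caption": null, "created_time": "1430000001",
   "images": {"standard_resolution": {"url": "http://example.com/b.jpg"}}}
]';
$items = buildQueueItems(json_decode($json));

// Each item survives the serialize()/unserialize() round trip used with Redis.
$restored = unserialize(serialize($items[0]));
echo $restored['filename'], ' ', $restored['images']->standard_resolution->url, PHP_EOL;
// prints "111_1.jpg http://example.com/a.jpg"
```

Because the filename is derived from the media id, the same photo always maps to the same file name, so a photo that appears in two scraper runs would not be saved under two different names.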
Downloading photos in parallel
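A Redis list can act as a shared work queue: LPUSH (used by the scraper) adds items at the head, and BRPOP takes them from the tail, so photos come out in the order they were queued. Each pop also removes the item, so no two clients receive the same one, which would let several downloader processes share photo:queue side by side. The sketch below models these semantics with a plain PHP array so it runs without Redis. FakeList is a hypothetical stand-in (not a phpredis class), and the two workers take turns deterministically instead of racing as real processes would.

```php
<?php
// In-memory stand-in for a Redis list (hypothetical, for illustration only).
final class FakeList
{
    private $items = [];

    public function lPush(string $value): void
    {
        array_unshift($this->items, $value); // LPUSH adds at the head (left)
    }

    public function rPop(): ?string
    {
        return array_pop($this->items); // RPOP/BRPOP take from the tail (right); null when empty
    }
}

$queue = new FakeList();
foreach (['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'] as $name) {
    $queue->lPush($name); // producer side, as in scraper.php
}

// Two "workers" take turns; every popped item goes to exactly one of them.
$taken = ['worker1' => [], 'worker2' => []];
$worker = 'worker1';
while (($name = $queue->rPop()) !== null) {
    $taken[$worker][] = $name;
    $worker = ($worker === 'worker1') ? 'worker2' : 'worker1';
}
print_r($taken);
```

The items leave in push order (a, b, c, d), and the two workers end up with disjoint sets of photos.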
Create a ‘photos’ directory and download.php:
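<?php
$dir = __DIR__.'/photos';

// Connect to the same local Redis server the scraper writes to
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Plain cURL GET: returns the response body, or false on failure
function get($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

while(true) {
    $item = $redis->brPop('photo:queue', 10); // block for up to 10 seconds waiting for a new item from Redis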