When I needed working proxy servers for a project, I tried free proxy lists. As it turned out, most of those proxies didn’t work well for one reason or another.

Some were just web proxy interfaces, and some required authentication. Others were banned, slow, or didn’t respond at all. I needed only working proxies, and that’s how the idea for Gimmeproxy.com was born.

Other posts in Gimmeproxy tech series:

Gimmeproxy is built with Node.js (koa), Redis and Elasticsearch, with the ELK stack for monitoring. Nginx is used as a reverse proxy.

I’m planning to describe all of the tech used in several articles. I’ll explain how Gimmeproxy.com works and provide some code examples.

This is the first one. Let’s start with getting and checking proxies. Everything will be done with Node.js, so get it ready.

Getting proxy lists

Gimmeproxy uses a custom proxy-list parsing script that collects free public proxies from websites like hidemyass or gatherproxy. It’s written in JavaScript and is run by cron every 30 minutes.

When I was developing it, I was not aware of any decent open source proxy scraping modules or libraries. Now there is one, so you don’t have to write it yourself.

I’m talking about https://github.com/chill117/proxy-lists. Let’s check how to use it.

npm install proxy-lists --save

Sample code client.js

const proxyLists = require('proxy-lists')

let foundProxies = []

// getProxies returns an event emitter that delivers proxies in batches
const gettingProxies = proxyLists.getProxies({
  countries: ['us', 'ca']
})

gettingProxies.on('data', function(proxies) {
  // received some proxies, add them to the running list
  foundProxies = foundProxies.concat(proxies)
})

gettingProxies.on('error', function(error) {
  console.error(error)
})

gettingProxies.once('end', function() {
  // done getting proxies
  console.log(foundProxies)
  checkIt(foundProxies) // to be implemented later
})

Sample foundProxies result

[
  {
    ipAddress: '123.123.2.42',
    port: 8080,
    protocols: ['http'],
    country: 'uk',
    anonymityLevel: 'elite'
  },
  {
    ipAddress: '234.221.233.142',
    port: 3128,
    protocols: ['https'],
    country: 'us',
    anonymityLevel: 'elite'
  }
]
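
Since the proxies come from several websites, the same address can plausibly be reported more than once. Here is a minimal sketch of collapsing such duplicates before checking them. The dedupeProxies helper is hypothetical (it is not part of proxy-lists) and assumes only the ipAddress, port and protocols fields shown in the sample above.

```javascript
// Hypothetical helper, not part of proxy-lists:
// keep one entry per ip:port and merge the protocol lists of duplicates.
function dedupeProxies(proxies) {
  const byAddress = new Map()
  for (const proxy of proxies) {
    const key = `${proxy.ipAddress}:${proxy.port}`
    const seen = byAddress.get(key)
    if (!seen) {
      // copy the entry so the input objects are not mutated
      byAddress.set(key, { ...proxy, protocols: [...(proxy.protocols || [])] })
    } else {
      for (const protocol of proxy.protocols || []) {
        if (!seen.protocols.includes(protocol)) seen.protocols.push(protocol)
      }
    }
  }
  return [...byAddress.values()]
}

const sample = [
  { ipAddress: '123.123.2.42', port: 8080, protocols: ['http'] },
  { ipAddress: '123.123.2.42', port: 8080, protocols: ['https'] },
  { ipAddress: '234.221.233.142', port: 3128, protocols: ['https'] }
]
const unique = dedupeProxies(sample)
console.log(unique.length) // 2
```

Running this on the collected list before the checks avoids testing the same proxy twice.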

Checking proxies with check-proxy

Now we need to check whether the proxies really work. I developed and open-sourced a library for this: https://github.com/256cats/check-proxy. It needs only an IP address and a port to start working, and it determines the anonymity level, protocol and country for us.

It works like this: the library has a server part and a client part. The client tries to reach the server through the given proxy by sending GET and POST requests.

The server checks what was received through the proxy (if anything), whether the client’s IP address was leaked, what the headers were, and so on. It then responds with JSON describing the proxy’s parameters. This makes it possible to reliably check that a proxy is indeed working.
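
To make the idea of the server-side check more concrete, here is a rough sketch of how an anonymity level could be derived from the headers the server receives. This is not check-proxy’s actual implementation: the classifyAnonymity function, the header list and the 'transparent' and 'anonymous' level names are my assumptions for illustration (only 'elite' appears in the sample data above).

```javascript
// Hypothetical classifier, NOT the code used by check-proxy.
// Headers that proxies commonly add and that reveal a proxy is in use.
const PROXY_HEADERS = ['via', 'x-forwarded-for', 'forwarded', 'x-real-ip', 'proxy-connection']

function classifyAnonymity(headers, clientIp) {
  const entries = Object.entries(headers)
    .map(([name, value]) => [name.toLowerCase(), String(value)])

  // The client's real IP shows up in some header: the proxy hides nothing.
  // (A plain substring match is crude, but enough for a sketch.)
  if (entries.some(([, value]) => value.includes(clientIp))) return 'transparent'

  // The IP is hidden, but the request still looks like it came via a proxy.
  if (entries.some(([name]) => PROXY_HEADERS.includes(name))) return 'anonymous'

  return 'elite'
}

const myIp = '203.0.113.7'
console.log(classifyAnonymity({ 'x-forwarded-for': '203.0.113.7' }, myIp)) // transparent
console.log(classifyAnonymity({ via: '1.1 squid' }, myIp)) // anonymous
console.log(classifyAnonymity({ host: 'yourserver.com' }, myIp)) // elite
```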

Ok, let’s install check-proxy.

npm install check-proxy --save

First you will need to run a server that responds to check-proxy requests, preferably on a different machine. I’ve put it on an OpenShift VM.

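Install express together with the body-parser and cookie-parser middleware that server.js uses

npm install express body-parser cookie-parser --save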

Create server.js

'use strict'

const express = require('express')
const bodyParser = require('body-parser')
const cookieParser = require('cookie-parser')
const getProxyType = require('check-proxy').ping

const app = express()

// parse form bodies, JSON bodies and cookies so they can be handed to check-proxy
app.use(bodyParser.urlencoded({ extended: true }))
app.use(bodyParser.json())
app.use(cookieParser())

const ping = function(req, res) {
  console.log('ip', req.socket.remoteAddress)
  console.log('headers', req.headers)
  console.log('cookies', req.cookies)
  res.json(getProxyType(req.headers, req.query, req.body, req.cookies))
}

app.get('/', ping) // handle GET
app.post('/', ping) // and POST requests

// 127.0.0.1 is reachable only from this machine; bind to a public interface
// (for example '0.0.0.0') when proxies have to reach the server from outside
const serverIp = '127.0.0.1'
const port = 8080

app.listen(port, serverIp, function() {
  console.log('%s: Node server started on %s:%d ...', new Date().toISOString(), serverIp, port)
})

Here we used the getProxyType function (exported by the check-proxy library as ping) to determine the proxy server’s parameters.

Run server

node server.js

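Now we are ready to actually check proxies. Let’s also check whether each proxy can reach Google without being blocked.

Modify client.js to include the following (bluebird is an extra dependency, so run npm install bluebird --save first)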

const Promise = require('bluebird')
const checkProxy = require('check-proxy').check

// checks a single proxy object as returned by proxy-lists
const checkIt = data => checkProxy({
  testHost: 'http://yourserver.com:8080', // put your server app url here
  proxyIP: data.ipAddress, // proxy ip to test
  proxyPort: data.port, // proxy port to test
  localIP: '192.168.1.1', // local machine ip to test
  connectTimeout: 6, // curl connect timeout, sec
  timeout: 10, // curl timeout, sec
  websites: [
    {
      name: 'example',
      url: 'http://www.example.com',
      regex: /Example Domain/gim, // expected result
      connectTimeout: 6, // curl connect timeout, sec
      timeout: 30 // curl timeout, sec
    },
    {
      name: 'google',
      url: 'http://www.google.com',
      // a function works too: pass only if none of the known block-page phrases appear
      regex: html => html
        && html.indexOf('crawl ban') === -1
        && html.indexOf('computer virus or spyware application') === -1
        && html.indexOf('entire network is affected') === -1
        && html.indexOf('http://www.download.com/Antivirus') === -1,
      connectTimeout: 6,
      timeout: 30
    }
  ]
})
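
The websites entries above use two kinds of matcher under the regex key: a regular expression and a plain function. A minimal sketch of how such a matcher could be evaluated against a fetched page follows. The matchesExpected helper is hypothetical and only illustrates the idea; I am not claiming check-proxy evaluates matchers exactly this way.

```javascript
// Hypothetical evaluator: assumes a matcher is either a RegExp or a predicate function.
function matchesExpected(matcher, html) {
  if (typeof matcher === 'function') return Boolean(matcher(html))
  if (matcher instanceof RegExp) {
    // with the g flag, test() remembers its position, so reset it before each use
    matcher.lastIndex = 0
    return matcher.test(html)
  }
  throw new TypeError('matcher must be a RegExp or a function')
}

const exampleMatcher = /Example Domain/gim
const notBlocked = html => Boolean(html) && !html.includes('crawl ban')

console.log(matchesExpected(exampleMatcher, '<h1>Example Domain</h1>')) // true
console.log(matchesExpected(exampleMatcher, '<h1>Example Domain</h1>')) // true again
console.log(matchesExpected(notBlocked, 'crawl ban detected')) // false
```

The function form is handy for pages like Google’s, where it is easier to list the phrases of a block page than to describe a successful response.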

Then you can do something like this to check all proxies, for example inside the 'end' handler in place of the placeholder checkIt(foundProxies) call.

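// check 5 proxies in parallel; a failed check is recorded as null so that
// one bad proxy does not abort the whole batch
// (assumption: checkIt rejects for proxies that do not work)
Promise.map(foundProxies, proxy => checkIt(proxy).catch(() => null), { concurrency: 5 })
  .then(checkResult => {
    console.log('checkResult', checkResult.filter(Boolean))
    process.exit()
  })
  .catch(err => console.log('error', err))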
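
If you would rather not depend on bluebird, the same bounded parallelism can be sketched with native promises. The mapSettled helper and the fakeCheck stand-in below are hypothetical names introduced for illustration; in real use the worker would be the checkIt function from above.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Failures are recorded per item instead of aborting the whole batch.
async function mapSettled(items, worker, limit) {
  const results = new Array(items.length)
  let next = 0

  async function runLane() {
    while (next < items.length) {
      const index = next++ // no await between the check and the increment, so lanes never collide
      try {
        results[index] = { ok: true, value: await worker(items[index]) }
      } catch (error) {
        results[index] = { ok: false, error }
      }
    }
  }

  // start up to `limit` lanes; each lane pulls the next unclaimed item
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane())
  await Promise.all(lanes)
  return results
}

// Stand-in for a real proxy check (hypothetical): ports below 1024 "fail".
const fakeCheck = async proxy => {
  if (proxy.port < 1024) throw new Error('dead proxy')
  return `${proxy.ipAddress}:${proxy.port}`
}

const demo = mapSettled(
  [{ ipAddress: '10.0.0.1', port: 8080 }, { ipAddress: '10.0.0.2', port: 80 }],
  fakeCheck,
  5
)
demo.then(results => console.log(results.map(r => r.ok))) // [ true, false ]
```

Results keep the order of the input, so each entry can be matched back to the proxy it belongs to.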

This should give you an idea of how Gimmeproxy gathers its proxies. In the next articles I’ll show how to add them to Redis and Elasticsearch.
