Gimmeproxy tech description, part 1: Collecting proxies
When I needed working proxy servers for one project, I tried to use free proxy lists. As it turned out, most of those proxies didn’t work well for one reason or another.
Some were just web proxy interfaces, some required authentication, and some were banned, slow or didn’t respond at all. I needed only working proxies. That’s how the idea for Gimmeproxy.com was born.
Other posts in Gimmeproxy tech series:
Gimmeproxy is built with Node.js (Koa), Redis and Elasticsearch, with the ELK stack for monitoring. Nginx is used as a reverse proxy.
I’m planning to describe the whole stack over several articles, explaining how Gimmeproxy.com works and providing some code examples.
This is the first one; let’s start with collecting and checking proxies. Everything will be done in Node.js, so make sure you have it installed.
Getting proxy lists
Gimmeproxy uses a custom proxy list parsing script which collects free public proxies from websites like HideMyAss or GatherProxy. It’s written in JavaScript and run by cron every 30 minutes.
When I was developing it, I wasn’t aware of any decent open-source proxy scraping modules or libraries. Now there is one, so you don’t have to write your own.
I’m talking about proxy-lists: https://github.com/chill117/proxy-lists. Let’s see how to use it.
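The parsing script itself isn’t published. As a minimal sketch of its core step, assuming a scraped page has already been reduced to plain text containing `IP:port` pairs, a regular expression can pull out candidates and drop duplicates. The `extractProxies` name is hypothetical, and real sites often obfuscate their markup, so each source usually needs its own parser.

```javascript
// Hypothetical helper: extract unique IPv4 "IP:port" candidates from scraped text.
// Assumes the page content has already been fetched and converted to plain text.
function extractProxies(text) {
  // Dotted-quad IPv4 address, a colon, then a 2-5 digit port.
  const pattern = /\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b/g;
  const seen = new Set();
  const proxies = [];
  for (const match of text.matchAll(pattern)) {
    const ip = match[1];
    const port = Number(match[2]);
    // Reject octets above 255 and ports outside the valid TCP range.
    const validIp = ip.split('.').every((octet) => Number(octet) <= 255);
    if (!validIp || port < 1 || port > 65535) continue;
    const key = `${ip}:${port}`;
    if (seen.has(key)) continue; // skip duplicates listed more than once
    seen.add(key);
    proxies.push({ ip, port });
  }
  return proxies;
}
```

For example, `extractProxies('1.2.3.4:8080 junk 1.2.3.4:8080 999.1.1.1:80')` keeps only one `{ ip: '1.2.3.4', port: 8080 }` entry. Candidates found this way are unverified; checking them is what the rest of this article is about.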
Sample code client.js
Sample foundProxies result
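A hedged sketch of the collecting step: it assumes, as the proxy-lists README describes, that `ProxyLists.getProxies()` returns an EventEmitter that emits batches of proxy objects on `'data'`, may emit `'error'` for individual sources, and emits `'end'` when done. Check the current README before relying on this. The `collectProxies` helper is hypothetical; it wraps such an emitter in a Promise and deduplicates by `ipAddress:port` (field names assumed from proxy-lists’ proxy objects).

```javascript
// Hypothetical helper: drain an emitter that emits arrays of proxy objects on
// 'data' and signals completion with 'end', deduplicating by ipAddress:port.
// Assumption: errors from individual sources are non-fatal, so they are
// collected instead of rejecting the whole run.
function collectProxies(emitter) {
  return new Promise((resolve) => {
    const byKey = new Map();
    const errors = [];
    emitter.on('data', (proxies) => {
      for (const proxy of proxies) {
        const key = `${proxy.ipAddress}:${proxy.port}`;
        if (!byKey.has(key)) byKey.set(key, proxy);
      }
    });
    // Listening for 'error' also stops Node from throwing when one is emitted.
    emitter.on('error', (error) => errors.push(error));
    emitter.once('end', () => resolve({ proxies: [...byKey.values()], errors }));
  });
}
```

Under the API assumption above, usage would be roughly `collectProxies(ProxyLists.getProxies()).then(({ proxies }) => ...)`. Keeping per-source errors lets one broken site fail without losing proxies from the others.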
Checking proxies with check-proxy
Now we need to check whether the proxies actually work. I developed and open-sourced a library for this: https://github.com/256cats/check-proxy. It needs only an IP address and a port, and it determines the proxy’s anonymity level, protocol and country for us.
It works like this: the library has a server part and a client part. The client tries to access the server through the given proxy by sending GET and POST requests.
The server checks what it received through the proxy (if anything), whether the client’s IP address was leaked, which headers were sent, and so on, and responds with JSON describing the proxy’s parameters. This makes it possible to reliably verify that a proxy really works.
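The server-side classification can be illustrated with a simplified sketch. This is not check-proxy’s actual `getProxyType` code: it assumes the common three-level convention (transparent, anonymous, elite), an IPv4 client address, and a hypothetical `classifyAnonymity(headers, clientIp)` that receives the headers the server saw plus the checking client’s real IP.

```javascript
// Headers that commonly reveal a proxy in the path (lowercase, as Node
// delivers them in req.headers). Illustrative, not exhaustive.
const PROXY_HEADERS = ['via', 'x-forwarded-for', 'forwarded', 'x-real-ip',
  'proxy-connection', 'client-ip', 'x-proxy-id'];

// Simplified, hypothetical classifier:
//  - 'transparent': the client's real IP appears in some header value
//  - 'anonymous':   the IP is hidden, but proxy-identifying headers are present
//  - 'elite':       neither the IP nor proxy headers are visible
function classifyAnonymity(headers, clientIp) {
  // Split header values into tokens so "11.2.3.45" doesn't match "1.2.3.4".
  // Splitting on ':' strips port suffixes, which only works for IPv4.
  const leaked = Object.values(headers).some((value) =>
    String(value).split(/[\s,;="':]+/).includes(clientIp));
  if (leaked) return 'transparent';
  const hasProxyHeader = PROXY_HEADERS.some((name) => name in headers);
  return hasProxyHeader ? 'anonymous' : 'elite';
}
```

For example, `classifyAnonymity({ via: '1.1 squid' }, '1.2.3.4')` returns `'anonymous'`: the real address is hidden, but the `Via` header still gives the proxy away.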
Ok, let’s install check-proxy.
First, you need to run a server that will respond to check-proxy requests, preferably on a different machine. I’ve put mine on an OpenShift VM.
Install Express
Create server.js
Here we use the getProxyType function provided by the check-proxy library to determine the proxy’s properties.
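To make the server’s role concrete without depending on Express or check-proxy, here is a minimal sketch using Node’s built-in `http` module. It is a hypothetical stand-in for server.js, not the article’s code: it reports back what arrived (method, URL, remote address, headers and any POST body) as JSON, which is the kind of raw material a function like getProxyType works from.

```javascript
const http = require('http');

// Hypothetical stand-in for the check server: echo back what the request
// looked like on arrival so the client can compare it with what it sent.
function createEchoServer() {
  return http.createServer((req, res) => {
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      const report = {
        method: req.method,
        url: req.url,
        // Through a proxy, this is the proxy's address, not the client's.
        remoteAddress: req.socket.remoteAddress,
        headers: req.headers,
        body, // lets the client confirm POST data survived the proxy
      };
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(report));
    });
  });
}
```

Starting it would look like `createEchoServer().listen(8080)`, where the port is arbitrary. Comparing the echoed headers with the client’s real IP is exactly the leak check described earlier.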
Run server
Now we’re ready to actually check the proxies. Let’s also check whether each proxy works with Google.
Modify client.js to include the following
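As a hedged illustration of what checking a target through a proxy means at the HTTP level: for a plain HTTP proxy, the client connects to the proxy’s IP and port and puts the full target URL in the request line. The hypothetical helpers below build such a request with Node’s `http` module. The `{ ip, port }` proxy shape is an assumption, HTTPS targets would need a `CONNECT` tunnel instead, and treating only status 200 as success is a simplification, since a target like Google may also answer with redirects.

```javascript
const http = require('http');

// Hypothetical helper: options for a GET to `targetUrl` via an HTTP proxy.
// The proxy receives the absolute URL as the request path.
function proxiedRequestOptions(proxy, targetUrl, timeoutMs = 10000) {
  const target = new URL(targetUrl);
  return {
    host: proxy.ip,
    port: proxy.port,
    method: 'GET',
    path: target.href,              // absolute-form request target
    headers: { Host: target.host }, // the target's host, not the proxy's
    timeout: timeoutMs,             // socket inactivity timeout in ms
  };
}

// Hypothetical check: resolves true if the proxy returns HTTP 200.
function checkThroughProxy(proxy, targetUrl) {
  return new Promise((resolve) => {
    const req = http.request(proxiedRequestOptions(proxy, targetUrl), (res) => {
      res.resume(); // discard the body; only the status matters here
      resolve(res.statusCode === 200);
    });
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', () => resolve(false)); // refused, reset or timed out
    req.end();
  });
}
```

A call like `checkThroughProxy({ ip: '10.0.0.1', port: 3128 }, 'http://www.google.com/')`, with a made-up address, resolves to `false` rather than rejecting when the proxy is dead, which keeps bulk checking simple.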
Then you can do something like this to check all proxies.
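Checking hundreds of proxies one at a time is slow, while firing every check at once can exhaust sockets. One common approach, sketched here as an assumption rather than Gimmeproxy’s actual code, is a small worker pool: the hypothetical `checkAll` runs an async `checkFn(proxy)` on at most `concurrency` proxies at a time and keeps those that pass. In practice `checkFn` would wrap the check-proxy client call.

```javascript
// Hypothetical helper: run an async check on every proxy with at most
// `concurrency` checks in flight, returning the proxies that passed.
// `checkFn(proxy)` should resolve to a truthy value for a working proxy;
// a rejection counts as a failed check.
async function checkAll(proxies, checkFn, concurrency = 10) {
  const results = new Array(proxies.length).fill(false);
  let next = 0;

  async function worker() {
    // Each worker claims the next unchecked index until none remain.
    // No locking is needed: this loop runs on JavaScript's single thread.
    while (next < proxies.length) {
      const index = next++;
      try {
        results[index] = Boolean(await checkFn(proxies[index]));
      } catch {
        results[index] = false;
      }
    }
  }

  const workerCount = Math.min(concurrency, proxies.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  // Keep the input order in the output.
  return proxies.filter((_, index) => results[index]);
}
```

The concurrency limit is a trade-off: higher values finish sooner but open more simultaneous connections from the checking machine, and a limit around ten to a few dozen is a reasonable starting point to tune from.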
This should give you an idea of how Gimmeproxy gathers its proxies. In the next articles I’ll show how to add them to Redis and Elasticsearch.