Gimmeproxy tech description part 2. Redis
In this article we will look at how Gimmeproxy.com stores its data. The only datastores used are Redis and ElasticSearch, and both were chosen for their speed.
Other posts in Gimmeproxy tech series:
- Gimmeproxy tech description part 1. Collecting proxies
- Gimmeproxy tech description part 3. ElasticSearch
Why Redis, you might ask? It is not a “real” database, and it is not persistent enough. Well, while that may be more or less true depending on Redis settings (see http://oldblog.antirez.com/post/redis-persistence-demystified.html), it doesn’t matter for this project.
As shown in the previous article, Gimmeproxy gets proxies from open sources via a cron job. In case of data loss, it can rescrape the whole database in a couple of minutes. So Redis is a good choice here.
BTW, a dedicated Redis Module might be a better way to solve this (https://redislabs.com/blog/writing-redis-modules). I’ll discuss ElasticSearch usage in the next article.
Some of the Redis code is written in Lua. This reduces network round trips and gives a transaction-like workflow for some operations, since Redis runs each script atomically.
I use node-redis-scripto to load Lua scripts into Redis from Node.js. While it has not been updated for a long time, it does the job just fine. Please let me know if you are aware of better alternatives.
Redis boilerplate
This is boilerplate for working with Redis, and it should be prepended to each example. Here I promisify Redis and redis-scripto for convenience and load the Lua scripts from the ./lua directory.
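The boilerplate itself is not reproduced here. The sketch below approximates its two ideas using only Node’s standard library. First, `util.promisify` turns a callback-style function into one that returns a Promise. Second, every `.lua` file in a directory is read into a map from script name to source. The helpers `loadLuaScripts` and `fakeRedisGet` are hypothetical names for illustration. In the real project, redis-scripto takes care of loading and running the scripts against Redis.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const util = require("util");

// Hypothetical helper: read every *.lua file in `dir` into { scriptName: source }.
// The script name is the file name without its .lua extension.
function loadLuaScripts(dir) {
  const scripts = {};
  for (const file of fs.readdirSync(dir)) {
    if (path.extname(file) === ".lua") {
      scripts[path.basename(file, ".lua")] = fs.readFileSync(path.join(dir, file), "utf8");
    }
  }
  return scripts;
}

// Stand-in for a callback-style client method, following the
// (args..., callback(err, result)) convention of node-redis before v4.
function fakeRedisGet(key, callback) {
  setImmediate(() => callback(null, `value-of-${key}`));
}
const getAsync = util.promisify(fakeRedisGet);

// Usage: create a throwaway lua directory and load the scripts from it.
const luaDir = fs.mkdtempSync(path.join(os.tmpdir(), "lua-"));
fs.writeFileSync(path.join(luaDir, "gimme-add-proxy.lua"), "return 1");
fs.writeFileSync(path.join(luaDir, "README.txt"), "not a script");
const scripts = loadLuaScripts(luaDir);
console.log(Object.keys(scripts)); // [ 'gimme-add-proxy' ]

getAsync("some-key").then((value) => console.log(value)); // value-of-some-key
```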
Load proxy to Redis
Here we load initial proxy data from the proxy list into Redis.
- gimme:proxies:available - set of available proxy ids
- gimme:proxy:data - hash of proxy data
- gimme:proxies:tocheck - list of ids to check
- gimme:proxies:checked - set of checked ids
lua/gimme-add-proxy.lua
load.js
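The original script and load.js are not shown here. The following is a minimal in-memory sketch of loading proxies with the four keys listed above. A JavaScript `Set`, `Map` and array stand in for the Redis set, hash and list. The exact behaviour is an assumption:

- a proxy id is `ip:port`;
- a proxy whose id is already in the hash is skipped;
- otherwise its data is stored as JSON under that id and the id is queued on `gimme:proxies:tocheck`.

Doing the existence check and the writes inside one Lua script keeps them atomic in Redis.

```javascript
// In-memory stand-ins for the Redis keys (assumed layout, not the original code):
//   gimme:proxies:available -> Set of ids     (Redis set)
//   gimme:proxy:data        -> Map id -> JSON (Redis hash)
//   gimme:proxies:tocheck   -> Array of ids   (Redis list)
//   gimme:proxies:checked   -> Set of ids     (Redis set)
function createStore() {
  return {
    "gimme:proxies:available": new Set(),
    "gimme:proxy:data": new Map(),
    "gimme:proxies:tocheck": [],
    "gimme:proxies:checked": new Set(),
  };
}

// Hypothetical equivalent of lua/gimme-add-proxy.lua:
// an HEXISTS guard, then HSET the data and RPUSH the id for checking.
// Returns 1 if the proxy was added, 0 if it was already known.
function addProxy(store, id, proxy) {
  const data = store["gimme:proxy:data"];
  if (data.has(id)) return 0;
  data.set(id, JSON.stringify(proxy));
  store["gimme:proxies:tocheck"].push(id);
  return 1;
}

// Usage: load a small proxy list; the duplicate entry is ignored.
const store = createStore();
const list = [
  { ip: "10.0.0.1", port: 8080 },
  { ip: "10.0.0.2", port: 3128 },
  { ip: "10.0.0.1", port: 8080 },
];
let added = 0;
for (const p of list) added += addProxy(store, `${p.ip}:${p.port}`, p);
console.log(added, store["gimme:proxies:tocheck"]); // 2 [ '10.0.0.1:8080', '10.0.0.2:3128' ]
```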
Add checked proxy
After a proxy has been checked, we update its data with the results provided by the check-proxy library.
lua/gimme-proxy-checked.lua
checker.js
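The original script is not reproduced here either. A minimal in-memory sketch follows, under these assumptions:

- The check result is a plain object that gets merged into the stored JSON.
- Afterwards, the id is marked as both checked and available.
- The result fields `protocols` and `anonymityLevel` are made up for the example and are not necessarily what check-proxy returns.

In a Redis Lua script, the same steps would roughly be HGET, a decode/merge/encode with the `cjson` library that Redis exposes to scripts, HSET, and two SADDs.

```javascript
// Same assumed key layout as in the loading sketch, rebuilt so this block runs on its own.
function createStore() {
  return {
    "gimme:proxies:available": new Set(),
    "gimme:proxy:data": new Map(),
    "gimme:proxies:tocheck": [],
    "gimme:proxies:checked": new Set(),
  };
}

// Hypothetical equivalent of lua/gimme-proxy-checked.lua.
// Returns false if the proxy is unknown (e.g. it was removed in the meantime).
function proxyChecked(store, id, checkResult) {
  const data = store["gimme:proxy:data"];
  if (!data.has(id)) return false;
  const merged = Object.assign(JSON.parse(data.get(id)), checkResult);
  data.set(id, JSON.stringify(merged));
  store["gimme:proxies:checked"].add(id);
  store["gimme:proxies:available"].add(id);
  return true;
}

// Usage: one stored proxy receives its (made-up) check results.
const store = createStore();
store["gimme:proxy:data"].set("10.0.0.1:8080", JSON.stringify({ ip: "10.0.0.1", port: 8080 }));
const ok = proxyChecked(store, "10.0.0.1:8080", { protocols: ["http"], anonymityLevel: 1 });
console.log(ok, store["gimme:proxy:data"].get("10.0.0.1:8080"));
// true {"ip":"10.0.0.1","port":8080,"protocols":["http"],"anonymityLevel":1}
```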
Remove proxy from Redis
If a proxy isn’t working, we remove it from Redis.
checker.js
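A minimal in-memory sketch of removal follows, assuming the id should disappear from every key it may appear in. The Redis equivalents would be SREM on both sets, HDEL on the hash and `LREM gimme:proxies:tocheck 0 <id>` on the list. Because the removal lives in checker.js rather than in a Lua script, wrapping these commands in a MULTI/EXEC block would be one way to keep them together. That is a suggestion, not necessarily what the project does.

```javascript
// Same assumed key layout as in the loading sketch, rebuilt so this block runs on its own.
function createStore() {
  return {
    "gimme:proxies:available": new Set(),
    "gimme:proxy:data": new Map(),
    "gimme:proxies:tocheck": [],
    "gimme:proxies:checked": new Set(),
  };
}

// Hypothetical removal helper: delete `id` from every key.
// Returns true if the proxy's data existed.
function removeProxy(store, id) {
  store["gimme:proxies:available"].delete(id); // SREM
  store["gimme:proxies:checked"].delete(id); // SREM
  const existed = store["gimme:proxy:data"].delete(id); // HDEL
  // LREM with count 0 removes every occurrence from the list.
  store["gimme:proxies:tocheck"] = store["gimme:proxies:tocheck"].filter((x) => x !== id);
  return existed;
}

// Usage: register a proxy under every key, then remove it.
const store = createStore();
const id = "10.0.0.2:3128";
store["gimme:proxy:data"].set(id, JSON.stringify({ ip: "10.0.0.2", port: 3128 }));
store["gimme:proxies:available"].add(id);
store["gimme:proxies:checked"].add(id);
store["gimme:proxies:tocheck"].push(id);
console.log(removeProxy(store, id), store["gimme:proxy:data"].size); // true 0
```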
Return random proxy
lua/gimme-get-random-proxy.lua
get-proxy.js
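A minimal in-memory sketch of returning a random proxy follows. It mirrors SRANDMEMBER on `gimme:proxies:available` followed by HGET on `gimme:proxy:data`. Running both inside one Lua script saves a round trip. It also avoids the proxy being removed between the two calls. The `random` parameter is a hypothetical hook that makes the example deterministic, and it defaults to `Math.random`. Unlike SRANDMEMBER, this sketch copies the whole set, which is acceptable only for illustration.

```javascript
// Same assumed key layout as in the loading sketch, rebuilt so this block runs on its own.
function createStore() {
  return {
    "gimme:proxies:available": new Set(),
    "gimme:proxy:data": new Map(),
    "gimme:proxies:tocheck": [],
    "gimme:proxies:checked": new Set(),
  };
}

// Hypothetical equivalent of lua/gimme-get-random-proxy.lua.
// Returns the parsed proxy object, or null when nothing is available.
function getRandomProxy(store, random = Math.random) {
  const ids = [...store["gimme:proxies:available"]];
  if (ids.length === 0) return null;
  const id = ids[Math.floor(random() * ids.length)];
  const json = store["gimme:proxy:data"].get(id);
  return json === undefined ? null : JSON.parse(json);
}

// Usage: two available proxies; a fixed "random" value picks the second one.
const store = createStore();
for (const [ip, port] of [["10.0.0.1", 8080], ["10.0.0.2", 3128]]) {
  const id = `${ip}:${port}`;
  store["gimme:proxy:data"].set(id, JSON.stringify({ ip, port }));
  store["gimme:proxies:available"].add(id);
}
console.log(getRandomProxy(store, () => 0.99)); // { ip: '10.0.0.2', port: 3128 }
console.log(getRandomProxy(createStore())); // null
```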
That’s it for Redis. The next article will be about ElasticSearch.