Referer SPAM
Referral SPAM is, in essence, a data pollution attack on Google Analytics.
All Your Google Analytics Are Belong To Us
Warning! What defensive options do we have? Currently: none.
Referral SPAM lists
The most frequently updated referral SPAM lists that I have found:
- Ads Blocker - block referral spam: a list that is updated every day.
- Hey we’ve been putting together a list of the spam bot referrals you see in GA
Worst thing
Since there is no authentication in this whole GA process, virtually anybody can send hits for your website, just by knowing your UA-ID (UA-XXXXXXX-XX). This is exactly what spammers are using to pollute your data, and this is what we call Ghost referral SPAM.
Block at server level
Since ghost spam requires no interaction with your server, server-level blocks are pointless against it.
So far, all of the ghost referrals are still using incorrect hostnames. They could use your real hostname, but doing that is still a significant effort they haven’t made yet.
So this method works up to a point.
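Assuming "this method" means a GA include filter on valid hostnames (the text above does not spell it out), the filter pattern could be built as in the sketch below. The function name build_hostname_regex and the example hostnames are made up for illustration.

```shell
# Build an "include only these hostnames" regex from a list of hostnames.
# Hypothetical helper: the name and the example hostnames are assumptions.
build_hostname_regex() {
    regex=""
    for host in "$@"; do
        # Escape dots so they match literally instead of "any character".
        escaped=$(printf '%s' "$host" | sed 's/\./\\./g')
        # Join the escaped hostnames with "|".
        regex="${regex:+$regex|}$escaped"
    done
    # Anchor the alternation so only exact hostnames match.
    printf '^(%s)$\n' "$regex"
}

build_hostname_regex example.com www.example.com
```

Any hit whose hostname does not match the resulting pattern would be dropped by such a filter. It stops working as soon as spammers start sending the real hostname.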
Perfect version
In nginx, we use the valid_referers directive, as in this, this or this example.
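The linked examples are not reproduced here, so the following is only a sketch of one way valid_referers can be used for blocking. The directive is a whitelist: nginx sets $invalid_referer to an empty string when the Referer matches the list and to 1 otherwise. The sketch assumes the inverted use, where the spam domains are listed as "valid" and a request is rejected when $invalid_referer is empty. The function name print_referer_block_conf is made up, and the snippet has not been tested against a live nginx.

```shell
# Print an nginx snippet that rejects requests whose Referer matches
# one of the given domains (or any of their subdomains).
# Hypothetical helper: the name is an assumption for illustration.
print_referer_block_conf() {
    names=""
    for domain in "$@"; do
        # List both the bare domain and its subdomains.
        names="$names $domain *.$domain"
    done
    cat <<EOF
# goes inside a server block
valid_referers$names;
if (\$invalid_referer = "") {
    return 403;
}
EOF
}

print_referer_block_conf semalt.com darodar.com
```

Requests without a Referer header are not affected, because none is not in the list and $invalid_referer is then 1.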
Basic version
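But we can also use a simpler approach. The problem is that it uses if, which is discouraged as described in IfIsEvil (although that page lists return as one of the few things that are safe to do inside if).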
## deny referers (case insensitive)
if ($http_referer ~* (babes|click|diamond|forsale|girl|jewelry|love|nudit|organic|poker|porn|poweroversoftware|sex|teen|video|webcam|zippo))
{
return 403;
}
Simple fix in GA as addendum
From an analyst's point of view, we are not interested in referrals that have a 100% bounce rate and a 00:00:00 session duration. This is a common advanced segment that I use to exclude them (Ghost SPAM) from my results.
The best fix is in GA
Filtering the referral spam from your Google Analytics is the most efficient and effective way to get clean web data.
- Guide to Removing Referrer Spam in Google Analytics
- How To Stop Referrer Spam
Fix on server
Now doing this also requires a little bit of processing power on the server side, because the nginx server will scan each referral request for the strings we mentioned above. So please do not make the list too long for nginx to process. The smaller the list, the better the performance.
HTTP referer spam blocking for nginx
map $http_referer $bad_referer {
hostnames;
default 0;
.semalt.com 1;
.kambasoft.com 1;
.savetubevideo.com 1;
.descargar-musica-gratis.net 1;
.7makemoneyonline.com 1;
.baixar-musicas-gratis.com 1;
.iloveitaly.com 1;
.iloveitaly.co 1;
.ilovevitaly.ru 1;
.fbdownloader.com 1;
.econom.co 1;
.buttons-for-website.com 1;
.buttons-for-your-website.com 1;
.free-share-buttons.com 1;
.srecorder.co 1;
.darodar.com 1;
.priceg.com 1;
.blackhatworth.com 1;
.adviceforum.info 1;
.hulfingtonpost.com 1;
.best-seo-offer.com 1;
.best-seo-solution.com 1;
}
HTTP status code 444: No Response (Nginx)
Used in Nginx logs to indicate that the server has returned no
information to the client and closed the connection (useful as a
deterrent for malware).
You should return 444 instead. That way your site looks down and the
bots are more likely to ignore you.
Globally:
http {
# ...
include blacklist.conf;
# ...
}
In sites:
server {
# ...
if ($bad_referer) {
return 444;
}
# ...
}
Test it:
# with subdomain
curl -kI --referer http://www.social-buttons.com http://www.save-up.ch/
# without subdomain
curl -kI --referer http://social-buttons.com http://www.save-up.ch/
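When nginx answers with 444 it closes the connection without sending anything, and curl reports this as exit code 52 ("Empty reply from server"). The check above can be wrapped into a small helper, sketched here under the assumption that a 444 is the only reason for an empty reply. The function name is_referer_blocked is made up for illustration.

```shell
# Succeed if the server drops the connection for the given referer.
# Hypothetical helper: the name is an assumption for illustration.
is_referer_blocked() {
    referer=$1
    url=$2
    status=0
    # Discard the body; we only care about how curl exits.
    curl -s -o /dev/null --referer "$referer" "$url" || status=$?
    # curl exits with 52 when the server closes the connection without a reply.
    [ "$status" -eq 52 ]
}
```

For example: is_referer_blocked http://www.social-buttons.com http://www.save-up.ch/ && echo blocked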
Blacklist Referer Spam Bots with NGINX - fadeit - software development company in Aarhus, Denmark
curl -s https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt | xargs -I % echo -e "\t\t.%\t\t\t1;"
Or prefix every line with sed:
sed -i -e 's/^/PREFIX/' file_with_lines_to_prefix.txt
sed -i -e '1i TEXT' FILE
The last command inserts the string TEXT as the first line of the file FILE.
With the hostnames parameter, the following two records
example.com 1;
*.example.com 1;
can be combined:
.example.com 1;
from Module ngx_http_map_module
curl -s https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt | xargs -I % echo -e '\"~*%\" 1;' | column -t | xargs -I % echo -e " %" >/tmp/part-body.txt
cat <<'EOF' >/tmp/part-head.txt
# autogenerated from https://github.com/piwik/referrer-spam-blacklist
#
map $http_referer $bad_referer {
default 0;
EOF
cat <<'EOF' >/tmp/part-tail.txt
}
# this goes inside `server` block per domain
# if ($bad_referer) { return 444; }
EOF
cat /tmp/part-head.txt /tmp/part-body.txt /tmp/part-tail.txt >/etc/nginx/conf.d/referer-spam.conf
We could add a check for whether the file was actually updated. See: linux - Constantly check if file is modified bash - Unix & Linux Stack Exchange
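One simple way to do such a check is to compare the freshly generated file with the installed one and copy it only when they differ. This is a minimal sketch. The function name install_if_changed is made up, and the generated file is assumed to be written to a temporary path first instead of directly into /etc/nginx/conf.d.

```shell
# Copy new_file over target only if the content differs.
# Returns 0 if the target was updated, 1 if it was already up to date.
# Hypothetical helper: the name is an assumption for illustration.
install_if_changed() {
    new_file=$1
    target=$2
    # cmp -s prints nothing and exits 0 only when both files are identical.
    if [ -f "$target" ] && cmp -s "$new_file" "$target"; then
        return 1
    fi
    cp "$new_file" "$target"
}
```

It could then be combined with a config test and reload, for example: install_if_changed /tmp/referer-spam.conf /etc/nginx/conf.d/referer-spam.conf && nginx -t && nginx -s reload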
- Using the hostnames option it would be simply .domain.com 1;, but the referer field is a complete URL, not simply a hostname. So that doesn’t work.
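Because the referer is a full URL, a regex entry has to account for the scheme and an optional subdomain in front of the domain. The sketch below turns a domain into such a map line. The function name domain_to_map_line and the exact pattern are assumptions and have not been tested against a live nginx.

```shell
# Turn a bare domain into a map entry that matches a full referer URL,
# with or without a subdomain.
# Hypothetical helper: the name and the pattern are assumptions.
domain_to_map_line() {
    # Escape dots so they match literally.
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    # scheme, optional subdomain(s), the domain, then a slash or the end.
    printf '"~*^https?://([^/]+\\.)?%s(/|$)" 1;\n' "$escaped"
}

domain_to_map_line example.com
```

Compared with a plain "~*domain" entry, the anchored form does not match when the domain name merely appears somewhere in the path or query string of an unrelated referer.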
refspam
It is done on the server. It takes a small amount of server processing power on every request (the GA solution doesn’t), but the cost is unnoticeable.
It is not enabled on every site. To enable it, you must enter the keyword refspam.
The keyword is entered in the field Serve static files directly by nginx in Plesk.