Referer SPAM

Referral SPAM is, in essence, a data pollution attack on Google Analytics.

All Your Google Analytics Are Belong To Us

Warning! What defensive options do we have? Currently: none.

Referral SPAM lists

The most frequently updated referral SPAM lists I have found:

Worst thing

Since there is no authentication anywhere in the GA collection process, virtually anybody who knows your UA-ID (UA-XXXXXXX-XX) can send hits for your website without ever visiting it. This is exactly what spammers use to pollute your data, and it is what we call ghost referral SPAM.

Block at server level

Since ghost spam never interacts with your server, server-level blocks are pointless against it; they only help against bots that actually request your pages.

So far, all ghost referrals still use incorrect hostnames. They could send your real hostname, but that takes extra effort they have not made yet.

So filtering by hostname works, up to a point.
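A minimal sketch of the hostname check, assuming you have exported hits as tab-separated "hostname<TAB>source" lines (a hypothetical layout; in GA itself the equivalent is a hostname include filter). keep_valid_hostnames is a made-up helper name: it keeps only hits whose hostname is one you actually serve, so ghost hits with a fake or "(not set)" hostname drop out.

```shell
#!/usr/bin/env bash
# keep_valid_hostnames HOST...: read "hostname<TAB>source" lines on stdin
# (hypothetical export layout) and print only the lines whose hostname is
# one of the HOST arguments. Ghost hits with a fake hostname are dropped.
keep_valid_hostnames() {
  awk -F'\t' -v valid="$*" '
    BEGIN {
      # build a lookup table from the space-separated list of valid hostnames
      n = split(valid, list, " ")
      for (i = 1; i <= n; i++) ok[list[i]] = 1
    }
    # print the line only if its first field is a known hostname
    ($1 in ok)
  '
}

# usage: only the two hits sent to our own hostnames survive
printf 'www.example.com\t(direct)\n(not set)\tdarodar.com\nexample.com\tgoogle\n' |
  keep_valid_hostnames www.example.com example.com
```

This only helps for as long as spammers keep sending wrong hostnames, which is exactly the limitation noted above.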

Perfect version

In nginx, we can use the valid_referers directive, as in these examples.

Basic version

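We can also use a simpler approach. The drawback is that it relies on if, which is discouraged (see IfIsEvil), although return inside if is one of the uses that page considers safe.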

## deny referers (case insensitive)
if ($http_referer ~* (babes|click|diamond|forsale|girl|jewelry|love|nudit|organic|poker|porn|poweroversoftware|sex|teen|video|webcam|zippo))
{
    return 403;
}
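Before deploying the keyword list, it can help to preview what it would catch. A rough sketch, assuming grep -Ei as a stand-in for nginx's case-insensitive ~* match (this plain alternation behaves the same in both engines); would_block is a hypothetical helper name. Note how short keywords such as click or video also hit legitimate referrers.

```shell
#!/usr/bin/env bash
# Same keyword alternation as the nginx "if" block above.
SPAM_KEYWORDS='babes|click|diamond|forsale|girl|jewelry|love|nudit|organic|poker|porn|poweroversoftware|sex|teen|video|webcam|zippo'

# would_block REFERER: succeed (exit 0) if REFERER contains any keyword,
# ignoring case, i.e. if nginx would answer the request with 403.
would_block() {
  printf '%s\n' "$1" | grep -Eiq -- "$SPAM_KEYWORDS"
}

for ref in "http://best-POKER-deals.example/" \
           "https://clickhouse.com/docs/" \
           "https://www.google.com/"; do
  if would_block "$ref"; then
    echo "403  $ref"
  else
    echo "pass $ref"
  fi
done
```

The second referer is a false positive (it contains "click"), which is the price of substring keywords.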

Simple fix in GA as addendum

From an analyst's point of view, we are not interested in referrals that have a 100% bounce rate and a 00:00:00 session duration. I use a common advanced segment that excludes them (ghost SPAM) from my results.
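The same exclusion can be approximated outside GA on exported data. A sketch, assuming a hypothetical CSV export with the columns source, bounce rate and average session duration; drop_ghost_rows is a made-up name, and inside GA you would still build this as a segment.

```shell
#!/usr/bin/env bash
# drop_ghost_rows: read "source,bounce_rate,avg_session_duration" CSV on stdin
# (hypothetical export layout, header in line 1) and print every row except
# those with a 100% bounce rate AND a 00:00:00 average session duration.
drop_ghost_rows() {
  awk -F',' '
    NR == 1 { print; next }                 # keep the header line
    { bounce = $2; sub(/%$/, "", bounce) }  # "100.00%" -> "100.00"
    !(bounce + 0 == 100 && $3 == "00:00:00")
  '
}

printf 'source,bounce_rate,avg_session_duration\nghost.example,100.00%%,00:00:00\ngoogle,45.20%%,00:01:10\n' |
  drop_ghost_rows
```

Both conditions must hold, so a real referral with a single bounced but non-zero-length visit is kept.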

The best fix is in GA

Filtering referral spam in Google Analytics is the most efficient and effective way to get clean web data.

Guide to Removing Referrer Spam in Google Analytics
How To Stop Referrer Spam


Fix on server

This also requires a little processing power on the server side, because nginx scans the referer of every request for the strings listed above. So do not make the list too long: the smaller the list, the better the performance.

HTTP referer spam blocking for nginx


map $http_referer $bad_referer {
    hostnames;
    default                         0;
    .semalt.com                     1;
    .kambasoft.com                  1;
    .savetubevideo.com              1;
    .descargar-musica-gratis.net    1;
    .7makemoneyonline.com           1;
    .baixar-musicas-gratis.com      1;
    .iloveitaly.com                 1;
    .iloveitaly.co                  1;
    .ilovevitaly.ru                 1;
    .fbdownloader.com               1;
    .econom.co                      1;
    .buttons-for-website.com        1;
    .buttons-for-your-website.com   1;
    .free-share-buttons.com         1;
    .srecorder.co                   1;
    .darodar.com                    1;
    .priceg.com                     1;
    .blackhatworth.com              1;
    .adviceforum.info               1;
    .hulfingtonpost.com             1;
    .best-seo-offer.com             1;
    .best-seo-solution.com          1;
}



HTTP status code 444: No Response (nginx). Used in nginx logs to indicate that the server returned no information to the client and closed the connection (useful as a deterrent for malware).
Return 444 instead of 403: that way your site looks down and the bots are more likely to ignore you.
Globally (the map must be included in the http context):

http {
# ...

  include blacklist.conf;

# ...

}

In sites:

server {
  # ...

  if ($bad_referer) {
    return 444;
  }

  # ...
}

Test it:

# with subdomain
curl -kI --referer http://www.social-buttons.com http://www.save-up.ch/

# without subdomain
curl -kI --referer http://social-buttons.com http://www.save-up.ch/

Blacklist Referer Spam Bots with NGINX - fadeit - software development company in Aarhus, Denmark
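The curl tests above can be wrapped in a small helper that reports what the server did. A sketch, assuming that curl's exit code 52 ("Empty reply from server") is what a 444 typically looks like from the client side; check_referer is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_referer URL REFERER: send a HEAD request with the given Referer and
# print the HTTP status, or "444" if the server closed the connection
# without answering (curl exit code 52, "Empty reply from server").
check_referer() {
  local url=$1 referer=$2 code rc
  code=$(curl -skI --max-time 10 --referer "$referer" \
              -o /dev/null -w '%{http_code}' "$url")
  rc=$?
  if [ "$rc" -eq 52 ]; then
    echo "444 (no response)"
  elif [ "$rc" -ne 0 ]; then
    echo "curl error $rc"
  else
    echo "$code"
  fi
}

# usage (needs network access):
# check_referer http://www.save-up.ch/ http://social-buttons.com
```

Without -f, curl exits 0 for a 403, so the status code is printed rather than treated as an error.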

curl -s https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt | xargs -I % echo -e "\t\t.%\t\t\t1;"

Or add a prefix to every line with sed:

sed -i -e 's/^/PREFIX/' file_with_lines_to_prefix.txt

sed -i -e '1i TEXT' FILE

The last command inserts the line TEXT before the first line of FILE (GNU sed syntax).

The following two records

example.com   1;
*.example.com 1;

can be combined:

.example.com  1;

from Module ngx_http_map_module
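To make the equivalence concrete, a small bash sketch of the rule described above: an entry .example.com covers the bare domain and any subdomain, but not a different domain that merely ends in the same letters. matches_dot_entry is a hypothetical helper that models the documented rule, not nginx's actual hash lookup.

```shell
#!/usr/bin/env bash
# matches_dot_entry HOST ENTRY: succeed if HOST is covered by a map entry
# of the form ".example.com", i.e. HOST equals "example.com" or ends in
# ".example.com" (models the documented rule, not nginx's implementation).
matches_dot_entry() {
  local host=$1 entry=$2
  local base=${entry#.}   # ".example.com" -> "example.com"
  [[ $host == "$base" || $host == *".$base" ]]
}

for h in example.com www.example.com badexample.com; do
  if matches_dot_entry "$h" .example.com; then
    echo "match    $h"
  else
    echo "no match $h"
  fi
done
```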

curl -s https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt | xargs -I % echo -e '\"~*%\" 1;' | column -t | xargs -I % echo -e "    %" >/tmp/part-body.txt

cat <<'EOF' >/tmp/part-head.txt
# autogenerated from https://github.com/piwik/referrer-spam-blacklist
#

map $http_referer $bad_referer {
    default 0;
EOF

cat <<'EOF' >/tmp/part-tail.txt
}

# this goes inside `server` block per domain
# if ($bad_referer) { return 444; }
EOF

cat /tmp/part-head.txt /tmp/part-body.txt /tmp/part-tail.txt >/etc/nginx/conf.d/referer-spam.conf
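The same file can be produced with a single sed pass instead of the two xargs passes in the command above, which also escapes the dots so that semalt.com does not match, say, semaltXcom. A sketch; gen_referer_map is a hypothetical name, and its header and tail mirror the heredocs above.

```shell
#!/usr/bin/env bash
# gen_referer_map: read one spam domain per line on stdin and print an
# nginx map block that sets $bad_referer to 1 for matching referers.
gen_referer_map() {
  echo '# autogenerated from https://github.com/piwik/referrer-spam-blacklist'
  echo 'map $http_referer $bad_referer {'
  echo '    default 0;'
  # 1. drop empty lines  2. escape dots for the regex
  # 3. wrap each domain as a case-insensitive regex entry: "~*domain" 1;
  sed -e '/^[[:space:]]*$/d' \
      -e 's/\./\\./g' \
      -e 's/.*/    "~*&" 1;/'
  echo '}'
}

# usage:
# curl -s https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt |
#   gen_referer_map >/etc/nginx/conf.d/referer-spam.conf
```

Run nginx -t after generating and only reload when the test passes.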

We could also check whether the list was updated before regenerating the file; see "linux - Constantly check if file is modified bash" on Unix & Linux Stack Exchange.
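Instead of watching the file continuously as in that thread, a simpler option when the list is fetched on a schedule (for example from cron) is to compare the freshly generated file with the installed one and only reload nginx when they differ. A sketch; update_if_changed and the file paths are hypothetical.

```shell
#!/usr/bin/env bash
# update_if_changed NEW CURRENT: if NEW differs from CURRENT (or CURRENT does
# not exist yet), move NEW into place and return 0; otherwise delete NEW and
# return 1, so the caller knows a reload is unnecessary.
update_if_changed() {
  local new=$1 current=$2
  if [ -f "$current" ] && cmp -s "$new" "$current"; then
    rm -f -- "$new"
    return 1
  fi
  mv -f -- "$new" "$current"
}

# usage (hypothetical paths):
# if update_if_changed /tmp/referer-spam.conf /etc/nginx/conf.d/referer-spam.conf; then
#   nginx -t && nginx -s reload
# fi
```

If nginx -t fails, the new file is already in place, but the running nginx keeps its old configuration because the reload is skipped.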

  • With the hostnames option, an entry would simply be .domain.com 1;, but the referer field is a complete URL, not just a hostname, so that does not work. That is why the generated map above uses regular expressions (~*) instead.

refspam

It runs on the server, so it takes a small amount of processing power on every request (the GA solution does not), but the overhead is unnoticeable.

It is not enabled on every site: to enable it, enter the keyword refspam in the "Serve static files directly by nginx" field in Plesk.

date 01. Jan 0001 | modified 29. Dec 2023
filename: Task - Referer SPAM