Channel: HAProxy community - Latest topics

How to block bad bots, crawlers & scrapers using a list file


@chomps wrote:

Hi,

I want to block bad bots and crawlers from reaching any of the backend servers. Here is an example request from one such bot, taken from the Apache log:

HTTP/1.1" 403 539 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/3.0.0.11"

I have this in my HAProxy config:
acl badbots hdr_reg(User-Agent) -i -f /etc/haproxy/badbots.lst
tcp-request content reject if badbots

but it doesn't seem to be working: I still see the requests in the Apache log. Unless the "403" means the request is in fact being blocked? But if HAProxy were blocking it, it shouldn't show up in the Apache log at all. The badbots.lst file contains:
rubrikkgroup\.com
Baiduspider
Sosospider
Sogou
ZumBot
Yandex
trendictionbot0\.5\.0
trendiction\.com
trendiction
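With hdr_reg and -f, HAProxy loads each line of the file as a separate regular expression and the ACL matches if any of them matches the User-Agent header; -i makes the match case-insensitive. That means the bare "trendiction" entry on its own should already match the sample User-Agent above, so the list itself is probably not the reason the request gets through. A possibly tidier version of the same list is sketched below; the comments are additions (HAProxy's documentation says lines beginning with '#' in pattern files are ignored), and dropping the more specific trendiction entries is only a suggestion.

```haproxy
# /etc/haproxy/badbots.lst
# One regular expression per line, matched against User-Agent.
# The ACL uses -i, so upper/lower case in these entries does not matter.
# Dots are escaped so they match a literal "." rather than any character.
rubrikkgroup\.com
Baiduspider
Sosospider
Sogou
ZumBot
Yandex
# "trendiction" alone already matches the sample User-Agent, so the
# longer trendiction entries are redundant (kept out here for brevity).
trendiction
```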

I would really appreciate some help if anyone knows how to block these 'invading' user agents.
Regards
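A likely explanation for what the post describes: the 403 in the Apache log is a status that Apache itself returned, so the request did reach the backend. The rule uses tcp-request content, which inspects whatever is in the request buffer; without a tcp-request inspect-delay, HAProxy may evaluate it before the User-Agent header has arrived, in which case the ACL does not match (exact behaviour can depend on the HAProxy version). For a frontend in mode http, the more usual way to act on a header is http-request deny, which makes HAProxy answer with a 403 itself. A minimal sketch, assuming HTTP mode; the names fe_web, be_apache and apache1, the bind port and the server address are hypothetical placeholders:

```haproxy
frontend fe_web
    mode http
    bind :80
    # Each line of badbots.lst is a regex tested against the User-Agent
    # header; -i makes the match case-insensitive.
    acl badbots hdr_reg(User-Agent) -i -f /etc/haproxy/badbots.lst
    # HAProxy replies 403 on its own, so a matching request never reaches
    # the backend and should no longer appear in the Apache log.
    http-request deny if badbots
    default_backend be_apache

backend be_apache
    mode http
    # Hypothetical Apache address; replace with the real server(s).
    server apache1 127.0.0.1:8080
```

If the tcp-request content form is kept instead, the HAProxy documentation pairs such rules with a tcp-request inspect-delay (for example tcp-request inspect-delay 5s) placed before them, so that HAProxy waits for the request headers before evaluating the ACL.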

Posts: 5

Participants: 2


