---
title: Filter out bot visits from Gunicorn log
date: 2023-09-26 03:21:25.794894 UTC
---

Our web apps are often written in Python, and to run them in production, we often use [Gunicorn](https://gunicorn.org/). Its log is also a resource for incident investigation. But bot visits make the log very noisy. How do we exclude them?

Gunicorn is usually run with a config file, which we often name _gunicorn_conf.py_, with content like this:

```py
proc_name = 'awesome-web'
workers = 6
worker_tmp_dir = '/dev/shm/'

# Make short log line. Some info is discarded, because it is shown by journalctl already.
logconfig_dict = {
    'formatters': {
        'generic': {
            'format': '[%(levelname)s] %(message)s',
        }
    },
    'loggers': {
        'gunicorn.error': {
            'level': 'INFO',
            'handlers': ['error_console'],
            'propagate': False,
            'qualname': 'gunicorn.error',
        },
        'gunicorn.access': {
            'level': 'INFO',
            'handlers': ['console'],
            'propagate': False,
            'qualname': 'gunicorn.access',
        },
    },
}
```

To tell Gunicorn not to log bot visits, we will attach a filter to Gunicorn's access logger object. First, define a function to identify bots (search engine bots and crawlers) and a logger filter class:

```py
import logging
from logging import LogRecord


def is_bot(user_agent: str) -> bool:
    # Substrings that identify known bots in the User-Agent header.
    bot_ids = (
        'SemrushBot',
        'DataForSeoBot',
        'bingbot',
        'YandexBot',
        'AhrefsBot',
        'DotBot',
        'PetalBot',
        'EzLynx',
        'Googlebot',
        'Amazonbot',
        'MJ12bot',
        'Sogou web spider',
    )
    return any(s in user_agent for s in bot_ids)


class BotIgnoreFilter(logging.Filter):
    def filter(self, record: LogRecord) -> bool:
        passed = super().filter(record)
        # For access log records, record.args holds the access log "atoms"; 'a' is the user agent.
        # Ref: https://docs.gunicorn.org/en/stable/settings.html#access-log-format
        user_agent = record.args['a']
        from_bot = is_bot(user_agent)
        # Drop the record if it comes from a bot.
        return passed and not from_bot
```

Then, in the same config file, we attach the filter to the access logger in Gunicorn's [`on_starting`](https://docs.gunicorn.org/en/stable/settings.html#on-starting) hook:

```py
def on_starting(server):
    server.log.access_log.addFilter(BotIgnoreFilter())
```

Done.
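
Before deploying, the filter can be checked locally without starting Gunicorn. The snippet below is only a sketch: it repeats a shortened version of `is_bot` and `BotIgnoreFilter` so that it runs on its own, and `make_access_record` is a hypothetical helper that imitates an access log record by passing a plain dict of atoms as the record's args. The sample User-Agent strings and atom values are made up for illustration.

```py
import logging

# Shortened copy of the definitions above, so this snippet runs on its own.
BOT_IDS = ('Googlebot', 'bingbot', 'AhrefsBot')


def is_bot(user_agent: str) -> bool:
    return any(s in user_agent for s in BOT_IDS)


class BotIgnoreFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        return super().filter(record) and not is_bot(record.args['a'])


def make_access_record(user_agent: str) -> logging.LogRecord:
    # Hypothetical helper: builds a record whose args is a dict of atoms,
    # similar in shape to what the access logger receives ('a' = user agent).
    atoms = {'h': '127.0.0.1', 'r': 'GET / HTTP/1.1', 's': '200', 'a': user_agent}
    return logging.LogRecord(
        name='gunicorn.access',
        level=logging.INFO,
        pathname='',
        lineno=0,
        msg='%(h)s "%(r)s" %(s)s "%(a)s"',
        args=(atoms,),  # a one-item tuple holding a dict is unwrapped into the dict
        exc_info=None,
    )


f = BotIgnoreFilter()
human = make_access_record('Mozilla/5.0 (X11; Linux x86_64) Firefox/117.0')
bot = make_access_record('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')

print(f.filter(human))  # True: the record is kept
print(f.filter(bot))    # False: the record is dropped
```

If both lines print as expected, the matching logic works; whether the filter is actually attached in production still depends on the `on_starting` hook above.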
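If Gunicorn is managed by `systemd`, you can use `systemctl` to tell Gunicorn to re-read the new config (given that your systemd unit file is _our-web.service_):

```console
$ sudo systemctl reload our-web.service
```

Note that, per the Gunicorn docs, `on_starting` is called just before the master process is initialized, so if the filter does not take effect after a reload, restart the service instead.

Gunicorn's way of using a Python script for configuration looks weird at first. But in some situations, like this one, it is an advantage.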