How to split Nginx logs for bot visits

Logs are a valuable resource for debugging. When we run a website, we often look into the Nginx logs to see what happened to it. But these logs are often cluttered with visits from search bots, which makes it difficult to find the noteworthy lines. So how do we tell Nginx to log search bot activity to another file, to keep our access log cleaner?

To do that, first create a file that helps Nginx tell which visitors are search bots. Create a file bot_definition.conf in the /etc/nginx/conf.d folder, with this content:

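# Set $is_bot to 1 when the User-Agent matches a known bot.
# A leading ~ marks a case-sensitive regular expression.
map $http_user_agent $is_bot {
        # Monitoring services
        ~Pingdom 1;
        ~Appenlight 1;

        # Search engine, social and SEO crawlers
        ~Googlebot 1;
        ~Baiduspider 1;
        ~facebookexternalhit 1;
        ~MJ12bot 1;
        ~YandexBot 1;
        ~AhrefsBot 1;
        ~coccoc 1;

        default 0;
}

# Inverse of $is_bot, used to keep bots out of the main access log.
map $is_bot $is_not_bot {
        0 1;
        1 0;
}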

Now, update your virtual host config. For example, for my website, the file is /etc/nginx/sites-available/quanweb.conf. Add the include at the top of the file, then find the access_log directive and change it to this:

include conf.d/bot_definition.conf;

# Other lines

access_log /var/log/nginx/quanweb/access.log combined if=$is_not_bot;
access_log /var/log/nginx/quanweb/bot_access.log combined if=$is_bot;

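Change the log file paths to match your actual setup. The added part is combined if=.... The format name (combined) has to be written out because if= is only accepted after it. Nginx skips logging a request when the if= condition evaluates to "0" or an empty string. Because Nginx does not support the if=!$some_var syntax, we have to define two variables, $is_bot and $is_not_bot.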

After saving the file, remember to test whether your new config breaks Nginx before applying it (running sudo nginx -t directly performs the same check):

$ sudo service nginx testconfig

If it says OK, you can apply the config:

$ sudo systemctl reload nginx
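
After the reload, one way to confirm the split is to send a request with a bot-like User-Agent and look for it in both log files. The following is a minimal sketch, not part of the original setup: the helper names probe and logged are hypothetical, and the URL and log paths in the usage comments are assumptions to replace with your own.

```shell
# probe URL USER_AGENT
# Send one request with the given User-Agent and discard the response body.
probe() {
    curl -s -o /dev/null -A "$2" "$1"
}

# logged FILE TEXT
# Succeed if FILE contains TEXT as a fixed string.
logged() {
    grep -qF -- "$2" "$1"
}

# Usage (assumed URL and paths; reading the logs may need sudo):
#   probe http://localhost/ "log-split-check Googlebot"
#   logged /var/log/nginx/quanweb/bot_access.log "log-split-check Googlebot" && echo "found in bot log"
#   logged /var/log/nginx/quanweb/access.log "log-split-check Googlebot" || echo "absent from main log"
```

The marker text in the User-Agent only needs to contain one of the patterns from bot_definition.conf (here Googlebot) and be unusual enough to search for.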

Now check your main access log file. It should no longer contain visits from search bots. Note that the list of patterns for recognizing bots above is not complete. You can extend it when you find something new.
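
To spot candidates for new patterns, it can help to list the most frequent User-Agent strings in the main access log that none of the current patterns match. This is a rough sketch under a few assumptions: the log uses the combined format (so the User-Agent is the sixth field when splitting on double quotes), KNOWN_BOTS is kept in sync with bot_definition.conf by hand, and the function name unmatched_agents is hypothetical.

```shell
# Patterns already listed in bot_definition.conf, joined into one extended regex.
KNOWN_BOTS='Pingdom|Appenlight|Googlebot|Baiduspider|facebookexternalhit|MJ12bot|YandexBot|AhrefsBot|coccoc'

# unmatched_agents LOGFILE
# Print "count user-agent" for the 20 most frequent User-Agents
# that do not match KNOWN_BOTS, most frequent first.
unmatched_agents() {
    awk -F'"' '{ print $6 }' "$1" \
        | grep -Ev "$KNOWN_BOTS" \
        | sort | uniq -c | sort -rn | head -n 20
}

# Usage (assumed path; reading the log may need sudo):
#   unmatched_agents /var/log/nginx/quanweb/access.log
```

Like the ~ patterns in the map block, grep -E matches case-sensitively here, so the output reflects what Nginx itself would leave in the main log. Ordinary browsers will also show up in the list, so each entry still needs a human look before it becomes a new pattern.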