Recently I saw a question from a fellow Python developer: how to deploy a Django application without manually SSHing to the server and running commands. This is single-server deployment, where you put all components — application code, database, static and media files — on the same server, with no Docker involved. With this strategy, we need some way to deliver a new version of our app every time new code is pushed to the "release" branch of the Git repository. Here is a guide.
Why do we need automation? Because it is boring to do these things by hand again and again:
- SSH to the server, `cd` to the installation folder.
- Run `git pull`.
- Run commands to stop your services.
- Check and install Python packages if needed.
- Run database migrations.
- Check and copy static files (CSS, JS, images) to a public folder for Nginx.
- Run commands to start your services.
- And more, depending on how complex your project is.
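Done by hand, one deployment session looks roughly like this (the host, paths, and service name here are made up for illustration):

```
$ ssh dodo@prod.agriconnect.vn
$ cd /opt/ProjectName/project-name
$ git pull
$ sudo systemctl stop my-web.service
$ . ../venv/bin/activate
$ python manage.py migrate --no-input
$ python manage.py collectstatic --no-input
$ sudo systemctl start my-web.service
```

Repeating this for every release is tedious and error-prone, which is exactly what we will automate.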
There are some tools to do this automatically. My preferred one is Ansible. Before diving into a detailed Ansible script, let's set a convention for how our application is laid out on the server:
1. Folder layout
The installation folder is /opt/ProjectName, where the tree structure is:
ProjectName/
├── project-name/
│ ├── pyproject.toml
│ └── manage.py
├── public/
└── venv/
Inside this folder, we have the project-name subfolder for the source code, the public folder for JS, CSS, and any other files to be served by Nginx. Because this is a Python project, we also have venv for the Python virtual environment. Why come up with this scheme?
- People who adopt single-server deployment often run multiple applications on the same server, so having one folder that gathers every file of a project makes it easier to maintain.
- venv lives outside the source code folder to prevent us from copying or zipping it by accident when we need to copy or move the project somewhere else. And when we need to scan or search our source code, we don't waste time going through "venv". Because "venv" is a sibling folder, it is quick to activate the environment with this command:
$ . ../venv/bin/activate
- The public folder is there for the same reason as venv. Note that you need to set appropriate permissions on /opt/ProjectName/public so that Nginx can serve files from it.
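Creating this layout takes only a few commands. A minimal sketch, using /tmp instead of /opt so it doesn't need root (on a real server the base would be /opt/ProjectName, created with sudo, and `chown`/`chmod` applied to public for Nginx):

```shell
# Base folder for one project; on a real server: /opt/ProjectName
BASE=/tmp/ProjectName

# Folder for files Nginx will serve
mkdir -p "$BASE/public"

# Virtual environment as a sibling of the source code folder
python3 -m venv "$BASE/venv"

# The source code would be cloned next to them (URL hypothetical):
# git clone git@gitlab.com:our-company/project-name.git "$BASE/project-name"
```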
2. Process management
To make my application run automatically when the server starts, I use systemd. While other developers use tools like supervisord or pm2, I prefer systemd because:
- It is already included in every mainstream Linux distribution. Nothing to install.
- The Linux server uses it to manage all other system processes. It is good to use one central tool, with no extra commands to remember.
- It can start / stop services in order. For example, we want our application to start after the database systems have started, and when we reboot the server, we want our application to stop before the databases are stopped. It is meaningless to run our application when the databases (PostgreSQL, Redis) are not available, right?
- Because systemd is used for managing the whole system, it is very powerful. It supports many more ways to control when our application can run and when it needs to be restarted (for example, when it hangs due to some mysterious bug).
- It controls the security context of our application: what resources it can use, and how much. This is needed to limit the damage if our application is compromised.
- Under systemd, our application integrates with journald for logging, and we enjoy journald's features when debugging with the logs. My other article about journald is here.
But we can't take advantage of systemd if we run our application in a Docker container, simply because a Docker container cannot run systemd. Docker has some features close to systemd's, but they are not as rich and precise.
To use systemd, we will need to create a .service file like this:
[Unit]
Description=Our web backend
After=redis-server.service postgresql.service
[Service]
User=dodo
Group=dodo
Type=simple
WorkingDirectory=/opt/ProjectName/project-name
# Create directory /run/project-name and set appropriate permission
RuntimeDirectory=project-name
ExecStart=/opt/ProjectName/venv/bin/gunicorn project.wsgi -b unix:/run/project-name/web.sock
TimeoutStopSec=20
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
In this service file, you can see:
- The application runs as a normal user. Don't run it as a powerful user, or if your application is compromised, the attacker can use it to do more damage to the system.
- Our application is not listening on a TCP port (like 8000), but on a Unix domain socket, via the file /run/project-name/web.sock. Why not a numeric port? Because if we have many projects, we cannot remember which port belongs to which project. A named, text-based path is easier to manage.
- When we use a Unix domain socket, it is important not to forget `RuntimeDirectory`. It tells systemd to create a directory where our application can create the socket file, and systemd will delete that directory after our application stops.
This service file should be copied to /usr/local/lib/systemd/system. Some articles tell you to put the file in /etc/systemd. Don't do that: sometimes we don't want our application to be auto-started (for example, when it has bugs that must be fixed before serving users), and systemd manages auto-start through symlinks under /etc/systemd/system. Keeping our unit file elsewhere lets us toggle the auto-start cleanly with these commands:
$ sudo systemctl disable my-app
$ sudo systemctl enable my-app
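After copying the unit file (assuming it is named my-app.service), tell systemd to reload its configuration, then enable and start the service:

```
$ sudo cp my-app.service /usr/local/lib/systemd/system/
$ sudo systemctl daemon-reload
$ sudo systemctl enable my-app
$ sudo systemctl start my-app
```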
When our app listens on a Unix domain socket, the Nginx configuration looks like this:
location / {
include proxy_params;
proxy_pass http://unix:/run/project-name/web.sock;
}
We can take further advantage of Unix domain sockets by connecting to PostgreSQL via its Unix domain socket only. By doing so, we can stop PostgreSQL from listening to the outside, reducing the attack surface. My other article with this practice is here.
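On the Django side, this is just a settings change. A sketch, with a hypothetical database name and the `dodo` user from above — when `HOST` starts with `/`, the PostgreSQL driver treats it as a socket directory instead of opening a TCP connection:

```python
# settings.py — connect to PostgreSQL over its Unix domain socket.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "projectdb",            # hypothetical database name
        "USER": "dodo",
        # Socket directory (the Debian/Ubuntu default); no TCP port is used.
        "HOST": "/var/run/postgresql",
    }
}
```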
For tasks which need to run periodically, we should implement them with systemd timers instead of cron jobs. Timers have these benefits over cron:
- They are controlled by systemd like other services, meaning they are secured the same way and integrated with journald for logging.
- We can temporarily disable a task with `sudo systemctl stop my-task.timer`.
- We can trigger a task manually at any time, outside the schedule we defined, with `sudo systemctl start my-task.service`.
- We can see when our task last ran, and when it will next run, with `sudo systemctl list-timers`.
(Screenshot: timers from one of my projects.)
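As an illustration (the task and file names here are hypothetical), a periodic task is a pair of unit files: a .service describing the command, and a .timer describing the schedule. This one runs Django's built-in `clearsessions` command every night:

```ini
# /usr/local/lib/systemd/system/my-task.service
[Unit]
Description=Nightly session cleanup

[Service]
Type=oneshot
User=dodo
WorkingDirectory=/opt/ProjectName/project-name
ExecStart=/opt/ProjectName/venv/bin/python3 manage.py clearsessions

# /usr/local/lib/systemd/system/my-task.timer
[Unit]
Description=Run my-task.service every night

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable the schedule with `sudo systemctl enable --now my-task.timer`; note that it is the .timer, not the .service, that gets enabled.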
3. Ansible script
Ansible is easy to install. Just do:
$ sudo apt install ansible
Ansible is so powerful that its documentation is big and difficult to get started with. To keep it simple, we need at least two files:
- An inventory file, let's name it inventory.yml, to list the servers we will deploy the app to.
- A playbook file, let's name it playbook.yml, to describe the steps Ansible must take to deploy our app.
In a more complex setup, the playbook can be split into many files and subfolders, and so can the inventory.
Note that you must set up your server beforehand so that we can SSH in using public keys, not a password.
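If key-based SSH is not set up yet, `ssh-copy-id` does it in one step (using the `dodo` user and staging host from the inventory below):

```
$ ssh-keygen -t ed25519        # only if you don't have a key yet
$ ssh-copy-id dodo@staging.agriconnect.vn
```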
Inventory
The inventory.yml looks like this:
prod:
  hosts:
    prod.agriconnect.vn:
      ansible_user: dodo
staging:
  hosts:
    staging.agriconnect.vn:
      ansible_user: dodo
In this inventory, we have two groups: `prod` for production and `staging` for staging servers. If you don't have staging servers, just delete the `staging` group. Each group must have a `hosts` field to list the servers. To identify a server, you can use a domain or an IP address. We also need to specify `ansible_user`, which is the Linux user on the server (not our local PC) that we usually SSH in as (it can be the same user that our web application runs under).
Playbook
The playbook.yml file will look like this:
---
- hosts: '{{ target }}'
  remote_user: dodo
  # This is needed to make ansible_env work
  gather_facts: true
  gather_subset:
    - '!all'
  vars:
    target: staging
  tasks:
    - name: Say hello
      ansible.builtin.command: echo Hello
      environment:
        VIRTUAL_ENV: '/opt/ProjectName/venv'
With the `hosts:` parameter, we choose which group of servers from inventory.yml to run this playbook on. If we had only one group, we could use a fixed value here. But because we have two, we use Jinja code to produce the value dynamically. It comes from the `target` variable, which we declare in the `vars` section with a default of `staging`, and whose value we pass from the command line when running Ansible.
Later on, if we want to deploy to prod servers, we run:
$ ansible-playbook -i inventory.yml playbook.yml -e "target=prod ansible_become_pass=$REMOTE_USER_PASS"
and if we want to deploy to staging servers, we run:
$ ansible-playbook -i inventory.yml playbook.yml -e "target=staging ansible_become_pass=$REMOTE_USER_PASS"
The commands which Ansible will execute on the server need some information, like file and directory paths, so let's define them as variables to keep the commands short:
vars:
  target: staging
  base_folder: /opt/ProjectName
  webapp_folder: '{{ base_folder }}/project-name'
  bin_folder: '{{ base_folder }}/venv/bin/'
The `tasks` section is then:
tasks:
  - name: Clean source
    ansible.builtin.command: git reset --hard
    args:
      chdir: '{{ webapp_folder }}'

  - name: Update source
    ansible.builtin.git:
      repo: 'git@gitlab.com:our-company/project-name.git'
      dest: '{{ webapp_folder }}'
      version: "{{ lookup('env', 'CI_COMMIT_REF_NAME') | default('develop', true) }}"
    register: git_out

  - name: Get changed files
    ansible.builtin.command: git diff --name-only {{ git_out.before }}..{{ git_out.after }}
    args:
      chdir: '{{ webapp_folder }}'
    register: changed_files
    when: git_out.changed

  - name: Stop ProjectName services
    ansible.builtin.systemd:
      name: '{{ item }}'
      state: stopped
    loop:
      - my-web.service
      - my-ws-server.service
      - my-asynctask.service
    become: true
    become_method: sudo
    when:
      - git_out.changed
      - changed_files.stdout is search('\.py|\.po|\.lock|\.toml')

  - name: Install python libs
    ansible.builtin.command: poetry install --no-root --only main -E systemd
    args:
      chdir: '{{ webapp_folder }}'
    when:
      - git_out.changed
      - changed_files.stdout is search('poetry|pyproject')

  - name: Migrate database
    ansible.builtin.command: '{{ bin_folder }}python3 manage.py migrate --no-input'
    args:
      chdir: '{{ webapp_folder }}'
    when:
      - git_out.changed
      - changed_files.stdout is search('poetry|pyproject|models|migrations|settings')

  - name: Compile translation
    ansible.builtin.command: '{{ bin_folder }}python3 manage.py compilemessages'
    args:
      chdir: '{{ webapp_folder }}'
    when:
      - git_out.changed
      - changed_files.stdout is search('locale')

  - name: Collect static
    ansible.builtin.command: '{{ bin_folder }}python3 manage.py collectstatic --no-input'
    args:
      chdir: '{{ webapp_folder }}'

  - name: Start ProjectName services
    ansible.builtin.systemd:
      name: '{{ item }}'
      state: started
    loop:
      - my-ws-server.service
      - my-web.service
      - my-asynctask.service
    become: true
    become_method: sudo
    when: git_out.changed
In the first step ("Clean source"), we reset any dirty changes in the source code folder, to prevent Git failures that might arise in the next step.
In the second step, we pull the code from the Git hosting service, at the version we define for this deployment. In my projects, I often use the `main` branch for stable code, to deploy to production, and the `develop` branch for staging. You can put a fixed branch name at `version`, like `version: main`, if you have a simpler setup. In my case, Ansible is triggered by a Git push, and I want to pull the code of the exact Git revision which triggered the deployment. GitLab CI gives this info via `CI_COMMIT_REF_NAME`, so I use `lookup('env', 'CI_COMMIT_REF_NAME')` to retrieve it. The `default('develop', true)` part falls back to the `develop` branch when we run Ansible manually from the command line (not via a Git push). We use the `register` parameter to save the Git result; it is needed by the next step.
In the third step, we check which files have changed since the last deployment. Later, we use this info to decide which commands to skip.
In step 4, we stop our application services, which correspond to the systemd .service files. This step demonstrates the use of `loop` to act on many objects; without it, we would have to define a separate step for each service, making the playbook long. One more thing to note: because the `systemctl start / stop` actions need to run with sudo, we use the `become` and `become_method` parameters to tell Ansible to do sudo. Here we also use `when` to define the condition under which this step should execute: if the new code only changes some JS or CSS files, we don't need to stop the services.
The remaining steps should be easy to understand from the explanation of the first four.
The last part of the playbook sets some environment variables which the steps above need:
environment:
  # Modify PATH so that poetry can be found.
  PATH: '{{ base_folder }}/venv/bin:{{ ansible_env.PATH }}:{{ ansible_env.HOME }}/.local/bin'
  # Tell poetry to use our virtual env folder
  VIRTUAL_ENV: '{{ base_folder }}/venv'
Previously, in some server configurations, `PATH` was not populated with `~/.local/bin` in the environment Ansible logged in to, hence Ansible failed to run `poetry` in the fifth step. I'm not sure if that is still an issue.
We won't write a correct playbook on the first try, so please set up a virtual machine with VirtualBox to test and fix the playbook. When testing with a virtual machine, we don't have a Git push event, so we have to run Ansible directly from the CLI. In the commands I gave above, REMOTE_USER_PASS is an environment variable containing the password of the user on the server. You can set it with
export REMOTE_USER_PASS=mypassword
before running Ansible.
Here is a screenshot of Ansible in action, from one of my projects:
And here is how it runs as part of a GitLab pipeline:
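The GitLab side is not covered in detail above. As a rough sketch (job name, image, and the SSH_PRIVATE_KEY variable are assumptions; store secrets as masked CI/CD variables), a minimal .gitlab-ci.yml job triggering the playbook could look like:

```yaml
deploy:
  stage: deploy
  image: ubuntu:24.04            # any image with ansible + ssh available
  before_script:
    - apt-get update && apt-get install -y ansible openssh-client
    # SSH_PRIVATE_KEY is a masked CI/CD variable holding the deploy key
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
  script:
    - ansible-playbook -i inventory.yml playbook.yml -e "target=prod ansible_become_pass=$REMOTE_USER_PASS"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```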
So, that was a short guide on how to take advantage of Ansible and systemd to deploy a Django application. I hope it makes your life easier.