Making Celery 4 work in Django 1.10 and Elastic Beanstalk

Finally, after many, many days of trying to make it work and reading thousands of pages, I got Celery 4 working with Django 1.10 on Amazon AWS Elastic Beanstalk with SQS (Simple Queue Service) – including Celery Beat!
First, the files I ended up with; then an explanation of what I understand (some of it still remains a mystery).

STEP 0:
Install using the following:
pip install -U "celery[sqs]"   # quotes keep shells like zsh from expanding the brackets
pip install django-celery-beat

I’m using the following versions of the apps:
boto (2.45.0)
botocore (1.4.63)
celery (4.0.2)
Django (1.10.1)
django-celery-beat (1.0.1)
kombu (4.0.2)
pip (9.0.1)
pycurl (7.43.0)
FILE: /src/PROJECT_NAME/celery.py

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
# DONE IN __init__.py
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings.production")

app = Celery('PROJECT_NAME')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()


@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Note: my settings live in a module called settings, split into base, development and production; that's why DJANGO_SETTINGS_MODULE is set like that. You might need to set it to PROJECT_NAME.settings instead, for example.

FILE: /src/PROJECT_NAME/__init__.py

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings.development")

# Add celery app defined in celery.py
__all__ = ['celery_app']


Now, the settings in the app:
FILE: /src/PROJECT_NAME/settings.py   (or settings/production.py)

INSTALLED_APPS = [
    ...   
    'django_celery_beat',
    ...
]


CELERY_BROKER_TRANSPORT = 'sqs'
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',
}
CELERY_BROKER_USER = AWS_ACCESS_KEY_ID
CELERY_BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
CELERY_WORKER_STATE_DB = '/var/run/celery/worker.db'
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
CELERY_WORKER_PREFETCH_MULTIPLIER = 0         # See https://github.com/celery/celery/issues/3712

CELERY_DEFAULT_QUEUE = 'celery'
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}

A couple of things to note here: BEAT_SCHEDULER is set like that because we want to store the schedules in the database; there are other ways to define it.
Also, while you are at it, make sure you are not using os.environ['xxxx'] anywhere. For example, the following (thanks to Alexander Tyapkov):
SECRET_KEY = os.environ['SECRET_KEY']
must be replaced by
SECRET_KEY = os.environ.get('SECRET_KEY', '')
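The difference matters because not every context that imports your settings (for example, the post-deploy hook that builds the Celery environment below) has all the variables exported. A minimal illustration, with a made-up variable name:

```python
import os

# Direct indexing raises KeyError when the variable is missing:
missing = False
try:
    os.environ['SOME_VAR_THAT_IS_NOT_SET']
except KeyError:
    missing = True

# .get() returns the fallback instead of raising:
value = os.environ.get('SOME_VAR_THAT_IS_NOT_SET', 'fallback')
```

With os.environ['...'], a single missing variable aborts the whole import of the settings module.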
After installing django_celery_beat to the INSTALLED_APPS, run python manage.py migrate to update the database.

For the user specified in BROKER_USER, you have to create the user in the Amazon AWS Console (IAM service) and attach the AmazonSQSFullAccess managed policy, which defines a policy like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sqs:*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
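If you'd rather not grant full SQS access, a tighter policy scoped to the celery queue could look like the following (this is a hypothetical sketch, not what I used; note that the SQS transport also needs ListQueues, which only accepts a wildcard resource):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["sqs:ListQueues"],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": ["sqs:*"],
            "Effect": "Allow",
            "Resource": "arn:aws:sqs:us-east-1:*:celery"
        }
    ]
}
```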

You also need to go to the SQS service (in the AWS Console) and create a queue called 'celery' (although I believe it gets created automatically if you don't, anyway).

Your tasks.py files will look like this:
FILE: /src/ANY_APP/tasks.py

from __future__ import absolute_import, unicode_literals
from celery import shared_task  # celery.decorators was removed in Celery 4

from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)

@shared_task
def do_something():
    logger.info('******** CALLING ASYNC TASK WITH CELERY **********')
    # Your code



OK, that was the easy part. Now, to make it work on Elastic Beanstalk, we need to make SQS work, which requires PyCurl – and PyCurl needs to be compiled.
You should already have your .ebextensions folder with some files in it; we'll add a few more there to configure everything.
FILE: /.ebextensions/01_packages.config

files:
  "/usr/local/share/pycurl-7.43.0.tar.gz" :
    mode: "000644"
    owner: root
    group: root
    source: https://pypi.python.org/packages/source/p/pycurl/pycurl-7.43.0.tar.gz

packages:
  yum:
    python34-devel: []
    libcurl-devel: []

commands:
  01_download_pip3:
    # download get-pip.py so we can install pip for python3
    command: 'curl -O https://bootstrap.pypa.io/get-pip.py'
  02_install_pip3:
    command: 'python3 get-pip.py'

container_commands:
  03_pycurl_reinstall:
    # pycurl must be rebuilt against the SSL backend curl was compiled with (NSS here).
    # The --upgrade option is needed because this runs after pip installs requirements.txt,
    # and it needs to be done with the virtualenv activated.
    command: 'source /opt/python/run/venv/bin/activate && pip3 install /usr/local/share/pycurl-7.43.0.tar.gz --global-option="--with-nss" --upgrade'



FILE: /.ebextensions/99_configure_celery.config

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Create required directories
      sudo mkdir -p /var/log/celery/
      sudo mkdir -p /var/run/celery/
      
      # Create group called 'celery'
      sudo groupadd -f celery
      # add the user 'celery' if it doesn't exist and add it to the group with same name
      id -u celery &>/dev/null || sudo useradd -g celery celery
      # add permissions to the celery user for r+w to the folders just created
      sudo chown -R celery:celery /var/log/celery/
      sudo chown -R celery:celery /var/run/celery/

      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      celeryenv=${celeryenv%?}

      # Create CELERY configuration script
      celeryconf="[program:celeryd]
      directory=/opt/python/current/app/src
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A PROJECT_NAME --loglevel=INFO --logfile="/var/log/celery/%%n%%I.log" --pidfile="/var/run/celery/%%n.pid"

      user=celery
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"
      
      
      # Create CELERY BEAT configuration script
      celerybeatconf="[program:celerybeat]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery beat -A PROJECT_NAME --loglevel=INFO --logfile="/var/log/celery/celery-beat.log" --pidfile="/var/run/celery/celery-beat.pid"
           
      directory=/opt/python/current/app/src
      user=celery
      numprocs=1
      stdout_logfile=/var/log/celerybeat.log
      stderr_logfile=/var/log/celerybeat.log
      autostart=true
      autorestart=true
      startsecs=10
     
      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60
     
      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true
     
      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=999
     
      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf
      echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
          then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
      supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat
      
      
      
commands:
  01_killotherbeats:
    command: "ps auxww | grep '[c]elery beat' | awk '{print $2}' | sudo xargs kill -9 || true"
    ignoreErrors: true
  02_restartbeat:
    command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat"
    leader_only: true



(Thanks to PythonEntusiast and WBAR.)
A couple of notes on this file:
- Change every PROJECT_NAME to whatever your project is called (the name of the folder where you put the celery.py file in the first step).
- In the 2 lines that say directory=/opt/python/current/app/src, most people will probably want to remove the /src. I had to keep it because my project (where the manage.py file is) lives inside a /src folder.
- Like everyone else, I started from the same examples copied around the Internet, and I had to lower stopwaitsecs: when the celery beat restart is called, it really does wait that long to stop, so leaving it at 600 seconds makes every restart take up to 10 minutes.
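The celeryenv pipeline near the top of the script deserves a word: it flattens the Elastic Beanstalk env file into the single comma-separated KEY=value list that supervisord's environment= directive expects, doubling % signs because supervisord treats % as an escape character. Here is the core of it against a made-up stand-in for /opt/python/current/env (I'm omitting the extra $PATH/$PYTHONPATH substitutions):

```shell
# Hypothetical stand-in for /opt/python/current/env
envfile='export DJANGO_SETTINGS_MODULE="myproj.settings.production"
export SECRET_KEY="s3cret%key"'

celeryenv=$(echo "$envfile" | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g')
celeryenv=${celeryenv%?}   # strip the trailing comma left by tr

echo "$celeryenv"
# DJANGO_SETTINGS_MODULE="myproj.settings.production",SECRET_KEY="s3cret%%key"
```

This is why the logfile patterns in the supervisord config use %%n and %%I instead of %n and %I.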


To see what actually happened during a deploy, go to the Amazon AWS Console, Elastic Beanstalk, and download the full logs; then check the eb-activity.log file.

If you have any problems, or it just doesn't work, there are a couple of things you can do. Connect using SSH (e.g., PuTTY) and then:
- To see the celery and celery beat logs:
nano /var/log/celery-worker.log
or
nano /var/log/celerybeat.log
Also, check the files under /var/log/celery/

- Check supervisord configurations:
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread

- Apply configuration changes (if you have changed the configurations and executed a 'reread'):
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update

- Manually restart celery or celerybeat:
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat

If you are having trouble with PyCurl/curl (you will see messages in the celery logs, for example), you can open a shell and do something like this:
source /opt/python/run/venv/bin/activate            (activate the environment)
cd /opt/python/current/app/src/               (go to project directory, might need to remove /src)
python manage.py shell
> import pycurl
> pycurl.version

In my case, I was receiving a message like
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)

With this, it should be working for you, except for the following 2 things:
- The logger.info(...) call in the task is NOT working, so change the code to test it.
- Changes to the task schedule made in the admin interface are not picked up until you restart celery beat (see the bug report).

Needless to say, most of this code is not mine; I pieced it together after searching online for days and copying from different places.


Comments

  1. I tried doing this on AWS a year ago; I spent a total of 2 months trying to make Django + Celery work on AWS, but ultimately failed. Now, with your example, I will try again. Thanks

    1. I mean. This guy did it. But if you also succeeded I have more hopes I will too.

    2. Hey Tuxedo! Yes, it worked but it was sort of a pain in the... Let me know if I can help.

  2. Hi Diego Jancic, I am new to EB and SQS. I found your tutorial and am trying to implement it on my own server, but I have a question: for FILE: /.ebextensions/99_configure_celery.py – does it really end with .py, or with .config?
    Sorry for my English, I hope you understand my question.

    1. Nice catch! Yes, it should be .config. I've just corrected the error! Thanks!

    2. Hi, thank you for the reply. I followed your tutorial and added all the files above, but when I try to upload it to the Elastic Beanstalk service it always fails, and I get an error with the requirements.txt file (it contains a new library, matplotlib==1.5.1, which causes an error during the update). Please tell me what I am doing wrong; I would appreciate your help very much.

    3. I don't know what you're doing wrong. You would need to try to install all the requirements in another computer (or download the logs) and see why that is failing. I don't remember having problems with matplotlib, but it might need something to be built for example.

    5. Hi, I am now able to send tasks to SQS, thank you.

    6. Hi, I am able to deploy the application to EB, but when I look at eb-activity.log the celery worker can't start. Please see the error below:
      celeryd: changed
      celeryd: stopped
      celeryd: updated process group
      celeryd: stopped
      celeryd: ERROR (abnormal termination)

      Could you create a video tutorial about this topic? I think there will be many people interested because i can't find any tutorial on youtube

    7. Hi, can you tell me what version of Python you are using? I read here https://stackoverflow.com/questions/38566456/how-to-run-a-celery-worker-on-aws-elastic-beanstalk that supervisor doesn't work on Python 3.

    8. Hung, I'm not an expert on the topic, I believe you need to look at the logs and Google the errors for additional help. I'm using Python 3.5.2 and supervisor can run a python3 script if in a virtual env.
      Read this: https://stackoverflow.com/a/32290474/72350

    9. Hi, thank you for the recommendation. After 4 days I finally made it work, after many trials and errors. I think the important step was finding a way to access the EC2 instance using SSH; after that I could see the logs and fix the errors.

  3. Hi, I followed your tutorial and I was able to make celery work on Elastic Beanstalk. The only part that I don't understand is why you're using the following commands: 01_killotherbeats and 02_restartbeat. You basically kill celery beat and then restart it before executing the config script run_supervised_celeryd.sh that at the end will restart celery beat. I don't think those commands are needed

    1. I did that a while ago. I believe it was not restarting if I didn't kill the process. Try it and if it works for you, perfect :-)

