Making Celery 4 work in Django 1.10 and Elastic Beanstalk

Finally, after many, many days of trying to make it work and reading thousands of pages, I got Celery 4 working with Django 1.10 on Amazon AWS Elastic Beanstalk with SQS (Simple Queue Service) – including Celery Beat!
First, the files I ended up with; then an explanation of what I understand (some of it still remains a mystery).

STEP 0:
Install using the following:
pip install -U "celery[sqs]"   # quotes keep shells like zsh from expanding the brackets
pip install django-celery-beat

I’m using the following versions of the apps:
boto (2.45.0)
botocore (1.4.63)
celery (4.0.2)
Django (1.10.1)
django-celery-beat (1.0.1)
kombu (4.0.2)
pip (9.0.1)
pycurl (7.43.0)
FILE: /src/PROJECT_NAME/celery.py

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
# DONE IN __init__.py
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings.production")

app = Celery('PROJECT_NAME')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()


@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Note: my settings live in a module called settings, split into base, development and production; that's why DJANGO_SETTINGS_MODULE is set like that. You might need to set it to PROJECT_NAME.settings instead, for example.

FILE: /src/PROJECT_NAME/__init__.py

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings.development")

# Add celery app defined in celery.py
__all__ = ['celery_app']


Now, the settings in the app:
FILE: /src/PROJECT_NAME/settings.py   (or settings/production.py)

INSTALLED_APPS = [
    ...   
    'django_celery_beat',
    ...
]


CELERY_BROKER_TRANSPORT = 'sqs'
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',
}
CELERY_BROKER_USER = AWS_ACCESS_KEY_ID
CELERY_BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
CELERY_WORKER_STATE_DB = '/var/run/celery/worker.db'
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
CELERY_WORKER_PREFETCH_MULTIPLIER = 0         # See https://github.com/celery/celery/issues/3712

CELERY_DEFAULT_QUEUE = 'celery'
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}

A couple of things to note here: BEAT_SCHEDULER is set like that because we want to store the schedules in the database; there are other ways to define it.
Also, while you are at it, make sure you are not using os.environ['xxxx'] anywhere. For example, the following (thanks to Alexander Tyapkov):
SECRET_KEY = os.environ['SECRET_KEY']
must be replaced by
SECRET_KEY = os.environ.get('SECRET_KEY', '')
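The difference matters because not every context that imports your settings (for example, the post-deploy hook that builds the Celery environment below) has all the variables exported. A minimal illustration, with a made-up variable name:

```python
import os

# Direct indexing raises KeyError when the variable is missing:
missing = False
try:
    os.environ['SOME_VAR_THAT_IS_NOT_SET']
except KeyError:
    missing = True

# .get() returns the fallback instead of raising:
value = os.environ.get('SOME_VAR_THAT_IS_NOT_SET', 'fallback')
```

With os.environ['...'], a single missing variable aborts the whole import of the settings module.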
After installing django_celery_beat to the INSTALLED_APPS, run python manage.py migrate to update the database.

For the user specified in BROKER_USER, you have to create the user in the Amazon AWS Console (IAM service) and attach the AmazonSQSFullAccess managed policy, which defines a policy like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sqs:*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
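If you'd rather not grant full SQS access, a tighter policy scoped to the celery queue could look like the following (this is a hypothetical sketch, not what I used; note that the SQS transport also needs ListQueues, which only accepts a wildcard resource):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["sqs:ListQueues"],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": ["sqs:*"],
            "Effect": "Allow",
            "Resource": "arn:aws:sqs:us-east-1:*:celery"
        }
    ]
}
```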

You also need to go to the SQS service (in the AWS Console) and create a queue called 'celery' (although I believe it gets created automatically if you don't, anyway).

Your tasks.py files will look like this:
FILE: /src/ANY_APP/tasks.py

from __future__ import absolute_import, unicode_literals
from celery import shared_task  # celery.decorators was removed in Celery 4

from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)

@shared_task
def do_something():
    logger.info('******** CALLING ASYNC TASK WITH CELERY **********')
    # Your code



OK, that was the easy part. Now, to make it work on Elastic Beanstalk, we need to make SQS work, which requires PyCurl – and PyCurl needs to be compiled.
You should already have your .ebextensions folder with some files in it; we'll add a few more there to configure everything.
FILE: /.ebextensions/01_packages.config

files:
  "/usr/local/share/pycurl-7.43.0.tar.gz" :
    mode: "000644"
    owner: root
    group: root
    source: https://pypi.python.org/packages/source/p/pycurl/pycurl-7.43.0.tar.gz

packages:
  yum:
    python34-devel: []
    libcurl-devel: []

commands:
  01_download_pip3:
    # download get-pip.py so we can install pip for python3
    command: 'curl -O https://bootstrap.pypa.io/get-pip.py'
  02_install_pip3:
    command: 'python3 get-pip.py'

container_commands:
  03_pycurl_reinstall:
    # pycurl must be rebuilt against the SSL backend curl was compiled with (NSS here).
    # The --upgrade option is needed because this runs after pip installs requirements.txt,
    # and it needs to be done with the virtualenv activated.
    command: 'source /opt/python/run/venv/bin/activate && pip3 install /usr/local/share/pycurl-7.43.0.tar.gz --global-option="--with-nss" --upgrade'



FILE: /.ebextensions/99_configure_celery.config

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Create required directories
      sudo mkdir -p /var/log/celery/
      sudo mkdir -p /var/run/celery/
      
      # Create group called 'celery'
      sudo groupadd -f celery
      # add the user 'celery' if it doesn't exist and add it to the group with same name
      id -u celery &>/dev/null || sudo useradd -g celery celery
      # add permissions to the celery user for r+w to the folders just created
      sudo chown -R celery:celery /var/log/celery/
      sudo chown -R celery:celery /var/run/celery/

      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      celeryenv=${celeryenv%?}

      # Create CELERY configuration script
      celeryconf="[program:celeryd]
      directory=/opt/python/current/app/src
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A PROJECT_NAME --loglevel=INFO --logfile="/var/log/celery/%%n%%I.log" --pidfile="/var/run/celery/%%n.pid"

      user=celery
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"
      
      
      # Create CELERY BEAT configuration script
      celerybeatconf="[program:celerybeat]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery beat -A PROJECT_NAME --loglevel=INFO --logfile="/var/log/celery/celery-beat.log" --pidfile="/var/run/celery/celery-beat.pid"
           
      directory=/opt/python/current/app/src
      user=celery
      numprocs=1
      stdout_logfile=/var/log/celerybeat.log
      stderr_logfile=/var/log/celerybeat.log
      autostart=true
      autorestart=true
      startsecs=10
     
      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60
     
      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true
     
      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=999
     
      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf
      echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
          then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
      supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat
      
      
      
commands:
  01_killotherbeats:
    command: "ps auxww | grep '[c]elery beat' | awk '{print $2}' | sudo xargs kill -9 || true"
    ignoreErrors: true
  02_restartbeat:
    command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat"
    leader_only: true



(Thanks to PythonEntusiast and WBAR.)
A couple of notes on this file:
- Change every PROJECT_NAME to whatever your project is called (the name of the folder where you put the celery.py file in the first step).
- In the 2 lines that say directory=/opt/python/current/app/src, most people will probably want to remove the /src. I had to keep it because my project (where the manage.py file is) lives inside a /src folder.
- Like everyone else, I started from the same examples copied around the Internet, and I had to lower stopwaitsecs: when the celery beat restart is called, it really does wait that long to stop, so leaving it at 600 seconds makes every restart take up to 10 minutes.
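The celeryenv pipeline near the top of the script deserves a word: it flattens the Elastic Beanstalk env file into the single comma-separated KEY=value list that supervisord's environment= directive expects, doubling % signs because supervisord treats % as an escape character. Here is the core of it against a made-up stand-in for /opt/python/current/env (I'm omitting the extra $PATH/$PYTHONPATH substitutions):

```shell
# Hypothetical stand-in for /opt/python/current/env
envfile='export DJANGO_SETTINGS_MODULE="myproj.settings.production"
export SECRET_KEY="s3cret%key"'

celeryenv=$(echo "$envfile" | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g')
celeryenv=${celeryenv%?}   # strip the trailing comma left by tr

echo "$celeryenv"
# DJANGO_SETTINGS_MODULE="myproj.settings.production",SECRET_KEY="s3cret%%key"
```

This is why the logfile patterns in the supervisord config use %%n and %%I instead of %n and %I.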


To see what actually happened during a deploy, go to the Amazon AWS Console, Elastic Beanstalk, and download the full logs; then check the eb-activity.log file.

If you have any problems, or it just doesn't work, there are a couple of things you can do. Connect using SSH (e.g., PuTTY) and then:
- To see the celery and celery beat logs:
nano /var/log/celery-worker.log
or
nano /var/log/celerybeat.log
Also, check the files under /var/log/celery/

- Check supervisord configurations:
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread

- Apply configuration changes (if you have changed the configurations and executed a 'reread'):
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update

- Manually restart celery or celerybeat:
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat

If you are having trouble with PyCurl/curl (you will see messages in the celery logs, for example), you can open a shell and do something like this:
source /opt/python/run/venv/bin/activate            (activate the environment)
cd /opt/python/current/app/src/               (go to project directory, might need to remove /src)
python manage.py shell
> import pycurl
> pycurl.version

In my case, I was receiving a message like
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)

With this, it should be working for you, except for the following 2 things:
- The logger.info(...) call in the task is NOT working, so change the code to test it.
- Changes to the task schedule made in the admin interface are not picked up until you restart celery beat (see the bug report).

Needless to say, most of this code is not mine; I pieced it together after searching online for days and copying from different places.


Comments

  1. I tried doing this on AWS a year ago; I spent a total of 2 months trying to make Django + Celery work on AWS, but ultimately failed. Now, with your example, I will try again. Thanks

    1. I mean. This guy did it. But if you also succeeded I have more hopes I will too.

    2. Hey Tuxedo! Yes, it worked but it was sort of a pain in the... Let me know if I can help.

  2. Hi Diego Jancic, I am new to EB and SQS. I found your tutorial and am trying to implement it on my own server, but I have a question: for FILE: /.ebextensions/99_configure_celery.py – does it really end with .py, or with .config?
    Sorry for my English, I hope you understand my question.

    1. Nice catch! Yes, it should be .config. I've just corrected the error! Thanks!

    2. Hi, thank you for the reply. I followed your tutorial and added all the files above, but when I try to upload it to the Elastic Beanstalk service it always fails, and I get an error with the requirements.txt file (it contains a new library, matplotlib==1.5.1, which causes an error during the update). Please tell me what I am doing wrong; I would appreciate your help very much.

    3. I don't know what you're doing wrong. You would need to try to install all the requirements in another computer (or download the logs) and see why that is failing. I don't remember having problems with matplotlib, but it might need something to be built for example.

    5. Hi, I am now able to send tasks to SQS, thank you.

    6. Hi, I am able to deploy the application to EB, but when I look at eb-activity.log the celery worker can't start. Please see the error below:
      celeryd: changed
      celeryd: stopped
      celeryd: updated process group
      celeryd: stopped
      celeryd: ERROR (abnormal termination)

      Could you create a video tutorial about this topic? I think there will be many people interested because i can't find any tutorial on youtube

    7. Hi, can you tell me what version of Python you are using? I read here https://stackoverflow.com/questions/38566456/how-to-run-a-celery-worker-on-aws-elastic-beanstalk that supervisor doesn't work on Python 3.

    8. Hung, I'm not an expert on the topic, I believe you need to look at the logs and Google the errors for additional help. I'm using Python 3.5.2 and supervisor can run a python3 script if in a virtual env.
      Read this: https://stackoverflow.com/a/32290474/72350

    9. Hi, thank you for the recommendation. After 4 days I finally made it work, after many trials and errors. I think the important step was finding a way to access the EC2 instance using SSH; after that I could see the logs and fix the errors.

  3. Hi, I followed your tutorial and I was able to make celery work on Elastic Beanstalk. The only part that I don't understand is why you're using the following commands: 01_killotherbeats and 02_restartbeat. You basically kill celery beat and then restart it before executing the config script run_supervised_celeryd.sh that at the end will restart celery beat. I don't think those commands are needed

    1. I did that a while ago. I believe it was not restarting if I didn't kill the process. Try it and if it works for you, perfect :-)

