Finally, after many days of trying to make it work and reading thousands of pages, I got Celery working with Django 1.10 on Amazon AWS Elastic Beanstalk with SQS (Simple Queue Service), including Celery Beat!
First, here are the files I ended up with; then, the explanation of what I understand (some of those things still remain a mystery).
STEP 0:
Install using the following:
pip install -U celery[sqs]
pip install django-celery-beat
I’m using the following versions of the apps:
boto (2.45.0)
botocore (1.4.63)
celery (4.0.2)
Django (1.10.1)
django-celery-beat (1.0.1)
kombu (4.0.2)
pip (9.0.1)
pycurl (7.43.0)
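If you pin these in your requirements.txt (assuming that is how you install dependencies on Elastic Beanstalk), the relevant lines would look something like this:

celery[sqs]==4.0.2
django-celery-beat==1.0.1
pycurl==7.43.0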
FILE: /src/PROJECT_NAME/celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
# DONE IN __init__.py
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings.production")

app = Celery('PROJECT_NAME')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
Note: My settings live in a module called settings, split into base, development, and production. That's why DJANGO_SETTINGS_MODULE is set like that. You might need to set it to PROJECT_NAME.settings instead, for example.
FILE: /src/PROJECT_NAME/__init__.py
from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings.development")

# Add celery app defined in celery.py
__all__ = ['celery_app']
Now, the settings in the app:
FILE: /src/PROJECT_NAME/settings.py (or settings/production.py)
INSTALLED_APPS = [
...
'django_celery_beat',
...
]
CELERY_BROKER_TRANSPORT = 'sqs'
CELERY_BROKER_TRANSPORT_OPTIONS = {
'region': 'us-east-1',
}
CELERY_BROKER_USER = AWS_ACCESS_KEY_ID
CELERY_BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
CELERY_WORKER_STATE_DB = '/var/run/celery/worker.db'
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
CELERY_WORKER_PREFETCH_MULTIPLIER = 0 # See https://github.com/celery/celery/issues/3712
CELERY_DEFAULT_QUEUE = 'celery'
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}
A couple of things to note here: CELERY_BEAT_SCHEDULER is set like that because we want to store the schedules in the database. There are other ways to define the schedule, such as the sketch below.
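For example, if you didn't want the database scheduler, a static schedule can go straight into the settings. A minimal sketch, assuming the do_something example task shown further down (the entry name and the 5-minute interval are arbitrary):

CELERY_BEAT_SCHEDULE = {
    'do-something-every-5-minutes': {
        'task': 'ANY_APP.tasks.do_something',  # the example task from tasks.py below
        'schedule': 300.0,  # seconds
    },
}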
Also, while you are at it, make sure you are not using os.environ['xxxx'] anywhere, because a direct lookup raises a KeyError whenever the variable is not set. For example, the following (thanks to Alexander Tyapkov):
SECRET_KEY = os.environ['SECRET_KEY']
must be replaced by
SECRET_KEY = os.environ.get('SECRET_KEY', '')
After adding django_celery_beat to INSTALLED_APPS, run python manage.py migrate to update the database.
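Once migrated, the schedules live in django_celery_beat's tables. You can manage them through the Django admin, or, as a rough sketch, through the ORM (the entry name and interval are arbitrary; the task path refers to the example task further down):

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# Run the example task every 5 minutes
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=5,
    period=IntervalSchedule.MINUTES,
)
PeriodicTask.objects.get_or_create(
    name='Do something every 5 minutes',
    task='ANY_APP.tasks.do_something',
    interval=schedule,
)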
For the user specified in CELERY_BROKER_USER, you have to create the user in the Amazon AWS Console (IAM service) and attach the AmazonSQSFullAccess permission set, which defines a policy like this:
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "sqs:*" ], "Effect": "Allow", "Resource": "*" } ] }
You also need to go to the SQS service (in the AWS Console) and create a queue called 'celery'. I believe it's created automatically if you don't, anyway.
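If you'd rather create the queue from code than click through the console, here is a rough one-off sketch using boto 2.x (the version listed above); the region matches the broker transport options, and the credentials are assumed to be in the environment:

import boto.sqs

# Uses AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
conn = boto.sqs.connect_to_region('us-east-1')
conn.create_queue('celery')  # returns the existing queue if one is already there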
Your tasks.py files will look like this:
FILE: /src/ANY_APP/tasks.py
from __future__ import absolute_import, unicode_literals

from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task()
def do_something():
    logger.info('******** CALLING ASYNC TASK WITH CELERY **********')
    # Your code
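To enqueue it, call the task like any other Celery task; for example (do_something being the task defined just above):

from ANY_APP.tasks import do_something

do_something.delay()                    # fire-and-forget onto the 'celery' queue
do_something.apply_async(countdown=60)  # or run it roughly 1 minute from now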
OK, that was the easy part. Now, to make it work in Elastic Beanstalk, we need to make SQS work, which requires PyCurl, and PyCurl needs to be compiled.
You should already have your .ebextensions folder with some files and lots of things in it; we'll add a bunch more there to configure everything.
FILE: /.ebextensions/01_packages.config
files: "/usr/local/share/pycurl-7.43.0.tar.gz" : mode: "000644" owner: root group: root source: https://pypi.python.org/packages/source/p/pycurl/pycurl-7.43.0.tar.gz packages: yum: python34-devel: [] libcurl-devel: [] commands: 01_download_pip3: # run this before PIP installs requirements as it needs to be compiled with OpenSSL command: 'curl -O https://bootstrap.pypa.io/get-pip.py' 02_install_pip3: # run this before PIP installs requirements as it needs to be compiled with OpenSSL command: 'python3 get-pip.py' container_commands: 03_pycurl_reinstall: # run this before PIP installs requirements as it needs to be compiled with OpenSSL # the upgrade option is because it will run after PIP installs the requirements.txt file. # and it needs to be done with the virtual-env activated command: 'source /opt/python/run/venv/bin/activate && pip3 install /usr/local/share/pycurl-7.43.0.tar.gz --global-option="--with-nss" --upgrade'
FILE: /.ebextensions/99_configure_celery.config
files: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh": mode: "000755" owner: root group: root content: | #!/usr/bin/env bash # Create required directories sudo mkdir -p /var/log/celery/ sudo mkdir -p /var/run/celery/ # Create group called 'celery' sudo groupadd -f celery # add the user 'celery' if it doesn't exist and add it to the group with same name id -u celery &>/dev/null || sudo useradd -g celery celery # add permissions to the celery user for r+w to the folders just created sudo chown -R celery:celery /var/log/celery/ sudo chown -R celery:celery /var/run/celery/ # Get django environment variables celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'` celeryenv=${celeryenv%?} # Create CELERY configuraiton script celeryconf="[program:celeryd] directory=/opt/python/current/app/src ; Set full path to celery program if using virtualenv command=/opt/python/run/venv/bin/celery worker –A PROJECT_NAME --loglevel=INFO --logfile="/var/log/celery/%%n%%I.log" --pidfile="/var/run/celery/%%n.pid" user=celery numprocs=1 stdout_logfile=/var/log/celery-worker.log stderr_logfile=/var/log/celery-worker.log autostart=true autorestart=true startsecs=10 ; Need to wait for currently executing tasks to finish at shutdown. ; Increase this if you have very long running tasks. stopwaitsecs = 60 ; When resorting to send SIGKILL to the program to terminate it ; send SIGKILL to its whole process group instead, ; taking care of its children as well. killasgroup=true ; if rabbitmq is supervised, set its priority higher ; so it starts first priority=998 environment=$celeryenv" # Create CELERY BEAT configuraiton script celerybeatconf="[program:celerybeat] ; Set full path to celery program if using virtualenv command=/opt/python/run/venv/bin/celery beat –A PROJECT_NAME --loglevel=INFO --logfile="/var/log/celery/celery-beat.log" --pidfile="/var/run/celery/celery-beat.pid" directory=/opt/python/current/app/src user=celery numprocs=1 stdout_logfile=/var/log/celerybeat.log stderr_logfile=/var/log/celerybeat.log autostart=true autorestart=true startsecs=10 ; Need to wait for currently executing tasks to finish at shutdown. ; Increase this if you have very long running tasks. stopwaitsecs = 60 ; When resorting to send SIGKILL to the program to terminate it ; send SIGKILL to its whole process group instead, ; taking care of its children as well. killasgroup=true ; if rabbitmq is supervised, set its priority higher ; so it starts first priority=999 environment=$celeryenv" # Create the celery supervisord conf script echo "$celeryconf" | tee /opt/python/etc/celery.conf echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf # Add configuration script to supervisord conf (if not there already) if ! 
grep -Fxq "[include]" /opt/python/etc/supervisord.conf then echo "[include]" | tee -a /opt/python/etc/supervisord.conf echo "files: celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf fi # Reread the supervisord config supervisorctl -c /opt/python/etc/supervisord.conf reread # Update supervisord in cache without restarting all services supervisorctl -c /opt/python/etc/supervisord.conf update # Start/Restart celeryd through supervisord supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat commands: 01_killotherbeats: command: "ps auxww | grep 'celery beat' | awk '{print $2}' | sudo xargs kill -9 || true" ignoreErrors: true 02_restartbeat: command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat" leader_only: true
(Thanks to PythonEntusiast and WBAR.)
A couple of notes for this file:
- Change all the PROJECT_NAME occurrences to whatever your project is called (the name of the folder where you put the celery.py file in the first step).
- In the two lines that say directory=/opt/python/current/app/src, most people will probably want to remove the /src. Because I placed my project (where the manage.py file is) inside a /src folder, I had to put it there.
- If you started by copying the same examples from the Internet that I did, note that I had to change stopwaitsecs to a lower number, because when the celery beat restart is called, it actually takes that long to stop. If you have it set to 600 seconds, the restart (and therefore the deployment) can hang for that long. If a deployment fails, go to the Amazon AWS Console, Elastic Beanstalk, and download the full logs. Then check the eb-activity.log file to see what happened.
If you have any problems, or it just doesn't work, there are a couple of things you can do. Connect using SSH (e.g., PuTTY) and then:
- To see the celery and celery beat logs:
nano /var/log/celery-worker.log
or
nano /var/log/celerybeat.log
Also, check the files under /var/log/celery/
- Check supervisord configurations:
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread
- Manually restart celery or celerybeat:
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update (run this if you have changed the configurations and executed a 'reread')
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat
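It can also help to ask supervisord what it thinks is running (status is a standard supervisorctl command):
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf status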
If you are having trouble with PyCurl/curl (you will see messages in the celery logs, for example), you can try to open a shell and do something like this:
source /opt/python/run/venv/bin/activate (activate the environment)
cd /opt/python/current/app/src/ (go to project directory, might need to remove /src)
python manage.py shell
> import pycurl
> pycurl.version
In my case, I was receiving a message like:
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
That means PyCurl was compiled against a different SSL backend than the one libcurl was linked with, which is what the --with-nss reinstall in 01_packages.config is there to fix.
With this, it should be working for you, except for the following two things:
- The logger.info(...) call in the task is NOT working, so change the code to test it (see the sketch after this list).
- Changes to the task schedule made in the admin interface are not picked up until you restart celery beat (see the bug report).
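As a crude example for the first point, this is the kind of change I'd make to the example task to confirm it really runs (the marker file path is an arbitrary choice):

@task()
def do_something():
    # logger.info output wasn't showing up, so append to a file
    # you can inspect over SSH (e.g. nano /tmp/celery_task_ran.txt):
    with open('/tmp/celery_task_ran.txt', 'a') as f:
        f.write('do_something executed\n')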
Needless to say, most of this code is not mine; I came up with it after searching online for days and copying pieces from different places.
COMMENTS:
I tried doing this on AWS a year ago; I spent two months in total trying to make Django + Celery work on AWS, but ultimately failed. Now, with your example, I will try again. Thanks!
Did you get it to work now?
I mean, this guy did it. But if you also succeeded, I have more hope that I will too.
Hey Tuxedo! Yes, it worked, but it was sort of a pain in the... Let me know if I can help.
Hi Diego Jancic, I am new to EB and SQS. I found your tutorial and I'm trying to implement it on my own server, but I have a question: for FILE: /.ebextensions/99_configure_celery.py, should it really end with .py, or with .config? Sorry for my English; I hope you understand my question.
Nice catch! Yes, it should be .config. I've just corrected the error. Thanks!
Hi, thank you for the reply. I followed your tutorial and added all the files above, but when I try to upload to the Elastic Beanstalk service it always fails, and I get an error with the requirements.txt file (it contains a library called matplotlib==1.5.1 which causes an error on update). Please tell me what I am doing wrong; I would appreciate your help very much.
I don't know what you're doing wrong. You would need to try to install all the requirements on another computer (or download the logs) and see why that is failing. I don't remember having problems with matplotlib, but it might need something to be built, for example.
Hi, I am now able to send tasks to SQS, thank you.
Hi, I am able to deploy the application to EB, but when I look at eb-activity.log, the celery worker can't start. Please see the error below:
celeryd: changed
celeryd: stopped
celeryd: updated process group
celeryd: stopped
celeryd: ERROR (abnormal termination)
Could you create a video tutorial about this topic? I think there would be many people interested, because I can't find any tutorial on YouTube.
Hi, can you tell me what version of Python you are using? I read here https://stackoverflow.com/questions/38566456/how-to-run-a-celery-worker-on-aws-elastic-beanstalk that supervisor doesn't work on Python 3.
Hung, I'm not an expert on the topic; I believe you need to look at the logs and Google the errors for additional help. I'm using Python 3.5.2, and supervisor can run a python3 script if it's in a virtual env. Read this: https://stackoverflow.com/a/32290474/72350
Hi, thank you for the recommendation. After 4 days I finally made it work, after many trials and errors. I think the important step was finding a way to access the EC2 instance using SSH; after that I could see the logs and fix the errors.
DeleteHi, I followed your tutorial and I was able to make celery work on Elastic Beanstalk. The only part that I don't understand is why you're using the following commands: 01_killotherbeats and 02_restartbeat. You basically kill celery beat and then restart it before executing the config script run_supervised_celeryd.sh that at the end will restart celery beat. I don't think those commands are needed
ReplyDeleteI did that a while ago. I believe it was not restarting if I didn't kill the process. Try it and if it works for you, perfect :-)