Diffstat (limited to 'content/posts/how-to-create-a-celery-task-that-fills-out-fields-using-django.md')
-rw-r--r-- content/posts/how-to-create-a-celery-task-that-fills-out-fields-using-django.md | 714
1 files changed, 714 insertions, 0 deletions
diff --git a/content/posts/how-to-create-a-celery-task-that-fills-out-fields-using-django.md b/content/posts/how-to-create-a-celery-task-that-fills-out-fields-using-django.md
new file mode 100644
index 0000000..de7595c
--- /dev/null
+++ b/content/posts/how-to-create-a-celery-task-that-fills-out-fields-using-django.md
@@ -0,0 +1,714 @@
++++
+title = "How to create a celery task that fills out fields using Django"
+author = ["Roger Gonzalez"]
+date = 2020-11-29T15:48:48-03:00
+lastmod = 2021-01-10T12:27:56-03:00
+tags = ["python", "celery", "django", "docker", "dockercompose"]
+categories = ["programming"]
+draft = false
+weight = 2002
++++
+
+Hi everyone!
+
+It's been way too long, I know. This time, I want to talk about
+asynchronicity in Django, but first, let's set the stage:
+
+Imagine you are working in a library and you have to develop an app that allows
+users to register new books using a barcode scanner. The system has to read the
+ISBN code and use an external resource to fill in the information (title, pages,
+authors, etc.). You don't need the complete book information to continue, so
+the request shouldn't be held up waiting for the external resource.
+
+**How can you process the external request asynchronously?** 🤔
+
+For that, we need Celery.
+
+
+## What is Celery? {#what-is-celery}
+
+[Celery](https://docs.celeryproject.org/en/stable/) is a "distributed task queue". From their website:
+
+> Celery is a simple, flexible, and reliable distributed system to process vast
+amounts of messages, while providing operations with the tools required to
+maintain such a system.
+
+So Celery can get messages from external processes via a broker (like [Redis](https://redis.io/)),
+and process them.
+
+The best thing is: Django can connect to Celery very easily, and Celery can
+access Django models without any problem. Sweet!
+
+
+## Let's code! {#lets-code}
+
+Let's assume our project structure is the following:
+
+```nil
+- app/
+  - manage.py
+  - app/
+    - __init__.py
+    - settings.py
+    - urls.py
+```
+
+
+### Celery {#celery}
+
+First, we need to set up Celery in Django. Thankfully, [Celery has excellent
+documentation](https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html#using-celery-with-django), but the entire process can be summarized as follows:
+
+In `app/app/celery.py`:
+
+```python
+import os
+
+from celery import Celery
+
+# set the default Django settings module for the 'celery' program.
+os.environ.setdefault("DJANGO_SETTINGS_MODULE", "app.settings")
+
+app = Celery("app")
+
+# Using a string here means the worker doesn't have to serialize
+# the configuration object to child processes.
+# - namespace='CELERY' means all celery-related configuration keys
+# should have a `CELERY_` prefix.
+app.config_from_object("django.conf:settings", namespace="CELERY")
+
+# Load task modules from all registered Django app configs.
+app.autodiscover_tasks()
+
+
+@app.task(bind=True)
+def debug_task(self):
+    """A debug celery task"""
+    print(f"Request: {self.request!r}")
+```
+
+What's going on here?
+
+- First, we set the `DJANGO_SETTINGS_MODULE` environment variable.
+- Then, we instantiate our Celery app and assign it to the `app` variable.
+- Then, we tell Celery to read its configuration from the Django settings,
+  using the `CELERY` prefix. We will see this later in the post.
+- Finally, we call `autodiscover_tasks`, so Celery looks for `tasks.py` files
+  in the installed Django apps.
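+
+To make the `namespace="CELERY"` step a bit more concrete, here is a rough
+sketch of the idea in plain Python. It is not Celery's actual implementation,
+and `extract_namespaced_settings` is a hypothetical helper written only for
+this illustration: it keeps the settings that start with the prefix, strips
+it, and lowercases the rest, which is how `CELERY_BROKER_URL` ends up being
+read as `broker_url`.
+
+```python
+def extract_namespaced_settings(settings, namespace="CELERY"):
+    """Return the settings under `namespace`, without the prefix and lowercased."""
+    prefix = f"{namespace}_"
+    return {
+        key[len(prefix):].lower(): value
+        for key, value in settings.items()
+        if key.startswith(prefix)
+    }
+
+
+# Hypothetical Django settings, only for illustration
+django_settings = {
+    "DEBUG": True,
+    "CELERY_BROKER_URL": "redis://redis:6379",
+    "CELERY_TIMEZONE": "America/Montevideo",
+}
+
+celery_config = extract_namespaced_settings(django_settings)
+print(celery_config)
+```
+
+Settings without the prefix (like `DEBUG`) are simply ignored, so Celery and
+Django can share the same settings file without stepping on each other.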
+
+In `app/app/__init__.py`:
+
+```python
+# This will make sure the app is always imported when
+# Django starts so that shared_task will use this app.
+from .celery import app as celery_app
+
+__all__ = ("celery_app",)
+```
+
+Finally, in `app/app/settings.py`:
+
+```python
+...
+# Celery (`env` is a django-environ `environ.Env()` instance)
+CELERY_BROKER_URL = env.str("CELERY_BROKER_URL")
+CELERY_TIMEZONE = env.str("CELERY_TIMEZONE", "America/Montevideo")
+CELERY_RESULT_BACKEND = "django-db"
+CELERY_CACHE_BACKEND = "django-cache"
+...
+```
+
+Here, we can see that the `CELERY` prefix is used for all Celery configurations,
+because in `celery.py` we told Celery the prefix was `CELERY`.
+
+With this, Celery is fully configured. 🎉
+
+
+### Django {#django}
+
+First, let's create a `core` app. This is going to be used for everything
+shared across the project:
+
+```bash
+$ python manage.py startapp core
+```
+
+In `core/models.py`, let's define the following models:
+
+```python
+"""
+Models
+"""
+import uuid
+
+from django.db import models
+
+
+class TimeStampMixin(models.Model):
+    """
+    A base model that all the other models inherit from.
+    This is to add created_at and updated_at to every model.
+    """
+
+    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
+    created_at = models.DateTimeField(auto_now_add=True)
+    updated_at = models.DateTimeField(auto_now=True)
+
+    class Meta:
+        """Setting up the abstract model class"""
+
+        abstract = True
+
+
+class BaseAttributesModel(TimeStampMixin):
+    """
+    A base model that sets up all the attributes models
+    """
+
+    name = models.CharField(max_length=255)
+    outside_url = models.URLField()
+
+    def __str__(self):
+        return self.name
+
+    class Meta:
+        abstract = True
+```
+
+Then, let's create a new app for our books:
+
+```bash
+python manage.py startapp books
+```
+
+And on `books/models.py`, let's create the following models:
+
+```python
+"""
+Books models
+"""
+from django.db import models
+
+from core.models import TimeStampMixin, BaseAttributesModel
+
+
+class Author(BaseAttributesModel):
+    """Defines the Author model"""
+
+
+class People(BaseAttributesModel):
+    """Defines the People model"""
+
+
+class Subject(BaseAttributesModel):
+    """Defines the Subject model"""
+
+
+class Book(TimeStampMixin):
+    """Defines the Book model"""
+
+    isbn = models.CharField(max_length=13, unique=True)
+    title = models.CharField(max_length=255, blank=True, null=True)
+    pages = models.IntegerField(default=0)
+    publish_date = models.CharField(max_length=255, blank=True, null=True)
+    outside_id = models.CharField(max_length=255, blank=True, null=True)
+    outside_url = models.URLField(blank=True, null=True)
+    author = models.ManyToManyField(Author, related_name="books")
+    person = models.ManyToManyField(People, related_name="books")
+    subject = models.ManyToManyField(Subject, related_name="books")
+
+    def __str__(self):
+        return f"{self.title} - {self.isbn}"
+```
+
+`Author`, `People`, and `Subject` are all `BaseAttributesModel`, so their fields
+come from the class we defined on `core/models.py`.
+
+For `Book` we add all the fields we need, plus many-to-many relationships with
+`Author`, `People`, and `Subject`, because:
+
+- _Books can have many authors, and many authors can have many books_
+
+Example: [27 Books by Multiple Authors That Prove the More, the Merrier](https://www.epicreads.com/blog/ya-books-multiple-authors/)
+
+- _Books can have many people, and many people can have many books_
+
+Example: Ron Weasley is in several _Harry Potter_ books
+
+- _Books can have many subjects, and many subjects can have many books_
+
+Example: A book can be a _comedy_, _fiction_, and _mystery_ at the same time
+
+Let's create `books/serializers.py`:
+
+```python
+"""
+Serializers for the Books
+"""
+from django.db.utils import IntegrityError
+from rest_framework import serializers
+
+from books.models import Book, Author, People, Subject
+from books.tasks import get_books_information
+
+
+class AuthorInBookSerializer(serializers.ModelSerializer):
+    """Serializer for the Author objects inside Book"""
+
+    class Meta:
+        model = Author
+        fields = ("id", "name")
+
+
+class PeopleInBookSerializer(serializers.ModelSerializer):
+    """Serializer for the People objects inside Book"""
+
+    class Meta:
+        model = People
+        fields = ("id", "name")
+
+
+class SubjectInBookSerializer(serializers.ModelSerializer):
+    """Serializer for the Subject objects inside Book"""
+
+    class Meta:
+        model = Subject
+        fields = ("id", "name")
+
+
+class BookSerializer(serializers.ModelSerializer):
+    """Serializer for the Book objects"""
+
+    author = AuthorInBookSerializer(many=True, read_only=True)
+    person = PeopleInBookSerializer(many=True, read_only=True)
+    subject = SubjectInBookSerializer(many=True, read_only=True)
+
+    class Meta:
+        model = Book
+        fields = "__all__"
+
+
+class BulkBookSerializer(serializers.Serializer):
+    """Serializer for bulk book creating"""
+
+    isbn = serializers.ListField()
+
+    def create(self, validated_data):
+        return_dict = {"isbn": []}
+        for isbn in validated_data["isbn"]:
+            try:
+                Book.objects.create(isbn=isbn)
+                return_dict["isbn"].append(isbn)
+            except IntegrityError:
+                # The ISBN is unique, so books that already exist are skipped
+                pass
+
+        return return_dict
+
+    def update(self, instance, validated_data):
+        """The update method needs to be overwritten on
+        serializers.Serializer. Since we don't need it, let's just
+        pass it"""
+        pass
+
+
+class BaseAttributesSerializer(serializers.ModelSerializer):
+    """A base serializer for the attributes objects"""
+
+    books = BookSerializer(many=True, read_only=True)
+
+
+class AuthorSerializer(BaseAttributesSerializer):
+    """Serializer for the Author objects"""
+
+    class Meta:
+        model = Author
+        fields = ("id", "name", "outside_url", "books")
+
+
+class PeopleSerializer(BaseAttributesSerializer):
+    """Serializer for the People objects"""
+
+    class Meta:
+        model = People
+        fields = ("id", "name", "outside_url", "books")
+
+
+class SubjectSerializer(BaseAttributesSerializer):
+    """Serializer for the Subject objects"""
+
+    class Meta:
+        model = Subject
+        fields = ("id", "name", "outside_url", "books")
+```
+
+The most important serializer here is `BulkBookSerializer`. It takes a list of
+ISBNs and creates one `Book` for each of them in the DB, skipping the ISBNs
+that already exist.
+
+On `books/views.py`, we can set the following views:
+
+```python
+"""
+Views for the Books
+"""
+from rest_framework import viewsets, mixins, generics
+from rest_framework.permissions import AllowAny
+
+from books.models import Book, Author, People, Subject
+from books.serializers import (
+ BookSerializer,
+ BulkBookSerializer,
+ AuthorSerializer,
+ PeopleSerializer,
+ SubjectSerializer,
+)
+
+
+class BookViewSet(
+    viewsets.GenericViewSet,
+    mixins.ListModelMixin,
+    mixins.RetrieveModelMixin,
+):
+    """
+    A view to list Books and retrieve books by ID
+    """
+
+    permission_classes = (AllowAny,)
+    queryset = Book.objects.all()
+    serializer_class = BookSerializer
+
+
+class AuthorViewSet(
+    viewsets.GenericViewSet,
+    mixins.ListModelMixin,
+    mixins.RetrieveModelMixin,
+):
+    """
+    A view to list Authors and retrieve authors by ID
+    """
+
+    permission_classes = (AllowAny,)
+    queryset = Author.objects.all()
+    serializer_class = AuthorSerializer
+
+
+class PeopleViewSet(
+    viewsets.GenericViewSet,
+    mixins.ListModelMixin,
+    mixins.RetrieveModelMixin,
+):
+    """
+    A view to list People and retrieve people by ID
+    """
+
+    permission_classes = (AllowAny,)
+    queryset = People.objects.all()
+    serializer_class = PeopleSerializer
+
+
+class SubjectViewSet(
+    viewsets.GenericViewSet,
+    mixins.ListModelMixin,
+    mixins.RetrieveModelMixin,
+):
+    """
+    A view to list Subject and retrieve subject by ID
+    """
+
+    permission_classes = (AllowAny,)
+    queryset = Subject.objects.all()
+    serializer_class = SubjectSerializer
+
+
+class BulkCreateBook(generics.CreateAPIView):
+    """A view to bulk create books"""
+
+    permission_classes = (AllowAny,)
+    queryset = Book.objects.all()
+    serializer_class = BulkBookSerializer
+```
+
+Easy enough: endpoints for getting books, authors, people, and subjects, and an
+endpoint to post ISBN codes in a list.
+
+We can check Swagger to see all the endpoints created:
+
+{{< figure src="/2020-11-29-115634.png" >}}
+
+Now, **how are we going to get all the data?** 🤔
+
+
+## Creating a Celery task {#creating-a-celery-task}
+
+Now that we have our project structure done, we need to create the asynchronous
+task Celery is going to run to populate our fields.
+
+To get the information, we are going to use the [OpenLibrary API](https://openlibrary.org/dev/docs/api/books).
+
+First, we need to create `books/tasks.py`:
+
+```python
+"""
+Celery tasks
+"""
+import requests
+from celery import shared_task
+
+from books.models import Book, Author, People, Subject
+
+
+def get_book_info(isbn):
+    """Gets a book information by using its ISBN.
+    More info here https://openlibrary.org/dev/docs/api/books"""
+    return requests.get(
+        f"https://openlibrary.org/api/books?jscmd=data&format=json&bibkeys=ISBN:{isbn}",
+        timeout=10,  # don't let a slow response block the worker forever
+    ).json()
+
+
+def generate_many_to_many(model, iterable):
+    """Generates the many to many relationships to books"""
+    return_items = []
+    for item in iterable:
+        # get_or_create returns an (object, created) tuple; keep only the object
+        relation, _ = model.objects.get_or_create(
+            name=item["name"], outside_url=item["url"]
+        )
+        return_items.append(relation)
+    return return_items
+
+
+@shared_task
+def get_books_information(isbn):
+    """Gets a book information"""
+
+    # First, we get the book information by its isbn
+    book_info = get_book_info(isbn)
+
+    if len(book_info) > 0:
+        # Then, we need to access the json itself. Since the first key is dynamic,
+        # we get it by accessing the json keys
+        key = list(book_info.keys())[0]
+        book_info = book_info[key]
+
+        # Since the book was created on the Serializer, we get the book to edit
+        book = Book.objects.get(isbn=isbn)
+
+        # Set the fields we want from the API into the Book
+        book.title = book_info["title"]
+        book.publish_date = book_info["publish_date"]
+        book.outside_id = book_info["key"]
+        book.outside_url = book_info["url"]
+
+        # The optional fields fall back to a default when they are missing
+        book.pages = book_info.get("number_of_pages", 0)
+        authors = book_info.get("authors", [])
+        people = book_info.get("subject_people", [])
+        subjects = book_info.get("subjects", [])
+
+        # And generate the appropriate many_to_many relationships
+        authors_info = generate_many_to_many(Author, authors)
+        people_info = generate_many_to_many(People, people)
+        subjects_info = generate_many_to_many(Subject, subjects)
+
+        # Once the relationships are generated, we add them to the book instance
+        book.author.add(*authors_info)
+        book.person.add(*people_info)
+        book.subject.add(*subjects_info)
+
+        # Finally, we save the Book
+        book.save()
+
+    else:
+        raise ValueError("Book not found")
+```
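+
+If you want to play with the parsing logic without Celery, Django, or a network
+connection, here is a minimal sketch in plain Python. `parse_book_info` is a
+hypothetical helper (it is not part of the project), and the sample payload uses
+made-up values; I'm only assuming the response has the same keys the task reads
+above.
+
+```python
+def parse_book_info(response, default_pages=0):
+    """Extract the fields the task stores from an OpenLibrary-style response."""
+    if not response:
+        raise ValueError("Book not found")
+    # The only top-level key is dynamic ("ISBN:<isbn>"), so take the first value
+    book_info = next(iter(response.values()))
+    return {
+        "title": book_info["title"],
+        "publish_date": book_info["publish_date"],
+        # Optional fields fall back to a default when they are missing
+        "pages": book_info.get("number_of_pages", default_pages),
+        "authors": [a["name"] for a in book_info.get("authors", [])],
+        "subjects": [s["name"] for s in book_info.get("subjects", [])],
+    }
+
+
+# Hypothetical payload with made-up values, shaped like the keys used above
+sample = {
+    "ISBN:0000000000000": {
+        "title": "Example Book",
+        "publish_date": "2020",
+        "authors": [{"name": "Jane Doe", "url": "https://example.com/jane"}],
+    }
+}
+
+parsed = parse_book_info(sample)
+print(parsed)
+```
+
+Keeping the parsing in a small function like this also makes it easier to unit
+test the task without hitting the external API.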
+
+So when are we going to run this task? We need to run it in the **serializer**.
+
+On `books/serializers.py`:
+
+```python
+from books.tasks import get_books_information
+...
+class BulkBookSerializer(serializers.Serializer):
+    """Serializer for bulk book creating"""
+
+    isbn = serializers.ListField()
+
+    def create(self, validated_data):
+        return_dict = {"isbn": []}
+        for isbn in validated_data["isbn"]:
+            try:
+                Book.objects.create(isbn=isbn)
+                # We need to add this line
+                get_books_information.delay(isbn)
+                #################################
+                return_dict["isbn"].append(isbn)
+            except IntegrityError:
+                # The ISBN already exists, so no task is queued for it
+                pass
+
+        return return_dict
+
+    def update(self, instance, validated_data):
+        pass
+```
+
+To trigger the Celery task, we call its `delay` method, which is available
+because of the `shared_task` decorator. This sends the task to the broker so a
+Celery worker can run it in the background, since we don't need the result
+right now.
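+
+To get a feeling for what "send it to the broker and return right away" means,
+here is a toy model using only the standard library. This is not how Celery is
+implemented: the in-memory queue stands in for Redis, the thread stands in for
+the worker container, and `delay`, `worker`, and `fill_book` are hypothetical
+names made up for this sketch.
+
+```python
+import queue
+import threading
+
+task_queue = queue.Queue()  # stands in for the broker (Redis in this post)
+results = []
+
+
+def fill_book(isbn):
+    """Stands in for the slow external lookup."""
+    results.append(f"filled {isbn}")
+
+
+def delay(func, *args):
+    """Enqueue the call and return right away, like `.delay()` on a task."""
+    task_queue.put((func, args))
+
+
+def worker():
+    """Stands in for the Celery worker: runs queued calls until told to stop."""
+    while True:
+        item = task_queue.get()
+        if item is None:
+            break
+        func, args = item
+        func(*args)
+
+
+thread = threading.Thread(target=worker)
+thread.start()
+
+delay(fill_book, "9780345418913")  # returns immediately
+delay(fill_book, "9780451524935")
+
+task_queue.put(None)  # stop signal for the toy worker
+thread.join()
+print(results)
+```
+
+The caller never waits for `fill_book` to finish, which is exactly why the
+serializer can answer the request before the book information is available.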
+
+
+## Docker configuration {#docker-configuration}
+
+There are a lot of moving parts we need for this to work, so I created a
+`docker-compose` configuration to help with the stack. I'm using the package
+[django-environ](https://github.com/joke2k/django-environ) to handle all environment variables.
+
+On `docker-compose.yml`:
+
+```yaml
+version: "3.7"
+
+x-common-variables: &common-variables
+  DJANGO_SETTINGS_MODULE: "app.settings"
+  CELERY_BROKER_URL: "redis://redis:6379"
+  DEFAULT_DATABASE: "psql://postgres:postgres@db:5432/app"
+  DEBUG: "True"
+  ALLOWED_HOSTS: "*,test"
+  SECRET_KEY: "this-is-a-secret-key-shhhhh"
+
+services:
+  app:
+    build:
+      context: .
+    volumes:
+      - ./app:/app
+    environment:
+      <<: *common-variables
+    ports:
+      - 8000:8000
+    command: >
+      sh -c "python manage.py migrate &&
+      python manage.py runserver 0.0.0.0:8000"
+    depends_on:
+      - db
+      - redis
+
+  celery-worker:
+    build:
+      context: .
+    volumes:
+      - ./app:/app
+    environment:
+      <<: *common-variables
+    command: celery --app app worker -l info
+    depends_on:
+      - db
+      - redis
+
+  db:
+    image: postgres:12.4-alpine
+    environment:
+      - POSTGRES_DB=app
+      - POSTGRES_USER=postgres
+      - POSTGRES_PASSWORD=postgres
+
+  redis:
+    image: redis:6.0.8-alpine
+```
+
+This is going to set up our app, DB, Redis, and most importantly our
+celery-worker instance. To run Celery, we need to execute:
+
+```bash
+$ celery --app app worker -l info
+```
+
+So we are going to run that command in a separate Docker container.
+
+
+## Testing it out {#testing-it-out}
+
+If we run
+
+```bash
+$ docker-compose up
+```
+
+in our project root folder, the project should come up as usual. You should be
+able to open <http://localhost:8000/admin> and enter the admin panel.
+
+To test the app, you can use a curl command from the terminal:
+
+```bash
+curl -X POST "http://localhost:8000/books/bulk-create" -H "accept: application/json" \
+ -H "Content-Type: application/json" -d "{ \"isbn\": [ \"9780345418913\", \
+ \"9780451524935\", \"9780451526342\", \"9781101990322\", \"9780143133438\" ]}"
+```
+
+{{< figure src="/2020-11-29-124654.png" >}}
+
+This call lasted 147ms, according to my terminal.
+
+This should return almost instantly, creating one new book and one new Celery
+task for each ISBN sent. You can also see the task results in the Django admin
+using the `django-celery-results` package; check its [documentation](https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html#django-celery-results-using-the-django-orm-cache-as-a-result-backend).
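+
+For reference, this is roughly what enabling `django-celery-results` looks like
+in `settings.py`. Take it as a sketch: the placeholder comment stands for
+whatever apps your project already lists, and the backend setting is the same
+one we defined earlier.
+
+```python
+# settings.py (fragment)
+INSTALLED_APPS = [
+    # ... your existing apps ...
+    "django_celery_results",
+]
+
+# Already set earlier in this post: store task results through the Django ORM
+CELERY_RESULT_BACKEND = "django-db"
+```
+
+After adding the app, run `python manage.py migrate django_celery_results` so
+the tables that store the task results get created.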
+
+{{< figure src="/2020-11-29-124734.png" >}}
+
+Celery tasks list, using `django-celery-results`
+
+{{< figure src="/2020-11-29-124751.png" >}}
+
+Created and processed books list
+
+{{< figure src="/2020-11-29-124813.png" >}}
+
+Single book information
+
+{{< figure src="/2020-11-29-124834.png" >}}
+
+People in books
+
+{{< figure src="/2020-11-29-124851.png" >}}
+
+Authors
+
+{{< figure src="/2020-11-29-124906.png" >}}
+
+Themes
+
+You can also interact with the endpoints to search by author, theme,
+people, and book. The exact paths depend on how you set up your URLs.
+
+
+## That's it! {#that-s-it}
+
+This surely was a **LONG** one, but it has been a very good one in my opinion.
+I've used Celery in the past for multiple things, from sending emails in the
+background to triggering scraping jobs and [running scheduled tasks](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html#using-custom-scheduler-classes) (like a [Unix
+cron job](https://en.wikipedia.org/wiki/Cron)).
+
+You can check the complete project in my git instance here:
+<https://git.rogs.me/me/books-app> or in GitLab here:
+<https://gitlab.com/rogs/books-app>
+
+If you have any questions, let me know! I always answer emails and/or messages.