Behind the Scenes
by Cory on November 19, 2018 1:12 AM
I spent the last two weeks upgrading Slacker Paradise! I realized that I wanted a new feature where I could queue up "Highlight of the Day" links and have the site update the front page roughly once a day. Up until now, I've been keeping a backlog of links in an email and then posting them manually every day. This is kind of stupid.
I thought it would be a relatively simple thing, but my google searches suggested otherwise. There's this python framework called "celery" that lets you create jobs, stuff them into work queues, and then spawn multiple workers that take jobs from the queues and operate on them. You can do a lot of configuration, including telling a job not to execute until a specific time/date. This relates to my highlight link scheduling because I could write a method that publishes a new highlight, wrap it in a job, and schedule those jobs to run on consecutive days in the future.
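To make the "jobs sitting in a queue until a certain time" idea concrete, here's a toy, single-process sketch using only the standard library. It is not celery and doesn't use celery's API; `DelayedQueue`, `schedule`, and `run_due` are hypothetical names. In real celery, the rough equivalent is a worker process plus calling a task with `apply_async(eta=...)`, with the broker doing the queueing.

```python
import heapq
from datetime import datetime
from typing import Callable, List, Tuple


class DelayedQueue:
    """Toy stand-in for a task queue whose jobs carry an ETA (not celery)."""

    def __init__(self) -> None:
        # Heap of (eta, sequence number, job); the earliest ETA is always at index 0.
        self._heap: List[Tuple[datetime, int, Callable[[], None]]] = []
        # Tie-breaker so two jobs with the same ETA never compare the functions.
        self._counter = 0

    def schedule(self, job: Callable[[], None], eta: datetime) -> None:
        """Queue `job` so it won't run before `eta`."""
        heapq.heappush(self._heap, (eta, self._counter, job))
        self._counter += 1

    def run_due(self, now: datetime) -> int:
        """Run every job whose ETA is at or before `now`; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, job = heapq.heappop(self._heap)
            job()
            ran += 1
        return ran
```

Scheduling three highlight jobs on consecutive days would be `q.schedule(job, start + timedelta(days=n))` for n = 1, 2, 3. Each `run_due` call only runs the jobs whose time has come, which is the behavior celery's `eta` option gives you across separate processes.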
The thing is, celery requires a lot of configuration and setup. I also needed a "broker", which is the message transport that carries jobs from the code that creates them to the workers that run them. I decided to use "redis" for this, an in-memory data store that celery supports as a broker. So get this: to schedule tasks to be done in the future, I needed to use celery AND redis. Not only that, but to get the whole thing working I need to start 3 processes on the server: one for the django webserver, another for celery, and a third for redis. So yeah, that's complicated.
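For reference, here's roughly what pointing celery at redis looks like in a django settings file. This is a sketch that assumes the celery app is set up the way celery's "First steps with Django" guide describes (`app.config_from_object("django.conf:settings", namespace="CELERY")`), so celery picks up any setting prefixed with `CELERY_`. The `REDIS_URL` environment variable is what heroku's redis add-on provides; the localhost fallback assumes redis is running on its default port, 6379. The three commands in the comments are assumptions based on a standard django project layout.

```python
import os

# Broker: where jobs get sent and where the celery worker picks them up.
# REDIS_URL is set by heroku's redis add-on; locally, fall back to a
# redis server on the default port (database 0).
CELERY_BROKER_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")

# Optional: store task results in redis too, so the python shell can
# check on a job's status.
CELERY_RESULT_BACKEND = CELERY_BROKER_URL

# The three processes, each started separately:
#   redis-server
#   celery -A slackerparadise worker --loglevel=info
#   waitress-serve --port=8000 slackerparadise.wsgi:application
```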
But before I could do any of that, I needed to install celery. I tried, but it wasn't compatible with my three-year-old version of django. That meant I had to upgrade ALL the packages I was using to their latest versions, including python! I had to fix a few things in the site code to resolve compatibility issues from the upgrade. I also switched servers from gunicorn to waitress because apparently gunicorn doesn't play nicely with other programs running in separate processes on the same server. UGH.
I spent about two or three days upgrading the server packages, had to remember how to use all my web tools again, and broke a couple of things in the process, but I was finally able to start figuring out how to use celery.
This celery business was completely foreign to me and required a lot of weird configuration. Luckily, heroku has a guide because they support redis+celery right out of the box. It was still a lot of trial and error to get the paths and settings just right. It's a pain in the ass because you have to run celery, run redis, and run the webserver. And then when you use the python shell to create a new job, if something isn't configured right, the job just sits there and hangs. It took a lot of searching for me to figure out how to debug this shit.
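When a job hangs like that, one thing worth ruling out first is whether redis is even reachable, since celery can't hand a job to a worker without its broker. Here's a small stdlib sketch (`broker_reachable` is a hypothetical helper, not part of celery) that just tries to open a TCP connection to the broker's host and port. It only proves that something is listening there, not that it's redis or that a celery worker is consuming jobs; running `redis-cli ping`, and calling `.get(timeout=...)` on a job's result instead of a bare `.get()`, help narrow down the rest.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises an OSError subclass (connection refused,
        # timeout, unreachable host) if nothing answers on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```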
After like a week of on-and-off trial and error, I finally got celery to execute jobs from the python shell! Basically all my progress came from fixing stupid shit like typing "slackerparadise.settings.production" instead of "slackerparadise.settings", or changing the command line to "celery -A slackerparadise" instead of "celery -A slackerparadise.celery_app". The guides and documentation available online are ridiculously incomplete...
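Path typos like the settings one can be caught before handing them to django or celery. Here's a small stdlib sketch (`module_exists` is a hypothetical helper) that checks whether a dotted module path such as `slackerparadise.settings` can actually be found. Note that `importlib.util.find_spec` imports the parent packages along the way, so their `__init__.py` files do run; only the final module is located without being executed.

```python
import importlib.util


def module_exists(dotted_path: str) -> bool:
    """Return True if python can find a module at `dotted_path`."""
    try:
        return importlib.util.find_spec(dotted_path) is not None
    except (ModuleNotFoundError, ValueError):
        # ModuleNotFoundError: a parent package is missing, or the parent is a
        # plain module rather than a package (e.g. settings.py, not settings/).
        # ValueError: the module is in sys.modules but has no __spec__.
        return False
```

For example, `module_exists("slackerparadise.settings.production")` returns False unless `settings` is a package that actually contains a `production` module.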
In my user dashboard, this is what I see now. It's beautiful T_T.
Once the python shell commands were working, I had to dig into my blog code after not really touching it for like 3 years. Slowly but surely, all this django framework stuff came back to me. This is also really the first time I've come back to a working project and read my own code without remembering writing it. I wonder if I've learned enough to justify a redesign of the website...
Anyway, I modified the highlight model to have a boolean field called "is_published" that defaults to False. When I submit a new highlight, my server looks up the last published highlight, and if it was published less than 24 hours ago (or there are other unpublished highlights waiting), the website puts the new entry in the backlog by leaving is_published set to False. At this point, the website schedules a new celery task to flip the is_published field to True, with the task's eta set to now + (24 hours × the number of unpublished highlights). I also had to change the website view code to filter for and display only published highlight links.
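Here's a framework-free sketch of that scheduling rule. It assumes a hypothetical `Highlight` dataclass standing in for the django model and a plain list standing in for the database. It also assumes the new entry itself counts toward "the number of unpublished highlights" (otherwise the first queued highlight would get an eta of zero hours). In the real site, the scheduling step would be a celery call along the lines of `publish_highlight.apply_async(args=[highlight.pk], eta=eta)` (the task name is made up), and the view filter would be a queryset like `Highlight.objects.filter(is_published=True)`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

DAY = timedelta(hours=24)


@dataclass
class Highlight:
    # Hypothetical stand-in for the django Highlight model.
    title: str
    is_published: bool = False
    published_at: Optional[datetime] = None


def submit(new: Highlight, highlights: List[Highlight], now: datetime) -> Optional[datetime]:
    """Add `new`; publish it immediately, or return the eta for its publish task."""
    last_published = max(
        (h.published_at for h in highlights if h.is_published and h.published_at is not None),
        default=None,
    )
    backlog = sum(1 for h in highlights if not h.is_published)
    highlights.append(new)

    if backlog == 0 and (last_published is None or now - last_published >= DAY):
        # Nothing waiting and the front page is at least a day old: publish now.
        new.is_published = True
        new.published_at = now
        return None

    # Otherwise leave is_published=False and schedule it. The new entry is
    # counted too, so the eta is now + 24h * (backlog + 1).
    eta = now + DAY * (backlog + 1)
    # Real site (hypothetical task name):
    #   publish_highlight.apply_async(args=[new.pk], eta=eta)
    return eta


def published_only(highlights: List[Highlight]) -> List[Highlight]:
    """What the front-page view shows: the equivalent of filter(is_published=True)."""
    return [h for h in highlights if h.is_published]
```

Measuring each eta from "now" follows the rule described above. If highlights get submitted at uneven times, scheduling each one 24 hours after the previous eta instead would keep the spacing closer to exactly one day.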
This Thought is part of Cory's Blog
Welcome to the fucked up mind of Cory Parsnipson