From dea4a61cbf42ff422f1409d4342fa59017950526 Mon Sep 17 00:00:00 2001 From: Roger Gonzalez Date: Mon, 2 Nov 2020 18:37:23 -0300 Subject: Changed theme to Archie, moving blog to rogs.me --- posts.org | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 209 insertions(+) create mode 100644 posts.org (limited to 'posts.org') diff --git a/posts.org b/posts.org new file mode 100644 index 0000000..28a0ff6 --- /dev/null +++ b/posts.org @@ -0,0 +1,209 @@ +#+hugo_base_dir: ./ +#+hugo_section: ./posts + +#+hugo_weight: auto +#+hugo_auto_set_lastmod: t + +#+author: Roger Gonzalez + +* Programming :@programming: +All posts in here will have the category set to /programming/. +** How I got a residency appointment thanks to Python, Selenium and Telegram :python::selenium:telegram: +:PROPERTIES: +:EXPORT_FILE_NAME: how-i-got-a-residency-appointment-thanks-to-python-and-selenium +:EXPORT_DATE: 2020-08-02 +:TLDR: keklol +:END: +Hello everyone! + +As some of you might know, I'm a Venezuelan 🇻🇪 living in Montevideo, Uruguay 🇺🇾. +I've been living here for almost a year, but because of the pandemic my +residency appointments have slowed down to a crawl, and in the middle of the +quarantine they added a new appointment system. Before, there were no +appointments, you just had to get there early and wait for the secretary to +review your files and assign someone to attend you. But now, they had +implemented an appointment system that you could do from the comfort of your own +home/office. There was just one issue: *there were never appointments available*. + +That was a little stressful. I was developing a small /tick/ by checking the +site multiple times a day, with no luck. But then, I decided I wanted to do a +bot that checks the site for me, that way I could just forget about it and let +the computers do it for me. + +*** Tech +**** Selenium +I had some experience with Selenium in the past because I had to run automated +tests on an Android application, but I had never used it for the web. I knew it +supported Firefox and had an extensive API to interact with websites. In the +end, I just had to inspect the HTML and search for the "No appointments +available" error message. If the message wasn't there, I needed a way to be +notified so I can set my appointment as fast as possible. +**** Telegram Bot API +Telegram was my goto because I have a lot of experience with it. It has a +stupidly easy API that allows for superb bot management. I just needed the bot +to send me a message whenever the "No appointments available" message wasn't +found on the site. + +*** The plan +Here comes the juicy part: How is everything going to work together? + +I divided the work into four parts: +1) Inspecting the site +2) Finding the error message on the site +3) Sending the message if nothing was found +4) Deploy the job with a cronjob on my VPS + +*** Inspecting the site +Here is the site I needed to inspect: +- On the first site, I need to click the bottom button. By inspecting the HTML, + I found out that its name is ~form:botonElegirHora~ + [[/2020-08-02-171251.png]] +- When the button is clicked, it loads a second page that has an error message + if no appointments are found. The ID of that message is ~form:warnSinCupos~. + [[/2020-08-02-162205.png]] + +*** Using Selenium to find the error message +First, I needed to define the browser session and its settings. I wanted to run +it in headless mode so no X session is needed: +#+BEGIN_SRC python +from selenium import webdriver +from selenium.webdriver.firefox.options import Options + +options = Options() +options.headless = True +d = webdriver.Firefox(options=options) +#+END_SRC + +Then, I opened the site, looked for the button (~form:botonElegirHora~) and +clicked it +#+BEGIN_SRC python +# This is the website I wanted to scrape +d.get('https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13') +elem = d.find_element_by_name('form:botonElegirHora') +elem.click() +#+END_SRC + +And on the new page, I looked for the error message (~form:warnSinCupos~) +#+BEGIN_SRC python +try: + warning_message = d.find_element_by_id('form:warnSinCupos') +except Exception: + pass +#+END_SRC + +This was working exactly how I wanted: It opened a new browser session, opened +the site, clicked the button, and then looked for the message. For now, if the +message wasn't found, it does nothing. Now, the script needs to send me a +message if the warning message wasn't found on the page. + +*** Using Telegram to send a message if the warning message wasn't found +The Telegram bot API has a very simple way to send messages. If you want to read +more about their API, you can check it [[https://core.telegram.org/][here]]. + +There are a few steps you need to follow to get a Telegram bot: +1) First, you need to "talk" to the [[https://core.telegram.org/bots#6-botfather][Botfather]] to create the bot. +2) Then, you need to find your Telegram Chat ID. There are a few bots that can help + you with that, I personally use ~@get_id_bot~. +3) Once you have the ID, you should read the ~sendMessage~ API, since that's the + only one we need now. You can check it [[https://core.telegram.org/bots/api#sendmessage][here]]. + +So, by using the Telegram documentation, I came up with the following code: +#+BEGIN_SRC python +import requests + +chat_id = # Insert your chat ID here +telegram_bot_id = # Insert your Telegram bot ID here +telegram_data = { + "chat_id": chat_id + "parse_mode": "HTML", + "text": ("Hay citas!\nHay citas en el registro civil, para " + f"entrar ve a {SAE_URL}") +} +requests.post('https://api.telegram.org/bot{telegram_bot_id}/sendmessage', data=telegram_data) +#+END_SRC + +*** The complete script +I added a few loggers and environment variables and voilá! Here is the complete code: +#+BEGIN_SRC python +#!/usr/bin/env python3 + +import os +import requests +from datetime import datetime + +from selenium import webdriver +from selenium.webdriver.firefox.options import Options + +from dotenv import load_dotenv + +load_dotenv() # This loads the environmental variables from the .env file in the root folder + +TELEGRAM_BOT_ID = os.environ.get('TELEGRAM_BOT_ID') +TELEGRAM_CHAT_ID = os.environ.get('TELEGRAM_CHAT_ID') +SAE_URL = 'https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13' + +options = Options() +options.headless = True +d = webdriver.Firefox(options=options) +d.get(SAE_URL) +print(f'Headless Firefox Initialized {datetime.now()}') +elem = d.find_element_by_name('form:botonElegirHora') +elem.click() +try: + warning_message = d.find_element_by_id('form:warnSinCupos') + print('No dates yet') + print('------------------------------') +except Exception: + telegram_data = { + "chat_id": TELEGRAM_CHAT_ID, + "parse_mode": "HTML", + "text": ("Hay citas!\nHay citas en el registro civil, para " + f"entrar ve a {SAE_URL}") + } + requests.post('https://api.telegram.org/bot' + f'{TELEGRAM_BOT_ID}/sendmessage', data=telegram_data) + print('Dates found!') +d.close() # To close the browser connection +#+END_SRC + +Only one more thing to do, to deploy everything to my VPS + +*** Deploy and testing on the VPS +This was very easy. I just needed to pull my git repo, install the +~requirements.txt~ and set a new cron to run every 10 minutes and check the +site. The cron settings I used where: +#+BEGIN_SRC bash +*/10 * * * * /usr/bin/python3 /my/script/location/registro-civil-scraper/app.py >> /my/script/location/registro-civil-scraper/log.txt +#+END_SRC +The ~>> /my/script/location/registro-civil-scraper/log.txt~ part is to keep the logs on a new file. + +*** Did it work? +Yes! And it worked perfectly. I got a message the following day at 21:00 +(weirdly enough, that's 0:00GMT, so maybe they have their servers at GMT time +and it opens new appointments at 0:00). +[[/2020-08-02-170458.png]] + +*** Conclusion +I always loved to use programming to solve simple problems. With this script, I +didn't need to check the site every couple of hours to get an appointment, and +sincerely, I wasn't going to check past 19:00, so I would've never found it by +my own. + +My brother is having similar issues in Argentina, and when I showed him this, he +said one of the funniest phrases I've heard about my profession: + +> /"Programmers could take over the world, but they are too lazy"/ + +I lol'd way too hard at that. + +I loved Selenium and how it worked. Recently I created a crawler using Selenium, +Redis, peewee, and Postgres, so stay tuned if you want to know more about that. + +In the meantime, if you want to check the complete script, you can see it on my +Git instance: https://git.rogs.me/me/registro-civil-scraper or Gitlab, if you +prefer: https://gitlab.com/rogs/registro-civil-scraper + +* COMMENT Local Variables +# Local Variables: +# eval: (org-hugo-auto-export-mode) +# End: -- cgit v1.2.3