From dea4a61cbf42ff422f1409d4342fa59017950526 Mon Sep 17 00:00:00 2001
From: Roger Gonzalez
Date: Mon, 2 Nov 2020 18:37:23 -0300
Subject: Changed theme to Archie, moving blog to rogs.me

---
 content/posts.org | 208 ------------------------------------------------------
 1 file changed, 208 deletions(-)
 delete mode 100644 content/posts.org

diff --git a/content/posts.org b/content/posts.org
deleted file mode 100644
index 12b6eb8..0000000
--- a/content/posts.org
+++ /dev/null
@@ -1,208 +0,0 @@
-#+hugo_base_dir: ../
-#+hugo_section: ./posts
-
-#+hugo_weight: auto
-#+hugo_auto_set_lastmod: t
-
-#+author: Roger Gonzalez
-
-* Programming :@programming:
-All posts in here will have the category set to /programming/.
-** How I got a residency appointment thanks to Python, Selenium and Telegram :python::selenium:telegram:
-:PROPERTIES:
-:EXPORT_FILE_NAME: how-i-got-a-residency-appointment-thanks-to-python-and-selenium
-:EXPORT_DATE: 2020-08-02
-:END:
-Hello everyone!
-
-As some of you might know, I'm a Venezuelan 🇻🇪 living in Montevideo, Uruguay 🇺🇾.
-I've been living here for almost a year, but because of the pandemic my
-residency appointments have slowed to a crawl, and in the middle of the
-quarantine they added a new appointment system. Before, there were no
-appointments: you just had to get there early and wait for the secretary to
-review your files and assign someone to attend to you. Now they had
-implemented an appointment system that you could use from the comfort of your
-own home or office. There was just one issue: *there were never appointments available*.
-
-That was a little stressful. I was developing a small /tick/ from checking the
-site multiple times a day, with no luck. Then I decided to build a bot that
-checked the site for me; that way I could forget about it and let the
-computer do it for me.
-
-*** Tech
-**** Selenium
-I had some experience with Selenium because I had used it to run automated
-tests on an Android application, but I had never used it for the web. I knew it
-supported Firefox and had an extensive API to interact with websites. In the
-end, I just had to inspect the HTML and search for the "No appointments
-available" error message. If the message wasn't there, I needed a way to be
-notified so I could book my appointment as fast as possible.
-**** Telegram Bot API
-Telegram was my go-to because I have a lot of experience with it. It has a
-stupidly easy API that allows for superb bot management. I just needed the bot
-to send me a message whenever the "No appointments available" message wasn't
-found on the site.
-
-*** The plan
-Here comes the juicy part: how is everything going to work together?
-
-I divided the work into four parts:
-1) Inspecting the site
-2) Finding the error message on the site
-3) Sending a message if the error message wasn't found
-4) Deploying the job as a cron job on my VPS
-
-*** Inspecting the site
-Here are the pages I needed to inspect:
-- On the first page, I needed to click the button at the bottom. By inspecting
-  the HTML, I found that its name is ~form:botonElegirHora~.
-  [[/2020-08-02-171251.png]]
-- When the button is clicked, it loads a second page that shows an error
-  message if no appointments are available. The ID of that message is
-  ~form:warnSinCupos~.
-  [[/2020-08-02-162205.png]]
-
-*** Using Selenium to find the error message
-First, I needed to define the browser session and its settings.
-I wanted to run it in headless mode so no X session is needed:
-#+BEGIN_SRC python
-from selenium import webdriver
-from selenium.webdriver.firefox.options import Options
-
-options = Options()
-options.headless = True  # Run Firefox without a display (no X session needed)
-d = webdriver.Firefox(options=options)
-#+END_SRC
-
-Then, I opened the site, looked for the button (~form:botonElegirHora~) and
-clicked it:
-#+BEGIN_SRC python
-# This is the website I wanted to scrape
-d.get('https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13')
-elem = d.find_element_by_name('form:botonElegirHora')
-elem.click()
-#+END_SRC
-
-And on the new page, I looked for the error message (~form:warnSinCupos~):
-#+BEGIN_SRC python
-from selenium.common.exceptions import NoSuchElementException
-
-try:
-    warning_message = d.find_element_by_id('form:warnSinCupos')
-except NoSuchElementException:
-    pass  # No warning on the page: handled in the next step
-#+END_SRC
-
-This was working exactly how I wanted: it opened a new browser session, opened
-the site, clicked the button, and then looked for the message. For now, if the
-message isn't found, it does nothing. Next, the script needs to send me a
-message when the warning message isn't on the page.
-
-*** Using Telegram to send a message if the warning message wasn't found
-The Telegram Bot API has a very simple way to send messages. If you want to read
-more about their API, you can check it [[https://core.telegram.org/][here]].
-
-There are a few steps you need to follow to get a Telegram bot:
-1) First, you need to "talk" to the [[https://core.telegram.org/bots#6-botfather][BotFather]] to create the bot.
-2) Then, you need to find your Telegram chat ID. There are a few bots that can
-   help you with that; I personally use ~@get_id_bot~.
-3) Once you have the ID, you should read the ~sendMessage~ API docs, since
-   that's the only method we need for now. You can check them [[https://core.telegram.org/bots/api#sendmessage][here]].
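Step 2 can also be done without a helper bot. A minimal sketch, assuming the bot has no webhook configured (Telegram only serves `getUpdates` while no webhook is set) and that you have already sent the bot a message: call `getUpdates` with the bot token and read `chat.id` from each returned message. The helper names `fetch_updates` and `chat_ids_from_updates` are hypothetical, and the sketch uses only the standard library instead of `requests`.

```python
import json
import urllib.request


def fetch_updates(bot_token: str) -> dict:
    """Call the Bot API getUpdates method; bot_token is the token from BotFather."""
    url = f"https://api.telegram.org/bot{bot_token}/getUpdates"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def chat_ids_from_updates(updates: dict) -> list[int]:
    """Collect the distinct chat IDs from a getUpdates response body, in order."""
    ids = []
    for update in updates.get("result", []):
        # Only plain "message" updates are read; edited messages,
        # channel posts and other update types are skipped.
        chat = update.get("message", {}).get("chat", {})
        chat_id = chat.get("id")
        if chat_id is not None and chat_id not in ids:
            ids.append(chat_id)
    return ids
```

For example, `print(chat_ids_from_updates(fetch_updates("YOUR_BOT_TOKEN")))` prints the IDs of the chats that have messaged the bot; the parsing is kept separate from the network call so it can be checked against a saved response.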
-
-So, using the Telegram documentation, I came up with the following code:
-#+BEGIN_SRC python
-import requests
-
-SAE_URL = 'https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13'
-chat_id = 'YOUR_CHAT_ID'  # Insert your chat ID here
-telegram_bot_id = 'YOUR_BOT_TOKEN'  # Insert the bot token BotFather gave you here
-telegram_data = {
-    "chat_id": chat_id,
-    "parse_mode": "HTML",
-    # "There are appointments! There are appointments at the civil registry, to get in go to ..."
-    "text": ("Hay citas!\nHay citas en el registro civil, para "
-             f"entrar ve a {SAE_URL}")
-}
-requests.post(f'https://api.telegram.org/bot{telegram_bot_id}/sendmessage',
-              data=telegram_data)
-#+END_SRC
-
-*** The complete script
-I added a few log messages and environment variables and voilà! Here is the complete code:
-#+BEGIN_SRC python
-#!/usr/bin/env python3
-
-import os
-import requests
-from datetime import datetime
-
-from selenium import webdriver
-from selenium.common.exceptions import NoSuchElementException
-from selenium.webdriver.firefox.options import Options
-
-from dotenv import load_dotenv
-
-load_dotenv()  # This loads the environment variables from the .env file in the root folder
-
-TELEGRAM_BOT_ID = os.environ.get('TELEGRAM_BOT_ID')  # The bot token from BotFather
-TELEGRAM_CHAT_ID = os.environ.get('TELEGRAM_CHAT_ID')
-SAE_URL = 'https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13'
-
-options = Options()
-options.headless = True
-d = webdriver.Firefox(options=options)
-d.get(SAE_URL)
-print(f'Headless Firefox Initialized {datetime.now()}')
-elem = d.find_element_by_name('form:botonElegirHora')
-elem.click()
-try:
-    # The "no appointments" warning is on the page: nothing to do yet
-    warning_message = d.find_element_by_id('form:warnSinCupos')
-    print('No dates yet')
-    print('------------------------------')
-except NoSuchElementException:
-    # No warning means appointments are available: notify via Telegram
-    telegram_data = {
-        "chat_id": TELEGRAM_CHAT_ID,
-        "parse_mode": "HTML",
-        "text": ("Hay citas!\nHay citas en el registro civil, para "
-                 f"entrar ve a {SAE_URL}")
-    }
-    requests.post('https://api.telegram.org/bot'
-                  f'{TELEGRAM_BOT_ID}/sendmessage', data=telegram_data)
-    print('Dates found!')
-d.quit()  # End the WebDriver session and shut down the browser
-#+END_SRC
-
-Only one thing left to do: deploy everything to my VPS.
-
-*** Deploying and testing on the VPS
-This was very easy.
-I just needed to pull my git repo, install the ~requirements.txt~ and set up a
-new cron job to run every 10 minutes and check the site. The cron settings I
-used were:
-#+BEGIN_SRC bash
-*/10 * * * * /usr/bin/python3 /my/script/location/registro-civil-scraper/app.py >> /my/script/location/registro-civil-scraper/log.txt
-#+END_SRC
-The ~>> /my/script/location/registro-civil-scraper/log.txt~ part appends the
-script's output to a log file.
-
-*** Did it work?
-Yes! And it worked perfectly. I got a message the following day at 21:00
-(weirdly enough, that's 0:00 GMT, so maybe their servers run on GMT and new
-appointments open at 0:00).
-[[/2020-08-02-170458.png]]
-
-*** Conclusion
-I've always loved using programming to solve simple problems. With this script,
-I didn't need to check the site every couple of hours to get an appointment,
-and honestly, I wasn't going to check after 19:00, so I would never have found
-it on my own.
-
-My brother is having similar issues in Argentina, and when I showed him this, he
-said one of the funniest phrases I've heard about my profession:
-
-> /"Programmers could take over the world, but they are too lazy"/
-
-I lol'd way too hard at that.
-
-I loved Selenium and how it worked. Recently I created a crawler using Selenium,
-Redis, peewee, and Postgres, so stay tuned if you want to know more about that.
-
-In the meantime, if you want to check the complete script, you can see it on my
-Git instance: https://git.rogs.me/me/registro-civil-scraper or GitLab, if you
-prefer: https://gitlab.com/rogs/registro-civil-scraper
-
-* COMMENT Local Variables
-# Local Variables:
-# eval: (org-hugo-auto-export-mode)
-# End:
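The scraper in this post targets the Selenium 3 API of 2020. In Selenium 4, `find_element_by_name` and `find_element_by_id` were removed (as of 4.3) in favor of `find_element(by, value)`, and newer releases deprecate `options.headless` in favor of `options.add_argument("-headless")`. Below is a minimal sketch of the same availability check in that style, assuming the page still uses the `form:botonElegirHora` button and the `form:warnSinCupos` warning. `appointments_available` is a hypothetical helper. Like the original, it does not wait explicitly for the second page, so the usage comments set an implicit wait to give the warning time to render.

```python
# Sketch of the availability check against the Selenium 4 locator API.
# "name" and "id" are the string values of selenium's By.NAME and By.ID,
# so this module itself does not need to import selenium.
BUTTON_NAME = "form:botonElegirHora"  # button on the first page
WARNING_ID = "form:warnSinCupos"      # "no appointments available" warning


def appointments_available(driver) -> bool:
    """Click through to the second page; True if the warning is absent.

    `driver` is any Selenium 4 WebDriver that has already loaded the SAE page.
    """
    driver.find_element("name", BUTTON_NAME).click()
    # find_elements returns an empty list (instead of raising) when nothing
    # matches, so "no warning" becomes a plain truthiness check.
    return not driver.find_elements("id", WARNING_ID)


# Usage (needs Firefox and geckodriver installed):
#   from selenium import webdriver
#   from selenium.webdriver.firefox.options import Options
#   options = Options()
#   options.add_argument("-headless")
#   driver = webdriver.Firefox(options=options)
#   driver.implicitly_wait(10)  # poll up to 10 s for elements to appear
#   try:
#       driver.get(SAE_URL)
#       print(appointments_available(driver))
#   finally:
#       driver.quit()
```

Using `find_elements` instead of catching an exception keeps the "no appointments" and "appointments found" paths as ordinary return values, and `driver.quit()` in a `finally` block ensures each cron run ends its browser session even if a lookup fails.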