summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorRoger Gonzalez <roger@rogs.me>2020-08-02 18:03:07 -0300
committerRoger Gonzalez <roger@rogs.me>2020-08-02 18:03:07 -0300
commit8b5bf41fbd4227130e8ff8a46f9aa442b5b6359a (patch)
tree755461a6db0d979bb6bcc59c0ca337c002d5b76c
parent6ebc41ce54ac4010426c0bac82ad4ece1d4b8a04 (diff)
Added a new post for python and selenium
-rw-r--r--content/posts.org208
-rw-r--r--content/posts/how-i-got-a-residency-appointment-thanks-to-python-and-selenium.md232
-rw-r--r--static/2020-08-02-162205.pngbin0 -> 499400 bytes
-rw-r--r--static/2020-08-02-170458.pngbin0 -> 56328 bytes
-rw-r--r--static/2020-08-02-171251.pngbin0 -> 466670 bytes
-rw-r--r--themes/m10c/layouts/_default/baseof.html1
6 files changed, 441 insertions, 0 deletions
diff --git a/content/posts.org b/content/posts.org
new file mode 100644
index 0000000..e10bcdd
--- /dev/null
+++ b/content/posts.org
@@ -0,0 +1,208 @@
+#+hugo_base_dir: ../
+#+hugo_section: ./posts
+
+#+hugo_weight: auto
+#+hugo_auto_set_lastmod: t
+
+#+author: Roger Gonzalez
+
+* Programming :@programming:
+All posts in here will have the category set to /programming/.
+** How I got a residency appointment thanks to Python, Selenium and Telegram :python::selenium:telegram:
+:PROPERTIES:
+:EXPORT_FILE_NAME: how-i-got-a-residency-appointment-thanks-to-python-and-selenium
+:EXPORT_DATE: 2020-08-02
+:END:
+Hello everyone!
+
+As some of you might know, I'm a Venezuelan 🇻🇪 living in Montevideo, Uruguay 🇺🇾.
+I've been living here for almost a year, but because of the pandemic my
+residency appointments have slowed down to a crawl, and in the middle of the
+quarantine they added a new appointment system. Before, there were no
+appointments, you just had to get there early and wait for the secretary to
+review your files and assign someone to attend you. But now, they had
+implemented an appointment system that you could do from the comfort of your own
+home/office. There was just one issue: *there where never appointments available*.
+
+That was a little stressful. I was developing a small /tick/ by checking the
+site multiple times a day, with no luck. But then, I decided I wanted to do a
+bot that checks the site for me, that way I could just forget about it and let
+the computers do it for me.
+
+*** Tech
+**** Selenium
+I had some experience with Selenium in the past because I had to run automated
+tests on an Android application, but I had never used it for the web. I knew it
+supported Firefox and had an extensive API to interact with websites. In the
+end, I just had to inspect the HTML and search for the "No appointments
+available" error message. If the message wasn't there, I needed a way to be
+notified so I can set my appointment as fast as possible.
+**** Telegram Bot API
+Telegram was my goto because I have a lot of experience with it. It has a
+stupidly easy API that allows for superb bot management. I just needed the bot
+to send me a message whenever the "No appointments available" message wasn't
+found on the site.
+
+*** The plan
+Here comes the juicy part: How is everything going to work together?
+
+I divided the work into four parts:
+1) Inspecting the site
+2) Finding the error message on the site
+3) Sending the message if nothing was found
+4) Deploy the job with a cronjob on my VPS
+
+*** Inspecting the site
+Here is the site I needed to inspect:
+- On the first site, I need to click the bottom button. By inspecting the HTML,
+ I found out that its name is ~form:botonElegirHora~
+ [[/2020-08-02-171251.png]]
+- When the button is clicked, it loads a second page that has an error message
+ if no appointments are found. The ID of that message is ~form:warnSinCupos~.
+ [[/2020-08-02-162205.png]]
+
+*** Using Selenium to find the error message
+First, I needed to define the browser session and its settings. I wanted to run
+it in headless mode so no X session is needed:
+#+BEGIN_SRC python
+from selenium import webdriver
+from selenium.webdriver.firefox.options import Options
+
+options = Options()
+options.headless = True
+d = webdriver.Firefox(options=options)
+#+END_SRC
+
+Then, I opened the site, looked for the button (~form:botonElegirHora~) and
+clicked it
+#+BEGIN_SRC python
+# This is the website I wanted to scrape
+d.get('https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13')
+elem = d.find_element_by_name('form:botonElegirHora')
+elem.click()
+#+END_SRC
+
+And on the new page, I looked for the error message (~form:warnSinCupos~)
+#+BEGIN_SRC python
+try:
+ warning_message = d.find_element_by_id('form:warnSinCupos')
+except Exception:
+ pass
+#+END_SRC
+
+This was working exactly how I wanted: It opened a new browser session, opened
+the site, clicked the button, and then looked for the message. For now, if the
+message wasn't found, it does nothing. Now, the script needs to send me a
+message if the warning message wasn't found on the page.
+
+*** Using Telegram to send a message if the warning message wasn't found
+The Telegram bot API has a very simple way to send messages. If you want to read
+more about their API, you can check it [[https://core.telegram.org/][here]].
+
+There are a few steps you need to follow to get a Telegram bot:
+1) First, you need to "talk" to the [[https://core.telegram.org/bots#6-botfather][Botfather]] to create the bot.
+2) Then, you need to find your Telegram Chat ID. There are a few bots that can help
+ you with that, I personally use ~@get_id_bot~.
+3) Once you have the ID, you should read the ~sendMessage~ API, since that's the
+ only one we need now. You can check it [[https://core.telegram.org/bots/api#sendmessage][here]].
+
+So, by using the Telegram documentation, I came up with the following code:
+#+BEGIN_SRC python
+import requests
+
+chat_id = # Insert your chat ID here
+telegram_bot_id = # Insert your Telegram bot ID here
+telegram_data = {
+ "chat_id": chat_id
+ "parse_mode": "HTML",
+ "text": ("<b>Hay citas!</b>\nHay citas en el registro civil, para "
+ f"entrar ve a {SAE_URL}")
+}
+requests.post('https://api.telegram.org/bot{telegram_bot_id}/sendmessage', data=telegram_data)
+#+END_SRC
+
+*** The complete script
+I added a few loggers and environment variables and voilá! Here is the complete code:
+#+BEGIN_SRC python
+#!/usr/bin/env python3
+
+import os
+import requests
+from datetime import datetime
+
+from selenium import webdriver
+from selenium.webdriver.firefox.options import Options
+
+from dotenv import load_dotenv
+
+load_dotenv() # This loads the environmental variables from the .env file in the root folder
+
+TELEGRAM_BOT_ID = os.environ.get('TELEGRAM_BOT_ID')
+TELEGRAM_CHAT_ID = os.environ.get('TELEGRAM_CHAT_ID')
+SAE_URL = 'https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13'
+
+options = Options()
+options.headless = True
+d = webdriver.Firefox(options=options)
+d.get(SAE_URL)
+print(f'Headless Firefox Initialized {datetime.now()}')
+elem = d.find_element_by_name('form:botonElegirHora')
+elem.click()
+try:
+ warning_message = d.find_element_by_id('form:warnSinCupos')
+ print('No dates yet')
+ print('------------------------------')
+except Exception:
+ telegram_data = {
+ "chat_id": TELEGRAM_CHAT_ID,
+ "parse_mode": "HTML",
+ "text": ("<b>Hay citas!</b>\nHay citas en el registro civil, para "
+ f"entrar ve a {SAE_URL}")
+ }
+ requests.post('https://api.telegram.org/bot'
+ f'{TELEGRAM_BOT_ID}/sendmessage', data=telegram_data)
+ print('Dates found!')
+d.close() # To close the browser connection
+#+END_SRC
+
+Only one more thing to do, to deploy everything to my VPS
+
+*** Deploy and testing on the VPS
+This was very easy. I just needed to pull my git repo, install the
+~requirements.txt~ and set a new cron to run every 10 minutes and check the
+site. The cron settings I used where:
+#+BEGIN_SRC bash
+*/10 * * * * /usr/bin/python3 /my/script/location/registro-civil-scraper/app.py >> /my/script/location/registro-civil-scraper/log.txt
+#+END_SRC
+The ~>> /my/script/location/registro-civil-scraper/log.txt~ part is to keep the logs on a new file.
+
+*** Did it work?
+Yes! And it worked perfectly. I got a message the following day at 21:00
+(weirdly enough, that's 0:00GMT, so maybe they have their servers at GMT time
+and it opens new appointments at 0:00).
+[[/2020-08-02-170458.png]]
+
+*** Conclusion
+I always loved to use programming to solve simple problems. With this script, I
+didn't need to check the site every couple of hours to get an appointment, and
+sincerely, I wasn't going to check past 19:00, so I would've never found it by
+my own.
+
+My brother is having similar issues in Argentina, and when I showed him this, he
+said one of the funniest phrases I've heard about my profession:
+
+> /"Programmers could take over the world, but they are too lazy"/
+
+I lol'd way too hard at that.
+
+I loved Selenium and how it worked. Recently I created a crawler using Selenium,
+Redis, peewee, and Postgres, so stay tuned if you want to know more about that.
+
+In the meantime, if you want to check the complete script, you can see it on my
+Git instance: https://git.rogs.me/me/registro-civil-scraper or Gitlab, if you
+prefer: https://gitlab.com/rogs/registro-civil-scraper
+
+* COMMENT Local Variables
+# Local Variables:
+# eval: (org-hugo-auto-export-mode)
+# End:
diff --git a/content/posts/how-i-got-a-residency-appointment-thanks-to-python-and-selenium.md b/content/posts/how-i-got-a-residency-appointment-thanks-to-python-and-selenium.md
new file mode 100644
index 0000000..ce73dba
--- /dev/null
+++ b/content/posts/how-i-got-a-residency-appointment-thanks-to-python-and-selenium.md
@@ -0,0 +1,232 @@
++++
+title = "How I got a residency appointment thanks to Python, Selenium and Telegram"
+author = ["Roger Gonzalez"]
+date = 2020-08-02
+lastmod = 2020-08-02T18:00:39-03:00
+tags = ["python", "selenium", "telegram"]
+categories = ["programming"]
+draft = false
+weight = 2001
++++
+
+Hello everyone!
+
+As some of you might know, I'm a Venezuelan 🇻🇪 living in Montevideo, Uruguay 🇺🇾.
+I've been living here for almost a year, but because of the pandemic my
+residency appointments have slowed down to a crawl, and in the middle of the
+quarantine they added a new appointment system. Before, there were no
+appointments, you just had to get there early and wait for the secretary to
+review your files and assign someone to attend you. But now, they had
+implemented an appointment system that you could do from the comfort of your own
+home/office. There was just one issue: **there where never appointments available**.
+
+That was a little stressful. I was developing a small _tick_ by checking the
+site multiple times a day, with no luck. But then, I decided I wanted to do a
+bot that checks the site for me, that way I could just forget about it and let
+the computers do it for me.
+
+
+## Tech {#tech}
+
+
+### Selenium {#selenium}
+
+I had some experience with Selenium in the past because I had to run automated
+tests on an Android application, but I had never used it for the web. I knew it
+supported Firefox and had an extensive API to interact with websites. In the
+end, I just had to inspect the HTML and search for the "No appointments
+available" error message. If the message wasn't there, I needed a way to be
+notified so I can set my appointment as fast as possible.
+
+
+### Telegram Bot API {#telegram-bot-api}
+
+Telegram was my goto because I have a lot of experience with it. It has a
+stupidly easy API that allows for superb bot management. I just needed the bot
+to send me a message whenever the "No appointments available" message wasn't
+found on the site.
+
+
+## The plan {#the-plan}
+
+Here comes the juicy part: How is everything going to work together?
+
+I divided the work into four parts:
+
+1. Inspecting the site
+2. Finding the error message on the site
+3. Sending the message if nothing was found
+4. Deploy the job with a cronjob on my VPS
+
+
+## Inspecting the site {#inspecting-the-site}
+
+Here is the site I needed to inspect:
+
+- On the first site, I need to click the bottom button. By inspecting the HTML,
+ I found out that its name is `form:botonElegirHora`
+ ![](/2020-08-02-171251.png)
+- When the button is clicked, it loads a second page that has an error message
+ if no appointments are found. The ID of that message is `form:warnSinCupos`.
+ ![](/2020-08-02-162205.png)
+
+
+## Using Selenium to find the error message {#using-selenium-to-find-the-error-message}
+
+First, I needed to define the browser session and its settings. I wanted to run
+it in headless mode so no X session is needed:
+
+```python
+from selenium import webdriver
+from selenium.webdriver.firefox.options import Options
+
+options = Options()
+options.headless = True
+d = webdriver.Firefox(options=options)
+```
+
+Then, I opened the site, looked for the button (`form:botonElegirHora`) and
+clicked it
+
+```python
+# This is the website I wanted to scrape
+d.get('https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13')
+elem = d.find_element_by_name('form:botonElegirHora')
+elem.click()
+```
+
+And on the new page, I looked for the error message (`form:warnSinCupos`)
+
+```python
+try:
+ warning_message = d.find_element_by_id('form:warnSinCupos')
+except Exception:
+ pass
+```
+
+This was working exactly how I wanted: It opened a new browser session, opened
+the site, clicked the button, and then looked for the message. For now, if the
+message wasn't found, it does nothing. Now, the script needs to send me a
+message if the warning message wasn't found on the page.
+
+
+## Using Telegram to send a message if the warning message wasn't found {#using-telegram-to-send-a-message-if-the-warning-message-wasn-t-found}
+
+The Telegram bot API has a very simple way to send messages. If you want to read
+more about their API, you can check it [here](https://core.telegram.org/).
+
+There are a few steps you need to follow to get a Telegram bot:
+
+1. First, you need to "talk" to the [Botfather](https://core.telegram.org/bots#6-botfather) to create the bot.
+2. Then, you need to find your Telegram Chat ID. There are a few bots that can help
+ you with that, I personally use `@get_id_bot`.
+3. Once you have the ID, you should read the `sendMessage` API, since that's the
+ only one we need now. You can check it [here](https://core.telegram.org/bots/api#sendmessage).
+
+So, by using the Telegram documentation, I came up with the following code:
+
+```python
+import requests
+
+chat_id = # Insert your chat ID here
+telegram_bot_id = # Insert your Telegram bot ID here
+telegram_data = {
+ "chat_id": chat_id
+ "parse_mode": "HTML",
+ "text": ("<b>Hay citas!</b>\nHay citas en el registro civil, para "
+ f"entrar ve a {SAE_URL}")
+}
+requests.post('https://api.telegram.org/bot{telegram_bot_id}/sendmessage', data=telegram_data)
+```
+
+
+## The complete script {#the-complete-script}
+
+I added a few loggers and environment variables and voilá! Here is the complete code:
+
+```python
+#!/usr/bin/env python3
+
+import os
+import requests
+from datetime import datetime
+
+from selenium import webdriver
+from selenium.webdriver.firefox.options import Options
+
+from dotenv import load_dotenv
+
+load_dotenv() # This loads the environmental variables from the .env file in the root folder
+
+TELEGRAM_BOT_ID = os.environ.get('TELEGRAM_BOT_ID')
+TELEGRAM_CHAT_ID = os.environ.get('TELEGRAM_CHAT_ID')
+SAE_URL = 'https://sae.mec.gub.uy/sae/agendarReserva/Paso1.xhtml?e=9&a=7&r=13'
+
+options = Options()
+options.headless = True
+d = webdriver.Firefox(options=options)
+d.get(SAE_URL)
+print(f'Headless Firefox Initialized {datetime.now()}')
+elem = d.find_element_by_name('form:botonElegirHora')
+elem.click()
+try:
+ warning_message = d.find_element_by_id('form:warnSinCupos')
+ print('No dates yet')
+ print('------------------------------')
+except Exception:
+ telegram_data = {
+ "chat_id": TELEGRAM_CHAT_ID,
+ "parse_mode": "HTML",
+ "text": ("<b>Hay citas!</b>\nHay citas en el registro civil, para "
+ f"entrar ve a {SAE_URL}")
+ }
+ requests.post('https://api.telegram.org/bot'
+ f'{TELEGRAM_BOT_ID}/sendmessage', data=telegram_data)
+ print('Dates found!')
+d.close() # To close the browser connection
+```
+
+Only one more thing to do, to deploy everything to my VPS
+
+
+## Deploy and testing on the VPS {#deploy-and-testing-on-the-vps}
+
+This was very easy. I just needed to pull my git repo, install the
+`requirements.txt` and set a new cron to run every 10 minutes and check the
+site. The cron settings I used where:
+
+```bash
+*/10 * * * * /usr/bin/python3 /my/script/location/registro-civil-scraper/app.py >> /my/script/location/registro-civil-scraper/log.txt
+```
+
+The `>> /my/script/location/registro-civil-scraper/log.txt` part is to keep the logs on a new file.
+
+
+## Did it work? {#did-it-work}
+
+Yes! And it worked perfectly. I got a message the following day at 21:00
+(weirdly enough, that's 0:00GMT, so maybe they have their servers at GMT time
+and it opens new appointments at 0:00).
+![](/2020-08-02-170458.png)
+
+
+## Conclusion {#conclusion}
+
+I always loved to use programming to solve simple problems. With this script, I
+didn't need to check the site every couple of hours to get an appointment, and
+sincerely, I wasn't going to check past 19:00, so I would've never found it by
+my own.
+
+My brother is having similar issues in Argentina, and when I showed him this, he
+said one of the funniest phrases I've heard about my profession:
+
+> _"Programmers could take over the world, but they are too lazy"_
+
+I lol'd way too hard at that.
+
+I loved Selenium and how it worked. Recently I created a crawler using Selenium,
+Redis, peewee, and Postgres, so stay tuned if you want to know more about that.
+
+In the meantime, if you want to check the complete script, you can see it on my
+Git instance: <https://git.rogs.me/me/registro-civil-scraper> or Gitlab, if you
+prefer: <https://gitlab.com/rogs/registro-civil-scraper>
diff --git a/static/2020-08-02-162205.png b/static/2020-08-02-162205.png
new file mode 100644
index 0000000..5233e8a
--- /dev/null
+++ b/static/2020-08-02-162205.png
Binary files differ
diff --git a/static/2020-08-02-170458.png b/static/2020-08-02-170458.png
new file mode 100644
index 0000000..ad7d4dc
--- /dev/null
+++ b/static/2020-08-02-170458.png
Binary files differ
diff --git a/static/2020-08-02-171251.png b/static/2020-08-02-171251.png
new file mode 100644
index 0000000..cbb70d7
--- /dev/null
+++ b/static/2020-08-02-171251.png
Binary files differ
diff --git a/themes/m10c/layouts/_default/baseof.html b/themes/m10c/layouts/_default/baseof.html
index c6ab70e..d1e2e0e 100644
--- a/themes/m10c/layouts/_default/baseof.html
+++ b/themes/m10c/layouts/_default/baseof.html
@@ -39,6 +39,7 @@
<header class="app-header">
<a href="{{ .Site.BaseURL }}"><img class="app-header-avatar" src="{{ .Site.Params.avatar | default "avatar.jpg" | relURL }}" alt="{{ .Site.Params.author | default "John Doe" }}" /></a>
<h1>{{ .Site.Title }}</h1>
+ <p>venezuelan 🇻🇪 living in montevideo 🇺🇾</p>
<p>{{ .Site.Params.description | default "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vehicula turpis sit amet elit pretium." }}</p>
<div class="app-header-social">
{{ range .Site.Params.social }}