{"id":1549,"date":"2021-11-27T14:55:53","date_gmt":"2021-11-27T14:55:53","guid":{"rendered":"https:\/\/laserphotonics.uk\/?p=1549"},"modified":"2021-11-27T14:55:57","modified_gmt":"2021-11-27T14:55:57","slug":"controlling-the-web-with-python","status":"publish","type":"post","link":"https:\/\/laserphotonics.uk\/?p=1549","title":{"rendered":"Controlling the Web with Python"},"content":{"rendered":"\n<h1 id=\"51c9\"><\/h1>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/williamkoehrsen.medium.com\/?source=post_page-----6fceb22c5f08-----------------------------------\"><img src=\"https:\/\/miro.medium.com\/fit\/c\/56\/56\/1*SckxdIFfjlR-cWXkL5ya-g.jpeg\" alt=\"Will Koehrsen\"\/><\/a><\/figure>\n\n\n\n<p><a href=\"https:\/\/williamkoehrsen.medium.com\/?source=post_page-----6fceb22c5f08-----------------------------------\" class=\"\">Will Koehrsen<\/a> <a class=\"\" href=\"https:\/\/towardsdatascience.com\/controlling-the-web-with-python-6fceb22c5f08?source=post_page-----6fceb22c5f08-----------------------------------\">Mar 10, 2018 \u00b7 9 min read<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/towardsdatascience.com\/controlling-the-web-with-python-6fceb22c5f08\">https:\/\/towardsdatascience.com\/controlling-the-web-with-python-6fceb22c5f08<\/a><\/p>\n\n\n\n<p id=\"513e\"><strong>An adventure in simple web automation<\/strong><\/p>\n\n\n\n<p id=\"0ca3\"><strong>Problem:<\/strong>&nbsp;Submitting class assignments requires navigating a maze of web pages so complex that several times I\u2019ve turned an assignment in to the wrong place. 
Also, while this process only takes 1\u20132 minutes, it sometimes seems like an insurmountable barrier (like when I\u2019ve finished an assignment way too late at night and I can barely remember my password).<\/p>\n\n\n\n<p id=\"8ffa\"><strong>Solution:&nbsp;<\/strong>Use Python to automatically&nbsp;submit completed assignments! Ideally, I would be able to save an assignment, type a few keys, and have my work uploaded in a matter of seconds. At first this sounded too good to be true, but then I discovered&nbsp;<a href=\"https:\/\/selenium-python.readthedocs.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">selenium<\/a>, a tool which can be used with Python to navigate the web for you.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/1036\/0*ZglaEb3qQK6xCBC6.png\" alt=\"\"\/><figcaption><a href=\"https:\/\/xkcd.com\/353\/\" target=\"_blank\" rel=\"noreferrer noopener\">Obligatory XKCD<\/a><\/figcaption><\/figure>\n\n\n\n<p id=\"b12b\">Anytime we find ourselves repeating tedious actions on the web with the same sequence of steps, we have a great chance to write a program to automate the process for us. With selenium and Python, we just need to write a script once, which we can then run as many times as we like, saving ourselves from repeating monotonous tasks (and in my case, eliminating the chance of submitting an assignment in the wrong place)!<\/p>\n\n\n\n<p id=\"1d75\">Here, I\u2019ll walk through the solution I developed to automatically (and correctly) submit my assignments. Along the way, we\u2019ll cover the basics of using Python and selenium to programmatically control the web. While this program does work (I\u2019m using it every day!), it\u2019s pretty custom, so you won\u2019t be able to copy and paste the code for your application. Nonetheless, the general techniques here can be applied to a limitless number of situations. 
(If you want to see the complete code, it\u2019s&nbsp;<a href=\"https:\/\/gist.github.com\/WillKoehrsen\/127fb3963b12b4f0b339ff0c8ee14558\" target=\"_blank\" rel=\"noreferrer noopener\">available on GitHub<\/a>).<\/p>\n\n\n\n<h1 id=\"b4d4\">Approach<\/h1>\n\n\n\n<p id=\"0da5\">Before we can get to the fun part of automating the web, we need to figure out the general structure of our solution. Jumping right into programming without a plan is a great way to waste many hours in frustration. I want to write a program to submit completed course assignments to the correct location on Canvas (my university\u2019s&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Learning_management_system\" target=\"_blank\" rel=\"noreferrer noopener\">\u201clearning management system\u201d<\/a>). Starting with the basics, I need a way to tell the program the name of the assignment to submit and the class. I went with a simple approach and created a folder to hold completed assignments with child folders for each class. In the child folders, I place the completed document named for the particular assignment. 
The program can figure out the name of the class from the folder, and the name of the assignment from the document title.<\/p>\n\n\n\n<p id=\"459c\">Here\u2019s an example where the name of the class is EECS491 and the assignment is \u201cAssignment 3 \u2014 Inference in Larger Graphical Models\u201d.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/864\/1*3WzLi_pB4gI999Xzp_tBrQ.png\" alt=\"\"\/><figcaption>File structure (left) and Complete Assignment (right)<\/figcaption><\/figure>\n\n\n\n<p id=\"64fb\">The first part of the program is a loop to go through the folders to find the assignment and class, which we store in a Python tuple:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># os for file management<br>import os<br><br># Build tuple of (class, file) to turn in<br>submission_dir = 'completed_assignments'<br>dir_list = list(os.listdir(submission_dir))<br>for directory in dir_list:<br>    file_list = list(os.listdir(os.path.join(submission_dir, directory)))<br>    if len(file_list) != 0:<br>        file_tup = (directory, file_list[0])<br><br>print(file_tup)<br><br><strong>('EECS491', 'Assignment 3 - Inference in Larger Graphical Models.txt')<\/strong><\/pre>\n\n\n\n<p id=\"a9db\">This takes care of file management, and the program now knows the class and the assignment to turn in. The next step is to use selenium to navigate to the correct webpage and upload the assignment.<\/p>\n\n\n\n<h2 id=\"82e1\">Web Control with Selenium<\/h2>\n\n\n\n<p id=\"12d2\">To get started with selenium, we import the library and create a web driver, which is a browser that is controlled by our program. 
In this case, I\u2019ll use Chrome as my browser and send the driver to the Canvas website where I submit assignments.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Import the web driver from selenium<br>from selenium import webdriver<br><br># Using Chrome to access web<br>driver = webdriver.Chrome()<br><br># Open the website<br>driver.get('<a href=\"https:\/\/canvas.case.edu\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/canvas.case.edu<\/a>')<\/pre>\n\n\n\n<p id=\"0f22\">When we open the Canvas webpage, we are greeted with our first obstacle, a login box! To get past this, we will need to fill in an id and a password and click the login button.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/809\/1*6K21H6TqFp52ilxqhnyJ7g.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"bb47\">Imagine the web driver as a person who has never seen a web page before: we need to tell it exactly where to click, what to type, and which buttons to press. There are a number of ways to tell our web driver what elements to find, all of which use selectors. A&nbsp;<a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Learn\/CSS\/Introduction_to_CSS\/Selectors\" target=\"_blank\" rel=\"noreferrer noopener\">selector<\/a>&nbsp;is a unique identifier for an element on a webpage. To find the selector for a specific element, say the CWRU ID box above, we need to inspect the webpage. In Chrome, this is done by pressing \u201cctrl + shift + i\u201d or right-clicking on any element and selecting \u201cInspect\u201d. 
This brings up the&nbsp;<a href=\"https:\/\/developer.chrome.com\/devtools\" target=\"_blank\" rel=\"noreferrer noopener\">Chrome developer tools<\/a>, an extremely useful application which shows the&nbsp;<a href=\"https:\/\/www.pathinteractive.com\/blog\/design-development\/rendering-a-webpage-with-google-webmaster-tools\/\" target=\"_blank\" rel=\"noreferrer noopener\">HTML underlying any webpage<\/a>.<\/p>\n\n\n\n<p id=\"61e0\">To find a selector for the \u201cCWRU ID\u201d box, I right-clicked in the box, hit \u201cInspect\u201d, and saw the following in developer tools. The highlighted line corresponds to the id box element (this line is called an HTML tag).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/875\/1*smbJ9oczUAAZ5aSCREAvWA.png\" alt=\"\"\/><figcaption>HTML in Chrome developer tools for the webpage<\/figcaption><\/figure>\n\n\n\n<p id=\"a811\">This HTML might look overwhelming, but we can ignore the majority of the information and focus on the&nbsp;<code>id=\"username\"<\/code>&nbsp;and&nbsp;<code>name=\"username\"<\/code>&nbsp;parts (these are known as attributes of the HTML tag).<\/p>\n\n\n\n<p id=\"33ac\">To select the id box with our web driver, we can use either the&nbsp;<code>id<\/code>&nbsp;or&nbsp;<code>name<\/code>&nbsp;attribute we found in the developer tools. Web drivers in selenium have many different methods for selecting elements on a webpage, and there are often multiple ways to select the exact same item:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Select the id box<br>id_box = driver.find_element_by_name('username')<br><br># Equivalent Outcome! 
<br>id_box = driver.find_element_by_id('username')<\/pre>\n\n\n\n<p id=\"0f66\">Our program now has access to the&nbsp;<code>id_box<\/code>&nbsp;and we can interact with it in various ways, such as typing in keys, or clicking (if we have selected a button).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Send id information<br>id_box.send_keys('my_username')<\/pre>\n\n\n\n<p id=\"ad2c\">We carry out the same process for the password box and login button, selecting each based on what we see in the Chrome developer tools. Then, we send information to the elements or click on them as needed.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Find password box<br>pass_box = driver.find_element_by_name('password')<br><br># Send password<br>pass_box.send_keys('my_password')<br><br># Find login button<br>login_button = driver.find_element_by_name('submit')<br><br># Click login<br>login_button.click()<\/pre>\n\n\n\n<p id=\"6605\">Once we are logged in, we are greeted by this slightly intimidating dashboard:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/875\/1*jG-_h99LhbiWsJSeMwSGaw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"61ca\">We again need to guide the program through the webpage by specifying exactly the elements to click on and the information to enter. 
In this case, I tell the program to select courses from the menu on the left, and then the class corresponding to the assignment I need to turn in:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Find and click on list of courses<br>courses_button = driver.find_element_by_id('global_nav_courses_link')<br>courses_button.click()<br><br># Get the name of the folder<br>folder = file_tup[0]<br><br># Class to select depends on folder<br>if folder == 'EECS491':<br>    class_select = driver.find_element_by_link_text('Artificial Intelligence: Probabilistic Graphical Models (100\/10039)')<br>elif folder == 'EECS531':<br>    class_select = driver.find_element_by_link_text('Computer Vision (100\/10040)')<br><br># Click on the specific class<br>class_select.click()<\/pre>\n\n\n\n<p id=\"058f\">The program finds the correct class using the name of the folder we stored in the first step. In this case, I use the selection method&nbsp;<code>find_element_by_link_text<\/code>&nbsp;to find the specific class. The \u201clink text\u201d for an element is just another selector we can find by inspecting the page:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/2000\/1*KAkAswxW6VGIkF-Vb1D-TA.png\" alt=\"\"\/><figcaption>Inspecting the page to find the selector for a specific class<\/figcaption><\/figure>\n\n\n\n<p id=\"77e8\">This workflow may seem a little tedious, but remember, we only have to do it once when we write our program! After that, we can hit run as many times as we want and the program will navigate through all these pages for us.<\/p>\n\n\n\n<p id=\"7a5f\">We use the same \u2018inspect page \u2014 select element \u2014 interact with element\u2019 process to get through a couple more screens. 
Finally, we reach the assignment submission page:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/1400\/1*iyz1HiKgExkyWmzW2M5Vxg.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"b654\">At this point, I could see the finish line, but initially this screen perplexed me. I could click on the \u201cChoose File\u201d box pretty easily, but how was I supposed to select the actual file I needed to upload? The answer turns out to be incredibly simple! We locate the&nbsp;<code>Choose File<\/code>&nbsp;box using a selector, and use the&nbsp;<code>send_keys<\/code>&nbsp;method to pass the exact path of the file (called&nbsp;<code>file_location<\/code>&nbsp;in the code below) to the box:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Choose File button<br>choose_file = driver.find_element_by_name('attachments[0][uploaded_data]')<br><br># Name of the file, taken from the tuple built earlier<br>file_name = file_tup[1]<br><br># Complete path of the file<br>file_location = os.path.join(submission_dir, folder, file_name)<br><br># Send the file location to the button<br>choose_file.send_keys(file_location)<\/pre>\n\n\n\n<p id=\"a23d\">That\u2019s it! By sending the exact path of the file to the button, we can skip the whole process of navigating through folders to find the right file. 
After sending the location, we are rewarded with the following screen showing that our file is uploaded and ready for submission.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/875\/1*RUaMhWWmRg47s10a8Pv6lg.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"0479\">Now, we select the \u201cSubmit Assignment\u201d button, click, and our assignment is turned in!<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Locate submit button and click<br>submit_assignment = driver.find_element_by_id('submit_file_button')<br>submit_assignment.click()<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/260\/1*dfC4W3awW86kw-KpQH-rOQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 id=\"50a3\">Cleaning Up<\/h2>\n\n\n\n<p id=\"75b2\">File management is always a critical step and I want to make sure I don\u2019t re-submit or lose old assignments. I decided the best solution was to store a single file to be submitted in the&nbsp;<code>completed_assignments<\/code>&nbsp;folder at any one time and move files to a&nbsp;<code>submitted_assignments<\/code>&nbsp;folder once they had been turned in. The final bit of code uses the os module to move the completed assignment by renaming it with the desired location:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Location of files after submission<br>submitted_file_location = os.path.join(submitted_dir, submitted_file_name)<br><br># Rename essentially cuts and pastes the file, moving it to the new location<br>os.rename(file_location, submitted_file_location)<\/pre>\n\n\n\n<p id=\"c3c2\">All of the preceding code gets wrapped up in a single script, which I can run from the command line. 
To limit opportunities for mistakes, I only submit one assignment at a time, which isn\u2019t a big deal given that it only takes about 5 seconds to run the program!<\/p>\n\n\n\n<p id=\"9065\">Here\u2019s what it looks like when I start the program:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/875\/1*FK2MNOJQgCabZdXAEYT2Gw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"e37f\">The program provides me with a chance to make sure this is the correct assignment before uploading. After the program has completed, I get the following output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/2000\/1*Fihdzm-vnWULTULOVI97JQ.png\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"2bd7\">While the program is running, I can watch Python go to work for me:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/miro.medium.com\/max\/1400\/1*-drw9BuNnPEsDkm5TWRaOA.gif\" alt=\"\"\/><\/figure>\n\n\n\n<h1 id=\"2b4f\">Conclusions<\/h1>\n\n\n\n<p id=\"1dff\">The technique of automating the web with Python works great for many tasks, both general and in my field of data science. For example, we could use selenium to automatically download new data files every day (assuming the website doesn\u2019t have an&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Application_programming_interface\" target=\"_blank\" rel=\"noreferrer noopener\">API<\/a>). While it might seem like a lot of work to write the script initially, the benefit comes from the fact that we can have the computer repeat this sequence as many times as we want in exactly the same manner.&nbsp;<mark>The program will never lose focus and wander off to Twitter.<\/mark>&nbsp;It will faithfully carry out the exact same series of steps with perfect consistency (which works great until the website changes).<\/p>\n\n\n\n<p id=\"c0ce\">I should mention you do want to be careful before you automate critical tasks. 
This example is relatively low-risk as I can always go back and re-submit assignments and I usually double-check the program\u2019s handiwork. Websites change, and if you don\u2019t change the program in response you might end up with a script that does something completely different than what you originally intended!<\/p>\n\n\n\n<p id=\"d2b1\">In terms of paying off, this program saves me about 30 seconds for every assignment and took 2 hours to write. So, if I use it to turn in more than 240 assignments, I come out ahead on time! However, the real payoff of this program is in designing a cool solution to a problem and learning a lot in the process. While my time might have been more effectively spent working on assignments rather than figuring out how to automatically turn them in, I thoroughly enjoyed this challenge. There are few things as satisfying as solving problems, and Python turns out to be a pretty good tool for doing exactly that.<\/p>\n\n\n\n<p id=\"49ea\">As always, I welcome feedback and constructive criticism. I can be reached on Twitter&nbsp;<a href=\"http:\/\/twitter.com\/@koehrsen_will\" target=\"_blank\" rel=\"noreferrer noopener\">@koehrsen_will<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Will KoehrsenMar 10, 2018\u00b79 min read https:\/\/towardsdatascience.com\/controlling-the-web-with-python-6fceb22c5f08 An adventure in simple web automation Problem:&nbsp;Submitting class assignments requires navigating a maze of web pages so complex that several times I\u2019ve turned an assignment in to the wrong place. 
Also, while this process only takes 1\u20132 minutes, it sometimes seems like an insurmountable barrier (like when I\u2019ve [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=\/wp\/v2\/posts\/1549"}],"collection":[{"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1549"}],"version-history":[{"count":1,"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=\/wp\/v2\/posts\/1549\/revisions"}],"predecessor-version":[{"id":1550,"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=\/wp\/v2\/posts\/1549\/revisions\/1550"}],"wp:attachment":[{"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1549"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1549"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/laserphotonics.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1549"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}