Python- automate navigation through websites

by Kunal Grover · Published July 15, 2014 · Updated October 19, 2018

Hi! There are many cases where we need to do the same thing over and over again on websites: downloading episodes of a series manually, checking mail every 10 minutes while waiting for something important, or logging into your company or college website every day just for the sake of logging in. For the impatient people out there, Python comes to the rescue 🙂

Python and its libraries can be used very effectively to navigate through websites, fill in repetitive forms, and a lot more. A lot of cool things can be built where a website lacks an API, for example an automatic file downloader, automated login and form filling, automatic ticket booking, or automatic payments using your credit or debit card. So here, I am going to discuss a few libraries and some tips for using them effectively.

For reading the content of websites, Beautiful Soup and urllib/urllib2 are the libraries to look at. I have written about them, along with some basic navigation, here: http://crondev.wordpress.com/2014/06/15/use-python-to-download-files-from-websites/

Here we are looking at Python ways of interacting with website forms. One of the more difficult tasks is logging in to websites.

Doing it the hard way- Using urllib2

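This is a rather difficult and complex way, I would say. We will need cookielib for this task, along with urllib and urllib2 to build and send the requests. (These are Python 2 modules; in Python 3 they live in http.cookiejar, urllib.parse and urllib.request.) So, let’s import them:

  import cookielib
  import urllib, urllib2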

Now we create a cookie jar to hold all the cookies, and install an opener that uses it for every request:

  cj = cookielib.CookieJar()
  opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
  urllib2.install_opener(opener)

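Next, we make a request to the login page, passing the username and password as the login parameters. The field names depend on the site's login form. Passing data makes this a POST request, and the cookies in the reply end up in the cookie jar.

  authentication_url = 'https://site/login/index.php'
  payload = {
      'username' : username,
      'password' : password
  }
  data = urllib.urlencode(payload)
  req = urllib2.Request(authentication_url, data)
  resp = urllib2.urlopen(req)   # actually send the login request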
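For readers on Python 3, here is a minimal sketch of the same step using urllib.parse and urllib.request. The credentials are made-up placeholders. It only shows what the encoded payload looks like, and that attaching it to the request is what turns the request into a POST. Nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values for illustration only.
authentication_url = 'https://site/login/index.php'
payload = {
    'username': 'alice',
    'password': 's3cret pass',
}

# urlencode() builds the body a browser would send for this form.
# Python 3 wants the request body as bytes, hence the encode().
data = urlencode(payload).encode('ascii')

# A Request without data is a GET; supplying data makes it a POST.
req = Request(authentication_url, data)

print(data)
print(req.get_method())
```

Note how the space in the password is encoded as a plus sign, which is the usual encoding for form data.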

Now we can proceed with opening the pages that need the login. The installed opener sends the stored cookies along with each request.

  resp = urllib2.urlopen( 'https://site/link' )
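The whole cookie-jar flow can be tried without a real site. Below is a rough Python 3 sketch, not a definitive implementation. It starts a throwaway local server and logs in to it through an opener with a cookie jar. The DemoSite class, its /login and /link paths, and the session cookie are all invented for this demo. A real site will have its own URLs and field names.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical site: POST /login sets a cookie, GET /link requires it."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)  # credentials are read but not checked
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self._finish(b'logged in')

    def do_GET(self):
        ok = 'session=abc123' in self.headers.get('Cookie', '')
        self.send_response(200 if ok else 403)
        self._finish(b'members only' if ok else b'please log in')

    def _finish(self, body):
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def login_and_fetch(base_url, username, password):
    cj = CookieJar()
    # ProxyHandler({}) bypasses any system proxy for the local demo server.
    opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(cj))
    data = urlencode({'username': username, 'password': password}).encode('ascii')
    opener.open(base_url + '/login', data).read()   # POST; cookie lands in cj
    page = opener.open(base_url + '/link').read()   # cookie is sent back
    return page, [cookie.name for cookie in cj]


server = HTTPServer(('127.0.0.1', 0), DemoSite)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = 'http://127.0.0.1:%d' % server.server_port
page, cookie_names = login_and_fetch(base_url, 'alice', 's3cret')
server.shutdown()
server.server_close()
print(page, cookie_names)
```

Here the opener is used directly instead of being installed globally, which keeps the cookies tied to that one opener object.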

Doing it the Mechanize way 🙂

Mechanize is a very powerful library that includes a browser object of its own, so users don’t need to worry about cookies as long as they keep using the same browser object. Here is a small walkthrough of how to use it. First, make a new browser object:

  from mechanize import Browser
  br = Browser()

Open a page

  page = br.open( 'https://example.com/examplepage' )

Mechanize has built-in functions for handling forms easily. Select the form either by its position or by its name (use one or the other):

  br.select_form( nr = 0 )   # select by position: 0 is the first form on the page
  br.select_form( name = 'form1' ) # or select by name, which can be found in the source of the page
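If reading the page source by hand is tedious, a small helper can list the forms for you. This is a minimal sketch using only Python 3's standard html.parser, independent of Mechanize. The FormLister class and the sample HTML are made up for illustration. It prints each form's position (usable as nr), its name, and the names of its fields.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form>'s position, name, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self.forms.append({'nr': len(self.forms),
                               'name': attrs.get('name'),
                               'fields': []})
            self._in_form = True
        elif tag in ('input', 'select', 'textarea') and self._in_form:
            field = attrs.get('name')
            # Radio buttons share one name, so record each name only once.
            if field and field not in self.forms[-1]['fields']:
                self.forms[-1]['fields'].append(field)

    def handle_endtag(self, tag):
        if tag == 'form':
            self._in_form = False


# Hypothetical page source, standing in for the HTML of a real page.
html = """
<form action="/search"><input name="q"></form>
<form name="form1" action="/login" method="post">
  <input name="userLogin"><input type="password" name="userPassword">
  <input type="radio" name="radio-name" value="1">
  <input type="radio" name="radio-name" value="2">
</form>
"""

lister = FormLister()
lister.feed(html)
for form in lister.forms:
    print(form)
```

In this sample the login form is the second one, so it could be selected with nr = 1 or with name = 'form1'.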

Note that the browser object takes care of cookies automatically. Now that the form is selected, you can fill in the form fields:

  br.form[ 'userLogin' ] = user_name
  br.form[ 'userPassword' ] = password

You can also select radio options this way. Here 2 is the option ID, meaning the value given to that radio button in the page source. If none is specified, in my experience the options are numbered starting from 0 (inspect the element to check whether it is specified).

  br.form[ 'radio-name' ] = [ '2' ]

Using select drop-downs can be a bit tricky at times, but here is the way:

  select1 = br.form.find_control( 'dropdown-name' )
  for item in select1.items:
      if item.name == 'required':
          item.selected = True
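To know which name to compare against, it helps to list the options of the drop-down first. As far as I understand, an item's name corresponds to the option's value attribute in the HTML, but treat that as an assumption and check it against your page. The sketch below uses only Python 3's standard html.parser. The OptionLister class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class OptionLister(HTMLParser):
    """Collect [value, label] pairs for the <option>s of one <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.options = []
        self._in_select = False
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            self._in_select = attrs.get('name') == self.select_name
        elif tag == 'option' and self._in_select:
            self.options.append([attrs.get('value'), ''])
            self._in_option = True

    def handle_data(self, data):
        if self._in_option:
            self.options[-1][1] += data.strip()  # visible label text

    def handle_endtag(self, tag):
        if tag == 'option':
            self._in_option = False
        elif tag == 'select':
            self._in_select = False


# Hypothetical page source for illustration.
html = """
<select name="dropdown-name">
  <option value="optional">Nice to have</option>
  <option value="required">Must have</option>
</select>
"""

lister = OptionLister('dropdown-name')
lister.feed(html)
for value, label in lister.options:
    print(value, '->', label)
```

With this sample, 'required' is the value to match, while 'Must have' is only the label a visitor sees.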

Submit the form with all these parameters

  br.submit()

Now go directly to the next page. The cookies are taken care of by the Mechanize browser.

  resp = br.open( 'https://nextpage.html' )

Note: This is an error that I faced (it shows up when the HTML of the site is missing a closing tag somewhere):

mechanize._form.ParseError: nested FORMs

Here’s the fix. Analyze the response and modify it into what Mechanize would like to see, i.e. fix the non-matching tags to produce the response that the site should have sent. Then set the corrected response (hardcoded here) as the response for the page:

  resp.set_data( hardcoded_resp )
  br.set_response( resp )
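Instead of hardcoding the whole page, the repair can sometimes be done in code. Below is a rough Python 3 sketch of one such repair, assuming the only problem is a missing closing form tag before the next form starts. The function name close_unclosed_forms is made up, and real pages may be broken in other ways that this does not handle.

```python
import re

# Matches opening and closing form tags; group 1 is '/' for a closing tag.
FORM_TAG = re.compile(r'<(/?)form\b[^>]*>', re.IGNORECASE)


def close_unclosed_forms(html):
    """Insert a missing </form> before any <form> that opens inside another."""
    out = []
    pos = 0
    form_open = False
    for match in FORM_TAG.finditer(html):
        out.append(html[pos:match.start()])
        if match.group(1) == '/':
            form_open = False
        else:
            if form_open:
                out.append('</form>')  # close the form that was left open
            form_open = True
        out.append(match.group(0))
        pos = match.end()
    out.append(html[pos:])
    return ''.join(out)


# Hypothetical broken page: form "a" is never closed before form "b" starts.
broken = '<form name="a"><input name="x"><form name="b"><input name="y"></form>'
print(close_unclosed_forms(broken))
```

The repaired string can then be passed to resp.set_data() in place of hardcoded_resp.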

I will end this post here to avoid making it too lengthy. I will add more ways of doing this in the second part of this post.

John says:

November 6, 2016 at 6:37 am

Where's the part 2 for the love of God?

- Kunal Grover says:
  
  November 7, 2016 at 9:15 pm
  
  Good question! I honestly don’t remember what I was going to write. Sorry for that.
  You might find some things of interest here: https://crondev.wordpress.com/category/python/
