Fosbenner.net

Transitioning from Static Webpages, Part 2


As promised in the first installment of this article, I will now describe how you can transition from the older Common Gateway Interface (CGI) to a more modern interface, the Web Server Gateway Interface (WSGI). Whereas CGI can be used with a variety of programming languages, WSGI is only for Python. Since we will be discussing how to set this up on an Apache httpd server, I will be using an Apache module, mod_wsgi, to implement WSGI. As we begin, I am assuming that you have the Apache httpd server installed and working on your Linux machine.

Installing mod_wsgi

Since mod_wsgi is not maintained by Apache, it is not included in the bunch of modules that come preinstalled. We will have to download the module's source code and install it on our server.

Before we start downloading and installing mod_wsgi, let's get some prerequisites out of the way. In addition to the httpd and python packages, your server will also need the "dev" packages for each. On CentOS 7, I can install these using the following command (as root):


yum install python-devel httpd-devel
					

Although I didn't test this, I assume that you would get errors later in the process if you tried to compile the module without having these packages installed.

Now, to download the source code for the module, first go to the following page on GitHub: https://github.com/GrahamDumpleton/mod_wsgi/releases

At the time of writing, the current release is 4.6.5. Under the header "mod_wsgi-4.6.5," right-click on the link that says "Source code (tar.gz)" and select "Copy Link Location." Now, assuming we have an SSH session open with the server, we can paste that link into a wget command, like the following:


wget https://github.com/GrahamDumpleton/mod_wsgi/archive/4.6.5.tar.gz
					

This command will download the tar file to the current directory on your Linux box. Alternatively, you could download the file to your local computer by just clicking on the link on GitHub, but then you would have to copy it over to the Linux server anyway. If you go for the latter option, note that the filename is a little different; the file will be something like mod_wsgi-4.6.5.tar.gz instead of 4.6.5.tar.gz.

Now we will extract the tar. Use the following command (assuming that you are still in the same directory as when you downloaded it):


tar xzvf ./4.6.5.tar.gz
					

Do a quick ls and you will see that you now have a new directory called mod_wsgi-4.6.5. Let's move into this new directory:


cd ./mod_wsgi-4.6.5
					

Now, to compile and install the module, we are going to run three commands in order:


./configure
make
make install
					

If you get any errors when running any of those commands, the steps that follow will not work. I suggest looking at the documentation to resolve any issues. See the module's docs here: https://modwsgi.readthedocs.io

If you have gotten this far without any problems, let's move on to loading the module in Apache. In short, we need to add the following line somewhere in the httpd config:


LoadModule wsgi_module modules/mod_wsgi.so
					

You could simply put this in the global httpd.conf configuration by doing something like this (as root):


echo 'LoadModule wsgi_module modules/mod_wsgi.so' >> /etc/httpd/conf/httpd.conf
					

OR, if your server has a conf.modules.d directory, it would be cleaner to make a new file in this directory containing the directive:


echo 'LoadModule wsgi_module modules/mod_wsgi.so' > /etc/httpd/conf.modules.d/01-wsgi.conf
					

If you go for the latter approach, just make sure that you adjust the permissions of the new file to match the other files in that directory.

The LoadModule directive tells Apache to load a module when starting up the httpd server. Let's restart the server, then check that it has been loaded. On CentOS 7, I would run (as root):


systemctl restart httpd.service
httpd -M
					

The httpd -M command will show a list of loaded modules. If you see wsgi_module listed, you are good to go.

Configuring httpd for WSGI

The last thing we need to do is tell httpd which files should be treated as WSGI scripts. I will explain two different ways to do this that are based on the default CGI configuration.

Method 1

Using this method, we will repurpose the existing default CGI directory, telling httpd to handle its contents as WSGI scripts instead of as CGI scripts. If you have existing CGI scripts in this directory, they will not work after we make these changes. In this case, consider using the next method.

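Like the ScriptAlias directive for CGI, mod_wsgi uses the WSGIScriptAlias directive to define a file or directory that should be run as a WSGI script. Similarly, the AddHandler directive can assign the wsgi-script handler to a file extension, just as it assigns cgi-script for CGI. In your httpd configuration, find the lines that set up the default CGI directory, which should look something like this: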


ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
AddHandler cgi-script .py
					

The above two lines should be changed to the following:


WSGIScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
AddHandler wsgi-script .py
					

Method 2

With this method, we will preserve the functionality of the existing cgi-bin directory, and add a subdirectory to handle WSGI scripts. If you are trying to migrate from CGI to WSGI, it will probably be important that your CGI-based webpages continue to work while you are testing out the WSGI webpages.

First, create the new directory inside the existing cgi-bin directory:


mkdir /var/www/cgi-bin/wsgi
					

Now, add the following to your httpd configuration:


<Directory "/var/www/cgi-bin/wsgi/">
	Options ExecCGI
	AddHandler wsgi-script .py
</Directory>
					

Without adding this <Directory> directive, the new wsgi/ directory would inherit its properties from the cgi-bin/ directory, meaning that its contents would be handled as CGI scripts. By adding this, we can keep the CGI functionality in cgi-bin/ and override it in wsgi/.

Both Methods

Regardless of which of the two methods you used, you will need to restart httpd again for the changes to take effect:


systemctl restart httpd.service
					

Your WSGI scripts will be accessible at <IP addr. or domain name>/cgi-bin/<script-name>.py OR <IP addr. or domain name>/cgi-bin/wsgi/<script-name>.py if using methods 1 or 2, respectively.

Writing a WSGI script

With CGI, Apache sends the standard output of the script to the browser. In Transitioning from Static Webpages (Part 1) we used print statements to generate our HTML. With WSGI, our output will go into an iterable. Our script must have an application(environ, start_response) function like this:


def application(environ, start_response):
   status = '200 OK'
   content = "This is a line of text."
   response_headers = [('Content-type', 'text/plain'),
                       ('Content-Length', str(len(content)))]
   start_response(status, response_headers)

   return [content]
					

This example is pretty much as basic as it gets. The result would be a plaintext document with one line of text.
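
To get a feel for what mod_wsgi does with this function, you can call it by hand from a Python prompt. The following is only a sketch: call_app and fake_start_response are names I made up for illustration, and wsgiref (from the standard library) just fills in a plausible environ. The b prefix on the content does nothing under Python 2, but Python 3 requires the response body to be bytes.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    status = '200 OK'
    content = b"This is a line of text."
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(content)))]
    start_response(status, response_headers)
    return [content]


def call_app(app):
    """Call a WSGI app roughly the way a server would (hypothetical helper)."""
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        # A real server would send these to the browser; we just keep them.
        captured['status'] = status
        captured['headers'] = headers

    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal fake request
    body = b''.join(app(environ, fake_start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(application)
print(status)
print(headers)
print(body)
```

The point to take away is the order of events: the server builds environ, calls application(), receives the status and headers through start_response(), and then iterates over whatever application() returns.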

For a (slightly) better example, let's write a script that is based on CGI Example 1, but uses WSGI:

#!/usr/bin/env python

def application(environ, start_response):
   """This function is called when accessing the webpage"""
   status = '200 OK'

   title = 'Example Site'
   desc = 'This page is generated by Python!'

   out = []
   out.append('<!DOCTYPE html>\n' + st('html') + '\n' +
      elem('head', '\n\t' +
      elem('title', title) + '\n\t' +
      eelem('meta', 'name="description" content="' + desc + '"') + '\n\t' +
      eelem('link', 'rel="stylesheet" href="https://fosbenner.net/s/playground.css"') + '\n'))

   out.append(elem('body', '\n\t' +
      elem('h1', 'This is a header') + '\n\t' +
      elem('p', 'This is a paragraph') + '\n\t' +
      elem('p', 'This is another paragraph') + '\n'
      ) + '\n' + et('html'))

   # calculate content length
   length = 0
   for i in out:
      length += len(i)

   response_headers = [('Content-type', 'text/html'),
                       ('Content-Length', str(length))]
   start_response(status, response_headers)

   return out

def st(tag, attr=""):
   """Generate HTML start tag"""
   if attr != "": #pad attr with a space
      attr = " " + attr
   return '<' + tag + attr + '>'

def et(tag):
   """Generate HTML end tag"""
   return '</' + tag + '>'

def elem(tag, content, attr=""):
   """Generate whole element"""
   return st(tag, attr) + content + et(tag)

def eelem(tag, attr=""):
   """Generate empty element"""
   attr += ' /' # add space, slash to end
   return st(tag, attr)

					

If you compare that with CGI Example 1, you will see that I copied everything after "## START EXECUTION ##" into the application() function. I don't need to append the Content-Type line to out[], because this is contained in the response_headers list. I removed the for loop that printed the contents of out[], and instead return the whole out[] list when application() exits. The only new thing I had to do here was add up the lengths of all the strings in out[], in order to have an accurate Content-Length for the response_headers list.

While we are still comparing the CGI and WSGI examples, it is worth mentioning that we need to think about newlines (at least if you care how your generated HTML looks). When using print statements in a CGI script, a newline is added automatically each time print is called. When mod_wsgi iterates through the output of application(), it does not add any newlines. Because of this, you will notice a slight difference in the generated HTML between these first CGI and WSGI examples.
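
If you want the WSGI output to match the CGI output line for line, one option is to join the pieces with newlines yourself before returning them. This is a sketch rather than anything mod_wsgi requires; the finish() helper name is my own invention, and it assumes every item in out is a plain string.

```python
def finish(out):
    """Join the pieces with newlines, like print would (hypothetical helper).

    Returns the body as a one-item list plus the value for Content-Length.
    """
    body = '\n'.join(out) + '\n'
    # Under Python 3 the body must be bytes; under Python 2 str already is.
    if not isinstance(body, bytes):
        body = body.encode('utf-8')
    return [body], str(len(body))


out = ['<!DOCTYPE html>', '<html>', '</html>']
body, length = finish(out)
print(length)
```

Inside application() you would then put length into the Content-Length header and return body instead of out.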

Using environ

Perhaps you noticed that the application() function has arguments. The names of these are not important, but their positions as arguments to the function are. I have kept the names of these arguments to match the examples in the WSGI spec, which you can find here: https://www.python.org/dev/peps/pep-0333/. The first argument is environ, which is a dictionary containing a bunch of environment variables, some that come from Apache (including some data from your browser), and some that come from mod_wsgi.
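
If you are curious what is actually in there, a small script like this sketch will list every key and value that your server passes in. The exact set of keys depends on your Apache and mod_wsgi versions, so treat the output as something to explore rather than rely on, and don't leave a script like this on a public server.

```python
def application(environ, start_response):
    """List the contents of environ as plain text."""
    lines = []
    for key in sorted(environ):
        # %r keeps non-string values (file objects, tuples) printable
        lines.append('%s = %r' % (key, environ[key]))
    body = '\n'.join(lines) + '\n'
    if not isinstance(body, bytes):   # Python 3 needs bytes
        body = body.encode('utf-8')

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(body)))]
    start_response('200 OK', response_headers)
    return [body]
```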

In the following example, we will see how we can use a few of these environment variables.

#!/usr/bin/env python

def application(environ, start_response):
   """This function is called when accessing the webpage"""
   status = '200 OK'

   title = 'Example using environ'
   desc = 'This page is generated by Python!'

   out = []
   a = out.append  # shorthand for appending a line to out
   a('<!DOCTYPE html>')
   a(st('html'))
   a(st('head'))
   a('\t' + elem('title', title))
   a('\t' + eelem('meta', 'name="description" content="' + desc + '"'))
   a('\t' + eelem('link', 'rel="stylesheet" href="https://fosbenner.net/s/playground.css"'))
   a(et('head'))

   a(st('body'))
   a('\t' + elem('h1', 'Info about the user:'))

   a('\t' + elem('p', 'Your IP address is: ' + environ['REMOTE_ADDR']))
   # HTTP_USER_AGENT is missing if the client sends no User-Agent header
   a('\t' + elem('p', 'Your browser is: ' + environ.get('HTTP_USER_AGENT', 'unknown')))
   if 'HTTP_REFERER' in environ:
      a('\t' + elem('p', 'To get to this page, you followed a link on: ' + environ['HTTP_REFERER']))   
   else:
      a('\t' + elem('p', elem('a','Click here to reload this page and see referer',
                                  'href="' + environ['REQUEST_SCHEME'] +
                                  '://' + environ['HTTP_HOST'] +
                                  environ['REQUEST_URI'] + '"')))

   a(et('body'))
   a(et('html'))

   # add newlines and calculate content length
   length = 0
   for i in xrange(len(out)):
      out[i] += '\n'
      length += len(out[i])

   response_headers = [('Content-type', 'text/html'),
                       ('Content-Length', str(length))]
   start_response(status, response_headers)

   return out

def st(tag, attr=""):
   """Generate HTML start tag"""
   if attr != "": #pad attr with a space
      attr = " " + attr
   return '<' + tag + attr + '>'

def et(tag):
   """Generate HTML end tag"""
   return '</' + tag + '>'

def elem(tag, content, attr=""):
   """Generate whole element"""
   return st(tag, attr) + content + et(tag)

def eelem(tag, attr=""):
   """Generate empty element"""
   attr += ' /' # add space, slash to end
   return st(tag, attr)

					

If you were so inclined, you could feed that IP address into a geoIP lookup program to get a rough idea of where the user is located, and say, place ads on your page for things relevant to that location. Or you could change the content depending on the browser being used. Suppose you are offering a software download; this information could be used to suggest that the user download a version of your software for their particular operating system. Lastly, the content of the page could change depending on how the user got there (what page linked to your page).
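
As a rough illustration of the software download idea, the sketch below guesses an operating system from the User-Agent string. The guess_os() name and the substring checks are my own simplifications; real User-Agent strings are messy and can be faked, so treat the result as a hint only.

```python
def guess_os(user_agent):
    """Very naive OS guess from a User-Agent string (hypothetical helper)."""
    ua = user_agent.lower()
    # Order matters: Android strings also contain "linux", and
    # iPhone/iPad strings also contain "mac os x".
    if 'android' in ua:
        return 'Android'
    if 'iphone' in ua or 'ipad' in ua:
        return 'iOS'
    if 'windows' in ua:
        return 'Windows'
    if 'mac os x' in ua or 'macintosh' in ua:
        return 'macOS'
    if 'linux' in ua:
        return 'Linux'
    return 'unknown'


ua = 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0'
print(guess_os(ua))
```

Inside application() you could call guess_os(environ.get('HTTP_USER_AGENT', '')) and use the result to decide which download link to show first.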

That's it for this time. Have fun playing with Python and WSGI!