Sheep: Step By Step Wiki Engine - The Wrong Side Of Code

Step By Step Wiki Engine

Ok, so you’ve seen that 50-line wiki, but you would like to know how I actually wrote it? It’s not any special feat. Actually, writing exceptionally small programs, although it takes much more time, seems to me easier than writing elaborate code that does the same thing, mostly because there is less room for bugs. Anyway, I thought it could be beneficial to show how you actually do it, not just the end result. So here goes.

This wiki engine was intended to be used as a learning tool, and I wanted it to work out of the box anywhere possible (in this case, anywhere Python is available). Because getting a hosting service with Python is not trivial, and setting up your own web server may be too complicated on various operating systems, I decided that the engine must contain its own web server. I knew there is a simple web server implementation in the Python standard library, but I didn’t know how to use it. So, naturally, the first step was a simple test server:

source code begin
import BaseHTTPServer  # Python 2 module; Python 3 renamed it to http.server

# The base handler speaks HTTP, but has no code for any request method yet.
handler = BaseHTTPServer.BaseHTTPRequestHandler
# Listen on port 8080 of the loopback interface only.
server = BaseHTTPServer.HTTPServer(("127.0.0.1", 8080), handler)
server.serve_forever()
source code end
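On Python 3, where BaseHTTPServer became http.server, the same experiment can be sketched as follows. Two details are assumptions made for this demo, not part of the original: the server binds port 0 so the operating system picks any free port, and it runs in a background thread so the script can send itself a request and inspect the answer. With the bare BaseHTTPRequestHandler, which has no do_GET method, that answer is an error with status 501.

```python
# Python 3 sketch: http.server is the Python 3 name of BaseHTTPServer.
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Port 0 (an assumption for this demo) lets the OS pick any free port.
server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
port = server.server_address[1]

# serve_forever() blocks, so run it in a background thread.
threading.Thread(target=server.serve_forever, daemon=True).start()

status = None
try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/")
except urllib.error.HTTPError as err:
    # The blank handler has no do_GET, so every GET is answered with an error.
    status = err.code
finally:
    server.shutdown()      # stop the serve_forever() loop
    server.server_close()  # release the listening socket

print(status)  # 501
```

The server also logs the failed request to the console, much like the Python 2 version does.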

This code is a basic HTTP server that just sits on port 8080 of our local host (127.0.0.1 always points to the local computer; it’s sometimes called the loopback address) and responds with an error to every request:

Error response
Error code 501.
Message: Unsupported method ('GET').
Error code explanation: 501 = Server does not support this operation.

You can terminate it by pressing ctrl+c twice in the console where you run it. I’ve chosen port 8080 rather than the default one, 80, because there may already be a server running on that port, and on most systems you would need administrator privileges to use it. I tell it to run only on the loopback interface, and not on all interfaces, for security reasons: I don’t want anyone from the outside connecting to my experimental program. Later I can replace "127.0.0.1" with an empty string "" to make it respond on all interfaces.

The reason why it responds with an error is obvious: it doesn’t know how to do anything else. The handler we used is a blank slate and doesn’t do anything useful yet. To make it do something, we need to extend it, and we can do that by making our own handler that inherits everything from BaseHTTPRequestHandler, but in addition defines code to handle GET and other methods. So, the next step is a simple “hello world”:

source code begin
import BaseHTTPServer

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    # The base class calls do_GET for every GET request.
    def do_GET(self):
        self.send_response(200)                         # response code: OK
        self.send_header("content-type", "text/plain")  # MIME type of the body
        self.end_headers()                              # empty line ends the headers
        self.wfile.write("Hello world!")                # the body itself

server = BaseHTTPServer.HTTPServer(("127.0.0.1", 8080), Handler)
server.serve_forever()
source code end
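For comparison, here is a Python 3 sketch of the same handler, again using port 0 and a background thread (assumptions for the demo) so the script can fetch its own page. The one real difference from the Python 2 code is that wfile is a binary stream in Python 3, so the body has to be written as bytes.

```python
# Python 3 sketch of the "hello world" handler.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Status line; http.server also adds Server and Date headers here.
        self.send_response(200)
        self.send_header("content-type", "text/plain")
        self.end_headers()  # empty line: headers end, body begins
        # In Python 3, wfile is a binary stream, so the body must be bytes.
        self.wfile.write(b"Hello world!")


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    # Any path works, because do_GET ignores self.path.
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/whatever") as response:
        status = response.status
        content_type = response.headers["Content-Type"]
        body = response.read()
finally:
    server.shutdown()
    server.server_close()

print(status, content_type, body)  # 200 text/plain b'Hello world!'
```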

I only implemented the do_GET method, because GET is the method web browsers use by default to “get” web pages. Everything else in Handler is inherited from BaseHTTPRequestHandler. I don’t even know what code may be required in there, but I know (from the Python documentation) that it provides several useful methods:

• the send_response method, which starts a reply to the web browser and sends the response code that we specify. We still need to know the response codes, but fortunately they are well documented in the HTTP RFC.
• the send_header method, which sends an HTTP header to the browser. In our case, we must define the MIME content type of the data we will be sending. Since this simple “hello world” doesn’t need anything sophisticated, I just send “text/plain”, which means that I will be sending plain text, without any markup or formatting.
• the end_headers method, which just sends an empty line to the browser. This marks the end of the HTTP headers and the beginning of the actual content.

There is also a file-like object defined in the handler, called wfile, that I can write to in order to send things to the web browser. I use it to send a “hello world” message. Pointing our web browser to any address beginning with http://localhost:8080/ gives us:

Hello world!

Now that we can display pages, changing the content type is not a problem. But it would be nice to show different pages depending on the URL used. We can get that information from the path attribute:

source code begin
import BaseHTTPServer

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Now we send HTML, encoded as UTF-8.
        self.send_header("content-type", "text/html; charset=utf-8")
        self.end_headers()
        # self.path is the requested path, for example "/FrontPage".
        self.wfile.write("""<html><head><title>Sample</title></head><body>
<h1>%s</h1>
<p>Hello world!</p>
</body></html>""" % self.path)

server = BaseHTTPServer.HTTPServer(("127.0.0.1", 8080), Handler)
server.serve_forever()
source code end
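A quick way to convince yourself that self.path really follows the URL is to request a few paths and look at what comes back. The sketch below is Python 3 and returns the raw path as plain text instead of HTML to keep the check simple; the names EchoPathHandler and fetch_paths are made up for this illustration. Note that the last path comes back still percent-encoded, because self.path holds exactly what the client sent.

```python
# Python 3 sketch: echo self.path back for a few example URLs.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):  # hypothetical name
    def do_GET(self):
        # self.path is the part of the request line after the method,
        # exactly as the client sent it (still percent-encoded).
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("content-type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet: skip the per-request log lines


def fetch_paths(paths):  # hypothetical helper
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        results = {}
        for path in paths:
            with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
                results[path] = resp.read().decode("utf-8")
        return results
    finally:
        server.shutdown()
        server.server_close()


echoed = fetch_paths(["/", "/FrontPage", "/FrontPage/", "/%C5%81%C4%85czka"])
for path, seen in echoed.items():
    print(repr(path), "->", repr(seen))
```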

Nice, it displays the page title as intended, and we can easily remove the slash from the beginning, and optionally also from the end, using strip. There is, however, a problem for non-English speakers. Try this URL: http://localhost:8080/Łączka and you will see something like this:

/%C5%81%C4%85czka
Hello world!

Not exactly as intended. What is happening? Only a small set of characters is allowed to appear inside a URL, and all other characters have to be encoded in the form of their numeric codes, prefixed with %. The browser sends the path encoded as utf-8, which conveniently matches the character set we declared. We just need to decode these characters. Fortunately, there is a ready-made function that does that in the Python standard library, in urllib:

source code begin
import BaseHTTPServer, urllib

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("content-type", "text/html; charset=utf-8")
        self.end_headers()
        # Drop the surrounding slashes, then decode the %XX escapes.
        page = urllib.unquote(self.path.strip('/'))
        self.wfile.write("""<html><head><title>Sample</title></head><body>
<h1>%s</h1>
<p>Hello world!</p>
</body></html>""" % page)

server = BaseHTTPServer.HTTPServer(("127.0.0.1", 8080), Handler)
server.serve_forever()
source code end
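The decoding step can be tried on its own, without a server. This is a minimal Python 3 sketch built around a hypothetical helper, page_name, that does the same thing as the line in the handler. In Python 3 the functions live in urllib.parse, and unquote returns a text string decoded as UTF-8, whereas Python 2’s urllib.unquote returns the raw UTF-8 bytes, which the handler then writes out unchanged.

```python
# Python 3 sketch: percent-encoding and decoding of page names.
from urllib.parse import quote, unquote


def page_name(path):  # hypothetical helper
    """Turn a raw request path into a wiki page name.

    Same order as the handler: strip slashes first, then percent-decode.
    unquote decodes the %XX bytes as UTF-8 by default.
    """
    return unquote(path.strip("/"))


print(quote("Łączka"))                  # %C5%81%C4%85czka
print(page_name("/%C5%81%C4%85czka/"))  # Łączka
print(page_name("/a%3Ci%3Ec"))          # a<i>c
```

The last call shows that a decoded page name can contain characters that have a meaning of their own in HTML.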

This takes care of all the special characters. Let’s see how this works: http://localhost:8080/a<i>c. Weird, the angle brackets are eaten, together with the “i”. Let’s look at the text that our web browser got from the server. Using the “view page source” option present in most modern browsers, we can see:

<html><head><title>Sample</title></head><body>
<h1>a<i>c</h1>
<p>Hello world!</p>
</body></html>

The <i> is there all right, so what is going on here? Wait, that “c” looks a little weird, slanted as if it were italic. What was the “<i>” in HTML for? Right, these characters are treated as HTML markup, not as content. What can we do to avoid that? The standard procedure is to encode the three special characters “&”, “<” and “>” as so-called entities. There is a list of available HTML entities, but we only need “&amp;”, “&lt;” and “&gt;” (they are derived from the names “ampersand”, “less than” and “greater than”). Note that we must replace the “&” first; otherwise we would break the ampersands in the other entities.

Note also that if you don’t escape any and all user-provided text in your web applications, you are opening a security hole and enabling cross-site scripting (XSS) attacks and various tricks with styles.

source code begin
import BaseHTTPServer, urllib

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    def escape_html(self, text):
        # "&" must go first, or the "&" in "&lt;" and "&gt;" would be escaped again.
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")