Apache: The Definitive Guide, 3rd Edition (O'Reilly)

Copyright Preface Who Wrote Apache, and Why? The Demonstration Code Conventions Used in This Book Organization of This Book Acknowledgments Chapter 1. Getting Started Section 1.1. What Does a Web Server Do? Section 1.2. How Apache Works Section 1.3. Apache and Networking Section 1.4. How HTTP Clients Work Section 1.5. What Happens at the Server End? Section 1.6. Planning the Apache Installation Section 1.7. Windows? Section 1.8. Which Apache? Section 1.9. Installing Apache Section 1.10. Building Apache 1.3.X Under Unix Section 1.11. New Features in Apache v2 Section 1.12. Making and Installing Apache v2 Under Unix Section 1.13. Apache Under Windows Chapter 2. Configuring Apache: The First Steps Section 2.1. What's Behind an Apache Web Site? Section 2.2. site.toddle Section 2.3. Setting Up a Unix Server Section 2.4. Setting Up a Win32 Server Section 2.5. Directives Section 2.6. Shared Objects Chapter 3. Toward a Real Web Site Section 3.1. More and Better Web Sites: site.simple Section 3.2. Butterthlies, Inc., Gets Going Section 3.3. Block Directives Section 3.4. Other Directives Section 3.5. HTTP Response Headers Section 3.6. Restarts Section 3.7. .htaccess Section 3.8. CERN Metafiles Section 3.9. Expirations Chapter 4. Virtual Hosts Section 4.1. Two Sites and Apache Section 4.2. Virtual Hosts

Section 4.3. Two Copies of Apache Section 4.4. Dynamically Configured Virtual Hosting Chapter 5. Authentication Section 5.1. Authentication Protocol Section 5.2. Authentication Directives Section 5.3. Passwords Under Unix Section 5.4. Passwords Under Win32 Section 5.5. Passwords over the Web Section 5.6. From the Client's Point of View Section 5.7. CGI Scripts Section 5.8. Variations on a Theme Section 5.9. Order, Allow, and Deny Section 5.10. DBM Files on Unix Section 5.11. Digest Authentication Section 5.12. Anonymous Access Section 5.13. Experiments Section 5.14. Automatic User Information Section 5.15. Using .htaccess Files Section 5.16. Overrides Chapter 6. Content Description and Modification Section 6.1. MIME Types Section 6.2. Content Negotiation Section 6.3. Language Negotiation Section 6.4. Type Maps Section 6.5. Browsers and HTTP 1.1 Section 6.6. Filters Chapter 7. Indexing Section 7.1. Making Better Indexes in Apache Section 7.2. Making Our Own Indexes Section 7.3. Imagemaps Section 7.4. Image Map Directives Chapter 8. Redirection Section 8.1. Alias Section 8.2. Rewrite Section 8.3. Speling Chapter 9. Proxying Section 9.1. Security Section 9.2. Proxy Directives Section 9.3. Apparent Bug Section 9.4. Performance Section 9.5. Setup

Chapter 10. Logging Section 10.1. Logging by Script and Database Section 10.2. Apache's Logging Facilities Section 10.3. Configuration Logging Section 10.4. Status Chapter 11. Security Section 11.1. Internal and External Users Section 11.2. Binary Signatures, Virtual Cash Section 11.3. Certificates Section 11.4. Firewalls Section 11.5. Legal Issues Section 11.6. Secure Sockets Layer (SSL) Section 11.7. Apache's Security Precautions Section 11.8. SSL Directives Section 11.9. Cipher Suites Section 11.10. Security in Real Life Section 11.11. Future Directions Chapter 12. Running a Big Web Site Section 12.1. Machine Setup Section 12.2. Server Security Section 12.3. Managing a Big Site Section 12.4. Supporting Software Section 12.5. Scalability Section 12.6. Load Balancing Chapter 13. Building Applications Section 13.1. Web Sites as Applications Section 13.2. Providing Application Logic Section 13.3. XML, XSLT, and Web Applications Chapter 14. Server-Side Includes Section 14.1. File Size Section 14.2. File Modification Time Section 14.3. Includes Section 14.4. Execute CGI Section 14.5. Echo Section 14.6. Apache v2: SSI Filters Chapter 15. PHP Section 15.1. Installing PHP Section 15.2. Site.php Chapter 16. CGI and Perl

Section 16.1. The World of CGI Section 16.2. Telling Apache About the Script Section 16.3. Setting Environment Variables Section 16.4. Cookies Section 16.5. Script Directives Section 16.6. suEXEC on Unix Section 16.7. Handlers Section 16.8. Actions Section 16.9. Browsers

Chapter 17. mod_perl Section 17.1. How mod_perl Works Section 17.2. mod_perl Documentation Section 17.3. Installing mod_perl — The Simple Way Section 17.4. Modifying Your Scripts to Run Under mod_perl Section 17.5. Global Variables Section 17.6. Strict Pregame Section 17.7. Loading Changes Section 17.8. Opening and Closing Files Section 17.9. Configuring Apache to Use mod_perl Chapter 18. mod_jserv and Tomcat Section 18.1. mod_jserv Section 18.2. Tomcat Section 18.3. Connecting Tomcat to Apache Chapter 19. XML and Cocoon Section 19.1. XML Section 19.2. XML and Perl Section 19.3. Cocoon Section 19.4. Cocoon 1.8 and JServ Section 19.5. Cocoon 2.0.3 and Tomcat Section 19.6. Testing Cocoon Chapter 20. The Apache API Section 20.1. Documentation Section 20.2. APR Section 20.3. Pools Section 20.4. Per-Server Configuration Section 20.5. Per-Directory Configuration Section 20.6. Per-Request Information Section 20.7. Access to Configuration and Request Information Section 20.8. Hooks, Optional Hooks, and Optional Functions Section 20.9. Filters, Buckets, and Bucket Brigades Section 20.10. Modules

Chapter 21. Writing Apache Modules Section 21.1. Overview Section 21.2. Status Codes Section 21.3. The Module Structure Section 21.4. A Complete Example Section 21.5. General Hints Section 21.6. Porting to Apache 2.0 Appendix A. The Apache 1.x API Section A.1. Pools Section A.2. Per-Server Configuration Section A.3. Per-Directory Configuration Section A.4. Per-Request Information Section A.5. Access to Configuration and Request Information Section A.6. Functions Colophon Index

Copyright Copyright © O'Reilly & Associates, Inc. Printed in the United States of America. Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly & Associates books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected]. Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of Appaloosa horse and the topic of Apache is a trademark of O'Reilly & Associates, Inc. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface Apache: The Definitive Guide, Third Edition, is principally about the Apache web-server software. We explain what a web server is and how it works, but our assumption is that most of our readers have used the World Wide Web and understand in practical terms how it works, and that they are now thinking about running their own servers and sites. This book takes the reader through the process of acquiring, compiling, installing, configuring, and modifying Apache. We exercise most of the package's functions by showing a set of example sites that take a reasonably typical web business — in our case, a postcard publisher — through a process of development and increasing complexity. However, we have deliberately tried to make each site as simple as possible, focusing on the particular feature being described. Each site is pretty well self-contained, so that the reader can refer to it while following the text without having to disentangle the meat from extraneous vegetables. If desired, it is possible to install and run each site on a suitable system. Perhaps it is worth saying what this book is not. It is not a manual, in the sense of formally documenting every command — such a manual exists on the Apache site and has been much improved with Versions 1.3 and 2.0; we assume that if you want to use Apache, you will download it and keep it at hand. Rather, if the manual is a road map that tells you how to get somewhere, this book tries to be a tourist guide that tells you why you might want to make the journey. In passing, we do reproduce some sections of the web site manual simply to save the reader the trouble of looking up the formal definitions as she follows the argument. Occasionally, we found the manual text hard to follow and in those cases we have changed the wording slightly. We have also interspersed comments as seemed useful at the time. This is not a book about HTML or creating web pages, or one about web security or even about running a web site. 
These are all complex subjects that should be either treated thoroughly or left alone. As a result, a webmaster's library might include books on the following topics:

• The Web and how it works
• HTML: formal definitions, what you can do with it
• How to decide what sort of web site you want, how to organize it, and how to protect it
• How to implement the site you want using one of the available servers (for instance, Apache)
• Handbooks on Java, Perl, and other languages
• Security

Apache: The Definitive Guide is just one of the six or so possible titles in the fourth category.

Apache is a versatile package and is becoming more versatile every day, so we have not tried to illustrate every possible combination of commands; that would require a book of a million pages or so. Rather, we have tried to suggest lines of development that a typical webmaster could follow once an understanding of the basic concepts is achieved. We realized from our own experience that the hardest stage of learning how to use Apache in a real-life context is right at the beginning, where the novice webmaster often has to get Apache, a scripting language, and a database manager to collaborate. This can be very puzzling. In this new edition we have therefore included a good deal of new material which tries to take the reader up these conceptual precipices. Once the collaboration is working, development is much easier. These new chapters are not intended to be an experts' account of, say, the interaction between Apache, Perl, and MySQL — but a simple beginners' guide, explaining how to make these things work with Apache. In the process we make some comments, from our own experience, on the merits of the various software products from which the user has to choose. As with the first and second editions, writing the book was something of a race with Apache's developers. We wanted to be ready as soon as Version 2 was stable, but not before the developers had finished adding new features. In many of the examples that follow, the motivation for what we make Apache do is simple enough and requires little explanation (for example, the different index formats in Chapter 7). Elsewhere, we feel that the webmaster needs to be aware of wider issues (for instance, the security issues discussed in Chapter 11) before making sensible decisions about his site's configuration, and we have not hesitated to branch out to deal with them.

Who Wrote Apache, and Why?

Apache gets its name from the fact that it consists of some existing code plus some patches. The FAQ thinks that this is cute; others may think it's the sort of joke that gets programmers a bad name. (FAQ is netspeak for Frequently Asked Questions. Most sites/subjects have an FAQ file that tells you what the thing is, why it is, and where it's going. It is perfectly reasonable for the newcomer to ask for the FAQ to look up anything new to her, and indeed this is a sensible thing to do, since it reduces the number of questions asked. Apache's FAQ can be found at http://www.apache.org/docs/FAQ.html.) A more responsible group thinks that Apache is an appropriate title because of the resourcefulness and adaptability of the American Indian tribe.

You have to understand that Apache is free to its users and is written by a team of volunteers who do not get paid for their work. Whether they decide to incorporate your or anyone else's ideas is entirely up to them. If you don't like what they do, feel free to collect a team and write your own web server or to adapt the existing Apache code, as many have.

The first web server was built by the British physicist Tim Berners-Lee at CERN, the European Centre for Nuclear Research at Geneva, Switzerland. The immediate ancestor of Apache was built by the U.S. government's NCSA, the National Center for Supercomputing Applications. Because this code was written with (American) taxpayers' money, it is available to all; you can, if you like, download the source code in C from http://www.ncsa.uiuc.edu, paying due attention to the license conditions.

There were those who thought that things could be done better, and in the FAQ for Apache (at http://www.apache.org), we read:

...Apache was originally based on code and ideas found in the most popular HTTP server of the time, NCSA httpd 1.3 (early 1995).

That phrase "of the time" is nice. It usually refers to good times back in the 1700s or the early days of technology in the 1900s. But here it means back in the deliquescent bogs of a few years ago!

While the Apache site is open to all, Apache is written by an invited group of (we hope) reasonably good programmers. One of the authors of this book, Ben, is a member of this group.

Why do they bother? Why do these programmers, who presumably could be well paid for doing something else, sit up nights to work on Apache for our benefit? There is no such thing as a free lunch, so they do it for a number of typically human reasons. One might list, in no particular order:

• They want to do something more interesting than their day job, which might be writing stock control packages for BigBins, Inc.
• They want to be involved on the edge of what is happening. Working on a project like this is a pretty good way to keep up-to-date. After that comes consultancy on the next hot project.
• The more worldly ones might remember how, back in the old days of 1995, quite a lot of the people working on the web server at NCSA left for a thing called Netscape and became, in the passage of the age, zillionaires.
• It's fun. Developing good software is interesting and amusing, and you get to meet and work with other clever people.
• They are not doing the bit that programmers hate: explaining to end users why their treasure isn't working and trying to fix it in 10 minutes flat. If you want support on Apache, you have to consult one of several commercial organizations (see Appendix A), who, quite properly, want to be paid for doing the work everyone loathes.

The Demonstration Code

The code for the demonstration web sites referred to throughout the book is available at http://www.oreilly.com/catalog/apache3/. It contains the requisite README file with installation instructions and other useful information. The contents of the download are organized into two directories:

install/
    This directory contains scripts to install the sample sites:
    install
        Run this script to install the sites.
    install.conf
        Unix configuration file for install.
    installwin.conf
        Win32 configuration file for install.
sites/
    This directory contains the sample sites used in the book.

Conventions Used in This Book

This section covers the various conventions used in this book.

Typographic Conventions

Constant width
    Used for HTTP headers, status codes, MIME content types, directives in configuration files, commands, options/switches, functions, methods, variable names, and code within body text
Constant width bold
    Used in code segments to indicate input to be typed in by the user
Constant width italic
    Used for replaceable items in code and text
Italic
    Used for filenames, pathnames, newsgroup names, Internet addresses (URLs), email addresses, variable names (except in examples), terms being introduced, program names, subroutine names, CGI script names, hostnames, usernames, and group names

Icons

Text marked with this icon applies to the Unix version of Apache.

Text marked with this icon applies to the Win32 version of Apache. This icon designates a note relating to the surrounding text.

This icon designates a warning related to the surrounding text.

Pathnames

We use the text convention .../ to indicate your path to the demonstration sites, which may well be different from ours. For instance, on our Apache machine, we kept all the demonstration sites in the directory /usr/www. So, for example, our path would be /usr/www/site.simple. You might want to keep the sites somewhere other than /usr/www, so we refer to the path as .../site.simple. Don't type .../ into your computer. The attempt will upset it!

Directives

Apache is controlled through roughly 150 directives. For each directive, a formal explanation is given in the following format:

Directive
Syntax
Where used
An explanation of the directive is located here.

So, for instance, we have the following directive:

ServerAdmin
ServerAdmin email address
Server config, virtual host
ServerAdmin gives the email address for correspondence. The address is included in automatically generated error messages so the user has someone to write to in case of problems.

The Where used line explains the appropriate environment for the directive. This will become clearer later.

Organization of This Book

The chapters that follow and their contents are listed here:

Chapter 1
    Covers web servers, how Apache works, TCP/IP, HTTP, hostnames, what a client does, what happens at the server end, choosing a Unix version, and compiling and installing Apache under both Unix and Win32.
Chapter 2
    Discusses getting Apache to run, creating Apache users, runtime flags, permissions, and site.simple.
Chapter 3
    Introduces a demonstration business, Butterthlies, Inc.; some HTML; default indexing of web pages; server housekeeping; and block directives.
Chapter 4
    Explains how to connect web sites to network addresses, including the common case where more than one web site is hosted at a given network address.
Chapter 5
    Explains controlling access, collecting information about clients, cookies, DBM control, digest authentication, and anonymous access.
Chapter 6
    Covers content and language arbitration, type maps, and expiration of information.
Chapter 7
    Discusses better indexes, index options, your own indexes, and imagemaps.
Chapter 8
    Describes Alias, ScriptAlias, and the amazing Rewrite module.
Chapter 9
    Covers remote proxies and proxy caching.
Chapter 10
    Explains Apache's facilities for tracking activity on your web sites.
Chapter 11
    Explores the many aspects of protecting an Apache server and its content from uninvited guests and intruders, including user validation, binary signatures, virtual cash, certificates, firewalls, packet filtering, secure sockets layer (SSL), legal issues, patent rights, national security, and Apache-SSL directives.
Chapter 12
    Explains best practices for running large sites, including support for multiple content-creators, separating test sites from production sites, and integrating the site with other Internet technologies.
Chapter 13
    Explores the options available for using Apache to host automatically changing content and interactive applications.
Chapter 14
    Explains using runtime commands in your HTML and XSSI, a more secure server-side include.
Chapter 15
    Explains how to install and configure PHP, with an example for connecting it to MySQL.
Chapter 16
    Demonstrates aliases, logs, HTML forms, a shell script, a CGI script in Perl, environment variables, and using MySQL through Perl and Apache.
Chapter 17
    Demonstrates how to install, configure, and use the mod_perl module for efficient processing of Perl applications.
Chapter 18
    Explains how to install mod_jserv and Tomcat for supporting Java in the Apache environment.
Chapter 19
    Explains how to use XML in conjunction with Apache and how to install and configure the Cocoon set of tools for presenting XML content.
Chapter 20
    Explores the foundations of the Apache 2.0 API.
Chapter 21
    Describes how to create Apache modules using the Apache 2.0 Apache Portable Runtime, including how to port modules from 1.3 to 2.0.
Appendix A
    Describes pools; per-server, per-directory, and per-request information; functions; warnings; and parsing.

In addition, the Apache Quick Reference Card provides an outline of Apache 1.3 and 2.0 syntax.

Acknowledgments First, thanks to Robert S. Thau, who gave the world the Apache API and the code that implements it, and to the Apache Group, who worked on it before and have worked on it since. Thanks to Eric Young and Tim Hudson for giving SSLeay to the Web. Thanks to Bryan Blank, Aram Mirzadeh, Chuck Murcko, and Randy Terbush, who read early drafts of the first edition text and made many useful suggestions; and to John Ackermann, Geoff Meek, and Shane Owenby, who did the same for the second edition. For the third edition, we would like to thank our reviewers Evelyn Mitchell, Neil Neely, Lemon, Dirk-Willem van Gulik, Richard Sonnen, David Reid, Joe Johnston, Mike Stok, and Steven Champeon. We would also like to offer special thanks to Andrew Ford for giving us permission to reprint his Apache Quick Reference Card. Many thanks to Simon St.Laurent, our editor at O'Reilly, who patiently turned our text into a book — again. The two layers of blunders that remain are our own contribution. And finally, thanks to Camilla von Massenbach and Barbara Laurie, who have continued to put up with us while we rewrote this book.

Chapter 1. Getting Started

• 1.1 What Does a Web Server Do?
• 1.2 How Apache Works
• 1.3 Apache and Networking
• 1.4 How HTTP Clients Work
• 1.5 What Happens at the Server End?
• 1.6 Planning the Apache Installation
• 1.7 Windows?
• 1.8 Which Apache?
• 1.9 Installing Apache
• 1.10 Building Apache 1.3.X Under Unix
• 1.11 New Features in Apache v2
• 1.12 Making and Installing Apache v2 Under Unix
• 1.13 Apache Under Windows

Apache is the dominant web server on the Internet today, filling a key place in the infrastructure of the Internet. This chapter will explore what web servers do and why you might choose the Apache web server, examine how your web server fits into the rest of your network infrastructure, and conclude by showing you how to install Apache on a variety of different systems.

1.1 What Does a Web Server Do?

The whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.

When you fire up your browser and connect to the URL of someone's home page (say the notional http://www.butterthlies.com/ we shall meet later on), you send a message across the Internet to the machine at that address. That machine, you hope, is up and running; its Internet connection is working; and it is ready to receive and act on your message.

URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:

<scheme>://<host>/<path>

So, in our example, <scheme> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <path> is /, traditionally meaning the top page of the host.[1] The <host> may contain either an IP address or a name, which the browser will then convert to an IP address. Using HTTP 1.1, your browser might send the following request to the computer at that IP address:

GET / HTTP/1.1
Host: www.butterthlies.com
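As a rough illustration of how a URL turns into a request like the one above, here is a minimal Python sketch. The helper names build_request and parse_request are our own inventions for this example, not part of Apache or of any browser; the sketch only handles the http scheme and ignores everything a real client would add (further headers, ports, query strings).

```python
from urllib.parse import urlsplit


def build_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    path = parts.path or "/"      # an empty path means the top page of the host
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, URI, protocol version
        f"Host: {parts.netloc}",  # Host header, needed for name-based virtual hosts
        "",                       # a blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_request(raw: bytes):
    """Split a request into method, URI, protocol version, and headers."""
    head = raw.decode("ascii").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


request = build_request("http://www.butterthlies.com/")
print(request)
print(parse_request(request))
```

Running parse_request over the generated bytes recovers the method, the URI, the protocol version, and the headers, which is essentially the first thing a web server has to do with an incoming message.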

The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is in four parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) /; the version of the protocol we are using; and a series of headers that modify the request (in this case, a Host header, which is used for name-based virtual hosting: see Chapter 4). It is then up to the web server running on that host to make something of this message.

The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom or just a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.

1.1.1 Criteria for Choosing a Web Server

What do we want a web server to do? It should:

• Run fast, so it can cope with a lot of requests using a minimum of hardware.
• Support multitasking, so it can deal with more than one request at once and so that the person running it can maintain the data it hands out without having to shut the service down. Multitasking is hard to arrange within a program: the only way to do it properly is to run the server on a multitasking operating system.
• Authenticate requesters: some may be entitled to more services than others. When we come to handling money, this feature (see Chapter 11) becomes essential.
• Respond to errors in the messages it gets with answers that make sense in the context of what is going on. For instance, if a client requests a page that the server cannot find, the server should respond with a "404" error, which is defined by the HTTP specification to mean "page does not exist."
• Negotiate a style and language of response with the requester. For instance, it should (if the people running the server can rise to the challenge) be able to respond in the language of the requester's choice. This ability, of course, can open up your site to a lot more action. There are parts of the world where a response in the wrong language can be a bad thing.
• Support a variety of different formats. On a more technical level, a user might want JPEG image files rather than GIF, or TIFF rather than either of those. He might want text in DVI format rather than PostScript.
• Be able to run as a proxy server. A proxy server accepts requests for clients, forwards them to the real servers, and then sends the real servers' responses back to the clients. There are two reasons why you might want a proxy server:
    o The proxy might be running on the far side of a firewall (see Chapter 11), giving its users access to the Internet.
    o The proxy might cache popular pages to save reaccessing them.
• Be secure. The Internet world is like the real world, peopled by a lot of lambs and a few wolves.[2] The aim of a good server is to prevent the wolves from troubling the lambs. The subject of security is so important that we will come back to it several times.

1.1.2 Why Apache?

Apache has more than twice the market share of its nearest competitor, Microsoft. This is not just because it is freeware and costs nothing. It is also open source,[3] which means that the source code can be examined by anyone so inclined. If there are errors in it, thousands of pairs of eyes scan it for mistakes. Because of this constant examination by outsiders, it is substantially more reliable[4] than any commercial software product that can only rely on the scrutiny of a closed list of employees. This is particularly important in the field of security, where apparently trivial mistakes can have horrible consequences.

Anyone is free to take the source code and change it to make Apache do something different. In particular, Apache is extensible through an established technology for writing new Modules (described in more detail in Chapter 20), which many people have used to introduce new features.

Apache suits sites of all sizes and types. You can run a single personal page on it or an enormous site serving millions of regular visitors. You can use it to serve static files over the Web or as a frontend to applications that generate customized responses for visitors. Some developers use Apache as a test-server on their desktops, writing and trying code in a local environment before publishing it to a wider audience. Apache can be an appropriate solution for practically any situation involving the HTTP protocol.

Apache is freeware. The intending user downloads the source code and compiles it (under Unix) or downloads the executable (for Windows) from http://www.apache.org or a suitable mirror site. Although it sounds difficult to download the source code and configure and compile it, it only takes about 20 minutes and is well worth the trouble. Many operating system vendors now bundle appropriate Apache binaries.
The result of Apache's many advantages is clear. There are about 75 web-server software packages on the market. Their relative popularity is charted every month by Netcraft (http://www.netcraft.com). In July 2002, their June survey of active sites, shown in Table 1-1, had found that Apache ran nearly two-thirds of the sites they surveyed (continuing a trend that has been apparent for several years).

Table 1-1. Active sites counted by Netcraft survey, June 2002

Developer    May 2002    Percent    June 2002    Percent
Apache       10411000    65.11      10964734     64.42
Microsoft     4121697    25.78       4243719     24.93
iPlanet        247051     1.55        281681      1.66
Zeus           214498     1.34        227857      1.34
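For readers who like to check such claims, the arithmetic behind "more than twice the market share" can be reproduced from the table. The following Python sketch simply copies the site counts from Table 1-1; the function names are ours, chosen for illustration.

```python
# Site counts copied from Table 1-1 (active sites, Netcraft survey).
MAY_2002 = {"Apache": 10411000, "Microsoft": 4121697, "iPlanet": 247051, "Zeus": 214498}
JUNE_2002 = {"Apache": 10964734, "Microsoft": 4243719, "iPlanet": 281681, "Zeus": 227857}


def ratio_to_runner_up(counts: dict) -> float:
    """Ratio of the largest site count to the second largest."""
    first, second = sorted(counts.values(), reverse=True)[:2]
    return first / second


def growth(before: dict, after: dict) -> dict:
    """Month-on-month change in site count per developer, as a fraction."""
    return {name: (after[name] - before[name]) / before[name] for name in before}


print(f"Apache vs. runner-up, June 2002: {ratio_to_runner_up(JUNE_2002):.2f}x")
for name, change in growth(MAY_2002, JUNE_2002).items():
    print(f"{name}: {change:+.1%}")
```

Note that a server's percentage share can fall slightly from one month to the next even while its absolute number of sites rises, because the total number of surveyed sites is growing too.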

1.2 How Apache Works

Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/2000/Me/NT/..., which we call Win32. There are many others: flavors of Unix, IBM's OS/2, and Novell Netware. Mac OS X has a FreeBSD foundation and ships with Apache.

The Apache binary is called httpd under Unix and apache.exe under Win32 and normally runs in the background.[5] Each copy of httpd/apache that is started has its attention directed at a web site, which is, for our purposes, a directory. Regardless of operating system, a site directory typically contains four subdirectories:

conf
    Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file. It specifies the URLs that will be served.
htdocs
    Contains the HTML files to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.
logs
    Contains the log data, both of accesses and errors.
cgi-bin
    Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space (that is, in .../htdocs or below).

In its idling state, Apache does nothing but listen to the IP addresses specified in its Config file. When a request appears, Apache receives it and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.

The webmaster's main control over Apache is through the Config file. The webmaster has some 200 directives at her disposal, and most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has a dozen flags she can use when Apache starts up.
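The rule that cgi-bin must not live inside the web space is easy to check mechanically. The following Python sketch is our own illustration, not something Apache provides: the function in_web_space is hypothetical, and the site path /usr/www/site.simple is just the example location used in this book. It compares paths purely as text, so it does not follow symbolic links or resolve ".." components.

```python
from pathlib import PurePosixPath


def in_web_space(candidate: str, htdocs: str) -> bool:
    """Return True if candidate is the htdocs directory itself or lies below it."""
    cand = PurePosixPath(candidate)
    root = PurePosixPath(htdocs)
    return cand == root or root in cand.parents


# The four conventional subdirectories of a site directory.
site = PurePosixPath("/usr/www/site.simple")
layout = {name: site / name for name in ("conf", "htdocs", "logs", "cgi-bin")}

# A sibling of htdocs is outside the web space; a directory below htdocs is not.
print(in_web_space(str(layout["cgi-bin"]), str(layout["htdocs"])))
print(in_web_space(str(layout["htdocs"] / "cgi-bin"), str(layout["htdocs"])))
```

A check of this kind could be run before starting the server, although the real protection comes from laying out the directories correctly in the first place.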

We've quoted most of the formal definitions of the directives directly from the Apache site manual pages because rewriting seemed unlikely to improve them, but very likely to introduce errors. In a few cases, where they had evidently been written by someone who was not a native English speaker, we rearranged the syntax a little. As they stand, they save the reader having to break off and go to the Apache site.

1.3 Apache and Networking

At its core, Apache is about communication over networks. Apache uses the TCP/IP protocol suite as its foundation, providing an implementation of HTTP. Developers who want to use Apache should have at least a basic understanding of TCP/IP and may need more advanced skills if they need to integrate Apache servers with other network infrastructure like firewalls and proxy servers.

1.3.1 What to Know About TCP/IP

To understand the substance of this book, you need a modest knowledge of what TCP/IP is and what it does. You'll find more than enough information in Craig Hunt and Robert Bruce Thompson's books on TCP/IP,[6] but what follows is, we think, what is necessary to know for our book's purposes.

TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of protocols enabling computers to talk to each other over networks. The two protocols that give the suite its name are among the most important, but there are many others, and we shall meet some of them later. These protocols are embodied in programs on your computer written by someone or other; it doesn't much matter who. TCP/IP seems unusual among computer standards in that the programs that implement it actually work, and their authors have not tried too much to improve on the original conceptions.

TCP/IP is generally only used where there is a network.[7] Each computer on a network that wants to use TCP/IP has an IP address, for example, 192.168.123.1. There are four parts in the address, separated by periods. Each part corresponds to a byte, so the whole address is four bytes long. You will, in consequence, seldom see any of the parts outside the range 0-255. Although not required by the protocol, by convention there is a dividing line somewhere inside this number: to the left is the network number and to the right, the host number.
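To make the four-byte address and its dividing line concrete, here is a small Python sketch using the standard ipaddress module. The address is the example from the text; placing the dividing line after the first 24 bits is purely our assumption for illustration, as is the helper name same_network.

```python
import ipaddress

# The example address from the text; the 24-bit dividing line is an assumption.
iface = ipaddress.ip_interface("192.168.123.1/24")

address_bytes = list(iface.ip.packed)            # one byte per dotted part
network_number = iface.network.network_address   # bits left of the dividing line
host_number = int(iface.ip) & int(iface.network.hostmask)  # bits right of it


def same_network(addr_x: str, addr_y: str, prefix_len: int = 24) -> bool:
    """Return True when both addresses share a network number."""
    net_x = ipaddress.ip_interface(f"{addr_x}/{prefix_len}").network
    net_y = ipaddress.ip_interface(f"{addr_y}/{prefix_len}").network
    return net_x == net_y


print(address_bytes)
print(network_number, host_number)
print(same_network("192.168.123.1", "192.168.123.2"))  # same network number
print(same_network("192.168.123.1", "192.168.124.2"))  # different network number
```

Machines for which same_network is true can normally exchange packets directly on their local network; otherwise the packets have to travel by way of a router.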
Two machines on the same physical network (usually a local area network, or LAN) normally have the same network number and communicate directly using TCP/IP. How do we know where the dividing line is between network number and host number? The default dividing line used to be determined by the first of the four numbers, but a shortage of addresses required a change to the use of subnet masks. These allow us to further subdivide the network by using more of the bits for the network number and less for the host number. Their correct use is rather technical, so we leave it to the routing experts. (You should not need to know the details of how this works in order to run a host, because the numbers you deal with are assigned to you by your network administrator or are just facts of the Internet.)

Now we can think about how two machines with IP addresses X and Y talk to each other. If X and Y are on the same network and are correctly configured so that they have the same network number and different host numbers, they should be able to fire up TCP/IP and send packets to each other down their local, physical network without any further ado. If the network numbers are not the same, the packets are sent to a router, a special machine able to find out where the other machine is and deliver the packets to it. This communication may be over the Internet or might occur on your wide area network (WAN).

There are several ways computers use IP to communicate. These are two of them:

UDP (User Datagram Protocol)
    A way to send a single packet from one machine to another. It does not guarantee delivery, and there is no acknowledgment of receipt. DNS uses UDP, as do other applications that manage their own datagrams. Apache doesn't use UDP.

TCP (Transmission Control Protocol)
    A way to establish communications between two computers. It reliably delivers messages of any size in the order they are sent. This is a better protocol for our purposes.

1.3.2 How Apache Uses TCP/IP

Let's look at a server from the outside. We have a box in which there is a computer, software, and a connection to the outside world (Ethernet or a serial line to a modem, for example). This connection is known as an interface and is known to the world by its IP address.
If the box had two interfaces, they would each have an IP address, and these addresses would normally be different. A single interface, on the other hand, may have more than one IP address (see Chapter 3). Requests arrive on an interface for a number of different services offered by the server using different protocols:

• Network News Transfer Protocol (NNTP): news
• Simple Mail Transfer Protocol (SMTP): mail
• Domain Name Service (DNS)
• HTTP: World Wide Web

The server can decide how to handle these different requests because the four-byte IP address that leads the request to its interface is followed by a two-byte port number. Different services attach to different ports:

• NNTP: port number 119
• SMTP: port number 25
• DNS: port number 53
• HTTP: port number 80
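The sizes involved can be made concrete with a small Python sketch. It packs an address and a port into six bytes purely to show that four bytes of address plus two bytes of port identify a service. The dictionary WELL_KNOWN_PORTS and the function pack_endpoint are names invented for this example, and the six-byte result is not the literal layout of a TCP/IP packet header.

```python
import socket
import struct

# The conventional ports listed above, keyed by lowercase service name.
WELL_KNOWN_PORTS = {"nntp": 119, "smtp": 25, "dns": 53, "http": 80}


def pack_endpoint(dotted: str, port: int) -> bytes:
    """Pack an IPv4 address and a port number into six bytes."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("a port number must fit in two bytes (0-65535)")
    # inet_aton gives the four address bytes; "!H" is an unsigned
    # 16-bit integer in network (big-endian) byte order.
    return socket.inet_aton(dotted) + struct.pack("!H", port)


endpoint = pack_endpoint("192.168.123.2", WELL_KNOWN_PORTS["http"])
```

Because the port occupies two bytes, no service can be attached to a port number above 65535.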

As the local administrator or webmaster, you can decide to attach any service to any port. Of course, if you decide to step outside convention, you need to make sure that your clients share your thinking. Our concern here is just with HTTP and Apache. Apache, by default, listens to port number 80 because it deals in HTTP business.

Port numbers below 1024 can only be used by the superuser (root, under Unix); this prevents other users from running programs masquerading as standard services, but brings its own problems, as we shall see.

Under Win32 there is currently no security directly related to port numbers and no superuser (at least, not as far as port numbers are concerned).

This basic setup is fine if our machine is providing only one web server to the world. In real life, you may want to host several, many, dozens, or even hundreds of servers, which appear to the world as completely different from each other. This situation was not anticipated by the authors of HTTP 1.0, so handling a number of hosts on one machine has to be done by a kludge, assigning multiple addresses to the same interface and distinguishing the virtual host by its IP address. This technique is known as IP-intensive virtual hosting. Using HTTP 1.1, virtual hosts may be created by assigning multiple names to the same IP address. The browser sends a Host header to say which name it is using.

1.3.3 Apache and Domain Name Servers

In one way the Web is like the telephone system: each site has a number that uniquely identifies it, for instance, 192.168.123.5. In another way it is not: since these numbers are hard to remember, they are automatically linked to domain names (www.amazon.com, for instance, or www.butterthlies.com, which we shall meet later in examples in this book).

When you surf to http://www.amazon.com, your browser actually goes first to a specialist server called a Domain Name Server (DNS), which knows (how it knows doesn't concern us here) that this name translates into 208.202.218.15. It then asks the Web to connect it to that IP number. When you get an error message saying something like "DNS not found," it means that this process has broken down. Maybe you typed the URL incorrectly, or the server is down, or the person who set it up made a mistake, perhaps because he didn't read this book.

A DNS error impacts Apache in various ways, but one that often catches the beginner is this: if Apache is presented with a URL that corresponds to a directory, but does not have a / at the end of it, then Apache will send a redirect to the same URL with the trailing / added. In order to do this, Apache needs to know its own hostname, which it will attempt to determine from DNS (unless it has been configured with the ServerName directive, covered in Chapter 2). Often when beginners are experimenting with Apache, their DNS is incorrectly set up, and great confusion can result. Watch out for it! Usually what will happen is that you will type in a URL to a browser with a name you are sure is correct, yet the browser will give you a DNS error, saying something like "Cannot find server." Usually, it is the name in the redirect that causes the problem. If adding a / to the end of your URL cures it, then you can be pretty sure that's what has happened.
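To see why the server's own hostname matters here, consider a minimal Python sketch of how such a redirect target might be assembled. This is not Apache's code: the function directory_redirect and the path /catalog are hypothetical, and the only point is that the hostname becomes part of the URL the browser is told to fetch next.

```python
def directory_redirect(server_name: str, port: int, path: str) -> str:
    """Build the redirect target for a directory URL lacking its final /.

    server_name stands for whatever name the server believes it has,
    whether found through DNS or set explicitly in the configuration.
    """
    # Port 80 is the HTTP default, so it is left out of the URL.
    host = server_name if port == 80 else f"{server_name}:{port}"
    return f"http://{host}{path}/"


location = directory_redirect("www.butterthlies.com", 80, "/catalog")
```

If server_name is wrong because DNS is misconfigured, the browser dutifully follows the redirect to a name it cannot resolve, which produces the confusing error described above.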

1.3.3.1 Multiple sites: Unix

It is fortunate that the crucial Unix utility ifconfig, which binds IP addresses to physical interfaces, often allows the binding of multiple IP numbers to a single interface so that people can switch from one IP number to another and maintain service during the transition. This is known as "IP aliasing" and can be used to maintain multiple "virtual" web servers on a single machine.

In practical terms, on many versions of Unix, we run ifconfig to give multiple IP addresses to the same interface. The interface in this context is actually the bit of software (the driver) that handles the physical connection (Ethernet card, serial port, etc.) to the outside. While writing this book, we accessed the practice sites through an Ethernet connection between a Windows 95 machine (the client) and a FreeBSD box (the server) running Apache.

Our environment was very untypical, since the whole thing sat on a desktop with no access to the Web. The FreeBSD box was set up using ifconfig in a script lan_setup, which contained the following lines:

    ifconfig ep0 192.168.123.2
    ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
    ifconfig ep0 192.168.124.1 alias

The first line binds the IP address 192.168.123.2 to the physical interface ep0. The second binds an alias of 192.168.123.3 to the same interface. We used a subnet mask (netmask 0xFFFFFFFF) to suppress a tedious error message generated by the FreeBSD TCP/IP stack. This address was used to demonstrate virtual hosts. We also bound yet another IP address, 192.168.124.1, to the same interface, simulating a remote server to demonstrate Apache's proxy server. The important feature to note here is that the address 192.168.124.1 is on a different IP network from the address 192.168.123.2, even though it shares the same physical network. No subnet mask was needed in this case, as the error message it suppressed arose from the fact that 192.168.123.2 and 192.168.123.3 are on the same network.

Unfortunately, each Unix implementation tends to do this slightly differently, so these commands may not work on your system. Check your manuals!

In real life, we do not have much to do with IP addresses. Web sites (and Internet hosts generally) are known by their names, such as www.butterthlies.com or sales.butterthlies.com, which we shall meet later. On the authors' desktop system, these names both translate into 192.168.123.2. The distinction between them is made by Apache's Virtual Hosting mechanism (see Chapter 4).
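The remark about which of the three addresses share an IP network can be checked with a short Python sketch. It assumes a dividing line after the third byte (a 24-bit network number), which matches the way the addresses are described but is our assumption rather than something the lan_setup script states; same_network is a name made up for the example.

```python
import ipaddress


def same_network(a: str, b: str, prefix_len: int = 24) -> bool:
    """Report whether two addresses have the same network number."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix_len}").network
    return net_a == net_b


primary_and_alias = same_network("192.168.123.2", "192.168.123.3")
primary_and_proxy = same_network("192.168.123.2", "192.168.124.1")

# A netmask of 0xFFFFFFFF uses all 32 bits for the network number,
# leaving a "network" that contains exactly one address.
alias_only = ipaddress.ip_network("192.168.123.3/255.255.255.255")
```

Here primary_and_alias is true and primary_and_proxy is false, which is the distinction the text draws between the alias used for virtual hosts and the one used to simulate a remote server.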

1.3.3.2 Multiple sites: Win32 As far as we can discern, it is not possible to assign multiple IP addresses to a single interface under a standard Windows 95 system. On Windows NT it can be done via Control Panel Networks Protocols TCP/IP/Properties... IP Address Advanced. Later versions of Windows, notably Windows 2000 and XP, support multiple IP addresses through the TCP/IP properties dialog of the Local Area Network in the Network and Dial-up Settings area of the Start menu.

1.4 How HTTP Clients Work

Once the server is set up, we can get down to business. The client has the easy end: it wants web action on a particular site, and it sends a request with a URL that begins with http to indicate what service it wants (other common services are ftp for File Transfer Protocol or https for HTTP with Secure Sockets Layer, SSL) and continues with these possible parts:

    //<user>:<password>@<host>:<port>/<url-path>

RFC 1738 says:

    Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be omitted. The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.

In real life, URLs look more like: http://www.apache.org/; that is, there is no user and password pair, and there is no port. What happens?

The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The client then contacts a name server, which uses DNS to resolve www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One way to check the validity of a hostname is to go to the operating-system prompt[8] and type:

    ping www.apache.org

If that host is connected to the Internet, a response is returned:

    Pinging www.apache.org [63.251.56.142] with 32 bytes of data:
    Reply from 63.251.56.142: bytes=32 time=278ms TTL=49
    Reply from 63.251.56.142: bytes=32 time=620ms TTL=49
    Reply from 63.251.56.142: bytes=32 time=285ms TTL=49
    Reply from 63.251.56.142: bytes=32 time=290ms TTL=49
    Ping statistics for 63.251.56.142:
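The way a client takes a URL apart can be sketched with Python's standard urllib.parse module. The helper describe_url, the table DEFAULT_PORTS, and the ftp.example.org URL with its user and password are all invented for illustration; real browsers do considerably more checking than this.

```python
from urllib.parse import urlsplit

# Conventional default ports for the schemes mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe_url(url: str) -> dict:
    """Split a URL into the parts a client needs to make a connection."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,        # None when omitted
        "password": parts.password,    # None when omitted
        "host": parts.hostname,
        # An omitted port falls back to the default for the scheme.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # An omitted path is treated as the root, /.
        "path": parts.path or "/",
    }


simple = describe_url("http://www.apache.org")
full = describe_url("ftp://someuser:secret@ftp.example.org:2121/pub/file.txt")
```

For the everyday case, simple has no user or password, the port 80 that the client assumes, and the path /.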

A URL can be given more precision by attaching a port number: the web address http://www.apache.org doesn't include a port because it is port 80, the default, and the browser takes it for granted. If some other port is wanted, it is included in the URL after a colon, for example, http://www.apache.org:8000/. We will have more to do with ports later.

The URL always includes a path, even if it is only /. If the path is left out by the careless user, most browsers put it back in. If the path were /some/where/foo.html on port 8000, the URL would be http://www.apache.org:8000/some/where/foo.html. The client now makes a TCP connection to port number 8000 on IP 204.152.144.38 and sends the following message down the connection (if it is using HTTP 1.0):

    GET /some/where/foo.html HTTP/1.0<CR><LF><CR><LF>

These carriage returns and line feeds (CRLF) are very important because they separate the HTTP header from its body. If the request were a POST, there would be data following. The server sends the response back and closes the connection. To see it in action, connect again to the Internet, get a command-line prompt, and type the following:

    % telnet www.apache.org 80
    > telnet www.apache.org 80
    GET http://www.apache.org/foundation/contact.html HTTP/1.1
    Host: www.apache.org

On Win98, telnet puts up a dialog box. Click connect remote system, and change Port from "telnet" to "80". In Terminal preferences, check "local echo". Then type this, followed by two Returns:

    GET http://www.apache.org/foundation/contact.html HTTP/1.1
    Host: www.apache.org

You should see text similar to that which follows. Some implementations of telnet rather unnervingly don't echo what you type to the screen, so it seems that nothing is happening. Nevertheless, a whole mess of response streams past:

    Trying 64.125.133.20...
    Connected to www.apache.org.
    Escape character is '^]'.
    HTTP/1.1 200 OK
    Date: Mon, 25 Feb 2002 15:03:19 GMT
    Server: Apache/2.0.32 (Unix)
    Cache-Control: max-age=86400
    Expires: Tue, 26 Feb 2002 15:03:19 GMT
    Accept-Ranges: bytes
    Content-Length: 4946
    Content-Type: text/html

    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    Contact Information--The Apache Software Foundation

Apache Projects

  • HTTP Server
  • APR
  • Jakarta
  • Perl
  • PHP
  • TCL
  • XML
  • Conferences


  • Foundation
  • ...... and so on
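The request typed into telnet, and the first line of the reply, can be modeled in a few lines of Python. This is a sketch of the text format only: build_request and parse_status_line are hypothetical helpers, nothing is sent over the network, and the request uses a plain path rather than the full URL typed in the telnet session.

```python
CRLF = "\r\n"


def build_request(path: str, host: str = "", version: str = "1.0") -> bytes:
    """Assemble a minimal GET request as it would travel down the connection."""
    lines = [f"GET {path} HTTP/{version}"]
    if host:
        # HTTP 1.1 clients say which hostname they are asking for.
        lines.append(f"Host: {host}")
    # Each line ends in CRLF, and one empty line marks the end of the header.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def parse_status_line(line: str) -> tuple:
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


old_style = build_request("/some/where/foo.html")
new_style = build_request("/foundation/contact.html", "www.apache.org", "1.1")
status = parse_status_line("HTTP/1.1 200 OK")
```

The two trailing CRLF pairs correspond to the two Returns typed at the end of the telnet session.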

1.5 What Happens at the Server End?

We assume that the server is well set up and running Apache. What does Apache do? In the simplest terms, it gets a URL from the Internet, turns it into a filename, and sends the file (or its output if it is a program)[9] back down the Internet. That's all it does, and that's all this book is about!

Two main cases arise:

• The Unix server has a standalone Apache that listens to one or more ports (port 80 by default) on one or more IP addresses mapped onto the interfaces of its machine. In this mode (known as standalone mode), Apache actually runs several copies of itself to handle multiple connections simultaneously.

• On Windows, there is a single process with multiple threads. Each thread services a single connection. This currently limits Apache 1.3 to 64 simultaneous connections, because there's a system limit of 64 objects for which you can wait at once. This is something of a disadvantage because a busy site can have several hundred simultaneous connections. It has been improved in Apache 2.0. The default maximum is now 1920, but even that can be extended at compile time.

Both cases boil down to an Apache server with an incoming connection. Remember our first statement in this section, namely, that the object of the whole exercise is to resolve the incoming request either into a filename or the name of a script, which generates data internally on the fly. Apache thus first determines which IP address and port number were used by asking the operating system where the connection is connecting to. Apache then uses the IP address, port number, and (in HTTP 1.1) the Host header to decide which virtual host is the target of this request. The virtual host then looks at the path, which was handed to it in the request, and reads that against its configuration to decide on the appropriate response, which it then returns.

Most of this book is about the possible appropriate responses and how Apache decides which one to use.
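The decision sequence just described (address and port, then Host header, then path) can be sketched in Python. Nothing here is Apache's implementation: the table VIRTUAL_HOSTS, its document-root directories, and both function names are invented for the example, and real configurations are expressed in Apache's own Config file rather than in code.

```python
import posixpath

# Hypothetical setup: for each (IP address, port) the server listens on,
# the hostnames it answers to and the directory each one serves from.
VIRTUAL_HOSTS = {
    ("192.168.123.2", 80): {
        "www.butterthlies.com": "/usr/www/site.first/htdocs",
        "sales.butterthlies.com": "/usr/www/site.second/htdocs",
    },
}


def select_document_root(ip: str, port: int, host_header: str):
    """Pick a document root from the connection details and Host header."""
    hosts = VIRTUAL_HOSTS.get((ip, port))
    if not hosts:
        return None
    name = host_header.split(":")[0].lower()   # drop any :port suffix
    # Assumption: an unknown or missing name gets the first host listed.
    return hosts.get(name, next(iter(hosts.values())))


def path_to_filename(document_root: str, url_path: str) -> str:
    """Turn the path from the request into a filename under the root."""
    # Normalizing from / collapses any ../ so the result stays inside
    # the document root.
    cleaned = posixpath.normpath("/" + url_path.lstrip("/"))
    return document_root.rstrip("/") + cleaned


root = select_document_root("192.168.123.2", 80, "sales.butterthlies.com")
filename = path_to_filename(root, "/catalog/index.html")
```

The same two steps, greatly elaborated by the directives covered in later chapters, are what turn every incoming request into a file or a script to run.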

1.6 Planning the Apache Installation

Unless you're using a prepackaged installation, you'll want to do some planning before setting up the software. You'll need to consider network integration, operating system choices, Apache version choices, and the many modules available for Apache. Even if you're just using Apache at an ISP, you may want to know which choices the ISP made in its installation.

1.6.1 Fitting Apache into Your Network

Apache installations come in many flavors. If an installation is intended only for local use on a developer's machine, it probably needs much less integration with network systems than an installation meant as a public host supporting thousands of simultaneous hits. Apache itself provides network and security functionality, but you'll need to set up supporting services separately, like the DNS that identifies your server to the network or the routing that connects it to the rest of the network. Some servers operate behind firewalls, and firewall configuration may also be an issue. If these are concerns for you, involve your network administrator early in the process.

1.6.2 Which Operating System?

Many webmasters have no choice of operating system (they have to use what's in the box on their desks), but if they have a choice, the first decision to make is between Unix and Windows. As the reader who persists with us will discover, much of the Apache Group and your authors prefer Unix. It is, itself, essentially open source. Over the last 30 years it has been the subject of intense scrutiny and improvement by many thousands of people. On the other hand, Windows is widely available, and Apache support for Windows has improved substantially in Apache 2.0.

1.6.3 Which Unix?

The choice is commonly between some sort of Linux and FreeBSD. Both are technically acceptable. If you already know someone who has one of these OSs and is willing to help you get used to yours, then it would make sense to follow them. If you are an Apple user, OS X has a Unix core and includes Apache. Failing that, the difference between the two paths is mainly a legal one, turning on their different interpretations of open source licensing.
Linux lives at http://www.linux.org, and there are more than 160 different distributions from which Linux can be obtained free or in prepackaged pay-for formats. It is rather ominously described as a "Unix-type" operating system, which sometimes means that long-established Unix standards have been "improved", not always in an upwards direction. Linux supports Apache, and most of the standard distributions include it. However, the default position of the Config files may vary from platform to platform, though usually on Linux they are to be found in /etc. Under Red Hat Linux they will be in /etc/httpd/conf by default.

FreeBSD ("BSD" means "Berkeley Software Distribution", as in the University of California, Berkeley, where the version of Unix FreeBSD is derived from) lives at http://www.freebsd.org. We have been using FreeBSD for a long time and think it is the best environment.

If you look at http://www.netcraft.com and go to What's that site running?, you can examine any web site you like. If you choose, let's say, http://www.microsoft.com, you will discover that the site's uptime (length of time between rebooting the server) is about 12 days, on average. One assumes that Microsoft's servers are running under their own operating systems. The page Longest uptimes, also at Netcraft, shows that many Apache servers running Unix have uptimes of more than 1380 days (which is probably as long as Netcraft had been running the survey when we looked at it). One of the authors (BL) has a server running FreeBSD that has been rebooted once in 15 years, and that was when he moved house.

The whole of FreeBSD is freely available from http://www.freebsd.org/. But we would suggest that it's well worth spending a few dollars to get the software on CD-ROM or DVD plus a manual that takes you through the installation process. If you plan to run Apache 2.0 on FreeBSD, you need to install FreeBSD 4.x to take advantage of Apache's support for threads: earlier versions of FreeBSD do not support them, at least not well enough to run Apache.

If you use FreeBSD, you will find (we hope) that it installs from the CD-ROM easily enough, but that it initially lacks several things you will need later. Among these are Perl, Emacs, and some better shell than sh (we like bash and ksh), so it might be sensible to install them straightaway from their lurking places on the CD-ROM.

1.7 Windows?

The main problem with the Win32 version of Apache lies in its security, which must depend, in turn, on the security of the underlying operating system. Unfortunately, Windows 95, Windows 98, and their successors have no effective security worth mentioning. Windows NT and Windows 2000 have a large number of security features, but they are poorly documented, hard to understand, and have not been subjected to the decades of public inspection, discussion, testing, and hacking that have forged Unix security into a fortress that can pretty well be relied upon.

It is a grave drawback to Windows that the source code is kept hidden in Microsoft's hands so that it does not benefit from the scrutiny of the computing community. It is precisely because the source code of free software is exposed to millions of critical eyes that it works as well as it does.

In the view of the Apache development group, the Win32 version is useful for easy testing of a proposed web site. But if money is involved, you would be wise to transfer the site to Unix before exposure to the public and the Bad Guys.

1.8 Which Apache?

At the time this edition was prepared, Apache 1.3.26 was the stable release. It has an improved build system (see the section that follows). Both the Unix and Windows versions were thought to be in good shape. Apache 2.0 had made it through beta test into full release. We suggest that if you are working under Unix and you don't need Apache 2.0's improved features (which are multitudinous but not fundamental for the ordinary webmaster), you go for Version 1.3.26 or later.

1.8.1 Apache 2.0

Apache 2.0 is a major new version. The main new features are multithreading (on platforms that support it), layered I/O (also known as filters), and a rationalized API. The ordinary user will see very little difference, but the programmer writing new modules (see the section that follows) will find a substantial change, which is reflected in our rewritten Chapter 20 and Chapter 21. However, the improvements in Apache v2.0 look to the future rather than trying to improve the present. The authors are not planning to transfer their own web sites to v2.0 any time soon and do not expect many other sites to do so either. In fact, many sites are still happily running Apache v1.2, which was nominally superseded several years ago. There are good security reasons for them to upgrade to v1.3.

1.8.2 Apache 2.0 and Win32

Apache 2.0 is designed to run on Windows NT and 2000. The binary installer will only work with x86 processors. In all cases, TCP/IP networking must be installed. If you are using NT 4.0, install Service Pack 3 or 6, since Pack 4 had TCP/IP problems. It is not recommended that Windows 95 or 98 ever be used for production servers and, when we went to press, Apache 2.0 would not run under either at all. See http://www.apache.org/docs-2.0/platform/windows.html.

1.9 Installing Apache

There are two ways of getting Apache running on your machine: by downloading an appropriate executable or by getting the source code and compiling it. Which is better depends on your operating system.

1.9.1 Apache Executables for Unix

The fairly painless business of compiling Apache, which is described later, can now be circumvented by downloading a precompiled binary for the Unix of your choice. When we went to press, the following operating systems (mostly versions of Unix) were supported, but check before you decide. (See http://httpd.apache.org/dist/httpd/binaries.)

    aix  aux  beos  bs2000-osd  bsdi  darwin  dgux
    digitalunix  freebsd  hpux  irix  linux  macosx  macosxserver
    netbsd  netware  openbsd  os2  os390  osf1  qnx
    reliantunix  rhapsody  sinix  solaris  sunos  unixware  win32

Although this route is easier, you do forfeit the opportunity to configure the modules of your Apache, and you lose the chance to carry out quite a complex Unix operation, which is in itself interesting and confidence-inspiring if you are not very familiar with this operating system.

1.9.2 Making Apache 1.3.X Under Unix

Download the most recent Apache source code from a suitable mirror site: a list can be found at http://www.apache.org/[10]. You will get a compressed file, with the extension .gz if it has been gzipped or .Z if it has been compressed. Most Unix software available on the Web (including the Apache source code) is zipped using gzip, a GNU compression tool.

When expanded, the Apache .tar file creates a tree of subdirectories. Each new release does the same, so you need to create a directory on your FreeBSD machine where all this can live sensibly. We put all our source directories in /usr/src/apache. Go there, copy the .tar.gz or .tar.Z file, and uncompress the .Z version or gunzip (or gzip -d) the .gz version:

    uncompress .tar.Z

or:

    gzip -d .tar.gz

Make sure that the resulting file is called .tar, or tar may turn up its nose. If not, type:

    mv .tar

Now unpack it:

    % tar xvf .tar

Incidentally, modern versions of tar will unzip as well:

    % tar xvfz .tar.gz

    Keep the .tar file because you will need to start fresh to make the SSL version later on (see Chapter 11). The file will make itself a subdirectory, such as apache_1.3.14.

Under Red Hat Linux you install the .rpm file and type:

    rpm -i apache

Under Debian:

    apt-get install apache

The next task is to turn the source files you have just downloaded into the executable httpd. But before we can discuss that, we need to talk about Apache modules.

1.9.3 Modules Under Unix

Apache can do a wide range of things, not all of which are needed on every web site. Those that are needed are often not all needed all the time. The more capability the executable, httpd, has, the bigger it is. Even though RAM is cheap, it isn't so cheap that the size of the executable has no effect. Apache handles user requests by starting up a new version of itself for each one that comes in. All the versions share the same static executable code, but each one has to have its own dynamic RAM. In most cases this is not much, but in some, as in mod_perl (see Chapter 17), it can be huge.

The problem is handled by dividing Apache's functionality into modules and allowing the webmaster to choose which modules to include into the executable. A sensible choice can markedly reduce the size of the program.

There are two ways of doing this. One is to choose which modules you want and then to compile them in permanently. The other is to load them when Apache is run, using the Dynamic Shared Object (DSO) mechanism, which is somewhat like Dynamic Link Libraries (DLL) under Windows. In the two previous editions of this book, we deprecated DSO because:

• It was experimental and not very reliable.
• The underlying mechanism varies strongly from Unix to Unix so it was, to begin with, not available on many platforms.

However, things have moved on, the list of supported platforms is much longer, and the bugs have been ironed out. When we went to press, the following operating systems were supported:

    Linux           SunOS         UnixWare
    Darwin/Mac OS   FreeBSD       AIX
    OpenStep/Mach   OpenBSD       IRIX
    SCO             DYNIX/ptx     NetBSD
    HPUX            ReliantUNIX   BSDI
    Digital Unix    DGUX

Ultrix was entirely unsupported. If you use an operating system that is not mentioned here, consult the notes in INSTALL.

More reasons for using DSOs are:

• Web sites are also getting more complicated so they often positively need DSOs.
• Some distributions of Apache, like Red Hat's, are supplied without any compiled-in modules at all.
• Some useful packages, such as Tomcat (see Chapter 17), are only available as shared objects.

Having said all this, it is also true that using DSOs makes the novice webmaster's life more complicated than it need be. You need to create the DSOs at compile time and invoke them at runtime. The list of them clogs up the Config file (which is tricky enough to get right even when it is small), offers plenty of opportunity for typing mistakes, and, if you are using Apache v1.3.X, must be in the correct order (under Apache v2.0 the DSO list can be in any order).

Our advice on DSOs is not to use them unless:

• You have a precompiled version of Apache (e.g., from Red Hat) that only handles modules as DSOs.
• You need to invoke the DSO mechanism to use a package such as Tomcat (see Chapter 17).
• Your web site is so busy that executable size is really hurting performance. In practice, this is extremely unlikely, since the code is shared across all instances on every platform we know of.

    If none of these apply, note that DSOs exist and leave them alone.

    1.9.3.1 Compiled in modules This method is simple. You select the modules you want, or take the default list in either of the following methods, and compile away. We will discuss this in detail here.

    1.9.3.2 DSO modules To create an Apache that can use the DSO mechanism as a specific shared object, the compile process has to create a detached chunk of executable code — the shared object. This will be a file like (in our layout) /usr/src/apache/apache_1.3.26/src/modules/standard/mod_alias.so. If all the modules are defined to be DSOs, Apache ends up with only two compiled-in modules: core and mod_so. The first is the real Apache; the second handles DSO loading and running.

    You can, of course, mix the two methods and have the standard modules compiled in with DSO for things like Tomcat.

1.9.3.3 APXS

Once mod_so has been compiled in (see later), the necessary hooks for a shared object can be inserted into the Apache executable, httpd, at any time by using the utility apxs:

    apxs -i -a -c mod_foo.c

    This would make it possible to link in mod_foo at runtime. For practical details see the manual page by running man apxs or search http://www.apache.org for "apxs". The apxs utility is only built if you use the configure method — see Section 1.10.1 later in this chapter. Note that if you are running a version of Apache prior to 1.3.24, have previously configured Apache and now reconfigure it, you'll need to remove src/support/apxs to force a rebuild when you remake Apache. You will also need to reinstall Apache. If you do not do all this, things that use apxs may mysteriously fail.

1.10 Building Apache 1.3.X Under Unix

There are two methods for building Apache: the "Semimanual Method" and "Out of the Box". They each involve the user in about the same amount of keyboard work: if you are happy with the defaults, you need do very little; if you want to do a custom build, you have to do more typing to specify what you want.

Both methods rely on a shell script that, when run, creates a Makefile. When you run make, this, in turn, builds the Apache executable with the side orders you asked for. Then you copy the executable to its home (Semimanual Method) or run make install (Out of the Box) and the various necessary files are moved to the appropriate places around the machine.

Between the two methods, there is not a tremendous amount to choose. We prefer the Semimanual Method because it is older[11] and more reliable. It is also nearer to the reality of what is happening and generates its own record of what you did last time so you can do it again without having to perform feats of memory. Out of the Box is easier if you want a default build. If you want a custom build and you want to be able to repeat it later, you would do the build from a script that can get quite large. On the other hand, you can create several different scripts to trigger different builds if you need to.

1.10.1 Out of the Box

Until Apache 1.3, there was no real out-of-the-box batch-capable build and installation procedure for the complete Apache package. This method is provided by a top-level configure script and a corresponding top-level Makefile.tmpl file. The goal is to provide a GNU Autoconf-style frontend that is capable of driving the old src/Configure stuff in batch. Once you have extracted the sources (see earlier), the build process can be done in a minimum of three command lines, which is how most Unix software is built nowadays. Change yourself to root before you run ./configure; otherwise, if you use the default build configuration (which we suggest you do not), the server will be looking at port 8080 and will, confusingly, refuse requests to the default port, 80.

The result is, as you will be told during the process, probably not what you really want:

    ./configure
    make
    make install

    This will build Apache and install it, but we suggest you read on before deciding to do it this way. If you do this and then decide to do something different, run:

        make clean

    afterwards, to tidy up. Don't forget to delete the installed files as well:

        rm -R /usr/local/apache

    Readers who have done some programming will recognize that configure is a shell script that creates a Makefile. The command make uses it to check a lot of stuff, set compiler variables, and compile Apache. The command make install puts the numerous components in their correct places around your machine, using, in this case, the default Apache layout, which we do not particularly like. So, we recommend a slightly more elaborate procedure, which uses the GNU layout.

    The GNU layout is probably the best for users who don't have any preconceived ideas. As Apache involves more and more third-party materials and this scheme tends to be used by more and more players, it also tends to simplify the business of bringing new packages into your installation.

    A useful installation, bearing in mind what we said about modules earlier and assuming you want to use the mod_proxy DSO, is produced by:

        make clean
        ./configure --with-layout=GNU \
            --enable-module=proxy --enable-shared=proxy
        make
        make install

    (The \ character lets the arguments carry over to a new line.) You can repeat the --enable-module and --enable-shared options for as many shared objects as you like.
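    Repeating the options by hand gets tedious once several modules are involved. The following is a minimal sketch, not part of the Apache distribution, of a shell function that assembles the argument list; the function name dso_configure_args and the module names proxy and rewrite are our own illustrative choices. It only prints the command, so you can inspect it before running anything.

```shell
#!/bin/sh
# Build the ./configure argument list for a set of modules that should
# be compiled as shared objects. Each module gets one --enable-module
# and one --enable-shared option, as described above.
dso_configure_args() {
    args="--with-layout=GNU"
    for mod in "$@"; do
        args="$args --enable-module=$mod --enable-shared=$mod"
    done
    printf '%s\n' "$args"
}

# Print the command instead of running it (module names are examples).
echo "./configure $(dso_configure_args proxy rewrite)"
```

    Keeping the function in a small script also gives you a record of the build options, which is the main thing the Out of the Box method otherwise lacks.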

    If you want to compile in hooks for all the DSOs, use:

        ./configure --with-layout=GNU --enable-shared=max
        make
        make install

    If you then repeat the ./configure... line with --show-layout > layout added on the end, you get a map of where everything is in the file layout. However, there is a nifty little gotcha here: if you use this line in the previous sequence, the --show-layout option turns off actual configuration. You don't notice because the output is going to the file, and when you do make and make install, you are using whichever previous ./configure actually rewrote the Makefile, or, if you haven't already done a ./configure, you are building the default, old Apache-style configuration. This can be a bit puzzling. So, be sure to run this command only after completing the installation, as it will reset the configuration file.

    If everything has gone well, you should look in /usr/local/sbin to find the new executables. Use the command ls -l to see the timestamps to make sure they came from the build you have just done (it is surprisingly easy to do several different builds in a row and get the files mixed up):

        total 1054
        -rwxr-xr-x  1 root  wheel   22972 Dec 31 14:04 ab
        -rwxr-xr-x  1 root  wheel    7061 Dec 31 14:04 apachectl
        -rwxr-xr-x  1 root  wheel   20422 Dec 31 14:04 apxs
        -rwxr-xr-x  1 root  wheel  409371 Dec 31 14:04 httpd
        -rwxr-xr-x  1 root  wheel    7000 Dec 31 14:04 logresolve
        -rw-r--r--  1 root  wheel       0 Dec 31 14:17 peter
        -rwxr-xr-x  1 root  wheel    4360 Dec 31 14:04 rotatelogs
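    The timestamp check can also be done mechanically. The sketch below is our own illustration, assuming you create a stamp file just before the build; the function name stale_files and the path /tmp/build.stamp are hypothetical. It lists any file in a directory that is not newer than the stamp, so an empty result suggests that everything there came from the latest build.

```shell
#!/bin/sh
# Print every regular file under directory $1 that is NOT newer than
# the stamp file $2. An empty result means every file was written
# after the stamp was created.
stale_files() {
    find "$1" -type f ! -newer "$2"
}

# Typical use (paths are examples only):
#   touch /tmp/build.stamp          # just before running make
#   make && make install
#   stale_files /usr/local/sbin /tmp/build.stamp
```

    Note that /usr/local/sbin may well contain files from other packages, which will show up as stale without anything being wrong.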

    Here is the file layout (remember that this output means that no configuration was done):

        Configuring for Apache, Version 1.3.26
         + using installation path layout: GNU (config.layout)

        Installation paths:
                       prefix: /usr/local
                  exec_prefix: /usr/local
                       bindir: /usr/local/bin
                      sbindir: /usr/local/sbin
                   libexecdir: /usr/local/libexec
                       mandir: /usr/local/man
                   sysconfdir: /usr/local/etc/httpd
                      datadir: /usr/local/share/httpd
                     iconsdir: /usr/local/share/httpd/icons
                    htdocsdir: /usr/local/share/httpd/htdocs
                       cgidir: /usr/local/share/httpd/cgi-bin
                   includedir: /usr/local/include/httpd
                localstatedir: /usr/local/var/httpd
                   runtimedir: /usr/local/var/httpd/run
                   logfiledir: /usr/local/var/httpd/log
                proxycachedir: /usr/local/var/httpd/proxy

        Compilation paths:
                   HTTPD_ROOT: /usr/local
              SHARED_CORE_DIR: /usr/local/libexec
               DEFAULT_PIDLOG: var/httpd/run/httpd.pid
           DEFAULT_SCOREBOARD: var/httpd/run/httpd.scoreboard
             DEFAULT_LOCKFILE: var/httpd/run/httpd.lock
              DEFAULT_XFERLOG: var/httpd/log/access_log
             DEFAULT_ERRORLOG: var/httpd/log/error_log
            TYPES_CONFIG_FILE: etc/httpd/mime.types
           SERVER_CONFIG_FILE: etc/httpd/httpd.conf
           ACCESS_CONFIG_FILE: etc/httpd/access.conf
         RESOURCE_CONFIG_FILE: etc/httpd/srm.conf
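    Once the layout has been saved to a file, individual paths can be pulled out of it for use in other scripts. This is a minimal sketch of our own, assuming the "name: value" line format of the layout listing; the function name layout_path is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Read a saved --show-layout listing on stdin and print the value
# recorded for the key named in $1 (for example sbindir).
layout_path() {
    awk -v key="$1" '{
        k = $1
        sub(/:$/, "", k)            # strip the trailing colon from the key
        if (k == key) { print $2; exit }
    }'
}

# Typical use, with the file name "layout" as an example:
#   layout_path sbindir < layout
```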

    Since httpd should now be on your path, you can use it to find out what happened by running it, followed by one of a number of flags. Enter httpd -h. You see the following:

        httpd: illegal option -- ?
        Usage: httpd [-D name] [-d directory] [-f file]
                     [-C "directive"] [-c "directive"]
                     [-v] [-V] [-h] [-l] [-L] [-S] [-t] [-T]
        Options:
          -D name          : define a name for use in <IfDefine name> directives
          -d directory     : specify an alternate initial ServerRoot
          -f file          : specify an alternate ServerConfigFile
          -C "directive"   : process directive before reading config files
          -c "directive"   : process directive after reading config files
          -v               : show version number
          -V               : show compile settings
          -h               : list available command line options (this page)
          -l               : list compiled-in modules
          -L               : list available configuration directives
          -S               : show parsed settings (currently only vhost settings)
          -t               : run syntax check for config files (with docroot check)
          -T               : run syntax check for config files (without docroot check)

    A useful flag is httpd -l, which gives a list of compiled-in modules:

        Compiled-in modules:
          http_core.c
          mod_env.c
          mod_log_config.c
          mod_mime.c
          mod_negotiation.c
          mod_status.c
          mod_include.c
          mod_autoindex.c
          mod_dir.c
          mod_cgi.c
          mod_asis.c
          mod_imap.c
          mod_actions.c
          mod_userdir.c
          mod_alias.c
          mod_access.c
          mod_auth.c
          mod_so.c
          mod_setenvif.c
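    If you want a script to check whether a particular module was compiled in, the httpd -l listing can be filtered. The following is a small sketch of our own; the function name has_module is hypothetical, and it assumes the listing has one module source file per line, as shown above.

```shell
#!/bin/sh
# Read an "httpd -l" listing on stdin and succeed (exit status 0) if
# the module source file named in $1 appears on a line by itself.
has_module() {
    # Strip leading whitespace, then require an exact whole-line match.
    sed 's/^[[:space:]]*//' | grep -qxF "$1"
}

# Typical use:
#   httpd -l | has_module mod_so.c && echo "DSO support is compiled in"
```

    Matching the whole line as a fixed string avoids false hits such as mod_auth.c matching inside a longer module name.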

    This list is the result of a build with only one DSO: mod_alias. All the other modules are compiled in, among which we find mod_so to handle the shared object. The compiled shared objects appear in /usr/local/libexec as .so files.

    You will notice that the file /usr/local/etc/httpd/httpd.conf.default has an amazing amount of information in it: an attempt, in fact, to explain the whole of Apache. Since the rest of this book is also an attempt to present the same information in an expanded and digestible form, we do not suggest that you try to read the file with any great attention. However, it has in it a useful list of the directives you will later need to invoke DSOs, if you want to use them.

    In the /usr/src/apache/apache_XX directory you ought to read INSTALL and README.configure for background.

    1.10.2 Semimanual Build Method

    Go to the top directory of the unpacked download (we used /usr/src/apache/apache1_3.26). Start off by reading README. This tells you how to compile Apache. The first thing it wants you to do is to go to the src subdirectory and read INSTALL. To go further, you must have an ANSI C-compliant compiler. Most Unices come with a suitable compiler; if not, GNU gcc works fine.

    If you have downloaded a beta test version, you first have to copy .../src/Configuration.tmpl to Configuration. We then have to edit Configuration to set things up properly. The whole file is in Appendix A of the installation kit. A script called Configure then uses Configuration and Makefile.tmpl to create your operational Makefile. (Don't attack Makefile directly; any editing you do will be lost as soon as you run Configure again.)

    It is usually only necessary to edit the Configuration file to select the permanent modules required (see the next section). Alternatively, you can specify them on the command line. Configure will then automatically identify the version of Unix, the compiler to be used, the compiler flags, and so forth. It certainly all worked for us under FreeBSD without any trouble at all.

    Configuration has five kinds of things in it:

    • Comment lines starting with #
    • Rules starting with the word Rule
    • Commands to be inserted into Makefile, starting with nothing
    • Module selection lines beginning with AddModule, which specify the modules you want compiled and enabled
    • Optional module selection lines beginning with %Module, which specify modules that you want compiled but not enabled until you issue the appropriate directive

    For the moment, we will only be reading the comments and occasionally turning a comment into a command by removing the leading #, or vice versa. Most comments are in front of optional module-inclusion lines to disable them.

    1.10.3 Choosing Modules

    Inclusion of modules is done by uncommenting (removing the leading #) lines in Configuration. The only drawback to including more modules is an increase in the size of your binary and an imperceptible degradation in performance.[12]

    The default Configuration file includes the modules listed here, together with a lot of chat and comment that we have removed for clarity. Modules that are compiled into the Win32 core are marked with "W"; those that are supplied as a standard Win32 DLL are marked "WD." Our final list is as follows:

    AddModule modules/standard/mod_env.o
        Sets up environment variables to be passed to CGI scripts.
    AddModule modules/standard/mod_log_config.o
        Determines logging configuration.
    AddModule modules/standard/mod_mime_magic.o
        Determines the type of a file.
    AddModule modules/standard/mod_mime.o
        Maps file extensions to content types.
    AddModule modules/standard/mod_negotiation.o
        Allows content selection based on Accept headers.
    AddModule modules/standard/mod_status.o (WD)
        Gives access to server status information.
    AddModule modules/standard/mod_info.o
        Gives access to configuration information.
    AddModule modules/standard/mod_include.o
        Translates server-side include statements in CGI texts.
    AddModule modules/standard/mod_autoindex.o
        Indexes directories without an index file.
    AddModule modules/standard/mod_dir.o
        Handles requests on directories and directory index files.
    AddModule modules/standard/mod_cgi.o
        Executes CGI scripts.
    AddModule modules/standard/mod_asis.o
        Implements .asis file types.
    AddModule modules/standard/mod_imap.o
        Executes imagemaps.
    AddModule modules/standard/mod_actions.o
        Specifies CGI scripts to act as handlers for particular file types.
    AddModule modules/standard/mod_speling.o
        Corrects common spelling mistakes in requests.
    AddModule modules/standard/mod_userdir.o
        Selects resource directories by username and a common prefix.
    AddModule modules/proxy/libproxy.o
        Allows Apache to run as a proxy server; should be commented out if not needed.
    AddModule modules/standard/mod_alias.o
        Provides simple URL translation and redirection.
    AddModule modules/standard/mod_rewrite.o (WD)
        Rewrites requested URIs using specified rules.
    AddModule modules/standard/mod_access.o
        Provides access control.
    AddModule modules/standard/mod_auth.o
        Provides authorization control.
    AddModule modules/standard/mod_auth_anon.o (WD)
        Provides FTP-style anonymous username/password authentication.
    AddModule modules/standard/mod_auth_db.o
        Manages a database of passwords; alternative to mod_auth_dbm.o.
    AddModule modules/standard/mod_cern_meta.o (WD)
        Implements metainformation files compatible with the CERN web server.
    AddModule modules/standard/mod_digest.o (WD)
        Implements HTTP digest authentication; more secure than the others.
    AddModule modules/standard/mod_expires.o (WD)
        Applies Expires headers to resources.
    AddModule modules/standard/mod_headers.o (WD)
        Sets arbitrary HTTP response headers.
    AddModule modules/standard/mod_usertrack.o (WD)
        Tracks users by means of cookies. It is not necessary to use cookies.
    AddModule modules/standard/mod_unique_id.o
        Generates an ID for each hit. May not work on all systems.
    AddModule modules/standard/mod_so.o
        Loads modules at runtime. Experimental.
    AddModule modules/standard/mod_setenvif.o
        Sets environment variables based on header fields in the request.

    Here are the modules we commented out, and why:

    # AddModule modules/standard/mod_log_agent.o
        Not relevant here (CERN holdover).
    # AddModule modules/standard/mod_log_referer.o
        Not relevant here (CERN holdover).
    # AddModule modules/standard/mod_auth_dbm.o
        Can't have both this and mod_auth_db.o. Doesn't work with Win32.
    # AddModule modules/example/mod_example.o
        Only for testing APIs (see Chapter 20).

    These are the "standard" Apache modules, approved and supported by the Apache Group as a whole. There are a number of other modules available (see http://modules.apache.org).

    Although we mentioned mod_auth_db.o and mod_auth_dbm.o earlier, they provide equivalent functionality and shouldn't be compiled together.

    We have left out any modules described as experimental. Any disparity between the directives listed in this book and the list obtained by starting Apache with the -h flag is probably caused by the errant directive having moved out of experimental status since we went to press.

    Later on, when we are writing Apache configuration scripts, we can make them adapt to the modules we include or exclude with the IfModule directive. This allows you to give out predefined Config files that always work (in the sense of Apache loading), regardless of what mix of modules is actually compiled. Thus, for instance, we can adapt to the absence of configurable logging with the following:

        ...
        <IfModule mod_log_config.c>
        LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, status %s, bytes %b"
        </IfModule>
        ...
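    Returning to the business of uncommenting module lines in Configuration: if you rebuild often, the edit can be scripted rather than done by hand. The following is a minimal sketch of our own, not a tool supplied with Apache; the function name uncomment_module is hypothetical. It works as a filter, so the original Configuration file is never modified in place.

```shell
#!/bin/sh
# Read a Configuration-style file on stdin and write it to stdout with
# the AddModule line for the module named in $1 (without the .o
# suffix) uncommented. All other lines pass through unchanged.
uncomment_module() {
    sed "s|^#[[:space:]]*\(AddModule[[:space:]].*/$1\.o\)[[:space:]]*\$|\1|"
}

# Typical use (file names are examples only):
#   uncomment_module mod_auth_dbm < Configuration > Configuration.new
```

    Writing to a new file and comparing it with the old one (with diff, say) before running Configure is a cheap way to confirm that only the intended line changed.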

    1.10.4 Shared Objects

    If you want to enable shared objects in this method, see the notes in the Configuration file. Essentially, you do the following:

    1. Enable mod_so by uncommenting its line.
    2. Change an existing AddModule line so that the module file named in it ends in .so rather than .o, making sure, of course, that the path is correct.

    1.10.5 Configuration Settings and Rules

    Most Apache users won't have to bother with this section at all. However, you can specify extra compiler flags (for instance, optimization commands), libraries, or includes by giving values to the following:

        EXTRA_CFLAGS=
        EXTRA_LDFLAGS=
        EXTRA_LIBS=
        EXTRA_INCLUDES=

    Configure will try to guess your operating system and compiler; therefore, unless things go wrong, you won't need to uncomment and give values to these:

        #CC=
        #OPTIM=-O2
        #RANLIB=

    The rules in the Configuration file allow you to adapt for a few exotic configuration problems. The syntax of a rule in Configuration is as follows:

        Rule RULE=value

    The possible values are as follows:

    yes
        Configure does what is required.
    default
        Configure makes a best guess.

    Any other value is ignored. The Rules are as follows:

    STATUS
        If yes, and Configure decides that you are using the status module, then full status information is enabled. If the status module is not included, yes has no effect. This is set to yes by default.
    SOCKS4
        SOCKS is a firewall traversal protocol that requires client-end processing. See http://ftp.nec.com/pub/security/socks.cstc. If set to yes, be sure to add the SOCKS library location to EXTRA_LIBS; otherwise, Configure assumes -L/usr/local/lib -lsocks. This allows Apache to make outgoing SOCKS connections, which is not something it normally needs to do, unless it is configured as a proxy. Although the very latest version of SOCKS is SOCKS5, SOCKS4 clients work fine with it. This is set to no by default.
    SOCKS5
        If you want to use a SOCKS5 client library, you must use this rule rather than SOCKS4. This is set to no by default.
    IRIXNIS
        If Configure decides that you are running SGI IRIX, and you are using NIS, set this to yes. This is set to no by default.
    IRIXN32
        Make IRIX use the n32 libraries rather than the o32 ones. This is set to yes by default.
    PARANOID
        During Configure, modules can run shell commands. If PARANOID is set to yes, it will print out the code that the modules use. This is set to no by default.

    There is a group of rules that Configure will try to set correctly, but that can be overridden. If you have to do this, please advise the Apache Group by filling out a problem report form at http://apache.org/bugdb.cgi or by sending an email to apache-bugs@apache.org. Currently, there is only one rule in this group:

    WANTHSREGEX
        Apache needs to interpret regular expressions using POSIX methods. A good regex package is included with Apache, but you can use your OS version by setting WANTHSREGEX=no or commenting out the rule. The default action depends on your OS:

        Rule WANTHSREGEX=default

    1.10.6 Making Apache

    The INSTALL file in the src subdirectory says that all we have to do now is run the configuration script. Change yourself to root before you run ./configure; otherwise the server will be configured on port 8080 and will, confusingly, refuse requests to the default port, 80. Then type:

        % ./Configure

    You should see something like this, bearing in mind that we're using FreeBSD and you may not be:

        Using config file: Configuration
        Creating Makefile
         + configured for FreeBSD platform
         + setting C compiler to gcc
         + Adding selected modules
            o status_module uses ConfigStart/End:
            o dbm_auth_module uses ConfigStart/End:
            o db_auth_module uses ConfigStart/End:
            o so_module uses ConfigStart/End:
         + doing sanity check on compiler and options
        Creating Makefile in support
        Creating Makefile in main
        Creating Makefile in ap
        Creating Makefile in regex
        Creating Makefile in os/unix
        Creating Makefile in modules/standard
        Creating Makefile in modules/proxy

    Then type:

        % make

    When you run make, the compiler is set in motion using the makefile built by Configure, and streams of reassuring messages appear on the screen. However, things may go wrong that you have to fix, although this situation can appear more alarming than it really is. For instance, in an earlier attempt to install Apache on an SCO machine, we received the following compile error: Cannot open include file 'sys/socket.h'

    Clearly (since sockets are very TCP/IP-intensive), this had to do with TCP/IP, which we had not installed: we did so. Not that this is a big deal, but it illustrates the sort of minor problem that arises. Not everything turns up where it ought to. If you find something that really is not working properly, it is sensible to make a bug report via the Bug Report link in the Apache Server Project main menu. But do read the notes there. Make sure that it is a real bug, not a configuration problem, and look through the known bug list first so as not to waste everyone's time. The result of make was the executable httpd. If you run it with: % ./httpd

    it complains that it: could not open document config file /usr/local/etc/httpd/conf/httpd.conf

    This is not surprising because, at the moment, httpd.conf, which we call the Config file, doesn't exist. Before we are finished, we will become very familiar with this file. It is perhaps unfortunate that it has a name so similar to the Configuration file we have been dealing with here, because it is quite different. We hope that the difference will become apparent later on. The last step is to copy httpd to a suitable storage directory that is on your path. We use /usr/local/bin or /usr/local/sbin.

    1.11 New Features in Apache v2

    The procedure for configuring and compiling Apache has changed, as we will see later. High-level decisions about the way Apache works internally can now be made at compile time by including one of a series of Multi Processing Modules (MPMs). This is done by attaching a flag to configure:

        ./configure --with-mpm=<name of MPM>

    Although MPMs are rather like ordinary modules, only one can be used at a time. Some of them are designed to adapt Apache to different operating systems; others offer a range of different optimizations for Unix. The MPM in use is shown, along with the other compiled-in modules, by executing httpd -l. When we went to press, these were the possible MPMs under Unix:

    prefork
        Default. Most closely imitates behavior of v1.3. Currently the default for Unix and sites that require stability, though we hope that threading will become the default later on.
    threaded
        Suitable for sites that require the benefits brought by threading, particularly reduced memory footprint and improved interthread communications. But see "prefork" earlier in this list.
    perchild
        Allows different hosts to have different user IDs.
    mpmt_pthread
        Similar to prefork, but each child process has a specified number of threads. It is possible to specify a minimum and maximum number of idle threads.
    Dexter
        Multiprocess, multithreaded MPM that allows you to specify a static number of processes.
    Perchild
        Similar to Dexter, but you can define a separate user and group for each child process to increase server security.

    Other operating systems have their own MPMs:

    spmt_os2
        For OS2.
    beos
        For the Be OS.
    WinNT
        Win32-specific version, taking advantage of completion ports and native function calls to give better network performance.

    To begin with, accept the default MPM. More advanced users should refer to http://httpd.apache.org/docs-2.0/mpm.html and http://httpd.apache.org/docs-2.0/misc/perf-tuning.html. See the entry for the AcceptMutex directive in Chapter 3.

    1.11.1 Config File Changes in v2

    Version 2.0 makes the following changes to the Config file:

    • CacheNegotiatedDocs now takes the argument on/off. Existing instances of CacheNegotiatedDocs should be given the argument on.
    • An ErrorDocument text message now needs quotes around the whole of the text, not just at the start.
    • The AccessConfig and ResourceConfig directives have been abolished. If you want to use these files, replace them by

          Include conf/srm.conf
          Include conf/access.conf

      in that order, and at the end of the Config file.
    • The BindAddress directive has been abolished. Use Listen.
    • The ExtendedStatus directive has been abolished.
    • The ServerType directive has been abolished.
    • The AgentLog, ReferLog, and ReferIgnore directives have been removed along with the mod_log_agent and mod_log_referer modules. Agent and referer logs are still available using the CustomLog directive.
    • The AddModule and ClearModule directives have been abolished. A very useful point is that Apache v2 does not care about the order in which DSOs are loaded.

    1.11.2 httpd Command-Line Changes

    Running the v2 httpd with the flag -h to show the possible command-line flags produces this:

        Usage: ./httpd [-D name] [-d directory] [-f file]
                       [-C "directive"] [-c "directive"]
                       [-v] [-V] [-h] [-l] [-L] [-t] [-T]
        Options:
          -D name           : define a name for use in <IfDefine name> directives
          -d directory      : specify an alternate initial ServerRoot
          -f file           : specify an alternate ServerConfigFile
          -C "directive"    : process directive before reading config files
          -c "directive"    : process directive after reading config files
          -v                : show version number
          -V                : show compile settings
          -h                : list available command line options (this page)
          -l                : list compiled in modules
          -L                : list available configuration directives
          -t -D DUMP_VHOSTS : show parsed settings (currently only vhost settings)
          -t                : run syntax check for config files (with docroot check)
          -T                : run syntax check for config files (without docroot check)

    In particular, the -X flag has been removed. You can get the same effect (running a single copy of Apache without any children being generated) with this:

        httpd -D ONE_PROCESS

    or:

        httpd -D NO_DETACH

    depending on the MPM used. The available flags for each MPM will be visible on running httpd with -?.

    1.11.3 Module Changes in v2

    Version 2.0 makes the following changes to module handling:

    • mod_auth_digest is now a standard module in v2.
    • mod_mmap_static, which was experimental in v1.3, has been replaced by mod_file_cache.
    • Third-party modules written for Apache v1.3 will not work with v2 since the API has been completely rewritten. See Chapter 20 and Chapter 21.

    1.12 Making and Installing Apache v2 Under Unix

    Disregard all the previous instructions for Apache compilation. There is no longer a .../src directory. Even the name of the Unix source file has changed. We downloaded httpd-2_0_40.tar.gz and unpacked it in /usr/src/apache as usual. You should read the file INSTALL. The scheme for building Apache v2 is now much more in line with that for most other downloaded packages and utilities.

    Set up the configuration file with this:

        ./configure --prefix=/usr/local

    or wherever it is you want to keep the Apache bits, which will appear in various subdirectories. The executable, for instance, will be in .../sbin. If you are compiling under FreeBSD, as we were, --with-mpm=prefork is automatically used internally, since threads do not currently work well under this operating system.

    To see all the configuration possibilities:

        ./configure --help | more

    If you want to preserve your Apache 1.3.X executable, you might rename it to httpd.13, wherever it is, and then:

        make

    which takes a surprising amount of time to run. Then:

        make install

    The result is a nice new httpd in /usr/local/sbin.
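    Since the v2 build is just three commands, it is easy to wrap them in a script that records the options you used. The sketch below is our own illustration: the names run and build_apache2, the DRY_RUN variable, and the prefix /opt/apache2 in the usage comment are all hypothetical. With DRY_RUN=1 the commands are only printed, which lets you check them before committing to a long compile.

```shell
#!/bin/sh
# Run a command, or just print it prefixed with "+" when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Configure, build, and install from the current source directory.
# $1 is the installation prefix; it defaults to /usr/local as above.
# The && chain stops at the first step that fails.
build_apache2() {
    prefix=${1:-/usr/local}
    run ./configure --prefix="$prefix" &&
    run make &&
    run make install
}

# Typical use, from the top of the unpacked source tree:
#   DRY_RUN=1; build_apache2 /opt/apache2    # preview only
#   DRY_RUN=0; build_apache2                 # real build
```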

    1.13 Apache Under Windows

    Apache 1.3 will work under Windows NT 4.0 and 2000. Its performance under Windows 95 and 98 is not guaranteed. If running on Windows 95, the "Winsock2" upgrade must be installed before Apache will run. "Winsock2" for Windows 95 is available at http://www.microsoft.com/windows95/downloads/contents/WUAdminTools/S_WUNetworkingTools/W95Sockets2. Be warned that the Dialup Networking 1.2 (MS DUN) updates include a Winsock2 that is entirely insufficient, and the Winsock2 update must be reinstalled after installing Windows 95 dialup networking. Windows 98, NT (Service Pack 3 or later), and 2000 users need to take no special action; those versions provide Winsock2 as distributed.

    Apache v2 will run under Windows 2000 and NT, but, when we went to press, it did not work under Win 95, 98, or Me. These different versions are the same as far as Apache is concerned, except that under NT, Apache can also be run as a service. From Apache v1.3.14, emulators are available to provide NT services under the other Windows platforms. Performance under Win32 may not be as good as under Unix, but this will probably improve over coming months.

    Since Win32 is considerably more consistent than the sprawling family of Unices, and since it loads extra modules as DLLs at runtime rather than compiling them at make time, it is practical for the Apache Group to offer a precompiled binary executable as the standard distribution. Go to http://www.apache.org/dist, and click on the version you want, which will be in the form of a self-installing .exe file (the .exe extension is how you tell which one is the Win32 Apache). Download it into, say, c:\temp, and then run it from the Win32 Start menu's Run option.

    The executable will create an Apache directory, C:\Program Files\Apache, by default. Everything to do with Win32 Apache happens in an MS-DOS window, so get into a window and type:

        > cd c:\
        > dir

    and you should see something like this:

        Volume in drive C has no label
        Volume Serial Number is 294C-14EE
        Directory of C:\apache

        .              <DIR>        21/05/98   7:27 .
        ..             <DIR>        21/05/98   7:27 ..
        DEISL1   ISU        12,818  29/07/98  15:12 DeIsL1.isu
        HTDOCS         <DIR>        29/07/98  15:12 htdocs
        MODULES        <DIR>        29/07/98  15:12 modules
        ICONS          <DIR>        29/07/98  15:12 icons
        LOGS           <DIR>        29/07/98  15:12 logs
        CONF           <DIR>        29/07/98  15:12 conf
        CGI-BIN        <DIR>        29/07/98  15:12 cgi-bin
        ABOUT_~1            12,921  15/07/98  13:31 ABOUT_APACHE
        ANNOUN~1             3,090  18/07/98  23:50 Announcement
        KEYS                22,763  15/07/98  13:31 KEYS
        LICENSE              2,907  31/03/98  13:52 LICENSE
        APACHE   EXE         3,072  19/07/98  11:47 Apache.exe
        APACHE~1 DLL       247,808  19/07/98  12:11 ApacheCore.dll
        MAKEFI~1 TMP        21,025  15/07/98  18:03 Makefile.tmpl
        README               2,109  01/04/98  13:59 README
        README~1 TXT         2,985  30/05/98  13:57 README-NT.TXT
        INSTALL  DLL        54,784  19/07/98  11:44 install.dll
        _DEISREG ISR           147  29/07/98  15:12 _DEISREG.ISR
        _ISREG32 DLL        40,960  23/04/97   1:16 _ISREG32.DLL
                13 file(s)        427,389 bytes
                 8 dir(s)     520,835,072 bytes free

    Apache.exe is the executable, and ApacheCore.dll is the meat of the thing. The important subdirectories are as follows:

    conf
        Where the Config file lives.
    logs
        Where the logs are kept.
    htdocs
        Where you put the material your server is to give clients. The Apache manual will be found in a subdirectory.
    modules
        Where the runtime loadable DLLs live.

    After 1.3b6, the installer leaves alone your original versions of files in these subdirectories, while creating new ones with the added extension .default, which you should look at. We will see what to do with all of this in the next chapter.

    See the file README-NT.TXT for current problems.

    1.13.1 Modules Under Windows

    Under Windows, Apache is normally downloaded as a precompiled executable. The core modules are compiled in, and others are loaded as .so files at runtime (if needed), so control of the executable's size is less urgent. The DLLs supplied (they really are called .so and not .dll) in the .../apache/modules subdirectory are as follows:

        mod_auth_anon.so       mod_auth_dbm.so        mod_auth_digest.so
        mod_cern_meta.so       mod_dav.so             mod_dav_fs.so
        mod_expires.so         mod_file_cache.so      mod_headers.so
        mod_info.so            mod_mime_magic.so      mod_proxy.so
        mod_rewrite.so         mod_speling.so         mod_status.so
        mod_unique_id.so       mod_usertrack.so       mod_vhost_alias.so
        mod_proxy_connect.so   mod_proxy_ftp.so       mod_proxy_http.so
        mod_access.so          mod_actions.so         mod_alias.so
        mod_asis.so            mod_auth.so            mod_autoindex.so
        mod_cgi.so             mod_dir.so             mod_env.so
        mod_imap.so            mod_include.so         mod_isapi.so
        mod_log_config.so      mod_mime.so            mod_negotiation.so
        mod_setenvif.so        mod_userdir.so

    What these are and what they do will become more apparent as we proceed.

    1.13.2 Compiling Apache Under Win32

    The advanced user who wants to write her own modules (see Chapter 21) will need the source code. This can be installed with the Win32 version by choosing Custom installation. It can also be downloaded from the nearest mirror Apache site (start at http://apache.org/) as a .tar.gz file containing the normal Unix distribution. In addition, it can be unpacked into an appropriate source directory using, for instance, 32-bit WinZip, which deals with .tar and .gz format files, as well as .zip. You will also need Microsoft's Visual C++ Version 6. Scripts are available for users of MSVC v5, since the changes are not backwards compatible. Once the sources and compiler are in place, open an MS-DOS window, and go to the Apache src directory. Build a debug version, and install it into \Apache by typing:

        > nmake /f Makefile.nt _apached
        > nmake /f Makefile.nt installd

    or build a release version by typing:

        > nmake /f Makefile.nt _apacher
        > nmake /f Makefile.nt installr

    This will build and install the following files in and below \Apache\:

    Apache.exe
        The executable
    ApacheCore.dll
        The main shared library
    Modules\ApacheModule*.dll
        Seven optional modules
    \conf
        Empty config directory
    \logs
        Empty log directory

    The directives described in the rest of the book are the same for both Unix and Win32, except that Win32 Apache can load module DLLs. They need to be activated in the Config file by the LoadModule directive. For example, if you want status information, you need the line:

        LoadModule status_module modules/ApacheModuleStatus.dll

    Apache for Win32 can also load Internet Server Applications (ISAPI extensions). Notice that wherever filenames are relevant in the Config file, the Win32 version uses forward slashes (/) as in Unix, rather than backslashes (\) as in MS-DOS or Windows. Since almost all the rest of the book applies to both Win32 and Unix without distinction between them, we will use forward slashes (/) in filenames wherever they occur.

    [1] Note that since a URL has no predefined meaning, this really is just a tradition, though a pretty well entrenched one in this case.

    [2] We generally follow the convention of calling these people the Bad Guys. This avoids debate about "hackers," which to many people simply refers to good programmers, but to some means Bad Guys. We discover from the French edition of this book that in France they are Sales Types -- dirty fellows.

    [3] For more on the open source movement, see Open Sources: Voices from the Open Source Revolution (O'Reilly & Associates, 1999).

    [4] Netcraft also surveys the uptime of various sites. At the time of writing, the longest running site was http://wwwprod1.telia.com, which had been up for 1,386 days.

    [5] This double name is rather annoying, but it seems that life has progressed too far for anything to be done about it. We will, rather clumsily, refer to httpd/apache and hope that the reader can pick the right one.

    [6] Windows NT TCP/IP Network Administration, by Craig Hunt and Robert Bruce Thompson (O'Reilly & Associates, 1998), and TCP/IP Network Administration, Third Edition, by Craig Hunt (O'Reilly & Associates, 2002).

    [7] In the minimal case we could have two programs running on the same computer talking to each other via TCP/IP: the network is "virtual".

    [8] The operating-system prompt is likely to be ">" (Win95) or "%" (Unix). When we say, for instance, "Type % ping," we mean, "When you see '%', type 'ping'."

    [9] Usually. We'll see later that some URLs may refer to information generated completely within Apache.

    [10] It is best to download it, so you get the latest version with all its bug fixes and security patches.

    [11] New is a dirty four letter word in computing.

    [12] Assuming the module has been carefully written, it does very little unless enabled in the httpd.conf files.

    Chapter 2. Configuring Apache: The First Steps

    • 2.1 What's Behind an Apache Web Site?
    • 2.2 site.toddle
    • 2.3 Setting Up a Unix Server
    • 2.4 Setting Up a Win32 Server
    • 2.5 Directives
    • 2.6 Shared Objects

    After the installation described in Chapter 1, you now have a shiny bright apache/httpd, and you're ready for anything. For our next step, we will be creating a number of demonstration web sites.

    2.1 What's Behind an Apache Web Site?

    It might be a good idea to get a firm idea of what, in the Apache business, a web site is: it is a directory somewhere on the server, say, /usr/www/APACHE3/site.for_instance. It usually contains at least four subdirectories. The first three are essential:

    conf
        Contains the Config file, usually httpd.conf, which tells Apache how to respond to different kinds of requests.
    htdocs
        Contains the documents, images, data, and so forth that you want to serve up to your clients.
    logs
        Contains the log files that record what happened. You should consult .../logs/error_log whenever anything fails to work as expected.
    cgi-bin
        Contains any CGI scripts that are needed. If you don't use scripts, you don't need the directory.

    In our standard installation, there will also be a file go in the site directory, which contains a script for starting Apache.

    Nothing happens until you start Apache. In this example, you do it from the command line. If your computer experience so far has been entirely with Windows or other Graphical User Interfaces (GUIs), you may find the command line rather stark and intimidating to begin with. However, it offers a great deal of flexibility and something which is often impossible through a GUI: the ability to write scripts (Unix) or batch files (Win32) to automate the executables you want to run and the inputs they need, as we shall see later.

    2.1.1 Running Apache from the Command Line

    If the conf subdirectory is not in the default location (and it usually isn't), you need a flag that tells Apache where it is.

    httpd -d /usr/www/APACHE3/site.for_instance -f...

    apache -d c:/usr/www/APACHE3/site.for_instance

    Notice that the executable names are different under Win32 and Unix. The Apache Group decided to make this change, despite the difficulties it causes for documentation, because "httpd" is not a particularly sensible name for a specific web server and, indeed, is used by other web servers. However, it was felt that the name change would cause too many backward-compatibility issues on Unix, and so the new name is implemented only on Win32.

    Also note that the Win32 version still uses forward slashes rather than backslashes. This is because Apache internally uses forward slashes on all platforms; therefore, you should never use a backslash in an Apache Config file, regardless of the operating system.

    Once you start the executable, Apache runs silently in the background, waiting for a client's request to arrive on a port to which it is listening. When a request arrives, Apache either does its thing or fouls up and makes a note in the log file.

    What we call "a site" here may appear to the outside world as hundreds of sites, because the Config file can invoke many virtual hosts.

    When you are tired of the whole Web business, you kill Apache (see Section 2.3, later in this chapter), and the computer reverts to being a doorstop.

    Various issues arise in the course of implementing this simple scheme, and the rest of this book is an attempt to deal with some of them. As we pointed out in the preface, running a web site can involve many questions far outside the scope of this book. All we deal with here is how to make Apache do what you want. We often have to leave the questions of what you want to do and why you might want to do it to a higher tribunal.

    httpd (or apache) takes the following flags. (This is information you can evoke by running httpd -h):

        Usage: httpd.20 [-D name] [-d directory] [-f file]
                        [-C "directive"] [-c "directive"]
                        [-v] [-V] [-h] [-l] [-L] [-t] [-T]
        Options:
          -D name            : define a name for use in directives
          -d directory       : specify an alternate initial ServerRoot
          -f file            : specify an alternate ServerConfigFile
          -C "directive"     : process directive before reading config files
          -c "directive"     : process directive after reading config files
          -v                 : show version number
          -V                 : show compile settings
          -h                 : list available command line options (this page)
          -l                 : list compiled in modules
          -L                 : list available configuration directives
          -t -D DUMP_VHOSTS  : show parsed settings (currently only vhost settings)
          -t                 : run syntax check for config files (with docroot check)
          -T                 : run syntax check for config files (without docroot check)

          -i : Installs Apache as an NT service.
          -u : Uninstalls Apache as an NT service.
          -s : Under NT, prevents Apache registering itself as an NT service. If you are running under Win95 this flag does not seem essential, but it would be advisable to include it anyway. This flag should be used when starting Apache from the command line, but it is easy to forget because nothing goes wrong if you leave it out. The main advantage is a faster startup (omitting it causes a 30-second delay).
          -k shutdown|restart : Run in another console window, apache -k shutdown stops Apache gracefully, and apache -k restart stops it and restarts it gracefully.

    The Apache Group seems to put in extra flags quite often, so it is worth experimenting with apache -? (or httpd -?) to see what you get.

    2.2 site.toddle

    You can't do much with Apache without a web site to play with. To embody our first shaky steps, we created site.toddle as a subdirectory, /usr/www/APACHE3/site.toddle, which you will find on the code download. Since you may want to keep your demonstration sites somewhere else, we normally refer to this path as ... /. So we will talk about ... /site.toddle. (Windows users, please read this as ...\site.toddle).

    In ... /site.toddle, we created the three subdirectories that Apache expects: conf, logs, and htdocs. The README file in Apache's root directory states:

        The next step is to edit the configuration files for the server. In the subdirectory called conf you should find distribution versions of the three configuration files: srm.conf-dist, access.conf-dist, and httpd.conf-dist.

    As a legacy from the NCSA server, Apache will accept these three Config files. But we strongly advise you to put everything you need in httpd.conf and to delete the other two. It is much easier to manage the Config file if there is only one of them. From Apache v1.3.4-dev on, this has become Group doctrine. In earlier versions of Apache, it was necessary to disable these files explicitly once they were deleted, but in v1.3 it is enough that they do not exist.

    The README file continues with advice about editing these files, which we will disregard. In fact, we don't have to set about this job yet; we will learn more later. A simple expedient for now is to run Apache with no configuration and to let it prompt us for what it needs.
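    The directory layout described above can be set up with a few lines of shell. This is only a sketch: make_site is a hypothetical helper name, not something shipped with Apache, and the optional cgi argument is our own convention for adding the cgi-bin directory.

```shell
#!/bin/sh
# make_site DIR [cgi]: create the subdirectories a site needs under DIR.
# "make_site" is a hypothetical helper, not part of the Apache distribution.
make_site() {
    site=$1
    # conf, htdocs and logs are the three essential subdirectories.
    for sub in conf htdocs logs; do
        mkdir -p "$site/$sub" || return 1
    done
    # cgi-bin is only needed if the site uses CGI scripts.
    if [ "${2-}" = cgi ]; then
        mkdir -p "$site/cgi-bin" || return 1
    fi
}
```

    For example, make_site /usr/www/APACHE3/site.toddle would create the three directories used in this section.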

    The Configuration File

    Before we start running Apache with no configuration, we would like to say a few words about the philosophy of the Configuration File. Apache comes with a huge file that, as we observe elsewhere, tries to tell you every possible thing the user might need to know about Apache. If you are new to the software, a vast amount of this will be gibberish to you. However, many Apache users modify this file to adapt it to their needs. We feel that this is a VERY BAD IDEA INDEED. The file is so complicated to start with that it is very hard to see what to do. It is all too easy to make amendments and then to forget what you have done. The resulting mess then stays around, perhaps for years, being teamed with possibly incompatible Apache updates, until it finally stops working altogether. It is then very difficult to disentangle your input from the absolute original (which you probably have not kept and is now unobtainable). It is much better to start with a completely minimal file and add to it only what is absolutely necessary.

    The set-up process for Unix and Windows systems is quite different, so they are described in two separate sections as follows. If you're using Unix, read on; if not, skip to Section 2.4 later in this chapter.

    2.3 Setting Up a Unix Server

    We can point httpd at our site with the -d flag (notice the full pathname to the site.toddle directory, which will probably be different on your machine):

        % httpd -d /usr/www/APACHE3/site.toddle

    Since you will be typing this a lot, it's sensible to copy it into a script called go. This can go in /usr/local/bin or in each local site. We have done the latter since it is convenient to change it slightly from time to time. Create it by typing:

        % cat > /usr/local/bin/go
        test -d logs || mkdir logs
        httpd -f `pwd`/conf/httpd$1.conf -d `pwd`
        ^d

    ^d is shorthand for Ctrl-D, which ends the input and gets your prompt back. This go will work on every site. It creates a logs directory if one does not exist, and it explicitly specifies paths for the ServerRoot directory (-d) and the Config file (-f). The command `pwd` finds the current directory with the Unix command pwd. The back-ticks are essential: they substitute pwd's value into the script. In other words, we will run Apache with whatever configuration is in our current directory.

    To accommodate sites where we have more than one Config file, we have used ...httpd$1... where you might expect to see ...httpd... The symbol $1 copies the first argument (if any) given to the command go. Thus ./go 2 will run the Config file called httpd2.conf, and ./go by itself will run httpd.conf.

    Remember that you have to be in the site directory. If you try to run this script from somewhere else, pwd's return will be nonsense, and Apache will complain that it 'could not open document config file ...'.

    Make go runnable, and run it by typing the following (note that you have to be in the directory .../site.toddle when you run go):

        % chmod +x go
        % go

    If you get the error message: go: command not found

    you need to type: % ./go
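    If you want to see what go will actually hand to httpd before starting a server, the same substitutions can be tried on their own. The sketch below only prints the command line; go_command is a hypothetical name, and ${1-} is used in place of $1 purely so the function also behaves in shells run with unset-variable checking.

```shell
#!/bin/sh
# go_command [SUFFIX]: print the httpd command line that the go script
# would run from the current directory, without starting anything.
# "go_command" is a hypothetical name used only for illustration.
go_command() {
    here=`pwd`    # back-ticks substitute the output of pwd
    # ${1-} is the first argument, or nothing if none was given.
    echo "httpd -f $here/conf/httpd${1-}.conf -d $here"
}
```

    Run from the site directory, go_command 2 prints a command line naming conf/httpd2.conf, and go_command with no argument names conf/httpd.conf.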

    Running go launches Apache in the background. Check that it's running by typing something like this (arguments to ps vary from Unix to Unix):

    % ps -aux

    This Unix utility lists all the processes running, among which you should find several httpds.[1]

    Sooner or later, you have finished testing and want to stop Apache. To do this, you have to get the process identity (PID) of the program httpd using ps -aux:

        USER     PID %CPU %MEM   VSZ  RSS TT   STAT STARTED    TIME COMMAND
        root     701  0.0  0.8   396  240 v0   R+    2:49PM 0:00.00 ps -aux
        root       1  0.0  0.9   420  260 ??   Is    8:13AM 0:00.02 /sbin/init --
        root       2  0.0  0.0     0    0 ??   DL    8:13AM 0:00.04 (pagedaemon)
        root       3  0.0  0.0     0    0 ??   DL    8:13AM 0:00.00 (vmdaemon)
        root       4  0.0  0.0     0    0 ??   DL    8:13AM 0:02.24 (syncer)
        root      35  0.0  0.3   204   84 ??   Is    8:13AM 0:00.00 adjkerntz -i
        root      98  0.0  1.8   820  524 ??   Is    7:13AM 0:00.43 syslogd
        daemon   107  0.0  1.3   820  384 ??   Is    7:13AM 0:00.00 /usr/sbin/portma
        root     139  0.0  2.1   888  604 ??   Is    7:13AM 0:00.07 inetd
        root     142  0.0  2.0   980  592 ??   Ss    7:13AM 0:00.27 cron
        root     146  0.0  3.2  1304  936 ??   Is    7:13AM 0:00.25 sendmail: accept
        root     209  0.0  1.0   500  296 con- I     7:13AM 0:00.02 /bin/sh /usr/loc
        root     238  0.0  5.8 10996 1676 con- I     7:13AM 0:00.09 /usr/local/libex
        root     239  0.0  1.1   460  316 v0   Is    7:13AM 0:00.09 -csh (csh)
        root     240  0.0  1.2   460  336 v1   Is    7:13AM 0:00.07 -csh (csh)
        root     241  0.0  1.2   460  336 v2   Is    7:13AM 0:00.07 -csh (csh)
        root     251  0.0  1.7  1052  484 v0   S     7:14AM 0:00.32 bash
        root     576  0.0  1.8  1048  508 v1   I     2:18PM 0:00.07 bash
        root     618  0.0  1.7  1040  500 v2   I     2:22PM 0:00.04 bash
        root     627  0.0  2.2   992  632 v2   I+    2:22PM 0:00.02 mince demo_test
        root     630  0.0  2.2   992  636 v1   I+    2:23PM 0:00.06 mince home
        root     694  0.0  6.7  2548 1968 ??   Ss    2:47PM 0:00.03 httpd -d /u
        webuser  695  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
        webuser  696  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
        webuser  697  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
        webuser  698  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
        webuser  699  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u

    To kill Apache, you need to find the PID of the main copy of httpd and then do kill; the child processes will die with it. In the previous example the process to kill is 694, the copy of httpd that belongs to root. The command is this:

        % kill 694
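    Picking the right line out of a long ps listing by eye is error-prone, so the selection can be handed to awk. The following is a sketch under two assumptions: the listing has the eleven BSD-style columns shown above (so the command name is field 11), and parent_httpd_pid is a hypothetical name of our own.

```shell
#!/bin/sh
# parent_httpd_pid: read "ps aux"-style output on standard input and print
# the PID of each httpd process that belongs to root.
# Assumes the 11-column layout shown above: USER is field 1, PID is field 2
# and the command name is field 11. "parent_httpd_pid" is a hypothetical name.
parent_httpd_pid() {
    awk '$1 == "root" && ($11 == "httpd" || $11 ~ /\/httpd$/) { print $2 }'
}

# Typical use (commented out so that nothing runs when this file is sourced):
#   ps aux | parent_httpd_pid
```

    Matching on the command field rather than on the whole line avoids picking up an unrelated process, such as a grep for httpd, that merely mentions the name in its arguments.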

    If ps -aux produces more printout than will fit on a screen, you can tame it with ps aux | more; hit Return to see another line or Space to see another screen. It is important to make sure that the Apache process is properly killed because you can quite easily kill a child process by mistake and then start a new copy of the server with its children (and a different Config file or Perl scripts) and so get yourself into a royal muddle. To get just the lines from ps that you want, you can use:

        ps awlx | grep httpd

    On Linux:

        killall httpd

    Alternatively and better, since it is less prone to finger trouble, Apache writes its PID in the file ... /logs/httpd.pid (by default; see the PidFile directive), and you can write yourself a little script, as follows:

        kill `cat /usr/www/APACHE3/site.toddle/logs/httpd.pid`

    You may prefer to put more generalized versions of these scripts somewhere on your path. stop looks like this:

        path=`pwd`
        kill `cat $path/logs/httpd.pid`
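    A slightly more defensive version of the stop idea is sketched below. It assumes the default PID file location, logs/httpd.pid under the site directory, as described above; stop_site is a hypothetical name, and the checks it makes are our own additions rather than anything Apache requires.

```shell
#!/bin/sh
# stop_site [DIR]: stop the server started from DIR (default: the current
# directory) by signalling the PID recorded in DIR/logs/httpd.pid.
# "stop_site" is a hypothetical name; the PID file path is the default
# location described in the text and changes if PidFile is set elsewhere.
stop_site() {
    pidfile=${1:-.}/logs/httpd.pid
    if [ ! -r "$pidfile" ]; then
        echo "stop_site: cannot read $pidfile" >&2
        return 1
    fi
    pid=`cat "$pidfile"`
    # kill -0 sends no signal; it only tests that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
    else
        echo "stop_site: no running process with PID $pid" >&2
        return 1
    fi
}
```

    Because the site directory is an argument, the same function can be used from anywhere, which avoids the problem of pwd returning the wrong directory.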

    Or, if you don't plan to mess with many different configurations, use .../src/support/apachectl to start and stop Apache in the default directory. You might want to copy it into /usr/local/bin to get it onto the path, or add $apacheinstalldir/bin to your path. It uses the following flags:

        usage: ./apachectl (start|stop|restart|fullstatus|status|graceful|configtest|help)

    start
        Start httpd.
    stop
        Stop httpd.
    restart
        Restart httpd if running by sending a SIGHUP or start if not running.
    fullstatus
        Dump a full status screen; requires lynx and mod_status enabled.
    status
        Dump a short status screen; requires lynx and mod_status enabled.
    graceful
        Do a graceful restart by sending a SIGUSR1 or start if not running.
    configtest
        Do a configuration syntax test.
    help
        This screen.

    When we typed ./go, nothing appeared to happen, but when we looked in the logs subdirectory, we found a file called error_log with the entry:

        []:'mod_unique_id: unable to get hostbyname ("myname.my.domain")

    In our case, this problem was due to the odd way we were running Apache, and it will only affect you if you are running on a host with no DNS or on an operating system that has difficulty determining the local hostname. The solution was to edit the file /etc/hosts and add the line:

        10.0.0.2 myname.my.domain myname

    where 10.0.0.2 is the IP number we were using for testing. However, our troubles were not yet over. When we reran httpd, we received the following error message:

        []--couldn't determine user name from uid

    This means more than might at first appear. We had logged in as root. Because of the security worries of letting outsiders log in with superuser powers, Apache, having been started with root permissions so that it can bind to port 80, has attempted to change its user ID to -1. On many Unix systems, this ID corresponds to the user nobody: a supposedly harmless user. However, it seems that FreeBSD does not understand this notion, hence the error message.[2] In any case, it really isn't a great idea to allow Apache to run as nobody (or any other shared user), because if you are running several different services (ftp, mail, etc.) on the same machine, you run the risk of an attacker exploiting the fact that they all share the same user.

    2.3.1 webuser and webgroup

    The remedy is to create a new user, called webuser, belonging to webgroup. The names are unimportant. The main thing is that this user should be in a group of its own and should not actually be used by anyone for anything else. On most Unix systems, create the group first by running adduser -group webgroup then the user by running adduser. You will be asked for passwords for both. If the system insists on a password, use some obscure non-English string like cQuycn75Vg. Ideally, you should make sure that the newly created user cannot actually log in; how this is achieved varies according to operating system: you may have to replace the encrypted password in /etc/passwd, or remove the home directory, or perhaps something else. Having told the operating system about this user, you now have to tell Apache. Edit the file httpd.conf to include the following lines:

        User webuser
        Group webgroup

    The following are the interesting directives.

    2.3.1.1 User

    The User directive sets the user ID under which the server will run when answering requests.

        User unix-userid
        Default: User #-1
        Server config, virtual host

    In order to use this directive, the standalone server must be run initially as root. unix-userid is one of the following:

    username
        Refers to the given user by name
    #usernumber
        Refers to a user by his number

    The user should have no privileges that allow access to files not intended to be visible to the outside world; similarly, the user should not be able to execute code that is not meant for httpd requests. However, the user must have access to certain things — the files it serves, for example, or mod_proxy 's cache, when enabled (see the CacheRoot directive in Chapter 9). If you start the server as a non-root user, it will fail to change to the lesser-privileged user and will instead continue to run as that original user. If you start the server as root, then it is normal for the parent process to remain running as root.

    Don't set User (or Group) to root unless you know exactly what you are doing and what the dangers are.

    2.3.1.2 Group

    The Group directive sets the group under which the server will answer requests.

        Group unix-group
        Default: Group #-1
        Server config, virtual host

    To use this directive, the standalone server must be run initially as root. unix-group is one of the following:

    groupname
        Refers to the given group by name
    #groupnumber
        Refers to a group by its number

    It is recommended that you set up a new group specifically for running the server. Some administrators use group nobody, but this is not always possible or desirable, as noted earlier. If you start the server as a non-root user, it will fail to change to the specified group and will instead continue to run as the group of the original user.

    Now, when you run httpd and look for the PID, you will find that one copy belongs to root, and several others belong to webuser. Kill the root copy and the others will vanish.
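    Before starting the server it can be worth confirming what the Config file actually says about User and Group. The sketch below reads the two directives with awk and complains if requests would be answered as root. conf_value and check_run_as are hypothetical helper names, the parsing is deliberately naive (it ignores virtual host sections and any included files), and "#0" is treated as root because user number 0 is the superuser on Unix.

```shell
#!/bin/sh
# conf_value FILE DIRECTIVE: print the argument of the last uncommented
# occurrence of DIRECTIVE in FILE. Directive names are matched without
# regard to case. Hypothetical helper; it does not follow Include files
# or distinguish <VirtualHost> sections.
conf_value() {
    awk -v want="$2" '
        tolower($1) == tolower(want) { val = $2 }
        END { if (val != "") print val }
    ' "$1"
}

# check_run_as FILE: report the User and Group the server would switch to,
# and fail if User is missing or names the superuser.
check_run_as() {
    user=`conf_value "$1" User`
    group=`conf_value "$1" Group`
    case "$user" in
        "")        echo "User is not set; the default (#-1) will be used"
                   return 1 ;;
        root|"#0") echo "warning: requests would be answered as root"
                   return 1 ;;
    esac
    echo "server will run as user $user, group ${group:-unset}"
}
```

    With the two lines from Section 2.3.1 in place, check_run_as conf/httpd.conf reports webuser and webgroup.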

    2.3.2 "Out of the Box" Default Problems

    We found that when we built Apache "out of the box" using a GNU layout, some file defaults were not set up properly. If when you run ./go you get the rather odd error message on the screen:

        fopen: No such file or directory
        httpd: could not open error log file site.toddle/var/httpd/log/error_log

    you need to add the line:

        ErrorLog logs/error_log

    to ...conf/httpd.conf. If, having done that, Apache fails to start and you get a message in .../logs/error_log:

        .... No such file or directory.: could not open mime types log file /site.toddle/etc/httpd/mime.types

    you need to add the line:

        TypesConfig conf/mime.types

    to ...conf/httpd.conf. And if, having done that, Apache fails to start and you get a message in .../logs/error_log:

        fopen: no such file or directory
        httpd: could not log pid to file /site.toddle/var/httpd/run/httpd.pid

    you need to add the line:

        PIDFile logs/httpd.pid

    to ...conf/httpd.conf.

    2.3.3 Running Apache Under Unix

    When you run Apache now, you may get the following error message:

        httpd: cannot determine local hostname
        Use ServerName to set it manually.

    What Apache means is that you should put this line in the httpd.conf file:

        ServerName

    Finally, before you can expect any action, you need to set up some documents to serve. Apache's default document directory is ... /httpd/htdocs (which you don't want to use because you are at /usr/www/APACHE3/site.toddle), so you have to set it explicitly. Create ... /site.toddle/htdocs, and then in it create a file called 1.txt containing the immortal words "hullo world." Then add this line to httpd.conf:

        DocumentRoot /usr/www/APACHE3/site.toddle/htdocs

    The complete Config file, .../site.toddle/conf/httpd.conf, now looks like this:

        User webuser
        Group webgroup
        ServerName my586
        DocumentRoot /usr/www/APACHE3/APACHE3/site.toddle/htdocs/
        #fix 'Out of the Box' default problems--remove leading #s if necessary
        #ServerRoot /usr/www/APACHE3/APACHE3/site.toddle
        #ErrorLog logs/error_log
        #PIDFile logs/httpd.pid
        #TypesConfig conf/mime.types
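    Since the site path will probably be different on your machine, one way to avoid retyping it is to generate the minimal Config file from a script. This is a sketch only: write_minimal_conf is a hypothetical name, the site directory and hostname are whatever you pass in, and the "out of the box" fixes are left commented out, as in the file above.

```shell
#!/bin/sh
# write_minimal_conf SITE HOSTNAME: write SITE/conf/httpd.conf with the
# handful of directives used so far. "write_minimal_conf" is a hypothetical
# helper; SITE should be the full pathname of the site directory.
write_minimal_conf() {
    site=$1
    host=$2
    mkdir -p "$site/conf" || return 1
    cat > "$site/conf/httpd.conf" <<EOF
User webuser
Group webgroup
ServerName $host
DocumentRoot $site/htdocs
# Remove the leading #s below if the built-in defaults are wrong.
#ServerRoot $site
#ErrorLog logs/error_log
#PidFile logs/httpd.pid
#TypesConfig conf/mime.types
EOF
}
```

    Keeping the file this small follows the advice given earlier: start from a minimal Config file and add only what proves necessary.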

    When you fire up httpd, you should have a working web server. To prove it, start up a browser to access your new server, and point it at http:///.[3] As we know, http means use the HTTP protocol to get documents, and / on the end means go to the DocumentRoot directory you set in httpd.conf. Lynx is the text browser that comes with FreeBSD and other flavors of Unix; if it is available, type:

        % lynx http:///

    You see:

        INDEX OF /
        * Parent Directory
        * 1.txt

    If you move to 1.txt with the down arrow, you see:

        hullo world

    If you don't have Lynx (or Netscape, or some other web browser) on your server, you can use telnet:[4]

        % telnet 80

    You should see something like:

        Trying 192.168.123.2
        Connected to my586.my.domain
        Escape character is '^]'

    Then type:

        GET / HTTP/1.0

    You should see:

        HTTP/1.0 200 OK
        Sat, 24 Aug 1996 23:49:02 GMT
        Server: Apache/1.3
        Connection: close
        Content-Type: text/html

        Index of /
        Index of
        Connection closed by foreign host.

    This is a rare opportunity to see a complete HTTP message. The first lines are headers that are normally hidden by your browser. The stuff between the < and > is HTML, written by Apache, which, if viewed through a browser, produces the formatted message shown by Lynx earlier, and by Netscape or Microsoft Internet Explorer in the next chapter.

    2.3.4 Several Copies of Apache

    To get a display of all the processes running, run:

        % ps -aux

    Among a lot of Unix stuff, you will see one copy of httpd belonging to root and a number that belong to webuser. They are similar copies, waiting to deal with incoming queries. The root copy is still attached to port 80 — thus its children will be as well — but it is not listening. This is because it is root and has too many powers for this to be safe. It is necessary for this "master" copy to remain running as root because under the (slightly flawed) Unix security doctrine, only root can open ports below 1024. Its job is to monitor the scoreboard where the other copies post their status: busy or waiting. If there are too few waiting (default 5, set by the MinSpareServers directive in httpd.conf ), the root copy starts new ones; if there are too many waiting (default 10, set by the MaxSpareServers directive), it kills some off. If you note the PID (shown by ps -ax, or ps -aux for a fuller listing; also to be found in ... /logs/httpd.pid ) of the root copy and kill it with:

    % kill PID

    you will find that the other copies disappear as well. It is better, however, to use the stop script described in Section 2.3 earlier in this chapter, since it leaves less to chance and is easier to do.

    2.3.5 Unix Permissions

    If Apache is to work properly, it's important to correctly set the file-access permissions. In Unix systems, there are three kinds of permissions: read, write, and execute. They attach to each object in three levels: user, group, and other or "rest of the world." If you have installed the demonstration sites, go to ... /site.cgi/htdocs, and type:

        % ls -l

    You see: -rw-rw-r-- 5 root bin 1575 Aug 15 07:45 form_summer.html

    The first - indicates that this is a regular file. It is followed by three permission fields, each of three characters. They mean, in this case:

    User (root)
        Read yes, write yes, execute no
    Group (bin)
        Read yes, write yes, execute no
    Other
        Read yes, write no, execute no

    When the permissions apply to a directory, the x execute permission means scan: the ability to see the contents and move down a level.

    The permission that interests us is other, because the copy of Apache that tries to access this file belongs to user webuser and group webgroup. These were set up to have no affinities with root and bin, so that copy can gain access only under the other permissions, and the only one set is "read." Consequently, a Bad Guy who crawls under the cloak of Apache cannot alter or delete our precious form_summer.html; he can only read it.

    We can now write a coherent doctrine on permissions. We have set things up so that everything in our web site, except the data vulnerable to attack, has owner root and group wheel. We did this partly because it is a valid approach, but also because it is the only portable one. The files on our CD-ROM with owner root and group wheel have owner and group numbers 0 that translate into similar superuser access on every machine. Of course, this only makes sense if the webmaster has root login permission, which we had. You may have to adapt the whole scheme if you do not have root login, and you should perhaps consult your site administrator.

    In general, on a web site everything should be owned by a user who is not webuser and a group that is not webgroup (assuming you use these terms for Apache configurations).

    There are four kinds of files to which we want to give webuser access: directories, data, programs, and shell scripts. webuser must have scan permissions on all the directories, starting at root down to wherever the accessible files are. If Apache is to access a directory, that directory and all in the path must have x permission set for other. You do this by entering:

        % chmod o+x

    To produce a directory listing (if this is required by, say, an index), the final directory must have read permission for other. You do this by typing:

        % chmod o+r

    It probably should not have write permission set for other:

        % chmod o-w

    To serve a file as data (and this includes files like .htaccess; see Chapter 3), the file must have read permission for other:

        % chmod o+r file

    And, as before, deny write permission:

        % chmod o-w

    To run a program, the file must have execute permission set for other:

        % chmod o+x

    To execute a shell script, the file must have read and execute permission set for other:

        % chmod o+rx <script>

    For complete safety:

        % chmod a=rx <script>

    If the user is to edit the script, but it is to be safe otherwise:

        % chmod u=rwx,og=rx <script>
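    The rule that every directory in the path needs x permission for other is easy to get wrong several levels up. The sketch below walks from a directory up to / and prints each one that webuser could not scan. It assumes the usual ls -ld layout, in which the tenth character of the line is the execute bit for other; other_can_scan and blocked_dirs are hypothetical names.

```shell
#!/bin/sh
# other_can_scan DIR: succeed if "other" has x (scan) permission on DIR.
# Assumes the usual "ls -ld" layout, where character 10 of the line is the
# execute bit for other ("t" means execute plus the sticky bit).
other_can_scan() {
    case `ls -ld "$1" | cut -c10` in
        x|t) return 0 ;;
        *)   return 1 ;;
    esac
}

# blocked_dirs DIR: print DIR and each parent directory, up to /, that
# lacks scan permission for other. Pass a full pathname.
# Both function names are hypothetical, chosen for this illustration.
blocked_dirs() {
    dir=$1
    while :; do
        other_can_scan "$dir" || echo "$dir"
        case "$dir" in
            /|.) break ;;     # reached the top (or a relative path ran out)
        esac
        dir=`dirname "$dir"`
    done
}
```

    Running blocked_dirs /usr/www/APACHE3/site.toddle/htdocs and getting no output means the path is open all the way down; any directory it prints needs chmod o+x.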

    2.3.6 A Local Network

    Emboldened by the success of site.toddle, we can now set about a more realistic setup, without as yet venturing out onto the unknown waters of the Web. We need to get two things running: Apache under some sort of Unix and a GUI browser. There are two main ways this can be achieved:

    • Run Apache and a browser (such as Netscape or Lynx) on the same machine. The "network" is then provided by Unix.
    • Run Apache on a Unix box and a browser on a Windows 95/Windows NT/Mac OS machine, or vice versa, and link them with Ethernet (which is what we did for this book using FreeBSD).

    We cannot hope to give detailed explanations for all possible variants of these situations. We expect that many of our readers will already be webmasters familiar with these issues, who will want to skip the following sidebar. Those who are new to the Web may find it useful to know what we did.

    Our Experimental Micro Web

    First, we had to install a network card on the FreeBSD machine. As it boots up, it tests all its components and prints a list on the console, which includes the card and the name of the appropriate driver. We used a 3Com card, and the following entries appeared:

        ...
        1 3C5x9 board(s) on ISA found at 0x300
        ep0 at 0x300-0x30f irq 10 on isa
        ep0: aui/bnc/utp[*BNC*] address 00:a0:24:4b:48:23 irq 10
        ...

    This indicated pretty clearly that the driver was ep0 and that it had installed properly. If you miss this at bootup, FreeBSD lets you hit the Scroll Lock key and page up until you see it, then hit Scroll Lock again to return to normal operation.

    Once a card was working, we needed to configure its driver, ep0. We did this with the following commands:

        ifconfig ep0 192.168.123.2
        ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
        ifconfig ep0 192.168.124.1 alias

    The alias command makes ifconfig bind an additional IP address to the same device. The netmask command is needed to stop FreeBSD from printing an error message (for more on netmasks, see Craig Hunt's TCP/IP Network Administration [O'Reilly, 2002]).

    Note that the network numbers used here are suited to our particular network configuration. You'll need to talk to your network administrator to determine suitable numbers for your configuration. Each time we start up the FreeBSD machine to play with Apache, we have to run these commands. The usual way to do this is to add them to /etc/rc.local (or the equivalent location; it varies from machine to machine, but whatever it is called, it is run whenever the system boots).

    If you are following the FreeBSD installation or something like it, you also need to install IP addresses and their hostnames (if we were to be pedantic, we would call them fully qualified domain names, or FQDN) in the file /etc/hosts:

        192.168.123.2 www.butterthlies.com
        192.168.123.2 sales.butterthlies.com
        192.168.123.3 sales-not-vh.butterthlies.com
        192.168.124.1 www.faraway.com

    Note that www.butterthlies.com and sales.butterthlies.com both have the same IP number. This is so we can demonstrate the new NameVirtualHost directive in the next chapter. We will need sales-not-vh.butterthlies.com in site.twocopy. Note also that this method of setting up hostnames is normally only appropriate when DNS is not available; if you use this method, you'll have to do it on every machine that needs to know the names.
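    After editing /etc/hosts it is easy to check which address a name has ended up with. The sketch below looks a name up in any file in hosts format; hosts_lookup is a hypothetical name, and it reads only the file you give it, so it says nothing about what DNS would answer.

```shell
#!/bin/sh
# hosts_lookup FILE NAME: print the IP address that a hosts-format FILE
# gives for NAME (first match wins), or nothing if NAME is not listed.
# "hosts_lookup" is a hypothetical helper; it consults only FILE, not DNS.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments before looking at the fields
        {
            # Field 1 is the address; every later field is a name for it.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}
```

    For example, hosts_lookup /etc/hosts sales.butterthlies.com should print 192.168.123.2 once the entries above are in place.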

    2.4 Setting Up a Win32 Server

    There is no point trying to run Apache unless TCP/IP is set up and running on your machine. A quick test is to ping some IP, and if you can't think of a real one, ping yourself:

        >ping 127.0.0.1

    If TCP/IP is working, you should see some confirming message, like this:

        Pinging 127.0.0.1 with 32 bytes of data:
        Reply from 127.0.0.1: bytes=32 time [...]

    [...]

        >ren httpd.conf *.cnk

    Otherwise, delete it, and delete srm.conf and access.conf:

        >del srm.conf
        >del access.conf

    When you run Apache now, you see:

        Apache/
        fopen: No such file or directory
        httpd: could not open document config file apache/conf/httpd.conf

    And we can hardly blame it. Open edit:

        >edit httpd.conf

    and insert the line:

        # new config file

    The # makes this a comment without effect, but it gives the editor something to save. Run Apache again. We now see something sensible:

        ...
        httpd: cannot determine local host name
        use ServerName to set it manually

    What Apache means is that you should put a line in the httpd.conf file:

        ServerName your_host_name

    Now when you run Apache, you see:

        >apache -s
        Apache/
        _

    The _ here is meant to represent a blinking cursor, showing that Apache is happily running. You will notice that throughout this book, the Config files always have the following lines:

        ...
        User webuser
        Group webgroup
        ...

    These are necessary for Unix security and, happily, are ignored by the Win32 version of Apache, so we have avoided tedious explanations by leaving them in throughout. Win32 users can include them or not as they please.

    You can now get out of the MS-DOS window and go back to the desktop, fire up your favorite browser, and access http://yourmachinename/. You should see a cheerful screen entitled "It Worked!," which is actually \apache\htdocs\index.html.

    When you have had enough, hit ^C in the Apache window. Alternatively, under Windows 95 and from Apache Version 1.3.3 on, you can open another DOS session window and type:

        apache -k shutdown

    This does a graceful shutdown, in which Apache allows any transactions currently in process to continue to completion before it exits. In addition, using: apache -k restart

    performs a graceful restart, in which Apache rereads the configuration files while allowing transactions in progress to complete.

    2.4.2 Apache as a Service To start Apache as a service, you first need to install it as a service. Multiple Apache services can be installed, each with a different name and configuration. To install the default Apache service named "Apache," run the "Install Apache as Service (NT only)" option from the Start menu. Once this is done, you can start the "Apache" service by opening the Services window (in the Control Panel), selecting Apache, then clicking on Start. Apache will now be running in the background. You can later stop Apache by clicking on Stop. As an alternative to using the Services window, you can start and stop the "Apache" service from the command line with the following:

    NET START APACHE
    NET STOP APACHE

    See http://httpd.apache.org/docs-2.0/platform/windows.html#signalsrv for more information on installing and controlling Apache services. Apache, unlike many other Windows NT/2000 services, logs any errors to its own error.log file in the logs folder within the Apache server root folder. You will not find Apache error details in the Windows NT Event Log.

    After starting Apache (either in a console window or as a service), it will be listening to port 80 (unless you changed the Listen directive in the configuration files). To connect to the server and access the default page, launch a browser and enter this URL: http://127.0.0.1

    Once this is done, you can open the Services window in the Control Panel, select Apache, and click on Start. Apache then runs in the background until you click on Stop. Alternatively, you can open a console window and type: >net start apache

    To stop the Apache service, type: >net stop apache

    If you're running Apache as a service, you definitely will want to consider security issues. See Chapter 11 for more details.

    2.5 Directives Here we go over the directives again, giving formal definitions for reference. 2.5.1 ServerName ServerName gives the hostname of the server to use when creating redirection URLs, that is, if you use a directive or access a directory without a trailing /. ServerName hostname Server config, virtual host

    It will also be useful when we consider Virtual Hosting (see Chapter 4). 2.5.2 DocumentRoot This directive sets the directory from which Apache will serve files. DocumentRoot directory Default: /usr/local/apache/htdocs Server config, virtual host

    Unless matched by a directive like Alias, the server appends the path from the requested URL to the document root to make the path to the document. For example: DocumentRoot /usr/web

    An access to http://www.www.my.host.com/index.html now refers to /usr/web/index.html.
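The mapping from URL to file described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Apache's own code: the function name map_url_to_path is hypothetical, Alias handling is ignored, and the normalization step is an assumption added so that ".." segments cannot climb out of the document root.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def map_url_to_path(url, document_root="/usr/web"):
    """Append the path part of the URL to the document root (no Alias handling)."""
    url_path = unquote(urlsplit(url).path) or "/"
    # Collapse "." and ".." so the result always stays below document_root.
    normalized = posixpath.normpath("/" + url_path.lstrip("/"))
    if normalized == "/":
        return document_root.rstrip("/")
    return document_root.rstrip("/") + normalized

print(map_url_to_path("http://www.www.my.host.com/index.html"))
```

With DocumentRoot /usr/web, the call above yields /usr/web/index.html, matching the example in the text.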

    There appears to be a bug in the relevant Module, mod_dir, that causes problems when the directory specified in DocumentRoot has a trailing slash (e.g., DocumentRoot /usr/web/), so please avoid that. It is worth bearing in mind that the deeper DocumentRoot goes, the longer it takes Apache to check out the directories. For the sake of performance, adopt the British Army's universal motto: KISS (Keep It Simple, Stupid)! 2.5.3 ServerRoot ServerRoot specifies where the subdirectories conf and logs can be found. ServerRoot directory Default directory: /usr/local/etc/httpd Server config

    If you start Apache with the -f (file) option, you need to include the ServerRoot directive. On the other hand, if you use the -d (directory) option, as we do, this directive is not needed. 2.5.4 ErrorLog The ErrorLog directive sets the name of the file to which the server will log any errors it encounters. ErrorLog filename|syslog[:facility] Default: ErrorLog logs/error_log Server config, virtual host

    If the filename does not begin with a slash (/), it is assumed to be relative to the server root. If the filename begins with a pipe (|), it is assumed to be a command that Apache spawns to handle the error log. Apache 1.3 and above: using syslog instead of a filename enables logging via syslogd(8) if the system supports it. The default is to use syslog facility local7, but you can override this by using the syslog:facility syntax, where facility can be one of the names usually documented in syslog(1). Your security could be compromised if the directory where log files are stored is writable by anyone other than the user who starts the server.

    2.5.5 PidFile A useful piece of information about an executing process is its PID number. This is available under both Unix and Win32 in the PidFile, and this directive allows you to change its location.

    PidFile file Default file: logs/httpd.pid Server config

    By default, it is in ... /logs/httpd.pid. However, only Unix allows you to do anything easily with it; namely, to kill the process. 2.5.6 TypesConfig This directive sets the path and filename to find the mime.types file if it isn't in the default position. TypesConfig filename Default: conf/mime.types Server config

    2.5.7 Inclusions into the Config file You may want to include material from elsewhere into the Config file. You either just paste it in, or you use the Include directive: Include filename Server config, virtual host, directory, .htaccess

    Because it makes it hard to see what the Config file is actually doing, you probably will not want to use this directive until the file gets really complicated — (see, for instance, Chapter 17, where the Config file also has to control the Tomcat Java module).

    2.6 Shared Objects If you are using the DSO mechanism, you need quite a lot of stuff in your Config file. 2.6.1 Shared Objects Under Unix In Apache v1.3 the order of these directives is important, so it is probably easiest to generate the list by doing an "out of the box" build using the flag --enable-shared=max. You will find /usr/etc/httpd/httpd.conf.default: copy the list from it into your own Config file, and edit it as you need.

    LoadModule env_module         libexec/mod_env.so
    LoadModule config_log_module  libexec/mod_log_config.so
    LoadModule mime_module        libexec/mod_mime.so
    LoadModule negotiation_module libexec/mod_negotiation.so
    LoadModule status_module      libexec/mod_status.so
    LoadModule includes_module    libexec/mod_include.so
    LoadModule autoindex_module   libexec/mod_autoindex.so
    LoadModule dir_module         libexec/mod_dir.so
    LoadModule cgi_module         libexec/mod_cgi.so
    LoadModule asis_module        libexec/mod_asis.so
    LoadModule imap_module        libexec/mod_imap.so
    LoadModule action_module      libexec/mod_actions.so
    LoadModule userdir_module     libexec/mod_userdir.so
    LoadModule alias_module       libexec/mod_alias.so
    LoadModule access_module      libexec/mod_access.so
    LoadModule auth_module        libexec/mod_auth.so
    LoadModule setenvif_module    libexec/mod_setenvif.so

    # Reconstruction of the complete module list from all available modules
    # (static and shared ones) to achieve correct module execution order.
    # [WHENEVER YOU CHANGE THE LOADMODULE SECTION ABOVE UPDATE THIS, TOO]
    ClearModuleList
    AddModule mod_env.c
    AddModule mod_log_config.c
    AddModule mod_mime.c
    AddModule mod_negotiation.c
    AddModule mod_status.c
    AddModule mod_include.c
    AddModule mod_autoindex.c
    AddModule mod_dir.c
    AddModule mod_cgi.c
    AddModule mod_asis.c
    AddModule mod_imap.c
    AddModule mod_actions.c
    AddModule mod_userdir.c
    AddModule mod_alias.c
    AddModule mod_access.c
    AddModule mod_auth.c
    AddModule mod_so.c
    AddModule mod_setenvif.c

    Notice that the list comes in three parts: LoadModules, then ClearModuleList, followed by AddModules to activate the ones you want. As we said earlier, it is all rather cumbersome and easy to get wrong. You might want to put the list in a separate file and then Include it (see Section 2.5.7). If you have left out a shared module that is required by a directive in your Config file, you will get a clear indication in an error message as Apache loads. For instance, if you use the directive TransferLog without doing what is necessary for the module mod_log_config, this will trigger an error message at startup.
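Because the two halves of the list are easy to let drift apart, a small consistency check can help. The following Python sketch is not part of Apache; the function name missing_addmodules is hypothetical, and it assumes, as in the listing above, that a shared object named mod_x.so corresponds to an AddModule entry named mod_x.c.

```python
import posixpath

def missing_addmodules(config_text):
    """Return module source names that are loaded but never re-added."""
    loaded, added = [], set()
    for raw in config_text.splitlines():
        parts = raw.split("#", 1)[0].split()   # drop comments, split on whitespace
        if not parts:
            continue
        directive = parts[0].lower()
        if directive == "loadmodule" and len(parts) == 3:
            # libexec/mod_env.so -> mod_env.c (naming assumption, see lead-in)
            stem = posixpath.splitext(posixpath.basename(parts[2]))[0]
            loaded.append(stem + ".c")
        elif directive == "addmodule":
            added.update(parts[1:])
    return [name for name in loaded if name not in added]

sample = """
LoadModule env_module    libexec/mod_env.so
LoadModule status_module libexec/mod_status.so
ClearModuleList
AddModule mod_env.c
AddModule mod_so.c
"""
print(missing_addmodules(sample))
```

For the sample text, the check reports mod_status.c, which is loaded but has no matching AddModule line.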

    2.6.1.1 LoadModule The LoadModule directive links in the object file or library filename and adds the module structure named module to the list of active modules. LoadModule module filename server config mod_so

    module is the name of the external variable of type module in the file and is listed as the Module Identifier in the module documentation. For example (Unix, and for Windows as of Apache 1.3.15): LoadModule status_module modules/mod_status.so

    For example (Windows prior to Apache 1.3.15, and some third party modules): LoadModule foo_module modules/ApacheModuleFoo.dll

    2.6.2 Shared Modules Under Win32 Note that all modules bundled with the Apache Win32 binary distribution were renamed as of Apache Version 1.3.15. Win32 Apache modules are often distributed with the old style names, or even a name such as libfoo.dll. Whatever the name of the module, the LoadModule directive requires the exact filename.

    2.6.2.1 LoadFile The LoadFile directive links in the named object files or libraries when the server is started or restarted; this is used to load additional code that may be required for some modules to work. LoadFile filename [filename] ... server config Mod_so filename is either an absolute path or relative to ServerRoot.

    2.6.2.2 ClearModuleList This directive clears the list of active modules. ClearModuleList server config Abolished in Apache v2

    It is assumed that the list will then be repopulated using the AddModule directive.

    2.6.2.3 AddModule The server can have modules compiled in that are not actively in use. This directive can be used to enable the use of those modules. AddModule module [module] ... server config

    Mod_so

    The server comes with a preloaded list of active modules; this list can be cleared with the ClearModuleList directive.

    [1] On System V-based Unix systems (as opposed to Berkeley-based), the command ps -ef should have a similar effect.
    [2] In fact, this problem was fixed for FreeBSD long ago, but you may still encounter it on other operating systems.
    [3] Note that if you are on the same machine, you can use http://127.0.0.1/ or http://localhost/, but this can be confusing because virtual host resolution may cause the server to behave differently than if you had used the interface's "real" name.
    [4] telnet is not really suitable as a web browser, though it can be a very useful debugging tool.

    Chapter 3. Toward a Real Web Site

    • 3.1 More and Better Web Sites: site.simple
    • 3.2 Butterthlies, Inc., Gets Going
    • 3.3 Block Directives
    • 3.4 Other Directives
    • 3.5 HTTP Response Headers
    • 3.6 Restarts
    • 3.7 .htaccess
    • 3.8 CERN Metafiles
    • 3.9 Expirations

    Now that we have the server running with a basic configuration, we can start to explore more sophisticated possibilities in greater detail. Fortunately, the differences between the Windows and Unix versions of Apache fade as we get past the initial setup and configuration, so it's easier to focus on the details of making a web site work.

    3.1 More and Better Web Sites: site.simple We are now in a position to start creating real(ish) web sites, which can be found in the sample code at the web site for the book, http://oreilly.com/catalog/apache3/. For the sake of a little extra realism, we will base the site loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don't yet want to venture into the outside world, they should be variants on your own network ID. This way, all the machines in the network realize that they don't have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Windows 95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:

    127.0.0.1 localhost
    192.168.123.2 www.butterthlies.com
    192.168.123.2 sales.butterthlies.com
    192.168.123.3 sales-IP.butterthlies.com
    192.168.124.1 www.faraway.com

    localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing. You probably need to consult your network manager to make similar arrangements. site.simple is site.toddle with a few small changes. The script go will work anywhere. To get started, do the following, depending on your operating environment:

    test -d logs || mkdir logs
    httpd -d `pwd` -f `pwd`/conf/httpd.conf

    Open an MS-DOS window and from the command line, type:

    c>cd \program files\apache group\apache
    c>apache -k start
    c>Apache/1.3.26 (Win32) running ...

    To stop Apache, open a second MS-DOS window:

    c>apache -k stop
    c>cd logs
    c>edit error.log

    This will be true of each site in the demonstration setup, so we will not mention it again. From here on, there will be minimal differences between the server setups necessary for Win32 and those for Unix. Unless one or the other is specifically mentioned, you should assume that the text refers to both. It would be nice to have a log of what goes on. In the first edition of this book, we found that a file access_log was created automatically in ...site.simple/logs. In a rather bizarre move since then, the Apache Group has broken backward compatibility and now requires you to mention the log file explicitly in the Config file using the TransferLog directive. The ... /conf/httpd.conf file now contains the following:

    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/APACHE3/site.simple/htdocs
    TransferLog logs/access_log

    In ... /htdocs we have, as before, 1.txt : hullo world from site.simple again!

    Type ./go on the server. Become the client, and retrieve http://www.butterthlies.com. You should see: Index of / . Parent Directory

    . 1.txt

    Click on 1.txt for an inspirational message as before. This all seems satisfactory, but there is a hidden mystery. We get the same result if we connect to http://sales.butterthlies.com. Why is this? Why, since we have not mentioned either of these URLs or their IP addresses in the configuration file on site.simple, do we get any response at all? The answer is that when we configured the machine on which the server runs, we told the network interface to respond to any of these IP addresses:

    192.168.123.2
    192.168.123.3

    By default Apache listens to all IP addresses belonging to the machine and responds in the same way to all of them. If there are virtual hosts configured (which there aren't, in this case), Apache runs through them, looking for an IP name that corresponds to the incoming connection. Apache uses that configuration if it is found, or the main configuration if it is not. Later in this chapter, we look at more definite control with the directives BindAddress, Listen, and <VirtualHost>.

    It has to be said that working like this (that is, switching rapidly between different configurations) seemed to get Netscape or Internet Explorer into a rare muddle. To be sure that the server was functioning properly while using Netscape as a browser, it was usually necessary to reload the file under examination by holding down the Control key while clicking on Reload. In extreme cases, it was necessary to disable caching by going to Edit > Preferences > Advanced > Cache. Set memory and disk cache to 0, and set cache comparison to Every Time. In Internet Explorer, set Cache Compares to Every Time. If you don't, the browser tends to display a jumble of several different responses from the server. This occurs because we are doing what no user or administrator would normally do, namely, flipping around between different versions of the same site with different versions of the same file. Whenever we flip from a newer version to an older version, Netscape is led to believe that its cached version is up-to-date.

    Back on the server, stop Apache with ^C, and look at the log files. In ... /logs/access_log, you should see something like this:

    192.168.123.1 - - [...] "GET / HTTP/1.1" 200 177

    200 is the response code (meaning "OK, cool, fine"), and 177 is the number of bytes transferred. In ... /logs/error_log, there should be nothing because nothing went wrong. However, it is a good habit to look there from time to time, though you have to make sure that the date and time logged correspond to the problem you are investigating. It is easy to fool yourself with some long-gone drama.
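When the access log grows, it is handy to pull the response code and byte count out of each line mechanically. The sketch below assumes the log is written in the Common Log Format, which is what the sample line above looks like; the function name parse_access_line is hypothetical, and the timestamp in the sample is invented purely for illustration.

```python
import re

# host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_access_line(line):
    """Return the fields of one access_log line as a dict, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size column means that no body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

# The timestamp below is a made-up placeholder, not taken from a real log.
sample = '192.168.123.1 - - [01/Jan/2002:12:00:00 +0000] "GET / HTTP/1.1" 200 177'
entry = parse_access_line(sample)
print(entry["status"], entry["bytes"])
```

Run on the sample line, this prints the response code 200 and the byte count 177.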

    Life being what it is, things can go wrong, and the client can ask for something the server can't provide. It makes sense to allow for this with the ErrorDocument directive. 3.1.1 ErrorDocument The ErrorDocument directive lets you specify what happens when a client asks for a nonexistent document. ErrorDocument error-code "document (with a closing " in Apache v2) Server config, virtual host, directory, .htaccess

    In the event of a problem or error, Apache can be configured to do one of four things:

    1. Output a simple hardcoded error message.
    2. Output a customized message.
    3. Redirect to a local URL to handle the problem/error.
    4. Redirect to an external URL to handle the problem/error.

    The first option is the default, whereas options 2 through 4 are configured using the ErrorDocument directive, which is followed by the HTTP response code and a message or URL. Messages in this context begin with a double quotation mark ("), which does not form part of the message itself. Apache will sometimes offer additional information regarding the problem or error. URLs can be local URLs beginning with a slash (/ ) or full URLs that the client can resolve. For example:

    ErrorDocument 500 http://foo.example.com/cgi-bin/tester
    ErrorDocument 404 /cgi-bin/bad_urls.pl
    ErrorDocument 401 /subscription_info.html
    ErrorDocument 403 "Sorry can't allow you access today"

    Note that when you specify an ErrorDocument that points to a remote URL (i.e., anything with a method such as "http" in front of it), Apache will send a redirect to the client to tell it where to find the document, even if the document ends up being on the same server. This has several implications, the most important being that if you use an ErrorDocument 401 directive, it must refer to a local document. This results from the nature of the HTTP basic authentication scheme.
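The three kinds of ErrorDocument argument, and the restriction on 401, can be summarized in a short Python sketch. It is a simplified model of the rules just described, not Apache's parser; the function names classify_error_document and acceptable are hypothetical.

```python
def classify_error_document(argument):
    """Classify the second argument of an ErrorDocument line."""
    if argument.startswith('"'):
        # A leading double quote marks a literal message.
        return "message", argument.strip('"')
    if argument.startswith("/"):
        return "local URL", argument
    if "://" in argument:
        # Anything with a method such as http: is sent to the client as a redirect.
        return "external redirect", argument
    return "unknown", argument

def acceptable(status_code, argument):
    """A 401 handler must be local, as explained in the text above."""
    kind, _ = classify_error_document(argument)
    return not (status_code == 401 and kind == "external redirect")

examples = [
    (500, "http://foo.example.com/cgi-bin/tester"),
    (404, "/cgi-bin/bad_urls.pl"),
    (401, "/subscription_info.html"),
    (403, '"Sorry can\'t allow you access today"'),
]
for code, arg in examples:
    print(code, classify_error_document(arg)[0], acceptable(code, arg))
```

All four examples from the text pass the check; changing the 401 entry to a full http:// URL would make acceptable return False.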

    3.2 Butterthlies, Inc., Gets Going The httpd.conf file (to be found in ... /site.first) contains the following:

    User webuser
    Group webgroup
    ServerName my586
    DocumentRoot /usr/www/APACHE3/APACHE3/site.first/htdocs
    TransferLog logs/access_log
    #Listen is needed for Apache2
    Listen 80

    In the first edition of this book, we mentioned the directives AccessConfig and ResourceConfig here. If set with /dev/null (NUL under Win32), they disable the srm.conf and access.conf files, and they were formerly required if those files were absent. However, new versions of Apache ignore these files if they are not present, so the directives are no longer required. However, if they are present, the files mentioned will be included in the Config file. In Apache Version 1.3.14 and later, they can be given a directory rather than a filename, and all files in that directory and its subdirectories will be parsed as configuration files. In Apache v2 the directives AccessConfig and ResourceConfig are abolished and will cause an error. However, you can write: Include conf/srm.conf Include conf/access.conf in that order, and at the end of the Config file. Apache v2 also, rather oddly, insists on a Listen directive. If you don't include it in your Config file, you will get the error message: ...no listening sockets available, shutting down.

    If you are using Win32, note that the User and Group directives are not supported, so these can be removed. Apache's role in life is delivering documents, and so far we have not done much of that. We therefore begin in a modest way with a little HTML document that lists our cards, gives their prices, and tells interested parties how to get them. We can look at the Netscape Help item "Creating Net Sites" and download "A Beginners Guide to HTML" as well as the next web person can, then rough out a little brochure in no time flat:[1] Butterthlies Catalog

    Welcome to Butterthlies Inc

    Summer Catalog

    All our cards are available in packs of 20 at $2 a pack. There is a 10% discount if you order more than 100.




    Style 2315

    Be BOLD on the bench


    Style 2316

    Get SCRAMBLED in the henhouse


    Style 2317

    Get HIGH in the treehouse


    Style 2318

    Get DIRTY in the bath


    Postcards designed by [email protected]



    Butterthlies Inc, Hopeful City, Nevada 99999

    We want this brochure to appear in ... /site.first/htdocs, but we will in fact be using it in many other sites as we progress, so let's keep it in a central location. We will set up links to it using the Unix ln command, which creates new directory entries having the same modes as the original file without wasting disk space. Moreover, if you change the "real" copy of the file, all the linked copies change too. We have a directory /usr/www/APACHE3/main_docs, and this document lives in it as catalog_summer.html. This file refers to some rather pretty pictures that are held in four .jpg files. They live in ... /main_docs and are linked to the working htdocs directories:

    % ln /usr/www/APACHE3/main_docs/catalog_summer.html .
    % ln /usr/www/APACHE3/main_docs/bench.jpg .

    The remainder of the links follow the same format (assuming we are in .../site.first/htdocs).

    If you type ls, you should see the files there as large as life.

    Under Win32 there is unfortunately no equivalent to a link, so you will just have to have multiple copies.

    3.2.1 Default Index Type ./go, and shift to the client machine. Log onto http://www.butterthlies.com/:

    INDEX of /
    *Parent Directory
    *bath.jpg
    *bench.jpg
    *catalog_summer.html
    *hen.jpg
    *tree.jpg

    3.2.2 index.html What we see in the previous listing is the index that Apache concocts in the absence of anything better. We can do better by creating our own index page in the special file ... /htdocs/index.html : Index to Butterthlies Catalogs

    Butterthlies Inc, Hopeful City, Nevada 99999

    We needed a second file (catalog_autumn.html) to make our site look convincing. So we did what the management of this outfit would do themselves: we copied catalog_summer.html to catalog_autumn.html and edited it, simply changing the word Summer to Autumn and including the link in ... /htdocs. Whenever a client opens a URL that points to a directory containing the index.html file, Apache automatically returns it to the client (index.html is the default name; it can be changed with the DirectoryIndex directive). Now, when we visit, we see:

    INDEX TO BUTTERTHLIES CATALOGS
    *Summer Catalog
    *Autumn Catalog
    --------------------------------------------
    Butterthlies Inc, Hopeful City, Nevada 99999
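The choice between returning index.html and concocting a listing can be pictured with a small Python sketch. It models only the lookup described above, under the assumption that the candidate names are tried in the order given; the function name pick_index is hypothetical and nothing here is Apache code.

```python
import os
import tempfile

def pick_index(directory, index_names=("index.html",)):
    """Return the first index file found in the directory, or None."""
    for name in index_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # No index file: the server would fall back to a generated listing.
    return None

with tempfile.TemporaryDirectory() as htdocs:
    print(pick_index(htdocs))                      # nothing there yet
    open(os.path.join(htdocs, "index.html"), "w").close()
    print(os.path.basename(pick_index(htdocs)))    # now the index file is found
```

The first call finds nothing, which corresponds to the generated listing of Section 3.2.1; the second finds index.html, which corresponds to the hand-written page shown here.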

    We won't forget to tell the web search engines about our site. Soon the clients will be logging in (we can see who they are by checking ... /logs/access_log). They will read this compelling sales material, and the phone will immediately start ringing with orders. Our fortune is on its way to being made.

    3.3 Block Directives Apache has a number of block directives that limit the application of other directives within them to operations on particular virtual hosts, directories, or files. These are extremely important to the operation of a real web site because within these blocks, particularly <VirtualHost>, the webmaster can, in effect, set up a large number of individual servers run by a single invocation of Apache. This will make more sense when you get to Section 4.1. The syntax of the block directives is detailed next.

    <VirtualHost>
    <VirtualHost host[:port]> ... </VirtualHost>
    Server config

    The <VirtualHost> directive within a Config file acts like a tag in HTML: it introduces a block of text containing directives referring to one host; when we're finished with it, we stop with </VirtualHost>. For example:

    ....
    <VirtualHost ...>
    ServerAdmin [email protected]
    DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
    ServerName www.butterthlies.com
    ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/namebased/logs/error_log
    TransferLog /usr/www/APACHE3/APACHE3/site.virtual/namebased/logs/access_log
    </VirtualHost>
    ...

    <VirtualHost> also specifies which IP address we're hosting and, optionally, the port. If port is not specified, the default port is used, which is either the standard HTTP port, 80, or the port specified in a Port directive (not in Apache v2). host can also be _default_, in which case it matches anything no other <VirtualHost> section matches. In a real system, this address would be the hostname of our server. There are three more similar directives that also limit the application of other directives:

    • <Directory>
    • <Files>
    • <Location>

    This list shows the analogues in ascending order of authority, so that <Directory> is overruled by <Files>, and <Files> by <Location>. <Files> can be nested within <Directory> blocks. Execution proceeds in groups, in the following order:

    1. <Directory> (without regular expressions) and .htaccess are executed simultaneously.[2] .htaccess overrides <Directory>.
    2. <DirectoryMatch> and <Directory> (with regular expressions).
    3. <Files> and <FilesMatch> are executed simultaneously.
    4. <Location> and <LocationMatch> are executed simultaneously.

    Group 1 is processed in the order of shortest directory to longest.[3] The other groups are processed in the order in which they appear in the Config file. Sections inside <VirtualHost> blocks are applied after corresponding sections outside.

    <Directory> and <DirectoryMatch>

    The <Directory> directive allows you to apply other directives to a directory or a group of directories. It is important to understand that dir refers to absolute directories, so that <Directory /> operates on the whole filesystem, not the DocumentRoot and below. dir can include wildcards, that is, ? to match a single character, * to match a sequence, and [ ] to enclose a range of characters. For instance, [a-d] means "any one of a, b, c, d." If the character ~ appears in front of dir, the name can consist of complete regular expressions.[4] <DirectoryMatch> has the same effect as <Directory ~>. That is, it expects a regular expression. So, for instance, either:

    <Directory ~ ...>

    or:

    <DirectoryMatch ...>

    means "any directory name in the root directory that starts with a, b, c, or d."

    <Files> and <FilesMatch>
    <Files file> ... </Files>

    The <Files> directive limits the application of the directives in the block to that file, which should be a pathname relative to the DocumentRoot. It can include wildcards or full regular expressions preceded by ~. <FilesMatch> can be followed by a regular expression without ~. So, for instance, you could match common graphics extensions with:

    <FilesMatch ...>

    Or, if you wanted our catalogs treated in some special way:

    <FilesMatch ...>

    Unlike <Directory> and <Location>, <Files> can be used in a .htaccess file.

    <Location> and <LocationMatch>
    <Location URL> ... </Location>

    The <Location> directive limits the application of the directives within the block to those URLs specified, which can include wildcards and regular expressions preceded by ~. In line with regular-expression processing in Apache v1.3, * and ? no longer match to /. <LocationMatch> is followed by a regular expression without the ~. Most things that are allowed in a <Directory> block are allowed in <Location>, but although AllowOverride will not cause an error in a <Location> block, it makes no sense there.

    <IfDefine>
    <IfDefine name> ... </IfDefine>

    The <IfDefine> directive enables a block, provided the flag -Dname is used when Apache starts up. This makes it possible to have multiple configurations within a single Config file. This is mostly useful for testing and distribution purposes rather than for dedicated sites.

    <IfModule>
    <IfModule [!]module-file-name> ... </IfModule>

    The <IfModule> directive enables a block, provided that the named module was compiled or dynamically loaded into Apache. If the ! prefix is used, the block is enabled if the named module was not compiled or loaded. <IfModule> blocks can be nested. The module-file-name should be the name of the module's source file, e.g. mod_log_config.c.
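The rule from earlier in this section, that plain <Directory> sections are processed from the shortest directory to the longest, can be illustrated with a short Python sketch. It is a deliberately simplified model: wildcards, regular expressions, .htaccess files, and the other groups are ignored, the function name directory_sections_for is hypothetical, and the directories and settings in the sample are invented for the illustration.

```python
def directory_sections_for(path, sections):
    """Return (matching directories in processing order, merged settings).

    sections maps an absolute directory to a dict of directive settings.
    """
    def applies(directory):
        d = directory.rstrip("/")
        return d == "" or path == d or path.startswith(d + "/")

    # Shortest directory first, so that longer (more specific) ones are applied later.
    matching = sorted((d for d in sections if applies(d)),
                      key=lambda d: len(d.rstrip("/")))
    merged = {}
    for d in matching:
        merged.update(sections[d])   # later sections override earlier ones
    return matching, merged

# Invented sample configuration, for illustration only.
sections = {
    "/usr/www/site/htdocs": {"Options": "Indexes"},
    "/": {"Options": "None", "AllowOverride": "None"},
    "/usr/other": {"Options": "All"},
}
order, settings = directory_sections_for("/usr/www/site/htdocs/catalog.html", sections)
print(order)
print(settings)
```

In the sample, the section for / is applied first and the section for the htdocs directory second, so the more specific Options setting wins while AllowOverride is inherited from /.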

    3.4 Other Directives Other housekeeping directives are listed here. ServerName ServerName fully-qualified-domain-name Server config, virtual host

    The ServerName directive sets the hostname of the server; this is used when creating redirection URLs. If it is not specified, then the server attempts to deduce it from its own IP address; however, this may not work reliably or may not return the preferred hostname. For example: ServerName www.example.com

    could be used if the canonical (main) name of the actual machine were simple.example.com, but you would like visitors to see www.example.com. UseCanonicalName UseCanonicalName on|off Default: on Server config, virtual host, directory, .htaccess

    This directive controls how Apache forms URLs that refer to itself, for example, when redirecting a request for http://www.domain.com/some/directory to the correct http://www.domain.com/some/directory/ (note the trailing / ). If UseCanonicalName is on (the default), then the hostname and port used in the redirect will be those set by ServerName and Port (not Apache v2). If it is off, then the name and port used will be the ones in the original request. One instance where this directive may be useful is when users are in the same domain as the web server (for example, on an intranet). In this case, they may use the "short" name for the server (www, for example), instead of the fully qualified domain name (www.domain.com, say). If a user types a URL such as http://www/APACHE3/somedir (without the trailing slash), then, with UseCanonicalName switched on, the user will be redirected to http://www.domain.com/APACHE3/somedir/. With UseCanonicalName switched off, she will be redirected to http://www/APACHE3/somedir/. An obvious case in which switching it off is useful is when user authentication is switched on: reusing the server name that the user typed means she won't be asked to reauthenticate when the server name appears to the browser to have changed. More obscure cases relate to name/address translation caused by some firewalling techniques.

    ServerAdmin
    ServerAdmin email_address Server config, virtual host
    ServerAdmin gives Apache an email_address for automatic pages generated when some errors occur. It might be sensible to make this a special address such as [email protected].
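Returning to UseCanonicalName for a moment, the choice it makes when building a self-referential URL can be sketched in Python as follows. This models only the behavior described in the text; the function name self_referential_url and its parameters are hypothetical.

```python
def self_referential_url(path, requested_host, server_name,
                         requested_port=80, server_port=80,
                         use_canonical_name=True):
    """Build the URL the server would use when redirecting to itself."""
    if use_canonical_name:
        host, port = server_name, server_port        # from ServerName and Port
    else:
        host, port = requested_host, requested_port  # from the original request
    netloc = host if port == 80 else "{}:{}".format(host, port)
    return "http://{}{}".format(netloc, path)

# The user typed http://www/APACHE3/somedir; the server adds the trailing slash.
print(self_referential_url("/APACHE3/somedir/", "www", "www.domain.com"))
print(self_referential_url("/APACHE3/somedir/", "www", "www.domain.com",
                           use_canonical_name=False))
```

The first call produces the fully qualified form and the second keeps the short name the user typed, which is the case that avoids a second authentication prompt.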

    ServerSignature ServerSignature [off|on|email] Default: off directory, .htaccess

    This directive allows you to let the client know which server in a chain of proxies actually did the business. ServerSignature on generates a footer to server-generated documents that includes the server version number and the ServerName of the virtual host. ServerSignature email additionally creates a mailto: reference to the relevant ServerAdmin address. ServerTokens

    ServerTokens [productonly|min(imal)|OS|full] Default: full Server config

    This directive controls the information about itself that the server returns. The security-minded webmaster may want to limit the information available to the bad guys:

    productonly (from v 1.3.14)
        Server returns name only: Apache
    min(imal)
        Server returns name and version number, for example, Apache v1.3
    OS
        Server sends operating system as well, for example, Apache v1.3 (Unix)
    full
        Server sends the previously listed information plus information about compiled modules, for example, Apache v1.3 (Unix) PHP/3.0 MyMod/1.2

    ServerAlias
    ServerAlias name1 name2 name3 ... Virtual host
    ServerAlias gives a list of alternate names matching the current virtual host. If a request uses HTTP 1.1, it arrives with Host: server in the header and can match ServerName, ServerAlias, or the VirtualHost name.

    ServerPath ServerPath path Virtual host

    In HTTP 1.1 you can map several hostnames to the same IP address, and the browser distinguishes between them by sending the Host header. But it was thought there would be a transition period during which some browsers still used HTTP 1.0 and didn't send the Host header.[5] So ServerPath lets the same site be accessed through a path instead.

    It has to be said that this directive often doesn't work very well because it requires a great deal of discipline in writing consistent internal HTML links, which must all be written as relative links to make them work with two different URLs. However, if you have to cope with HTTP 1.0 browsers that don't send Host headers when accessing virtual sites, you don't have much choice. For instance, suppose you have site1.example.com and site2.example.com mapped to the same IP address (let's say 192.168.123.2), and you set up the httpd.conf file like this:

    <VirtualHost 192.168.123.2>
    ServerName site1.example.com
    DocumentRoot /usr/www/APACHE3/site1
    ServerPath /site1
    </VirtualHost>
    <VirtualHost 192.168.123.2>
    ServerName site2.example.com
    DocumentRoot /usr/www/APACHE3/site2
    ServerPath /site2
    </VirtualHost>

    Then an HTTP 1.1 browser can access the two sites with URLs http://site1.example.com/ and http://site2.example.com/. Recall that HTTP 1.0 can only distinguish between sites with different IP addresses, so both of those URLs look the same to an HTTP 1.0 browser. However, with the previously listed setup, such browsers can access http://site1.example.com/site1 and http://site1.example.com/site2 to see the two different sites (yes, we did mean site1.example.com in the latter; it could have been site2.example.com in either, because they are the same as far as an HTTP 1.0 browser is concerned).

    ScoreBoardFile

    ScoreBoardFile filename
    Default: ScoreBoardFile logs/apache_status
    Server config

    The ScoreBoardFile directive is required on some architectures to place a file that the server will use to communicate between its children and the parent. The easiest way to find out if your architecture requires a scoreboard file is to run Apache and see if it creates the file named by the directive. If your architecture requires it, then you must ensure that this file is not used at the same time by more than one invocation of Apache. If you have to use a ScoreBoardFile, then you may see improved speed by placing it on a RAM disk. But be aware that placing important files on a RAM disk involves a certain amount of risk.

    Apache 1.2 and above: Linux 1.x and SVR4 users might be able to add -DHAVE_SHMGET -DUSE_SHMGET_SCOREBOARD to the EXTRA_CFLAGS in their Config file. This might work with some 1.x installations, but not with all of them. (Prior to 1.3b4, HAVE_SHMGET would have sufficed.)

    CoreDumpDirectory

    CoreDumpDirectory directory
    Default: the ServerRoot directory
    Server config

    When a program crashes under Unix, a snapshot of the core code is dumped to a file. You can then examine it with a debugger to see what went wrong. This directive specifies a directory where Apache tries to put the mess. The default is the ServerRoot directory, but this is normally not writable by Apache's user. This directive is useful only in Unix, since Win32 does not dump a core after a crash.

    SendBufferSize

    SendBufferSize bytes
    Default: set by OS
    Server config

    SendBufferSize increases the send buffer in TCP beyond the default set by the operating system. This directive improves performance under certain circumstances, but we suggest you don't use it unless you thoroughly understand network technicalities.

    LockFile

    LockFile filename
    Default: logs/accept.lock
    Server config

    When Apache is compiled with USE_FCNTL_SERIALIZED_ACCEPT or USE_FLOCK_SERIALIZED_ACCEPT, it will not start until it writes a lock file to the local disk. If the logs directory is NFS mounted, this will not be possible. It is not a good idea to put this file in a directory that is writable by everyone, since a false file will prevent Apache from starting. This mechanism is necessary because some operating systems don't like multiple processes sitting in accept() on a single socket (which is where Apache sits while waiting). Therefore, these calls need to be serialized. One way is to use a lock file, but you can't use one on an NFS-mounted directory.
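
    If the logs directory does turn out to be NFS mounted, the fix is usually a one-liner. The sketch below assumes that /var/run is on a local disk and is writable only by root; the path is our own example, so substitute whatever suits your system.

    ```apache
    # Assumption: logs/ is NFS mounted, /var/run is local and root-owned.
    # Move the accept lock off the NFS volume so Apache can create it at startup.
    LockFile /var/run/httpd.lock
    ```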

    AcceptMutex

    AcceptMutex default|method
    Default: AcceptMutex default
    Server config

    The AcceptMutex directive sets the method that Apache uses to serialize multiple children accepting requests on network sockets. Prior to Apache 2.0, the method was selectable only at compile time. The optimal method to use is highly architecture- and platform-dependent. For further details, see http://httpd.apache.org/docs-2.0/misc/perftuning.html.

    If AcceptMutex is not used or this directive is set to default, then the compile-time-selected default will be used. Other possible methods are listed below. Note that not all methods are available on all platforms. If a method is specified that is not available, a message will be written to the error log listing the available methods.

    flock
        Uses the flock(2) system call to lock the file defined by the LockFile directive
    fcntl
        Uses the fcntl(2) system call to lock the file defined by the LockFile directive
    sysvsem
        Uses SysV-style semaphores to implement the mutex
    pthread
        Uses POSIX mutexes as implemented by the POSIX Threads (PThreads) specification

    KeepAlive

    KeepAlive number
    Default number: 5
    Server config

    Chances are that if a user logs on to your site, he will reaccess it fairly soon. To avoid unnecessary delay, this command keeps the connection open, but only for number requests, so that one user does not hog the server. You might want to increase this from 5 if you have a deep directory structure. (This is the Apache 1.1 form of the directive. From Apache 1.2 on, KeepAlive takes on or off, and the number of requests per connection is set with MaxKeepAliveRequests.)

    Netscape Navigator 2 has a bug that fouls up keepalives. Apache v1.2 and higher can detect the use of this browser by looking for Mozilla/2 in the headers returned by Netscape. If the BrowserMatch directive is set (see Chapter 13), the problem disappears.

    KeepAliveTimeout

    KeepAliveTimeout seconds
    Default seconds: 15
    Server config

    Similarly, to avoid waiting too long for the next request, this directive sets the number of seconds to wait. Once the request has been received, the TimeOut directive applies.

    TimeOut

    TimeOut seconds
    Default seconds: 1200
    Server config

    TimeOut sets the maximum time that the server will wait for the receipt of a request and then its completion block by block. This directive used to have an unfortunate effect: downloads of large files over slow connections would time out. Therefore, the directive has been modified to apply to blocks of data sent rather than to the whole transfer.

    HostNameLookups

    HostNameLookups [on|off|double]
    Default: off
    Server config, virtual host

    If this directive is on,[6] then every incoming connection is reverse DNS resolved, which means that, starting with the IP number, Apache finds the hostname of the client by consulting the DNS system on the Internet. The hostname is then used in the logs. If switched off, the IP address is used instead. It can take a significant amount of time to reverse-resolve an IP address, so for performance reasons it is often best to leave this off, particularly on busy servers. Note that the support program logresolve is supplied with Apache to reverse-resolve the logs at a later date.[7] The new double keyword supports the double-reverse DNS test. An IP address passes this test if the forward map of the reverse map includes the original IP. Regardless of the setting here, mod_access access lists using DNS names require all the names to pass the double-reverse test.
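
    Because HostNameLookups is allowed in both the main server and a virtual host, one plausible arrangement is to leave lookups off globally and pay for them only where the hostnames matter. The fragment below is a sketch under that assumption; the address and the ServerName are invented for illustration.

    ```apache
    # Main server: no reverse DNS on each hit, so the logs carry IP addresses.
    HostNameLookups off

    # Hypothetical virtual host where client hostnames are worth the delay.
    <VirtualHost 192.168.123.3>
        ServerName sales.example.com
        # double = the reverse lookup must also map forward to the original IP
        HostNameLookups double
    </VirtualHost>
    ```

    For the sites left at off, logresolve can still turn the logged IP addresses into names after the event.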

    Include

    Include filename
    Server config

    filename points to a file that will be included in the Config file in place of this directive. From Apache 1.3.14, if filename points to a directory, all the files in that directory and its subdirectories will be included.

    Limit

    <Limit method [method ...]> ... </Limit>

    The <Limit method> directive defines a block according to the HTTP method of the incoming request. For instance:

        <Limit GET POST>
        ... directives ...
        </Limit>

    This directive limits the application of the directives that follow to requests that use the GET and POST methods. Access controls are normally effective for all access methods, and this is the usual desired behavior. In the general case, access-control directives should not be placed within a <Limit> section.

    The purpose of the <Limit> directive is to restrict the effect of the access controls to the nominated HTTP methods. For all other methods, the access restrictions that are enclosed in the <Limit> bracket will have no effect. The following example applies the access control only to the methods POST, PUT, and DELETE, leaving all other methods unprotected:

        <Limit POST PUT DELETE>
        Require valid-user
        </Limit>

    The method names listed can be one or more of the following: GET, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, and UNLOCK. The method name is case sensitive. If GET is used, it will also restrict HEAD requests.

    Generally, Limit should not be used unless you really need it (for example, if you've implemented PUT and want to limit PUTs but not GETs), and we have not used it in site.authent. Unfortunately, Apache's online documentation encouraged its inappropriate use, so it is often found where it shouldn't be.

    LimitExcept

    <LimitExcept method [method ...]> ... </LimitExcept>

    <LimitExcept> and </LimitExcept> are used to enclose a group of access-control directives that will then apply to any HTTP access method not listed in the arguments; i.e., it is the opposite of a <Limit> section and can be used to control both standard and nonstandard/unrecognized methods. See the documentation for <Limit> for more details.

    LimitRequestBody

    LimitRequestBody bytes
    Default: LimitRequestBody 0
    Server config, virtual host, directory, .htaccess

    This directive specifies the number of bytes from 0 (meaning unlimited) to 2147483647 (2GB) that are allowed in a request body. The default value is defined by the compile-time constant DEFAULT_LIMIT_REQUEST_BODY (0 as distributed).

    The LimitRequestBody directive allows the user to set a limit on the allowed size of an HTTP request message body within the context in which the directive is given (server, per-directory, per-file, or per-location). If the client request exceeds that limit, the server will return an error response instead of servicing the request. The size of a normal request message body will vary greatly depending on the nature of the resource and the methods allowed on that resource. CGI scripts typically use the message body for passing form information to the server. Implementations of the PUT method will require a value at least as large as any representation that the server wishes to accept for that resource.

    This directive gives the server administrator greater control over abnormal client-request behavior, which may be useful for avoiding some forms of denial-of-service attacks.

    LimitRequestFields

    LimitRequestFields number
    Default: LimitRequestFields 100
    Server config

    number is an integer from 0 (meaning unlimited) to 32,767. The default value is defined by the compile-time constant DEFAULT_LIMIT_REQUEST_FIELDS (100 as distributed).

    The LimitRequestFields directive allows the server administrator to modify the limit on the number of request header fields allowed in an HTTP request. A server needs this value to be larger than the number of fields that a normal client request might include. The number of request header fields used by a client rarely exceeds 20, but this may vary among different client implementations, often depending upon the extent to which a user has configured her browser to support detailed content negotiation. Optional HTTP extensions are often expressed using request-header fields.

    This directive gives the server administrator greater control over abnormal client-request behavior, which may be useful for avoiding some forms of denial-of-service attacks. The value should be increased if normal clients see an error response from the server that indicates too many fields were sent in the request.

    LimitRequestFieldsize

    LimitRequestFieldsize bytes
    Default: LimitRequestFieldsize 8190
    Server config

    This directive specifies the number of bytes from 0 to the value of the compile-time constant DEFAULT_LIMIT_REQUEST_FIELDSIZE (8,190 as distributed) that will be allowed in an HTTP request header.

    The LimitRequestFieldsize directive allows the server administrator to reduce the limit on the allowed size of an HTTP request-header field below the normal input buffer size compiled with the server. A server needs this value to be large enough to hold any one header field from a normal client request. The size of a normal request-header field will vary greatly among different client implementations, often depending upon the extent to which a user has configured his browser to support detailed content negotiation.

    This directive gives the server administrator greater control over abnormal client-request behavior, which may be useful for avoiding some forms of denial-of-service attacks. Under normal conditions, the value should not be changed from the default.

    LimitRequestLine

    LimitRequestLine bytes
    Default: LimitRequestLine 8190

    This directive sets the number of bytes from 0 to the value of the compile-time constant DEFAULT_LIMIT_REQUEST_LINE (8,190 as distributed) that will be allowed on the HTTP request line.

    The LimitRequestLine directive allows the server administrator to reduce the limit on the allowed size of a client's HTTP request line below the normal input buffer size compiled with the server. Since the request line consists of the HTTP method, URI, and protocol version, the LimitRequestLine directive places a restriction on the length of a request URI allowed for a request on the server. A server needs this value to be large enough to hold any of its resource names, including any information that might be passed in the query part of a GET request. This directive gives the server administrator greater control over abnormal client-request behavior, which may be useful for avoiding some forms of denial-of-service attacks. Under normal conditions, the value should not be changed from the default.
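
    To see the four request limits side by side, here is a minimal sketch of a tightened server config. The numbers are purely illustrative assumptions chosen to stay inside the ranges described above; they are not recommendations, and a real site would size them against its own forms, uploads, and URLs.

    ```apache
    # Illustrative values only; the distributed defaults are 0, 100, 8190, and 8190.
    LimitRequestBody      102400   # refuse request bodies larger than 100 KB
    LimitRequestFields    50       # at most 50 header fields per request
    LimitRequestFieldsize 4094     # each header field at most 4,094 bytes
    LimitRequestLine      4094     # method + URI + protocol version at most 4,094 bytes
    ```

    If ordinary clients start seeing error responses after a change like this, the limits are too tight and should be raised.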

    3.5 HTTP Response Headers

    The webmaster can set and remove HTTP response headers for special purposes, such as setting metainformation for an indexer or PICS labels. Note that Apache doesn't check whether what you are doing is at all sensible, so make sure you know what you are up to, or very strange things may happen.

    HeaderName

    HeaderName filename
    Server config, virtual host, directory, .htaccess

    The HeaderName directive sets the name of the file that will be inserted at the top of the index listing. filename is the name of the file to include.

    Apache 1.3.6 and Earlier

    The module first attempts to include filename.html as an HTML document; otherwise, it will try to include filename as plain text. filename is treated as a filesystem path relative to the directory being indexed. In no case is SSI (server-side includes; see Chapter 14) processing done. For example:

        HeaderName HEADER

    When indexing the directory /web, the server will first look for the HTML file /web/HEADER.html and include it if found; otherwise, it will include the plain text file /web/HEADER, if it exists.

    Apache Versions After 1.3.6

    filename is treated as a URI path relative to the one used to access the directory being indexed, and it must resolve to a document with a major content type of "text" (e.g., text/html, text/plain, etc.). This means that filename may refer to a CGI script if the script's actual file type (as opposed to its output) is marked as text/html, such as with a directive like:

        AddType text/html .cgi

    Content negotiation will be performed if the MultiViews option is enabled. If filename resolves to a static text/html document (not a CGI script) and the Includes option is enabled, the file will be processed for server-side includes (see the mod_include documentation). This directive needs mod_autoindex.

    Header

    Header [set|add|unset|append] HTTP-header "value"
    Header unset HTTP-header
    Anywhere

    The Header directive takes two or three arguments: the first may be set, add, unset, or append; the second is a header name (without a colon); and the third is the value (if applicable). It can be used in <Files>, <Directory>, or <Location> sections.

    Header set|append|add header value

    or:

    Header unset header
    Server config, virtual host, access.conf, .htaccess

    This directive can replace, merge, or remove HTTP response headers. The action it performs is determined by the first argument. This can be one of the following values:

    set
        The response header is set, replacing any previous header with this name.
    append
        The response header is appended to any existing header of the same name. When a new value is merged onto an existing header, it is separated from the existing header with a comma. This is the HTTP standard way of giving a header multiple values.
    add
        The response header is added to the existing set of headers, even if this header already exists. This can result in two (or more) headers having the same name. This can lead to unforeseen consequences, and in general append should be used instead.
    unset
        The response header of this name is removed, if it exists. If there are multiple headers of the same name, all will be removed.

    This argument is followed by a header name, which can include the final colon, but it is not required. Case is ignored. For add, append, and set, a value is given as the third argument. If this value contains spaces, it should be surrounded by double quotes. For unset, no value should be given.

    Order of Processing

    The Header directive can occur almost anywhere within the server configuration. It is valid in the main server config and virtual host sections, inside <Directory>, <Location>, and <Files> sections, and within .htaccess files. The Header directives are processed in the following order:

    1. main server
    2. virtual host
    3. <Directory> sections and .htaccess
    4. <Location>
    5. <Files>
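
    The interplay of actions and processing order is easier to picture with a small fragment. The one below is only a sketch: the header names X-Site and X-Legacy and the directory path are invented for illustration, and it assumes mod_headers is compiled in.

    ```apache
    # Main server level: processed first.
    Header set X-Site "main"

    # Hypothetical directory; its Header directives are processed after the main server's.
    <Directory /usr/www/example/htdocs/docs>
        # Merged onto the value set above, separated by a comma.
        Header append X-Site "docs"
        # Dropped for anything served from this directory.
        Header unset X-Legacy
    </Directory>
    ```

    On our reading of the rules above, a document served from that directory should go out with X-Site carrying both values, while documents elsewhere get only the first.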

    Order is important. These two directives have a different effect if reversed:

        Header append Author "John P. Doe"
        Header unset Author

    This way round, the Author header is not set. If reversed, the Author header is set to "John P. Doe". The Header directives are processed just before the response is sent by its handler. This means that some headers that are added just before the response is sent cannot be unset or overridden. This includes headers such as "Date" and "Server".

    Options

    Options option option ...
    Default: All
    Server config, virtual host, directory, .htaccess

    The Options directive is unusually multipurpose and does not fit into any one site or strategic context, so we had better look at it on its own. It gives the webmaster some far-reaching control over what people get up to on their own sites. option can be set to None, in which case none of the extra features are enabled, or one or more of the following:

    All
        All options are enabled except MultiViews (for historical reasons).
    ExecCGI
        Execution of CGI scripts is permitted, and is impossible if this is not set.
    FollowSymLinks
        The server will follow symbolic links in this directory. Even though the server follows the symlink, it does not change the pathname used to match against <Directory> sections. This option gets ignored if set inside a <Location> section (see Chapter 14).
    Includes
        Server-side includes are permitted, and are forbidden if this is not set.
    IncludesNOEXEC
        Server-side includes are permitted, but the #exec command and #exec CGI are disabled. It is still possible to #include virtual CGI scripts from ScriptAliased directories.
    Indexes
        If the customer requests a URL that maps to a directory and there is no index.html there, this option allows the suite of indexing commands to be used, and a formatted listing is returned (see Chapter 7).
    MultiViews
        Content-negotiated MultiViews are supported. This includes AddLanguage and image negotiation (see Chapter 6).
    SymLinksIfOwnerMatch
        The server will only follow symbolic links for which the target file or directory is owned by the same user id as the link. This option gets ignored if set inside a <Location> section.

    The arguments can be preceded by + or -, in which case they are added or removed. The following command, for example, adds Indexes but removes ExecCGI:

        Options +Indexes -ExecCGI

    If no options are set and there is no directive, the effect is as if All had been set, which means, of course, that MultiViews is not set. If any options are set, All is turned off. This has at least one odd effect, which we will demonstrate at .../site.options. Notice that the file go has been slightly modified:

        test -d logs || mkdir logs
        httpd -f `pwd`/conf/httpd$1.conf -d `pwd`

    There is an ... /htdocs directory without an index.html and a very simple Config file:

        User Webuser
        Group Webgroup
        ServerName www.butterthlies.com
        DocumentRoot /usr/www/APACHE3/APACHE3/site.ownindex/htdocs

    Type ./go in the usual way. As you access the site, you see a directory of ... /htdocs. Now copy the Config file to .../conf/httpd1.conf and add the line:

        Options ExecCGI

    Kill Apache, restart it with ./go 1, and access it again, and you see a rather baffling message:

        FORBIDDEN
        You don't have permission to access / on this server

    (or something similar, depending on your browser). The reason is that when Options is not mentioned, it is, by default, set to All. By switching ExecCGI on, you switch all the others off, including Indexes. The cure for the problem is to edit the Config file (.../conf/httpd2.conf) so that the new line reads:

        Options +ExecCGI

    Similarly, if + or - are not used and multiple options could apply to a directory, the last most specific one is taken. For example (.../conf/httpd3.conf):

        Options ExecCGI
        Options Indexes

    results in only Indexes being set; it might surprise you that CGIs did not work. The same effect can arise through multiple <Directory> blocks:

        <Directory /web/docs>
        Options Indexes FollowSymLinks
        </Directory>
        <Directory /web/docs/specs>
        Options Includes
        </Directory>

    Only Includes is set for /web/docs/specs.

    3.5.1 FollowSymLinks, SymLinksIfOwnerMatch

    When we saved disk space for our multiple copies of the Butterthlies catalogs by keeping the images bench.jpg, hen.jpg, bath.jpg, and tree.jpg in /usr/www/APACHE3/main_docs and making links to them, we used hard links. This is not always the best idea, because if someone deletes the file you have linked to and then recreates it, you stay linked to the old version with a hard link. With a soft, or symbolic, link, you link to the new version. To make one, use ln -s source_filename destination_filename.

    However, there are security problems to do with other users on the same system. Imagine that one of them is a dubious character called Fred, who has his own webspace, ... /fred/public_html. Imagine that the webmaster has a CGI script called fido that lives in ... /cgi-bin and belongs to webuser. If the webmaster is wise, she has restricted read and execute permissions for this file to its owner and no one else. This, of course, allows web clients to use it because they also appear as webuser. As things stand, Fred cannot read the file. This is fine, and it's in line with our security policy of not letting anyone read CGI scripts. This denies them explicit knowledge of any security holes.

    Fred now sneakily makes a symbolic link to fido from his own web space. In itself, this gets him nowhere. The file is as unreadable via symlink as it is in person. But if Fred now logs on to the Web (which he is perfectly entitled to do), accesses his own web space and then the symlink to fido, he can read it because he now appears to the operating system as webuser. The Options command without All or FollowSymLinks stops this caper dead. The more trusting webmaster may be willing to concede SymLinksIfOwnerMatch, since that too should prevent access.
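
    As a rough sketch of the more trusting arrangement, suppose users' web space lives under /home/*/public_html (a hypothetical layout, not one of the book's sites). The webmaster might then write something like this:

    ```apache
    # Hypothetical layout: each user's web space is /home/<user>/public_html.
    <Directory /home/*/public_html>
        # No + or -, so these two options replace whatever was inherited.
        # Symlinks are followed only when link and target have the same owner,
        # so Fred's link to webuser's fido script is not followed.
        Options Indexes SymLinksIfOwnerMatch
    </Directory>
    ```

    Leaving out both FollowSymLinks and All is what matters here; Indexes is included only to show that other options can sit alongside.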

    3.6 Restarts

    A webmaster will sometimes want to kill Apache and restart it with a new Config file, often to add or remove a virtual host as people's web sites come and go. This can be done the brutal way, by running ps -aux to get Apache's PID, doing kill PID to stop httpd, and restarting it. This method causes any transactions in progress to fail in an annoying and disconcerting way for logged-on clients. A recent innovation in Apache allows restarts of the main server without suddenly chopping off any child processes that are running.

    There are three ways to restart Apache under Unix (see Chapter 2):

    • Kill and reload Apache, which then rereads all its Config files and restarts:

        % kill PID
        % httpd [flags]

    • The same effect is achieved with less typing by using the flag -HUP to kill Apache:

        % kill -HUP PID

    • A graceful restart is achieved with the flag -USR1. This rereads the Config files but lets the child processes run to completion, finishing any client transactions in progress, before they are replaced with updated children. In most cases, this is the best way to proceed, because it won't interrupt people who are browsing at the time (unless you messed up the Config files):

        % kill -USR1 PID

    A script to do the job automatically (assuming you are in the server root directory when you run it) is as follows:

        #!/bin/sh
        kill -USR1 `cat logs/httpd.pid`

    Under Win32 it is enough to open a second MS-DOS window and type:

        apache -k shutdown|restart

    See Chapter 2.

    3.7 .htaccess

    An alternative to restarting to change Config files is to use the .htaccess mechanism, which is explained in Chapter 5. In effect, the changeable parts of the Config file are stored in a secondary file kept in .../htdocs. Unlike the Config file, which is read by Apache at startup, this file is read at each access. The advantage is flexibility, because the webmaster can edit it whenever he likes without interrupting the server. The disadvantage is a fairly serious degradation in performance, because the file has to be laboriously parsed to serve each request.

    The webmaster can limit what people do in their .htaccess files with the AllowOverride directive. He may also want to prevent clients seeing the .htaccess files themselves. This can be achieved by including these lines in the Config file:

        <Files .htaccess>
        order allow,deny
        deny from all
        </Files>
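
    AllowOverride itself is covered elsewhere, but a minimal sketch may help to fix the idea. The directory path is an assumption made for illustration; the two override classes are standard AllowOverride arguments.

    ```apache
    # Hypothetical document tree whose owners may edit their own .htaccess files.
    <Directory /usr/www/example/htdocs>
        # .htaccess may carry authentication and host-access directives, nothing else.
        AllowOverride AuthConfig Limit
    </Directory>
    ```

    Setting AllowOverride None instead should stop Apache looking for .htaccess files in that tree at all, which also removes the per-request parsing cost mentioned above.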

    3.8 CERN Metafiles

    A metafile is a file with extra header data to go with the file served; for example, you could add a Refresh header. There seems no obvious place for this material, so we will put it here, with apologies to those readers who find it rather odd.

    MetaFiles

    MetaFiles [on|off]
    Default: off
    Directory

    Turns metafile processing on or off on a directory basis.

    MetaDir

    MetaDir directory_name
    Default directory_name: .web
    Directory

    Names the directory in which Apache is to look for metafiles. This is usually a "hidden" subdirectory of the directory where the file is held. Set to the value . to look in the same directory.

    MetaSuffix

    MetaSuffix file_suffix
    Default file_suffix: .meta
    Directory

    Names the suffix of the file containing metainformation. The default values for these directives will cause a request for DOCUMENT_ROOT/mydir/fred.html to look for metainformation (supplementing the MIME header) in DOCUMENT_ROOT/mydir/.web/fred.html.meta.
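
    Putting the three directives together, a minimal sketch might look like the fragment below. It assumes mod_cern_meta is compiled in and uses an invented directory path; MetaDir and MetaSuffix are shown with their default values purely to make them visible.

    ```apache
    # Hypothetical directory; requires mod_cern_meta.
    <Directory /usr/www/example/htdocs/mydir>
        MetaFiles on
        # Defaults, written out for clarity:
        MetaDir .web
        MetaSuffix .meta
    </Directory>
    ```

    With this in place, a request for fred.html in that directory would have Apache look in the .web subdirectory for fred.html.meta, a plain text file of extra header lines such as the Refresh header mentioned earlier.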

    3.9 Expirations

    Apache Version 1.2 brought the expires module, mod_expires, into the main distribution. The point of this module is to allow the webmaster to set the returned headers to pass information to clients' browsers about documents that will need to be reloaded because they are apt to change or, alternatively, that are not going to change for a long time and can therefore be cached. There are three directives:

    ExpiresActive

    ExpiresActive [on|off]
    Anywhere, .htaccess when AllowOverride Indexes

    ExpiresActive simply switches the expiration mechanism on and off.

    ExpiresByType

    ExpiresByType mime-type time
    Anywhere, .htaccess when AllowOverride Indexes

    ExpiresByType takes two arguments. mime-type specifies a MIME type of file; time specifies how long these files are to remain active. There are two versions of the syntax. The first is this:

        code seconds

    There is no space between code and seconds. code is one of the following:

    A
        Access time (or now, in other words)
    M
        Last modification time of the file

    seconds is simply a number. For example:

    A565656

    specifies 565,656 seconds after the access time. The more readable second format is:

        base [plus] number type [number type ...]

    where base is one of the following:

    access
        Access time
    now
        Synonym for access
    modification
        Last modification time of the file

    The plus keyword is optional, and type is one of the following:

        years
        months
        weeks
        days
        hours
        minutes
        seconds

    For example:

        now plus 1 day 4 hours

    does what it says.

    ExpiresDefault

    ExpiresDefault time
    Anywhere, .htaccess when AllowOverride Indexes

    This directive sets the default expiration time, which is used when expiration is enabled but the file type is not matched by an ExpiresByType directive.
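
    A short fragment may make the two time formats easier to compare. This is a sketch only: the MIME types and intervals are our own choices, and it assumes mod_expires is compiled in. Note that the second, more readable format contains spaces, so it goes inside double quotes.

    ```apache
    # Requires mod_expires. Types and intervals are illustrative.
    ExpiresActive on

    # First format: A = access time, then 2,592,000 seconds (30 days).
    ExpiresByType image/gif A2592000

    # Second format: two days after the file was last modified.
    ExpiresByType text/html "modification plus 2 days"

    # Anything not matched by an ExpiresByType line.
    ExpiresDefault "now plus 1 day 4 hours"
    ```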

    [1] See also HTML & XHTML: The Definitive Guide, by Chuck Musciano and Bill Kennedy (O'Reilly & Associates, 2002).
    [2] That is, they are processed together for each directory in the path.
    [3] Shortest meaning "with the fewest components," rather than "with the fewest characters."
    [4] See Mastering Regular Expressions, by Jeffrey E.F. Friedl (O'Reilly & Associates, 2002).
    [5] Note that this transition period was almost over before it started because many browsers sent the Host header even in HTTP 1.0 requests. However, in some rare cases, this directive may be useful.
    [6] Before Apache v1.3, the default was on. Upgraders please note.
    [7] Dynamically allocated IP addresses may not resolve correctly at any time other than when they are in use. If it is really important to know the exact name of the client, HostNameLookups should be set to on.

        TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log

        <VirtualHost 192.168.123.3>
        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
        ServerName sales-IP.butterthlies.com
        ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
        TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
        </VirtualHost>

    The two named sites are dealt with by the NameVirtualHost directive, whereas requests to sales-IP.butterthlies.com, which we have set up to be 192.168.123.3, are dealt with by the third block. It is important that the IP-numbered VirtualHost block comes last in the file so that a call to it falls through the named blocks. This is a handy technique if you want to put a web site up for access, perhaps for testing, by outsiders, but you don't want to make the named domain available. Visitors surf to the IP number and enter your private site. The ordinary visitor is very unlikely to do this: she will surf to the named URL. Of course, you would only use this technique for sites that were not secret or compromising and could withstand inspection by strangers.

    4.2.4 Port-Based Virtual Hosting

    Port-based virtual hosting follows on from IP-based hosting. The main advantage of this technique is that it makes it possible for a webmaster to test a lot of sites using only one IP address/hostname or, in a pinch, host a large number of sites without using name-based hosts and without using lots of IP numbers. Unfortunately, most ordinary users don't like their web server having a funny port number, but this can also be very useful for testing or staging sites.

        User webuser
        Group webgroup
        Listen 80
        Listen 8080

        <VirtualHost ...:80>
        ServerName www.butterthlies.com
        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
        ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
        TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
        </VirtualHost>

        <VirtualHost ...:8080>
        ServerName sales-IP.butterthlies.com
        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
        ServerName sales.butterthlies.com
        ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/error_log
        TransferLog /usr/www/APACHE3/APACHE3/site.virtual/IP-based/logs/access_log
        </VirtualHost>

    The Listen directives tell Apache to watch ports 80 and 8080. If you set Apache going and access http://www.butterthlies.com, you arrive on port 80, the default, and see the customers' site; if you access http://www.butterthlies.com:8080, you get the salespeople's site. If you forget the port and go to http://sales.butterthlies.com, you arrive on the customers' site, because the two share an IP address in our dummied DNS.

    4.3 Two Copies of Apache

    To illustrate the possibilities, we will run two copies of Apache with different IP addresses on different consoles, as if they were on two completely separate machines. This is not something you want to do often, but on a heavily loaded site it may be useful to run two Apaches optimized in different ways. The different virtual hosts probably need very different configurations, such as different values for ServerType, User, TypesConfig, or ServerRoot (none of these directives can apply to a virtual host, since they are global to all servers, which is why you have to run two copies to get the desired effect). If you are expecting a lot of hits, you should avoid running more than one copy, as doing so will generally load the machine more.

    You can find the necessary machinery in ... /site.twocopy. There are two subdirectories: customers and sales. The Config file in ... /customers contains the following:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        DocumentRoot /usr/www/APACHE3/APACHE3/site.twocopy/customers/htdocs
        BindAddress www.butterthlies.com
        TransferLog logs/access_log

    In .../sales the Config file is as follows:

        User webuser
        Group webgroup
        ServerName sales.butterthlies.com
        DocumentRoot /usr/www/APACHE3/APACHE3/site.twocopy/sales/htdocs
        Listen sales-not-vh.butterthlies.com:80
        TransferLog logs/access_log

    On this occasion, we will exercise the sales-not-vh.butterthlies.com URL. For the first time, we have more than one copy of Apache running, and we have to associate requests on specific URLs with different copies of the server. There are three more directives for making these associations:

    BindAddress

    BindAddress addr
    Default addr: any
    Server config

    This directive forces Apache to bind to a particular IP address, rather than listening to all IP addresses on the machine. It has been abolished in Apache v2: use Listen instead.

    Port

    Port port
    Default port: 80
    Server config

    When used in the main server configuration (i.e., outside any <VirtualHost> sections) and in the absence of a BindAddress or Listen directive, the Port directive sets the port number on which Apache is to listen. This is for backward compatibility, and you should really use BindAddress or Listen. When used in a <VirtualHost> section, this specifies the port that should be used when the server generates a URL for itself (see also ServerName and UseCanonicalName). It does not set the port on which the virtual host listens; that is done by the <VirtualHost> directive itself.

    Listen

    Listen hostname:port
    Server config

    Listen tells Apache to pay attention to more than one IP address or port. By default, it responds to requests on all IP addresses, but only to the port specified by the Port directive. It therefore allows you to restrict the set of IP addresses listened to and increase the set of ports. Listen is the preferred directive; BindAddress is obsolete, since it has to be combined with the Port directive if any port other than 80 is wanted. Also, more than one Listen can be used, but only a single BindAddress.
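    What BindAddress and Listen ask of the operating system can be sketched with plain sockets. The Python fragment below is only an illustration and not how Apache itself is written: the helper name open_listeners is hypothetical, the loopback address and port 0 (which lets the operating system pick a free port) are assumptions chosen so that the sketch runs anywhere without root privileges, and the backlog of 511 is an arbitrary illustrative value.

```python
import socket


def open_listeners(endpoints, backlog=511):
    """Open one listening TCP socket per (address, port) pair.

    Hypothetical helper: each pair plays the role of one Listen line.
    An address of "" would mean every address on the machine.
    """
    listeners = []
    for address, port in endpoints:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((address, port))  # restrict the socket to one address
        sock.listen(backlog)        # length of the pending-connection queue
        listeners.append(sock)
    return listeners


if __name__ == "__main__":
    # Two "Listen" lines on the loopback address; port 0 means any free port.
    socks = open_listeners([("127.0.0.1", 0), ("127.0.0.1", 0)])
    for s in socks:
        print("listening on %s:%d" % s.getsockname())
        s.close()
```

    Binding to a single address is what restricts the set of IP addresses listened to, and opening several sockets is what increases the set of ports.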

    There are some housekeeping directives to go with these three:

    ListenBacklog

    ListenBacklog number
    Default: 511
    Server config

    ListenBacklog sets the maximum length of the queue of pending connections. Normally, doing so is unnecessary, but it can be useful if the server is under a TCP SYN flood attack, which simulates lots of new connection opens that don't complete. On some systems, this causes a large backlog, which can be alleviated by setting the ListenBacklog parameter. Only the knowledgeable should do this. See the backlog parameter in the manual entry for listen.

    Back in the Config file, DocumentRoot (as before) sets the arena for our offerings to the customer. ErrorLog tells Apache where to log its errors, and TransferLog its successes. As we will see in Chapter 10, the information stored in these logs can be tuned.

    ServerType

    ServerType [inetd|standalone]
    Default: standalone
    Server config
    Abolished in Apache v2

    The ServerType directive allows you to control the way in which Apache handles multiple copies of itself. The arguments are inetd or standalone (the default):

    inetd
        You might not want Apache to spawn a cloud of waiting child processes at all, but rather to start up a new one each time a request comes in and exit once it has been dealt with. This is slower, but it consumes fewer resources when there are no clients to be dealt with. However, this method is deprecated by the Apache Group as being clumsy and inefficient. On some platforms it may not work at all, and the Group has no plans to fix it. The utility inetd is configured in /etc/inetd.conf (see man inetd). The entry for Apache would look something like this:

        http stream tcp nowait root /usr/local/bin/httpd httpd -d directory

    standalone
        The default; this allows the swarm of waiting child servers.

    Having set up the customers, we can duplicate the block, making some slight changes to suit the salespeople. The two servers have different DocumentRoots, which is to be expected because that's why we set up two hosts in the first place. They also have different error and transfer logs, but they don't have to. You could have one transfer log and one error log, or you could write all the logging for both sites to a single file.

    Type go on the server (this may require root privileges); while on the client, as before, access http://www.butterthlies.com or http://sales.butterthlies.com/.

    The files in ... /sales/htdocs are similar to those on ... /customers/htdocs, but altered enough so that we can see the difference when we access the two sites. index.html has been edited so that the first line reads:

    SALESMEN Index to Butterthlies Catalogs



    The file catalog_summer.html has been edited so that it reads:

    Welcome to the great rip-off of '97: Butterthlies Inc

    All our worthless cards are available in packs of 20 at $1.95 a pack. WHAT A FANTASTIC DISCOUNT! There is an amazing FURTHER 10% discount if you order more than 100.

    ...

    and so on, until the joke gets boring. Now we can throw the great machine into operation. From console 1, get into ... /customers and type:

        % ./go

    The first Apache is running. Now get into .../sales and again type:

        % ./go

    Now, as the client, you log on to http://www.butterthlies.com/ and see the customers' site, which shows you the customers' catalogs. Quit, and metamorphose into a voracious salesperson by logging on to http://sales.butterthlies.com/. You are given a nasty insight into the ugly reality beneath the smiling face of e-commerce!

    4.4 Dynamically Configured Virtual Hosting

    An even neater method of managing virtual hosting is provided by mod_vhost_alias, which lets you define a single boilerplate configuration and then fills in the details at service time from the IP address and/or the Host header in the HTTP request. All the directives in this module interpolate a string into a pathname. The interpolated string (called the "name") may be either the server name (see the UseCanonicalName directive for details on how this is determined) or the IP address of the virtual host on the server in dotted-quad format (xxx.xxx.xxx.xxx).

    The interpolation is controlled by a mantra, %, which is replaced by some value you supply in the Config file. It's not unlike the controls for logging (see Chapter 10). These are the possible formats:

    %%
        Insert a literal %.
    %p
        Insert the port number of the virtual host.
    %N.M
        Insert (part of) the name. N and M are numbers, used to specify substrings of the name. N selects from the dot-separated components of the name, and M selects characters within whatever N has selected. M is optional and defaults to zero if it isn't present. The dot must be present if and only if M is present.

    If we are trying to parse sales.butterthlies.com, the interpretation of N is as follows:

    0
        The whole name: sales.butterthlies.com
    1
        The first part: sales
    2
        The second part: butterthlies
    -1
        The last part: com
    -2
        The penultimate part: butterthlies
    2+
        The second and all subsequent parts: butterthlies.com
    -2+
        The penultimate and all preceding parts: sales.butterthlies
    1+ and -1+
        The same as 0: sales.butterthlies.com

    If N or M is greater than the number of parts available, a single underscore is interpolated.

    4.4.1 Examples

    For simple name-based virtual hosts, you might use the following directives in your server-configuration file:

        UseCanonicalName Off
        VirtualDocumentRoot /usr/local/apache/vhosts/%0

    A request for http://www.example.com/directory/file.html will be satisfied by the file /usr/local/apache/vhosts/www.example.com/directory/file.html.

    On .../site.dynamic we have implemented a version of the familiar Butterthlies site, with a password-protected salesperson's department. The first Config file, .../conf/httpd1.conf, is as follows:

        User webuser
        Group webgroup
        ServerName my586
        UseCanonicalName Off
        VirtualDocumentRoot /usr/www/APACHE3/site.dynamic/htdocs/%0

        AuthType Basic
        AuthName Darkness
        AuthUserFile /usr/www/APACHE3/ok_users/sales
        AuthGroupFile /usr/www/APACHE3/ok_users/groups
        Require group cleaners

    Launch it with go 1; it responds nicely to http://www.butterthlies.com and http://sales.butterthlies.com. There is an equivalent VirtualScriptAlias directive, but it insists on URLs containing ../cgi-bin/... — for instance, www.butterthlies.com/cgi-bin/mycgi. In view of the reputed horror some search engines have for "cgi-bin", you might prefer not to use it and to keep "cgi-bin" out of your URLs with this:

    ScriptAliasMatch /(.*) /usr/www/APACHE3/cgi-bin/handler/$1

    The effect should be that any visitor to /fred will call the script .../cgi-bin/handler and pass "fred" to it in the PATH_INFO environment variable.

    If you have a very large number of virtual hosts, it's a good idea to arrange the files to reduce the size of the vhosts directory. To do this, you might use the following in your configuration file:

        UseCanonicalName Off
        VirtualDocumentRoot /usr/local/apache/vhosts/%3+/%2.1/%2.2/%2.3/%2

    A request for http://www.example.isp.com/directory/file.html will be satisfied by the file /usr/local/apache/vhosts/isp.com/e/x/a/example/directory/file.html (because isp.com matches %3+, e matches %2.1, the first character of example, which is the second part of the hostname, and so on). The point is that most OSes are very slow if you have thousands of subdirectories in a single directory: this scheme spreads them out. A more even spread of files can often be achieved by selecting from the end of the name, for example:

        VirtualDocumentRoot /usr/local/apache/vhosts/%3+/%2.-1/%2.-2/%2.-3/%2

    The example request would come from /usr/local/apache/vhosts/isp.com/e/l/p/example/directory/file.html. Alternatively, you might use:

        VirtualDocumentRoot /usr/local/apache/vhosts/%3+/%2.1/%2.2/%2.3/%2.4+

    The example request would come from /usr/local/apache/vhosts/isp.com/e/x/a/mple/directory/file.html.

    For IP-based virtual hosting you might use the following in your configuration file:

        UseCanonicalName DNS
        VirtualDocumentRootIP /usr/local/apache/vhosts/%1/%2/%3/%4/docs
        VirtualScriptAliasIP /usr/local/apache/vhosts/%1/%2/%3/%4/cgi-bin

    A request for http://www.example.isp.com/directory/file.html would be satisfied by the file /usr/local/apache/vhosts/10/20/30/40/docs/directory/file.html if the IP address of www.example.isp.com were 10.20.30.40. A request for http://www.example.isp.com/cgi-bin/script.pl would be satisfied by executing the program /usr/local/apache/vhosts/10/20/30/40/cgi-bin/script.pl.

    If you want to include the . character in a VirtualDocumentRoot directive, but it clashes with a % directive, you can work around the problem in the following way:

        VirtualDocumentRoot /usr/local/apache/vhosts/%2.0.%3.0

    A request for http://www.example.isp.com/directory/file.html will be satisfied by the file /usr/local/apache/vhosts/example.isp/directory/file.html.

    The LogFormat directives %V and %A are useful in conjunction with this module. See Chapter 10.

    VirtualDocumentRoot

    VirtualDocumentRoot interpolated-directory
    Default: None
    Server config, virtual host
    Compatibility: VirtualDocumentRoot is only available in 1.3.7 and later.

    The VirtualDocumentRoot directive allows you to determine where Apache will find your documents based on the value of the server name. The result of expanding interpolated-directory is used as the root of the document tree in a similar manner to the DocumentRoot directive's argument. If interpolated-directory is none, then VirtualDocumentRoot is turned off. This directive cannot be used in the same context as VirtualDocumentRootIP.

    VirtualDocumentRootIP

    VirtualDocumentRootIP interpolated-directory
    Default: None
    Server config, virtual host

    The VirtualDocumentRootIP directive is like the VirtualDocumentRoot directive, except that it uses the IP address of the server end of the connection instead of the server name.

    VirtualScriptAlias

    VirtualScriptAlias interpolated-directory
    Default: None
    Server config, virtual host

    The VirtualScriptAlias directive allows you to determine where Apache will find CGI scripts in a manner similar to how VirtualDocumentRoot does for other documents. It matches requests for URIs starting /cgi-bin/, much like the following:

        ScriptAlias /cgi-bin/ ...

    VirtualScriptAliasIP

    VirtualScriptAliasIP interpolated-directory
    Default: None
    Server config, virtual host

    The VirtualScriptAliasIP directive is like the VirtualScriptAlias directive, except that it uses the IP address of the server end of the connection instead of the server name.
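    The %N.M rules described in this section are easier to follow with something to experiment on. The following Python sketch mimics them as they are described above; it is not the module's own code. The function names interpolate and _select are hypothetical, N and M are assumed to be single digits with an optional leading minus sign and trailing plus sign, and a malformed specifier is simply left in place as literal text rather than reported as an error.

```python
import re

# %%  |  %p  |  %N[+][.M[+]]  (N and M: one digit, optionally negative)
_SPEC = re.compile(r"%(?:(%)|(p)|(-?\d)(\+?)(?:\.(-?\d)(\+?))?)")


def _select(seq, n, plus):
    """Pick element n of seq (1-based; negative n counts from the end).

    n == 0 selects everything. With plus, a positive n runs to the end
    and a negative n runs back to the start. Returns None if out of range.
    """
    if n == 0:
        return seq
    if abs(n) > len(seq):
        return None
    if n > 0:
        return seq[n - 1:] if plus else seq[n - 1:n]
    stop = len(seq) + n + 1  # index one past the selected element
    return seq[:stop] if plus else seq[stop - 1:stop]


def interpolate(template, name, port=80):
    """Expand % specifiers in template from name (a hostname or dotted IP)."""
    parts = name.split(".")

    def expand(match):
        literal, is_port, n, n_plus, m, m_plus = match.groups()
        if literal:
            return "%"
        if is_port:
            return str(port)
        chosen = _select(parts, int(n), bool(n_plus))
        if chosen is None:
            return "_"  # not enough dot-separated parts
        text = ".".join(chosen)
        if m is not None:
            text = _select(text, int(m), bool(m_plus))
            if text is None:
                return "_"  # not enough characters
        return text

    return _SPEC.sub(expand, template)


if __name__ == "__main__":
    root = "/usr/local/apache/vhosts/%3+/%2.1/%2.2/%2.3/%2"
    print(interpolate(root, "www.example.isp.com"))
    print(interpolate("%1/%2/%3/%4/docs", "10.20.30.40"))
```

    Passing a dotted-quad address as the name gives the behavior of the IP variants of the directives, since the same splitting on dots applies.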


    Chapter 5. Authentication

    5.1 Authentication Protocol
    5.2 Authentication Directives
    5.3 Passwords Under Unix
    5.4 Passwords Under Win32
    5.5 Passwords over the Web
    5.6 From the Client's Point of View
    5.7 CGI Scripts
    5.8 Variations on a Theme
    5.9 Order, Allow, and Deny
    5.10 DBM Files on Unix
    5.11 Digest Authentication
    5.12 Anonymous Access
    5.13 Experiments
    5.14 Automatic User Information
    5.15 Using .htaccess Files
    5.16 Overrides

    The volume of business Butterthlies, Inc. is doing is stupendous, and naturally our competitors are anxious to look at sensitive information such as the discounts we give our salespeople. We have to seal our site off from their vulgar gaze by authenticating those who log on to it.

    5.1 Authentication Protocol

    Authentication is simple in principle. The client sends his name and password to Apache. Apache looks up its file of names and encrypted passwords to see whether the client is entitled to access. The webmaster can store a number of clients in a list (either as a simple text file or as a database) and thereby control access person by person.

    It is also possible to group a number of people into named groups and to give or deny access to these groups as a whole. So, throughout this chapter, bill and ben are in the group directors, and daphne and sonia are in the group cleaners. The webmaster can require user so and so or require group such and such, or even simply require that visitors be registered users. If you have to deal with large numbers of people, it is obviously easier to group them in this way. To make the demonstration simpler, the password is always theft. Naturally, you would not use so short and obvious a password in real life, or one so open to a dictionary attack.

    Each username/password pair is valid for a particular realm, which is named when the passwords are created. The browser asks for a URL; the server sends back "Authentication Required" (code 401) and the realm. If the browser already has a username/password for that realm, it sends the request again with the username/password. If not, it prompts the user, usually including the realm's name in the prompt, and sends that.

    Of course, all this is worryingly insecure since the password is sent unencrypted over the Web (base64 encoding is easily reversed), and any malign observer simply has to watch the traffic to get the password, which is as good in his hands as in the legitimate client's. Digest authentication improves on this by using a challenge/handshake protocol to avoid revealing the actual password. In the two earlier editions of this book, we had to report that no browsers actually supported this technique; now things are a bit better. Using SSL (see Chapter 11) also improves this.

    5.1.1 site.authent

    Examples are found in site.authent. The first Config file, .../conf/httpd1.conf, looks like this:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        NameVirtualHost 192.168.123.2

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.authent/htdocs/customers
        ServerName www.butterthlies.com
        ErrorLog /usr/www/APACHE3/site.authent/logs/error_log
        TransferLog /usr/www/APACHE3/site.authent/logs/customers/access_log
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.authent/htdocs/salesmen
        ServerName sales.butterthlies.com
        ErrorLog /usr/www/APACHE3/site.authent/logs/error_log
        TransferLog /usr/www/APACHE3/site.authent/logs/salesmen/access_log
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

        AuthType Basic
        AuthName darkness
        AuthUserFile /usr/www/APACHE3/ok_users/sales
        AuthGroupFile /usr/www/APACHE3/ok_users/groups
        require valid-user

    What's going on here? The key directive is AuthType Basic in the <Directory> block. This turns authentication checking on.

    5.2 Authentication Directives

    From Apache v1.3 on, filenames are relative to the server root unless they are absolute. A filename is taken as absolute if it starts with / or, on Win32, if it starts with drive:/. It seems sensible for us to write them in absolute form to prevent misunderstandings. The directives are as follows:

    AuthType

    AuthType type
    directory, .htaccess

    AuthType specifies the type of authorization control. Basic was originally the only possible type, but Apache 1.1 introduced Digest, which uses an MD5 digest and a shared secret.

    If the directive AuthType is used, we must also use AuthName, AuthGroupFile, and AuthUserFile.

    AuthName

    AuthName auth-realm
    directory, .htaccess

    AuthName gives the name of the realm in which the users' names and passwords are valid. If the name of the realm includes spaces, you will need to surround it with quotation marks:

        AuthName "sales people"

    AuthGroupFile

    AuthGroupFile filename
    directory, .htaccess

    AuthGroupFile has nothing to do with the Group webgroup directive at the top of the Config file. It gives the name of another file that contains group names and their members:

        cleaners: daphne sonia
        directors: bill ben

    We put this into ... /ok_users/groups and set AuthGroupFile to match. The AuthGroupFile directive has no effect unless the require directive is suitably set.
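    To see how a group file and a require line fit together, here is a small Python sketch. It is a model for experimenting with, not Apache's implementation: the function names parse_group_file and satisfies_require are hypothetical, only the user, group, and valid-user forms are handled, and the user is assumed to have already passed the password check.

```python
def parse_group_file(text):
    """Parse 'group: member member ...' lines into {group: set_of_members}."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, members = line.partition(":")
        groups[name.strip()] = set(members.split())
    return groups


def satisfies_require(requirement, user, groups):
    """Check an already-authenticated user against one require line."""
    kind, *names = requirement.split()
    if kind == "valid-user":
        return True  # any user with a valid password will do
    if kind == "user":
        return user in names
    if kind == "group":
        return any(user in groups.get(group, set()) for group in names)
    return False  # forms this sketch does not model


if __name__ == "__main__":
    groups = parse_group_file("cleaners: daphne sonia\ndirectors: bill ben\n")
    for who in ("sonia", "bill"):
        print(who, satisfies_require("group cleaners", who, groups))
```

    With the group file shown above, require group cleaners admits sonia and daphne but not bill or ben.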

    AuthUserFile

    AuthUserFile filename

    AuthUserFile is a file of usernames and their encrypted passwords. There is quite a lot to this; see Section 5.3, Section 5.4, and Section 5.5 later in this chapter.

    AuthAuthoritative

    AuthAuthoritative on|off
    Default: AuthAuthoritative on
    directory, .htaccess

    Setting the AuthAuthoritative directive explicitly to off allows for both authentication and authorization to be passed on to lower-level modules (as defined in the Config and modules.c files) if there is no user ID or rule matching the supplied user ID. If there is a user ID and/or rule specified, the usual password and access checks will be applied, and a failure will give an Authorization Required reply.

    So if a user ID appears in the database of more than one module or if a valid Require directive applies to more than one module, then the first module will verify the credentials, and no access is passed on, regardless of the AuthAuthoritative setting.

    A common use for this is in conjunction with one of the database modules, such as mod_auth_db.c, mod_auth_dbm.c, mod_auth_msql.c, and mod_auth_anon.c. These modules supply the bulk of the user-credential checking, but a few (administrator) related accesses fall through to a lower level with a well-protected AuthUserFile.

    Default
        By default, control is not passed on, and an unknown user ID or rule will result in an Authorization Required reply. Not setting it thus keeps the system secure.

    Security
        Do consider the implications of allowing a user to allow fall-through in her .htaccess file, and verify that this is really what you want. Generally, it is easier just to secure a single .htpasswd file than it is to secure a database such as mSQL. Make sure that the AuthUserFile is stored outside the document tree of the web server; do not put it in the directory that it protects. Otherwise, clients will be able to download the AuthUserFile.

    AuthDBAuthoritative

    AuthDBAuthoritative on|off
    Default: AuthDBAuthoritative on
    directory, .htaccess

    Setting the AuthDBAuthoritative directive explicitly to off allows for both authentication and authorization to be passed on to lower-level modules (as defined in the Config and modules.c files) if there is no user ID or rule matching the supplied user ID. If there is a user ID and/or rule specified, the usual password and access checks will be applied, and a failure will give an Authorization Required reply.

    So if a user ID appears in the database of more than one module or if a valid Require directive applies to more than one module, then the first module will verify the credentials, and no access is passed on, regardless of the AuthAuthoritative setting.

    A common use for this is in conjunction with one of the basic auth modules, such as mod_auth.c. Whereas this DB module supplies the bulk of the user-credential checking, a few (administrator) related accesses fall through to a lower level with a well-protected .htpasswd file.

    Default
        By default, control is not passed on, and an unknown user ID or rule will result in an Authorization Required reply. Not setting it thus keeps the system secure.

    Security
        Do consider the implications of allowing a user to allow fall-through in his .htaccess file, and verify that this is really what you want. Generally, it is easier just to secure a single .htpasswd file than it is to secure a database that might have more access interfaces.

    AuthDBMAuthoritative

    AuthDBMAuthoritative on|off
    Default: AuthDBMAuthoritative on
    directory, .htaccess

    Setting the AuthDBMAuthoritative directive explicitly to off allows for both authentication and authorization to be passed on to lower-level modules (as defined in the Config and modules.c files) if there is no user ID or rule matching the supplied user ID. If there is a user ID and/or rule specified, the usual password and access checks will be applied, and a failure will give an Authorization Required reply.

    So if a user ID appears in the database of more than one module or if a valid Require directive applies to more than one module, then the first module will verify the credentials, and no access is passed on, regardless of the AuthAuthoritative setting.

    A common use for this is in conjunction with one of the basic auth modules, such as mod_auth.c. Whereas this DBM module supplies the bulk of the user-credential checking, a few (administrator) related accesses fall through to a lower level with a well-protected .htpasswd file.

    Default
        By default, control is not passed on, and an unknown user ID or rule will result in an Authorization Required reply. Not setting it thus keeps the system secure.

    Security
        Do consider the implications of allowing a user to allow fall-through in her .htaccess file, and verify that this is really what you want. Generally, it is easier to just secure a single .htpasswd file than it is to secure a database that might have more access interfaces.

    require

    require [user user1 user2 ...] [group group1 group2] [validuser] [valid-user] [valid-group]
    directory, .htaccess

    The key directive that throws password checking into action is require. The argument, valid-user, accepts any users that are found in the password file. Do not mistype this as valid_user, or you will get a hard-to-explain authorization failure when you try to access this site through a browser. This is because Apache does not care what you put after require and will interpret valid_user as a username. It would be nice if Apache returned an error message, but require is usable by multiple modules, and there's no way to determine (in the current API) what values are valid.

    file-owner
        [Available after Apache 1.3.20] The supplied username and password must be in the AuthUserFile database, and the username must also match the system's name for the owner of the file being requested. That is, if the operating system says the requested file is owned by jones, then the username used to access it through the Web must be jones as well.

    file-group
        [Available after Apache 1.3.20] The supplied username and password must be in the AuthUserFile database, the name of the group that owns the file must be in the AuthGroupFile database, and the username must be a member of that group. For example, if the operating system says the requested file is owned by group accounts, the group accounts must be in the AuthGroupFile database, and the username used in the request must be a member of that group.

    We could say:

        require user bill ben simon

    to allow only those users, provided they also have valid entries in the password table, or we could say:

        require group cleaners

    in which case only sonia and daphne can access the site, provided they also have valid passwords and we have set up AuthGroupFile appropriately.

    The <Directory> block that protects ... /cgi-bin could safely be left out in the open as a separate block, but since protection of the ... /salesmen directory only arises when sales.butterthlies.com is accessed, we might as well put the require directive there.

    satisfy

    satisfy [any|all]
    Default: all
    directory, .htaccess

    satisfy sets access policy if both allow and require are used. The parameter can be either all or any. This directive is only useful if access to a particular area is being restricted by both username/password and client host address. In this case, the default behavior (all) is to require the client to pass the address access restriction and enter a valid username and password. With the any option, the client will be granted access if he either passes the host restriction or enters a valid username and password. This can be used to let clients from particular addresses into a password-restricted area without prompting for a password.

    For instance, we want a password from everyone except site 1.2.3.4:

        require valid-user
        Satisfy any
        order deny,allow
        allow from 1.2.3.4
        deny from all
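    The difference between the two satisfy settings comes down to "and" versus "or". The Python sketch below models just that decision. It assumes a host passes the address restriction only if it matches one of the allowed entries (the effect of the order deny,allow / allow from ... / deny from all combination above); the function name access_granted and its parameters are hypothetical.

```python
import ipaddress


def access_granted(client_ip, has_valid_login, allowed_hosts, satisfy="all"):
    """Combine the host check and the password check as satisfy describes.

    allowed_hosts holds single addresses or network/netmask strings; any
    client outside them is treated as denied by the host restriction.
    """
    client = ipaddress.ip_address(client_ip)
    host_ok = any(
        client in ipaddress.ip_network(entry, strict=False)
        for entry in allowed_hosts
    )
    if satisfy == "any":
        return host_ok or has_valid_login   # either test is enough
    return host_ok and has_valid_login      # "all": both tests must pass


if __name__ == "__main__":
    # Everyone needs a password except the client at 1.2.3.4.
    print(access_granted("1.2.3.4", False, ["1.2.3.4"], satisfy="any"))
    print(access_granted("5.6.7.8", False, ["1.2.3.4"], satisfy="any"))
```

    Under the default (all), the client at 1.2.3.4 would still be asked for a password, because passing the host check alone is no longer sufficient.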

    5.3 Passwords Under Unix

    Authentication of salespeople is managed by the password file sales, stored in /usr/www/APACHE3/ok_users. This is safely above the document root, so that the Bad Guys cannot get at it to mess with it. The file sales is maintained using the Apache utility htpasswd. The source code for this utility is to be found in ... /apache_1.3.1/src/support/htpasswd.c, and we have to compile it with this:

        % make htpasswd

    htpasswd now links, and we can set it to work. Since we don't know how it functions, the obvious thing is to prod it with this:

        % htpasswd -?

    It responds that the correct usage is as follows:

        Usage: htpasswd [-cmdps] passwordfile username
               htpasswd -b[cmdps] passwordfile username password
         -c Create a new file.
         -m Force MD5 encryption of the password.
         -d Force CRYPT encryption of the password (default).
         -p Do not encrypt the password (plaintext).
         -s Force SHA encryption of the password.
         -b Use the password from the command line rather than prompting for it.
        On Windows and TPF systems the '-m' flag is used by default.
        On all other systems, the '-p' flag will probably not work.

    This seems perfectly reasonable behavior, so let's create a user bill with the password "theft" (in real life, you would never use so obvious a password for a character such as Bill of the notorious Butterthlies sales team, because it would be subject to a dictionary attack, but this is not real life):

        % htpasswd -m -c ... /ok_users/sales bill

    We are asked to type his password twice, and the job is done. If we look in the password file, there is something like the following:

        bill:$1$Pd$E5BY74CgGStbs.L/fsoEU0

    Add subsequent users (the -c flag creates a new file, so we shouldn't use it after the first one):

        % htpasswd ... /ok_users/sales ben

    There is no warning if you use the -c flag by accident, so be cautious. Carry on and do the same for sonia and daphne. We gave them all the same password, "theft," to save having to remember different ones later, which is another dangerous security practice.

    The password file ... /ok_users/sales now looks something like this:[1]

        bill:$1$Pd$E5BY74CgGStbs.L/fsoEU0
        ben:$1$/S$hCyzbA05Fu4CAlFK4SxIs0
        sonia:$1$KZ$ye9u..7GbCCyrK8eFGU2w.
        daphne:$1$3U$CF3Bcec4HzxFWppln6Ai01

    Each username is followed by an encrypted password. They are stored like this to protect the passwords because, at least in theory, you cannot work backward from the encrypted to the plain-text version. If you pretend to be Bill and log in using:

        $1$Pd$E5BY74CgGStbs.L/fsoEU0

    the password gets re-encrypted, becomes something like o09klks23O9RM, and fails to match. You can't tell by looking at this file (or if you can, we'll all be very disappointed) that Bill's password is actually "theft." From Apache v1.3.14, htpasswd will also generate a password to standard output by using the flag -n.
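    The one-way idea can be tried out with the simplest of the stored formats, the one selected by the -s flag: the string {SHA} followed by the Base64-encoded SHA-1 digest of the password. The Python sketch below builds and checks such an entry. The function names are hypothetical, and the salted formats produced by -d and -m are deliberately not reproduced here. An unsalted SHA-1 digest is weak by modern standards; it is used only because it is easy to follow.

```python
import base64
import hashlib
import hmac


def sha_entry(username, password):
    """Build a 'user:{SHA}...' line in the style of htpasswd -s."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "%s:{SHA}%s" % (username, base64.b64encode(digest).decode("ascii"))


def check_password(entry, username, password):
    """Re-encode the offered password and compare it with the stored line."""
    candidate = sha_entry(username, password)
    return hmac.compare_digest(entry.encode("utf-8"), candidate.encode("utf-8"))


if __name__ == "__main__":
    stored = sha_entry("bill", "theft")
    print(stored)
    print(check_password(stored, "bill", "theft"))
    # Offering the stored hash itself as the password fails, as described above.
    print(check_password(stored, "bill", stored.split(":", 1)[1]))
```

    The check never decodes the stored value; it encodes the offered password the same way and compares the results, which is why typing the stored string as a password does not work.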

    5.4 Passwords Under Win32

    Since Win32 lacks an encryption function, passwords are stored in plain text. This is not very secure, but one hopes it will change for the better. The passwords would be stored in the file named by the AuthUserFile directive, and Bill's entry would be:

        bill:theft

    except that in real life you would use a better password.

    5.5 Passwords over the Web

    The security of these passwords on your machine becomes somewhat irrelevant when we realize that they are transmitted unencrypted over the Web. The Base64 encoding used for Basic password transmission keeps passwords from being readable at a glance, but it is very easily decoded. Authentication, as described here, should only be used for the most trivial security tasks. If a compromised password could cause any serious trouble, then it is essential to encrypt it using SSL (see Chapter 11).
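    How little protection Base64 gives is easy to demonstrate. The Python sketch below builds the credential a browser sends for Basic authentication and then reverses it, as an eavesdropper could. The function names basic_header and decode_basic are hypothetical; the only facts relied on are that the credential is the Base64 encoding of username:password, prefixed with the word Basic.

```python
import base64


def basic_header(username, password):
    """Return the value of an Authorization header for Basic authentication."""
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_header("bill", "theft")
    print(header)                # what travels over the wire
    print(decode_basic(header))  # what anyone watching can recover
```

    No key or secret is involved in either direction, which is why the text treats the password as effectively sent in the clear.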

    5.6 From the Client's Point of View

    If you run Apache using httpd1.conf, you will find you can access www.butterthlies.com as before. But if you go to sales.butterthlies.com, you will have to give a username and password.

    5.6.1 The Config File

    The file is httpd2.conf. These are the relevant bits:

        ...
        AuthType Digest
        AuthName darkness
        AuthDigestDomain http://sales.butterthlies.com
        AuthDigestFile /usr/www/APACHE3/ok_digest/digest_users

    Run it with ./go 2. At the client end, Microsoft Internet Explorer (MSIE) v5 displayed a password screen decorated with a key and worked as you would expect; Netscape v4.05 asked for a username and password in the usual way and returned error 401 "Authorization required."
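    The reason Digest authentication does not reveal the password can be sketched in a few lines of Python. The sketch assumes the simplest form of the calculation, without the optional qop, nonce-count, and client-nonce fields: the stored secret is MD5(username:realm:password), and the browser answers a server-supplied nonce with MD5(secret:nonce:MD5(method:uri)). The function names and the nonce value are hypothetical.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_file_line(username, realm, password):
    """One line of a digest password file: user:realm:MD5(user:realm:password)."""
    secret = md5_hex("%s:%s:%s" % (username, realm, password))
    return "%s:%s:%s" % (username, realm, secret)


def digest_response(secret, nonce, method, uri):
    """The value the browser sends back in answer to the server's nonce."""
    request_hash = md5_hex("%s:%s" % (method, uri))
    return md5_hex("%s:%s:%s" % (secret, nonce, request_hash))


if __name__ == "__main__":
    line = digest_file_line("bill", "darkness", "theft")
    secret = line.split(":")[2]
    nonce = "made-up-nonce-0001"  # in practice, chosen by the server per challenge
    print(line)
    print(digest_response(secret, nonce, "GET", "/"))
```

    The server performs the same calculation from its stored secret and compares the results, so neither the password nor the stored secret crosses the network, and a fresh nonce makes an old response useless.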

    5.7 CGI Scripts

    Authentication (both Basic and Digest) can also protect CGI scripts. Simply provide a suitable <Directory> block.

    5.8 Variations on a Theme

    You may find that logging in again is a bit more elaborate than you would think. We found that both MSIE and Netscape were annoyingly helpful in remembering the password used for the last login and using it again. To make sure you are really exercising the security features, you have to exit your browser completely each time and reload it to get a fresh crack. You might like to try the effect of inserting these lines in either of the previous Config files:

        ....
        #require valid-user
        #require user daphne bill
        #require group cleaners
        #require group directors
        ...

    and uncommenting them one line at a time (remember to kill and restart Apache each time).

    5.9 Order, Allow, and Deny

    So far we have dealt with potential users on an individual basis. We can also allow access from or deny access to specific IP addresses, hostnames, or groups of addresses and hostnames. The commands are allow from and deny from.

    The order in which the allow and deny commands are applied is not set by the order in which they appear in your file. The default order is deny then allow: if a client is excluded by deny, it is excluded unless it matches allow. If neither is matched, the client is granted access. The order in which these commands are applied can be set by the order directive.

    allow from

    allow from host host ...
    directory, .htaccess

    The allow directive controls access to a directory. The argument host can be one of the following:

    all
        All hosts are allowed access.
    A (partial) domain name
        All hosts whose names match or end in this string are allowed access.
    A full IP address
        The host with this IP address is allowed access.
    A partial IP address
        The first one to three bytes of an IP address are allowed access, for subnet restriction.
    A network/netmask pair
        Network a.b.c.d and netmask w.x.y.z are allowed access, to give finer-grained subnet control. For instance, 10.1.0.0/255.255.0.0.
    A network CIDR specification
        The netmask consists of nnn high-order 1-bits. For instance, 10.1.0.0/16 is the same as 10.1.0.0/255.255.0.0.

    allow from env

    allow from env=variablename ...
    directory, .htaccess

    The allow from env directive controls access by the existence of a named environment variable. For instance:

        BrowserMatch ^KnockKnock/2.0 let_me_in
        order deny,allow
        deny from all
        allow from env=let_me_in

    Access by a browser called KnockKnock v2.0 sets an environment variable let_me_in, which in turn triggers allow from.

    deny from

        deny from host host ...
        directory, .htaccess

    The deny from directive controls access by host. The argument host can be one of the following:

    all
        All hosts are denied access.
    A (partial) domain name
        All hosts whose names match or end in this string are denied access.
    A full IP address
        The host with this IP address is denied access.
    A partial IP address
        Hosts whose addresses begin with these first one to three bytes are denied access, for subnet restriction.
    A network/netmask pair
        Network a.b.c.d and netmask w.x.y.z are denied access, to give finer-grained subnet control. For instance, 10.1.0.0/255.255.0.0.
    A network CIDR specification
        The netmask consists of nnn high-order 1-bits. For instance, 10.1.0.0/16 is the same as 10.1.0.0/255.255.0.0.

    deny from env

        deny from env=variablename ...
        directory, .htaccess

    The deny from env directive controls access by the existence of a named environment variable. For instance:

        BrowserMatch ^BadRobot/0.9 go_away
        order allow,deny
        allow from all
        deny from env=go_away

    Access by a browser called BadRobot v0.9 sets an environment variable go_away, which in turn triggers deny from.

    Order

        order ordering
        directory, .htaccess

    The ordering argument is one word (i.e., it is not allowed to contain a space) and controls the order in which the foregoing directives are applied. If two order directives apply to the same host, the last one to be evaluated prevails:

    deny,allow
        The deny directives are evaluated before the allow directives. This is the default.
    allow,deny
        The allow directives are evaluated before the deny directives, but the user will still be rejected if a deny is encountered.
    mutual-failure
        Hosts that appear on the allow list and do not appear on the deny list are allowed access.

    We could say:

        allow from all

    which lets everyone in and is hardly worth writing, or we could say:

        allow from 123.156
        deny from all

    As it stands, this denies everyone except those whose IP addresses happen to start with 123.156. In other words, allow is applied last and carries the day. If, however, we changed the default order by saying:

        order allow,deny
        allow from 123.156
        deny from all

    we effectively close the site because deny is now applied last. It is also possible to use domain names, so that instead of:

        deny from 123.156.3.5

    you could say:

        deny from badguys.com

    Although this has the advantage of keeping up with the Bad Guys as they move from one IP address to another, it also allows access by people who control the reverse-DNS mapping for their IP addresses. The host argument can contain just part of the hostname. In this case, the match is done on whole words from the right. That is, allow from fred.com allows fred.com and abc.fred.com, but not notfred.com.

    Good intentions, however, are not enough: before placing any trust in a set of access rules, test them very thoroughly in private before exposing them to the world. Try the site with as many different browsers as you can muster: Netscape and MSIE can behave surprisingly differently. Having done that, try the site from a public-access terminal (in a library, for instance).
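The interplay of order, allow, and deny described in this section can be modeled in a few lines of Python. This is only an illustrative sketch, not Apache's implementation: the function names host_matches and access_allowed are hypothetical, only IPv4 is handled, and the rule that allow,deny refuses a client that matches neither list is taken from the Apache documentation rather than from the text above.

```python
import ipaddress


def host_matches(spec, ip, hostname=None):
    """Return True if a client matches one allow/deny host argument."""
    if spec == "all":
        return True
    if "/" in spec:
        # network/netmask pair or CIDR form, e.g. 10.1.0.0/16
        network = ipaddress.ip_network(spec, strict=False)
        return ipaddress.ip_address(ip) in network
    if all(c.isdigit() or c == "." for c in spec):
        # full or partial IP address, compared byte by byte from the left
        prefix = spec.rstrip(".").split(".")
        return ip.split(".")[: len(prefix)] == prefix
    # (partial) domain name: compare whole labels from the right
    if hostname is None:
        return False
    suffix = spec.lstrip(".")
    return hostname == suffix or hostname.endswith("." + suffix)


def access_allowed(order, allow, deny, ip, hostname=None):
    """Model the outcome of one order setting plus its allow and deny lists."""
    allowed = any(host_matches(s, ip, hostname) for s in allow)
    denied = any(host_matches(s, ip, hostname) for s in deny)
    if order == "deny,allow":
        # deny is applied first and allow last; no match at all grants access
        return allowed or not denied
    if order in ("allow,deny", "mutual-failure"):
        # allow is applied first and deny last; no match at all refuses access
        return allowed and not denied
    raise ValueError("unknown ordering: " + order)


if __name__ == "__main__":
    # The two examples from the text: same lists, different order.
    print(access_allowed("deny,allow", ["123.156"], ["all"], "123.156.3.5"))
    print(access_allowed("allow,deny", ["123.156"], ["all"], "123.156.3.5"))
```

Running the script shows the two 123.156 examples from the text giving opposite results: the first is admitted under the default ordering, and the second is shut out once deny is applied last.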

    5.10 DBM Files on Unix

    Although searching a file of usernames and passwords works perfectly well, it is apt to be rather slow once the list gets up to a couple hundred entries. To deal with this, Apache provides a better way of handling large lists by turning them into a database. You need one (not both!) of the modules that appear in the Config file as follows:

        #Module db_auth_module mod_auth_db.o
        Module dbm_auth_module mod_auth_dbm.o

    Bear in mind that they correspond to different directives: AuthDBMUserFile or AuthDBUserFile. A Perl script to manage both types of database, dbmmanage, is supplied with Apache in .../src/support. To decide which type to use, you need to discover the capabilities of your Unix. Explore these by going to the command prompt and typing first:

        % man db

    and then:

        % man dbm

    Whichever method produces a manpage is the one you should use. You can also use a SQL database, employing MySQL or a third-party package to manage it. Once you have decided which method to use, edit the Config file to include the appropriate module, and then type:

        % ./Configure

    and:

        % make

    We now have to create a database of our users: bill, ben, sonia, and daphne. Go to ... /apache/src/support, find the utility dbmmanage, and copy it into /usr/local/bin or something similar to put it on your path. This utility may be distributed without execute permission set, so, before attempting to run it, we may need to change the permissions:

        % chmod +x dbmmanage

    You may find, when you first try to run dbmmanage, that it complains rather puzzlingly that some unnamed file can't be found. Since dbmmanage is a Perl script, this is probably Perl, a text-handling language, and if you have not installed it, you should. It may also be necessary to change the first line of dbmmanage:

        #!/usr/bin/perl5

    to the correct path for Perl, if it is installed somewhere else. If you provoke it with dbmmanage -?, you get:

        Usage: dbmmanage [enc] dbname command [username [pw [group[,group] [comment]]]]

            where enc is  -d for crypt encryption (default except on Win32, Netware)
                          -m for MD5 encryption (default on Win32, Netware)
                          -s for SHA1 encryption
                          -p for plaintext

            command is one of: add|adduser|check|delete|import|update|view

            pw of . for update command retains the old password
            pw of - (or blank) for update command prompts for the password

            groups or comment of . (or blank) for update command retains old values
            groups or comment of - for update command clears the existing value
            groups or comment of - for add and adduser commands is the empty value

    dbmmanage takes the following arguments:

        dbmmanage [enc] dbname command [username [pw [group[,group] [comment]]]]

    'enc' sets the encryption method:

        -d for crypt (default except Win32, Netware)
        -m for MD5 (default on Win32, Netware)
        -s for SHA1
        -p for plaintext

    So, to add our four users to a file /usr/www/APACHE3/ok_dbm/users, we type:

        % dbmmanage /usr/www/APACHE3/ok_dbm/users.db adduser bill
        New password:theft
        Re-type new password:theft
        User bill added with password encrypted to vJACUCNeAXaQ2 using crypt

    Perform the same service for ben, sonia, and daphne. The file ... /users is not editable directly, but you can see the results by typing:

        % dbmmanage /usr/www/APACHE3/ok_dbm/users view
        bill:vJACUCNeAXaQ2
        ben:TPsuNKAtLrLSE
        sonia:M9x731z82cfDo
        daphne:7DBV6Yx4.vMjc

    You can build a group file with dbmmanage, but because of faults in the script that we hope will have been rectified by the time readers of this edition use it, the results seem a bit odd. To add the user fred to the group cleaners, type:

        % dbmmanage /usr/www/APACHE3/ok_dbm/group add fred cleaners

    (Note: do not use adduser.) dbmmanage rather puzzlingly responds with the following message:

        User fred added with password encrypted to cleaners using crypt

    When we test this with:

        % dbmmanage /usr/www/APACHE3/ok_dbm/group view

    we see:

        fred:cleaners

    which is correct, because in a group file the name of the group goes where the encrypted password would go in a password file. Since we have a similar file structure, we invoke DBM authentication in ... /conf/httpd.conf by commenting out:

        #AuthUserFile /usr/www/APACHE3/ok_users/sales
        #AuthGroupFile /usr/www/APACHE3/ok_users/groups

    and inserting:

        AuthDBMUserFile /usr/www/APACHE3/ok_dbm/users
        AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/users

    AuthDBMGroupFile is set to the same file as the AuthDBMUserFile. What happens is that the username becomes the key in the DBM file, and the value associated with the key is password:group. To create a separate group file, a database with usernames as the key and groups as the value (with no colons in the value) would be needed.
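The key/value layout just described (username as key, password:group as value) can be illustrated with Python's standard dbm.dumb module. This is a sketch of the record layout only: the helper names add_user and lookup are hypothetical, and the files that dbm.dumb writes are not in a format Apache or dbmmanage can read.

```python
import dbm.dumb
import os
import tempfile


def add_user(path, username, hashed_password, groups=()):
    """Store one record keyed on the username.

    The value is the hashed password, optionally followed by a colon
    and a comma-separated group list, mirroring the layout in the text.
    """
    value = hashed_password
    if groups:
        value += ":" + ",".join(groups)
    with dbm.dumb.open(path, "c") as db:
        db[username.encode("utf-8")] = value.encode("utf-8")


def lookup(path, username):
    """Return (hashed_password, [groups]) for a user, or None if absent."""
    key = username.encode("utf-8")
    with dbm.dumb.open(path, "r") as db:
        if key not in db:
            return None
        password, _, groups = db[key].decode("utf-8").partition(":")
    return password, [g for g in groups.split(",") if g]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        users = os.path.join(tmp, "users")
        # Hash copied from the dbmmanage listing above; the group is illustrative.
        add_user(users, "bill", "vJACUCNeAXaQ2", ["directors"])
        print(lookup(users, "bill"))
```

Because the lookup goes straight to the key rather than scanning a flat file, the cost of finding one user does not grow with the length of the list, which is the point of moving to a database.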

    5.10.1 AuthDBUserFile

    The AuthDBUserFile directive sets the name of a DB file containing the list of users and passwords for user authentication.

        AuthDBUserFile filename
        directory, .htaccess

    filename is the absolute path to the user file.

    The user file is keyed on the username. The value for a user is the crypt( )-encrypted password, optionally followed by a colon and arbitrary data. The colon and the data following it will be ignored by the server.

    5.10.1.1 Security

    Make sure that the AuthDBUserFile is stored outside the document tree of the web server; do not put it in the directory that it protects. Otherwise, clients will be able to download the AuthDBUserFile.

    With regard to compatibility, the implementation of dbmopen in the Apache modules reads the string length of the hashed values from the DB data structures, rather than relying upon the string being NULL-appended. Some applications, such as the Netscape web server, rely upon the string being NULL-appended, so if you are having trouble using DB files interchangeably between applications, this may be a part of the problem.

    A Perl script called dbmmanage is included with Apache. This program can be used to create and update DB-format password files for use with this module.

    5.10.2 AuthDBMUserFile

    The AuthDBMUserFile directive sets the name of a DBM file containing the list of users and passwords for user authentication.

        AuthDBMUserFile filename
        directory, .htaccess

    filename is the absolute path to the user file.

    The user file is keyed on the username. The value for a user is the crypt( )-encrypted password, optionally followed by a colon and arbitrary data. The colon and the data following it will be ignored by the server.

    5.10.2.1 Security

    Make sure that the AuthDBMUserFile is stored outside the document tree of the web server; do not put it in the directory that it protects. Otherwise, clients will be able to download the AuthDBMUserFile.

    With regard to compatibility, the implementation of dbmopen in the Apache modules reads the string length of the hashed values from the DBM data structures, rather than relying upon the string being NULL-appended. Some applications, such as the Netscape web server, rely upon the string being NULL-appended, so if you are having trouble using DBM files interchangeably between applications, this may be a part of the problem.

    A Perl script called dbmmanage is included with Apache. This program can be used to create and update DBM-format password files for use with this module.

    5.11 Digest Authentication

    A halfway house between complete encryption and none at all is digest authentication. The idea is that a one-way hash, or digest, is calculated from a password and various other bits of information. Rather than sending the lightly encoded password, as is done in basic authentication, the digest is sent. At the other end, the same function is calculated: if the numbers are not identical, something is wrong, and in this case, since all other factors should be the same, the "something" must be the password.

    Digest authentication is applied in Apache to improve the security of passwords. MD5 is a cryptographic hash function written by Ronald Rivest and distributed free by RSA Data Security; with its help, the client and server use the hash of the password and other stuff. The point of this is that although many passwords lead to the same hash value, there is only a very small chance that a wrong password will give the right hash value, if the hash function is intelligently chosen; it is also very difficult to construct a password leading to the same hash value (which is why these are sometimes referred to as one-way hashes). The advantage of using the hash value is that the password itself is not sent to the server, so it isn't visible to the Bad Guys. Just to make things more tiresome for them, MD5 digest authentication adds a few other things into the mix: the URI, the method, and a nonce. A nonce is simply a number chosen by the server and told to the client, usually different each time. It ensures that the digest is different each time and protects against replay attacks.[2] The digest function looks like this:

        MD5(MD5(<password>)+":"+<nonce>+":"+MD5(<method>+":"+<uri>))

    MD5 digest authentication can be invoked with the following line:

        AuthType Digest
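The nested digest function can be written out with Python's hashlib. The sketch below follows the general shape of the function shown above; as an assumption drawn from the HTTP digest specifications (RFC 2069 and RFC 2617) rather than from this chapter, the inner password term hashes username:realm:password. The function names are hypothetical, and newer options such as qop are ignored.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri):
    """Compute the value a client would send back for one request."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # digest of the secret
    ha2 = md5_hex(f"{method}:{uri}")                 # digest of the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to this nonce


if __name__ == "__main__":
    # The nonce values here are made up; a real server chooses its own.
    first = digest_response("bill", "darkness", "theft", "nonce-1", "GET", "/")
    second = digest_response("bill", "darkness", "theft", "nonce-2", "GET", "/")
    print(first)
    print(second)
```

The two printed values differ although the password is the same, which is what makes a captured digest useless once the server has moved on to a new nonce.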

    This plugs a nasty hole in the Internet's security. As we saw earlier (and almost unbelievably) the authentication procedures discussed up to now send the user's password in barely encoded text across the Web. A Bad Guy who intercepts the Internet traffic then knows the user's password. This is a Bad Thing. You can either use SSL (see Chapter 11) to encrypt the password or Digest Authentication. Digest authentication works this way:

    1. The client requests a URL.
    2. Because that URL is protected, the server replies with error 401, "Authentication required," and among the headers, it sends a nonce.
    3. The client combines the user's password, the nonce, the method, and the URL, as described previously, then sends the result back to the server. The server does the same thing with the hash of the user's password retrieved from the password file and checks that its result matches.[3]

    A different nonce is sent the next time, so that the Bad Guy can't use the captured digest to gain access.

    MD5 digest authentication is implemented in Apache, using mod_auth_digest, for two reasons. First, it provides one of the two fully compliant reference HTTP 1.1 implementations required for the standard to advance down the standards track; second, it provides a test bed for browser implementations. It should only be used for experimental purposes, particularly since it makes no effort to check that the returned nonce is the same as the one it chose in the first place.[4] This makes it susceptible to a replay attack. The httpd.conf file is as follows:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.digest/htdocs/customers
        ErrorLog /usr/www/APACHE3/site.digest/logs/customers/error_log
        TransferLog /usr/www/APACHE3/site.digest/logs/customers/access_log
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.digest/htdocs/salesmen
        ServerName sales.butterthlies.com
        ErrorLog /usr/www/APACHE3/site.digest/logs/salesmen/error_log
        TransferLog /usr/www/APACHE3/site.digest/logs/salesmen/access_log
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

        AuthType Digest
        AuthName darkness
        AuthDigestFile /usr/www/APACHE3/ok_digest/sales
        require valid-user
        #require group cleaners

    Go to the Config file (see Chapter 1). If the line:

        Module digest_module mod_digest.o

    is commented out, uncomment it and remake Apache as described previously. Go to the Apache support directory, and type:

        % make htdigest
        % cp htdigest /usr/local/bin

    The command-line syntax for htdigest is:

        % htdigest [-c] passwordfile realm user

    Go to /usr/www/APACHE3 (or some other appropriate spot) and make the ok_digest directory and contents:

        % mkdir ok_digest
        % cd ok_digest
        % htdigest -c sales darkness bill
        Adding password for user bill in realm darkness.
        New password: theft
        Re-type new password: theft
        % htdigest sales darkness ben
        ...
        % htdigest sales darkness sonia
        ...
        % htdigest sales darkness daphne
        ...
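For readers curious about what htdigest writes, each line of the password file has the form user:realm:digest, where the digest is the MD5 of the same three fields with the password in the last position. The short Python sketch below builds such a line; the function name htdigest_line is hypothetical, and the sketch is meant for inspecting the format rather than as a replacement for the utility.

```python
import hashlib


def htdigest_line(user, realm, password):
    """Build one password-file entry in the user:realm:digest layout."""
    secret = f"{user}:{realm}:{password}".encode("utf-8")
    return f"{user}:{realm}:{hashlib.md5(secret).hexdigest()}"


if __name__ == "__main__":
    # Same user, realm, and password as the htdigest session above.
    print(htdigest_line("bill", "darkness", "theft"))
```

Only the digest is stored, so the server never needs the cleartext password, and a digest made for one realm is of no use in another.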

    Digest authentication can, in principle, also use group authentication. In earlier editions we had to report that none of it seemed to work with the then available versions of MSIE or Netscape. However, Netscape v6.2.3 and MSIE 6.0.26 seemed happy enough, though we have not tested them thoroughly. Include the line:

        LogLevel debug

    in the Config file, and check the error log for entries such as the following:

        client used wrong authentication scheme: Basic for \

    Whether a webmaster used this facility might depend on whether he could control which browsers the clients used.

    5.11.1 ContentDigest

    This directive enables the generation of Content-MD5 headers as defined in RFC 1864 and RFC 2068.

        ContentDigest on|off
        Default: ContentDigest off
        server config, virtual host, directory, .htaccess

    MD5, as described earlier in this chapter, is an algorithm for computing a "message digest" (sometimes called "fingerprint") of arbitrary-length data, with a high degree of confidence that any alterations in the data will be reflected in alterations in the message digest. The Content-MD5 header provides an end-to-end message integrity check (MIC) of the entity body. A proxy or client may check this header for detecting accidental modification of the entity body in transit. See the following example header:

        Content-MD5: AuLb7Dp1rqtRtxz2m9kRpA==

    Note that this can cause performance problems on your server since the message digest is computed on every request (the values are not cached). Content-MD5 is only sent for documents served by the core and not by any module. For example, SSI documents, output from CGI scripts, and byte-range responses do not have this header.
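A client or proxy that wants to use the Content-MD5 check can recompute the value itself: per RFC 1864, the header carries the base64 encoding of the 16-byte MD5 digest of the entity body. The following Python sketch shows one way to do this; the function names are hypothetical.

```python
import base64
import hashlib


def content_md5(body):
    """Return the Content-MD5 header value for a body given as bytes."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_is_intact(body, header_value):
    """Compare a received body against the Content-MD5 value sent with it."""
    return content_md5(body) == header_value.strip()


if __name__ == "__main__":
    page = b"<html><body>Butterthlies catalog</body></html>\n"
    header = content_md5(page)
    print("Content-MD5:", header)
    print(body_is_intact(page, header))
    print(body_is_intact(page + b"tampered", header))
```

As the text says, this only detects accidental modification: anyone able to alter the body in transit can also replace the header.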

    5.12 Anonymous Access

    It sometimes happens that even though you have passwords controlling the access to certain things on your site, you also want to allow guests to come and sample the site's joys (probably a reduced set of joys, mediated by the username passed on by the client's browser). The Apache module mod_auth_anon.c allows you to do this.

    We have to say that the whole enterprise seems rather silly. If you want security at all on any part of your site, you need to use SSL. If you then want to make some of the material accessible to everyone, you can give them a different URL or a link from a reception page. However, it seems that some people want to do this to capture visitors' email addresses (using a long-standing convention for anonymous access), and if that is what you want, and if your users' browsers are configured to provide that information, then here's how.

    The module should be compiled in automatically; check by looking at Configuration or by running httpd -l. If it wasn't compiled in, you will probably get this unnerving error message:

        Invalid command Anonymous

    when you try to exercise the Anonymous directive. The Config file in ... /site.anon/conf/httpd.conf is as follows:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        IdentityCheck on
        NameVirtualHost 192.168.123.2

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.anon/htdocs/customers
        ServerName www.butterthlies.com
        ErrorLog /usr/www/APACHE3/site.anon/logs/customers/error_log
        TransferLog /usr/www/APACHE3/site.anon/logs/access_log
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.anon/htdocs/salesmen
        ServerName sales.butterthlies.com
        ErrorLog /usr/www/APACHE3/site.anon/logs/error_log
        TransferLog /usr/www/APACHE3/site.anon/logs/salesmen/access_log
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

        AuthType Basic
        AuthName darkness
        AuthUserFile /usr/www/APACHE3/ok_users/sales
        AuthGroupFile /usr/www/APACHE3/ok_users/groups
        require valid-user
        Anonymous guest anonymous air-head
        Anonymous_NoUserID on

    Run go and try accessing http://sales.butterthlies.com/. You should be asked for a password in the usual way. The difference is that now you can also get in by being guest, air-head, or anonymous. You may have to type something in the password field. The Anonymous directives follow.

    Anonymous

        Anonymous userid1 userid2 ...

    The user can log in as any user ID on the list, but must provide something in the password field unless that is switched off by another directive.

    Anonymous_NoUserID

        Anonymous_NoUserID [on|off]
        Default: off
        directory, .htaccess

    If on, users can leave the ID field blank but must put something in the password field.

    Anonymous_LogEmail

        Anonymous_LogEmail [on|off]
        Default: on
        directory, .htaccess

    If on, accesses are logged to ... /logs/httpd_log or to the log set by TransferLog.

    Anonymous_VerifyEmail

        Anonymous_VerifyEmail [on|off]
        Default: off
        directory, .htaccess

    The user ID must contain at least one "@" and one ".".

    Anonymous_Authoritative

        Anonymous_Authoritative [on|off]
        Default: off
        directory, .htaccess

    If this directive is on and the client fails anonymous authorization, she fails all authorization. If it is off, other authorization schemes will get a crack at her.

    Anonymous_MustGiveEmail

        Anonymous_MustGiveEmail [on|off]
        Default: on
        directory, .htaccess

    The user must give an email ID as a password.

    5.13 Experiments

    Run ./go. Exit from your browser on the client machine, and reload it to make sure it does password checking properly (you will probably need to do this every time you make a change throughout this exercise). If you access the salespeople's site again with the user ID guest, anonymous, or air-head and any password you like (fff or 23 or rubbish), you will get access. It seems rather silly, but you must give a password of some sort. Set:

        Anonymous_NoUserID on

    This time you can leave both the ID and password fields empty. If you enter a valid username (bill, ben, sonia, or gloria), you must follow through with a valid password. Set:

        Anonymous_NoUserID off
        Anonymous_VerifyEmail on
        Anonymous_LogEmail on

    The effect here is that the user ID has to look something like an email address, with (according to the documentation) at least one "@" and one ".". However, we found that one "." or one "@" would do. Email is logged in the error log, not the access log as you might expect.

    Set:

        Anonymous_VerifyEmail off
        Anonymous_LogEmail off
        Anonymous_Authoritative on

    The effect here is that if an access attempt fails, it is not now passed on to the other methods. Up to now we have always been able to enter as bill, password theft, but no more. Change the Anonymous section to look like this:

        Anonymous_Authoritative off
        Anonymous_MustGiveEmail on

    Finally:

        Anonymous guest anonymous air-head
        Anonymous_NoUserID off
        Anonymous_VerifyEmail off
        Anonymous_Authoritative off
        Anonymous_LogEmail on
        Anonymous_MustGiveEmail on

    The documentation says that Anonymous_MustGiveEmail forces the user to give some sort of password. In fact, it seems to have the same effect as Anonymous_VerifyEmail: a "." or "@" will do.

    5.13.1 Access.conf

    In the first edition of this book we said that if you wrote your httpd.conf file as shown earlier, but also created .../conf/access.conf containing directives as innocuous as:

    security in the salespeople's site would disappear. This bug seems to have been fixed in Apache v1.3.

    5.14 Automatic User Information

    This is all great fun, but we are trying to run a business here. Our salespeople are logging in because they want to place orders, and we ought to be able to detect who they are so we can send the goods to them automatically. This can be done by looking at the environment variable REMOTE_USER, which will be set to the current username. Just for the sake of completeness, we should note another directive here.

    5.14.1 IdentityCheck

    The IdentityCheck directive causes the server to attempt to identify the client's user by querying the identd daemon of the client host. (See RFC 1413 for details, but the short explanation is that identd will, when given a socket number, reveal which user created that socket; that is, the username of the client on his home machine.)

        IdentityCheck [on|off]

    If successful, the user ID is logged in the access log. However, as the Apache manual austerely remarks, you should "not trust this information in any way except for rudimentary usage tracking." Furthermore (or perhaps, furtherless), this extra logging slows Apache down, and many machines do not run an identd daemon, or if they do, they prevent external access to it. Even if the client's machine is running identd, the information it provides is entirely under the control of the remote machine. Many providers find that it is not worth the trouble to use IdentityCheck.
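Returning to REMOTE_USER, which is the more useful of the two mechanisms in this section: a CGI script sees it as an ordinary environment variable. The minimal Python sketch below assumes the script runs behind an authenticated directory such as the salespeople's site; the function names and the wording of the reply are hypothetical.

```python
import os


def current_user(environ=os.environ):
    """Return the authenticated username passed to the script, or None."""
    return environ.get("REMOTE_USER") or None


def render_reply(environ=os.environ):
    """Build a plain-text CGI response that names the logged-in user."""
    user = current_user(environ)
    if user:
        body = f"Order will be sent to {user}\n"
    else:
        body = "No authenticated user\n"
    # A CGI response is a header block, a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    print(render_reply(), end="")
```

The variable is only set when the request passed through an authentication scheme, so a script reachable from an unprotected URL should be prepared for it to be missing.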

    5.15 Using .htaccess Files

    We experimented with putting configuration directives in a file called ... /htdocs/.htaccess rather than in httpd.conf. It worked, but how do you decide whether to do things this way rather than the other?

    The point of the .htaccess mechanism is that you can change configuration directives without having to restart the server. This is especially valuable on a site where a lot of people maintain their own home pages but are not authorized to bring the server down or, indeed, to modify its Config files. The drawback to the .htaccess method is that the files are parsed for each access to the server, rather than just once at startup, so there is a substantial performance penalty.

    The httpd1.conf (from ... /site.htaccess) file contains the following:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        AccessFileName .myaccess

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.htaccess/htdocs/salesmen
        ErrorLog /usr/www/APACHE3/site.htaccess/logs/error_log
        TransferLog /usr/www/APACHE3/site.htaccess/logs/access_log
        ServerName sales.butterthlies.com

    Access control, as specified by AccessFileName, is now in ... /htdocs/salesmen/.myaccess:

        AuthType Basic
        AuthName darkness
        AuthUserFile /usr/www/APACHE3/ok_users/sales
        AuthGroupFile /usr/www/APACHE3/ok_users/groups
        require group cleaners

    If you run the site with ./go 1 and access http://sales.butterthlies.com/, you are asked for an ID and a password in the usual way. You had better be daphne or sonia if you want to get in, because only members of the group cleaners are allowed. You can then edit ... /htdocs/salesmen/.myaccess to require group directors instead. Without reloading Apache, you now have to be bill or ben.

    5.15.1 AccessFileName

    AccessFileName gives authority to the files specified. If a directory is given, authority is given to all files in it and its subdirectories.

        AccessFileName filename, filename|directory and subdirectories ...
        Server config, virtual host

    Include the following line in httpd.conf:

        AccessFileName .myaccess1, myaccess2 ...

    Restart Apache (since the AccessFileName has to be read at startup). You might expect that you could limit AccessFileName to .myaccess in some particular directory, but not elsewhere. You can't: it is global (well, more global than per-directory). Try editing ... /conf/httpd.conf to read:

        AccessFileName .myaccess

    Apache complains:

        Syntax error on line 2 of /usr/www/APACHE3/conf/srm.conf: AccessFileName not allowed here

    As we have said, this file is found and parsed on each access, and this takes time. When a client requests access to a file /usr/www/APACHE3/site.htaccess/htdocs/salesmen/index.html, Apache searches for the following:

    • /.myaccess
    • /usr/.myaccess
    • /usr/www/.myaccess
    • /usr/www/APACHE3/.myaccess
    • /usr/www/APACHE3/site.htaccess/.myaccess
    • /usr/www/APACHE3/site.htaccess/htdocs/.myaccess
    • /usr/www/APACHE3/site.htaccess/htdocs/salesmen/.myaccess

    This multiple search also slows business down. You can turn multiple searching off, making a noticeable difference to Apache's speed, with the following directive:

        AllowOverride none

    It is important to understand that / means the real, root directory (because that is where Apache starts searching) and not the server's document root.
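The directory walk described above is easy to reproduce. The Python sketch below lists, for a given file, every place an access file would be looked for, starting at the real root directory. It models only the search order as this section describes it, not Apache's actual code, and the function name access_file_candidates is hypothetical.

```python
import posixpath


def access_file_candidates(file_path, access_file_name=".htaccess"):
    """List the access files consulted for one requested file.

    The walk starts at the real root directory and descends one
    directory at a time to the directory that holds the file.
    """
    directory = posixpath.dirname(file_path)
    current = "/"
    candidates = [posixpath.join(current, access_file_name)]
    for part in directory.split("/"):
        if not part:
            continue  # skip the empty piece produced by the leading slash
        current = posixpath.join(current, part)
        candidates.append(posixpath.join(current, access_file_name))
    return candidates


if __name__ == "__main__":
    requested = "/usr/www/APACHE3/site.htaccess/htdocs/salesmen/index.html"
    for candidate in access_file_candidates(requested, ".myaccess"):
        print(candidate)
```

Every extra level of directory nesting adds one more file to look for on each request, which is why switching the search off where it is not needed makes a measurable difference.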

    5.16 Overrides

    We can do more with overrides than speed up Apache. This mechanism allows the webmaster to exert finer control over what is done in .htaccess files. The key directive is AllowOverride.

    5.16.1 AllowOverride

    This directive tells Apache which directives in an .htaccess file can override earlier directives.

        AllowOverride override1 override2 ...
        Directory

    The list of AllowOverride overrides is as follows:

    AuthConfig
        Allows individual settings of AuthDBMGroupFile, AuthDBMUserFile, AuthGroupFile, AuthName, AuthType, AuthUserFile, and require
    FileInfo
        Allows AddType, AddEncoding, AddLanguage, AddCharset, AddHandler, RemoveHandler, LanguagePriority, ErrorDocument, DefaultType, Action, Redirect, RedirectMatch, RedirectTemp, RedirectPermanent, PassEnv, SetEnv, UnsetEnv, Header, RewriteEngine, RewriteOptions, RewriteBase, RewriteCond, RewriteRule, CookieTracking, and CookieName
    Indexes
        Allows FancyIndexing, AddIcon, AddDescription (see Chapter 7)
    Limit
        Can limit access based on hostname or IP number
    Options
        Allows the use of the Options directive (see Chapter 13)
    All
        All of the previous
    None
        None of the previous

    You might ask: if none switches multiple searches off, which of these options switches it on? The answer is any of them, or the complete absence of AllowOverride. In other words, it is on by default.

    To illustrate how this works, look at .../site.htaccess/httpd3.conf, which is httpd2.conf with the authentication directives on the salespeople's directory back in again. The Config file wants cleaners; the .myaccess file wants directors. If we now put the authorization directives, favoring cleaners, back into the Config file:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        AccessFileName .myaccess

        ServerAdmin [email protected]
        DocumentRoot /usr/www/APACHE3/site.htaccess/htdocs/salesmen
        ErrorLog /usr/www/APACHE3/site.htaccess/logs/error_log
        TransferLog /usr/www/APACHE3/site.htaccess/logs/access_log
        ServerName sales.butterthlies.com

        #AllowOverride None
        AuthType Basic
        AuthName darkness
        AuthUserFile /usr/www/APACHE3/ok_users/sales
        AuthGroupFile /usr/www/APACHE3/ok_users/groups
        require group cleaners

    and restart Apache, we find that we have to be a director (Bill or Ben). But, if we edit the Config file and uncomment the line:

        ...
        AllowOverride None
        ...

    we find that we have turned off the .htaccess method and that cleaners are back in fashion. In real life, the webmaster might impose a general policy of access control with this:

        ..
        AllowOverride AuthConfig
        ...
        require valid-user
        ...

    The owners of the various pages could then limit their visitors further with this:

        require group directors

    See .../site.htaccess/httpd4.conf. As can be seen, AllowOverride makes it possible for individual directories to be precisely tailored.

    [1] Note that this version of the file is produced by FreeBSD, so it doesn't use the old-style DES version of the crypt( ) function; instead, it uses one based on MD5, so the password strings may look a little peculiar to you. Different operating environments may produce different results, but each should work in its own environment.

    [2] This is a method in which the Bad Guy simply monitors the Good Guy's session and reuses the headers for her own access. If there were no nonce, this would work every time!

    [3] Which is why MD5 is applied to the password, as well as to the whole thing: the server then doesn't have to store the actual password, just a digest of it.

    [4] It is unfortunate that the nonce must be returned as part of the client's digest authentication header, but since HTTP is a stateless protocol, there is little alternative. It is even more unfortunate that Apache simply believes it! An obvious way to protect against this is to include the time somewhere in the nonce and to refuse nonces older than some threshold.


    Chapter 6. Content Description and Modification

    • 6.1 MIME Types
    • 6.2 Content Negotiation
    • 6.3 Language Negotiation
    • 6.4 Type Maps
    • 6.5 Browsers and HTTP 1.1
    • 6.6 Filters

    Apache has the ability to tune the information it returns to the abilities of the client — and even to improve the client's efforts. Currently, this affects: •

    • • •

    The choice of MIME type returned. An image might be the very old-fashioned bitmap, the old-fashioned .gif, the more modern and smaller .jpg, or the extremely up-to-date .png. Once the type is indicated, Apache's reactions can be extended and controlled with a number of directives. The language of the returned file. Updates to the returned file. The spelling of the client's requests.

    Apache v2 also offers a new mechanism, filters, which is described in Section 6.6 at the end of this chapter.

    6.1 MIME Types

    MIME stands for Multipurpose Internet Mail Extensions, a standard developed by the Internet Engineering Task Force for email but then repurposed for the Web. Apache uses mod_mime.c, compiled in by default, to determine the type of a file from its extension. MIME types are more sophisticated than file extensions, providing a category (like "text," "image," or "application"), as well as a more specific identifier within that category. In addition to specifying the type of the file, MIME permits the specification of additional information, like the encoding used to represent characters.

    The "type" of a file that is sent is indicated by a header near the beginning of the data. For instance:

        content-type: text/html

    indicates that what follows is to be treated as HTML, though it may also be treated as text. If the type were "image/jpeg", the browser would need to use a completely different bit of code to render the data.
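The split between category, specific identifier, and extra parameters can be made concrete with a few lines of Python. This is only a sketch: the function name parse_content_type is hypothetical, and the charset value in the second call is an illustrative assumption rather than something taken from the sample sites.

```python
def parse_content_type(header_value):
    """Split a Content-Type value into (category, subtype, parameters)."""
    media_type, *param_parts = [p.strip() for p in header_value.split(";")]
    category, _, subtype = media_type.lower().partition("/")
    params = {}
    for part in param_parts:
        if "=" in part:
            name, _, value = part.partition("=")
            # Parameter names are case insensitive; values may be quoted.
            params[name.strip().lower()] = value.strip().strip('"')
    return category, subtype, params


print(parse_content_type("text/html"))
print(parse_content_type("text/html; charset=iso-8859-1"))
```

A browser makes the same kind of split before deciding which bit of rendering code to hand the data to.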

    This header is inserted automatically by Apache[1] based on the MIME type and is absorbed by the browser, so you do not see it if you right-click in a browser window and select "View Source" (MSIE) or similar. Notwithstanding, it is an essential element of a web page.

    The list of MIME types that Apache already knows about is distributed in the file ..conf/mime.types or can be found at http://www.isi.edu/innotes/iana/assignments/media-types/media-types. You can edit it to include extra types, or you can use the directives discussed in this chapter. The default location for the file is ...//conf, but it may be more convenient to keep it elsewhere, in which case you would use the directive TypesConfig.

    Changing the encoding of a file with one of these directives does not change the value of the Last-Modified header, so cached copies with the old label may linger after you make such changes. (Servers often send a Last-Modified header containing the date and time the content was last changed, so that the browser can use cached material at the other end if it is still fresh.)

    Files can have more than one extension, and their order normally doesn't matter. If the extension .itl maps onto Italian and .html maps onto HTML, then the files text.itl.html and text.html.itl will be treated alike. However, any unrecognized extension, say .xyz, wipes out all extensions to its left. Hence text.itl.xyz.html will be treated as HTML but not as Italian.

    TypesConfig

    TypesConfig filename
    Default: conf/mime.types

    The TypesConfig directive sets the location of the MIME types configuration file. filename is relative to the ServerRoot. This file sets the default list of mappings from filename extensions to content types; changing this file is not recommended unless you know what you are doing. Use the AddType directive instead. The file contains lines in the format of the arguments to an AddType command:

        MIME-type extension extension ...

    The extensions are lowercased. Blank lines and lines beginning with a hash character (#) are ignored.

    AddType

    Syntax: AddType MIME-type extension [extension] ...
    Context: Server config, virtual host, directory, .htaccess
    Override: FileInfo
    Status: Base
    Module: mod_mime

    The AddType directive maps the given filename extensions onto the specified content type. MIME-type is the MIME type to use for filenames containing the given extensions. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. This directive can be used to add mappings not listed in the MIME types file (see the TypesConfig directive). For example:

        AddType image/gif .gif

    It is recommended that new MIME types be added using the AddType directive rather than changing the TypesConfig file. Note that, unlike the NCSA httpd, this directive cannot be used to set the type of particular files. The extension argument is case insensitive and can be specified with or without a leading dot.

    DefaultType

    DefaultType mime-type
    Anywhere

    The server must inform the client of the content type of the document, so in the event of an unknown type, it uses whatever is specified by the DefaultType directive. For example:

        DefaultType image/gif

    would be appropriate for a directory that contained many GIF images with filenames missing the .gif extension. Note that this is only used for files that would otherwise not have a type.

    ForceType

    ForceType media-type
    directory, .htaccess

    Given a directory full of files of a particular type, ForceType will cause them to be sent as media-type. For instance, you might have a collection of .gif files in the directory .../gifdir, but you have given them the extension .gf2 for reasons of your own. You could include something like this in your Config file:

        ForceType image/gif

    You should be cautious in using this directive, as it may have unexpected results. This directive always overrides any MIME type that the file might usually have because of its extension, so even .html files in this directory, for example, would be served as image/gif.

    RemoveType

    RemoveType extension [extension] ...
    directory, .htaccess
    RemoveType is only available in Apache 1.3.13 and later.

    The RemoveType directive removes any MIME type associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. An example of its use is to have the following in /foo/.htaccess:

        RemoveType .cgi

    This will remove any special handling of .cgi files in the /foo/ directory and any beneath it, causing the files to be treated as the default type. RemoveType directives are processed after any AddType directives, so it is possible that they may undo the effects of the latter if both occur within the same directory configuration. The extension argument is case insensitive and can be specified with or without a leading dot.

    AddEncoding

    AddEncoding mime-enc extension extension
    Anywhere

    The AddEncoding directive maps the given filename extensions to the specified encoding type. mime-enc is the MIME encoding to use for documents containing the extension. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. For example:

        AddEncoding x-gzip .gz
        AddEncoding x-compress .Z

    This will cause filenames containing the .gz extension to be marked as encoded using the x-gzip encoding and filenames containing the .Z extension to be marked as encoded with x-compress.

    Older clients expect x-gzip and x-compress; however, the standard dictates that they're equivalent to gzip and compress, respectively. Apache does content-encoding comparisons by ignoring any leading x-. When responding with an encoding, Apache will use whatever form (i.e., x-foo or foo) the client requested. If the client didn't specifically request a particular form, Apache will use the form given by the AddEncoding directive. To make this long story short, you should always use x-gzip and x-compress for these two specific encodings. More recent encodings, such as deflate, should be specified without the x-.

    The extension argument is case insensitive and can be specified with or without a leading dot.

    RemoveEncoding

    RemoveEncoding extension [extension] ...
    directory, .htaccess
    RemoveEncoding is only available in Apache 1.3.13 and later.

    The RemoveEncoding directive removes any encoding associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. An example of its use might be:

    /foo/.htaccess:
        AddEncoding x-gzip .gz
        AddType text/plain .asc
        RemoveEncoding .gz

    This will cause foo.gz to be marked as being encoded with the gzip method, but foo.gz.asc as an unencoded plain-text file. This might, for example, be a hash of the binary file to prevent illicit alteration. Note that RemoveEncoding directives are processed after any AddEncoding directives, so it is possible they may undo the effects of the latter if both occur within the same directory configuration. The extension argument is case insensitive and can be specified with or without a leading dot.

    AddDefaultCharset

    AddDefaultCharset On|Off|charset
    AddDefaultCharset is only available in Apache 1.3.12 and later.

    This directive specifies the name of the character set that will be added to any response that does not have any parameter on the content type in the HTTP headers. This will override any character set specified in the body of the document via a META tag. A setting of AddDefaultCharset Off disables this functionality. AddDefaultCharset On enables Apache's internal default charset of iso-8859-1 as required by the directive. You can also specify an alternate charset to be used; e.g., AddDefaultCharset utf-8.

    The use of AddDefaultCharset is an important part of the prevention of Cross-Site Scripting (XSS) attacks. For more on XSS, refer to http://www.idefense.com/XSS.html.

    AddCharset

    AddCharset charset extension [extension] ...
    Server config, virtual host, directory, .htaccess
    AddCharset is only available in Apache 1.3.10 and later.

    The AddCharset directive maps the given filename extensions to the specified content charset. charset is the MIME charset parameter of filenames containing the extension. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. For example:

        AddLanguage ja .ja
        AddCharset EUC-JP .euc
        AddCharset ISO-2022-JP .jis
        AddCharset SHIFT_JIS .sjis

    Then the document xxxx.ja.jis will be treated as being a Japanese document whose charset is ISO-2022-JP (as will the document xxxx.jis.ja). The AddCharset directive is useful both to inform the client about the character encoding of the document so that the document can be interpreted and displayed appropriately, and for content negotiation, where the server returns one from several documents based on the client's charset preference. The extension argument is case insensitive and can be specified with or without a leading dot.

    RemoveCharset Directive

    RemoveCharset extension [extension]
    directory, .htaccess
    RemoveCharset is only available in Apache 2.0.24 and later.

    The RemoveCharset directive removes any character-set associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. The extension argument is case insensitive and can be specified with or without a leading dot.

    The corresponding directives follow:

    AddHandler

    AddHandler handler-name extension1 extension2 ...
    Server config, virtual host, directory, .htaccess

    The AddHandler directive wakes up an existing handler and maps the filename(s) extension1, etc., to handler-name. You might specify the following in your Config file:

        AddHandler cgi-script cgi bzq

    From then on, any file with the extension .cgi or .bzq would be treated as an executable CGI script.

    SetHandler

    SetHandler handler-name
    directory, .htaccess, location

    This does the same thing as AddHandler, but applies the transformation specified by handler-name to all files in the <Directory>, <Files>, or <Location> section in which it is placed or in the .htaccess directory. For instance, in Chapter 10, we write:

        order deny,allow
        allow from 192.168.123.1
        deny from all
        SetHandler server-status

    RemoveHandler Directive

    RemoveHandler extension [extension] ...
    directory, .htaccess
    RemoveHandler is only available in Apache 1.3.4 and later.

    The RemoveHandler directive removes any handler associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. An example of its use might be:

    /foo/.htaccess:
        AddHandler server-parsed .html
    /foo/bar/.htaccess:
        RemoveHandler .html

    This has the effect of returning .html files in the /foo/bar directory to being treated as normal files, rather than as candidates for parsing (see the mod_include module). The extension argument is case insensitive and can be specified with or without a leading dot.

    AcceptFilter

    AcceptFilter on|off
    Default: AcceptFilter on
    server config
    Compatibility: AcceptFilter is available in Apache 1.3.22 and later

    AcceptFilter controls a BSD-specific filter optimization. It is compiled in by default, and switched on by default if your system supports it (setsockopt( ) option SO_ACCEPTFILTER). Currently, only FreeBSD supports this.

    See http://httpd.apache.org/docs/misc/perf-bsd44.html for more information.

    The compile time flag AP_ACCEPTFILTER_OFF can be used to change the default to off. httpd -V and httpd -L will show compile-time defaults and whether or not SO_ACCEPTFILTER was defined during the compile.
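The extension rules from earlier in this section (several extensions in any order, case-insensitive matching, an unrecognized extension wiping out everything to its left, and DefaultType as the fallback) can be sketched in Python. This is an illustration, not mod_mime's actual code: the function name describe and the three small tables are hypothetical stand-ins for whatever AddType, AddLanguage, and AddEncoding have configured.

```python
# Hypothetical tables standing in for AddType, AddLanguage, and AddEncoding.
TYPES = {"html": "text/html", "gif": "image/gif"}
LANGUAGES = {"itl": "it"}
ENCODINGS = {"gz": "x-gzip"}
DEFAULT_TYPE = "text/plain"  # stands in for DefaultType


def describe(filename):
    """Return the type, language, and encoding implied by a filename's extensions."""
    found = {}
    # Everything after the first dot is treated as a list of extensions.
    for ext in filename.split(".")[1:]:
        ext = ext.lower()  # extensions are case insensitive
        known = False
        for key, table in (("type", TYPES), ("language", LANGUAGES), ("encoding", ENCODINGS)):
            if ext in table:
                found[key] = table[ext]
                known = True
        if not known:
            # An unrecognized extension wipes out everything to its left.
            found = {}
    # A file that ends up with no type falls back to the default.
    found.setdefault("type", DEFAULT_TYPE)
    return found


print(describe("text.itl.html"))
print(describe("text.html.itl"))
print(describe("text.itl.xyz.html"))
```

The first two calls give the same answer, while the third loses its language because .xyz is not in any table.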

    6.2 Content Negotiation

    There may be different ways to handle the data that Apache returns, and there are two equivalent ways of implementing this functionality. The multiviews method is simpler (and more limited) than the *.var method, so we shall start with it. The Config file (from ... /site.multiview) looks like this:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        DocumentRoot /usr/www/APACHE3/site.multiview/htdocs
        ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
        AddLanguage it .it
        AddLanguage en .en
        AddLanguage ko .ko
        LanguagePriority it en ko
        Options +MultiViews

    For historical reasons, you have to say:

        Options +MultiViews

    even though you might reasonably think that Options All would cover the case. The general idea is that whenever you want to offer variations of a file (e.g., JPG, GIF, or bitmap for images, or different languages for text), multiviews will handle it. Apache v2 offers a relevant directive.

    6.2.1 MultiviewsMatch

    MultiviewsMatch permits three different behaviors for mod_negotiation's Multiviews feature.

    MultiviewsMatch [NegotiatedOnly] [Handlers] [Filters] [Any]
    server config, virtual host, directory, .htaccess
    Compatibility: only available in Apache 2.0.26 and later.

    Multiviews allows a request for a file, e.g., index.html, to match any negotiated extensions following the base request, e.g., index.html.en, index.html.fr, or index.html.gz.

    The NegotiatedOnly option provides that every extension following the base name must correlate to a recognized mod_mime extension for content negotiation, e.g., Charset, Content-Type, Language, or Encoding. This is the strictest implementation with the fewest unexpected side effects, and it's the default behavior.

    To include extensions associated with Handlers and/or Filters, set the MultiviewsMatch directive to either Handlers, Filters, or both option keywords. If all other factors are equal, the smallest file will be served; e.g., in deciding between index.html.cgi of 500 bytes and index.html.pl of 1,000 bytes, the .cgi file would win. Users of .asis files might prefer to use the Handlers option, if .asis files are associated with the asis-handler.

    You may finally allow Any extensions to match, even if mod_mime doesn't recognize the extension. This was the behavior in Apache 1.3 and can cause unpredictable results, such as serving .old or .bak files that the webmaster never expected to be served.

    6.2.2 Image Negotiation

    Image negotiation is a special corner of general content negotiation because the Web has a variety of image files with different levels of support: for instance, some browsers can cope with PNG files and some can't, and the latter have to be sent the simpler, more old-fashioned, and bulkier GIF files. The client's browser sends a message to the server telling it which image files it accepts:

        HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

    Browsers almost always lie about the content types they accept or prefer, so this may not be all that reliable. In theory, however, the server uses this information to guide its search for an appropriate file, and then it returns it. We can demonstrate the effect by editing our ... /htdocs/catalog_summer.html file to remove the .jpg extensions on the image files. The appropriate lines now look like this: ... ... ...

    When Apache has the Multiviews option turned on and is asked for an image called bench, it looks for the smaller of bench.jpg and bench.gif — assuming the client's browser accepts both — and returns it. Apache v2 introduces a new directive, which is related to the Filter mechanism (see later in this chapter, Section 6.6).
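The "smaller of the acceptable files" rule can be sketched in Python as follows. It is a simplification under stated assumptions: q-values in the Accept header are ignored, the function names are hypothetical, and the file sizes are invented for illustration rather than measured from the sample site.

```python
def acceptable(media_type, accept_header):
    """True if media_type matches an entry in the Accept header (q-values ignored)."""
    category = media_type.split("/")[0]
    for item in accept_header.split(","):
        pattern = item.split(";")[0].strip()
        if pattern in (media_type, category + "/*", "*/*"):
            return True
    return False


def pick_image(variants, accept_header):
    """variants is a list of (filename, media_type, size_in_bytes) tuples."""
    candidates = [v for v in variants if acceptable(v[1], accept_header)]
    if not candidates:
        return None
    # Among the acceptable variants, the smallest file wins.
    return min(candidates, key=lambda v: v[2])[0]


# Sizes are made up for the example.
BENCH = [("bench.jpg", "image/jpeg", 10000), ("bench.gif", "image/gif", 25000)]

print(pick_image(BENCH, "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*"))
print(pick_image(BENCH, "image/gif"))
```

With the Accept header shown in the text, both files qualify and the smaller one is chosen; a browser that lists only image/gif gets the GIF.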

    6.3 Language Negotiation

    The same useful functionality also applies to language. To demonstrate this, we need to make up .html scripts in different languages. Well, we won't bother with actual different languages; we'll just edit the scripts to say, for example:

    Italian Version



    and edit the English version so that it includes a new line:

    English Version



    Then we give each file an appropriate extension:

    • index.html.en for English
    • index.html.it for Italian
    • index.html.ko for Korean

    Apache recognizes language variants: en-US is seen as a version of general English, en, which seems reasonable. You can also offer documents that serve more than one language. If you had a "franglais" version, you could serve it to both English speakers and Francophones by naming it frangdoc.en.fr.

    Of course, in real life you would have to go to substantially more trouble, what with translators and special keyboards and all. Also, the Italian version of the index would need to point to Italian versions of the catalogs. But in the fantasy world of Butterthlies, Inc., it's all so simple.

    The Italian version of our index would be index.html.it. By default, Apache looks for a file called index.html.. If it has a language extension, like index.html.it, it will find the index file, happily add the language extension, and then serve up what the browser prefers. If, however, you call the index file index.it.html, Apache will still look for, and fail to find, index.html.. If index.html.en is present, that will be served up. If index.en.html is there, then Apache gives up and serves up a list of all the files. The moral is, if you want to deal with index filenames in either order (index.it.html alongside index.html.en), you need the directive:

        DirectoryIndex index

    to make Apache look for a file called index. rather than the default index.html.. To give Apache the idea, we need the corresponding lines in the httpd1.conf file:

        AddLanguage it .it
        AddLanguage en .en
        AddLanguage ko .ko

    Now our browser behaves in a rather civilized way. If you run ./go 1 on the server, go to the client machine, and go to Edit > Preferences > Languages (in Netscape 4) or Tools > Internet Options > Languages (MSIE) or wherever the language settings for your browser are kept, and set Italian to be first, you see the Italian version of the index. If you change to English and reload, you get the English version. If you then go to catalog_summer, you see the pictures even though we didn't strictly specify the filenames. In a small way...magic!

    Apache controls language selection if the browser doesn't. If you turn language preference off in your browser, edit the Config file (httpd2.conf) to insert the line:

        LanguagePriority it en ko

    and then stop Apache and restart with ./go 2, the browser will get Italian.

    LanguagePriority

    LanguagePriority MIME-lang MIME-lang...
    Server config, virtual host, directory, .htaccess

    The LanguagePriority directive sets the precedence of language variants for the case in which the client does not express a preference when handling a multiviews request. The MIME-lang list is in order of decreasing preference. For example:

        LanguagePriority en fr de

    For a request for foo.html, where foo.html.fr and foo.html.de both exist but the browser did not express a language preference, foo.html.fr would be returned.

    Note that this directive only has an effect if a "best" language cannot be determined by any other means. It will not work if there is a DefaultLanguage defined. Correctly implemented HTTP 1.1 requests will mean that this directive has no effect.

    How does this all work? You can look ahead to the environment variables in Chapter 16. Among them are the following:

        ...
        HTTP_ACCEPT=image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,*/*
        ...
        HTTP_ACCEPT_LANGUAGE=it
        ...

    Apache uses this information to work out what it can acceptably send back from the choices at its disposal.

    AddLanguage

    AddLanguage MIME-lang extension [extension] ...
    Server config, virtual host, directory, .htaccess

    The AddLanguage directive maps the given filename extension to the specified content language. MIME-lang is the MIME language of filenames containing extensions. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. For example:

        AddEncoding x-compress .Z
        AddLanguage en .en
        AddLanguage fr .fr

    Then the document xxxx.en.Z will be treated as a compressed English document (as will the document xxxx.Z.en). Although the content language is reported to the client, the browser is unlikely to use this information. The AddLanguage directive is more useful for content negotiation, where the server returns one from several documents based on the client's language preference.

    If multiple language assignments are made for the same extension, the last one encountered is the one that is used. That is, for the case of:

        AddLanguage en .en
        AddLanguage en-uk .en
        AddLanguage en-us .en

    documents with the extension .en would be treated as being en-us. The extension argument is case insensitive and can be specified with or without a leading dot.

    DefaultLanguage

    DefaultLanguage MIME-lang
    Server config, virtual host, directory, .htaccess
    DefaultLanguage is only available in Apache 1.3.4 and later.

    The DefaultLanguage directive tells Apache that all files in the directive's scope (e.g., all files covered by the current container) that don't have an explicit language extension (such as .fr or .de as configured by AddLanguage) should be considered to be in the specified MIME-lang language. This allows entire directories to be marked as containing Dutch content, for instance, without having to rename each file. Note that unlike using extensions to specify languages, DefaultLanguage can only specify a single language.

    If no DefaultLanguage directive is in force and a file does not have any language extensions as configured by AddLanguage, then that file will be considered to have no language attribute.

    RemoveLanguage

    RemoveLanguage extension [extension] ...
    directory, .htaccess
    RemoveLanguage is only available in Apache 2.0.24 and later.

    The RemoveLanguage directive removes any language associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files.

    The extension argument is case insensitive and can be specified with or without a leading dot.
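The way the browser's language list and LanguagePriority interact in this section can be sketched in Python. This is a simplified model, not Apache's algorithm: the function name choose_language is hypothetical, q-values are ignored, and falling back to the priority list when nothing the browser asked for is available is an assumption made for brevity.

```python
def choose_language(available, accept_language, language_priority):
    """Pick a language variant from the set of available language codes.

    accept_language is the raw HTTP_ACCEPT_LANGUAGE value (may be empty);
    language_priority mimics the LanguagePriority list.
    """
    # The browser's preferences come first, in the order listed.
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()
        if not tag:
            continue
        if tag in available:
            return tag
        # en-US is seen as a version of general English, en.
        prefix = tag.split("-")[0]
        if prefix in available:
            return prefix
    # No usable browser preference: the server's priority list decides.
    for lang in language_priority:
        if lang in available:
            return lang
    return None


site = {"en", "it", "ko"}
priority = ["it", "en", "ko"]
print(choose_language(site, "it", priority))
print(choose_language(site, "en-US", priority))
print(choose_language(site, "", priority))
```

The last call corresponds to turning language preference off in the browser, in which case the first entry of the priority list that exists on the site is served.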

    6.4 Type Maps

    In the last section, we looked at multiviews as a way of providing language and image negotiation. The other way to achieve the same effects in the current release of Apache, as well as more lavish effects later (probably to negotiate browser plug-ins), is to use type maps, also known as *.var files. Multiviews works by scrambling together a plain vanilla type map; now you have the chance to set it up just as you want it. The Config file in .../site.typemap/conf/httpd1.conf is as follows:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        DocumentRoot /usr/www/APACHE3/site.typemap/htdocs
        AddHandler type-map var
        DirectoryIndex index.var

    One should write, as seen in this file:

        AddHandler type-map var

    Having set that, we can sensibly say:

        DirectoryIndex index.var

    to set up a set of language-specific indexes. What this means, in plainer English, is that the DirectoryIndex line overrides the default index file index.html. If you also want index.html to be used as an alternative, you would have to specify it, but you probably don't, because you are trying to do something more elaborate here. In this case there are several versions of the index (index.en.html, index.it.html, and index.ko.html), so Apache looks for index.var for an explanation.

    Look at ... /site.typemap/htdocs. We want to offer language-specific versions of the index.html file and alternatives to the generalized images bath, hen, tree, and bench, so we create two files, index.var and bench.var (we will only bother with one of the images, since the others are the same).

    This is index.var:

        # It seems that this URI _must_ be the filename minus the extension...
        URI: index; vary="language"

        URI: index.en.html
        # Seems we _must_ have the Content-type or it doesn't work...
        Content-type: text/html
        Content-language: en

        URI: index.it.html
        Content-type: text/html
        Content-language: it

    This is bench.var:

        URI: bench; vary="type"

        URI: bench.jpg
        Content-type: image/jpeg; qs=0.8 level=3

        URI: bench.gif
        Content-type: image/gif; qs=0.5 level=1

    The first line tells Apache what file is in question, here index.* or bench.*; vary tells Apache what sort of variation we have. These are the possibilities:

    • type
    • language
    • charset
    • encoding

    The name of the corresponding header, as defined in the HTTP specification, is obtained by prefixing these names with Content-. These are the headers:

    • content-type
    • content-language
    • content-charset
    • content-encoding

    The qs numbers are quality scores, from 0 to 1. You decide what they are and write them in. The qs values for each type of return are multiplied to give the overall qs for each variant. For instance, if a variant has a qs of .5 for Content-type and a qs of .7 for Content-language, its overall qs is .35. The higher the result, the better. The level values are also numbers, and you decide what they are. In order for Apache to decide rationally which possibility to return, it resolves ties in the following way:

    1. Find the best (highest) qs.
    2. If there's a tie, count the occurrences of "*" in the type and choose the one with the lowest value (i.e., the one with the least wildcarding).
    3. If there's still a tie, choose the type with the highest language priority.
    4. If there's still a tie, choose the type with the highest level number.
    5. If there's still a tie, choose the smallest content length.

    If you can predict the outcome of all this in your head, you must qualify for some pretty classy award! Following is the full list of possible directives, given in the Apache documentation:

    URI: uri [; vary= variations]
        URI of the file containing the variant (of the given media type, encoded with the given content encoding). These are interpreted as URLs relative to the map file; they must be on the same server (!), and they must refer to files to which the client would be granted access if the files were requested directly.

    Content-type: media_type [; qs= quality [level= level]]
        Often referred to as MIME types; typical media types are image/gif, text/plain, or text/html.

    Content-language: language
        The language of the variant, specified as an ISO 639 standard language code (e.g., en for English, ko for Korean).

    Content-encoding: encoding
        If the file is compressed or otherwise encoded, rather than containing the actual raw data, indicates how compression was done. For compressed files (the only case where this generally comes up), content encoding should be x-compress or gzip or deflate, as appropriate.

    Content-length: length
        The size of the file. The size of the file is used by Apache to decide which file to send; specifying a content length in the map allows the server to compare the length without checking the actual file.

    To throw this into action, start Apache with ./go 1, set the language of your browser to Italian (in Netscape, choose Edit > Preferences > Netscape > Languages), and access http://www.butterthlies.com/. You should see the Italian version. MSIE seems to provide less support for some languages, including Italian. You just get the English version. When you look at catalog_summer.html, you see only the Bench image (and that labeled as "indirect") because we did not create var files for the other images.
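The quality-score arithmetic and the tie-breaking order described earlier in this section can be sketched in Python as a single sort key. Treat it as a sketch under assumptions: the Variant class and its field names are hypothetical, the language rank is supplied by hand (0 meaning the most preferred language), and the final step prefers the smaller file, in line with the statement in Section 6.2.1 that the smallest file is served when all else is equal.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Variant:
    name: str
    qs_factors: tuple        # one qs per negotiated dimension
    wildcards: int = 0       # occurrences of "*" in the matching type
    language_rank: int = 0   # 0 = highest language priority
    level: int = 0
    length: int = 0          # content length in bytes

    @property
    def quality(self):
        # The individual qs values are multiplied to give the overall qs.
        return prod(self.qs_factors)


def best_variant(variants):
    """Apply the tie-breaking steps in order by sorting on a compound key."""
    return min(
        variants,
        key=lambda v: (-v.quality, v.wildcards, v.language_rank, -v.level, v.length),
    )


# The two entries from bench.var: quality alone settles it.
jpg = Variant("bench.jpg", (0.8,), level=3)
gif = Variant("bench.gif", (0.5,), level=1)
print(best_variant([gif, jpg]).name)

# A variant with qs .5 for type and .7 for language has an overall qs of .35.
print(round(Variant("example", (0.5, 0.7)).quality, 2))
```

Negating quality and level in the key lets one call to min express "highest qs, fewest wildcards, best language, highest level, smallest file" without a chain of if statements.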

    6.5 Browsers and HTTP 1.1

    Like any other human creation, the Web fills up with rubbish. The webmaster cannot assume that all clients will be using up-to-date browsers; all the old, useless versions are out there waiting to make a mess of your best-laid plans. In 1996, the weekly Internet magazine devoted to Apache affairs, Apache Week (Issue 25), had this to say about the impact of the then-upcoming HTTP 1.1:

    For negotiation to work, browsers must send the correct request information. For human languages, browsers should let the user pick what language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Netscape Options, General Preferences, Languages section).

    For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif." Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match.

    Unfortunately, the */* type is sometimes used instead of listing explicitly acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed into Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, else HTML). Netscape does not send the content types it can accept, instead relying on the */* catch-all. This makes transparent content-negotiation impossible.

    Although time has passed, the situation has probably not changed very much. In addition, most browsers do not indicate a preference for particular types. This should be done by adding a preference factor (q) to the content type. For example, a browser that accepts Acrobat files might prefer them to HTML, so it could send an accept-type list that includes:

        Accept: text/html; q=0.7, application/pdf; q=0.8

    When the server handles the request, it combines this information with its source quality information (if any) to pick the "best" content type to return.
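How the browser's q factor and the server's source quality (qs) might be combined can be sketched in Python. The assumptions are worth stating: the header is taken in the standard HTTP form, with a semicolon before each q parameter; a missing q counts as 1.0; and */* is treated as a plain fallback, which glosses over the lower priority the server gives it. The function names and the qs values are hypothetical.

```python
def parse_accept(header):
    """Turn an Accept header value into {media_type: q}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # a type listed without q is fully acceptable
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


def pick_type(server_qs, accept_header):
    """server_qs maps each available media type to its source quality."""
    prefs = parse_accept(accept_header)

    def score(media_type):
        # Exact match first, then the catch-all, else not acceptable.
        q = prefs.get(media_type, prefs.get("*/*", 0.0))
        return q * server_qs[media_type]

    best = max(server_qs, key=score)
    return best if score(best) > 0 else None


header = "text/html; q=0.7, application/pdf; q=0.8"
print(pick_type({"text/html": 1.0, "application/pdf": 1.0}, header))
print(pick_type({"text/html": 1.0, "application/pdf": 0.5}, header))
```

In the second call the server rates its PDF at only 0.5, so the product for HTML (0.7) beats the product for PDF (0.4) even though the browser prefers PDF.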

    6.6 Filters

    Apache v2 introduced a new mechanism called a "Filter", together with a reworking of Multiviews. The documentation says:

    A filter is a process which is applied to data that is sent or received by the server. Data sent by clients to the server is processed by input filters while data sent by the server to the client is processed by output filters. Multiple filters can be applied to the data, and the order of the filters can be explicitly specified.

    Filters are used internally by Apache to perform functions such as chunking and byterange request handling. In addition, modules can provide filters which are selectable using run-time configuration directives. The set of filters which apply to data can be manipulated with the SetInputFilter and SetOutputFilter directives.

    The only configurable filter currently included with the Apache distribution is the INCLUDES filter which is provided by mod_include to process output for Server Side Includes. There is also an experimental module called mod_ext_filter which allows for external programs to be defined as filters.

    There is a demonstration filter that changes text to uppercase. In .../site.filter/htdocs we have two files, 1.txt and 1.html, which have the same contents:

        HULLO WORLD FROM site.filter

    The Config file is as follows:

        User webuser
        Group webgroup
        Listen 80
        ServerName my586
        AddOutputFilter CaseFilter html
        DocumentRoot /usr/www/APACHE3/site.filter/htdocs

    If we visit the site, we are offered a directory. If we choose 1.txt, we see the contents as shown earlier. If we choose 1.html, we find it has been through the filter and is now all uppercase:

        HULLO WORLD FROM SITE.FILTER
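    What this configuration does can be mimicked in a few lines of Python. This is only a toy model of the behavior just observed, not how Apache implements filters: the FILTERS registry and both function names are invented here, and the uppercase transformation stands in for the demonstration CaseFilter. The case-insensitive matching and optional leading dot follow the directive descriptions given below.

```python
# Hypothetical registry mapping filter names to text transformations.
FILTERS = {"CASEFILTER": str.upper}


def parse_add_output_filter(line):
    """Split 'AddOutputFilter f1;f2 ext1 ext2' into (filters, extensions)."""
    words = line.split()
    if len(words) < 3 or words[0].lower() != "addoutputfilter":
        raise ValueError("not an AddOutputFilter line: %r" % line)
    # Filter names are separated by semicolons and applied in that order.
    filters = [name.upper() for name in words[1].split(";")]
    # Extensions are case insensitive, with or without a leading dot.
    extensions = {ext.lower().lstrip(".") for ext in words[2:]}
    return filters, extensions


def serve(filename, body, config_line):
    """Return the body as the client would see it."""
    filters, extensions = parse_add_output_filter(config_line)
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in extensions:
        for name in filters:
            body = FILTERS[name](body)
    return body
```

    Calling serve("1.html", ...) with the AddOutputFilter line from the Config file returns the uppercase text, while serve("1.txt", ...) returns the body untouched, because only the html extension is mapped to the filter.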

    The Directives are as follows:

    AddInputFilter

        AddInputFilter filter[;filter...] extension [extension ...]
        directory, files, location, .htaccess
        AddInputFilter is only available in Apache 2.0.26 and later.

    AddInputFilter maps the filename extensions extension to the filter or filters that will process client requests and POST input when they are received by the server. This is in addition to any filters defined elsewhere, including the SetInputFilter directive. This mapping is merged over any already in force, overriding any mappings that already exist for the same extension. If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content. Both the filter and extension arguments are case insensitive, and the extension may be specified with or without a leading dot.

    AddOutputFilter

        AddOutputFilter filter[;filter...] extension [extension ...]
        directory, files, location, .htaccess
        AddOutputFilter is only available in Apache 2.0.26 and later.

    The AddOutputFilter directive maps the filename extensions extension to the filters that will process responses from the server before they are sent to the client. This is in addition to any filters defined elsewhere, including the SetOutputFilter directive. This mapping is merged over any already in force, overriding any mappings that already exist for the same extension. For example, the following configuration will process all .shtml files for server-side includes:

        AddOutputFilter INCLUDES shtml

    If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content. Both the filter and extension arguments are case insensitive, and the extension may be specified with or without a leading dot.

    SetInputFilter

        SetInputFilter filter[;filter...]
        Server config, virtual host, directory, .htaccess

    The SetInputFilter directive sets the filter or filters that will process client requests and POST input when they are received by the server. This is in addition to any filters defined elsewhere, including the AddInputFilter directive. If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content.

    SetOutputFilter

        SetOutputFilter filter[;filter...]
        Server config, virtual host, directory, .htaccess

    The SetOutputFilter directive sets the filters that will process responses from the server before they are sent to the client. This is in addition to any filters defined elsewhere, including the AddOutputFilter directive. For example, the following configuration will process all files in the /www/data/ directory for server-side includes:

        <Directory /www/data/>
        SetOutputFilter INCLUDES
        </Directory>

    If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content.

    RemoveInputFilter

        RemoveInputFilter extension [extension] ...
        directory, .htaccess
        RemoveInputFilter is only available in Apache 2.0.26 and later.

    The RemoveInputFilter directive removes any input filter associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. The extension argument is case insensitive and can be specified with or without a leading dot.

    RemoveOutputFilter

        RemoveOutputFilter extension [extension] ...
        directory, .htaccess
        RemoveOutputFilter is only available in Apache 2.0.26 and later.

    The RemoveOutputFilter directive removes any output filter associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. The extension argument is case insensitive and can be specified with or without a leading dot.

    [1] If you are constructing HTML pages on the fly from CGI scripts, you have to insert it explicitly. See Chapter 14 for additional detail.

    For Apache 1.3.3 and Later

    Apache 1.3.3 introduced some significant changes in the handling of IndexOptions directives. In particular:

    • Multiple IndexOptions directives for a single directory are now merged together. The result of the previous example will now be the equivalent of IndexOptions FancyIndexing ScanHTMLTitles.
    • The addition of the incremental syntax (i.e., prefixing keywords with + or -).

    Whenever a + or - prefixed keyword is encountered, it is applied to the current IndexOptions settings (which may have been inherited from an upper-level directory). However, whenever an unprefixed keyword is processed, it clears all inherited options and any incremental settings encountered so far. Consider the following example:

        IndexOptions +ScanHTMLTitles -IconsAreLinks FancyIndexing
        IndexOptions +SuppressSize

    The net effect is equivalent to IndexOptions FancyIndexing +SuppressSize, because the unprefixed FancyIndexing discarded the incremental keywords before it, but allowed them to start accumulating again afterward. To set the IndexOptions unconditionally for a particular directory (clearing the inherited settings), specify keywords without either + or - prefixes.

    IndexOrderDefault

        IndexOrderDefault Ascending|Descending Name|Date|Size|Description
        Server config, virtual host, directory, .htaccess
        IndexOrderDefault is only available in Apache 1.3.4 and later.

    The IndexOrderDefault directive is used in combination with the FancyIndexing index option. By default, FancyIndexed directory listings are displayed in ascending order by filename; IndexOrderDefault allows you to change this initial display order. IndexOrderDefault takes two arguments. The first must be either Ascending or Descending, indicating the direction of the sort. The second argument must be one of the keywords Name, Date, Size, or Description and identifies the primary key. The secondary key is always the ascending filename.

    You can force a directory listing to be displayed only in a particular order by combining this directive with the SuppressColumnSorting index option; this will prevent the client from requesting the directory listing in a different order.
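    The two-key ordering just described can be modelled with a short Python sketch. It illustrates the sorting rule only and is not taken from Apache's source: the function name and the dictionary layout of each entry (name, date, size, description) are assumptions made for the example.

```python
def order_listing(entries, direction="Ascending", key="Name"):
    """Order directory entries the way IndexOrderDefault describes.

    entries is a list of dicts with the (assumed) keys
    'name', 'date', 'size' and 'description'.
    """
    if direction not in ("Ascending", "Descending"):
        raise ValueError("direction must be Ascending or Descending")
    if key not in ("Name", "Date", "Size", "Description"):
        raise ValueError("key must be Name, Date, Size or Description")
    # Sort on the secondary key first. Python's sort is stable, even with
    # reverse=True, so entries that tie on the primary key stay in
    # ascending filename order.
    by_name = sorted(entries, key=lambda e: e["name"])
    return sorted(by_name, key=lambda e: e[key.lower()],
                  reverse=(direction == "Descending"))
```

    For instance, sorting by Descending Size puts the largest files first, and two files of equal size still appear in ascending filename order.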

    ReadmeName

        ReadmeName filename
        Server config, virtual host, directory, .htaccess
        Some features only available after 1.3.6; see text

    The ReadmeName directive sets the name of the file that will be appended to the end of the index listing. filename is the name of the file to include and is taken to be relative to the location being indexed. The filename argument is treated as a stub filename in Apache 1.3.6 and earlier, and as a relative URI in later versions. Details of how it is handled may be found under the description of the HeaderName directive, which uses the same mechanism and changed at the same time as ReadmeName. See also HeaderName.

    FancyIndexing

        FancyIndexing on_or_off
        Server config, virtual host, directory, .htaccess

    FancyIndexing turns fancy indexing on. The user can click on a column title to sort the entries by value. Clicking again will reverse the sort. Sorting can be turned off with the SuppressColumnSorting keyword for IndexOptions (see earlier in this chapter). See also the FancyIndexing option for IndexOptions.

    IndexIgnore

        IndexIgnore file1 file2 ...
        Server config, virtual host, directory, .htaccess

    We can specify a description for individual files or for a list of them. We can exclude files from the listing with IndexIgnore. IndexIgnore is followed by a list of files or wildcards to describe files. As we see in the following example, multiple IndexIgnores add to the list rather than replacing each other. By default, the list includes ".".

    You might well want to ignore .ht* files so that the Bad Guys can't look at the actual .htaccess files. Here we want to ignore the *.jpg files (which are not much use without the .html files that display them and explain what they show) and the parent directory, known to Unix and to Win32 as "..":

        ...
        FancyIndexing on
        AddDescription "One of our wonderful catalogs" catalog_autumn.html catalog_summer.html
        IndexIgnore *.jpg ..

    You might want to use IndexIgnore for security reasons as well: what the eye doesn't see, the mouse finger can't steal.[1] You can put in extra IndexIgnore lines, and the effects are cumulative, so we could just as well write:

        FancyIndexing on
        AddDescription "One of our wonderful catalogs" catalog_autumn.html catalog_summer.html
        IndexIgnore *.jpg
        IndexIgnore ..
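    The cumulative behavior of IndexIgnore can be sketched in Python. This is an illustration under stated assumptions rather than Apache's own matching code: the function name is invented, shell-style wildcard matching from the standard fnmatch module stands in for Apache's pattern matching, and only plain filenames (no paths) are considered.

```python
from fnmatch import fnmatchcase


def visible_entries(names, config_lines):
    """Return the names that survive every IndexIgnore pattern."""
    patterns = ["."]  # the text says the list includes "." by default
    for line in config_lines:
        words = line.split()
        if words and words[0].lower() == "indexignore":
            # Each IndexIgnore line adds to the list; none replaces it.
            patterns.extend(words[1:])
    return [n for n in names
            if not any(fnmatchcase(n, p) for p in patterns)]
```

    Passing the single line "IndexIgnore *.jpg .." or the two separate lines "IndexIgnore *.jpg" and "IndexIgnore .." gives the same result, which is the point of the two equivalent Config fragments above.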

    AddIcon

        AddIcon icon_name name
        Server config, virtual host, directory, .htaccess

    We can add visual sparkle to our page by giving icons to the files with the AddIcon directive. Apache has more icons than you can shake a stick at in its ... /icons directory. Without spending some time exploring, one doesn't know precisely what each one looks like, but bomb.gif will do for an example. The icons directory needs to be specified relative to the DocumentRoot directory, so we have made a subdirectory ... /htdocs/icons and copied bomb.gif into it. We can attach the bomb icon to all displayed .html files with this:

        ...
        AddIcon icons/bomb.gif .html

    AddIcon expects the URL of an icon, followed by a file extension, wildcard expression, partial filename, or complete filename to describe the files to which the icon will be added. We can iconify subdirectories off the DocumentRoot with ^^DIRECTORY^^, or make blank lines format properly with ^^BLANKICON^^. Since we have the convenient icons directory to practice with, we can iconify it with this:

        AddIcon /icons/burst.gif ^^DIRECTORY^^

    Or we can make it disappear with this:

        ...
        IndexIgnore ... icons

    Not all browsers can display icons. We can cater to those that cannot by providing a text alternative alongside the icon URL:

        AddIcon ("DIR",/icons/burst.gif) ^^DIRECTORY^^

    This line will print the word DIR where the burst icon would have appeared to mark a directory (that is, the text is used as the ALT description in the link to the icon). You could, if you wanted, print the word "Directory" or "This is a directory." The choice is yours. Here are several examples of uses of AddIcon:

        AddIcon (IMG,/icons/image.xbm) .gif .jpg .xbm
        AddIcon /icons/dir.xbm ^^DIRECTORY^^
        AddIcon /icons/backup.xbm *~

    AddIconByType should be used in preference to AddIcon, when possible.

    AddAlt

        AddAlt string file file ...
        Server config, virtual host, directory, .htaccess

    AddAlt sets alternate text to display for the file if the client's browser can't display an icon. The string must be enclosed in double quotes.

    AddDescription

        AddDescription string file1 file2 ...
        Server config, virtual host, directory, .htaccess

    AddDescription expects a description string in double quotes, followed by a file extension, partial filename, wildcards, or full filename:

        FancyIndexing on
        AddDescription "One of our wonderful catalogs" catalog_autumn.html catalog_summer.html
        IndexIgnore *.jpg
        IndexIgnore ..
        AddIcon (CAT,icons/bomb.gif) .html
        AddIcon (DIR,icons/burst.gif) ^^DIRECTORY^^
        AddIcon icons/blank.gif ^^BLANKICON^^
        DefaultIcon icons/blank.gif

    Having achieved these wonders, we might now want to be a bit more sensible and choose our icons by MIME type using the AddIconByType directive.

    DefaultIcon

        DefaultIcon url
        Server config, virtual host, directory, .htaccess

    DefaultIcon sets a default icon to display for unknown file types. url is relative and points to the icon.

    AddIconByType

        AddIconByType icon mime_type1 mime_type2 ...
        Server config, virtual host, directory, .htaccess

    AddIconByType takes an icon URL as an argument, followed by a list of MIME types. Apache looks for the type entry in mime.types, either with or without a wildcard. We have the following MIME types:

        ...
        text/html html htm
        text/plain text
        text/richtext rtx
        text/tab-separated-values tsv
        text/x-setext text
        ...

    So, we could have one icon for all text files by including the line:

        AddIconByType (TXT,icons/bomb.gif) text/*

    Or we could be more specific, using four icons, a.gif, b.gif, c.gif, and d.gif:

        AddIconByType (TXT,/icons/a.gif) text/html
        AddIconByType (TXT,/icons/b.gif) text/plain
        AddIconByType (TXT,/icons/c.gif) text/tab-separated-values
        AddIconByType (TXT,/icons/d.gif) text/x-setext

    Let's try out the simpler case:

        FancyIndexing on
        AddDescription "One of our wonderful catalogs" catalog_autumn.html catalog_summer.html
        IndexIgnore *.jpg
        IndexIgnore ..
        AddIconByType (CAT,icons/bomb.gif) text/*
        AddIcon (DIR,icons/burst.gif) ^^DIRECTORY^^
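    How an AddIconByType line pairs an optional ALT text and an icon with a set of MIME types can be shown with a small Python sketch. It is a rough model only: both function names are invented, the wildcard match uses Python's fnmatch rather than Apache's own matcher, and the rule that the first matching line wins is an assumption of this example.

```python
from fnmatch import fnmatchcase


def parse_rule(line):
    """Split an AddIconByType line into (alt_text, icon_url, mime_patterns)."""
    words = line.split()
    if len(words) < 3 or words[0].lower() != "addiconbytype":
        raise ValueError("not an AddIconByType line: %r" % line)
    spec = words[1]
    if spec.startswith("(") and spec.endswith(")"):
        # The (ALT,url) form carries a text alternative with the icon.
        alt, icon = spec[1:-1].split(",", 1)
    else:
        alt, icon = None, spec
    return alt, icon, words[2:]


def icon_for(mime_type, lines):
    """Return (alt, icon) from the first line whose pattern matches."""
    for line in lines:
        alt, icon, patterns = parse_rule(line)
        if any(fnmatchcase(mime_type, p) for p in patterns):
            return alt, icon
    return None  # a real server would fall back to DefaultIcon here
```

    With the single text/* rule from the Config file above, every text type gets the bomb icon with the ALT text CAT; with the four more specific rules, text/plain would get b.gif.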

    For a further refinement, we can use AddIconByEncoding to give a special icon to encoded files.

    AddAltByType

        AddAltByType string mime_type1 mime_type2 ...
        Server config, virtual host, directory, .htaccess

    AddAltByType provides a text string for the browser to display if it cannot show an icon. The string must be enclosed in double quotes.

    AddIconByEncoding

        AddIconByEncoding icon mime_encoding1 mime_encoding2 ...
        Server config, virtual host, directory, .htaccess

    AddIconByEncoding takes an icon name followed by a list of MIME encodings. For instance, x-compress files can be iconified with the following:

        ...
        AddIconByEncoding (COMP,/icons/d.gif) application/x-compress
        ...

    AddAltByEncoding

        AddAltByEncoding string mime_encoding1 mime_encoding2 ...
        Server config, virtual host, directory, .htaccess

    AddAltByEncoding provides a text string for the browser to display if it can't put up an icon. The string must be enclosed in double quotes.

    Next, in our relentless drive for perfection, we can print standard headers and footers to our directory listings with the HeaderName and ReadmeName directives.

    HeaderName

        HeaderName filename
        Server config, virtual host, directory, .htaccess

    This directive inserts a header, read from filename, at the top of the index. The name of the file is taken to be relative to the directory being indexed. Apache will look first for filename.html and, if that is not found, then filename.

    Apache Versions After 1.3.6

    filename is treated as a URI path relative to the one used to access the directory being indexed and must resolve to a document with a major content type of "text" (e.g., text/html, text/plain, etc.). This means that filename may refer to a CGI script if the script's actual file type (as opposed to its output) is marked as text/html, such as with the following directive:

        AddType text/html .cgi

    Content negotiation will be performed if the MultiViews option is enabled. If filename resolves to a static text/html document (not a CGI script) and the Includes option is enabled, the file will be processed for server-side includes (see the mod_include documentation). If the file specified by HeaderName contains the beginnings of an HTML document (<HTML>, <HEAD>, etc.), then you will probably want to set IndexOptions +SuppressHTMLPreamble, so that these tags are not repeated. (See also ReadmeName.)

        FancyIndexing on
        AddDescription "One of our wonderful catalogs" catalog_autumn.html catalog_summer.html
        IndexIgnore *.jpg
        IndexIgnore .. icons HEADER README
        AddIconByType (CAT,icons/bomb.gif) text/*
        AddIcon (DIR,icons/burst.gif) ^^DIRECTORY^^
        HeaderName HEADER
        ReadmeName README

    Since HEADER and README can be HTML documents, you can wrap the directory listing up in a whole lot of fancy interactive stuff if you want. On the whole, however, FancyIndexing is just a cheap and cheerful way of getting something up on the Web. For a more elegant solution, study the next section.

    7.2 Making Our Own Indexes

    In the last section, we looked at Apache's indexing facilities. So far we have not been very adventurous with our own indexing of the document root directory. We replaced Apache's adequate directory listing with a custom-made .html file: index.html (see Chapter 3).

    We can improve on index.html with the DirectoryIndex directive. This directive specifies a list of possible index files to be used in order.

    7.2.1 DirectoryIndex

    The DirectoryIndex directive sets the list of resources to look for when the client requests an index of the directory by specifying a / at the end of the directory name.

        DirectoryIndex local-url local-url ...
        Default: index.html
        Server config, virtual host, directory, .htaccess

    local-url is the URL of a document on the server relative to the requested directory; it is usually the name of a file in the directory. Several URLs may be given, in which case the server will return the first one that it finds. If none of the resources exists and the Indexes option is set, the server will generate its own listing of the directory. For example, if this is the specification:

        DirectoryIndex index.html

    then a request for http://myserver/docs/ would return http://myserver/docs/index.html if it exists; if it does not exist, the request would list the directory, provided indexing was allowed. Note that the documents do not need to be relative to the directory:

        DirectoryIndex index.html index.txt /cgi-bin/index.pl

    This would cause the CGI script /cgi-bin/index.pl to be executed if neither index.html nor index.txt existed in a directory. A common technique for getting a CGI script to run immediately when a site is accessed is to declare it as the DirectoryIndex:

        DirectoryIndex /cgi-bin/my_start_script

    If this is to work, redirection to cgi-bin must have been arranged using ScriptAlias or ScriptAliasMatch higher up in the Config file. The Config file from ... /site.ownindex is as follows:

        User webuser
        Group webgroup
        ServerName www.butterthlies.com
        DocumentRoot /usr/www/APACHE3/site.ownindex/htdocs
        AddHandler cgi-script cgi
        Options ExecCGI indexes

        DirectoryIndex hullo.cgi index.html goodbye

        DirectoryIndex index.html goodbye

        DirectoryIndex goodbye
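    The lookup order that DirectoryIndex implies can be written out as a short Python sketch. It models only the decision described above and is not Apache's code: the function name, the set of existing URLs, and the "forbidden" outcome for a directory with no index and no Indexes option are assumptions made for the illustration.

```python
def resolve_index(directory_url, candidates, existing, indexes_enabled):
    """Decide what a request for directory_url (ending in /) returns.

    candidates      -- the DirectoryIndex arguments, in order
    existing        -- set of URLs that exist on the (pretend) server
    indexes_enabled -- whether the Indexes option is set
    """
    for local_url in candidates:
        if local_url.startswith("/"):
            url = local_url  # absolute local URL, e.g. /cgi-bin/index.pl
        else:
            url = directory_url + local_url  # relative to the directory
        if url in existing:
            return ("serve", url)  # first resource found wins
    if indexes_enabled:
        return ("listing", directory_url)
    return ("forbidden", directory_url)
```

    Given DirectoryIndex index.html index.txt /cgi-bin/index.pl, a directory containing only index.txt is answered with that file, and a directory containing neither file falls through to the CGI script.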

    In ... /htdocs we have five subdirectories, each containing what you would expect to find in ... /htdocs itself, plus the following files:

    • hullo.cgi
    • index.html
    • goodbye

    The CGI script hullo.cgi contains:

        #!/bin/sh
        echo "Content-type: text/html"
        echo
        env
        echo Hi there

    The HTML document index.html contains:

    looks like this:

    Welcome to Butterthlies Inc

    Summer Catalog

    All our cards are available in packs of 20 at $2 a pack. There is a 10% discount if you order more than 100.


    Style 2315

    Be BOLD on the bench

    How many packs of 20 do you want?


    Style 2316



    Get SCRAMBLED in the henhouse

    How many packs of 20 do you want?


    Style 2317

    Get HIGH in the treehouse

    How many packs of 20 do you want?


    Style 2318

    Get DIRTY in the bath

    How many packs of 20 do you want?


    Which Credit Card are you using?

    1. Access
    2. Amex
    3. MasterCard

    Your card number?


    Postcards designed by [email protected]



    Butterthlies Inc, Hopeful City, Nevada, 99999



    This is all pretty straightforward stuff, except perhaps for the line:

    which on Windows might look like this:

    The <FORM> tag introduces the form; at the bottom, </FORM> ends it. The METHOD attribute tells Apache how to return the data to the CGI script we are going to write, in this case using POST.

    In the Unix case, the ACTION attribute tells Apache to use the URL cgi-bin/mycgi.cgi (which the server may internally expand to /usr/www/cgi-bin/mycgi.cgi, depending on server configuration) to do something about it all:

    It would be good if we wrote perfect HTML, which this is not. Although most browsers allow some slack in the syntax, they don't all allow the same slack in the same places. If you write HTML that deviates from the standard, you have to expect that your pages will behave oddly somewhere, sometime. To make sure you have not done so, you can submit your pages to a validator, for instance, http://validator.w3.org. For more information on the many HTML features used to create forms, see HTML & XHTML: The Definitive Guide by Chuck Musciano and Bill Kennedy (O'Reilly, 2002).

    13.1.3 Other Approaches to Application Building

    While HTML forms are likely the most common use for application logic on web servers, there are many other cases where users interact with applications without necessarily filling out forms. Large sites often use content-management systems to store the information the site presents in databases, generating content regularly even though it may look to users exactly like an ordinary site with static files. Even smaller sites may use tools like Cocoon (discussed in Chapter 19) to manage and generate content for users.

    Many sites create customized experiences for their users, making suggestions based on prior visits to the site or information users have provided previously. These sites typically use "cookies," a mechanism that lets sites store a tiny amount of information on the user's computer and that the browser will report each time the user visits the site. Cookies may last for a single session, expiring when the user quits the browser, or they may last longer, expiring at some preset date. Cookies raise a number of privacy issues, but are frequently used in applications that interact with users over more than a single transaction. Using mechanisms like this, a web site might in fact generate every page a user sees, customizing the entire site.

    Building complex web applications is well beyond the scope of this book, which focuses on the Apache server you would use as their foundation. For more on web-application design in general, see Information Architecture for the World Wide Web by Louis Rosenfeld and Peter Morville (O'Reilly, 2002). For more on application design in specific environments, see the books referenced in the environment-specific chapters.
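    To make the cookie mechanism mentioned above a little more concrete, here is a minimal Python sketch using the standard http.cookies module. It is only an illustration: the cookie name, the helper function names, and the choice of a Max-Age lifetime in seconds (rather than a fixed expiry date) are assumptions of this example, not something the sites discussed here necessarily do.

```python
from http.cookies import SimpleCookie


def make_cookie(name, value, max_age=None):
    """Build the Set-Cookie header line a server would send.

    With max_age=None the cookie is a session cookie and disappears
    when the user quits the browser; otherwise it lasts max_age seconds.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie.output()


def read_cookie(header_value, name):
    """Pick one value out of the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return cookie[name].value if name in cookie else None
```

    A server-side script might call make_cookie("visits", "3", 3600) when replying and read_cookie(...) on the next request to find out what it stored last time.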

    13.2 Providing Application Logic

    While you could write Apache modules that provide the logic for your applications, most developers find it much easier to use scripting languages and integrate them with Apache using modules others have already written. Ultimately, all any computer language can do is to make the CPU compare, add, subtract, multiply, and divide bytes. An important point about scripting languages is that they should run without modification on as many platforms as possible, so that your site can move from machine to machine. On the other hand, if you are a beginner and know someone who can help with one particular language, then that one might be the best choice. We devote a chapter to installing support for each of the major languages and run over the main possibilities here.

    The discussion of computer languages is made rather difficult by the fact that human beings fall into two classes: those who love some particular language and those who don't. Naturally, the people who discuss languages fall into the first class; many of the people who read books like this in the hope of doing something useful with a computer tend more towards the second. The authors regard computer languages as a necessary evil. Languages all have their quirks, ranging from the mildly amusing to pleasures comparable to gargling battery acid. We would like enthusiasts for each of these languages to know that our comments on the others have reduced those enthusiasts to fury as well.

    13.2.1 Server-Side Includes

    Server-side includes are more of a means of avoiding scripting languages than a proper scripting language. If your needs are very limited, you may also find that the basic functionality this tool provides can solve a number of content issues, and it may also prove useful in combination with other approaches. Server-side includes are covered in Chapter 14.

    13.2.2 PHP

    Another approach to the problem of orchestrating HTML with CGI scripts, databases, and Apache is PHP.
    Someone who is completely new to programming of any sort might do best to start with PHP, which extends HTML, and one has to learn HTML anyway. Instead of writing CGI scripts in a language like Perl or Java, which then run in interaction with Apache and generate HTML pages to be sent to the client, PHP's strategy is to embed itself into the HTML. The author then writes HTML with embedded commands, which are interpreted by the PHP package as the page is served up. For instance, you could include the line:

        Hello world!

    in your HTML. Or, you could have the PHP statement:

        <?php echo "Hello world!"; ?>

    which would produce exactly the same effect. The <?php ... ?> construction embeds PHP commands within standard HTML. PHP has resources to interact with databases and do most things that other scripting languages do.

    The syntax of PHP is based on that of C with bits of Perl. The main problem with learning a new programming language is unlearning irrelevant bits of the ones you already know. So if you have no programming experience to confuse you, PHP may be as good a place to start as any. Its promoters claim that over a million web sites use it, so you will not be the first. Also, since it was designed for its web function from the start, it avoids a lot of the bodging that has proven necessary to get Perl to work properly in a web environment.

    On the other hand, it is relatively new and has not accumulated the wealth of prewritten modules that fill the Comprehensive Perl Archive Network (CPAN) library (see http://www.cpan.org). For example, one of us (PL) was creating a web site that offered a full-text search on a medical encyclopedia. The problem with text searching is that the visitor looks for "operation," but the text talks about "operated on," "operating theater," etc. The answer is to work back to the word stem, and there are several Perl modules in CPAN that strip the endings from English words to get, for instance, the stem "operat" from "operation," the word the enquirer entered. If one wanted to go further and parse English sentences into their parts of speech, modules to do that exist as well. But they might not exist for PHP and it might be hard to create them on your own. An early decision to take the simple route might prove expensive later on. PHP installation is covered in Chapter 15.

    13.2.3 Perl

    Perl, on the other hand, is an effective but annoyingly idiosyncratic language that has not been designed along sound theoretical lines.
    However, it has been around since 1987, has had many tiresome features ironed out of it, and has accumulated an enormous body of enthusiasts and supporting software in the CPAN archive. Its star feature is its regular expression tool for parsing lines of text. When one is programming for the Web, this is constantly in use to dissect URLs and strip meaning out of the returns from HTML forms. Perl also has a construct called an "associative array," which gives names to the array elements. This can be very useful, but its syntax can also be very complicated and mindbending.

    Perhaps the most serious defect of Perl is its absence of variable declaration. You can make up variable names on the fly (usually by mistyping or misthinking): Perl will create them and reference them, even if they are wrong and should not exist. This problem can be mitigated, however, with the use of the -w command line flag, as well as the following:

        use strict;

    within the scripts.

    Anyone who writes Perl needs the "Camel Book"[1] from O'Reilly & Associates. For all its occasional jokes, this is a fairly heavyweight book that is not meant to guide novices' first steps. Sriram Srinivasan's Advanced Perl Programming (O'Reilly, 1997) is also useful. If you are a complete newcomer to programming (and we all were once) you might like to look at Perl for Web Site Management by John Callender (O'Reilly, 2001) or Learning Perl by Randal L. Schwartz and Tom Phoenix (O'Reilly, 2001). The use of Perl in CGI applications is covered in Chapter 16, while mod_perl is covered in Chapter 17.

    13.2.4 Java

    Java is a more "proper" (and compiled) programming language, but it is newish.[2] In the Apache world, server-side Java is now available through Tomcat. See Chapter 18.

    Whether you choose Java over Perl, Python, or PHP probably depends on what you think of Java. As President Lincoln once famously said: "People who like this sort of thing will find this the sort of thing they like." But it is the strongly held, if possibly cranky, view of at least one of us (PL) that a lot of what is wrong with the Web is due to Java. Java makes it possible for web creators to invest their energies in an interestingly complicated medium that allows them to make pages that judder, vibrate, bounce, flash, dissolve, and swim about... By the time a programmer has mastered Java and all its distracting tricks, it is probably far too late to suggest that what the viewer really wants is static information in lucidly laid out words and pictures, for which Perl or PHP are perfectly adequate and much easier to use.

    As we went to press with this edition, it became plain that this Luddite view might have other supporters. Velocity, seemingly yet another page-authoring language, but one written in Java so that you can mess with its innards, was announced:

    Velocity is a Java-based template engine.
    It permits web page designers to use simple yet powerful template language to reference objects defined in Java code. Web designers can work in parallel with Java programmers to develop web sites according to the Model-View-Controller (MVC) model, meaning that web page designers can focus solely on creating a site that looks good, and programmers can focus solely on writing top-notch code. Velocity separates Java code from the web pages, making the web site more maintainable over the long run and providing a viable alternative to Java Server Pages (JSPs) or PHP.

    The curious will find Velocity at http://jakarta.apache.org/velocity/.

    In addition to these stylistic reservations about Java as a creative medium, we felt that Tomcat showed several symptoms of being an over-complicated project, which is as yet in an early stage of development. There seemed to be a lot of loose ends and many ways of getting things wrong. Certainly, we struggled over the interface between Tomcat and Apache for several months without success. Each time we returned to the problem, a new release of Tomcat had changed a lot of the ground rules. But in the end we succeeded, though we had to hack both Apache and Tomcat to make it work. Using Java with Apache is covered in Chapter 18.

    13.2.5 Other Options

    Python is fairly similar to Perl: less well known but also less idiosyncratic. It is also a scripting language, but one that has been properly written along sound academic lines (not necessarily a bad thing) and is easy to learn.

    JavaScript was originally created for use in browsers, but it has found use on servers as well. It has only a very superficial relationship to Java, but is commonly used as a scripting language in a variety of different application environments.

    Another possibility, which we would suggest you pass by unless you have absolutely no choice, is Visual Basic, more likely in the VBScript form used in various Microsoft products. BASIC was invented as a painless way of introducing students to programming. It was never intended to be a proper programming language, and subsequent attempts to make it one have proved largely unsuccessful, though developers certainly use it. A surprising number of big, expensive e-commerce sites collapse in a spray of Visual Basic error messages.

    People who like Microsoft's Active Server Pages (ASP) but don't like Microsoft's server can find a Perl emulator in the CPAN archive (http://www.cpan.org/), and Sun Microsystems offers a commercial ASP implementation that works with Apache (http://wwws.sun.com/software/chilisoft/).

    13.3 XML, XSLT, and Web Applications

    Extensible Markup Language (XML) has taken off in the last few years as a generic format for storing information. XML looks much like HTML, with a similar combination of elements and attributes for marking up text, but it lets developers create their own vocabularies. Some XML is shared directly over the Web; some XML is used by web services applications; and some XML is used as a foundation for web sites that need to present information in multiple forms.

    Serving XML documents is just like serving any other files in Apache, requiring only putting the files up and setting a MIME type identifier for them. Web services generally require the installation of modules specific to a particular web-service protocol, which then act as a gateway between the web server and application logic elsewhere on the computer.

    The last option, using XML as a foundation for information the Apache server needs to be able to present in multiple forms, is growing more common and fits well in more typical web-server applications. In this case, XML typically provides a format for storing information separate from its presentation details. When the Apache server gets a request for a particular file, say in HTML, it passes it to a tool that deals with the XML. That tool typically loads the XML document, generates a file in the format requested, and passes it back to Apache, which then transmits it to the user. (The XML processor may pull the file from a cache if the file has been requested previously.) If a site is only serving up HTML files, all this extra work is probably unnecessary, but sites that provide HTML, PDF, WML (Wireless Markup Language), and plain-text versions of the same content will likely find this approach very useful. Even sites that offer multiple HTML renditions of the same information may find this approach easier than managing multiple files.

    Most commonly, the transformation between the original XML document and the result the user wants is defined using Extensible Stylesheet Language Transformations (XSLT). Developers use XSLT to create templates that define the production of result documents from original XML documents, and these templates can generally be applied to many originals to produce many results.

    Making this work on Apache requires adding some parts that support XSLT and manage the caching process. Chapter 19 will explore Cocoon, a Java-based sub-project of the Apache Project that is widely used for this work. Perl devotees may want to explore AxKit, another Apache project that does similar work in Perl. (For a complete list of XML-related projects at Apache, visit http://xml.apache.org/.)

    XML and XSLT are subjects that go well beyond the scope of this book. Chapter 19 will provide a brief introduction, but you may also want to explore Learning XML by Erik Ray (O'Reilly, 2001), XSLT by Doug Tidwell (O'Reilly, 2001), and XML in a Nutshell by Elliotte Rusty Harold and Scott Means (O'Reilly, 2002).

    [1] Wall, Larry, Jon Orwant, and Tom Christiansen. Programming Perl (O'Reilly, 2000).

    [2] "New" is a bad four letter word in computing.

    Chapter 14. Server-Side Includes

    • 14.1 File Size
    • 14.2 File Modification Time
    • 14.3 Includes
    • 14.4 Execute CGI
    • 14.5 Echo
    • 14.6 Apache v2: SSI Filters

    Server-side includes trigger further actions whose output, if any, may then be placed inline into served documents or affect subsequent includes. The same results could be achieved by CGI scripts (either shell scripts or specially written C programs), but server-side includes often achieve these results with a lot less effort. There are, however, some security problems. The range of possible actions is immense, so we will just give basic illustrations of each command in a number of text files in ...site.ssi/htdocs. The Config file, .../conf/httpd1.conf, is as follows:

    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.ssi/htdocs
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    AddHandler server-parsed shtml
    Options +Includes

    Run it by executing ./go 1.

    shtml is the normal extension for HTML documents with server-side includes in them and is found as the extension to the relevant files in ... /htdocs. We could just as well use brian or dog_run, as long as the files containing the commands carry the same extension that is named in the configuration file. Using html as the extension can be useful (for instance, you can easily implement site-wide headers and footers), but it does mean that every HTML page gets parsed by the SSI engine. On busy systems, this could reduce performance. Bear in mind that HTML generated by a CGI script does not get put through the SSI processor, so it's no good including the markup listed in this chapter in a CGI script.

    Options Includes turns on processing of SSIs. As usual, look in the error_log if things don't work. The error messages passed to the client are necessarily uninformative since they are probably being read three continents away, where nothing useful can be done about them.

    The trick of SSI is to insert special strings into our documents, which then get picked up by Apache on their way through, tested against reference strings using =, !=, <, <=, >, and >=, and then replaced by dynamically written messages. As we will see, the strings have a deliberately unusual form so they won't get confused with more routine stuff. This is the syntax of a command:

    <!--#element attribute=value attribute=value ... -->

    The Apache manual tells us what the elements are:

    config
        This command controls various aspects of the parsing. The valid attributes are as follows:

        errmsg
            The value is a message that is sent back to the client if an error occurs during document parsing.

        sizefmt
            The value sets the format to be used when displaying the size of a file. Valid values are bytes for a count in bytes or abbrev for a count in kilobytes or megabytes, as appropriate.

        timefmt
            The value is a string to be used by the strftime( ) library routine when printing dates.

    echo
        This command prints one of the include variables, defined later in this chapter. If the variable is unset, it is printed as (none). Any dates printed are subject to the currently configured timefmt. This is the only attribute:

        var
            The value is the name of the variable to print.

    exec
        The exec command executes a given shell command or CGI script. Options IncludesNOEXEC disables this command completely, a boon to the prudent webmaster. The valid attributes are as follows:

        cgi
            The value specifies a %-encoded URL relative path to the CGI script. If the path does not begin with a slash, it is taken to be relative to the current document. The document referenced by this path is invoked as a CGI script, even if the server would not normally recognize it as such. However, the directory containing the script must be enabled for CGI scripts (with ScriptAlias or the ExecCGI option). The protective wrapper suEXEC will be applied if it is turned on. The CGI script is given the PATH_INFO and query string (QUERY_STRING) of the original request from the client; these cannot be specified in the URL path. The include variables will be available to the script in addition to the standard CGI environment. If the script returns a Location header instead of output, this is translated into an HTML anchor. If Options IncludesNOEXEC is set in the Config file, this command is turned off. The include virtual element should be used in preference to exec cgi.

        cmd
            The server executes the given string using /bin/sh. The include variables are available to the command. If Options IncludesNOEXEC is set in the Config file, this is disabled and will cause an error, which will be written to the error log.

    fsize
        This command prints the size of the specified file, subject to the sizefmt format specification. The attributes are as follows:

        file
            The value is a path relative to the directory containing the current document being parsed.

        virtual
            The value is a %-encoded URL path relative to the document root. If it does not begin with a slash, it is taken to be relative to the current document.

    flastmod
        This command prints the last modification date of the specified file, subject to the timefmt format specification. The attributes are the same as for the fsize command.

    include
        This command includes other files immediately at that point in parsing: right there and then, not later on. Any included file is subject to the usual access control. If the directory containing the parsed file has Options IncludesNOEXEC set and including the document causes a program to be executed, it isn't included: this prevents the execution of CGI scripts. Otherwise, CGI scripts are invoked as normal using the complete URL given in the command, including any query string. An attribute defines the location of the document; the inclusion is done for each attribute given to the include command. The valid attributes are as follows:

        file
            The value is a path relative to the directory containing the current document being parsed. It can't contain ../, nor can it be an absolute path. The virtual attribute should always be used in preference to this one.

        virtual
            The value is a %-encoded URL relative to the document root. The URL cannot contain a scheme or hostname, only a path and an optional query string. If it does not begin with a slash, then it is taken to be relative to the current document. A URL is constructed from the attribute's value, and the server returns the same output it would have if the client had requested that URL. Thus, included files can be nested. A CGI script can still be run by this method even if Options IncludesNOEXEC is set in the Config file. The reasoning is that clients can run the CGI anyway by using its URL as a hot link or simply by typing it into their browser; so no harm is done by using this method (unlike cmd or exec).
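
    The difference between the include and exec elements described above can be sketched in a single page. This is an illustration only, not one of the book's listings: the page name sketch.shtml, the files header.txt and footer.html, and the script mycgi.cgi are all hypothetical, and /cgi-bin is assumed to be enabled for CGI as in the Config file shown earlier.

```html
<!-- sketch.shtml: hypothetical page; every file name here is an assumption -->
<!--#config errmsg="[SSI error]" -->

<!-- include by a path relative to this document's directory
     (may not contain ../ and may not be an absolute path) -->
<!--#include file="header.txt" -->

<!-- include by URL path; the manual prefers this form to file= -->
<!--#include virtual="/footer.html" -->

<!-- run a CGI script through its URL; this still works when
     Options IncludesNOEXEC is set -->
<!--#include virtual="/cgi-bin/mycgi.cgi" -->

<!-- run a shell command through /bin/sh; Options IncludesNOEXEC
     disables this and logs an error instead -->
<!--#exec cmd="ls -l" -->
```

    On a server where IncludesNOEXEC is set, only the last element in this sketch should fail, and the client would then see the errmsg text in its place.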

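    14.1 File Size

    The fsize command allows you to report the size of a file inside a document. The file size.shtml is as follows:

    <!--#config errmsg="..."-->
    <!--#config sizefmt="bytes"-->
    The size of this file is <!--#fsize file="size.shtml"--> bytes.
    The size of another_file is <!--#fsize file="another_file"--> bytes.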

    The first line provides an error message. The second line means that the size of any files is reported in bytes printed as a number, for instance, 89. Changing bytes to abbrev gets the size in kilobytes, printed as 1k. The third line prints the size of size.shtml itself; the fourth line prints the size of another_file. config commands must appear above commands that might want to use them. You can replace the word file= in this script, and in those which follow, with virtual=, which gives a %-encoded URL path relative to the document root. If it does not begin with a slash, it is taken to be relative to the current document.
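
    The two substitutions just described (abbrev for bytes, and virtual= for file=) can be sketched as follows. This is a minimal illustration, assuming that size.shtml and another_file sit directly in the document root, so that the URL path /size.shtml resolves to the page itself.

```html
<!-- Variant of size.shtml: sizes shown in abbreviated form, files located by URL path -->
<!--#config sizefmt="abbrev" -->

<!-- leading slash: the path is taken relative to the document root -->
The size of this file is <!--#fsize virtual="/size.shtml" -->.

<!-- no leading slash: the path is taken relative to the current document -->
The size of another_file is <!--#fsize virtual="another_file" -->.
```

    The word "bytes" is dropped from the text here because the abbreviated form carries its own unit, such as 1k.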

    If you play with this stuff, you find that Apache is strict about the syntax. For instance, trailing spaces cause an error because valid filenames don't have them:

    The size of this file is <!--#fsize file="size.shtml  "--> bytes.

    If we had not used the errmsg command, we would see the following:

    ...[an error occurred while processing this directive]...

    14.2 File Modification Time

    The last modification time of a file can be reported with flastmod. This lets the client know how fresh the data is that you are offering. The format of the output is controlled by the timefmt attribute of the config element. The default rules for timefmt are the same as for the C-library function strftime( ), except that the year is now shown in four-digit format to cope with the Year 2000 problem. Win32 Apache is soon to be modified to make it work in the same way as the Unix version. Win32 users who do not have access to Unix C manuals can consult the FreeBSD documentation at http://www.freebsd.org, for example:

    % man strftime

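    (We have not included it here because it may well vary from system to system.) The file time.shtml gives an example:

    <!--#config timefmt="..."-->
    The mod time of this file is <!--#flastmod file="time.shtml"-->
    The mod time of another_file is <!--#flastmod file="another_file"-->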

    This produces a response such as the following:

    The mod time of this file is Tuesday August 19, the 240th day of the year, 841162166 seconds since the Epoch
    The mod time of another_file is Tuesday August 19, the 240th day of the year, 841162166 seconds since the Epoch
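
    The way config timefmt shapes the output of flastmod can be sketched as follows. This is an illustration rather than the book's own listing: the format string is an assumption, built only from the widely supported strftime( ) conversions %A (weekday name), %B (month name), %d (day of the month), and %j (day of the year). Other conversions may vary from system to system, so check man strftime before relying on them.

```html
<!-- Hypothetical page; the format string below is an assumption -->
<!-- The letters "th" after %j are literal text, not part of the conversion -->
<!--#config timefmt="%A %B %d, the %jth day of the year" -->
The mod time of this file is <!--#flastmod file="time.shtml" -->
The mod time of another_file is <!--#flastmod file="another_file" -->
```

    Because config commands must appear above the commands that use them, moving the timefmt line below the two flastmod lines would leave both dates in the default format.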

    14.3 Includes

    We can include one file in another with the include command:

    This is some text in which we want to include text from another file:

    14.4 Execute CGI

    We're now going to execute 'cmd="ls -l"'':
    >> and now the 'virtual' option:

    14.5 Echo

    Echoing the DATE_GMT

    and produces the response:

    Echoing the Document_URI /echo.shtml
    Echoing the DATE_GMT Saturday, 17-Aug-96 07:50:31
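
    A page along the following lines would be expected to produce a response of this shape. This is a minimal sketch, assuming the page is served as /echo.shtml; DOCUMENT_URI and DATE_GMT are standard mod_include variables, and the surrounding text is taken from the response shown above.

```html
<!-- echo.shtml: sketch of a page that prints two include variables -->
Echoing the Document_URI <!--#echo var="DOCUMENT_URI" -->
Echoing the DATE_GMT <!--#echo var="DATE_GMT" -->
```

    Any date printed by echo follows the currently configured timefmt, so a config timefmt element placed earlier in the page would change how DATE_GMT is rendered.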

    14.6 Apache v2: SSI Filters

    Apache v2, with its filter mechanism, introduced some new SSI directives:

    SSIEndTag

    SSIEndTag tag
    Default: SSIEndTag "-->"
    Context: Server config, virtual host

    This directive changes the string that mod_include looks for to mark the end of an include element.

    Example

    SSIEndTag "%>"

    See also SSIStartTag.

    SSIErrorMsg

    SSIErrorMsg message
    Default: SSIErrorMsg "[an error occurred while processing this directive]"
    Context: Server config, virtual host, directory, .htaccess

    The SSIErrorMsg directive changes the error message displayed when mod_include encounters an error. For production servers you may consider changing the default error message to "<!-- Error -->" so that the message is not presented to the user. This directive has the same effect as the <!--#config errmsg=message --> element.

    Example

    SSIErrorMsg "<!-- Error -->"

    SSIStartTag

    SSIStartTag message
    Default: SSIStartTag "<!--#"