Opening the HTTP cookies jar

Cookies offload server overhead and can
reduce some client overhead issues

By Charles Rejonis
Summary
Cookies, those little text files that HTTP servers write to your hard disk, have the potential to reduce the client's overhead while using very little disk space themselves. Some users may be concerned that a foreign server is writing data to their disks. Over time, however, cookies may well end up saving far more overhead than they require. (2,100 Words)

Do you remember the old Cookie Monster virus? The "I wanna cookie!" screen message was satisfied easily with "chocolate chip" or "oatmeal."

Cookies in the context of HTTP servers are not so straightforward or whimsical. In fact, it's more like "don't ask, don't tell." As a client, you don't ask and the HTTP server won't tell before it goes ahead and gives you one. So, you might want to learn a little more about these Magic Cookies that are being used on your browser with increasing frequency.

Cookies are pieces of information that an HTTP server, such as Netscape Commerce Server, can store on your machine. A cookie is really nothing more than a name and a value. Cookies are sent by a server to your browser, and your browser stores them by writing them to a file on your disk. The next time you connect to that server, your browser sends back some or all of the cookies that the server originally sent to you.

Why do I want to use cookies?
Cookies may seem like a lot of trouble over nothing, but they are not. One of the characteristics of the HTTP protocol is that the server keeps no state information between connections. Thus, to keep any data about your connection from one page to the next, the server has to generate a new page dynamically with that data somehow embedded in it. The traditional way to embed this information was through hidden fields in forms. However, this was not really a trustworthy mechanism because the person running the browser could move forward and backward through pages, jump to bookmarks, and take all kinds of other unpredictable actions. By using cookies, the server has much more control over what data it maintains on the browser side and how that data is manipulated.

All that sounds terribly abstract. What are some real-world applications for cookies? A prime example is Netscape's "personalize your Web page" option. When you tell the server what your particular preferences are, the server encodes all that information and saves it as cookies on your hard drive. The next time you connect to the Netscape server, it looks for cookies that indicate your preferences. If it finds such cookies, it will configure the page accordingly. If not, you get the default page. This is a very easy way to allow users to customize their environment without forcing the overhead onto the servers.

How can I use cookies?
One way to use cookies that's also related to user configuration involves a database of registered users. If, as the PC World Online site does, you require users to fill out a registration form, you can keep their chosen password in a cookie and have the password request and response automated. PC World Online registered users never have to go through the tedium of telling the PC World page their password over and over again.

Once you have that database, you can use cookies to transmit the user demographics to advertisers when the user clicks through an ad banner. This is a valid use of cookies, although the webmaster has to weigh the benefit to advertisers versus the possible offense to the client.

One use that is a clear benefit to the client is maintaining a shopping list of items as the client moves through an online catalog. As the client browses each page of the catalog, each selection is encoded as a cookie on the client side. At the end of the session, the server collects the cookies, uses that information to fill out the invoice form and requests the necessary credit card information.

What's in that cookie?
Cookies are stored in a text file on your machine. In Unix, the file is named "cookies" and is in the .netscape directory under your home directory. In Windows, the file is in the netscape directory, and is named cookies.txt. The Mac has the best name of the bunch. There, the file is called MagicCookie in the netscape preferences folder. That file looks something like:

# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file!  Do not edit.

netscape.com      TRUE   /  FALSE   946684799   NETSCAPE_ID  c98ffb1e,c68818dc
infoseek.com      TRUE   /  FALSE   859574919   InfoseekUserId  D2679D5862DEA4FE
cgi.netscape.com  FALSE  /  FALSE   946684799   NETSCAPE_VERIFY  c65ff94b,c6a6abcb
adobe.com         TRUE   /  FALSE   946684799   INTERSE   123.123.123.1231212183113897

Each line of the file describes one cookie. The exact format of the file depends on the syntax of the cookie spec. As the example illustrates, generally the first column is the domain of the server that gave you the cookie, the next-to-last column is the name of the cookie, and the last column is the value of the cookie. So, in this file, there are cookies from Netscape, Adobe, and Infoseek. Each of them has sent down some sort of code that identifies this particular machine.
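
To make the column layout concrete, here is a minimal sketch that pulls the domain, name, and value out of such a file. It relies only on the layout described above (domain first, name next to last, value last) and assumes whitespace-separated columns with no spaces inside a value. The function name list_cookies is made up for illustration.

```shell
#!/bin/sh
# list_cookies (hypothetical helper): read a cookie file on standard input
# and print "domain name=value" for each cookie line.
list_cookies() {
    awk '
        /^#/   { next }                 # skip the comment header
        NF < 3 { next }                 # skip blank or malformed lines
        { print $1, $(NF-1) "=" $NF }   # first, next-to-last, and last columns
    '
}

# Demo on two lines taken from the sample file above.
list_cookies <<'EOF'
# Netscape HTTP Cookie File
netscape.com      TRUE   /  FALSE   946684799   NETSCAPE_ID  c98ffb1e,c68818dc
infoseek.com      TRUE   /  FALSE   859574919   InfoseekUserId  D2679D5862DEA4FE
EOF
```

To run it against your own file on Unix, you would redirect the file described earlier into the function, for example list_cookies < "$HOME/.netscape/cookies".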

However, the Internet is full of people who want to know what their software is doing. Currently, the user of a browser receives no notification that a server has written information to their hard disk. The concerns this raises are valid and significant. So, to assess the risk involved, we first need to look at the Cookie syntax.

Syntax
The syntax of Cookies is simple and straightforward. According to the Netscape specification of Cookies:

"This is the format a CGI script would use to add to the HTTP headers a new piece of data which is to be stored by the client for later retrieval.
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN; secure"
Unsurprisingly, NAME is the name of the cookie and VALUE is the data contained in that cookie. The browser can throw away the cookie anytime after DATE. For security, only servers that match DOMAIN will get this cookie. PATH is another restriction on where the cookie is sent. When the browser is about to request a URL on a server that matches DOMAIN, the browser checks to see if the path in that URL matches PATH. If so, the cookie is sent along with the URL request. If not, no cookie is sent. SECURE is a flag that, if present, indicates that the cookie is only to be sent if the connection is secure.
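
The DOMAIN and PATH checks can be sketched in a few lines of shell. This is only an approximation of what a browser does: it treats the domain test as a plain "host name ends with DOMAIN" comparison and the path test as a plain "URL path starts with PATH" comparison, and it ignores the expires and secure attributes. The function name cookie_matches and its argument order are made up for illustration.

```shell
#!/bin/sh
# cookie_matches HOST URLPATH DOMAIN PATH (hypothetical helper)
# Exit status 0 means a cookie scoped to DOMAIN and PATH would be
# sent with a request for URLPATH on HOST.
cookie_matches() {
    req_host=$1 req_path=$2 ck_domain=$3 ck_path=$4
    case "$req_host" in
        *"$ck_domain") ;;          # host name ends with DOMAIN
        *) return 1 ;;
    esac
    case "$req_path" in
        "$ck_path"*) return 0 ;;   # URL path starts with PATH
        *) return 1 ;;
    esac
}

cookie_matches cgi.netscape.com /index.html netscape.com / && echo "send cookie"
cookie_matches www.adobe.com /index.html netscape.com / || echo "no cookie"
```

The first call succeeds because cgi.netscape.com ends in netscape.com and every path starts with /. The second fails on the domain test, so the path is never examined.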

Each time a browser requests a URL from any server, the browser checks all its cookies for potential DOMAIN and PATH matches. Those lucky cookies that match are included with the request in the format:
Cookie: NAME1=VALUE1; NAME2=VALUE2 ...
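
On the server side, a script usually wants one value out of that list. The following is a minimal sketch, assuming the pairs are separated by a semicolon and a space exactly as in the format above. The function name get_cookie is made up for illustration, and the header string can be passed as a second argument so the function can be tried outside a web server.

```shell
#!/bin/sh
# get_cookie NAME [HEADER] (hypothetical helper): print the value of the
# cookie called NAME from a "NAME1=VALUE1; NAME2=VALUE2" string.
# HEADER defaults to the HTTP_COOKIE environment variable.
get_cookie() {
    printf '%s\n' "${2-$HTTP_COOKIE}" |
    tr ';' '\n' |          # one NAME=VALUE pair per line
    sed 's/^ *//' |        # drop the space that follows each semicolon
    while IFS= read -r pair; do
        case "$pair" in
            "$1="*) printf '%s\n' "${pair#*=}"; break ;;
        esac
    done
}

get_cookie Color "Name=Fred; Color=Red"
```

If no cookie by that name is present, the function prints nothing, which a calling script can test for.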

How big is the Cookie Jar?
The specification states that a browser should be prepared to keep up to 300 cookies in its cookie jar at a time, with a size limit of 4 KB per cookie. Of those 300, up to 20 cookies may belong to any one server or domain.

If a browser runs out of cookie jar space, it deletes the least recently used cookies. The expires value is a guideline--browsers can delete cookies earlier if they run out of space. However, the cookie cannot be given out after the expiration date.

Security Issues
So, how secure is all of this? It can be a surprise to find out that an application can write to your disk under the control of someone else. Although the Cookie specification has been out for nearly a year, not many people know the details about it. Is the fact that a server can write to a text file on your disk more of a risk than the fact that you can download a Java applet that actually executes code? Probably not. It's important to think about the risk in the context of other network transactions we do without a second thought.

According to Frank Chen, security product manager at Netscape Communications Corp., the information maintained by cookies is no different than the data that could be captured at the server. No new information collection capabilities are added with cookies. He said that as a browser-user, he would be more nervous about the potential for server administrators to track his movements through their servers, to later mine that data for patterns, rather than about the information trackable via cookies.

For those who remain concerned about cookies and privacy, Netscape Navigator Ver. 3.0 addresses this worry. This version, including the currently available beta 4, has an option (under Options/Network Preferences/Protocols) to show an allow/deny alert whenever a server tries to set a cookie. This gives you full control over server access to your cookie file and, if nothing else, gives you an idea of who is offering you cookies and what's in them.

Example
To experiment with cookies on your server, here are two Unix scripts to try. The first sets two cookies. The second is a script that, when called, prints back that stored cookie info.


Setting the Cookies
This script sets the cookies Name and Color. The expires date format looks a little funky, but it conforms to the USENET date standards. After invoking this script from your browser, you will be able to find the cookies in your cookie file. (You may have to quit the browser first.)

#!/bin/sh
echo "Content-type: text/html"
echo "Set-cookie: Name=Fred; expires=Wednesday, 01-Jan-97 12:00:00 GMT"
echo "Set-cookie: Color=Red; expires=Wednesday, 01-Jan-97 12:00:00 GMT"
echo ""
echo "This script sets cookies: Name=Fred, Color=Red"

Reading the Cookies
When any browser connects to a server and runs a CGI script, that script can access a set of environment variables. Although the exact implementation of these variables is platform-dependent, all the cookies are stored in a variable with a name like HTTP_COOKIE. A Unix shell script that would display the cookies that were set by the above script is:

#!/bin/sh
echo "Content-type: text/html"
# A blank line ends the HTTP headers.
echo ""
echo "These are your cookies:"
echo "$HTTP_COOKIE"

That's sort of the "hello, world" of cookies. A cookie doesn't really do anything itself. But the good news is that it doesn't take too much more to make it useful. On the server side, decide what you need to keep track of, assign appropriate codes to these states and write a small bit of code to determine what the cookie means when it comes back. And that's all there is to it.
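
As a hedged illustration of that last step, here is a sketch of a CGI script that gives meaning to the Color cookie set by the first script above. The color table, the white default, and the function name page_color are assumptions made up for this example, and the matching expects the "NAME1=VALUE1; NAME2=VALUE2" layout shown earlier.

```shell
#!/bin/sh
# page_color COOKIES (hypothetical helper): map the Color cookie found in a
# "NAME1=VALUE1; NAME2=VALUE2" string to an HTML background color.
page_color() {
    # Wrap the string in "; " and ";" so every pair looks like "; NAME=VALUE;".
    case "; $1;" in
        *"; Color=Red;"*)  echo "#ff0000" ;;
        *"; Color=Blue;"*) echo "#0000ff" ;;
        *)                 echo "#ffffff" ;;   # no cookie or unknown code: default page
    esac
}

echo "Content-type: text/html"
echo ""
echo "<body bgcolor=\"$(page_color "${HTTP_COOKIE-}")\">"
echo "This page is drawn in your chosen color."
echo "</body>"
```

A browser that has never visited gets the default white page; one that returns with Color=Red gets a red one, with no per-user data kept on the server.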

About the author
Charles Rejonis is a computer scientist in Germantown, Md. He can be reached at rejonis@cs.stanford.edu.