<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="presenter.xsl"?>
<slideshow>
	<title>Web Applications</title>

	<slide><title>Web Applications</title><body>
		<p>By Karl Voelker</p>
	</body></slide>

	<slide><title>The Goal</title><body>
		<p>The goal: to understand applications that</p>
		<ul>
		<li>use the stateless HTTP protocol,</li>
		<li>are rendered with HTML in web browsers,</li>
		<li>require server-side programming,</li>
		<li>and provide an "interactive" experience for users.</li>
		</ul>
		<p>Confusing parts of our example code will be 
			explained.</p>
	</body></slide>

	<slide><title>HTTP</title><body>
		<p>HTTP is the protocol that web clients (browsers) use to 
			communicate with web servers.</p>
		<p>HTTP communication goes like this:</p>
		<ol>
		<li>Client sends a complete request to server.</li>
		<li>Server sends a complete response to client.</li>
		<li>The end.</li>
		</ol>
	</body></slide>

	<slide><title>A Web Page</title><body>
		<webcpp>web_page.html</webcpp>
	</body></slide>

	<slide><title>A Web Page</title><body>
		<p>That was an ordinary web page.</p>
		<p>Let's see a program the webserver can run that will 
			produce the same web page:</p>
	</body></slide>

	<slide><title>Web Application #1</title><body>
		<webcpp>web_page.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #1</title><body>
		<p>That was pretty boring.</p>
		<p>Let's spice it up with some user input.</p>
	</body></slide>

	<slide><title>User Input</title><body>
		<p>Remember HTTP? The entire client request comes 
			before the server response.</p>
		<p><strong>All user input must be included by the 
			client in the request.</strong></p>
		<p>User input in the request consists of key-value 
			pairs.</p>
		<p>We hope the user knows what keys to provide.</p> 
	</body></slide>

	<slide><title>Web Application #2</title><body>
		<webcpp>hello_name.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #2</title><body>
		<p>Extracting the key-value user inputs is tricky.</p>
		<p>Libraries exist for most languages 
			that make it easy!</p>
		<h3>In Perl:</h3>
		<dl>
		<dt><tt>use CGI qw/param/;</tt></dt>
		<dd>Load the CGI library and import the <tt>param</tt>
			subroutine.</dd>
		<dt><tt>param('foo')</tt></dt>
		<dd>Get the user input value for key <tt>foo</tt>.</dd>
		</dl>
	</body></slide>
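
	<slide><title>Parsing Input by Hand</title><body>
		<p>To see why a library helps, here is a rough sketch of 
			decoding key-value pairs yourself. It is not how the 
			CGI library works internally; the subroutine name 
			<tt>parse_query</tt> is made up for this slide, and 
			repeated keys simply overwrite each other.</p>
		<pre><![CDATA[use strict;
use warnings;

# Hypothetical helper: turn "name=Karl&lang=perl" into a hash.
sub parse_query {
    my ($query) = @_;
    my %input;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key and length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # "%21" encodes "!"
        }
        $input{$key} = $value;
    }
    return %input;
}

my %input = parse_query('name=Karl+Voelker&greeting=hi%21');
print "$input{name}\n";      # Karl Voelker
print "$input{greeting}\n";  # hi!
]]></pre>
		<p>Even this sketch ignores character encodings and 
			<tt>POST</tt> bodies, which is a good reason to 
			prefer <tt>param</tt>.</p>
	</body></slide>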

	<slide><title>Web Application #2</title><body>
		<p>Our users would include the <tt>name</tt> key 
			in their requests like this:</p>
		<p><tt>http://some.domain/hello.pl?name=Karl</tt></p>
		<p>But most users aren't that clever.</p>
		<p>We need to make user input easier for users!</p>
	</body></slide>

	<slide><title>Web Application #3</title><body>
		<webcpp>hello_form_1.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #3</title><body>
		<webcpp>hello_form_2.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #3</title><body>
		<webcpp>hello_form_3.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #3</title><body>
		<p>You've just seen a <tt>form</tt>.</p>
		<p>A <tt>form</tt> contains named input controls.</p>
		<p>When the form is <em>submitted</em>, a new HTTP 
			request is made, and the values entered into 
			the input controls are included!</p>
	</body></slide>

	<slide><title>Forms</title><body>
		<p>The <tt>form</tt> tag encloses a form. It has these 
			attributes:</p>
		<dl>
		<dt><tt>method</tt></dt>
		<dd>HTTP request method to use on submit (<tt>GET</tt> or 
			<tt>POST</tt>)</dd>
		<dt><tt>action</tt></dt>
		<dd>URL to request on submit (defaults to that of the 
			current page)</dd>
		</dl>
	</body></slide>

	<slide><title>Form Inputs</title><body>
		<p>Most input controls are created with the <tt>input</tt> 
			tag, with these attributes:</p>
		<dl>
		<dt><tt>name</tt></dt>
		<dd>Key to associate with the value entered into this 
			input</dd>
		<dt><tt>type</tt></dt>
		<dd>Input type (<tt>text</tt>, <tt>hidden</tt>, 
			<tt>checkbox</tt>, <tt>radio</tt>, 
			<tt>submit</tt>)</dd>
		<dt><tt>value</tt></dt>
		<dd>Meaning varies by input type</dd>
		</dl>
	</body></slide>

	<slide><title><tt>text</tt> Inputs</title><body>
		<p>The <tt>value</tt> attribute is the default value:</p>
		<p><input type="text" value="Foo Bar Baz"/></p>
		<webcpp>input_text.html</webcpp>
	</body></slide>

	<slide><title><tt>hidden</tt> Inputs</title><body>
		<p>The <tt>hidden</tt> input is invisible. Use it for:</p>
		<ul>
		<li>Input values determined by JavaScript</li>
		<li>Pre-determined input values embedded in the page 
			by the server</li>
		</ul>
		<webcpp>input_hidden.html</webcpp>
	</body></slide>

	<slide><title><tt>checkbox</tt> and <tt>radio</tt> Inputs</title><body>
		<p>A set of checkboxes or radio buttons shares a single 
			<tt>name</tt>, but each needs a unique 
			<tt>value</tt>:</p>
		<p>Bar: <input type="radio" name="foo" value="bar"/><br/>
		Baz: <input type="radio" name="foo" value="baz"/></p>
		<webcpp>input_radio.html</webcpp>
	</body></slide>

	<slide><title><tt>submit</tt> Inputs</title><body>
		<p>A <tt>submit</tt> input is a button that causes the form 
			to be submitted when pressed.</p>
		<p>The <tt>value</tt> is the text shown on the button.</p>
		<p><input type="submit" value="Go!"/></p>
		<webcpp>input_submit.html</webcpp>
	</body></slide>

	<slide><title><tt>select</tt></title><body>
		<p>Use <tt>select</tt> to create a menu of choices:</p>
		<p><select name="foo">
		<option value="bar">Bar</option>
		<option value="baz">Baz</option>
		</select></p>
		<webcpp>input_select.html</webcpp>
	</body></slide>

	<slide><title>Web Application #4</title><body>
		<p>Now let's add a good-bye page to our application.</p>
		<p>We want the user to choose between our hello and 
			good-bye pages.</p>
	</body></slide>

	<slide><title>Web Application #4</title><body>
		<p>How can we add our good-bye page?</p>
		<ul>
		<li>Create a new file, like <tt>goodbye.pl</tt>, accessed 
			by its own URL, which produces the new page</li>
		<li>Have our existing script accept another input which 
			determines which page we show</li>
		</ul>
		<p>We will use the second approach.</p>
	</body></slide>

	<slide><title>Web Application #4</title><body>
		<webcpp>hello_goodbye_1.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #4</title><body>
		<webcpp>hello_goodbye_2.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #4</title><body>
		<webcpp>hello_goodbye_3.pl</webcpp>
	</body></slide>

	<slide><title>Web Application #4</title><body>
		<webcpp>hello_goodbye_4.pl</webcpp>
	</body></slide>

	<slide><title>Like an Elephant</title><body>
		<p>Most interesting web applications remember some 
			information that users can affect.</p>
		<p>How can we do this?</p>
		<dl>
		<dt>In files</dt>
		<dd>Good unless you need to search</dd>
		<dt>In databases</dt>
		<dd>The typical approach</dd>
		<dt>In memory</dt>
		<dd>If your program continues running between HTTP requests</dd>
		</dl>
	</body></slide>

	<slide><title><tt>GET</tt> and <tt>POST</tt></title><body>
		<p>What's the difference?</p>
		<dl>
		<dt><tt>GET</tt></dt>
		<dd>Get something, but don't <em>change</em> anything.</dd>
		<dt><tt>POST</tt></dt>
		<dd>Change something.</dd>
		</dl>
		<p>This is an important distinction that affects 
			caching and browser behavior.</p>
	</body></slide>
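
	<slide><title>Checking the Request Method</title><body>
		<p>One way to respect the distinction, sketched under the 
			assumption that the program runs as a CGI script (which 
			receives the method in the <tt>REQUEST_METHOD</tt> 
			environment variable). The subroutine name 
			<tt>may_change_data</tt> is made up for this slide.</p>
		<pre>use strict;
use warnings;

# Hypothetical rule: only a POST request may change stored data.
sub may_change_data {
    my ($method) = @_;
    return 0 unless defined $method;
    return uc($method) eq 'POST' ? 1 : 0;
}

# Treat a missing method as GET, the harmless case.
my $method = $ENV{REQUEST_METHOD} || 'GET';
print may_change_data($method) ? "changing data\n" : "read only\n";</pre>
	</body></slide>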

	<slide><title>Authentication</title><body>
		<p>Everyone loves <em>logging in</em>. You need:</p>
		<ul>
		<li>Server-side user and password data</li>
		<li>A way for the client to give a username and password, 
			thus logging in</li>
		<li>A way for the client to <em>remain logged in</em></li>
		</ul>
		<p>The path to authentication is fraught with dangers.</p>
	</body></slide>

	<slide><title>Storing Usernames and Passwords</title><body>
		<p>This isn't too hard, until your server gets hacked 
			and all the passwords are stolen.</p>
		<p>You should store all passwords <em>hashed</em>.</p>
	</body></slide>

	<slide><title>Hashing</title><body>
		<p>A <em>hashing function</em> works like this:</p>
		<p><tt>hash(unique_input) -> unique_garbage</tt></p>
		<p>You also want a hash function whose inverse either 
			doesn't exist or is impossibly slow to compute:</p>
		<p><tt>unhash(unique_garbage) -> unique_input</tt></p>
	</body></slide>

	<slide><title>Using Hashing</title><body>
		<p>It may not be obvious, but it is simple:</p>
		<ol>
		<li>Get claimed password from client</li>
		<li>Hash claimed password</li>
		<li>Compare hash of claimed password to stored hash of 
			correct password</li>
		</ol>
	</body></slide>

	<slide><title>I Can Has Hash?</title><body>
		<p>There are many real-world hashing functions.</p>
		<p>Attempting to break a hash function is a focus of 
			security research.</p>
		<dl>
		<dt><tt>MD5, SHA-1</tt></dt>
		<dd>Once widely used, but broken: practical collision 
			attacks exist for both</dd>
		<dt><tt>SHA-256, SHA-512</tt></dt>
		<dd>Built on similar ideas to SHA-1, but no practical 
			attacks are known</dd>
		</dl>
	</body></slide>
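
	<slide><title>Checking a Hashed Password</title><body>
		<p>The three steps from "Using Hashing" might look roughly 
			like this with SHA-256 from the <tt>Digest::SHA</tt> 
			module, which ships with modern Perl. The 
			<tt>%stored_hash</tt> table and the subroutine name are 
			made up for this slide. Real systems usually also add a 
			per-user salt and use a deliberately slow hash; both are 
			left out to keep the sketch short.</p>
		<pre>use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical storage: only the hash of the correct password is kept.
my %stored_hash = ( karl => sha256_hex('correct horse') );

# Hash the claimed password and compare it with the stored hash.
sub check_password {
    my ($username, $claimed) = @_;
    my $expected = $stored_hash{$username};
    return 0 unless defined $expected;     # unknown user
    return sha256_hex($claimed) eq $expected ? 1 : 0;
}

print check_password('karl', 'correct horse') ? "ok\n" : "denied\n";
print check_password('karl', 'wrong guess')   ? "ok\n" : "denied\n";</pre>
		<p>The server never needs to store or recover the 
			password itself.</p>
	</body></slide>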

	<slide><title>Sending the Password</title><body>
		<p>To prevent interception of passwords, configure your 
			webserver to use SSL or TLS (<tt>https</tt>).</p>
		<p>To prevent bystanders from seeing a password, use 
			<tt><show-xml><input type="password"/></show-xml></tt>
			instead of 
			<tt><show-xml><input type="text"/></show-xml></tt>.</p>
	</body></slide>

	<slide><title>Staying Logged-In</title><body>
		<p>This is trickier than it sounds. Once the login request 
			is done, how do you know which later requests are 
			coming from which users?</p>
		<p>The standard answer: cookies.</p>
	</body></slide>

	<slide><title>What is a cookie?</title><body>
		<p>A cookie is a way to store data on the client.</p>
		<p>On each request, the client's browser will include 
			your cookie.</p>
	</body></slide>

	<slide><title>How Cookies Work</title><body>
		<p>In some HTTP response, you include a special header 
			containing cookie data.</p>
		<p>In each subsequent HTTP request, the client includes 
			a special header with that same data.</p>
		<p>In our login scenario, we will set the cookie in 
			response to the client's login request.</p>
	</body></slide>
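
	<slide><title>Cookie Headers in Perl</title><body>
		<p>A minimal sketch of both directions, assuming a CGI 
			program (which receives the request's cookies in the 
			<tt>HTTP_COOKIE</tt> environment variable). The 
			subroutine names and the cookie name <tt>session</tt> 
			are made up for this slide.</p>
		<pre>use strict;
use warnings;

# Response side: build the header that sets a cookie.
sub set_cookie_header {
    my ($name, $value) = @_;
    return "Set-Cookie: $name=$value; Path=/; HttpOnly";
}

# Request side: pick one cookie out of "a=1; b=2".
sub get_cookie {
    my ($name, $header) = @_;
    return unless defined $header;
    for my $pair (split /;\s*/, $header) {
        my ($k, $v) = split /=/, $pair, 2;
        return $v if defined $k and $k eq $name;
    }
    return;
}

print set_cookie_header('session', 'abc123'), "\n";
print get_cookie('session', 'theme=dark; session=abc123'), "\n";  # abc123</pre>
		<p>In a real script the second argument would be 
			<tt>$ENV{HTTP_COOKIE}</tt>.</p>
	</body></slide>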

	<slide><title>Sessions with Cookies</title><body>
		<p>What could go in the cookie?</p>
		<dl>
		<dt>The client's username</dt>
		<dd><strong><span style='color:red'>Bad idea!</span></strong>
			Any client could make up a cookie containing 
			someone else's username.</dd>
		<dt>The client's username and password</dt>
		<dd><strong><span style='color:red'>Still risky!</span>
			</strong>
			If the client's computer is compromised, so is 
			their password.</dd>
		<dt>Random garbage</dt>
		<dd>Huh?</dd>
		</dl>
	</body></slide>

	<slide><title>Random-Garbage Authentication</title><body>
		<p>Here's what happens on a login:</p>
		<ol>
		<li>Server makes up random garbage (call it X)</li>
		<li>Server records X with the username that is logging in</li>
		<li>Server responds with X as the new cookie value</li>
		</ol>
		<p>The random garbage is like a password, but it's only valid 
			until the user logs out (or the session times out).</p>
	</body></slide>
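
	<slide><title>Random Garbage in Perl</title><body>
		<p>The login steps above could be sketched like this. It 
			assumes a Unix-like system that provides 
			<tt>/dev/urandom</tt>, and it keeps sessions in an 
			in-memory hash, which only works if the program keeps 
			running between requests. All names are made up for 
			this slide.</p>
		<pre><![CDATA[use strict;
use warnings;

my %session_user;   # X => username

# Step 1: make up random garbage (32 hex characters).
sub random_token {
    open(my $fh, '<:raw', '/dev/urandom') or die "no /dev/urandom: $!";
    my $got = read($fh, my $bytes, 16);
    close $fh;
    die "short read" unless defined $got and $got == 16;
    return unpack('H*', $bytes);
}

# Steps 2 and 3: record X for the user, answer with the cookie header.
sub log_in {
    my ($username) = @_;
    my $x = random_token();
    $session_user{$x} = $username;
    return "Set-Cookie: session=$x; HttpOnly";
}

# Later requests: find out whose X this is, if anyone's.
sub user_for_token {
    my ($x) = @_;
    return $session_user{$x};
}
]]></pre>
		<p>Logging out is then just 
			<tt>delete $session_user{$x}</tt>.</p>
	</body></slide>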

	<slide><title>Injection Attacks</title><body>
		<p>Injection vulnerabilities typically appear when 
			code in one language is assembled as a string 
			by a program, then executed or used elsewhere:</p>
		<ul>
		<li>SQL database commands</li>
		<li>Shell commands</li>
		<li>Filenames</li>
		<li>Regular expressions</li>
		<li>Response to client (HTML, JavaScript)</li>
		</ul>
		<p>We will look at several of these, and how to 
			avoid them.</p>
	</body></slide>

	<slide><title>SQL Injection</title><body>
		<p>Suppose you want to let users look each other up by 
			name.</p>
		<p>You get input from the client and put it in 
			<tt>$username</tt>.</p>
		<p>You have a query to search for user information:</p>
		<p><tt>$query = 
			"SELECT favorite_animal FROM users 
			WHERE name = $username"</tt></p>
	</body></slide>

	<slide><title>SQL Injection</title><body>
		<p>Now, if we search for this user:</p>
		<p><tt>'foo'; DELETE FROM important_data</tt></p>
		<p>The query becomes:</p>
		<p><tt>SELECT favorite_animal FROM users WHERE name = 'foo'; 
			DELETE FROM important_data</tt></p>
	</body></slide>

	<slide><title>Preventing SQL Injection</title><body>
		<p>Any good database library provides a way to make safe 
			queries, usually called <em>bind variables</em>.</p>
		<pre>$query = "SELECT favorite_animal 
	FROM users WHERE name = ?";
$sth = $dbh->prepare($query);
$sth->execute($username);</pre>
		<p>The <tt>?</tt> creates a bind variable.</p>
	</body></slide>

	<slide><title>Shell Injection</title><body>
		<p>Much like SQL injection, shell injection typically 
			involves appending a malicious command.</p>
		<p>If you don't actually need the shell, you should be 
			running the other program directly instead.</p>
	</body></slide>
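
	<slide><title>Avoiding the Shell</title><body>
		<p>In Perl, running the other program directly means 
			giving <tt>system</tt> a list instead of one string. 
			A small sketch, assuming an <tt>ls</tt> command is 
			available; the subroutine name is made up for this 
			slide.</p>
		<pre>use strict;
use warnings;

# Unsafe: the whole string goes to the shell, so input such as
# "/tmp; echo INJECTED" would run a second command.
#   system("ls $dir");

# Safer: with a list, Perl runs ls directly and the shell never
# sees the input. The "--" tells ls that no options follow.
sub list_directory {
    my ($dir) = @_;
    return system('ls', '--', $dir);   # 0 means ls succeeded
}

# The hostile input is treated as one (odd) directory name.
list_directory('/tmp; echo INJECTED');</pre>
		<p>Here <tt>ls</tt> merely complains that the 
			directory does not exist.</p>
	</body></slide>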

	<slide><title>Filename Injection</title><body>
		<p>Suppose your application creates a file in a directory, 
			say <tt>/home/foo/web</tt>, for each user.</p>
		<p>What happens when a user's name is 
			<tt>../../../something/important</tt>?</p>
	</body></slide>
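
	<slide><title>Checking Filenames</title><body>
		<p>One common defence is to accept only names made of 
			known-safe characters. A sketch, where the allowed 
			character set, the 32-character limit, and the 
			subroutine name are all assumptions made for this 
			slide:</p>
		<pre>use strict;
use warnings;

my $base = '/home/foo/web';

# Return the per-user path, or nothing if the name looks unsafe.
sub user_file {
    my ($username) = @_;
    return unless defined $username;
    # Letters, digits, "_" and "-" only: no "/" and no "..".
    return unless $username =~ /\A[A-Za-z0-9_-]{1,32}\z/;
    return "$base/$username";
}

print user_file('karl'), "\n";   # /home/foo/web/karl
print defined(user_file('../../../something/important'))
    ? "accepted\n" : "rejected\n";   # rejected</pre>
	</body></slide>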

	<slide><title>Response Content Injection</title><body>
		<p>Suppose that you let users write a little about 
			themselves:</p>
		<webcpp>xss.html</webcpp>
	</body></slide>

	<slide><title>Response Content Injection</title><body>
		<p>What just happened? The attacker:</p>
		<ol>
		<li>created an invisible frame,</li>
		<li>pointed the frame at an attacker-owned URL,</li>
		<li>and included the cookie of the user 
			viewing the page in the request to the 
			attacker's URL.</li>
		</ol>
		<p>The user's session has been stolen!</p>
	</body></slide>
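
	<slide><title>Escaping Response Content</title><body>
		<p>The usual defence is to escape user-supplied text 
			before putting it in the page, so the browser shows 
			it as text instead of running it. A hand-rolled 
			sketch; the subroutine name <tt>escape_html</tt> is 
			made up for this slide, and a library routine would 
			do the same job.</p>
		<pre><![CDATA[use strict;
use warnings;

# Replace the characters HTML treats specially with entities.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

my $about_me = '<script>alert(1)</script>';
print '<p>', escape_html($about_me), "</p>\n";
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
]]></pre>
	</body></slide>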

	<slide><title>The End</title><body>
		<p>Questions?</p>
	</body></slide>

</slideshow>

