XSS - Cross Site Scripting
Web Security: An Introduction
By Isaac H. C. Turner, September 2007
This document is intended for anyone new to developing websites using CGI, or for those who have developed them but have little idea of what web security entails. I will start with a general introduction to CGI (the mechanism behind dynamic webpages), and move on to a generalised approach to CGI web security. Example code is written in PHP as this is a common language to learn first; however, all CGI scripting languages tackle the security issues discussed here similarly.
CGI stands for Common Gateway Interface, although that's not so helpful. It is the name given to the mechanism by which dynamic webpages are generated. The 'classic' method for serving webpages is for the server to deliver a static webpage from a file to the user's browser. However, more often than not, modern websites are dynamic. When a dynamic webpage is requested, the webserver uses CGI to execute an application that will in turn execute a script that will generate the webpage.
[ HTTP Request ] -> [ Webserver ] -> [ CGI executed application ] -> [ script generates page ]
[ HTTP Request ] -> [ Apache ] -> [ PHP ] -> [ myscript.php ]
Common CGI script languages include PHP, Perl, Python and Ruby.
Usually when a script is executed, the webserver makes several sets of input data available to the script. This data includes all data passed by HTTP (Hypertext Transfer Protocol, the protocol used to request webpages from webservers) from the browser, as well as many server settings. Scripts can also usually access files and databases on the server or in its local network. Occasionally remote sources may also be used, and these can be very varied, almost unrestricted, and always changing with the times. However, as webpages must be generated in a very short time (usually under 1 second) and can be put under considerable load (1000s of hits an hour), remote resources are best cached and looked up locally (using a file or database).
After a script has done some processing of the data, it should then respond to the HTTP request. The most common response is to send a page or file back to the user, along with some HTTP headers (if desired). Another option is to respond with HTTP headers alone, for instance the 'Location' header to redirect the user to another location (page/URL). HTTP responses also carry a status code, which is how errors are reported. Other output methods available to the script involve files, databases or remote routes; however, these are much less common.
Dynamic webpages are principally about inputting, processing, storing and outputting data. Sometimes this data is confidential, and almost always it is worth something to the website owner. That is why verifying that the data is correct and valid, and shielding it from damage (or exposure), should be the top priority of any web developer. This section will address the most common security flaws in websites today and how to prevent them. Do not forget: however many design precautions are taken, one day you'll find yourself wishing you had made backups of your data.
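As a rough sketch of how these inputs appear to a PHP script: PHP exposes the data passed by HTTP through the superglobal arrays $_GET, $_POST and $_COOKIE, and server settings through $_SERVER. The helper name read_input() and the field names student_id, comment and session are assumptions made up for this illustration.

```php
<?php
// Hypothetical helper: fetch one value from an input array, or a default
// when the browser did not send that field.
function read_input(array $source, string $key, string $default = ''): string
{
    // A field may be missing, or may arrive as an array (e.g. ?id[]=1),
    // so check before treating it as a string.
    if (!isset($source[$key]) || !is_string($source[$key])) {
        return $default;
    }
    return $source[$key];
}

// Data passed by HTTP from the browser:
$studentId = read_input($_GET, 'student_id');   // query string
$comment   = read_input($_POST, 'comment');     // submitted form body
$session   = read_input($_COOKIE, 'session');   // cookies

// Server settings, such as the request method (not set when run from
// the command line, hence the default).
$method = read_input($_SERVER, 'REQUEST_METHOD', 'GET');
```

Every value read this way is still raw user input and needs the treatment described in the rest of this document.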
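A minimal sketch of the non-page responses described above, using PHP's header() and http_response_code() functions. The helper names redirect_to() and send_not_found() are hypothetical, and a real script would call only one of them per request.

```php
<?php
// Hypothetical helper: respond with a redirect instead of a page.
function redirect_to(string $url, int $status = 302): void
{
    // header() must be called before any page output has been sent.
    header('Location: ' . $url, true, $status);
}

// Hypothetical helper: report an error through the HTTP response code.
function send_not_found(): void
{
    http_response_code(404);
    echo 'Page not found';
}
```

A script would typically call redirect_to('/login.php') (an illustrative URL) and then stop with exit, so that no further output is generated after the redirect.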
One of the most common security flaws in web software is the SQL Injection. SQL (Structured Query Language) is the language used to interrogate databases. SQL Injections are a kind of command injection: security flaws where input data is misinterpreted as commands.
Example adapted from an xkcd comic:
To look up a student's name in a university database, we may run the database query:
SELECT name FROM Students WHERE student_id='$studentId'
Where $studentId is a PHP variable with the submitted student ID. However, if the user submitted the student ID "Robert'; DROP TABLE Students; --", the query that would be executed would be:
SELECT name FROM Students WHERE student_id='Robert'; DROP TABLE Students; --'
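Note: -- (double dash) comments out the remaining query in SQL.
To prevent this, special characters in the input must be escaped before it is placed in the query. With PHP's original mysql extension this is done using mysql_real_escape_string():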
$query = "SELECT name FROM Students WHERE student_id='" . mysql_real_escape_string($studentId) . "'";
This would convert the previously harmful query into the completely safe:
SELECT name FROM Students WHERE student_id='Robert\'; DROP TABLE Students; --'
It's simplest to wrap every variable with this method when it is used in a query, as this saves having to keep track of which variables contain raw data and which contain sanitised data. The method has a low overhead for small pieces of text. Data from most sources ought to be passed through this method when being used in a database query; however, there are some exceptions. Exceptions are sources which you know can't contain sensitive characters (e.g. single quotes), and these include values from a database column of type int.
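The mysql_* functions used above were removed in PHP 7. A different technique that achieves the same goal is the prepared statement, where the value is bound to a placeholder rather than pasted into the query text. The following is only a sketch under stated assumptions: it uses PDO with an in-memory SQLite database so that it is self-contained (this assumes the pdo_sqlite extension is enabled), and the sample row and the helper name find_student_name() are made up for the illustration.

```php
<?php
// Assumption: pdo_sqlite is available; a real site would connect to its
// own database server instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Made-up sample data for the illustration.
$pdo->exec("CREATE TABLE Students (student_id TEXT PRIMARY KEY, name TEXT)");
$pdo->exec("INSERT INTO Students VALUES ('1001', 'Alice')");

// Hypothetical helper: look up a name without building SQL from raw input.
function find_student_name(PDO $pdo, string $studentId): ?string
{
    // The value is bound to the :id placeholder separately from the query
    // text, so it is never interpreted as SQL.
    $stmt = $pdo->prepare('SELECT name FROM Students WHERE student_id = :id');
    $stmt->execute([':id' => $studentId]);
    $name = $stmt->fetchColumn();
    return $name === false ? null : $name;
}

$attack = "Robert'; DROP TABLE Students; --";
var_dump(find_student_name($pdo, '1001'));   // the stored name
var_dump(find_student_name($pdo, $attack));  // NULL, and the table survives
```

With this approach there is no need to remember to escape each variable, because no variable is ever concatenated into the query.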
For more on SQL Injections: Wikipedia - SQL Injection
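Cross-site scripting (XSS) is another common flaw. It arises when data supplied by a user is displayed on a webpage as it is, so that any HTML it contains is interpreted by the browser. If we wanted to display a comment from a user of our site, in PHP we might do: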
echo 'Paul said: ' . $comment;
However, if a user were to enter '<meta http-equiv="REFRESH" content="0;http://www.cs.man.ac.uk">' as a comment, all views of that page would be instantly redirected to the University of Manchester School of Computer Science website. This would stop anyone viewing your site, and could also be used in phishing attacks, bank scams etc. to trick users into giving up personal data.
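Most XSS flaws can be prevented by simply escaping HTML special characters (converting them to entities) in data before it is displayed on a webpage, so that the browser shows them as text instead of interpreting them as markup. In PHP there exists such a method: htmlentities()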
echo 'Paul said: ' . htmlentities($comment);
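To illustrate the effect on the malicious comment from earlier, here is a small sketch. The wrapper name escape_html() is hypothetical; passing ENT_QUOTES and 'UTF-8' explicitly is a choice made for this example (it assumes the page is served as UTF-8).

```php
<?php
// Hypothetical helper wrapping htmlentities() with explicit settings.
function escape_html(string $text): string
{
    // ENT_QUOTES converts single as well as double quotes, which matters
    // when the value ends up inside an HTML attribute.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<meta http-equiv="REFRESH" content="0;http://www.cs.man.ac.uk">';
echo 'Paul said: ' . escape_html($comment);
// Output:
// Paul said: &lt;meta http-equiv=&quot;REFRESH&quot; content=&quot;0;http://www.cs.man.ac.uk&quot;&gt;
```

The browser displays the comment as literal text, so the redirect never takes place.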
For more on XSS: Wikipedia - XSS
File permissions can be a more subtle security issue, and one that needs regular inspection. Web scripts that connect to databases need to hold database usernames, host addresses and passwords in plain text. This is safe as long as users only execute the scripts on the server and cannot read the files themselves. This is where file permissions come in: if they are not set properly, anyone can read sensitive data as well as trawl your code for other bugs to exploit.
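One way to inspect permissions from PHP is sketched below, assuming a Unix-like server. The helper name is_world_readable() is hypothetical, and a temporary file stands in for a real configuration file.

```php
<?php
// Hypothetical helper: true if users other than the owner and group can
// read the file (the "other" read bit, 0004, is set).
function is_world_readable(string $path): bool
{
    clearstatcache(true, $path);   // avoid a stale cached result
    return (fileperms($path) & 0004) !== 0;
}

// Stand-in for a configuration file holding a database password.
$config = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($config, "db_password=example\n");

chmod($config, 0644);                    // readable by everyone
var_dump(is_world_readable($config));    // bool(true)

chmod($config, 0600);                    // owner read/write only
var_dump(is_world_readable($config));    // bool(false)

unlink($config);
```

Which mode is appropriate depends on which user account the webserver runs the scripts as, so treat the values above as examples only.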
** This section to be finished **
Finally, it should be noted that most input into a web application can be faked. That includes GET and POST variables and cookies. Data taken from a local database cannot be trusted either if it was originally entered by a user and stored without being sanitised first.
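Because any input can be faked, it helps to check that a value has the expected form before using it. The sketch below assumes, purely for illustration, that student IDs are positive integers; the helper name parse_student_id() is hypothetical.

```php
<?php
// Hypothetical helper: accept a student ID only if it is a positive
// integer, otherwise return null.
function parse_student_id(string $raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

var_dump(parse_student_id('1001'));                               // int(1001)
var_dump(parse_student_id("Robert'; DROP TABLE Students; --"));   // NULL
var_dump(parse_student_id('-5'));                                 // NULL
```

Validation of this kind complements escaping rather than replacing it: a value that passes the check should still be escaped or bound when it is used in a query or displayed on a page.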