Hackers love cross site scripting (XSS). The range of mischief they can cause and the information they can harvest using XSS is amazing at first sight.
XSS allows everything, from automated attacks that are initiated by the victim opening a seemingly innocent web page to code injections and session hijacking. What's more, despite having been around since the 1990s, XSS now forms the basis of a growing number of high profile attacks.
What frustrates online security experts is that XSS attacks are also almost entirely avoidable using simple measures, such as checking and sanitising everything a user submits before it is processed or displayed.
Cross site anatomy
Your web browser is a wonderful piece of software. It can be expanded as needed in so many directions at once that it has become the dominant method of interaction with the internet. Now, in an age of on-demand web apps, your browser is more important than ever.
However, this ubiquity and extensibility make XSS attacks easy. The problem arises because whenever a web browser meets the <SCRIPT> directive in a piece of HTML, it simply executes the script it finds there. While this facility can make websites bristle with useful functionality, it can also have a nasty side effect.
To demonstrate, imagine a website that requires you to register an account. You enter your email address, think up a username and create a password. So far, so good. Now suppose that this same website has a personal page for each member. You surf to a specific member's page and their username is displayed at the top, along with a list of their interests and posting statistics.
How can this page possibly be a threat to your online security?
The answer lies in the username. A potential XSS vulnerability occurs if the member's username is not 'sanitised'. This means that it doesn't have any non-alphanumeric characters stripped out before use.
This lack of checking means that a hacker can add a reference to a script hosted on another site to the end of their username, which is what gives the cross site scripting attack its name. Instead of displaying the suspicious script at the end of the username, the average browser simply displays the username up to the script directive, and runs the script.
Simply surfing to the hacker's infected profile page can induce your browser to silently load and run a malicious script. The script could do anything, and because you're logged into the site, it will run in your browser and access resources on the site as if it's you.
It's possible that the malicious script could steal your session cookies, your username and password, your real name and other important identifying information. If the website in question is a pay site, the script might be able to access your stored credit card details.
It may even be able to get into your personal messages and send messages pretending to be you. Your account may then begin recommending that other users click on a link that installs a botnet on their PCs or worse.
MANY AVENUES OF ATTACK: The chart above breaks down the top attack methods used by hackers
The attack described above is an example of a 'persistent' XSS. It is so-called because, as part of a stored username, it embeds itself in a target website and is triggered every time a browser accesses the infected page.
Such attack vectors can persist for years without anyone knowing. Even searching the site for a username will trigger the attack because many hand-crafted site searches don't sanitise the results they display.
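The persistent attack described above can be sketched in a few lines. This is a minimal illustration in Python rather than the PHP used later in the article; the function names, the username and the evil.example address are all hypothetical. It contrasts a page that pastes a stored username in verbatim with one that escapes it first.

```python
import html


def render_profile_unsafe(username: str) -> str:
    # Vulnerable: the stored username is pasted into the page verbatim,
    # so any markup it contains becomes part of the page.
    return "<h1>Profile of " + username + "</h1>"


def render_profile_safe(username: str) -> str:
    # html.escape turns <, >, & and quotes into HTML entities, so the
    # browser displays the text instead of interpreting it as markup.
    return "<h1>Profile of " + html.escape(username) + "</h1>"


# Hypothetical stored username with a script reference tagged on the end.
stored_username = 'mallory<script src="http://evil.example/x.js"></script>'

unsafe_page = render_profile_unsafe(stored_username)
safe_page = render_profile_safe(stored_username)

print(unsafe_page)
print(safe_page)
```

Escaping at the point of display is an alternative to stripping characters on input: the username is stored unchanged, but it can never be mistaken for a script directive when shown.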
The other type of XSS attack is called 'non-persistent', and is by far the most prevalent form used today. Rather than being tagged at the end of a website member's username or other piece of innocent looking text, the script is delivered as part of a URL – usually as part of an apparently genuine link in an email.
We all know not to click on links in unsolicited emails, and to turn off HTML rendering in email clients to show up fake domains, but what if the domain is genuine?
This is where non-persistent XSS attacks get their power to deceive. Complex URLs that can take many parameters make great places to hide a call to an injected script.
When the clued-up recipient of such an XSS URL examines it, they might only inspect the domain name:
<A HREF=http://www.mybank.com/login.cgi?clientprofile=<SCRIPT>script in here</SCRIPT>>Log in securely here</A>
The domain in this case is www.mybank.com. Look further along and you'll see that the URL actually points to the bank's login script (login.cgi). This may be as far as you try to understand the URL. If you're a user of My Bank then you might have no reason not to use the link to log in securely. The URL helpfully supplies a value for a variable called 'clientprofile'.
Such variables are very common in URLs and autofill input boxes for you, but they are also potentially vulnerable to XSS attacks. In this example, rather than the value of clientprofile being a string containing your username, it's a script (or a call to a script on another site) surrounded by <SCRIPT> markers. When you click the link, a vulnerable login.cgi echoes the unsanitised value back into the page it returns, and your browser runs the script as though it came from the bank.
If you were to remove everything from the question mark onwards, you could safely visit the website to log in. If you leave the URL as it is though, the hacker's malicious script will run on your machine, although you may not be aware of it happening. The only clue might be an incorrect username autofilling an input box.
Behind the scenes, however, you may have just installed a botnet client. Because the vulnerable site echoes the injected script straight back to your browser in its response, non-persistent XSS attacks are also said to be 'reflected'.
So, despite the domain in the URL being legitimate and apparently OK to click, the injected script need never supply a sensible value to login.cgi. Instead it may send you to a fake website that appears to be the real thing in order to steal usernames and passwords, infect you with malware, or commit a wide range of other profitable mischief.
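To see why inspecting only the domain is not enough, the link above can be pulled apart programmatically. The following Python sketch uses the standard urllib.parse module; the helper name suspicious_parameters is hypothetical, and looking for angle brackets is only a crude heuristic for illustration, not a complete defence.

```python
from urllib.parse import parse_qs, urlsplit


def suspicious_parameters(url: str) -> dict:
    """Return query parameters whose decoded values contain < or >."""
    query = urlsplit(url).query
    flagged = {}
    for name, values in parse_qs(query).items():
        for value in values:
            # parse_qs has already undone any percent-encoding, so an
            # encoded payload such as %3CSCRIPT%3E is caught as well.
            if "<" in value or ">" in value:
                flagged[name] = value
    return flagged


link = ("http://www.mybank.com/login.cgi"
        "?clientprofile=<SCRIPT>script in here</SCRIPT>")

# The domain looks perfectly genuine...
print(urlsplit(link).hostname)
# ...but the clientprofile parameter carries markup.
print(suspicious_parameters(link))
```

The real fix belongs on the server, which should sanitise or escape clientprofile before echoing it into a page, but the sketch shows where in the URL the payload hides.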
The confused deputy
An exploit closely related to XSS is the cross site request forgery (XSRF), often described as a 'confused deputy' attack. The name evokes a plot device in cowboy movies, where the sheriff is out of town and the baddie convinces the deputy to follow orders that appear to come from the sheriff himself.
A confused deputy attack is technically the inverse of an XSS attack. This means that rather than have the browser execute a malicious script, the hacker convinces a legitimate script on a legitimate site to do things as if the user had requested them.
Suppose you don't log out of your online bank after you finish paying your bills. You reason that the session will time out after a few minutes, so what's the point? You then surf to another site that an attacker knows you frequent. The attacker can craft an HTML element that lets remote content load, and which contains a call to a script on the bank's server. By passing commands to the bank's script, and while your session at the bank is still live, the attacker can pose as you.
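An example of a suitable HTML element would be a remote image. Instead of containing the URL for the image to load, it would contain a call to the bank's script. The content of the image tag might look something like this (the script name and parameters here are purely illustrative):
<IMG SRC="http://www.mybank.com/transfer.cgi?from=jon&to=attacker&amount=1000">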
As with the reflected XSS attack, the browser blindly tries to fulfil what's requested of it by calling the script to see whether it returns the address of the image it thinks it is supposed to load. If your session at the bank is still valid, the transfer script runs and transfers £1,000 from the account 'jon' to 'attacker'.
This is a very simplified example, but the confused deputy attack highlights the importance of secondary verification methods before carrying out important functions like transferring money. UK banks are very good at minimising the possibility of such attacks by insisting on the use of hardware devices like card readers to verify your identity, for example.
Always remembering to log out of important accounts and never using a website's 'remember me' login facility will also cut down the possibility of falling victim to such attacks.
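On the server side, a widely used complement to these habits is a per-session secret token. This technique is not described in the article and is named here as an addition: the site embeds a random token in its own forms and refuses any sensitive request that doesn't send it back. The Python sketch below assumes a hypothetical session dictionary and function names; it is an illustration, not a production implementation.

```python
import hmac
import secrets


def new_csrf_token() -> str:
    # A random, unguessable value stored in the user's session and
    # embedded in every genuine form the site serves.
    return secrets.token_urlsafe(32)


def is_request_allowed(session_token: str, submitted_token) -> bool:
    # A forged request, such as the remote image described above, can make
    # the browser send the session cookie, but it cannot know the token.
    if not submitted_token:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())


# Hypothetical session for the logged-in user 'jon'.
session = {"user": "jon", "csrf_token": new_csrf_token()}

genuine = is_request_allowed(session["csrf_token"], session["csrf_token"])
forged = is_request_allowed(session["csrf_token"], None)
guessed = is_request_allowed(session["csrf_token"], "not-the-token")

print(genuine, forged, guessed)
```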
Avoiding XSS exploits
We mentioned at the start that XSS vulnerabilities are simple to fix. Input sanitisation is the process of making whatever the user types safe for further processing and display. In its simplest form, the process involves stripping out all characters that are not in the ranges a-z, A-Z and 0-9 before any other processing occurs. This removes any special characters that the browser might interpret as the start of a script directive.
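The stripping step just described can be sketched as a one-line filter. This example is in Python rather than PHP, and the function name strip_non_alphanumeric is hypothetical.

```python
import re


def strip_non_alphanumeric(text: str) -> str:
    # Remove every character outside the ranges a-z, A-Z and 0-9.
    return re.sub(r"[^A-Za-z0-9]", "", text)


# Hypothetical malicious username: the angle brackets, quotes and slashes
# needed to form a script directive do not survive the filter.
cleaned = strip_non_alphanumeric(
    'bob<script src="http://evil.example/x.js"></script>'
)
print(cleaned)

# The same filter also alters harmless input, which is the price of
# such a strict whitelist.
print(strip_non_alphanumeric("O'Brien"))
```

A whitelist this strict is easy to reason about, but it only suits fields such as usernames where punctuation isn't needed.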
Input sanitising also guards against SQL injection, an attack that can log an attacker in without a password. Suppose we have a PHP login script that takes a username and password and checks them against the 'username' and 'password' fields in a MySQL database of existing users:
$user = $_POST['username'];
$pass = md5($_POST['password']);
$query = 'SELECT id FROM users WHERE username="'.$user.'" AND password="'.$pass.'"';
// Count the rows that matched the supplied credentials
$return = mysql_num_rows(mysql_query($query));
If the value of $return is greater than zero, the username and password are valid. However, suppose you enter the following as a username (including the quotes): " OR password LIKE "%" -- (note the trailing space). The '%' is a wildcard, and tells MySQL to simply match any password.
The '--' starts a comment (MySQL requires a space after it), which tells MySQL to ignore everything else after it, including the value of $pass. Doing so means that $return is always > 0, so the attacker is logged in without a password.
If you replace the first line of the PHP with: $user = mysql_real_escape_string($_POST['username']); the function mysql_real_escape_string will place a backslash before special characters such as quotes, so they are treated as part of the username rather than as SQL. The injected query no longer matches any row and the attacker remains locked out.
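The same attack and a fix can be reproduced end to end. The sketch below swaps in different tools from the article and says so plainly: Python's built-in sqlite3 module instead of MySQL, single-quoted SQL strings instead of double-quoted ones, and parameterised queries instead of escaping. The table layout, function names and SHA-256 hash are assumptions made for illustration only.

```python
import hashlib
import sqlite3


def hash_password(password: str) -> str:
    # Stand-in for the article's md5() call; illustration only.
    return hashlib.sha256(password.encode()).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
)
conn.execute(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    ("jon", hash_password("s3cret")),
)


def login_unsafe(username: str, password: str) -> list:
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = ("SELECT id FROM users WHERE username='" + username +
             "' AND password='" + hash_password(password) + "'")
    return conn.execute(query).fetchall()


def login_safe(username: str, password: str) -> list:
    # Placeholders keep user input as data; it is never parsed as SQL.
    query = "SELECT id FROM users WHERE username=? AND password=?"
    return conn.execute(query, (username, hash_password(password))).fetchall()


# Single-quote variant of the injection string discussed above.
attack = "' OR password LIKE '%' -- "

print(login_unsafe(attack, "anything"))  # matches a row without the password
print(login_safe(attack, "anything"))    # matches nothing
print(login_safe("jon", "s3cret"))       # the genuine login still works
```

With placeholders, the injected quote and comment marker are compared literally against stored usernames, so there is nothing for the attacker to break out of.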
There are several other functions in PHP that will sanitise input to forms, and we've listed some in the Resources box below. One such function is filter_input:
$user = filter_input(INPUT_POST, 'username', FILTER_SANITIZE_STRING);
This takes the input field 'username' and strips out any tags that could allow an attacker to slip a script directive into their input. (Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, where escaping output with htmlspecialchars is the usual replacement.) Preventative measures like these are simple to implement, yet the number of major websites still being hit by scripting attacks shows that they are far from universally applied.
In the past year, Twitter, Facebook, The Daily Telegraph, McDonald's and scores of other high profile sites have fallen prey to XSS attacks, thereby highlighting the need to mistrust all input until it has been checked and sanitised.
Avoiding XSS vulnerabilities depends on the language in which you're writing web applications. We've assembled a detailed list of resources that will help keep your code safe from malicious input.
For a general XSS FAQ, go to cgisecurity.com. You'll find examples of XSS attacks here. For information on HTML tag sanitising in Java, visit ibm.com. You can read an older, but valid, research paper on XSS (in PDF format) here. Details of filtering input in PHP can be found here.