Round-trip time (RTT) is the time it takes for a client to send a request and the server to send a response over the network, not including the time required for data transfer. That is, it includes the back-and-forth time on the wire, but excludes the time to fully download the transferred bytes (and is therefore unrelated to bandwidth). For example, for a browser to initiate a first-time connection with a web server, it must incur a minimum of 3 RTTs: 1 RTT for DNS name resolution; 1 RTT for TCP connection setup; and 1 RTT for the HTTP request and first byte of the HTTP response. Many web pages require dozens of RTTs.
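As a rough worked example of the arithmetic above, the sketch below multiplies a round-trip count by an assumed RTT. The function name and the 100 ms figure are illustrative assumptions, not measurements.

```javascript
// Rough lower bound on latency spent purely on round trips
// (transfer time is deliberately ignored, as in the definition above).
function roundTripLatencyMs(roundTrips, rttMs) {
  return roundTrips * rttMs;
}

// First-time connection from the text: DNS + TCP setup + HTTP request/first byte.
const firstConnectionRtts = 1 + 1 + 1;
const assumedRttMs = 100; // hypothetical RTT, chosen only for illustration

console.log(roundTripLatencyMs(firstConnectionRtts, assumedRttMs)); // 300
```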
RTTs vary from less than one millisecond on a LAN to over one second in the worst cases, e.g. a modem connection to a service hosted on a different continent from the user. For small download file sizes, such as a search results page, RTT is the major contributing factor to latency on "fast" (broadband) connections. Therefore, an important strategy for speeding up web page performance is to minimize the number of round trips that need to be made. Since the majority of those round trips consist of HTTP requests and responses, it's especially important to minimize the number of requests that the client needs to make and to parallelize them as much as possible.
Reducing the number of unique hostnames from which resources are served cuts down on the number of DNS resolutions that the browser has to make, and therefore, RTT delays.
Before a browser can establish a network connection to a web server, it must resolve the DNS name of the web server to an IP address. Since DNS resolutions can be cached by the client's browser and operating system, if a valid record is still available in the client's cache, there is no latency introduced. However, if the client needs to perform a DNS lookup over the network, the latency can vary greatly depending on the proximity of a DNS name server that can provide a valid response. All ISPs have DNS servers which cache name-IP mappings from authoritative name servers; however, if the caching DNS server's record has expired, and needs to be refreshed, it may need to traverse several nodes in the DNS serving hierarchy, sometimes around the globe, to find an authoritative server. If the DNS resolvers are under load, they can queue DNS resolution requests, which further adds to the latency. In other words, in theory, DNS resolution takes 1 RTT to complete, but in practice, the latency can vary significantly due to DNS resolver queuing delays. It's therefore important to reduce DNS lookups more than any other kinds of requests.
The validity of a DNS record is determined by the time-to-live (TTL) value set by its primary authoritative server; many network administrators set the TTL quite low (between 5 minutes and 24 hours) to allow for quick updates in case network traffic needs to be shifted around. (However, many DNS caches, including browsers, are "TTL disobeyers" and keep the cached record for longer than instructed by the origin server, up to 30 minutes in some cases.) There are a number of ways to mitigate DNS lookup time, such as increasing your DNS records' time-to-live setting, minimizing CNAME records (which require additional lookups), and replicating your name servers in multiple regions, but these go beyond the scope of web application development, and may not be feasible given your site's network traffic management requirements.
Instead, the best way to limit DNS-lookup latency from your application is to minimize the number of different DNS lookups that the client needs to make, especially lookups that delay the initial loading of the page. The way to do that is to minimize the number of different hostnames from which resources need to be downloaded. However, because there are benefits from using multiple hostnames to induce parallel downloads, this depends somewhat on the number of resources served per page. The optimal number is somewhere between 1 and 5 hosts (1 main host plus 4 hosts on which to parallelize cacheable resources). As a rule of thumb, you shouldn't use more than 1 host for fewer than 6 resources; fewer than 2 resources on a single host is especially wasteful. It should never be necessary to use more than 5 hosts (not counting hosts serving resources over which you have no control, such as ads).
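One way to check a page against this guideline is to count the distinct hostnames its resources come from. The following is a minimal sketch; the function name and the resource list are hypothetical, and it relies only on the standard `URL` class.

```javascript
// Collect the distinct hostnames a page's resources are served from.
// On a cold cache, each distinct hostname may cost one DNS lookup.
function uniqueHostnames(resourceUrls) {
  const hosts = new Set();
  for (const url of resourceUrls) {
    hosts.add(new URL(url).hostname);
  }
  return [...hosts].sort();
}

// Hypothetical resource list for a single page.
const resources = [
  "http://www.example.com/index.html",
  "http://static1.example.com/a.css",
  "http://static1.example.com/b.js",
  "http://static2.example.com/logo.png",
];

console.log(uniqueHostnames(resources));
```

If the resulting list is longer than the 1 to 5 hosts suggested above, that is a hint to consolidate.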
Minimizing HTTP redirects from one URL to another cuts out additional RTTs and wait time for users.
Sometimes it's necessary for your application to redirect the browser from one URL to another. There are several reasons web applications issue redirects:
Whatever the reason, redirects trigger an additional HTTP request-response cycle and add round-trip-time latency. It's important to minimize the number of redirects issued by your application — especially for resources needed for starting up your homepage. The best way to do this is to restrict your use of redirects to only those cases where it's absolutely technically necessary, and to find other solutions where it's not.
One popular way of recording page views in an asynchronous fashion is to include a JavaScript snippet at the bottom of the target page (or as an onload event handler) that notifies a logging server when a user loads the page. The most common way of doing this is to construct a request to the server for a "beacon", and encode all the data of interest as parameters in the URL for the beacon resource. To keep the HTTP response very small, a transparent 1x1-pixel image is a good candidate for a beacon request. A slightly more optimal beacon would use an HTTP 204 response ("no content") which is marginally smaller than a 1x1 GIF.
Here is a trivial example that assumes that www.example.com/logger is the logging server, and that requests an image called beacon.gif. It passes the URL of the current page and the URL of the referring page (if there is one) as parameters:
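<script type="text/javascript">
// Report the current page and its referrer to the logging server.
var thisPage = location.href;
var referringPage = (document.referrer) ? document.referrer : "none";
var beacon = new Image();
// encodeURIComponent escapes "&", "=" and "?" so each value stays a single parameter.
beacon.src = "http://www.example.com/logger/beacon.gif?page=" + encodeURIComponent(thisPage)
+ "&ref=" + encodeURIComponent(referringPage);
</script>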
This type of beacon is best included at the very end of the page's HTML to avoid competing with other HTTP requests that are actually needed to render the page contents. In that way, the request is made while the user is viewing the page, so no additional wait time is added.
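On the receiving side, the 204 variant mentioned earlier might look roughly like the sketch below. It assumes a Node-style request handler; the `handleBeacon` name and the in-memory `pageViews` log are hypothetical and stand in for whatever the real logging server does.

```javascript
// Hypothetical handler for the logging server: record the beacon
// parameters, then answer 204 "No Content" so no response body is sent.
const pageViews = []; // in-memory log, for illustration only

function handleBeacon(req, res) {
  // req.url holds only the path and query, so a base is needed to parse it.
  const url = new URL(req.url, "http://localhost");
  pageViews.push({
    page: url.searchParams.get("page"),
    ref: url.searchParams.get("ref"),
  });
  res.writeHead(204);
  res.end();
}

// With Node's built-in http module this could be wired up as:
//   require("http").createServer(handleBeacon).listen(8080);
```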
Server-side: the web server returns an HTTP redirect response (most commonly 301 or 302) with a Location header set to the new URL.
Client-side: the page includes the http-equiv="refresh" attribute in the meta tag, or sets the JavaScript window.location object (with or without the replace() method), in the head of the HTML document.
If you must use a redirect mechanism, prefer the server-side method over client-side methods. Browsers are able to handle HTTP redirects more efficiently than meta and JavaScript redirects. For example, JS redirects can add parse latency in the browser, while 301 or 302 redirects can be processed immediately, before the browser parses the HTML document.
In addition, according to the HTTP/1.1 specification, 301 and 302 responses can be cached by the browser. This means that even if the resource itself is not cacheable, the browser can at least look up the correct URL in its local cache. 301 responses are cacheable by default unless otherwise specified. To make a 302 response cacheable, you need to configure your web server to add an Expires or Cache-Control max-age header (see Leverage browser caching for details). The caveat here is that many browsers don't actually honor the spec, and won't cache either 301 or 302 responses; see Browserscope for a list of conforming and non-conforming browsers.
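As an illustration of a cacheable server-side redirect, here is a minimal sketch assuming a Node-style response object. The `redirect` helper and its option names are hypothetical; only `writeHead` and `end` are real Node response methods.

```javascript
// Hypothetical helper that issues a server-side redirect.
// A 301 is cacheable by default; for a 302 we add Cache-Control explicitly.
function redirect(res, location, { permanent = false, maxAgeSeconds = 3600 } = {}) {
  const headers = { Location: location };
  if (!permanent) {
    headers["Cache-Control"] = `max-age=${maxAgeSeconds}`;
  }
  res.writeHead(permanent ? 301 : 302, headers);
  res.end();
}
```

The one-hour default for `maxAgeSeconds` is an arbitrary choice for the sketch; a real value depends on how often the target URL changes.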
Google Analytics uses the image beacon method to track inbound, internal, and outbound traffic on any web page owned by an Analytics account holder. The account owner embeds a reference to an external JavaScript file in the web page, which defines a function called trackPageview(). At the bottom of the document body, the page includes a JavaScript snippet that calls this function when a viewer requests the page. The trackPageview() function constructs a request for a 1x1-pixel image called __utm.gif, with multiple parameters in the URL. The parameters specify variables such as the page URL, referring page, browser settings, user locale, and so on. When the Analytics server gets the request, it logs the information and can serve it to account holders when they sign in to the reporting site.
mod_rewrite.
Removing "broken links", or requests that result in 404/410 errors, avoids wasteful requests.
As your website changes over time, it's inevitable that resources will be moved and deleted. If you don't update your frontend code accordingly, the server will issue 404 "Not found" or 410 "Gone" responses. These are wasteful, unnecessary requests that lead to a bad user experience and make your site look unprofessional. And if such requests are for resources that can block subsequent browser processing, such as JS or CSS files, they can virtually "crash" your site. In the short term, you should scan your site for such links with a link checking tool, such as the crawl errors tool in Google's Webmaster Tools, and fix them. Long term, your application should have a way of updating URL references whenever resources change their location.
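A very small link checker along these lines could be sketched as follows. The function names are hypothetical, and the fetch function is injectable so the logic can be tried without a network; by default it falls back to the global `fetch` available in browsers and recent Node versions.

```javascript
// Treat 404 "Not Found" and 410 "Gone" as broken, per the text above.
function isBrokenStatus(status) {
  return status === 404 || status === 410;
}

// Request each URL and return the ones that come back broken.
// HEAD is used to avoid downloading response bodies; some servers may
// not support it, in which case a GET would be needed instead.
async function findBrokenLinks(urls, fetchFn = fetch) {
  const broken = [];
  for (const url of urls) {
    const response = await fetchFn(url, { method: "HEAD" });
    if (isBrokenStatus(response.status)) {
      broken.push(url);
    }
  }
  return broken;
}
```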
Combining external scripts into as few files as possible cuts down on RTTs and delays in downloading other resources.
Good front-end developers build web applications in modular, reusable components. While partitioning code into modular software components is a good engineering practice, importing modules into an HTML page one at a time can drastically increase page load time. First, for clients with an empty cache, the browser must issue an HTTP request for each resource, and incur the associated round trip times. Secondly, most browsers prevent the rest of the page from being loaded while a JavaScript file is being downloaded and parsed. (For a list of which browsers do and do not support parallel JS downloads, see Browserscope.)
Here is an example of the download profile of an HTML file containing requests for 13 different .js files from the same domain; the screen shot is taken from Firebug's Net panel over a DSL high-speed connection with Firefox 3.0+:

All files are downloaded serially, and take a total of 4.46 seconds to complete. Now here is the profile for the same document, with the same 13 files collapsed into 2 files:

The same 729 kilobytes now take only 1.87 seconds to download. If your site contains many JavaScript files, combining them into fewer output files can dramatically reduce latency.
However, there are other factors that come into play to determine the optimal number of files to be served. First, it's important also to defer loading JS code that is not needed at a page's startup. Secondly, some code may have different versioning needs, in which case you will want to separate it out into files. Finally, you might have to serve JS from domains that you don't control, such as tracking scripts or ad scripts. We recommend a maximum of 3, but preferably 2, JS files.
It often makes sense to use many different JavaScript files during the development cycle, and then bundle those JavaScript files together as part of your deployment process. See below for recommended ways of partitioning your files. You would also need to update all of your pages to refer to the bundled files as part of the deployment process.
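The deployment-time bundling step described above can be sketched as a plain concatenation. This is a minimal illustration, not a replacement for a real build tool; the `bundleSources` name and the module contents are hypothetical, and reading the files from disk is left out.

```javascript
// Concatenate module sources, in a fixed order, into one bundle.
// The ";\n" separator guards against a file that omits its final semicolon.
function bundleSources(modules) {
  return modules
    .map(({ name, code }) => `/* ${name} */\n${code}`)
    .join(";\n");
}

// Hypothetical development-time modules.
const bundle = bundleSources([
  { name: "util.js", code: "var util = {}" },
  { name: "app.js", code: "var app = { util: util };" },
]);

console.log(bundle);
```

Order matters here: a module must appear after the modules it depends on, just as the original script tags had to.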
<head> as possible, and keep the size of those files to a minimum.
Combining external stylesheets into as few files as possible cuts down on RTTs and delays in downloading other resources.
As with external JavaScript, multiple external CSS files incur additional RTT overhead. If your site contains many CSS files, combining them into fewer output files can reduce latency. We recommend a maximum of 3, but preferably 2, CSS files.
It often makes sense to use many different CSS files during the development cycle, and then bundle those CSS files together as part of your deployment process. See below for recommended ways of partitioning your files. You would also need to update all of your pages to refer to the bundled files as part of the deployment process.
Combining images into as few files as possible using CSS sprites reduces the number of round-trips and delays in downloading other resources, reduces request overhead, and can reduce the total number of bytes downloaded by a web page.
Similar to JavaScript and CSS, downloading multiple images incurs additional round trips. A site that contains many images can combine them into fewer output files to reduce latency.
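To make the sprite idea concrete, the sketch below generates the CSS rules for icons stacked vertically in a single image. It assumes every icon has the same height; the function name, class names, file name, and sizes are all hypothetical.

```javascript
// Generate one CSS rule per icon for a vertically stacked sprite image.
// Each icon is selected by shifting the background up by its offset.
function spriteRules(spriteUrl, iconNames, iconHeightPx) {
  return iconNames.map((name, index) => {
    const offsetY = index === 0 ? "0" : `-${index * iconHeightPx}px`;
    return `.icon-${name} { background: url("${spriteUrl}") 0 ${offsetY} no-repeat; }`;
  });
}

console.log(spriteRules("sprite.png", ["home", "search", "cart"], 16).join("\n"));
```

All three icons then cost a single image request instead of three.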
Correctly ordering external stylesheets and external and inline scripts enables better parallelization of downloads and speeds up browser rendering time.
Because JavaScript code can alter the content and layout of a web page, the browser delays rendering any content that follows a script tag until that script has been downloaded, parsed and executed. However, more importantly for round-trip times, many browsers block the downloading of resources referenced in the document after scripts until those scripts are downloaded and executed. On the other hand, if other files are already in the process of being downloaded when a JS file is referenced, the JS file is downloaded in parallel with them. For example, let's say you have 3 stylesheets and 2 scripts and you specify them in the following order in the document:
<head>
<link rel="stylesheet" type="text/css" href="stylesheet1.css" />
<script type="text/javascript" src="scriptfile1.js"></script>
<script type="text/javascript" src="scriptfile2.js"></script>
<link rel="stylesheet" type="text/css" href="stylesheet2.css" />
<link rel="stylesheet" type="text/css" href="stylesheet3.css" />
</head>
Assuming that each one takes exactly 100 milliseconds to download, that the browser can maintain up to 6 concurrent connections for a single host (for more information about this, see Parallelize downloads across hostnames), and that the cache is empty, the download profile will look something like this:

The second two stylesheets must wait until the JS files are finished downloading. The total download time equals the time it takes to download both JS files, plus the largest CSS file (in this case 100 ms + 100 ms + 100 ms = 300 ms). Merely changing the order of the resources to this:
<head>
<link rel="stylesheet" type="text/css" href="stylesheet1.css" />
<link rel="stylesheet" type="text/css" href="stylesheet2.css" />
<link rel="stylesheet" type="text/css" href="stylesheet3.css" />
<script type="text/javascript" src="scriptfile1.js"></script>
<script type="text/javascript" src="scriptfile2.js"></script>
</head>
will result in the following download profile:

100 ms is shaved off the total download time. For very large stylesheets that can take longer to download, the savings could be more.
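The two totals can be reproduced with a deliberately simplified model of the blocking behaviour described above: a script delays the start of everything after it until the script has finished, while stylesheets do not block. The function name is hypothetical and connection limits are ignored.

```javascript
// Simplified model: each resource starts as soon as no earlier script is
// still blocking; a script then blocks everything that follows it.
function totalDownloadMs(resources) {
  let nextStart = 0; // earliest time the next resource may start
  let finished = 0;  // time at which the last download completes
  for (const { type, ms } of resources) {
    const end = nextStart + ms;
    finished = Math.max(finished, end);
    if (type === "script") {
      nextStart = end; // later resources wait for this script
    }
  }
  return finished;
}

const css = { type: "stylesheet", ms: 100 };
const js = { type: "script", ms: 100 };

console.log(totalDownloadMs([css, js, js, css, css])); // 300
console.log(totalDownloadMs([css, css, css, js, js])); // 200
```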
Therefore, since stylesheets should always be specified in the head of a document for better performance, it's important, where possible, that any external JS files that must be included in the head (such as those that write to the document) follow the stylesheets, to prevent delays in download time.
Another, more subtle, issue is caused by the presence of an inline script following a stylesheet, such as the following:
<head>
<link rel="stylesheet" type="text/css" href="stylesheet1.css" />
<script type="text/javascript">
document.write("Hello world!");
</script>
<link rel="stylesheet" type="text/css" href="stylesheet2.css" />
<link rel="stylesheet" type="text/css" href="stylesheet3.css" />
<link rel="alternate" type="application/rss+xml" href="front.xml" title="Say hello" />
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico">
</head>
In this case, the reverse problem occurs: the first stylesheet actually blocks the inline script from being executed, which then in turn blocks other resources from being downloaded. Again, the solution is to move the inline scripts to follow all other resources, if possible, like so:
<head>
<link rel="stylesheet" type="text/css" href="stylesheet1.css" />
<link rel="stylesheet" type="text/css" href="stylesheet2.css" />
<link rel="stylesheet" type="text/css" href="stylesheet3.css" />
<link rel="alternate" type="application/rss+xml" title="Say hello" href="front.xml" />
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico">
<script type="text/javascript">
document.write("Hello world!");
</script>
</head>
Using document.write() to fetch external resources, especially early in the document, can significantly increase the time it takes to display a web page.
Modern browsers use speculative parsers to more efficiently discover external resources referenced in HTML markup. These speculative parsers help to reduce the time it takes to load a web page. Since speculative parsers are fast and lightweight, they do not execute JavaScript. Thus, using JavaScript's document.write() to fetch external resources makes it impossible for the speculative parser to discover those resources, which can delay the download, parsing, and rendering of those resources.
Using document.write() from external JavaScript resources is especially expensive, since it serializes the downloads of the external resources. The browser must download, parse, and execute the first external JavaScript resource before it executes the document.write() that fetches the additional external resources. For instance, if external JavaScript resource first.js contains the following content:
document.write('<script src="second.js"><\/script>');
The download of first.js and second.js will be serialized in all browsers. Using one of the recommended techniques described below can reduce blocking and serialization of these resources, which in turn reduces the time it takes to display the page.
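Declare resources directly in the HTML markup. For instance, instead of calling document.write() from a <script> tag like so: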
<html>
<body>
<script>
document.write('<script src="example.js"><\/script>');
</script>
</body>
</html>
insert the document.written script tag directly into the HTML:
<html>
<body>
<script src="example.js"></script>
</body>
</html>
Using CSS @import in an external stylesheet can add additional delays during the loading of a web page.
CSS @import allows stylesheets to import other stylesheets. When CSS @import is used from an external stylesheet, the browser is unable to download the stylesheets in parallel, which adds additional round-trip times to the overall page load. For instance, if first.css contains the following content:
@import url("second.css");
The browser must download and parse first.css before it is able to discover that it needs to download second.css.
Instead of @import, use a <link> tag for each stylesheet. This allows the browser to download stylesheets in parallel, which results in faster page load times:
<link rel="stylesheet" href="first.css">
<link rel="stylesheet" href="second.css">
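To find stylesheets that still rely on @import, a build step could scan the CSS text and emit the equivalent tags. The following is a regex-based sketch that handles only the url("...") and quoted-string forms; both function names are hypothetical.

```javascript
// Return the stylesheet URLs referenced by @import rules in a CSS string.
// Any media query after the URL is skipped over, not preserved.
function findCssImports(cssText) {
  const pattern = /@import\s+(?:url\(\s*)?["']?([^"')\s;]+)["']?\s*\)?[^;]*;/g;
  return [...cssText.matchAll(pattern)].map((match) => match[1]);
}

// Build the <link> tags that would replace those @import rules.
function toLinkTags(hrefs) {
  return hrefs.map((href) => `<link rel="stylesheet" href="${href}">`);
}

const firstCss = '@import url("second.css");\nbody { margin: 0; }';
console.log(toLinkTags(findCssImports(firstCss)).join("\n"));
```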
Fetching resources asynchronously prevents those resources from blocking the page load.
When a browser parses a traditional script tag, it must wait for the script to download, parse, and execute before rendering any HTML that comes after it. With an asynchronous script, however, the browser can continue parsing and rendering HTML that comes after the async script, without waiting for that script to complete. When a script is loaded asynchronously, it is fetched as soon as possible, but its execution is deferred until the browser's UI thread is not busy doing something else, such as rendering the web page.
JavaScript resources that aren't needed to construct the initial view of the web page, such as those used for tracking/analytics, should be loaded asynchronously. Some scripts that display user-visible content may also be loaded asynchronously, especially if that content is not the most important content on the page (e.g. it is below the fold).
Using a script DOM element maximizes asynchronous loading across current browsers:
<script>
var node = document.createElement('script');
node.type = 'text/javascript';
node.async = true;
node.src = 'example.js';
// Now insert the node into the DOM, perhaps using insertBefore()
</script>
Using a script DOM element with an async attribute allows for asynchronous loading in Internet Explorer, Firefox, Chrome, and Safari. By contrast, at the time of this writing, an HTML <script> tag with an async attribute will only load asynchronously in Chrome or Firefox 3.6 or newer, as other browsers do not yet support this mechanism for asynchronous loading.
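The insertion step left as a comment in the snippet above could be completed along these lines. This is a sketch under two assumptions: the page already contains at least one script element (true whenever the helper itself runs from a script tag), and the `loadScriptAsync` name is hypothetical. The `doc` parameter defaults to the browser's document and exists only so the helper can be exercised outside a browser.

```javascript
// Create an async script element and insert it before the first script
// already present in the document.
function loadScriptAsync(src, doc = document) {
  const node = doc.createElement("script");
  node.type = "text/javascript";
  node.async = true;
  node.src = src;
  const firstScript = doc.getElementsByTagName("script")[0];
  firstScript.parentNode.insertBefore(node, firstScript);
  return node;
}

// In a page: loadScriptAsync("example.js");
```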
Serving resources from two different hostnames increases parallelization of downloads.
The HTTP 1.1 specification (section 8.1.4) states that browsers should allow at most two concurrent connections per hostname (although newer browsers allow more than that: see Browserscope for a list). If an HTML document contains references to more resources (e.g. CSS, JavaScript, images, etc.) than the maximum allowed on one host, the browser issues requests for that number of resources, and queues the rest. As soon as some of the requests finish, the browser issues requests for the next resources in the queue. It repeats the process until it has downloaded all the resources. In other words, if a page references more than X external resources from a single host, where X is the maximum connections allowed per host, the browser must download them in batches of X, incurring roughly 1 RTT for every X resources. The total is therefore about N/X round trips (rounded up), where N is the number of resources to fetch from a host. For example, if a browser allows 4 concurrent connections per hostname, and a page references 100 resources on the same domain, it will incur 1 RTT for every 4 resources, and a total download time of 25 RTTs.
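The batch arithmetic can be written out directly. This is only the simplified model from the paragraph above, in which every batch costs one round trip; the function name is hypothetical.

```javascript
// Number of sequential batches needed to fetch N resources from one host
// when the browser allows X concurrent connections to that host.
function downloadRounds(resourceCount, connectionsPerHost) {
  return Math.ceil(resourceCount / connectionsPerHost);
}

console.log(downloadRounds(100, 4)); // 25, matching the example above
console.log(downloadRounds(10, 4));  // 3: two full batches plus one partial
```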
You can get around this restriction by serving resources from multiple hostnames. This "tricks" the browser into parallelizing additional downloads, which leads to faster page load times. However, using multiple concurrent connections can cause increased CPU usage on the client, and introduces additional round-trip time for each new TCP connection setup, as well as DNS lookup latency for clients with empty caches. Therefore, beyond a certain number of connections, this technique can actually degrade performance. The optimal number of hosts is generally believed to be between 2 and 5, depending on various factors such as the size of the files, bandwidth and so on. If your pages serve large numbers of static resources, such as images, from a single hostname, consider splitting them across multiple hostnames using DNS aliases. We recommend this technique for any page that serves more than 10 resources from a single host. (For pages that serve fewer resources than this, it's overkill.)
To set up additional hostnames, you can configure subdomains in your DNS database as CNAME records that point to a single A record, and then configure your web server to serve resources from the multiple hosts. For even better performance, if all or some of the resources don't make use of cookie data (which they usually don't), consider making all or some of the hosts subdomains of a cookieless domain. Be sure to allocate the resources evenly among the different hostnames, and in the pages that reference the resources, use the CNAMEd hostnames in the URLs.
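One way to spread resources across hostnames is to derive the hostname from the resource path, so that a given resource always maps to the same host and its cached copy stays usable across pages. The sketch below uses a simple string hash; the function name, the hash, and the hostnames are hypothetical.

```javascript
// Pick a hostname for a resource path. The same path always yields the
// same host, and different paths spread roughly evenly across the hosts.
function hostFor(path, hosts) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return hosts[hash % hosts.length];
}

// Hypothetical CNAMEd hostnames.
const staticHosts = ["static1.example.com", "static2.example.com"];
const imagePath = "/images/logo.png";

console.log("http://" + hostFor(imagePath, staticHosts) + imagePath);
```

Assigning hosts at random on each page view would defeat the browser cache, because the same image would appear under different URLs.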
If you host your static files using a CDN, your CDN may support serving these resources from more than one hostname. Contact your CDN to find out.
On the other hand, many browsers do not download JavaScript files in parallel*, so there is no benefit from serving them from multiple hostnames. So when balancing resources across hostnames, remove any JS files from your allocation equation.
*For a list of browsers that do and do not support parallel downloading of JavaScript files, see Browserscope.
To display its map images, Google Maps delivers multiple small images called "tiles", each of which represents a small portion of the larger map. The browser assembles the tiles into the complete map image as it loads each one. For this process to appear seamless, it's important that the tiles download in parallel and as quickly as possible. To enable the parallel download, the application assigns the tile images to four hostnames, mt0, mt1, mt2 and mt3. So, for example, in Firefox 3.0+, which allows up to 6 parallel connections per hostname, up to 24 requests for map tiles could be made in parallel. The following screen shot from Firebug's Net panel shows this effect in Firefox: 15 requests, across the hostnames mt[0-3], are made in parallel:
