|
DesignDocumentForClientURLInternetEmissionSniffer
Shaopeng Jia and Erik van der Poel, 25 Nov 2009

Introduction

This is a design document for an automated browser URL transformation testing tool. It describes the need for browser URL behavior testing, the current solution, and possible future plans to enhance the tool.

Background

URL parsing, escaping and encoding involve many details, and browsers and platforms often differ in subtle ways. Client implementers are interested in being compatible with the major implementations, and in canonicalizing URLs for storage in internal data structures. The tool described here will help people stay informed about the major implementations as they evolve over time. We also publish some of the differences between the browsers, along with recommendations for browser developers, in the hope that browsers will converge.

Current Situation

Tests are being carried out across all major modern browsers (IE 6, 7, 8; Firefox 2, 3.0, 3.5; Safari 3, 4; Chrome 2, 3; Opera 9, 10) on all the major platforms (Windows XP, Windows Vista, Windows 7, Mac, Linux, Android 1.6 and 2.0, and iPhone 3.0). Over 1500 tests have been created to test various parts of the URL (such as host, path and query) and HTML form submissions. The generated reports are available here. Read the README file to see how the result folders are organized. When viewing the results in code.google.com, click "View raw file" to see formatted reports.

Possible Future Goals
alert( document.getElementById('bar').getAttribute('src').indexOf('\n') );

Detailed Design

In a nutshell, the tool automatically generates test cases, which are URLs that contain strings of interest. The test cases are then loaded on each browser/platform, and the tool reports how the strings of interest are handled by the browser/platform by analyzing the corresponding DNS and HTTP packets that were sent out. The tool then automatically generates formatted reports with the results from the specified browsers/platforms listed side by side. Differences among browsers/platforms are highlighted in yellow.

The testing process has 4 independent steps: test page generation, link invocation, packet sniffing and report generation.
Test Page Generation

Test pages are automatically generated by the code at test-page-generator.cc, and live in the folder test_pages. To generate test pages, test-page-generator.cc makes use of testcases.cc, which contains all the test cases we want to run. New tests can be added by modifying testcases.cc. To regenerate tests, first cd to trunk/, then run the following commands:

% g++ -Wall -g source/testcases.cc source/test-page-generator.cc -o test-page-generator
% ./test-page-generator

The generated test pages are placed under the directory test_pages/.

Currently over 1300 test cases are generated. These contain tests for various parts of the URL (host, path, parameter and query) and HTML form submission, for different character encodings (ASCII and Big5). Tests for other encodings can easily be added.

Each test case is a URL that is embedded in an HTML file as <img src="URL to test">. The benefit of doing this is that all the URLs we want to test are loaded automatically when the page is loaded, so people don't have to manually click each URL to test it. The only exception is HTML form testing, which is described below.

Each URL is constructed in a way that makes packet sniffing easy later on. For that purpose, we embed the test case between special character sequences, so that our packet sniffer can later retrieve this information without parsing the packet in detail. In particular, we use "9qz" to enclose the string we want to test in a URL. In addition, we use "9pz" to enclose the test ID for a test case, so that the result for a test case can be put into the right place during report generation. Tests for host, path, parameter and query all follow this scheme.
Here are some examples for host, path, parameter and query, for the escaped ASCII test case %00:

<tr><td>0</td><td><img src="http://9pz09pz9qz%009qz.wildcard.invalid./">%00</td></tr>
<tr><td>256</td><td><img src="http://256.wildcard.invalid./9pz2569pz9qz%009qz">%00</td></tr>
<tr><td>512</td><td><img src="http://512.wildcard.invalid./search;q=9pz5129pz9qz%009qz">%00</td></tr>
<tr><td>768</td><td><img src="http://768.wildcard.invalid./search?q=9pz7689pz9qz%009qz">%00</td></tr>

HTML form tests also follow the scheme, but instead of being a URL, each test case is an HTML form whose content contains the string we want to test. The string still follows the "9pz" and "9qz" scheme, with "9pz" enclosing the test ID and "9qz" enclosing the test string. For example, below are 3 test cases:

<form name='form1309' method='get' action='http://http204.invalid' target='frame1309'> <input type='text' name='query' value='9pz13099pz9qz%009qz' /></form>
<iframe name='frame1309' width='0' height='0' frameborder='0' />
<form name='form1310' method='get' action='http://http204.invalid' target='frame1310'> <input type='text' name='query' value='9pz13109pz9qz%019qz' /></form>
<iframe name='frame1310' width='0' height='0' frameborder='0' />
<form name='form1311' method='get' action='http://http204.invalid' target='frame1311'> <input type='text' name='query' value='9pz13119pz9qz%029qz' /></form>
<iframe name='frame1311' width='0' height='0' frameborder='0' />

To make sure all the forms are automatically submitted when the page is loaded, we add the following script to the HTML file:

<script type='text/javascript'>
function myfunction() {
document.form1309.submit();
document.form1310.submit();
document.form1311.submit();
}
window.onload = myfunction;
</script>

A few interesting points to note:
http://9pz10249pz9qz.十.9qz.wildcard.invalid./
The reason for doing this is that internationalized domain names in a URL are converted to Punycode by browsers before being sent out. As part of the process, the characters of the host name are reordered in such a way that our 9qz pairs no longer surround the test string. For example, the above URL without the dots surrounding 十 is encoded as http://xn--9pz10249pz9qz9qz-i970a.wildcard.invalid in Punycode. Surrounding the test string with dots solves the problem, because reordering only occurs between dots.
As of July 2009, Opera enables HTTP pipelining by default (and there is no easy way to turn it off); Firefox supports it, but turns it off by default; IE and Chrome don't currently support it. Our packet sniffer doesn't work reliably under HTTP pipelining because it assumes that an HTTP packet contains exactly one "9pz" pair and one "9qz" pair. The hostname uniqueness prevents HTTP pipelining, as HTTP requests cannot be pipelined when they are sent to different sockets.
Link Invocation

Test pages are generated in the previous step as HTML files containing <img src="..."> where the src link is the URL we want to test; link invocation is automatic when a test page is loaded. For HTML form tests, loading the page triggers the onload event, which invokes myfunction and submits all the forms.

In this step, Wireshark is used to capture the packets generated by loading the test files and save them in .pcap files, which can later be analyzed by our packet-sniffer. It is advisable to use a simple filter to minimize the size of the .pcap file, for example, ip.src == "<ip of machine under test>" and (dns or http). It is also helpful to load each test page more than once, so that the effect of packet loss is minimized.

The approach of first capturing packets into a .pcap file with Wireshark and then analyzing them using libpcap has several advantages:
Each browser/platform of interest should have a .pcap file storing the packets captured while loading all the test pages, and the files should be organized into the following structure: <Platform>/<Browser>

Other important points to note during this step:
We considered Selenium as a possible candidate to automate the link invocation process. It offers easy ways to programmatically launch browsers and to automatically click links and buttons on pages, and it supports almost all browsers/platforms. However, this approach was dropped due to the concern that Selenium Server acts as a client-configured proxy. The difference is subtle: when an HTTP client sends a request to an HTTP proxy, it sends the entire URL (including the host name) to the proxy, which then processes the request. When an HTTP client processes a URL by itself, it parses the URL to find the host name, sends a DNS packet to look up the IP address of the host, and then makes a TCP connection to the HTTP port (80, by default) at that IP address. Since we want to test both the DNS and HTTP behavior of the browsers, we want to avoid using an HTTP proxy.

Packet sniffing

In the previous step, packets generated by loading our test files are captured and stored in .pcap files. In this step, we go through each packet to extract the results generated by our tests and store them in arrays, so that formatted reports can be generated in the next step. This is done by our packet-sniffer at source/packet-sniffer.cc.

We use libpcap to analyze the .pcap file. Libpcap is a mature and well-maintained packet capturing library written in C, and it is the underlying library used by Wireshark. We also considered Jpcap and JNetPcap (both of which are Java wrappers around libpcap), but decided not to use them because they are either immature or not actively developed.

For the purpose of this project, we are primarily interested in DNS and HTTP packets containing the test URL.

DNS packet

In terms of the pcap filter language, a DNS packet can be identified by the filter rule: "udp dst port 53". If no DNS packet is found, <not sent> is reported to indicate "no dns packet was sent". There might be multiple DNS packets matching our criteria.
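In that case, we choose the first packet that matches our criteria and extract the part of dns.qry.name that is of interest to us.

HTTP packet

In terms of the pcap filter language, an HTTP packet can be identified by the filter rule: "tcp dst port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)". In this expression, ip[2:2] is the total length of the IP packet, (ip[0]&0xf)<<2 is the length of the IP header in bytes, and (tcp[12]&0xf0)>>2 is the length of the TCP header in bytes.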
This expression says that the HTTP packet is a TCP packet going to destination port 80 and that it contains a non-empty HTTP message.

Report generation

The report generator lives at source/report-generator.cc. It is invoked by passing in multiple .pcap files as parameters. A report is generated by listing the results from each .pcap file side by side. To generate reports, run:

g++ -Wall -g source/report-generator.cc source/packet-sniffer.cc source/testcases.cc -o report-generator -lpcap
./report-generator output/folder/ path/to/pcap/files/MacOSX10_5_7/FF3_0_11.pcap path/to/pcap/files/MacOSX10_5_7/Safari4_0.pcap path/to/pcap/files/MacOSX10_5_7/Chrome3_0_18.pcap

The example above generates reports which contain test results side by side for Firefox 3.0.11, Safari 4.0 and Chrome 3.0.18 on Mac OS X 10.5.7. A report is generated for each character encoding and for each URL component. Reports across other browsers/platforms are generated in a similar manner. For a set of pre-generated results, see /test_results

Mobile Testing

Since Q4 2009, testing of mobile browsers has been included in this project. At this moment, we are primarily focused on the iPhone and Android platforms. Here is how to test mobile browsers on the two platforms:
Related Projects

References |