Design: Out of Process Hosted Mode (OOPHM)
Introduction & Motivation
GWT's hosted mode browser is an essential part of developing GWT applications. It allows developers to use a standard Java debugger to debug GWT/Java code while that code actually affects a real production browser. The current architecture leverages the SWT browser bindings to run the browser instance inside the hosted mode process. This approach has proved limiting for a number of reasons.
Limitations of the current approach
- It is difficult to support new versions of browsers. For example, we still use Mozilla 1.7.12 on Linux and a custom WebKit build on Mac OS X.
- Due to the way SWT embeds the browser, many plugins/extensions do not work (for example Firebug, Google Gears and DOM Inspector).
- There are unpleasant AWT/SWT interactions that continue to require attention. Also, our reliance on AWT has increased in the past few releases and this is expected to continue.
- We only support one browser per platform. (Theoretically this could be worked around, but it would require a lot of work and carry a very high maintenance cost.)
- We can't use hosted mode across platforms (for example, using IE from a Linux hosted mode across the network). Fixing a late bug on IE often requires setting up an IDE and importing the entire project.
Goals
- Support use of multiple browsers on each supported platform: Linux: Firefox 1.5+. Windows: Firefox 1.5+, IE6/7 and Safari 3. OS X: Firefox 1.5+ and any WebKit browser.
- Enable the use of standard and current browser plugins, tools and capabilities (Firebug, DOM Inspector, Gears).
- Avoid version dependencies on supported browsers or system-supplied libraries (and minimize them where they cannot be avoided entirely).
- Provide user-visible performance no worse than the current implementation.
- Do nothing to impede "instant hosted mode" plans.
- Users should be able to start a hosted mode session directly from the IDE, as is currently possible. This includes being able to debug that process in a meaningful way.
- Continue to support -noserver functionality and use cases.
- Minimize the total number of plugins required (i.e. favor cross-browser plugins over their browser specific brethren).
- Minimize platform-specific code. We should no longer need a gwt-dev-xxx.jar.
Non-goals
- We are specifically not trying to implement "instant hosted mode" with this change, although we don't want to do anything to prevent it later.
- Hosted mode across a high-latency network will not be specifically supported, but may work with limitations.
- Opera support. (This could become a goal at a later date)
- Provide an interface for third-party tools to leverage our communication protocols to the browser. (This could become a goal at a later date)
Use Cases
- Retain the original use case - A GWT developer should be able to launch and debug a GWT application from within a standard Java debugger and IDE. This means that the spawning process must be a JVM instance and we must not do anything to obscure useful stack trace information.
- Debugging in multiple browsers - A GWT developer should be able to launch and debug the same GWT application (or different applications) from different browser instances. Of course, the browsers can be the same type of browser or multiple tabs in the same browser. (There is, however, one caveat in debugging two applications in different tabs. Most browsers have a single event queue for all of the tabs. So a breakpoint in one of the applications will prevent a tab switch so long as execution is suspended.)
- Remote debugging - This use case is not included at this point.
Design & Architecture
Overview
The diagram below gives a high-level picture of how all the parts fit together in out-of-process mode. Each of the different components shown is explained in greater detail in the following sections.
User Interface
The following is an incomplete sketch of the new UI for hosted mode. The mocks will be updated again soon, but this illustrates the primary motivation for updating the UI. The possibility of having multiple browsers running modules in the same hosted mode server requires more visibility and separation of the different clients and their associated reporting. Another component that is not represented currently is the embedded Tomcat instance. That will be fixed in the next mock.
Browser Channel / Communications Protocol
All communication between the hosted GWT module and the corresponding JavaScript environment (browser) takes place via a TCP socket. The two sides communicate through asynchronous message passing to allow method invocations to be re-entrant onto the same thread. This maintains the constraint that the hosted mode process be debuggable with a standard Java debugger.
A channel is established for each GWT module that is being hosted, and the channel setup is initiated by the browser plugin. The hosted mode process acts as a TCP server, listening for connections and instantiating modules (and their associated infrastructure) on demand. Note that this also means that multiple modules on a single host page will establish multiple channels. A simple example of a re-entrant invocation is given below; it demonstrates the need for non-synchronous dispatch.
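The channel setup described above can be sketched roughly as follows. This is a minimal illustration and not the actual hosted mode server: the class name `ChannelListenerSketch`, the thread-per-channel model, the module names, and the idea that the first thing sent on a channel is a length-prefixed module name are all assumptions made for the example. Loopback sockets stand in for the browser plugin.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch only: accepts one TCP connection (channel) per hosted module. */
class ChannelListenerSketch {
    private final ServerSocket serverSocket;
    // Module names announced so far, one entry per accepted channel.
    final BlockingQueue<String> loadedModules = new LinkedBlockingQueue<>();

    ChannelListenerSketch() throws IOException {
        // Port 0 lets the OS pick a free port; bind to loopback only.
        serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "channel-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    void close() throws IOException {
        serverSocket.close();
    }

    private void acceptLoop() {
        while (true) {
            try {
                Socket socket = serverSocket.accept();
                // Assumption: each channel gets its own thread.
                Thread channel = new Thread(() -> serveChannel(socket), "channel");
                channel.setDaemon(true);
                channel.start();
            } catch (IOException e) {
                return; // accept() fails once the server socket is closed
            }
        }
    }

    private void serveChannel(Socket socket) {
        try (Socket s = socket;
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            // Placeholder handshake: a length-prefixed UTF-8 module name.
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            loadedModules.add(new String(utf8, StandardCharsets.UTF_8));
        } catch (IOException e) {
            // A broken channel only affects its own module.
        }
    }

    /** Plays the plugin's role: opens a new channel and names the module. */
    static void connect(int port, String moduleName) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
            byte[] utf8 = moduleName.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);
            out.write(utf8);
        }
    }

    /** Two modules on one host page: two connections, hence two channels. */
    static List<String> demo() {
        try {
            ChannelListenerSketch listener = new ChannelListenerSketch();
            connect(listener.port(), "ModuleA");
            connect(listener.port(), "ModuleB");
            List<String> seen = new ArrayList<>();
            for (int i = 0; i < 2; i++) {
                String name = listener.loadedModules.poll(5, TimeUnit.SECONDS);
                if (name != null) {
                    seen.add(name);
                }
            }
            listener.close();
            Collections.sort(seen);
            return seen;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `ChannelListenerSketch.demo()` opens two connections against one listener, mirroring the case of two modules on a single host page.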
Consider the following GWT code:
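import com.google.gwt.core.client.EntryPoint;

public class MyEntryPoint implements EntryPoint {
  // JSNI method: the body between the special comment markers is JavaScript.
  private static native int jsniMethod() /*-{
    return 1;
  }-*/;

  // Invoked when the module is loaded.
  public void onModuleLoad() {
    jsniMethod();
  }
}

Executing this code in the hosted mode browser requires the following steps: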
- JavaScript: the browser plugin sends a LoadModuleMessage with the module name.
- Java: the hosted mode server receives the LoadModuleMessage, loads the module and invokes the onModuleLoad in the corresponding EntryPoints. In this case MyEntryPoint::onModuleLoad is called. When onModuleLoad invokes jsniMethod an InvokeMessage is sent.
- JavaScript: This is the key part of the example. The JavaScript engine is currently awaiting a return from the LoadModuleMessage it sent, but it must be in a position to invoke the call to MyEntryPoint::jsniMethod on the same thread. This is accomplished by having the thread enter a read-and-dispatch routine following every remote invocation. In this case, the thread receives the InvokeMessage, invokes jsniMethod and sends a ReturnMessage containing the value 1.
- Java: The read-and-dispatch routine receives the ReturnMessage and knows to return from the call to jsniMethod. Having fully executed the onModuleLoad method it sends a ReturnMessage and falls back into a top level read-and-dispatch loop. (Since all calls originate from the browser's UI event dispatch, only the hosted mode server needs to remain in a read-and-dispatch routine during idle time. The browser simply returns control by exiting the JavaScript function that was originally called.)
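The four steps above can be simulated in a few lines. The following is a sketch under stated assumptions, not the real implementation: in-memory queues stand in for the TCP socket, messages carry only a name and an int, and all class and method names (`ReentrantDispatchSketch`, `Endpoint`, `call`, `readAndDispatch`) as well as the module name `MyModule` are hypothetical. What it does show faithfully is the core idea: after sending a request, a thread services incoming requests itself until the matching return arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReentrantDispatchSketch {
    enum Type { LOAD_MODULE, INVOKE, RETURN, QUIT }

    static final class Message {
        final Type type;
        final String name; // module or method name; null for RETURN and QUIT
        final int value;   // return value; meaningful only for RETURN
        Message(Type type, String name, int value) {
            this.type = type;
            this.name = name;
            this.value = value;
        }
    }

    interface Handler {
        int handle(Endpoint self, Message request) throws InterruptedException;
    }

    /** One side of the channel; each side is driven by a single thread. */
    static final class Endpoint {
        private final BlockingQueue<Message> in;
        private final BlockingQueue<Message> out;
        private final Handler handler;

        Endpoint(BlockingQueue<Message> in, BlockingQueue<Message> out, Handler handler) {
            this.in = in;
            this.out = out;
            this.handler = handler;
        }

        /** Sends a request, then reads and dispatches until its return arrives. */
        int call(Type type, String name) throws InterruptedException {
            out.put(new Message(type, name, 0));
            return readAndDispatch().value;
        }

        void quit() throws InterruptedException {
            out.put(new Message(Type.QUIT, null, 0));
        }

        /** Services requests on the current thread until a RETURN or QUIT is read. */
        Message readAndDispatch() throws InterruptedException {
            while (true) {
                Message m = in.take();
                if (m.type == Type.RETURN || m.type == Type.QUIT) {
                    return m;
                }
                int result = handler.handle(this, m); // the handler may call() again
                out.put(new Message(Type.RETURN, null, result));
            }
        }
    }

    static List<String> run() {
        List<String> trace = Collections.synchronizedList(new ArrayList<>());
        BlockingQueue<Message> toServer = new LinkedBlockingQueue<>();
        BlockingQueue<Message> toBrowser = new LinkedBlockingQueue<>();

        // "Java" side: loading the module runs onModuleLoad, which calls jsniMethod.
        Endpoint server = new Endpoint(toServer, toBrowser, (self, m) -> {
            trace.add("java: " + m.type + " " + m.name);
            int result = self.call(Type.INVOKE, "jsniMethod");
            trace.add("java: jsniMethod returned " + result);
            return 0;
        });
        // "JavaScript" side: every invocation stands in for jsniMethod, returning 1.
        Endpoint browser = new Endpoint(toBrowser, toServer, (self, m) -> {
            trace.add("js: " + m.type + " " + m.name);
            return 1;
        });

        Thread serverThread = new Thread(() -> {
            try {
                server.readAndDispatch(); // top-level idle loop; exits on QUIT
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        serverThread.start();
        try {
            browser.call(Type.LOAD_MODULE, "MyModule");
            trace.add("js: LOAD_MODULE returned");
            browser.quit();
            serverThread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return trace;
    }
}
```

Because invocations nest strictly, the next return message read by a thread always belongs to its innermost outstanding call, so no request identifiers are needed in this sketch.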
To further illustrate this functionality, the following simplified state diagram shows how the messaging scheme simulates method invocation over an asynchronous messaging channel.
The wire format for the communications protocol is a simple binary format. A need may arise for something more elaborate at a later date, but we have elected for the simplest possible scheme that works for now. The details of each message's binary format are given below, along with formats for primitive data types.
Messages
NOTE: This is likely not a complete list of the messages that exist in the system.
LoadModuleMessage: requests that the hosted mode server load and begin executing a module.
| type (byte) | version (int) | module name (string) | user agent (string) |
InvokeMessage: used to do method invocation on Java and JavaScript objects.
| type (byte) | method (string) | this (Value) | number of args (int) | args (Value[]) |
EvaluateMessage: used to evaluate JavaScript code in the browser.
| type (byte) | this (Value) | js code (string) |
QuitMessage: used to cooperatively shut down the browser channel.
ReturnMessage: used to send the return values associated with invoke and evaluate messages.
| type (byte) | return value (Value) |
ExceptionMessage: used to throw exceptions associated with invoke and evaluate messages.
| type (byte) | exception reference (Value) |
* All strings are encoded as a length, n, followed by n bytes of data containing the string in UTF-8 encoding.
** The encoding of values is given below.
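As an illustration of the layout above, here is a minimal sketch that writes and reads a LoadModuleMessage with `java.io.DataOutputStream`. Several details are assumptions, since the design does not specify them: the numeric type tag (`TYPE_LOAD_MODULE = 1` is hypothetical), big-endian byte order, and a 32-bit string length prefix. The class and method names are likewise invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LoadModuleWireSketch {
    // Hypothetical tag value; the design does not list the actual type codes.
    static final int TYPE_LOAD_MODULE = 1;

    static final class LoadModule {
        final int version;
        final String moduleName;
        final String userAgent;
        LoadModule(int version, String moduleName, String userAgent) {
            this.version = version;
            this.moduleName = moduleName;
            this.userAgent = userAgent;
        }
    }

    /** Strings: a length n, then n bytes of UTF-8 (assumed 32-bit, big-endian). */
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    static String readString(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** | type (byte) | version (int) | module name (string) | user agent (string) | */
    static byte[] encode(int version, String moduleName, String userAgent) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(TYPE_LOAD_MODULE);
            out.writeInt(version);
            writeString(out, moduleName);
            writeString(out, userAgent);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }

    static LoadModule decode(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            int type = in.readUnsignedByte();
            if (type != TYPE_LOAD_MODULE) {
                throw new IllegalArgumentException("not a LoadModuleMessage: " + type);
            }
            int version = in.readInt();
            String moduleName = readString(in);
            String userAgent = readString(in);
            return new LoadModule(version, moduleName, userAgent);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, `encode(1, "hello", "ua")` produces 20 bytes: 1 for the type, 4 for the version, 4 + 5 for the module name and 4 + 2 for the user agent.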
Values
null
boolean
| tag (byte) | value (8 bit signed) |
byte
| tag (byte) | value (8 bit signed) |
char
| tag (byte) | value (16 bit signed) |
float
| tag (byte) | value (32 bit IEEE 754) |
double
| tag (byte) | value (64 bit IEEE 754) |
string
| tag (byte) | length (32 bit signed) | data (utf8 data, variable length) |
java object (this is an instance that exists in the JVM process)
| tag (byte) | ref id (32 bit signed) |
javascript object (this is an instance that exists in the browser process)
| tag (byte) | ref id (32 bit signed) |
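The tagged encodings above lend themselves to a small codec. The sketch below is illustrative only: the tag numbers are hypothetical (the design does not assign them), byte order is assumed to be big-endian, and null is assumed to consist of the tag byte alone since no layout is listed for it. Java and JavaScript object references (a tag followed by a 32-bit ref id) are left out to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ValueCodecSketch {
    // Hypothetical tag values; the design does not specify the real ones.
    static final int TAG_NULL = 0;
    static final int TAG_BOOLEAN = 1;
    static final int TAG_BYTE = 2;
    static final int TAG_CHAR = 3;
    static final int TAG_FLOAT = 4;
    static final int TAG_DOUBLE = 5;
    static final int TAG_STRING = 6;

    /** Writes the tag byte followed by the payload for the value's type. */
    static void writeValue(DataOutputStream out, Object v) throws IOException {
        if (v == null) {
            out.writeByte(TAG_NULL); // assumption: null is the tag alone
        } else if (v instanceof Boolean) {
            out.writeByte(TAG_BOOLEAN);
            out.writeByte(((Boolean) v) ? 1 : 0);
        } else if (v instanceof Byte) {
            out.writeByte(TAG_BYTE);
            out.writeByte((Byte) v);
        } else if (v instanceof Character) {
            out.writeByte(TAG_CHAR);
            out.writeChar((Character) v);
        } else if (v instanceof Float) {
            out.writeByte(TAG_FLOAT);
            out.writeFloat((Float) v);
        } else if (v instanceof Double) {
            out.writeByte(TAG_DOUBLE);
            out.writeDouble((Double) v);
        } else if (v instanceof String) {
            byte[] utf8 = ((String) v).getBytes(StandardCharsets.UTF_8);
            out.writeByte(TAG_STRING);
            out.writeInt(utf8.length);
            out.write(utf8);
        } else {
            throw new IllegalArgumentException("unsupported value type: " + v.getClass());
        }
    }

    /** Reads one value by switching on its tag byte. */
    static Object readValue(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        switch (tag) {
            case TAG_NULL: return null;
            case TAG_BOOLEAN: return in.readByte() != 0;
            case TAG_BYTE: return in.readByte();
            case TAG_CHAR: return in.readChar();
            case TAG_FLOAT: return in.readFloat();
            case TAG_DOUBLE: return in.readDouble();
            case TAG_STRING: {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                return new String(utf8, StandardCharsets.UTF_8);
            }
            default: throw new IOException("unknown value tag: " + tag);
        }
    }

    static byte[] encode(Object v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeValue(out, v);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }

    static Object roundTrip(Object v) {
        try {
            return readValue(new DataInputStream(new ByteArrayInputStream(encode(v))));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the tag as the first byte means a reader can decode any value without knowing its type in advance, which is what allows Value fields to be embedded in invoke, return and exception messages.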
Browser Plugin
The browser plugin is responsible for handling and dispatching messages in the browser and also for interacting with the browser's JavaScript engine. Each plugin consists of two conceptual parts: browser-specific functionality for interacting with the JavaScript engine, and a set of shared C++ classes that implement the communication channel and message serialization. We continue to make every effort to implement the plugins using common and standard APIs (like NPAPI/npruntime), but where that is insufficient we rely on proprietary (but public) plugin APIs. Below is a list of the supported browsers and the APIs we are using, or planning to use.
WebKit - WebKit (WBPL) Plugin (Sadly, npruntime has limitations we haven't been able to overcome in WebKit)
Mozilla - NPAPI/npruntime
IE6/7 - ActiveX control
Opera - Unsupported (NPAPI/npruntime when we confirm that they have finally implemented it fully)
Hosted GWT module space
The infrastructure in place in the current version of hosted mode has remained largely intact. At a very high level, this new model for hosted mode replaces the implementation of the JavaScriptHost interface (which provides an interface directly to the corresponding JavaScript environment) with the BrowserChannel construct that is described above. We are intentionally avoiding a massive restructuring of the hosted space infrastructure at this point.
Security Considerations
At this point, we present the security considerations without explicitly identifying solutions. We will update this again soon to propose solutions.
The biggest threat comes from the fact that the hosted mode plugin is general purpose: any site can instantiate it in the browser a developer uses daily. Two other issues come into play here: (1) using the hosted mode server UI to validate a user's intent to debug is problematic, since that would require the plugin to open a socket to a potentially private address; and (2) NPAPI and other page-based plugins do not have a reliable way to interact with the browser chrome to present dialogs to the user.
Planned Milestones
- Feb. 22nd - Demo of WebKit plugin working against mock server and server working against mock plugin.
- Mar. 07th - Functioning version of OOPHM on WebKit using the current GWTShell UI (in SWT).
- Mar. 21st - Add NPAPI plugin (for Mozilla and potentially Opera) and convert the hosted mode server over to the new Swing UI.
- (contingent on scheduling IE integration help) - Add IE6/7 support and go feature complete.