AcquiringDynamicLibraries
How dynamic libraries will be fetched
Acquiring dynamic libraries (Draft)
Overview
The normal model for shared libraries in Unix is that a package manager installs libraries into the centralised locations /lib and /usr/lib. Native Client, however, does not have a built-in filesystem, and the concept of a centralised package manager is not applicable to web apps. Instead, we propose to use a virtualised filesystem namespace, implemented via IPC calls. Each NaCl process may be launched with a custom filesystem namespace populated with the library versions the web app chooses to use.
How files are fetched
Each library and executable can be fetched from a URL. There are at least two interfaces through which libraries can be fetched for use in NaCl processes:
In principle, any mechanism that Javascript code can currently use for fetching data can be used for fetching libraries. However, using the latter, NaCl-specific descriptor-based interface has two advantages:
The basic interface for fetching files is therefore a Javascript API. We need a way to hook that up to Native Client.
How files are requested by NaCl processes
NaCl processes will open files by making requests over IPC, using NaCl IMC sockets. Javascript code running in the browser can handle these requests and call __urlAsNaClDesc() on behalf of the NaCl process. Javascript objects can provide a virtual file namespace that may contain a Unix-like file layout. The open() library function will be implemented as a remote procedure call which sends a message across an IMC socket and expects to receive a reply message containing a file descriptor. This open() implementation will be used by the dynamic linker and can be made available in libc/libnacl.
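A minimal sketch of open() implemented as an RPC, written in Python for brevity. A Unix-domain datagram socketpair stands in for an IMC socket, and SCM_RIGHTS descriptor passing (`socket.send_fds`/`socket.recv_fds`, Python 3.9+, Unix only) stands in for sending a NaCl descriptor. The JSON message format, `serve_open_requests()`, `remote_open()`, and the `namespace` dictionary are all hypothetical; in particular, the namespace maps virtual paths to local files rather than to URLs fetched via __urlAsNaClDesc().

```python
import json
import os
import socket


def serve_open_requests(sock, namespace):
    """Stand-in for the Javascript side: answer open() requests on sock.

    namespace (hypothetical) maps virtual paths such as "/lib/libfoo.so" to
    local files that stand in for library data fetched from URLs.
    """
    while True:
        request = json.loads(sock.recv(4096).decode())
        if request["op"] == "quit":
            return
        local_path = namespace.get(request["path"])
        if local_path is None:
            # No descriptor attached: the caller treats this as ENOENT.
            sock.send(json.dumps({"error": "ENOENT"}).encode())
            continue
        fd = os.open(local_path, os.O_RDONLY)
        try:
            # SCM_RIGHTS gives the receiver its own copy of the descriptor.
            socket.send_fds(sock, [json.dumps({"ok": True}).encode()], [fd])
        finally:
            os.close(fd)


def remote_open(sock, path):
    """open() as an RPC: send a request, then block until the reply arrives."""
    sock.send(json.dumps({"op": "open", "path": path}).encode())
    msg, fds, _flags, _addr = socket.recv_fds(sock, 4096, 1)
    reply = json.loads(msg.decode())
    if "error" in reply:
        raise FileNotFoundError(path)
    return fds[0]
```

For example, with `client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)` and `serve_open_requests(server, namespace)` running in another thread, `remote_open(client, "/lib/libfoo.so")` returns a descriptor that the caller (here, the dynamic linker's open()) can read with `os.read()`.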
Receiving messages from NaCl asynchronously in Javascript
The current NaCl Javascript API does not allow Javascript code to receive messages asynchronously from NaCl processes. We propose to extend the Javascript API to allow this, because Javascript will need to receive open() requests from the NaCl process; currently the only way to do this is to busy-wait. Implementing this in the NaCl NPAPI plugin will require using NPN_PluginThreadAsyncCall().
Initial socket connections
The current interface assumes that the Javascript code will be sending requests to the NaCl process. The NaCl plugin creates the NaCl process with a BoundSocket descriptor, and the NaCl process is expected to start by going into an imc_accept() loop on this descriptor to receive connections from Javascript. We would like to remove this assumption and allow the reverse arrangement. It should be possible to start the NaCl process with a SocketAddress descriptor or, ideally, with an array of NaCl descriptors of any type. The NaCl process should be able to send open() requests early on and should not need to call imc_accept() on startup.
Prototype implementation
I wrote a prototype of this earlier in 2009. As an example web app, I implemented a Python read-eval-print loop (REPL) using CPython running under Native Client with dynamic linking. It is able to use Python extension modules such as SQLite. The prototype works in Firefox on Linux. The code is in Git:
imcplugin provides the following interfaces to Javascript:
Fetches a file from the given URL. When the file becomes available, the plugin calls the callback function passing a Javascript wrapper object for a NaCl file descriptor. This simple interface lacks error handling for when the URL cannot be fetched.
Spawns a NaCl process. Under the hood, this runs sel_ldr.
Sends a message to the process. Messages consist of an array of bytes (represented as a Javascript string) and an array of file descriptor wrapper objects. (The latter array may of course be empty.)
Call-return over IMC
There are two ways we might implement call-return on top of IMC sockets.
Option 1: Use the same channel, C, for sending and receiving:
Option 2: Create a new channel for each request:
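The two call-return options can be sketched roughly as follows, in Python for brevity. A Unix-domain datagram socketpair stands in for an IMC channel, and SCM_RIGHTS descriptor passing (`socket.send_fds`/`socket.recv_fds`, Python 3.9+) stands in for sending a NaCl descriptor; all class and function names are hypothetical. In the Option 1 sketch, a lock serialises callers so that each reply read from C belongs to the request just sent (an assumption; tagging requests with IDs would be another way to match replies). In the Option 2 sketch, each request carries a fresh reply channel, so replies never need to be matched to requests.

```python
import socket
import threading


def make_channel():
    """A message-oriented channel pair, standing in for an IMC socket pair."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)


class SameChannelClient:
    """Option 1: requests and replies share channel C."""

    def __init__(self, chan):
        self.chan = chan
        self.lock = threading.Lock()  # one outstanding request at a time

    def call(self, request):
        with self.lock:
            self.chan.send(request)
            return self.chan.recv(4096)


def same_channel_server(chan, handler, count):
    """Answer `count` requests, replying on the channel they arrived on."""
    for _ in range(count):
        chan.send(handler(chan.recv(4096)))


def new_channel_call(chan, request):
    """Option 2: send a fresh reply channel along with each request."""
    reply_here, reply_there = make_channel()
    with reply_here:
        try:
            socket.send_fds(chan, [request], [reply_there.fileno()])
        finally:
            reply_there.close()  # the server now holds its own copy
        return reply_here.recv(4096)


def new_channel_server(chan, handler, count):
    """Answer `count` requests, each on the reply channel it carried."""
    for _ in range(count):
        request, fds, _flags, _addr = socket.recv_fds(chan, 4096, 1)
        with socket.socket(fileno=fds[0]) as reply_chan:
            reply_chan.send(handler(request))
```

In this sketch, Option 2 costs one channel creation per call, but several threads could make calls at once without sharing a single reply stream.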
See IMCSockets for a further discussion.
Questions
How will this interact with Web Workers?
Sharing libraries across sites
It will be desirable to share library files across sites, so that the browser does not have to download identical files multiple times. This issue already occurs for Javascript libraries. NaCl executables and libraries are expected to be larger than Javascript libraries, which makes this issue more important for NaCl.
Background: Same Origin Policy
XMLHttpRequest is constrained by a Same Origin Policy (SOP). __urlAsNaClDesc will also be constrained by a SOP. (Note that the NaCl NPAPI plugin has to implement the SOP itself because NPAPI does not provide a way to reuse the browser's SOP.) The main reason for the SOP is that XMLHttpRequest requests convey cookies, a type of ambient authority. The Same Origin Policy is not intended to prevent web apps from sending messages across origins; it is only intended to prevent the web app from seeing the server's response to the request. (Sending cross-origin messages can already be done using mechanisms other than XMLHttpRequest, including redirects and <img> elements.)
Comparison: script element
Loading libraries in NaCl is analogous to loading Javascript files via the <script src=...> element. Interestingly, <script> is not constrained by the SOP. By setting the response's content-type to text/javascript, the server effectively opts in to revealing the response to the web app. Supposedly, the response is not revealed directly to the web app: the DOM, a trusted part of the browser, evaluates the Javascript code, and the web app gets access only to the values the script assigns to variables. In practice, one cannot rely on text/javascript data not being revealed across origins. In NaCl's case, however, interpreting .so files is unambiguously the responsibility of untrusted code. We have to reveal the fetched data to the web app, so NaCl cannot be as unconstrained as the <script> tag.
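The origin comparison at the heart of a SOP can be sketched as below. An origin is taken to be the (scheme, host, port) triple with default ports filled in; this simplified version ignores details such as IDNA host normalisation and non-HTTP schemes. `check_fetch()` and its `policy_url` parameter are hypothetical: they only show where a __urlAsNaClDesc-style API would refuse to reveal a cross-origin response, not which origin the check should be anchored to.

```python
from urllib.parse import urlsplit

# Default ports, so that http://a.com and http://a.com:80 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that the SOP compares."""
    parts = urlsplit(url)
    scheme = parts.scheme          # urlsplit lower-cases the scheme
    host = parts.hostname or ""    # .hostname is lower-cased as well
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)


def check_fetch(policy_url, target_url):
    """Refuse a fetch whose response would be revealed across origins."""
    if not same_origin(policy_url, target_url):
        raise PermissionError(f"cross-origin fetch of {target_url} refused")
    return target_url
```

Note that this check only governs whether the response is revealed; as described above, the SOP does not try to stop the request itself from being sent.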
The <script> element permits a centralised model for sharing library code. Suppose multiple web apps use the library libjfoo.js. If this is hosted at http://libjfoo.org/libjfoo-1.0.js, the web apps can opt to link to this URL. The downside of using the <script> element in this way is that the web apps become vulnerable to the centralised site, libjfoo.org. This site can change the file contents it serves (there was an example of this happening with json.org's copy of json.js) and thereby run arbitrary Javascript in the context of the web apps. Since the script text is not available across origins, the web app cannot check the text against a hash before eval'ing it.
Fetching libraries across origins
For NaCl, web apps could fetch libraries using Uniform Messaging (formerly known as GuestXHR) or CORS, which are not NaCl-specific. We might also wish to allow decentralised sharing of files. For example, suppose sites A and B both host libfoo.so. If the browser has already downloaded libfoo.so from site A, it won't need to download it again from site B, and vice versa. Schemes for doing this by embedding secure hashes into URLs have been proposed; for example, see Douglas Crockford's post. This problem is not unique to NaCl, so we should not adopt a solution which is NaCl-specific.
Trust relationship between Javascript and NaCl process
In the above scheme, there are two principals: the Javascript code and the NaCl process.
The NaCl process depends on the Javascript code to provide its execution environment. The Javascript code provides all the code running in the NaCl process. The Javascript code therefore has at least as much authority as the NaCl process. This is at odds with the current same origin policy, described in issue 238. In the current scheme, executable http://a.com/foo.nexe can be embedded in the page http://b.com. Javascript on b.com's page can use __urlAsNaClDesc() to fetch an a.com URL, getting a file descriptor in return. However, only the NaCl process can use the file descriptor to read the file's contents. The NaCl process therefore has strictly greater authority than the Javascript, yet it has no trusted path for fetching files from a.com. This is a dangerous situation which is likely to lead to XSRF-like Confused Deputy vulnerabilities. foo.nexe is expected to distrust the messages and file descriptors it receives from the page, which is difficult or impossible to achieve. This expectation is also incompatible with the dynamic library scenario above, in which the NaCl process must trust the library data supplied by the page. We propose that if __urlAsNaClDesc() (or a similar API) is to follow a same origin policy at all, it should use the origin of the page, not the origin of the executable's URL. It may be that directly embedding a NaCl plugin object across origins should not be permitted at all. In this case, it would still be possible to embed a NaCl plugin object across origins indirectly, through a cross-origin iframe. In such a scenario, one is embedding a combination of Javascript and NaCl code in which the latter can legitimately trust the former.
Prefetching files
The simplest approach to fetching library files is to fetch them one by one, since ld.so makes synchronous open() calls. However, this means the inbound network connection will be idle after the end of a file is received by the client and before ld.so's request for the next file is received by the server.
This costs one network round trip per file. We could reduce the time taken to fetch the whole set of files by pipelining the requests. A simple way to do this, which does not involve changing the dynamic linker, is to list up front all the libraries we expect to load. The Javascript code could request the files on startup in order to pre-populate the browser's cache.
Versioning
As with static linking, each web app gets to choose its own version of libc and other libraries. Furthermore, different NaCl processes in the same web app can use different libc versions. Libc is not supplied by the browser. We don't expect there to be a huge number of libc versions, but older and newer versions of the same libc are likely to be around at the same time, as are different libc implementations (such as newlib and glibc). Web apps get to pick a set of libraries that are known to work well together. This is analogous to selecting a set of Javascript libraries, or selecting a set of packages for a software distribution such as Debian or Fedora. This way we can avoid "DLL hell": libraries are not the responsibility of the end user. This provides extra flexibility that is not available to typical applications on Linux when packaged with commonly used packaging systems like dpkg or RPM. Packaging systems such as Zero-Install and Nix allow multiple library versions to coexist in the same way that I am proposing for NaCl. Despite this extra flexibility, we still have all the versioning mechanisms that are normally available for ELF shared libraries: libraries can opt to provide stable ABIs and declare interfaces via sonames and ELF symbol versioning, and we get the benefit of separate compilation. Upgrading libraries is the responsibility of the web app. A web app may choose to delegate this responsibility to another site by fetching libraries from that site.
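Listing the libraries up front can be sketched as a per-app manifest that pins each soname to an exact URL, with every fetch started at once so that the round trips overlap instead of being paid one per file. This sketch issues concurrent requests as a stand-in for true HTTP pipelining; the manifest entries, the libs.example.com URLs, and `prefetch()` are hypothetical, and `fetch` is any caller-supplied function that returns a file's bytes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical manifest for one web app: each soname pinned to an exact build.
# Another NaCl process in the same web app could use a different manifest.
MANIFEST = {
    "libc.so.6": "https://libs.example.com/glibc-2.9/libc.so.6",
    "libm.so.6": "https://libs.example.com/glibc-2.9/libm.so.6",
    "libpython2.6.so.1.0": "https://libs.example.com/python-2.6/libpython2.6.so.1.0",
    "libsqlite3.so.0": "https://libs.example.com/sqlite-3.6/libsqlite3.so.0",
}


def prefetch(manifest, fetch, max_workers=8):
    """Start every fetch up front instead of one per synchronous open().

    The returned dict maps virtual paths to file contents and stands in
    for the browser's cache that ld.so's later open() calls would hit.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in manifest.items()}
        return {"/lib/" + name: fut.result() for name, fut in futures.items()}
```

Because each web app (or each NaCl process) carries its own manifest, two processes can pin different libc builds without interfering, in line with the versioning model described above.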
Without a change in interface semantics, NaCl I/O descriptors obtained via urlAsNaClDesc should never be usable with mmap for code. The problem is a classic time-of-check vs. time-of-use attack: NaCl I/O descriptors could, for example, refer to a file under another NaCl module's control (e.g., from a future filesystem API for temporary files), and if the underlying file contents can be modified, even MAP_PRIVATE gives no guarantees about what happens when the underlying file is modified by another process. See the note on MAP_PRIVATE in mmap(2), and also a proof-of-concept in ~bsy/tmp/tocvtou.c.
This design needs to take into account the FileReader (and eventually writer) interface being proposed on plugin-future@. The current thinking is that the implementation of FileReader etc. for JavaScript will have an internal interface that NaCl can easily tap into, so that NaCl modules can open files the same way that JavaScript code might, except without having to bounce through ThreadAsyncCall etc. We may also wish to expand on or explore the notion of a file server process on which one or more NaCl modules invoke RPCs to get descriptors; that RPC interface will be (more) cast in stone. So I don't think we should freeze this design around urlAsNaClDesc; that is likely to be premature.
It would be helpful to have a "life of a shared executable" story somewhere that gives a rough narrative of how we expect things to work.
This looks like it should be a very useful document as developers start using DSOs. Is it up to date with respect to the Pepper2 APIs? It obviously needs to be. It also needs to be updated (and maybe our plans do too) with respect to sharing DSOs. We must allow for reasonable sharing of DSOs across domains, for example a libc.so provided by Google.