Updated Feb 16, 2008 by mattcasters
Labels: Phase-Requirements, Phase-Design, Featured
BudaServer  
Buda Server: what problems do we want to solve?

Introduction

This project aims to create a service that delivers data based on centrally defined metadata. This service is intended for the developers of data representation tools like the ones you can find in the world of Business Intelligence: reports, dashboards, OLAP, mining, etc.

Problem setting

Emerging technologies in open source business intelligence, such as open source metadata layers (for example Pentaho Metadata), are making more advanced reporting applications possible. However, the complexity of the software also increases dramatically. This is especially the case in two areas: software stack configuration and data acquisition.

Software stack configuration

Problem details

In general, software stacks and APIs in the Java programming language are easy to use and deploy. However, when a library is itself built on top of other APIs, using a single top-level library (for example, one for metadata) means deploying 30 or 40 other libraries as well. Because these libraries cover a wide range of capabilities, we typically need JDBC drivers (20), Apache Commons libraries (10), a bunch of general-purpose libraries (10), metadata libraries (JMI, MDR, etc.), and the list goes on. By building our own libraries on top of these, we are not helping; we are in fact making the problem worse for others. A simple program with limited functionality ends up dragging all these libraries along for the ride. On top of that, keeping track of the versions of all these libraries becomes an art form.

Proposed solution

What is needed is something that shields the end users of the Buda libraries (our developers, as mentioned in the introduction) from these complexities. We also want a clear, simple contract between what is requested and what is given back. This gives us the flexibility to swap out parts of the stack for new versions without having to change anything in the client application. In other words, what is needed is a service, a bus, a business data server.
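To make the idea of a "clear simple contract" concrete, here is a minimal sketch in Java. Every name in it (DataRequest, DataResult, BusinessDataService, InMemoryDataService) is an assumption made up for illustration and is not part of Buda. The point is only that the client sees one small interface, while drivers and metadata libraries would stay behind it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; none of them come from the Buda project.

/** What a client asks for: a business view and the fields it wants from it. */
final class DataRequest {
    final String businessView;
    final List<String> fields;

    DataRequest(String businessView, List<String> fields) {
        this.businessView = businessView;
        this.fields = new ArrayList<>(fields);
    }
}

/** What a client gets back: column names plus rows of plain values. */
final class DataResult {
    final List<String> columns;
    final List<List<Object>> rows;

    DataResult(List<String> columns, List<List<Object>> rows) {
        this.columns = columns;
        this.rows = rows;
    }
}

/** The entire contract: clients compile against this interface and nothing else. */
interface BusinessDataService {
    DataResult fetch(DataRequest request);
}

/** Stand-in implementation backed by an in-memory map, for illustration only. */
final class InMemoryDataService implements BusinessDataService {
    private final Map<String, List<Map<String, Object>>> views;

    InMemoryDataService(Map<String, List<Map<String, Object>>> views) {
        this.views = views;
    }

    @Override
    public DataResult fetch(DataRequest request) {
        List<Map<String, Object>> records = views.get(request.businessView);
        if (records == null) {
            throw new IllegalArgumentException("Unknown business view: " + request.businessView);
        }
        List<List<Object>> rows = new ArrayList<>();
        for (Map<String, Object> record : records) {
            List<Object> row = new ArrayList<>();
            for (String field : request.fields) {
                row.add(record.get(field)); // null when the record lacks the field
            }
            rows.add(row);
        }
        return new DataResult(request.fields, rows);
    }
}
```

A real implementation would sit behind the same interface and talk to the metadata layer and the databases, so it could be replaced without touching client code.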

Data Acquisition

Problem details

The problem that the metadata stacks solve at the moment is the storage and retrieval of attributes and relationships. They perform the very important task of being a layer between the gory technical details of the physical world (files, tables, etc.) and the business world. They add descriptions, translations, layout and representation information, which is both interesting and important. However, what they do not do yet is deliver any data. At most, you can expect the metadata layer to generate SQL to retrieve the information from the database. While having the SQL makes programming a data representation client easier, the data acquisition layer still needs to be rewritten for each application. The application has to deal with gory details such as JDBC, opening and closing connections, etc. As a result, you run into problems again if you want to move the problem setting even closer to the business or do clever things such as auditing, profiling, data caching, data comparisons, automatic data source selection, clustering, high availability, data archiving, working with multiple data sources, Venn diagram calculations, and so on.

Proposed solution

Again, what is needed is a service that shields the data acquisition tools (reporting, etc) from the gory details.

Result formats

Problem details

There are many different data consumers, and each may have its own paradigm for representing the data that underlies the business-level metadata. For example, many reporting tools use a primarily relational model for the data that they process. Similarly, OLAP cube browsers may use a multi-dimensional paradigm for processing their result sets.

To complicate things, some tools may use a more or less well-defined standard interface for consuming data, such as a JDBC result set or a CSV text format, whereas others may rely on some proprietary, closed format to encode result data.

Another complicating aspect is that even if the underlying data format is fixed (say relational), some clients may require a single result to be 'chunked' or 'paged'. Similarly, some requests may require the result to be presented as a number of distinct chunks sent in one response (this would occur when the original request represents a batch).
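As an illustration of what 'paging' a single relational result could look like, the following Java sketch cuts a list of rows into fixed-size pages. The class name ResultPager and its methods are hypothetical and are not taken from Buda. The sketch also assumes the full result is already in memory, which a real server would probably want to avoid for large results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Buda.
final class ResultPager {

    /** Returns the zero-based page pageIndex, holding at most pageSize rows. */
    static <T> List<T> page(List<T> rows, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        if (from >= rows.size()) {
            return new ArrayList<>(); // past the end: empty page
        }
        int to = (int) Math.min(from + pageSize, rows.size());
        return new ArrayList<>(rows.subList((int) from, to));
    }

    /** Number of pages needed to hold rowCount rows, rounding up. */
    static int pageCount(int rowCount, int pageSize) {
        if (rowCount < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and pageSize > 0");
        }
        return (int) (((long) rowCount + pageSize - 1) / pageSize);
    }
}
```

A batch response, as described above, could then be modelled as a list of such pages or of separate results sent together.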

Proposed solution

Buda should provide default support for a number of commonly used data formats.

For example, it seems self-evident that Buda should be able to deliver data in a tabular format to serve those clients that employ a relational paradigm. It seems equally evident that there should be default functionality to deliver data in a number of commonly used, well-understood formats, such as CSV. In addition, it seems sensible to provide some method of delivering XML and JSON, although the problem with those formats is that there is no single obvious standard that can be used.

In order to serve different data formats to accommodate different clients, Buda should be designed to support a plug-in interface that allows implementors to provide a specific encoding of result data.
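One possible shape for such a plug-in interface is sketched below in Java. The names ResultEncoder, CsvResultEncoder and EncoderRegistry are assumptions chosen for this example and do not describe an existing Buda API. The CSV encoder quotes values that contain commas, double quotes or line breaks, and doubles any embedded quotes, which is the common convention for CSV files.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract; these names are not taken from Buda.
interface ResultEncoder {
    /** Short name that clients use to request this format, for example "csv". */
    String formatName();

    /** Turns column names and rows into the encoded representation. */
    String encode(List<String> columns, List<List<Object>> rows);
}

/** Example plug-in: encodes a tabular result as CSV with a header line. */
final class CsvResultEncoder implements ResultEncoder {
    @Override
    public String formatName() {
        return "csv";
    }

    @Override
    public String encode(List<String> columns, List<List<Object>> rows) {
        StringBuilder out = new StringBuilder();
        appendLine(out, columns);
        for (List<Object> row : rows) {
            appendLine(out, row);
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<?> values) {
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(quote(values.get(i)));
        }
        out.append("\r\n");
    }

    /** Quotes a value only when it contains a comma, a quote or a line break. */
    private static String quote(Object value) {
        if (value == null) {
            return ""; // nulls become empty fields
        }
        String text = value.toString();
        boolean needsQuotes = text.contains(",") || text.contains("\"")
                || text.contains("\n") || text.contains("\r");
        return needsQuotes ? "\"" + text.replace("\"", "\"\"") + "\"" : text;
    }
}

/** Looks up an encoder by format name; implementors register their own. */
final class EncoderRegistry {
    private final Map<String, ResultEncoder> encoders = new HashMap<>();

    void register(ResultEncoder encoder) {
        encoders.put(encoder.formatName(), encoder);
    }

    ResultEncoder forFormat(String formatName) {
        ResultEncoder encoder = encoders.get(formatName);
        if (encoder == null) {
            throw new IllegalArgumentException("No encoder registered for format: " + formatName);
        }
        return encoder;
    }
}
```

An XML or JSON encoder would be registered the same way, which leaves the choice of a concrete XML or JSON layout to whoever writes the plug-in.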

Central Security

Problem details

In today's decentralized computing environments, each client tool needs access to the databases the user wants to query. In practice, this usually means that all client tools not only need to have drivers (see: Software stack configuration) but also need to have all passwords copied into the client's domain. To make things worse, the client usually accesses the database directly, which means that ports have to be opened and the database server therefore happily accepts connections from anyone. The database's own security is the only line of defense against users snooping around in all tables.

Proposed solution

Buda should provide security definitions that limit the data accessible to the user, depending on the authentication token the user provides. By running Buda in a server mode, in which it acts as a proxy between the user's need for data and the database's raw data access, Buda easily increases security, because the database can be locked down to accept connections only from the central server. Likewise, this remote service removes the need to ship database passwords with the client tools.
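The token-based check described here could be sketched as follows. This is only an illustration under stated assumptions: AccessPolicy and SecuredGateway are hypothetical names, tokens are plain strings, and the backend is a simple map standing in for the real databases. How tokens are issued and validated is left out entirely.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical classes, for illustration only; not part of Buda.

/** Records which business views each authentication token may read. */
final class AccessPolicy {
    private final Map<String, Set<String>> allowedViewsByToken = new HashMap<>();

    void grant(String token, String view) {
        allowedViewsByToken.computeIfAbsent(token, t -> new HashSet<>()).add(view);
    }

    boolean mayRead(String token, String view) {
        return allowedViewsByToken
                .getOrDefault(token, Collections.<String>emptySet())
                .contains(view);
    }
}

/** The proxy: clients hold a token only, never a database password. */
final class SecuredGateway {
    private final AccessPolicy policy;
    private final Map<String, String> backend; // view name -> data, stands in for the databases

    SecuredGateway(AccessPolicy policy, Map<String, String> backend) {
        this.policy = policy;
        this.backend = backend;
    }

    String read(String token, String view) {
        // The check happens on the server, before any data source is touched.
        if (!policy.mayRead(token, view)) {
            throw new SecurityException("Access denied to view: " + view);
        }
        return backend.get(view);
    }
}
```

Because only the gateway talks to the data sources, the database credentials and open ports stay on the server side.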

In an ideal world, Buda would provide a central layer to enforce cross-datasource data-level security.

