DSPDataPersistenceOnMongoDB
Description of the persistence model for the DSP Messages using a Document-Oriented System

Introduction

This page describes the persistence model built on mongoDB, a document-oriented system that combines features of key-value pair (KVP) stores and relational database systems (RDBMS). Although this document covers a new architecture chosen during the technology research, it references the first version of DSPDataPersistence where needed to avoid repeating information. Last but not least, this document is guided by the comments on revisions r584 and r585.

Persisting Generic Sensor Data into a Datastore

Sensor networks are commonly used in the scientific community as tools to monitor the state of the environment. NASA SensorWeb uses sensors to collect data from specific volcanoes around the world that have given properties, informing researchers about volcanic activity and behavior. No matter how the data is collected, sensors collaborating in a network produce meaningful data for users based on environmental conditions. Samples can be collected over a week, a month, or any other period of time in order to analyze the state of the environment at different moments.

When it comes to data representation, however, each sensor's architect or engineer has typically come up with his or her own metadata to describe the sensor's properties. For example, a car's temperature and water's salinity and temperature are different properties that apply to the car and to the water, respectively. Sensors may therefore expose dozens or even hundreds of different properties, which semantically identify data for different objects. Additionally, time is an important variable that must be recorded when collecting data from sensor networks; Time Series analysis studies ways to track these points in time.
Considering that sensors can join a dynamic sensor network at any time, the chosen data model must be able to accommodate a completely different list of properties. The model can stay close to the relational model and yet describe properties simply as key-value pairs. Furthermore, scalability plays an important role when choosing a persistence technology, especially in the area of sensor networks, where the number of sensors can grow at any time.

Choosing a Data Model for Sensor Networks

One may ask what a good design decision is for modeling data in such a dynamic environment. Given the properties described in the previous section, it is time to look for alternatives to the Relational Model, which has been the usual choice for persisting data in many types of projects. The article "Is the Relational Database Doomed?" details the difficulties the Relational Data Model has in keeping up with the constant schema changes of dynamic environments such as dynamic web sites; its author argues that an RDBMS is better suited to environments whose types are well known and do not change constantly. For this reason, the article presents the Key-Value Pair (KVP) model, a simple hash-like structure that describes each entity of a system as a set of key-value pairs. Since sensors are described only by properties and their values, I considered KVP an option for addressing the dynamic nature of sensor networks. The blog entry "What is the right data model?" goes further and describes not only the KVP model, but also the Tabular and the Document-Oriented Models.
The Tabular model is described as an infinite table that holds an unbounded number of objects, as implemented by Google's Bigtable, while the Document-Oriented model sits between the Relational and the Tabular models, combining the ability to relate entities through their properties with a KVP notation. In general, what I wanted to find was a persistence storage system that can easily accept new data types without re-engineering the current schemas.

These approaches come with trade-offs, however. The KVP data model implies repetition of data, so it uses more disk space than usual. Although this can be a drawback, faster data retrieval approaches such as MapReduce have been shown to pay off in the end. Furthermore, since disk space is considered a commodity, scaling data storage across more machines in a grid or cluster is cheaper than buying very expensive servers. Therefore, I was looking for a system that offers System Replication or Database Partitioning, also referred to as Database Sharding.

mongoDB Data Store

Given that the Document-Oriented Model is a good candidate for persisting sensors' properties, and given the list of technologies enumerated in the previous article, DSPDataPersistence, the open-source project mongoDB was chosen for evaluation in our case study, the NetBEAMS DSP Platform. mongoDB stores data in collections of documents encoded in BSON, a binary representation of the JSON data format, and supports dynamic queries and indexing. As stated on its web site, mongoDB "bridges the gap between key/value stores (which are fast and highly scalable) and traditional RDBMS systems (which are deep in functionality)".
The rest of this documentation describes the experiments run with mongoDB, which provides a data-centric persistence layer for NetBEAMS.

Experiment: Saving 1 Million Objects into mongoDB

Considering how much dynamic sensor networks can scale in size, I designed an experiment that generates 1 million random Sonde Data Type instances to be saved in a mongoDB server. The Sonde Data Type is a POJO representation of the YSI Data Acquisition. The goal of the experiment is to transfer the transient instances to the mongoDB server, using either a single host or a sharded, distributed list of servers, and to verify whether the way scientists describe the data influences the performance of the database system. In summary, the experiments will analyze the following:
For the evaluation of the use cases, the following experiments will be performed:
Case Study Requirements

Our case study is based on NetBEAMS, a collaboration project between the Department of Computer Science at San Francisco State University and the Romberg Tiburon Center (RTC) that assists the San Francisco Bay Environmental Assessment and Monitoring Station (SF-BEAMS) project. The RTC focuses its research on complex marine and estuarine environments and uses environmental sensors for that research. Its sensor network is located offshore of the RTC pier (SEE LIVE! cam). Among the devices in use, this research used the YSI 6600 ESD V2 sondes to develop the case study. A picture of a YSI sonde can be seen below:
In May 2009, the infrastructure setup for the YSI sondes was defined as follows:
To extract the measurement data, the YSI sonde provides an RS-232 serial connection that can be used to connect to a computer. The following snapshot is an example of the 52 bytes of data (13 x 4 bytes) transferred from the YSI data stream:

"21.20 193 179 5588.40 0.09 0.084 0.059 7.98 -79.6 99.5 8.83 0.4 8.7"

The size of the data in memory was estimated for the number of YSIs reported:
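As a rough illustration of the 13 x 4 byte figure above, the following sketch splits the snapshot into its 13 values and estimates the raw size for a given number of samples. The class and method names are hypothetical and not part of the NetBEAMS code base; the field names and their order are assumed from the XML example shown later on this page, and one 4-byte float per value is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: parses one whitespace-separated YSI sample into named values. */
final class YsiSampleParser {

    // Field order assumed from the XML example on this page (hypothetical mapping).
    static final String[] FIELDS = {
        "Temp", "SpCond", "Cond", "Resist", "Sal", "Press", "Depth",
        "pH", "phmV", "ODOSat", "ODOConc", "Turbid", "Battery"
    };

    /** Maps each token of the data stream line to its field name, in order. */
    static Map<String, Float> parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != FIELDS.length) {
            throw new IllegalArgumentException(
                "Expected " + FIELDS.length + " values but got " + tokens.length);
        }
        Map<String, Float> sample = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            sample.put(FIELDS[i], Float.parseFloat(tokens[i]));
        }
        return sample;
    }

    /** Raw size estimate, assuming one 4-byte float per value. */
    static long estimatedBytes(long samples) {
        return samples * FIELDS.length * Float.BYTES;
    }

    public static void main(String[] args) {
        Map<String, Float> sample =
            parse("21.20 193 179 5588.40 0.09 0.084 0.059 7.98 -79.6 99.5 8.83 0.4 8.7");
        System.out.println(sample);
        System.out.println(estimatedBytes(1) + " bytes per sample");
        System.out.println(estimatedBytes(1_000_000) + " bytes for 1,000,000 samples");
    }
}
```

The estimate covers only the raw values; it does not account for the Java object overhead or for the time properties added later to each sample.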
The following section describes the NetBEAMS infrastructure, developed to program and interrogate the SF-BEAMS sensor network without requiring human intervention.

NetBEAMS Infrastructure

The NetBEAMS infrastructure is set on top of the existing SF-BEAMS infrastructure. The following image summarizes this joint infrastructure:
This documentation focuses on the development of a Software Platform for the NetBEAMS Gateway Embedded System. The architecture of the system is summarized in the following picture.
The main components of such system can be summarized as follows:
OSGi - The Foundation for the Data Sensor Platform

Since each NetBEAMS component is managed as an OSGi component within the OSGi infrastructure, this section describes the basic functionality of the OSGi platform. The OSGi platform was conceived to support modularity in resource-limited environments such as mobile devices and vehicles, but it was first widely deployed in Eclipse, the Java-based Integrated Development Environment (IDE) for different programming languages, thanks to OSGi's loosely coupled architecture and easy-to-use API. In general, the OSGi Platform can run on top of any operating system that provides a Java Virtual Machine (JVM), with the set of OSGi bundles published to the platform, as shown in the following image.
The OSGi Platform provides 2 basic layers:
The interoperability of OSGi follows the simple Producer-Consumer paradigm of a service model, as shown in the first picture below. The Producer of a service registers it with the Service Broker, while the Consumer uses the Service Lookup to find and reuse the service. In this way, during its life cycle, an OSGi bundle can first publish its service to the OSGi Platform, where other OSGi bundles can reuse it, as shown in the second picture below.
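The register-and-lookup interaction described above can be sketched with a small in-memory broker. This is only an illustration of the paradigm in plain Java, not the OSGi API: the class `ServiceBrokerSketch` and the `Greeter` contract are hypothetical. In a real OSGi bundle, the corresponding calls would go through `BundleContext` (for example `registerService` and `getServiceReference`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch only: a tiny in-memory stand-in for a service broker. */
final class ServiceBrokerSketch {

    // One implementation per contract, keyed by the contract interface.
    private final Map<Class<?>, Object> services = new HashMap<>();

    /** Producer side: publish an implementation under its contract. */
    <T> void register(Class<T> contract, T implementation) {
        services.put(contract, implementation);
    }

    /** Consumer side: look up an implementation by contract, if one is published. */
    <T> Optional<T> lookup(Class<T> contract) {
        return Optional.ofNullable(contract.cast(services.get(contract)));
    }

    /** Producer side: withdraw the service, for example when its bundle stops. */
    void unregister(Class<?> contract) {
        services.remove(contract);
    }

    /** Hypothetical service contract used only for this illustration. */
    interface Greeter {
        String greet(String name);
    }

    public static void main(String[] args) {
        ServiceBrokerSketch broker = new ServiceBrokerSketch();
        broker.register(Greeter.class, name -> "Hello, " + name);
        broker.lookup(Greeter.class)
              .ifPresent(greeter -> System.out.println(greeter.greet("NetBEAMS")));
    }
}
```

The consumer depends only on the contract interface, which is what keeps producers and consumers loosely coupled.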
An existing Java application can be "bundled" as an OSGi bundle by providing descriptors that follow the Java Archive (JAR) specification. In general, an OSGi bundle must provide specifications that describe the module to be published into the OSGi Platform, as shown in the next diagram.
The main properties of the OSGi MANIFEST.MF artifact can be summarized as follows:
Once the OSGi bundle is installed into the OSGi Platform, it is managed by the OSGi Execution layer, which changes the bundle state according to a set of specifications. The following diagram shows the UML State Diagram of an OSGi Bundle life cycle:
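The bundle life cycle can also be sketched as a small state machine. The state names below are the standard ones from the OSGi core specification (the constants of `org.osgi.framework.Bundle`); the transition table is a simplified assumption that omits failure paths such as a bundle failing to start, and the class itself is hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Sketch only: a simplified view of the OSGi bundle life cycle. */
final class BundleLifecycleSketch {

    enum State { INSTALLED, RESOLVED, STARTING, ACTIVE, STOPPING, UNINSTALLED }

    // Simplified transition table; error paths are intentionally left out.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.INSTALLED, EnumSet.of(State.RESOLVED, State.UNINSTALLED));
        ALLOWED.put(State.RESOLVED,
                    EnumSet.of(State.INSTALLED, State.STARTING, State.UNINSTALLED));
        ALLOWED.put(State.STARTING, EnumSet.of(State.ACTIVE));
        ALLOWED.put(State.ACTIVE, EnumSet.of(State.STOPPING));
        ALLOWED.put(State.STOPPING, EnumSet.of(State.RESOLVED));
        ALLOWED.put(State.UNINSTALLED, EnumSet.noneOf(State.class));
    }

    /** Returns true if the simplified table allows moving from one state to the other. */
    static boolean canMove(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    public static void main(String[] args) {
        State[] startSequence = {
            State.INSTALLED, State.RESOLVED, State.STARTING, State.ACTIVE
        };
        for (int i = 0; i + 1 < startSequence.length; i++) {
            System.out.println(startSequence[i] + " -> " + startSequence[i + 1]
                + " allowed: " + canMove(startSequence[i], startSequence[i + 1]));
        }
    }
}
```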
In summary, the OSGi Platform is the main foundation of the system, built using building blocks.
NetBEAMS and the Data Sensor Platform (DSP)

This experiment targets the transport of the data described above to a database system, here called the persistence storage system, by using the NetBEAMS DSP Platform. The DSP Platform is built on top of OSGi and takes advantage of the modular capabilities of its plug-and-play bundle infrastructure. In this way, the DSP Platform and each of its DSP Components are extensions of OSGi bundles. The following diagram depicts this relationship:

DSP Component <:|------ DSP Component Activator ------|> OSGi Bundle

Because a DSP Component Activator builds on the basic specifications of the OSGi bundle infrastructure, it inherits the bundle's life-cycle stages and properties. The following image summarizes the life cycle of a DSP Component Activator, which is responsible for initializing and terminating the DSP Component. Whenever the DSP Component Activator is in the initialized mode, the DSP Component can be initialized with the initial configuration parameters provided by whoever configures the DSP. Similarly, when the DSP Component Activator is terminated, the DSP Component must be stopped. In this way, when the DSP Component is initialized, it starts all the resources it needs and remains active in the system for as long as its function is required. The only responsibility of the DSP Component is to implement the "contract": being a Data Producer (DP) or a Data Consumer (DC). The "contract" is defined by the following interface methods:
The DSP Platform routes the data, wrapped inside a message, to the DSP Components that need to receive it. Details are given in the Data Delivery section.

Data Representation

Since the YSI sonde documentation describes each of these values and the data format, the data stream is mapped into a Java POJO called SondeDataType, which is marshalled into an XML instance of the XML Schema "Abstract Message Content". Note that, in addition to the regular data from the sensor, it contains properties about time. As a result, more data has been added to the initial 130 bytes of data, as shown in the picture below:
In the current implementation, the DSP Framework is responsible for wrapping each collected sample into the body of a DSP Message for transmission. Other information regarding the producer and consumer DSP components is added to the header of the DSP Message. The following UML Class Diagram shows the participating classes from the DSP Messages packages.
As highlighted in the diagram, there are several types of DSP Messages used for different purposes. For example, any measurement data must be wrapped up in a Measurement Message, while a Query Message is used to exchange messages among the components for the purpose of management. In this way, the main DSP Messages can be summarized as follows:
Whenever a DSP component is ready to transmit messages, it wraps the set of DSP Messages into an instance of a DSP Messages Container, which carries its own identification together with information about the collection of messages being transmitted. In this fashion, the DSP Messages Container is the main communication unit between 2 different DSP Components.

Data Delivery

In general, when a DSP Component finishes preparing the DSP Messages Container, it contacts the DSP Broker to send the DSP Message. At this point, the DSP Broker, assisted by the DSP Matcher, acquires a list of the DSP Components that are expected to consume the DSP Message in the current DSP. The DSP Matcher can thus be seen as a function that takes a DSP Message as input and returns a list of DSP consumers:

DSP Components Consumers ( DSP Message ) := Verify the DSP Message's Header + Verify the matcher rules (which contains the list of consumers)

The selection is done by analyzing the matching rules against the properties of the DSP Message header. Upon receiving all the matching rules, the DSP Broker selects a set of unique DSP components to receive a copy of the DSP Message object in two different ways:
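The matcher function described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DSP Matcher implementation: the `Header` and `Rule` types are hypothetical reductions, a rule is assumed to be a predicate over the header plus a list of consumer names, and the consumer name `org.netbeams.dsp.persistence` is used only as an example label.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch only: selects the unique consumers whose rules match a message header. */
final class DspMatcherSketch {

    /** Hypothetical, reduced view of a DSP Message header. */
    static final class Header {
        final String contentType;
        final String producerType;

        Header(String contentType, String producerType) {
            this.contentType = contentType;
            this.producerType = producerType;
        }
    }

    /** Hypothetical rule: a condition on the header plus the consumers it selects. */
    static final class Rule {
        final Predicate<Header> condition;
        final List<String> consumers;

        Rule(Predicate<Header> condition, String... consumers) {
            this.condition = condition;
            this.consumers = Arrays.asList(consumers);
        }
    }

    /** Returns the unique consumers of every rule whose condition accepts the header. */
    static Set<String> match(Header header, List<Rule> rules) {
        Set<String> selected = new LinkedHashSet<>();
        for (Rule rule : rules) {
            if (rule.condition.test(header)) {
                selected.addAll(rule.consumers);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule(h -> h.contentType.equals("org.netbeams.dsp.ysi"),
                     "org.netbeams.dsp.wiretransport.client",
                     "org.netbeams.dsp.persistence"),
            new Rule(h -> h.contentType.startsWith("org.netbeams.dsp."),
                     "org.netbeams.dsp.wiretransport.client"));
        Header header = new Header("org.netbeams.dsp.ysi",
            "org.netbeams.dsp.platform.management.component.ComponentManager");
        System.out.println(match(header, rules));
    }
}
```

Collecting the consumers in a set is what yields the "set of unique DSP components" when several rules name the same consumer.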
Remote Data Transport and Delivery

The DSP Platform supports data transport through specialized DSP Components that are capable of marshalling POJOs into XML and unmarshalling XML back into POJOs. In order to transport the DSP Messages, a pair of symmetric DSP Components was developed that uses the HTTP protocol to carry the serialized version of the DSP Messages.
The following image shows the XML Schema of the DSP Messages Container and the DSP Message. The DSP Messages Container is the main unit of communication between 2 instances of remote DSP components and is composed of at least one DSP Message. Each DSP Message carries the specific information about its producer and consumer in the header, and an instance of any payload in the body. Note that both have attributes recording points in time, as part of the time series definition for the collected data.
Here is an example of the transmission of the YSI data shown in the previous section. The Messages Container contains an instance of a Measurement Message marshalled on the host 192.168.0.103 to be transmitted to the host 192.168.0.106 using the DSP Wire Transport Client component. The Header of the DSP Message contains all the information about its producer and potential consumer, while the Body contains an instance of the Sonde Data Container, which encloses an instance of the Sonde Data Type.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MessagesContainer uudi="24929c29-60ee-4d17-af08-64d9446277ef"
creationTime="2009-03-06T15:17:18-0800" destinationHost="192.168.0.106">
<MeasureMessage ContentType="org.netbeams.dsp.ysi"
messageID="435a61f6-370f-458d-aeb7-6e92270a79cb">
<Header>
<CreationTime>1236381438480</CreationTime>
<Producer>
<ComponentType>
org.netbeams.dsp.platform.management.component.ComponentManager
</ComponentType>
<ComponentLocator>
<ComponentNodeId>1234</ComponentNodeId>
<NodeAddress>192.168.0.103</NodeAddress>
</ComponentLocator>
</Producer>
<Consumer>
<ComponentType>org.netbeams.dsp.wiretransport.client
</ComponentType>
<ComponentLocator>
<NodeAddress>LOCAL</NodeAddress>
</ComponentLocator>
</Consumer>
</Header>
<Body>
<SondeDataContainer>
<soundeData date="15:17:18" time="03-06-2009">
<Temp>21.20</Temp>
<SpCond>193</SpCond>
<Cond>179</Cond>
<Resist>5588.40</Resist>
<Sal>0.09</Sal>
<Press>0.084</Press>
<Depth>0.059</Depth>
<pH>7.98</pH>
<phmV>-79.6</phmV>
<ODOSat>99.5</ODOSat>
<ODOConc>8.83</ODOConc>
<Turbid>0.4</Turbid>
<Battery>8.7</Battery>
</soundeData>
</SondeDataContainer>
</Body>
</MeasureMessage>
</MessagesContainer>

When the counterpart DSP Wire Transport Server receives the Messages Container instance, it unmarshals the DSP Messages back into POJOs and sends them to the DSP Broker for normal delivery. As previously explained, the DSP Broker may decide to deliver the DSP Message or drop it, depending on the DSP Matcher rules specification.

DSP Data Persistence Component
The output from the execution of the NetBEAMS bundle is shown on DSPDataPersistence. The process of transforming it into mongoDB-ready data is described in the next sections. The NetBEAMS to mongoDB conversion is implemented in the DSPMongoCRUDService class, which uses the mongoDB Java driver. See the artifact http://code.google.com/p/netbeams/source/browse/branches/marcello/persistence/versions/v2/apps/osgi-bundles/dsp/DSPDataPersistence/src/org/netbeams/dsp/persistence/controller/DSPMongoCRUDService.java for details. The following code snippet is the method that inserts the message content of the DSP message carried by the PersistentMessageUnit into mongoDB. The mongoDB driver uses BasicDBObject instances to set keys and values; the keys and values are created and then saved into mongoDB.

/**
 * Inserts the DSP Message Content into the mongoDB as it is extracted and converted from the given
 * PersistentMessageUnit.
 * @param tranMsg is the PersistentMessageUnit containing information about the sensor location and the message.
 * @throws UnknownHostException
 * @throws MongoException
 */
public static void insertPersistentUnitMessageContents(PersistentMessageUnit tranMsg) throws UnknownHostException,
        MongoException {
    DBCollection netbeamsDbCollection = getPersistenceStorage(tranMsg);
    MessageContent messageContent = tranMsg.getDspMessage().getBody().getAny();
    System.out.println("Starting mongodb transaction at " + DATE_FORMATTER.format(new Date()));
    getNetbeamMongoDb().requestStart();
    if (messageContent instanceof SondeDataContainer) {
        SondeDataContainer sondeContainer = (SondeDataContainer) messageContent;
        for (SondeDataType sondeData : sondeContainer.getSondeData()) {
            // build the value segment: one entry per sonde property
            // (this revision stores each value as a String)
            BasicDBObject docValue = new BasicDBObject();
            docValue.put("temperature", "" + sondeData.getTemp().floatValue());
            docValue.put("sp_condition", "" + sondeData.getSpCond().floatValue());
            docValue.put("condition", "" + sondeData.getCond().floatValue());
            docValue.put("resistence", "" + sondeData.getResist().floatValue());
            docValue.put("salinity", "" + sondeData.getSal().floatValue());
            docValue.put("pressure", "" + sondeData.getPress().floatValue());
            docValue.put("depth", "" + sondeData.getDepth().floatValue());
            docValue.put("ph", "" + sondeData.getPH().floatValue());
            docValue.put("pH_mv", "" + sondeData.getPhmV().floatValue());
            docValue.put("odo_sat", "" + sondeData.getODOSat().floatValue());
            docValue.put("odo_condition", "" + sondeData.getODOConc().floatValue());
            docValue.put("turbidity", "" + sondeData.getTurbid().floatValue());
            docValue.put("battery", "" + sondeData.getBattery().floatValue());
            BasicDBObject docKey = buildKeySegment(tranMsg);
            // extract the fact time from the message, adding to the key
            docKey.put("fact_time", sondeData.getDateTime().getTimeInMillis());
            docKey.put("data", docValue);
            // insert the final document into the collection
            netbeamsDbCollection.insert(docKey);
        }
    }
    getNetbeamMongoDb().requestDone();
}
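The layout of the documents produced by this method (key properties at the top level, sensor values nested under "data") can be illustrated without the driver. The sketch below uses plain `java.util` maps as a stand-in for `BasicDBObject`; the class `SondeDocumentSketch` and its method are hypothetical. It also stores the readings as numbers rather than strings, which is an assumption about how the string-valued revision could be changed, not a description of the actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: builds the nested key/value layout with numeric sensor readings. */
final class SondeDocumentSketch {

    /** Key properties go at the top level; the readings are nested under "data". */
    static Map<String, Object> buildDocument(String sensorIp, String messageId,
            long transactionTime, long factTime, Map<String, Double> readings) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("sensor_ip_address", sensorIp);
        doc.put("message_id", messageId);
        doc.put("transaction_time", transactionTime);
        doc.put("fact_time", factTime);
        doc.put("data", new LinkedHashMap<String, Double>(readings));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Double> readings = new LinkedHashMap<>();
        readings.put("temperature", 45.01);
        readings.put("ph", 5.64);
        readings.put("battery", 9.4);

        Map<String, Object> doc = buildDocument("192.168.0.136",
            "7b6624d6-0ca1-4cba-a343-f166e88da73b", 1252845473412L, 1252845346000L, readings);
        System.out.println(doc);

        // Strings are ordered character by character, numbers by magnitude.
        System.out.println("\"10.5\" sorts before \"9.4\" as strings: "
            + ("10.5".compareTo("9.4") < 0));
        System.out.println("10.5 is greater than 9.4 as numbers: "
            + (Double.compare(10.5, 9.4) > 0));
    }
}
```

The last two lines of `main` show why the value type matters: a comparison or sort over string values follows character order, so it can disagree with the numeric order of the measurements.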
Setting up the environment

Taking into account the mongoDB architecture and the properties of a DSP Message (see section "Acquiring the properties of a DSP Message Content" at DSPDataPersistence), here are the conventions followed in revision r585:
The following is the list of properties that compose the Key of a document:
The definition of the Value of a document is as follows:
Some remarks about the creation of the items contained in the collections:
The experiment is run with the shell script located at http://code.google.com/p/netbeams/source/browse/branches/marcello/persistence/versions/v2/persistence/run-persistence-experiment, for example:

./run-persistence-experiment 500 | tee running-500.log

Experiment

The execution of the command-line script launches mongoDB, removes old files, generates the given number of elements, inserts them into the database, and displays the results, finally giving shell access to the current database. The goals are as follows:
Main Experiment output

The following is a snapshot of the file http://code.google.com/p/netbeams/source/browse/branches/marcello/persistence/versions/v2/persistence/logs/experiment-1000000-main-20090912-202055.log

########### Netbeams to MongoDB Experiment 20090912-202055.log #############
* 1. Cleaning any existing MongoDB data at 'data'
total 12K
drwxr-xr-x 3 marcello marcello 4.0K 2009-09-12 19:50 .
drwxrwxrwx 6 marcello marcello 4.0K 2009-09-12 19:48 ..
drwxr-xr-x 6 marcello marcello 4.0K 2009-09-12 20:06 .svn
* 2. Starting MongoDB Server... NetBEAMS data will be saved at dir 'data'
* 3. Ready to run Java experiment with 1000000 samples
Sat Sep 12 20:20:55 Mongo DB : starting : pid = 5831 port = 27017 dbpath = data master = 0 slave = 0 32-bit
** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
** see http://blog.mongodb.org/post/137788967/32-bit-limitations for more
Sat Sep 12 20:20:55 db version v1.0.0, pdfile version 4.4
Sat Sep 12 20:20:55 git version: afe21e02c11f9a923ab1c95edf6fdd95b9a4a51e
Sat Sep 12 20:20:55 sys info: Linux domU-12-31-39-01-70-B4 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686
Sat Sep 12 20:20:55 waiting for connections on port 27017

These first 3 steps set up the environment for the new execution of the experiment.
The Java execution started right after the log snippet above, and it took the application around one and a half minutes to generate 1 million POJOs with random values. At this point, the conversion of the objects into mongoDB objects and the insertion of each of them takes place.

Experiment started at 09/12/2009 20:20:56:992
Starting to generate 1000000 sonde samples at 09/12/2009 20:20:56:993
Finished Generating 1000000 sonde samples on 88.368 seconds (88368 milliseconds) at 09/12/2009 20:22:25:361 consuming ~6848Kb
Started saving netbeams samples as mongodb objects at 09/12/2009 20:22:48:25
Starting mongodb transaction at 09/12/2009 20:22:59:513

At this point, the transaction with mongoDB has been opened and the database is locked, all done through the Java driver provided by the mongoDB project. As the server log shows, the allocation of new file system space starts, as well as the creation of indexes for the new type. Everything from now on is based on the database netbeams and the collection SondeDataContainer, as described at the beginning of this section.

Sat Sep 12 20:20:55 web admin interface listening on port 28017
Sat Sep 12 20:22:59 connection accepted from 127.0.0.1:19841 #1
Sat Sep 12 20:22:59 allocating new datafile data/netbeams.ns, filling with zeroes...
Sat Sep 12 20:22:59 done allocating datafile data/netbeams.ns, size: 16777216, took 0.018 secs
Sat Sep 12 20:22:59 allocating new datafile data/netbeams.0, filling with zeroes...
Sat Sep 12 20:23:00 done allocating datafile data/netbeams.0, size: 67108864, took 0.954 secs
Sat Sep 12 20:23:00 building new index on { _id: ObjId(000000000000000000000000) } for netbeams.SondeDataContainer...done for 0 records
Sat Sep 12 20:22:59 insert netbeams.SondeDataContainer 979ms
Sat Sep 12 20:23:00 insert netbeams.SondeDataContainer 0ms
Sat Sep 12 20:23:00 insert netbeams.SondeDataContainer 0ms
Sat Sep 12 20:23:00 insert netbeams.SondeDataContainer 0ms
Sat Sep 12 20:23:00 insert netbeams.SondeDataContainer 0ms
Sat Sep 12 20:23:00 insert netbeams.SondeDataContainer 0ms
...
...

The insertion into the database completes after almost 3 minutes. Memory consumption went beyond the 1.5 GB mark (which is not displayed correctly yet). The Java driver closes the connection with the database automatically after the Java program exits.

Sat Sep 12 20:25:45 insert netbeams.SondeDataContainer 0ms
Sat Sep 12 20:25:45 insert netbeams.SondeDataContainer 0ms
Sat Sep 12 20:25:45 insert netbeams.SondeDataContainer 0ms
Finished saving netbeams samples to mongodb objects in 176.093 seconds (176093 milliseconds) at 09/12/2009 20:25:44:118 consuming ~11.0Kb
Experiment finished saving 1000000 sonde samples on MongoDB on 306.883 seconds (306883 milliseconds) at 09/12/2009 20:26:03:876 consuming ~11 Kb
Sat Sep 12 20:26:21 end connection 127.0.0.1:19841

The experiment shell script outputs the directories where the logs are created, along with the list of files created and their sizes (please disregard the ".svn" entries, which are not part of the experiment).

* 4. Experiments Results on the following logs:
- Mongo DB Server Output: logs/experiment-1000000-mongodb-server-status-20090912-202055.log
- NetBEAMS to MongoDB data transfer output: logs/experiment-1000000-netbeams-to-mongodb-20090912-202055.log
* 5. MongoDB data dir 'data' size after experiments...
4.0K data/.svn/tmp/props
4.0K data/.svn/tmp/text-base
4.0K data/.svn/tmp/prop-base
16K data/.svn/tmp
4.0K data/.svn/props
4.0K data/.svn/text-base
4.0K data/.svn/prop-base
44K data/.svn
1.5G data
total 1.5G
drwxr-xr-x 3 marcello marcello 4.0K 2009-09-12 20:25 .
drwxrwxrwx 6 marcello marcello 4.0K 2009-09-12 19:48 ..
-rwxr-xr-x 1 marcello marcello 5 2009-09-12 20:20 mongod.lock
-rw------- 1 marcello marcello 64M 2009-09-12 20:25 netbeams.0
-rw------- 1 marcello marcello 128M 2009-09-12 20:23 netbeams.1
-rw------- 1 marcello marcello 256M 2009-09-12 20:25 netbeams.2
-rw------- 1 marcello marcello 512M 2009-09-12 20:25 netbeams.3
-rw------- 1 marcello marcello 512M 2009-09-12 20:25 netbeams.4
-rw------- 1 marcello marcello 16M 2009-09-12 20:25 netbeams.ns
drwxr-xr-x 6 marcello marcello 4.0K 2009-09-12 20:06 .svn

The mongoDB client is then started so that the newly created data can be used and checked. Instructions are printed as well.

* 6. Running the MongDB after the experiments...
- The database name is 'netbeams'. The collection name is 'ysi'
- Type 'use netbeams' to change to that database.
- Type 'show collections' to show all the collections in the current database
- Type 'db.ysi.*' to issue a command to the collection 'ysi'
- Ex: 'db.ysi.count()' = returns the number of elements on the collection 'ysi'
- 'db.ysi.findOne()' = returns the first element of the collection 'ysi'
- 'db.ysi.find().limit(3)' = returns the first 3 elements of the collection 'ysi'
- 'db.ysi.find( {sensor_ip_address:192.168.0.79} ).count())' = returns the number of elements of the collection ysi with the given sensor's ip address.
             - 'db.ysi.find({data.ph:1.45})' = returns all the elements that has the property 'data.ph' equals to '1.45'

mongoDB Server Output

See the previous section. The database can be started simply as follows:

mongod --dbpath NETBEAMS/persistence/data

mongoDB Client Output

The mongoDB client can be started with the following command. Make sure you have started the mongoDB server before executing the mongoDB client.

mongo netbeams | tee output_number_date.log

The interactive mongo client shell lets users inspect and navigate a given database and its collections. This first section shows the mongo client connecting to the database netbeams, followed by a query for the available collections. During the experiment, the SondeDataContainer collection was created, named after the DSP Message type for the YSI sonde. The shell reference for mongoDB can be found at http://www.mongodb.org/display/DOCS/dbshell+Reference

MongoDB shell version: 1.1.0-
url: netbeams
connecting to: netbeams
type "help" for help
> show collections
SondeDataContainer
system.indexes

Next, the first verification of data integrity concerns the number of elements created. Here, the count() function on the collection returned 1000000.

> db.SondeDataContainer.count()
1000000

The first element of the collection can be retrieved with the findOne() function, which returns a document in JSON notation.

> db.SondeDataContainer.findOne()
{"_id" : ObjectId( "d36f4007b7e7ac4a03c60000") , "sensor_ip_address" : "192.168.0.136" , "message_id" : "7b6624d6-0ca1-4cba-a343-f166e88da73b" ,
"transaction_time" : 1252845473412 , "fact_time" : 1252845346000 , "data" : {"temperature" : "45.01" , "sp_condition" : "37.6"
, "condition" : "145.8" , "resistence" : "159.77" , "salinitude" : "0.0" , "pressure" : "0.391" , "depth" : "0.46" , "ph" : "5.64" ,
"pH_mv" : "-62.1" , "odo_sat" : "89.7" , "odo_condition" : "59.34" , "turbidity" : "0.0" , "battery" : "9.4"}}

Queries based on attributes use the "dot" notation to navigate through the JSON documents. Additionally, functions can be chained on the results of others. The next example counts the number of documents whose key "data.ph" equals "5.64". (Note: this revision stores the values as strings, which is a bug.)

> db.SondeDataContainer.find({"data.ph":"5.64"}).count()
1226

The following example outputs the first 3 documents from the same query, using the limit() function.

> db.SondeDataContainer.find({"data.ph":"5.64"}).limit(3)
{"_id" : ObjectId( "d36f4007b7e7ac4a03c60000") , "sensor_ip_address" : "192.168.0.136" , "message_id" : "7b6624d6-0ca1-4cba-a343-f166e88da73b"
, "transaction_time" : 1252845473412 , "fact_time" : 1252845346000 , "data" : {"temperature" : "45.01" , "sp_condition" : "37.6" ,
"condition" : "145.8" , "resistence" : "159.77" , "salinitude" : "0.0" , "pressure" : "0.391" , "depth" : "0.46" , "ph" : "5.64" ,
"pH_mv" : "-62.1" , "odo_sat" : "89.7" , "odo_condition" : "59.34" , "turbidity" : "0.0" , "battery" : "9.4"}}
{"_id" : ObjectId( "d36f4007b7e7ac4a1fc80000") , "sensor_ip_address" : "192.168.0.136" , "message_id" : "7b6624d6-0ca1-4cba-a343-f166e88da73b" ,
"transaction_time" : 1252845473412 , "fact_time" : 1252845346000 , "data" : {"temperature" : "46.71" , "sp_condition" : "60.8" ,
"condition" : "160.6" , "resistence" : "1399.4" , "salinitude" : "0.01" , "pressure" : "1.057" , "depth" : "2.485" , "ph" : "5.64" ,
"pH_mv" : "-16.3" , "odo_sat" : "58.8" , "odo_condition" : "19.29" , "turbidity" : "0.2" , "battery" : "9.2"}}
{"_id" : ObjectId( "d36f4007b8e7ac4a1ec90000") , "sensor_ip_address" : "192.168.0.136" , "message_id" : "7b6624d6-0ca1-4cba-a343-f166e88da73b" ,
"transaction_time" : 1252845473412 , "fact_time" : 1252845346000 , "data" : {"temperature" : "69.99" , "sp_condition" : "39.0" ,
"condition" : "115.7" , "resistence" : "3490.92" , "salinitude" : "0.05" , "pressure" : "0.537" , "depth" : "0.544" , "ph" : "5.64" ,
"pH_mv" : "-73.8" , "odo_sat" : "81.4" , "odo_condition" : "2.44" , "turbidity" : "0.0" , "battery" : "3.1"}}
>

Other logs are located at http://code.google.com/p/netbeams/source/browse/#svn/branches/marcello/persistence/versions/v2/persistence/logs

Exporting the data into Spreadsheet format (CSV)

mongoDB has an export tool called mongoexport, which can export the data in JSON or CSV format. One may also write one's own export tool in any of the supported languages, such as Java, PHP, Python, Perl, or Ruby, among others. A list of the existing drivers for different languages is provided at http://www.mongodb.org/display/DOCS/Drivers. The following command exports the data as CSV (read the help output of the command for details).

mongoexport -d netbeams -c SondeDataContainer --dbpath ./data/ --csv -f "_id,sensor_ip_address,transaction_time,fact_time,data.temperature,data.sp_condition,data.condition,data.resistence,data.salinitude,data.pressure,data.depth,data.ph,data.pH_mv,data.odo_sat,data.odo_condition,data.turbidity,data.battery" -o sonde-data-exported.csv

The result of the export can be downloaded at http://netbeams.googlecode.com/files/experiment-1000000-data-exported-20090913-053538.csv.tar.gz. The first items of the list are shown below. Note that the columns are printed in the order provided in the export command. This behavior was fixed after I found a bug, as described at (http://groups.google.com/group/mongodb-user/browse_thread/thread/d7f1685d006ae4c7).

_id,sensor_ip_address,transaction_time,fact_time,data.temperature,data.sp_condition,data.condition,data.resistence,data.salinitude,data.pressure,data.depth,data.ph,data.pH_mv,data.odo_sat,data.odo_condition,data.turbidity,data.battery
"d36f400700e8ac4a00070600","192.168.0.136",1252845473412,1252845377000,"86.64","164.8","59.7","4594.6","0.06","0.09","2.32","6.49","-79.0","69.6","18.29","0.2","9.8"
"d36f400700e8ac4a00080600","192.168.0.136",1252845473412,1252845377000,"32.79","175.6","135.0","5346.77","0.07","1.289","2.477","7.5","-48.7","8.6","41.8","0.1","9.4"
"d36f400700e8ac4a00090600","192.168.0.136",1252845473412,1252845377000,"93.43","78.7","86.0","2467.38","0.01","1.384","0.287","0.47","-90.9","63.2","2.67","0.2","5.1"
"d36f400700e8ac4a000a0600","192.168.0.136",1252845473412,1252845377000,"72.17","179.3","7.3","2614.64","0.01","0.352","2.412","0.11","-85.5","90.6","59.33","0.2","7.8"
"d36f400700e8ac4a000b0600","192.168.0.136",1252845473412,1252845377000,"76.31","168.1","39.7","413.49","0.08","0.45","2.81","7.87","-8.2","19.5","54.78","0.0","3.4"

Data Access through Java API and REST Web Services

mongoDB offers drivers in different languages to access the data, as well as a REST Web Services interface.
The following HTTP GET request returns the first 5 documents in the collection:

http://127.0.0.1:28017/netbeams/SondeDataContainer/?limit=-5

GET /netbeams/SondeDataContainer/?limit=-5 HTTP/1.1
Host: 127.0.0.1:28017
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko/2009033100 Ubuntu/9.04 (jaunty) Firefox/3.0.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

The body of the HTTP response is in JSON format:

HTTP/1.0 200 OK
x-action:
x-ns: netbeams.SondeDataContainer
Content-Type: text/plain;charset=utf-8
{
"offset" : 0,
"rows": [
{ "_id" : "156f4007e4c3b74a36ed3100", "sensor_ip_address" : "192.168.0.117", "message_id" : "08b02c08-9290-4517-9a28-c6ee7e16509a",
"transaction_time" : 1253557219486, "fact_time" : 1253557217000, "data" : { "temperature" : "31.44", "sp_condition" : "99.8", "condition" : "53.5",
"resistence" : "1157.08", "salinity" : "0.0", "pressure" : "1.066", "depth" : "0.161", "ph" : "1.08", "pH_mv" : "-82.0", "odo_sat" : "40.3",
"odo_condition" : "56.85", "turbidity" : "0.2", "battery" : "8.2" } } ,
{ "_id" : "156f4007e5c3b74a37ed3100", "sensor_ip_address" : "192.168.0.117", "message_id" : "08b02c08-9290-4517-9a28-c6ee7e16509a",
"transaction_time" : 1253557219486, "fact_time" : 1253557217000, "data" : { "temperature" : "37.83", "sp_condition" : "176.3", "condition" : "2.6",
"resistence" : "1324.97", "salinity" : "0.01", "pressure" : "1.36", "depth" : "1.564", "ph" : "0.12", "pH_mv" : "-23.5", "odo_sat" : "104.5",
"odo_condition" : "19.44", "turbidity" : "0.1", "battery" : "5.0" } } ,
{ "_id" : "156f4007e5c3b74a38ed3100", "sensor_ip_address" : "192.168.0.117", "message_id" : "08b02c08-9290-4517-9a28-c6ee7e16509a",
"transaction_time" : 1253557219486, "fact_time" : 1253557217000, "data" : { "temperature" : "74.3", "sp_condition" : "84.0", "condition" : "104.7",
"resistence" : "4089.13", "salinity" : "0.01", "pressure" : "1.222", "depth" : "2.788", "ph" : "6.56", "pH_mv" : "-78.1", "odo_sat" : "40.0",
"odo_condition" : "6.02", "turbidity" : "0.3", "battery" : "3.2" } } ,
{ "_id" : "156f4007e5c3b74a39ed3100", "sensor_ip_address" : "192.168.0.117", "message_id" : "08b02c08-9290-4517-9a28-c6ee7e16509a",
"transaction_time" : 1253557219486, "fact_time" : 1253557217000, "data" : { "temperature" : "87.79", "sp_condition" : "91.8", "condition" : "162.4",
"resistence" : "3226.59", "salinity" : "0.02", "pressure" : "1.325", "depth" : "0.698", "ph" : "1.19", "pH_mv" : "-83.5", "odo_sat" : "4.8",
"odo_condition" : "39.87", "turbidity" : "0.3", "battery" : "9.3" } } ,
{ "_id" : "156f4007e5c3b74a3aed3100", "sensor_ip_address" : "192.168.0.117", "message_id" : "08b02c08-9290-4517-9a28-c6ee7e16509a",
"transaction_time" : 1253557219486, "fact_time" : 1253557217000, "data" : { "temperature" : "42.48", "sp_condition" : "170.4", "condition" : "0.5",
"resistence" : "1710.97", "salinity" : "0.07", "pressure" : "1.532", "depth" : "1.354", "ph" : "5.46", "pH_mv" : "-24.9", "odo_sat" : "106.7",
"odo_condition" : "28.61", "turbidity" : "0.0", "battery" : "0.5" } }
],
"total_rows" : 5 ,
"query" : {} ,
"millis" : 0
}
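
A client can consume this response with any JSON parser. The following is a minimal Python sketch, not part of the original experiment: it parses a shortened copy of the response body shown above, converts the string measurements under "data" to numbers, and flattens each document into the dotted column names used in the CSV export (such as data.temperature). The names SAMPLE_BODY, parse_rows, and flatten are hypothetical helpers introduced only for illustration.

```python
import json

# A shortened copy of the response body shown above (one row, three
# measurements). "total_rows" is set to 1 to match this shortened sample.
SAMPLE_BODY = """
{
  "offset" : 0,
  "rows": [
    { "_id" : "156f4007e4c3b74a36ed3100", "sensor_ip_address" : "192.168.0.117",
      "transaction_time" : 1253557219486, "fact_time" : 1253557217000,
      "data" : { "temperature" : "31.44", "salinity" : "0.0", "battery" : "8.2" } }
  ],
  "total_rows" : 1,
  "query" : {},
  "millis" : 0
}
"""


def parse_rows(body):
    """Return the documents under "rows", with "data" values as floats."""
    response = json.loads(body)
    documents = []
    for row in response["rows"]:
        document = dict(row)
        # The measurements are stored as strings, so convert them for numeric use.
        document["data"] = {key: float(value) for key, value in row["data"].items()}
        documents.append(document)
    return documents


def flatten(document):
    """Flatten the nested "data" object into dotted keys such as "data.temperature"."""
    flat = {key: value for key, value in document.items() if key != "data"}
    for key, value in document["data"].items():
        flat["data." + key] = value
    return flat


if __name__ == "__main__":
    for doc in parse_rows(SAMPLE_BODY):
        print(flatten(doc))
```

Converting the values on the client side is only one option; storing the measurements as numbers in the first place would avoid the conversion, but the documents shown here store them as strings.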
Data visualisation tools for mongoDB are slowly being developed by open-source developers. The next picture shows the database "netbeams" and the collection "SondeDataContainer" being rendered by futon4mongodb, one of the open-source tools developed to visualise mongoDB data.
Clicking on the collection name displays the list of all the "documents". Note that the counter shows 1 million documents. The ID is displayed as the main key, while the value column displays only the list of keys. This should be changed in the next releases of futon4mongodb.
To view a single document, just click on one of the documents. The keys and values are displayed.
Data format used by Biologists

Note that this format can be easily translated to the OPeNDAP format used by the RTC's sensor network. An example of such data, in its ASCII representation, can be seen at the RTC's website: http://sfbeams.sfsu.edu:8080/opendap/sfbeams/data_ctd/rtc_ctd2-floating/archive/2008-RTCCTDM2_qc_DIST/2008-RTCCTDM2_qc_DIST.dat.ascii?

Dataset: 2008-RTCCTDM2_qc_DIST.dat
CTD_DIST_CSV.Month, CTD_DIST_CSV.Day, CTD_DIST_CSV.Year, CTD_DIST_CSV.Hour, CTD_DIST_CSV.Min, CTD_DIST_CSV.Sec, CTD_DIST_CSV.Water_Temp, CTD_DIST_CSV.Cond, CTD_DIST_CSV.Pres, CTD_DIST_CSV.Skufa1, CTD_DIST_CSV.Skufa2, CTD_DIST_CSV.Xmis, CTD_DIST_CSV.PAR, CTD_DIST_CSV.Sal, CTD_DIST_CSV.Sigma, CTD_DIST_CSV.InstSN
1, 1, 2008, 0, 0, 31, 9.4281, 2.79835, 0.727, 1.6628, 0.4951, 7.0798, 0.8331, 25.2725, 19.4095, 4195
1, 1, 2008, 0, 6, 31, 9.4053, 2.79205, 0.726, 1.5797, 0.4723, 7.3472, 0.5699, 25.226, 19.3765, 4195
1, 1, 2008, 0, 12, 31, 9.3983, 2.79188, 0.725, 1.5886, 0.4672, 7.3773, 0.4411, 25.2291, 19.38, 4195
1, 1, 2008, 0, 18, 31, 9.3865, 2.79317, 0.726, 1.5865, 0.4639, 7.4817, 0.3513, 25.2503, 19.3981, 4195
1, 1, 2008, 0, 24, 31, 9.3812, 2.79453, 0.726, 1.5806, 0.4591, 7.5355, 0.3327, 25.2676, 19.4124, 4195

Experiment Analysis