My favorites | Sign in
Project Home Downloads Wiki Issues Source
Search
for
XeniaSqliteNotes  
Optimization notes for Sqlite, SQL statements
Updated Oct 12, 2011 by jeremy.c...@gmail.com

Data Dictionary Listing(Simple Scalar Types)

-- scalar data dictionary listing for simple scalar obs (standard listing)
-- Xenia version 3
-- select t0.row_id,t2.standard_name,t3.standard_name from m_type t0,m_scalar_type t1,obs_type t2,uom_type t3 where t0.m_scalar_type_id = t1.row_id and t0.num_types = 1 and t1.obs_type_id = t2.row_id and t1.uom_type_id = t3.row_id;

-- Xenia version 2
-- select t1.row_id,t2.standard_name,t3.standard_name from m_type t1,obs_type t2,uom_type t3 where t1.obs_type_id = t2.row_id and t1.uom_type_id = t3.row_id;

Adding simple scalar types to the data dictionary tables and processing scripts

The below example shows how to add a simple scalar data type (a vector of only one component) to the data dictionary. The particular observation type is 'depth' and the unit of measurement is meters 'm'

For the tables involved, the row_id on each of the tables is defined as the primary key so they will autoincrement on inserts to the next available row_id.

For further examples see the initial data dictionary population script at http://code.google.com/p/xenia/source/browse/trunk/sqlite/sql/obs.sql


Add observation type row to obs_type table. Note that the naming convention is all lower case with underscore character as a space/word separator. INSERT INTO obs_type(standard_name,definition) VALUES ('depth', 'approximate water depth');

Check the row_id of the newly inserted row for reference in later step

obs_type.row_id = 42


Add unit of measurement type row to uom_type table. Note that the naming convention is all lower case with underscore character as a space/word separator.

For this example the uom_type row (meter) already exists on the database so we do not perform the below step, but the below line shows how we would add this row.

INSERT INTO uom_type(standard_name,definition,display) VALUES ('m', 'meter', 'm');

Check the row_id of the newly inserted row for reference in later step

uom_type.row_id = 6


Note if using Xenia version 2 database the below INSERT statement will be against the m_type table and not the m_scalar_type

Add row to m_scalar_type table to associate the obs_type with its uom - note the referenced row_id's used from the previous 2 steps

INSERT INTO m_scalar_type(obs_type_id,uom_type_id) VALUES (42,6);

Check the row_id of the newly inserted row for reference in later step

m_scalar_type.row_id = 48


Note step for Xenia version 3(or above) which includes both tables m_type and m_scalar_type

Add row to m_type table to associate the m_type.row_id=m_type_id to the m_scalar_type.row_id The inserted '1' value lets us know that there is only one component to this vector

INSERT INTO m_type (num_types,m_scalar_type_id) VALUES (1,48);

m_type.row_id = 44


Add or update rows on m_type_display_order which correspond to the display order of observation types for certain products(air,surface,water,etc) - if the order is unimportant, just insert the m_type_id the last row of the table.


associated scripts for new observation types

http://code.google.com/p/xenia/source/browse/trunk/sqlite/import_export/obskml_to_xenia_sqlite.pl script, function get_m_type_id allows for creating synonym terms to also recognize the available data

graph.xml used by http://code.google.com/p/xenia/source/browse/#svn/trunk/sqlite/time_series and http://code.google.com/p/xenia/source/browse/#svn/trunk/obskml/products/html_tables

style.xml used by http://code.google.com/p/xenia/source/browse/#svn/trunk/obskml/products/gearth

datetime formatting example for time comparison statements

see http://www.sqlite.org/cvstrac/wiki?p=DateAndTimeFunctions

multi_obs.m_date <= strftime('%Y-%m-%dT%H:00:00','2008-09-24T14:00:00','+1 hour')

Note that function datetime by default returns a space separated result which will only compare the date portion against the ISO8601 'T' separated multi_obs.m_date field. The above comparison avoids that date only comparison error.

Note to use strftime, datetime, etc on single values or small subsets as they add much time cost to queries.

use UNION instead of JOIN/OR

from experimentation it seems JOINS and OR statements may not correctly use/invoke indexes and it may be faster to do a UNION of two separate SELECT statements which are making use of table indexes

dumping a table select as insert statements

http://www.sqlite.org/sqlite.html

Below commands will dump the table select on table platform to the text file 'test.sql' as insert statements

sqlite> .output test.sql
sqlite> .mode insert platform
sqlite> select * from platform;
sqlite> .quit

get all sensors per platform

in below example, change WHERE statement for testing as needed

select distinct platform.platform_handle
    ,fixed_longitude
    ,fixed_latitude
    ,'2009-01-02T15:00:00Z'
    ,''
    ,platform.description
    ,obs_type.standard_name
  from platform
    left join sensor on platform.row_id=sensor.platform_id
    left join m_type on m_type.row_id=sensor.m_type_id
    left join m_scalar_type on m_scalar_type.row_id=m_type.m_scalar_type_id
    left join obs_type on obs_type.row_id=m_scalar_type.obs_type_id
where platform_handle like 'carocoops%'
  order by platform_handle,obs_type.standard_name;

get all the latest observations per platform

http://www.sql-tutorial.com/sql-group-by-sql-tutorial/
http://www.w3schools.com/sql/sql_groupby.asp
http://manuals.sybase.com/onlinebooks/group-as/asg1250e/sqlug/@Generic__BookTextView/9669;pt=9699

in below example, change WHERE statement for testing as needed

use left join to add in additional table columns

select platform_handle
    ,m_type_id
    ,max(m_date)
    ,m_lat
    ,m_lon
    ,m_z
    ,m_value
  from multi_obs
  where platform_handle like 'carocoops%'
  and m_date > strftime('%Y-%m-%dT%H:%M:%S','now','-12 hours')
group by platform_handle,m_type_id;

note: on postgres, group by requires all select columns except the aggregate column specified in the group by

simple row formatting on SELECT result

select ',"'||standard_name||'"' from obs_type;

SELECT dates from one sensor that are the same dates from another sensor

Recently I wanted to do some correlation testing for our platforms against some known good NDBC platforms. Correlation requires the list of values to be the same length, however we might miss an update throughout the day so we'd have a hole in our data. This query is useful to make sure the dates in the outer select match the dates in the sbuquery select, thereby giving us the data points for the same time periods. You have to execute this twice, swapping the sensor ID's.

SELECT m_date,sensor_id,m_value,d_report_hour as control_hour 
FROM multi_obs mo 
WHERE m_date >= '2011-03-23T00:00:00' and 
m_date < '2011-03-23T24:00:00' AND 
sensor_id = 518 AND 
EXISTS 
(SELECT d_report_hour 
FROM multi_obs 
WHERE m_date >= '2011-03-23T00:00:00' AND 
m_date < '2011-03-23T24:00:00' AND 
sensor_id=4644 and 
mo.d_report_hour=d_report_hour) 
ORDER BY sensor_id ASC, d_report_hour ASC;

crosstab query

see also http://www.postgresql.org/docs/9.1/static/tablefunc.html

http://code.google.com/p/xenia/wiki/VMwareMod#crosstab_query

Not sure how well this query performs or utilizes indexes, but need something like this for 'pivot' table type transforms of row values to column values.

Can dynamically do a pre-query to lookup and substitute readable column header names like wind_speed,air_pressure,etc in the below final results query.

Can also dynamically substitute in the where clauses additional m_date,sensor_id or other key search parameters.

sample query for platform = carocoops.CAP2.buoy - wind_speed(4644) and air_pressure(4648) sensor_id's

  • note the 'SELECT' and 'AS' parameters should be equal in number of expected columns.
  • note the SELECT statements have been optimized to trigger the multi_obs index usage with m_date,sensor_id order and specifics - below query returned in < 100 ms. Can run 'EXPLAIN ANALYZE' on overall and subqueries to gauge performance.

SELECT * FROM crosstab
(
  'SELECT m_date,sensor_id,m_value from multi_obs where m_date >= ''2011-10-10'' and sensor_id in (4644,4648) ORDER BY 1',
  'SELECT DISTINCT sensor_id FROM multi_obs where m_date >= ''2011-10-10'' and sensor_id in (4644,4648) ORDER BY 1'
)
AS
(
       m_date timestamp,
       m_value_1 float8,
       m_value_2 float8
);

sample output

     m_date        | m_value_1 | m_value_2 
---------------------+-----------+-----------
 2011-10-09 01:00:00 |       9.9 |    1024.9
 2011-10-09 02:00:00 |      11.6 |    1024.4
 2011-10-09 03:00:00 |        10 |    1024.6
 2011-10-09 04:00:00 |      11.9 |    1023.8
 2011-10-09 05:00:00 |      11.3 |      1023
 2011-10-09 06:00:00 |      10.4 |    1022.7
 2011-10-09 07:00:00 |      11.1 |    1022.9
 2011-10-09 08:00:00 |      12.2 |    1022.2
 2011-10-09 09:00:00 |       9.4 |    1022.2
 2011-10-09 10:00:00 |      11.1 |      1022
 2011-10-09 11:00:00 |      11.6 |    1022.1

segmentation fault: work-around

I ended up getting intermittent(hard to troubleshoot) segmentation faults in postgresql using the crosstab functionality and just ended up using the below perl transformation of the resultset to get the same type of crosstab output format.

#output results
my $sensor_id_list = '1,2,3,4,5";

my $content .= $sensor_header_display_list."\n";

$sql = "SELECT m_date,sensor_id,m_value from multi_obs where m_date >= '$startd' and m_date <= '$endd' and sensor_id in ($sensor_id_list) ORDER BY 1";
#print $sql; #debug
$sth = $dbh->prepare($sql);
$sth->execute();

my $time_key = '';
my %sensor = ();
my @array_sensor_id_list = split(/,/,$sensor_id_list);
while (my @row = $sth->fetchrow_array) {
if ($time_key eq '') { $time_key = $row[0]; } #initial row only

if (@row[0] ne $time_key) {
  $time_key=row[0];
  my $line = "$time_key,";
  foreach my $id (@array_sensor_id_list) {
    $line .= "$sensor{$id},";
  }
  chop($line);
  $content .= "$line\n";

  $time_key = @row[0];
  %sensor = ();
}

$sensor{@row[1]} = @row[2];
}

print $content;

Sign in to add a comment
Powered by Google Project Hosting