My favorites | Sign in
Project Logo
             
Search
for
Updated Jun 04, 2009 by griswolf
Labels: database
SocorroCodeAndDatabaseUpdate  
Possible changes to both code and database

Socorro Wish List

One of my (griswolf) directives is approximately "make everything work efficiently and the same." Toward this end, there are several tasks:

Probably most important, we have an inefficient database design, and some inefficient code working with it.

Next, we have a collection of 'one-off' code (and database schemas) that could be more easily maintained using a common infrastructure, common coding conventions, common schema layout, common patterns.

Finally, we have enhancement requests that would become more feasible after such changes: Such requests would be more easily handled in a cleaner programming environment; and in a cleaner environment there might be fewer significant bugs, leaving more time to work on enhancements.

Current state: See SocorroDatabaseSchema

Another Way to do Materialized Views?

The current system is somewhere between ad hoc reporting and a star architecture. The main part of this proposal focuses on converting further toward a star architecture. However there may be another way: MapReduce techniques, which could possibly be run external to Mozilla (for instance: Amazon Web Services) could be used to mine dump files and create statistical data stored in files or database. Lars mentioned to me that we now have some statistics folk on board who are interested in this.

Database Design

Commonality

Viewable Interface

Consequences of Possible Changes

Rough plan as of 2009 June

Specific Database Changes

Star Data Warhousing

Existing tables

Needed/Changed tables

Matview changes "Soon"
Incoming (raw) changes "Later"

Details

New or significantly changed tables

New product_visibility table (soon, matview)

table product_visibility (
  id serial NOT NULL PRIMARY KEY,
  productdims_id integer not null,
  start_date timestamp, -- used by MTBF
  end_date timestamp, 
  ignore boolean default False -- force aggregation off for this product id

New osdims table (soon, matview) NOTE: Data available only if 'recently frequent':

table osdims(
  id serial NOT NULL PRIMARY KEY,
  os_name TEXT NOT NULL,
  os_version TEXT);
  constraint osdims_key (os_name, os_version) unique (os_name, os_version);

Edited productdims table (soon, matview) NOTE: use case for adding products is under discussion

  CREATE TYPE release_enum AS ENUM ('major', 'milestone', 'development');
table productdims (
  id serial NOT NULL PRIMARY KEY,
  product TEXT NOT NULL, 
  version TEXT NOT NULL, 
  release release_enum NOT NULL,
  constraint productdims_key (product, version) unique ( product, version )
  );

New product_details table (later, raw data) NOTE: All data will be stored (raw data should not lose details)

table product_details (
  id serial NOT NULL PRIMARY KEY,
  product TEXT NOT NULL, -- /was/ character varying(30)
  version TEXT NOT NULL, -- /was/ character varying(16)
  release release_enum NOT NULL -- /was/ character varying(50) NOT NULL
  );

Edit mtbffacts to use edited productdims and new osdims (soon, matview)

table mtbffacts (
  id serial NOT NULL PRIMARY KEY,
  avg_seconds integer NOT NULL,
  report_count integer NOT NULL,
  window_end timestamp, -- was DATE
  productdims_id integer,
  osdims_id integer
  constraint mtbffacts_key unique ( productdims_id, osdims_id, day );
  );

New top_crashes_by_signature table (soon, matview):

table top_crashes_by_signature (
  id serial NOT NULL PRIMARY KEY,
  count integer NOT NULL DEFAULT 0,
  average_uptime real DEFAULT 0.0,
  window_end timestamp without time zone,
  window_size interval,
  productdims_id integer NOT NULL,  -- foreign key. NOTE: Filtered by recent frequency
  osdims_id integer NOT NULL,       -- foreign key. NOTE: Filtered by recent frequency
  signature TEXT
  constraint top_crash_by_signature_key (window_end, signature, productdims_id, osdims_id) unique (window_end, signature, productdims_id, osdims_id)
  );
  -- some INDEXes are surely needed --

New/Renamed top_crashes_by_url table (soon, matview):

table top_crashes_by_url (
  id serial NOT NULL,
  count integer NOT NULL,
  window_end timestamp without time zone NOT NULL,
  window_size interval not null,
  productdims_id integer,
  osdims_id integer NOT NULL,
  urldims_id integer
  constraint top_crashes_by_url_key (uridims_id,osdims_id,productdims_id, window_end) unique (uridims_id,osdims_id,productdims_id, window_end)
  );

New top_crashes_by_url_signature (soon, matview):

table top_crash_by_url_signature (
  top_crashes_by_url_id integer, -- foreign key
  count integer NOT NULL,
  signature TEXT NOT NULL
  constraint top_crashes_by_url_signature_key (top_crashes_by_url_id,signature) unique (top_crashes_by_url_id,signature)
  );

New crash_reports table (later, raw view) Replaces reports table

table crash_reports (
  id serial NOT NULL PRIMARY KEY,
  uuid TEXT NOT NULL -- /was/ character varying(50)
  client_crash_date timestamp with time zone,
  install_age integer,
  last_crash integer,
  uptime integer,
  cpu_name TEXT, -- /was/ character varying(100),
  cpu_info TEXT, -- /was/ character varying(100),
  reason TEXT, -- /was/ character varying(255),
  address TEXT, -- /was/ character varying(20),
  build_date timestamp without time zone,
  started_datetime timestamp without time zone,
  completed_datetime timestamp without time zone,
  date_processed timestamp without time zone,
  success boolean,
  truncated boolean,
  processor_notes TEXT,
  user_comments TEXT, -- /was/ character varying(1024),
  app_notes TEXT, -- /was/ character varying(1024),
  distributor TEXT, -- /was/ character varying(20),
  distributor_version TEXT, -- /was/ character varying(20)
  signature TEXT,
  productdims_id INTEGER, -- /new/ foreign key NOTE Filtered by recent frequency
  osdims_id INTEGER, -- /new/ foreign key NOTE Filtered by recent frequency
  urldims_id INTEGER -- /new/ foreign key NOTE Filtered by recent frequency
  -- /remove - see productdims_id/ - product character varying(30), 
  -- /remove - see productdims_id/ version character varying(16),
  -- /remove - redundant with build_date/ -- build character varying(30),
  -- /remove - see urldims_id/ url character varying(255),
  -- /remove - see osdims_id/ os_name character varying(100),
  -- /remove - see osdims_id/ os_version character varying(100),
  -- /remove - deprecated/ email character varying(100),
  -- /remove - deprecated/ user_id character varying(50),
  );
  -- This is a partitioned table: INDEXes are provided on date-based partitions

Tables with Minor Changes: varchar->text

table branches (
  product TEXT NOT NULL, -- /was/ character varying(30)
  version TEXT NOT NULL, -- /was/ character varying(16)
  branch TEXT NOT NULL, -- /was/ character varying(24)
  PRIMARY KEY (product, version)
table extensions (
  report_id integer NOT NULL, -- foreign key
  date_processed timestamp without time zone,
  extension_key integer NOT NULL,
  extension_id TEXT NOT NULL, -- /was/ character varying(100)
  extension_version TEXT -- /was/ character varying(16)
table frames (
  report_id integer NOT NULL,
  date_processed timestamp without time zone,
  frame_num INTEGER NOT NULL,
  signature TEXT -- /was/ varchar(255)
  );
table priority_jobs
  uuid TEXT NOT NULL PRIMARY KEY -- /was/ varchar(255)
table processors (
  id serial NOT NULL PRIMARY KEY,
  name TEXT NOT NULL UNIQUE, -- /was/ varchar(255)
  startdatetime timestamp without time zone NOT NULL,
  lastseendatetime timestamp without time zone
  );
table jobs (
  id serial NOT NULL PRIMARY KEY,
  pathname TEXT NOT NULL, -- /was/ character varying(1024)
  uuid TEXT NOT NULL UNIQUE, -- /was/ varchar(50)
  owner integer,
  priority integer DEFAULT 0,
  queueddatetime timestamp without time zone,
  starteddatetime timestamp without time zone,
  completeddatetime timestamp without time zone,
  success boolean,
  message TEXT,
  FOREIGN KEY (owner) REFERENCES processors (id)
  );
table urldims (
  id serial NOT NULL PRIMARY KEY,
  domain TEXT NOT NULL, -- /was/ character varying(255)
  url TEXT NOT NULL -- /was/ character varying(255)
  key url    -- for drilling by url
  key domain -- for drilling by domain
  );
table topcrashurlfactsreports (
  id serial NOT NULL PRIMARY KEY,
  uuid TEXT NOT NULL, -- /was/ character varying(50)
  comments TEXT, -- /was/ character varying(500)
  topcrashurlfacts_id integer
  );          

Sign in to add a comment
Hosted by Google Code