
OpenPSIPearl

combining Governmental administrative datasets

Project Summary

OpenPSIPearl is a collaboration between the University of Southampton (UoS), the UK Government, the Office of Public Sector Information (OPSI), part of The National Archives, and leading academic research groups with an interest in using UK government administrative datasets in combination with specifically provisioned survey data.

The OpenPSIPearl project will produce three deliverables:

  1. a national scale exemplar of a 'linked data' research challenge to excite the UK research community. It will be grounded in a relevant research policy question of understanding the effect of positive activities and opportunities within the locality of UK schools (content aggregation);
  2. an understanding of how provision of UK government 'linked data' will affect the methods and best practice in research use of administrative data in combination with additional survey data (critique of methods); and
  3. sustained collaboration via an ongoing articulation of the needs of the UK research community for, and provision by the Government of, non-personal Public Sector Information, using the OpenPSI portal previously funded by JISC Rapid Innovation (http://www.openpsi.org/).

Video introduction

A 3 minute introduction to the OpenPSIPearl project, from the VRERI Kick-off meeting, is available at: http://vimeo.com/9854591. You can find the presentation in the Downloads section of this site.

Twitter description

Help researchers make best use of UK Government actions to make more data available as linked data on the web.

What end user problems will we look to solve

It is anticipated that using linked administrative data will enhance methods for the analysis of survey data and help develop methods that overcome problems commonly associated with survey data.

In terms of data linkage (also known by this community as record linkage), we anticipate that web scale linking will overcome some of the challenging problems faced in the past with small scale, file-to-file record linkage.

How will it change the way things are done by the Community

The Community has recognised that the definition of data has broadened:

  1. data (‘old definition’) - rectangular dataset of numbers
  2. data (‘new definition’) - information, survey data, administrative data, additional resources (video, texts, voice)

The community has a tradition of master data management, linking new transactions/events to master records. The linked data web should provide a much broader and richer context for a master data file, with shared contribution of additional linked data.

Project Details

  • Host Institution: University of Southampton, School of Electronics and Computer Science
  • VRERI/Strands: Theme-Resourcemgt / Theme-Experimentation
  • Duration: 6 months
  • Start Date: 01/02/2010
  • End Date: 31/07/2010
  • Amount Awarded to Project: £49,633

Project Team

  • Product Owner: John Darlington, jd@ecs.soton.ac.uk, 02380-599045
  • Developer: Mario Hernandez (mhc@ecs.soton.ac.uk), Manuel Salvadores (ms8@ecs.soton.ac.uk), Landong Zuo (lz@ecs.soton.ac.uk), Benedicto Rodriguez (br205r@ecs.soton.ac.uk)
  • Admin:
  • Partners: Office of Public Sector Information (OPSI), part of The National Archives; The Institute of Education, University of London; The Social Policy Research Unit, University of York

  • Consultants:

Documentation

Comment by project member j...@ecs.soton.ac.uk, Jan 27, 2010

userCase, endUser, rapidInnovation, progressPosts, VRERI, JISC, OpenPSIPearl

I met with Mac McDonald on Friday 22nd January to discuss the project. Mac is from one of the academic partners, The Institute of Education, University of London.

We discussed the ongoing and related programmes within the academic research community that use Governmental administrative datasets.

We discussed needs such as: capturing historical changes within administrative datasets, not just the current state; providing an early view of what can be harvested to augment administrative datasets; considering the risks of statistical disclosure when combining datasets; and the capability to interact with the data via known tools such as SPSS, STATA and R.

Comment by project member j...@ecs.soton.ac.uk, Feb 17, 2010

userCase, endUser, rapidInnovation, progressPosts, VRERI, JISC, OpenPSIPearl

One of the three high level objectives for the new Open PSI project is the development of a national scale exemplar of a linked data research challenge for the UK research community. It will be grounded in a relevant research policy question of understanding the effect of positive activities and opportunities within the locality of UK schools.

In discussion with The Institute of Education, University of London and the Social Policy Research Unit, University of York about this challenge, we determined that key administrative datasets include the Index of Multiple Deprivation data at ward level. However, they also use separate statistics on employment rates, occupational profiles, qualification rates, those in receipt of benefit, etc.

The background to the Index of Multiple Deprivation data can be found at:

http://www.neighbourhood.statistics.gov.uk/dissemination/Info.do?page=aboutneighbourhood/indicesofdeprivation/indices-of-deprivation.htm

Comment by project member j...@ecs.soton.ac.uk, Feb 17, 2010

techStandards, technicalDevelopment, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

The team has been working on representing statistical data using cube modelling, with the SCOVO ontology as the basis for the representation. As planned, we recently attended a workshop hosted by the Office for National Statistics, where it was resolved to use the SDMX data model to bring statistical data to the linked data web. The approach developed in SCOVO is compatible with the SDMX model. The SDMX model has a representation in XML, and it was felt that the community would benefit from an authorised linked data representation of the SDMX model.

Jeni Tennison's blog has some useful material on the use of SCOVO to bring aggregated statistics to the web of data.
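As a rough illustration of the cube style of modelling mentioned above, the sketch below writes one statistical observation as a SCOVO item in N-Triples. It is not the project's actual representation: the `http://example.org/imd/` namespace, the identifiers and the value are invented for the example, and only the SCOVO terms `Item`, `dataset` and `dimension` plus `rdf:value` are taken from the published vocabulary.

```python
# SCOVO and RDF are published vocabularies; EX is a made-up namespace
# used only for this illustration.
SCOVO = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/imd/"


def scovo_item(item_id, dataset_id, dimension_ids, value):
    """Return N-Triples lines describing one observation (a scovo:Item)."""
    item = f"<{EX}{item_id}>"
    lines = [
        f"{item} <{RDF}type> <{SCOVO}Item> .",
        f"{item} <{SCOVO}dataset> <{EX}{dataset_id}> .",
    ]
    # Each dimension (for example area and time period) locates the value
    # within the cube.
    for dim in dimension_ids:
        lines.append(f"{item} <{SCOVO}dimension> <{EX}{dim}> .")
    lines.append(f'{item} <{RDF}value> "{value}"^^<{XSD}decimal> .')
    return lines


# Hypothetical observation: a deprivation score for one ward in one year.
triples = scovo_item("item1", "imd-dataset", ["ward/exampleWard", "year/2007"], 21.4)
print("\n".join(triples))
```

Each observation carries its value together with links to the dimensions that locate it in the cube, which is the same basic shape the SDMX model uses.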

http://www.jenitennison.com/blog/

The output of the Office for National Statistics workshop is being made available via a Google group. Here is the link to the workshop outcome:

http://groups.google.com/group/publishing-statistical-data/web/workshop-summary?hl=en

Comment by project member j...@ecs.soton.ac.uk, Feb 23, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

The team invited John Goodwin over from Ordnance Survey to discuss the recent work on both sides on geocoding of linked data. John was impressed with our recent work, and we also agreed to share some work he had done on visualisation using the OS OpenSpace API.

Comment by project member j...@ecs.soton.ac.uk, Feb 26, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

We have completed another of our planned work items: baseline data for schools loaded into a SPARQL end point with a simple visualisation. Here is the link to the visualisation on the project web site:

http://www.openpsi.org/schools/

There is more power behind this visualisation but we want to open this up as we bring the users on board.

The team also attended http://dev8d.org/ and had many good interactions with people.

Comment by project member j...@ecs.soton.ac.uk, Mar 18, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

Our early work plan items were to get some Baseline data for schools and basic area statistics in linked data form with a simple visualisation over the top so we can review with our academic research partners how this linked data can be used as the basis for research activities.

The early input from our partners was that they largely use IMD data at ward level, augmented with some separate statistics. Landong has been working on the representation of the Index of Multiple Deprivation data, reflecting the discussions by the community on the use of SCOVO and the SDMX model. The data has been translated into this form and made available in our SPARQL end point.

Mario has extended the basic visual interface with a visualisation using the OS OpenSpace API. It is still quite basic, but using a map interface researchers can navigate to ward level boundaries and see schools and IMD data (England for now, but we plan Scotland, Wales and NI).
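To give a feel for how a researcher's tool might ask a SPARQL end point for data of this kind, here is a minimal sketch that builds a query URL following the standard SPARQL protocol (`query` parameter on a GET request). The end point address is a placeholder, not the project's real service, and the query assumes the data is modelled as SCOVO items, which may differ from the deployed representation.

```python
from urllib.parse import urlencode

# Placeholder address: the real project end point is not given here.
ENDPOINT = "http://example.org/sparql"

# Assumes observations are modelled as scovo:Item resources.
QUERY = """
PREFIX scovo: <http://purl.org/NET/scovo#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?item ?dimension ?value WHERE {
  ?item a scovo:Item ;
        scovo:dimension ?dimension ;
        rdf:value ?value .
} LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could be fetched with any HTTP client; no request is made in the sketch itself.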

http://www.openpsi.org/schools/

When you view a school you will be taken to the new data.gov.uk site, where the linked data for schools is available, for example:

http://education.data.gov.uk/doc/school/112885

Here is another prototype service developed here at UoS that gives access to some more UK government statistics:

http://map.psi.enakting.org/

We plan to discuss with our partners how researchers might best discover, visualise and access the data from the web of linked data that they then might use in their research.

The principles behind the web of linked data are that things should be referenceable by web browser (HTTP) and that both humans and machine processes should be able to get back useful information about the thing.

For example Ordnance Survey have defined geography for us. So 'North Swindon' is:

http://data.ordnancesurvey.co.uk/doc/7000000000038140

We have another useful service here called backlinks. Using it you can find all things that reference a thing, for example North Swindon:

http://backlinks.psi.enakting.org/resource/doc/http://data.ordnancesurvey.co.uk/id/7000000000038140

You can click on the Population Statistic link to see the population data.

We realise this is very basic stuff, but it starts to show what might be possible if all the data was linked up in the web (the open world assumption) and not hidden in silos of separate web sites and databases (the closed world).
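The idea that both people and programs can look up the same thing over HTTP is usually realised through content negotiation: the client states which format it wants in the `Accept` header. The sketch below only prepares such requests for the North Swindon identifier quoted above; whether the Ordnance Survey server honours each media type is an assumption, and nothing is sent over the network.

```python
from urllib.request import Request

# Identifier for North Swindon as quoted in the backlinks example above.
NORTH_SWINDON = "http://data.ordnancesurvey.co.uk/id/7000000000038140"


def linked_data_request(uri, media_type="application/rdf+xml"):
    """Prepare a GET request asking for a particular representation."""
    return Request(uri, headers={"Accept": media_type})


# A browser would ask for HTML; a machine process might ask for RDF.
html_request = linked_data_request(NORTH_SWINDON, "text/html")
rdf_request = linked_data_request(NORTH_SWINDON)

print(html_request.get_header("Accept"))
print(rdf_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the lookup.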

Comment by project member j...@ecs.soton.ac.uk, Mar 23, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

Landong investigated an issue of representing ranges in statistical data and suggested a solution to the Publishing Statistical Data community. He had some positive feedback:

Hmm, that's quite clever...neat idea, thanks.

Comment by project member j...@ecs.soton.ac.uk, Mar 23, 2010

userCase, endUser, rapidInnovation, progressPosts, VRERI, JISC, OpenPSIPearl

Feedback from one of our user partners on the Baseline data for schools and basic area statistics in linked data form with a simple visualisation is below:

It looks like the open data world will be a great place if we get there. At the moment even limited linked up data appears quite restrictive in terms of the access we can get. But fingers crossed.

What you are developing would be extremely useful in providing comprehensive descriptive statistics for areas/ neighbourhoods/ schools etc. To have all these data sources linked up would be very useful.

Comment by project member j...@ecs.soton.ac.uk, Jun 9, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

One of our work actions for May was to obtain some additional data for schools and improve visual analysis of the data. The data we have added is Administrative data for School level attainment.

One of our goals was to develop a national scale exemplar of a 'linked data' research challenge grounded in a relevant research policy question of understanding the effect of positive activities and opportunities within the locality of UK schools.

In order to do this we need some way of measuring the effects of positive activities on 'pupil attainment'. We need School level attainment figures. We contacted the appropriate government department and they have provided us with the School based performance data.

Comment by project member j...@ecs.soton.ac.uk, Jun 9, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

We planned to hold a workshop to review the utility of multidimensional statistical datasets and of expressing them as linked data (RDF), engaging national statistics producers, researchers and research centres. This has been slightly overtaken by the creation of a virtual community around this work that came out of the February meeting of statistics producers, hosted by ONS. We are engaged in this forum. There is now a workshop planned for July covering this community and the work that is being done to develop appropriate representations. We have created an example data set from the IMD data, using the work of this virtual community, to test their new SDMX-RDF representation.

Comment by project member j...@ecs.soton.ac.uk, Jun 9, 2010

WIN, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

We planned in June to provide a portal service to add new survey data and export SPARQL query results for use in research tools such as SPSS. We are currently in the testing phase for this work. We have the service to add new survey data under user authentication and authorisation. This week we hope to finalise the creation of criteria that allow the user to select properties of interest for export into research tools such as SPSS.

Comment by project member j...@ecs.soton.ac.uk, Jun 9, 2010

FAIL, progressPosts, rapidInnovation, VRERI, JISC, OpenPSIPearl

We planned to provide an improved visual analysis of the data. We have been developing a nice visualisation of the IMD data using the OS OpenSpace API (like Google Maps), but the smart visualisation we developed requests too many polygons for the free limit of the OS OpenSpace API. We are talking to Ordnance Survey about this and they are being helpful, so hopefully we can release this work soon.

Comment by project member j...@ecs.soton.ac.uk, Nov 11, 2010

progressPosts, rapidInnovation, JISCRI, JISC, finalProgressPost, output, prototype, product, demonstrator, OpenPSIPearl

  • Title of Primary Project Output: OpenPSIPearl: an innovative information service for ‘linked data’ research using understanding the effects on UK schools as an example research task.
  • Description of Prototype: The prototype service aims to allow a researcher to discover and use ‘linked data’ in the form of administrative data that can be used in combination with their own survey data. The example research task used to validate the service was understanding the effects and opportunities within the locality of UK schools.
  • End User of Prototype including Screenshots:

The aim of the new service is to allow a researcher to locate ‘linked data’ on the web that can be used as part of their study. The service has been populated with data related to schools in the UK but could be repurposed for other research topics. The user logs into the service primarily to keep any uploaded survey data private to that user. We have created an anonymous user to allow anyone to experiment with the service. Once logged in, the researcher sees the home page.

The navigation bar has the core functions of the service which are:

  1. Find ‘linked data’ on the web that can be used as part of the study.
  2. Upload any personal survey data.
  3. Review and create selections of appropriate candidates to study (criteria).
  4. Export the data to existing analysis tools.

The first action is to find ‘linked data’ on the web that can be used as part of the study. The service already has some ‘linked data’ identified, such as the Index of Multiple Deprivation data at ward level, which our partners considered to be core underlying data for the research task, as well as individual school performance. This lets us look for links between the effects and opportunities within the locality of UK schools and the performance of those schools.

The home page allows the researcher to visually investigate data. The service paints the relative performance of geographic areas on the map. The researcher can select from a variety of ‘linked data’ and see the geographic variance. As well as viewing the data geographically, the researcher can use charts to look at relative variance.

School performance can be viewed both in aggregate and down to the individual school. Below we can see markers for specific schools in a location and their relative performance. Here we are looking at a specific age group, Key Stage 1 performance, and some schools, marked white, do not have these age groups (early years).

Here is another view where we can look at performance over time in the chart as well as current area variance in the graph.

The service allows the researcher to create selections of appropriate candidates to study (criteria). A simple step process allows this.

The first step is to give the new criteria a simple name and description.

The second step is to choose the type of criteria. We can create criteria based on the data in the service or by combining existing defined criteria.

The third step is to choose the data (or criteria) we want and then how we want to restrict the data to make the selection. Here we want schools with "army" in their name.

Once created, criteria are available in the service and can be used in various ways.

For example, here we show Key Stage 1 school performance for a specific criteria selecting Catholic schools.
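The criteria steps above (name and describe, choose a type, then restrict the data or combine existing criteria) can be pictured with a small sketch. This is a hypothetical model, not the service's implementation: the `Criteria` class, the helper functions and the school records are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

School = Dict[str, str]


@dataclass(frozen=True)
class Criteria:
    """A named, described selection rule over school records."""
    name: str
    description: str
    matches: Callable[[School], bool]

    def select(self, schools: List[School]) -> List[School]:
        return [s for s in schools if self.matches(s)]


def name_contains(name: str, description: str, text: str) -> Criteria:
    """Criteria based on the data: school name contains the given text."""
    needle = text.lower()
    return Criteria(name, description, lambda s: needle in s["name"].lower())


def all_of(name: str, description: str, *parts: Criteria) -> Criteria:
    """Criteria built by combining existing criteria (all must match)."""
    return Criteria(name, description, lambda s: all(p.matches(s) for p in parts))


# Invented records standing in for school data held by the service.
schools = [
    {"name": "Example Army School", "phase": "Primary"},
    {"name": "Example Garrison Army Academy", "phase": "Secondary"},
    {"name": "Example Village School", "phase": "Primary"},
]

army = name_contains("army", "Schools with 'army' in their name", "army")
primary = Criteria("primary", "Primary phase schools", lambda s: s["phase"] == "Primary")
army_primary = all_of("army-primary", "Primary schools with 'army' in their name", army, primary)

print([s["name"] for s in army_primary.select(schools)])
```

Treating each criteria as a reusable predicate is one simple way to support both selections over data and combinations of previously defined criteria.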

Another function of the service is to allow the researcher to upload personal survey data for combination with the 'linked data' in support of their study. The process starts with a simple upload.

The researcher is asked to provide some core columns that can be used to map the data to instances in the UK backbone of data (in this case, schools).

The researcher is provided with some feedback on the mapping.

And can see the data properties they have added from their survey data and other uploaded items.
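The upload and mapping steps above amount to a small record linkage task: a key column in the survey file is matched against known school identifiers, and the researcher is told what matched. The sketch below assumes the key column holds a school reference number and that school URIs follow an `/id/school/` pattern on education.data.gov.uk; both are assumptions made for illustration, and the survey rows are invented.

```python
import csv
import io

# Assumed URI pattern for schools; the service's actual mapping may differ.
SCHOOL_URI = "http://education.data.gov.uk/id/school/{urn}"


def map_survey_rows(csv_text, key_column, known_urns):
    """Split survey rows into those that map to a known school and those that do not."""
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        urn = row[key_column].strip()
        if urn in known_urns:
            # Attach the linked data identifier so the row can join other data.
            matched.append({**row, "school_uri": SCHOOL_URI.format(urn=urn)})
        else:
            unmatched.append(row)
    return matched, unmatched


# Invented survey upload: one row matches a known school, one does not.
survey = "urn,after_school_clubs\n112885,4\n999999,2\n"
matched, unmatched = map_survey_rows(survey, "urn", {"112885"})

# Simple feedback on the mapping, as a researcher might see it.
print(f"{len(matched)} mapped, {len(unmatched)} not mapped")
```

Rows that fail to map are kept rather than dropped so that the feedback can point the researcher at the records needing attention.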

Another way to bring new data into the service is to add new sources of ‘linked data’ on the web that can be used as part of the study.

Currently the user specifies a SPARQL end point as a URL. The service can interrogate the SPARQL end point to understand the available data properties within it. It can identify a number of types of data modelling that are commonly used in the ‘linked data’ web to represent useful data, such as statistical data including time series, and can then create a query plan for the SPARQL end point to be able to access the data held. This is then used by the service to make the data available to the researcher as part of the data properties available for actions such as creating criteria, exporting data for analysis, etc.
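One plausible way to interrogate an end point for its available properties is to ask which predicates are in use and then look for the signature of a known modelling style. The sketch below is only a guess at that approach, not the service's actual query plan logic: it assumes the end point supports SPARQL 1.1 aggregates and returns results in the standard SPARQL JSON format, and the sample response is invented.

```python
import json

SCOVO_DIMENSION = "http://purl.org/NET/scovo#dimension"

# Query a service could send to list the properties in use and how often.
DISCOVERY_QUERY = """
SELECT ?property (COUNT(*) AS ?uses) WHERE {
  ?subject ?property ?object .
} GROUP BY ?property ORDER BY DESC(?uses)
"""


def properties_in_use(results_json):
    """Read a SPARQL JSON result into a {property URI: use count} mapping."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return {b["property"]["value"]: int(b["uses"]["value"]) for b in bindings}


def uses_scovo(properties):
    """Crude check for SCOVO-style statistical modelling."""
    return SCOVO_DIMENSION in properties


# Invented response in the SPARQL JSON results format.
sample_response = json.dumps({
    "head": {"vars": ["property", "uses"]},
    "results": {"bindings": [
        {"property": {"type": "uri", "value": SCOVO_DIMENSION},
         "uses": {"type": "literal", "value": "120"}},
        {"property": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
         "uses": {"type": "literal", "value": "40"}},
    ]},
})

found = properties_in_use(sample_response)
print(uses_scovo(found))
```

A fuller version would recognise several modelling styles and choose a query template for each.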

The last stage is to export the data to existing analysis tools.

It was felt that there was no point replicating the features of existing analysis tools such as SPSS, STATA and R. It was appropriate to allow for the export of data from the new service to be used in existing tools. The service allows for the selection of combinations of ‘linked data’ and additional survey data to export.

The user selects a subset of schools to analyse by selecting a previously defined criteria:

and then the attributes of the school and surrounding area data properties that are to be used in the analysis. They then generate the data to be downloaded for use in the existing tools.
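As an illustration of the export step, the sketch below flattens a result in the standard SPARQL JSON results format into CSV, which tools such as SPSS, STATA and R can all import. The column names and values are invented, and the service's real export format is not specified here.

```python
import csv
import io
import json


def sparql_json_to_csv(results_json):
    """Convert SPARQL JSON results into CSV text with one column per variable."""
    doc = json.loads(results_json)
    columns = doc["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for binding in doc["results"]["bindings"]:
        # Unbound variables are written as empty cells.
        writer.writerow([binding.get(c, {}).get("value", "") for c in columns])
    return out.getvalue()


# Invented result: one school with an area score but no attainment figure.
sample_results = json.dumps({
    "head": {"vars": ["school", "imd_score", "ks1"]},
    "results": {"bindings": [
        {"school": {"type": "uri", "value": "http://example.org/school/1"},
         "imd_score": {"type": "literal", "value": "21.4"}},
    ]},
})

csv_text = sparql_json_to_csv(sample_results)
print(csv_text)
```

Writing unbound variables as empty cells keeps every row the same width, which matters for rectangular datasets expected by statistical packages.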

The job of the service is now complete.

  • Date prototype was launched: 28/07/2010
  • Project Team Names, Emails and Organisations:

John Darlington (jd@ecs.soton.ac.uk), Mario Hernandez (mhc@ecs.soton.ac.uk), Manuel Salvadores (ms8@ecs.soton.ac.uk), Landong Zuo (lz@ecs.soton.ac.uk)

  • Table of Content for Project Posts ( see http://code.google.com/p/vreri/wiki/OpenPSIPearl )
    • userCase historical change in datasets as well as current state, capability to interact with the data via known tools such as SPSS, STATA and R.
    • userCase key administrative datasets include the Index of Multiple Deprivation data at ward level.
    • techStandards work on representing statistical data using the SDMX data model and SCOVO
    • WIN Ordnance Survey impressed with our recent work on Geographic visualisation of linked data.
    • WIN Baseline data for schools loaded in SPARQL end point with simple visualisation.
    • WIN Representation of the Index of Multiple Deprivation data using SCOVO and the SDMX model.
    • WIN Thanks for representing ranges in Statistical data for the Publishing Statistical Data community.
    • userCase extremely useful to link up comprehensive descriptive statistics for areas/ neighbourhoods/ schools etc.
    • WIN Obtain School level attainment figures from the appropriate government department.
    • WIN Supported Publishing Statistical Data community by providing our IMD data as exemplar implementation.
    • WIN Portal service to add new survey data, export SPARQL query results for use in research tools such as SPSS.
    • FAIL Unable to release the colour visualisation we developed with the OS OpenSpace API: too many polygons for the free licence.
