DataIngestion  
Scripts that create and import the data into the Xenia database.
Updated Mar 30, 2012 by d...@inlet.geol.sc.edu

Introduction

These are the scripts currently running that retrieve the data from various sources and import that data into the Xenia database.

Machine: Neptune

neptune.baruch.sc.edu

Scripts: Neptune


NetCDF Scout

Script: populate_xenia_netcdfscout.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing netcdf files that aren't handled through the other RA specific processing scripts below. The files are pulled down then processed into the database. Deletes the previous KML, ZIP and KMZ files.

Shell Script Makeup

Script: GetLatestData.pl

Directory: /home/xeniaprod/scripts/scout/trunk

Command Line Parameters:

  --URLControlFile=./URLList.xml 
  --DirForObsKml=/home/xeniaprod/feeds/scout/general 
  --NetCDFDir=/home/xeniaprod/tmp/netcdf/scout 
  --FetchLogDir=/home/xeniaprod/tmp/netcdf/scout/fetch_logs
  --UseLastNTimeStamps=10 --UseLastNTimeStampsForOrg="nos;60,nws;20" > /home/xeniaprod/tmp/log/scout_netcdf.log

  --URLControlFile is the XML file with the URL list the script will download the files from.
  --FileFilter is the filter to apply to the file listings to allow the script to download the files we really want. Optional.
  --DirForObsKml provides the path in which to store the ObsKML files created. This is an optional argument; if not provided, no ObsKML files are written.
  --Delete specifies we delete the netcdf files after we write the ObsKML files. This is an optional argument.
  --NetCDFDir is the directory to store the downloaded files. Optional, default is ./latest_netcdf_files.
  --FetchLogDir is the directory where the file time stamps are stored. The script uses these files to determine which files are really the latest. Optional, default is ./fetch_logs
  --UseLastNTimeStamps Integer representing the last N time entries to use when converting the data to obsKML. Optional.
  --UseLastNTimeStampsForOrg Optional. List specifying each organization and the last N time entries to use for it. Whereas UseLastNTimeStamps is applied to every organization, this option tailors the count to specific organizations. If an organization is not listed here and UseLastNTimeStamps is provided, the UseLastNTimeStamps value is used.
  --SkipZipObsKmlDirs Optional. A value of 1 will skip the zipping of the directories under the DirForObsKml option.
Description: Processes the netCDF files into obsKML files.
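
The interaction between --UseLastNTimeStamps and --UseLastNTimeStampsForOrg can be sketched in Python as below. This is only an illustration of the fallback rule described in the option list, not the code in GetLatestData.pl; the function names and the lower-casing of organization names are assumptions.

```python
def parse_org_timestamps(spec):
    """Parse a string such as "nos;60,nws;20" into {"nos": 60, "nws": 20}."""
    counts = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        org, _, count = entry.partition(";")
        # Lower-casing the organization name is an assumption of this sketch.
        counts[org.strip().lower()] = int(count)
    return counts


def timestamps_for(org, default_n=None, per_org=None):
    """Return the per-organization count if one is set, else the global value."""
    per_org = per_org or {}
    return per_org.get(org.lower(), default_n)


per_org = parse_org_timestamps("nos;60,nws;20")
print(timestamps_for("nos", 10, per_org))    # per-organization override applies
print(timestamps_for("cormp", 10, per_org))  # falls back to the global value
```

With the command line shown above, nos and nws would use their own counts and every other organization would use the UseLastNTimeStamps value.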

Schedule: Twice an hour


Carocoops NetCDF Scout

Script: populate_xenia_carocoops.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing the Carocoops netcdf files. This script incorporates the NetCDF scout above, but only handles the Carocoops netcdf files. The files are pulled down then processed into the database. Deletes the previous KML, ZIP and KMZ files.

Shell Script Makeup

Script: GetLatestData.pl

Directory: /home/xeniaprod/scripts/scout/trunk

Command Line Parameters: See NetCDF section above.

  --URLControlFile=./CarocoopsURLList.xml 
  --DirForObsKml=/home/xeniaprod/feeds/scout/carocoops 
  --NetCDFDir=/home/xeniaprod/tmp/netcdf/scout 
  --FetchLogDir=/home/xeniaprod/tmp/netcdf/scout/fetch_logs
  --UseLastNTimeStamps=10
Description: Processes the netCDF files into obsKML files.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command Line Parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.

  http://localhost/xenia/feeds/carocoops/carocoops_metadata_latest.kmz 
  /home/xeniaprod/tmp/sqlfiles
  "latest_carocoops"
  "xenia"


Description: Processes the obsKML files writing them into a sql file that is then imported into the database.
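
As a rough illustration of the positional argument order, the Python sketch below assembles the command line for obskml_to_xenia_postgresql.pl from the four documented arguments. Invoking the script through "perl" and the helper name build_import_command are assumptions; the cron shell script may call it differently.

```python
import shlex

# Script location taken from the Directory entry above.
SCRIPT = "/home/xeniaprod/scripts/postgresql/import_export/obskml_to_xenia_postgresql.pl"


def build_import_command(kmz_url, sql_dir, sql_name, db_name):
    """Return the argument list in the documented order (arguments 1 to 4)."""
    return ["perl", SCRIPT, kmz_url, sql_dir, sql_name, db_name]


cmd = build_import_command(
    "http://localhost/xenia/feeds/carocoops/carocoops_metadata_latest.kmz",
    "/home/xeniaprod/tmp/sqlfiles",
    "latest_carocoops",
    "xenia",
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
```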

Schedule: Runs every quarter hour. We run it this way because the dial-up times for the platforms occur throughout the hour; one may be dialed at 5 past the hour and another at 20 past.


CORMP NetCDF Scout

Script: populate_xenia_cormp.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing the CORMP netcdf files. This script incorporates the NetCDF scout above, but only handles the CORMP netcdf files. The files are pulled down then processed into the database. Deletes the previous KML, ZIP and KMZ files.

Shell Script Makeup

Script: GetLatestData.pl

Directory: /home/xeniaprod/scripts/scout/trunk

Command Line Parameters: See NetCDF section above.

  --URLControlFile=./CormpURLList.xml
  --DirForObsKml=/home/xeniaprod/feeds/scout/cormp 
  --NetCDFDir=/home/xeniaprod/tmp/netcdf/scout 
  --FetchLogDir=/home/xeniaprod/tmp/netcdf/scout/fetch_logs
  --UseLastNTimeStamps=10
Description: Processes the netCDF files into obsKML files.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command Line Parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.
  
  http://localhost/xenia/feeds/cormp/cormp_metadata_latest.kmz
  /home/xeniaprod/tmp/sqlfiles
  "latest_cormp"
  "xenia"
Description: Processes the obsKML files writing them into a sql file that is then imported into the database.

Schedule: Runs every quarter hour. We run it this way because the dial-up times for the platforms occur throughout the hour; one may be dialed at 5 past the hour and another at 20 past.


CORMP2 ObsKML Scout

Script: populate_xenia_cormp2.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing the CORMP obskml files (currently just SUN2WAVE for some wave and current observation types).

Shell Script Makeup

Script: get_latest_data.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/cormp2

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command Line Parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.
  
  http://localhost/xenia/feeds/cormp/cormp2_metadata_latest.kmz
  /home/xeniaprod/tmp/sqlfiles
  "latest_cormp2"
  "xenia"
Description: Processes the obsKML files writing them into a sql file that is then imported into the database.

Schedule: 15,45 minute marks every hour.


USF NetCDF Scout

Script: populate_xenia_usf.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing the USF netcdf files. This script incorporates the NetCDF scout above, but only handles the USF netcdf files. The files are pulled down then processed into the database. Deletes the previous KML, ZIP and KMZ files.

NOTE: NDBC netcdf processing has stopped. See the NDBC Processing section below for the new NDBC ingestion.

Shell Script Makeup

Script: GetLatestData.pl

Directory: /home/xeniaprod/scripts/scout/trunk

Command Line Parameters: See NetCDF section above.

  --URLControlFile=./UsfURLList.xml
  --DirForObsKml=/home/xeniaprod/feeds/scout/usf 
  --NetCDFDir=/home/xeniaprod/tmp/netcdf/scout 
  --FetchLogDir=/home/xeniaprod/tmp/netcdf/scout/fetch_logs 
  --UseLastNTimeStamps=10
Description: Processes the netCDF files into obsKML files.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command Line Parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.
  
  For USF:
  http://localhost/xenia/feeds/usf/usf_metadata_latest.kmz
  /home/xeniaprod/tmp/sqlfiles
  "latest_usf"
  "xenia"


Description: Processes the obsKML files writing them into a sql file that is then imported into the database.

Schedule: Runs every quarter hour. We run it this way because the dial-up times for the platforms occur throughout the hour; one may be dialed at 5 past the hour and another at 20 past.

Gliders

Script: populate_gliders_usf.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing the USF Webb gliders data.

Shell Script Makeup

Script: WebbGlider.py

Directory: /home/xeniaprod/scripts/postgresql/feeds/gliders/webb

Command Line Parameters:

  --ConfigFile=/home/xeniaprod/config/usfWebbGliderDataIngest.ini
Description: Connects to the glider database at USF to process current missions. Since these are not constantly ongoing, we schedule the cron job manually when one is occurring.


NWS Processing

Script: populate_xenia_nws

Directory: /home/xeniaprod/cron

Description: Pulls in NWS data directly from the NWS site. This is used in tandem with the NetCDF scout. Currently this is scheduled to run in crontab.

Shell Script Makeup

Script: mk_sql_for_xenia.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/federal

Command line parameters:

  Argument 1: Provider name - nos, nws, etc.
  Argument 2: Not used
  Argument 3: Fully qualified path to SQL output file
  Argument 4: Value of 'debug' will enable logging of some print statements

  nws 
  24 
  /home/xeniaprod/tmp/sqlfiles/nws.sql
  debug
Description: Connects to the NWS webservice, pulls the data down into a SQL file for importation into the database. Next psql is executed to import the SQL file.

Schedule: Runs every 10 minutes. NWS stations have a fairly rapid update rate.


NDBC Processing

Script: populate_xenia_ndbc

Directory: /home/xeniaprod/cron

Description: Now using the IOOS Dif web service, pulls in NDBC data directly from the NDBC site. Only pulling a limited number of stations to test out the Dif service. Remaining NDBC stations are processed through the USF netcdf processing.

Shell Script Makeup

Script: DataIngestion.py

Directory: /home/xeniaprod/scripts/postgresql/feeds/ioosdif

Command line parameters:

  --ConfigFile=/home/xeniaprod/config/ndbcDataIngest.ini  

Schedule: Runs twice an hour.

INI File

[Database] - Section for the database parameters
user =     - User name used to log onto the database
password = - Password used
host =     - Host address where the database server runs.
name =     - Database name
connectionstring = - SQLAlchemy database driver connection string.

[settings] - Section for general settings
uomconversionfile= - Path to the XML units conversion file.
 
[logging] - Section for log file
configfile = - The fully qualified path to the logging config file.

[area] - Section for the area of interest when checking for platforms in the bounding box given below.
bbox= - Polygon defining our area of interest.

[processing] 
organizationlist = - Comma separated list of Organization IDs to process. For each entry here, there should be a corresponding section below.

[ndbc] - Organization from the organization list above.
difurl= - Base URL for the dif service
checkfornewplatforms= - Either 0 or 1. 1 will do a getCapabilities query to determine if there are any new platforms in the bbox defined above. Also checks the available observations against the observation defined on the platform in our database.
lastnhours= - The last N hours of data to request in a getObservation query.
processingobject= - Name of the object used to do the processing.
addnewplatformstodb= - Either 0 or 1. 1 will create new platforms/sensors according to what is discovered in the getCapabilities query.
jsonconfig= - JSON config file that defines the mapping of observations available from the Dif service to the Xenia names of those observations.
stationoffering= -  The provider specific URN portion needed in the getObservation query.
newplatformkml= - If checkfornewplatforms is 1, this is the fully qualified path to which a kml file is written containing the new platforms found and the observations found that do not exist on the database platforms.
whitelist= - If defined, this is a comma delimited list of stations to pull data from via getObservation. If not defined, the organization name in this section is used to pull all stations.
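
A minimal Python sketch of reading such an INI file with the standard configparser module follows. It is not the code in DataIngestion.py; the sample values (user name, password, station IDs) and the load_settings helper are placeholders for illustration only.

```python
import configparser

# Hypothetical sample file; all values are placeholders.
SAMPLE_INI = """
[Database]
user = xenia_user
password = secret
host = localhost
name = xenia

[processing]
organizationlist = ndbc

[ndbc]
checkfornewplatforms = 0
lastnhours = 4
whitelist = 41004, 41013
"""


def load_settings(ini_text):
    """Return per-organization settings for every entry in organizationlist."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    settings = {}
    org_list = parser.get("processing", "organizationlist")
    for org in (o.strip() for o in org_list.split(",") if o.strip()):
        section = parser[org]  # each organization needs its own section
        whitelist = section.get("whitelist", fallback="")
        settings[org] = {
            "check_new": section.getboolean("checkfornewplatforms", fallback=False),
            "last_n_hours": section.getint("lastnhours", fallback=None),
            "whitelist": [s.strip() for s in whitelist.split(",") if s.strip()],
        }
    return settings


settings = load_settings(SAMPLE_INI)
print(settings["ndbc"])
```

An empty whitelist list here corresponds to the "not defined" case, where all stations for the organization would be pulled.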

Xenia Mapping File

For NDBC, we ask for the getObservation data to be returned in CSV format. This file defines a mapping for those columns. At the head of the file are the columns common to all the CSV data returns; then, under the "observation_columns" area, we define the Dif provider's observation name, which defines the columns specific to that datum. The mapping from the CSV column to the Xenia observation is defined under "m_value_columns". Some observations (Winds, Waves, Currents) contain multiple datums, each having a mapping in "m_value_columns". For multiple datums, the column order must be preserved in "m_value_columns".

{
  "platform_identifier" : "station_id",
  "sensor_identifier" : "sensor_id",
  "fixed_location" : 
  {
    "lat" : "latitude (degree)",
    "lon" : "longitude (degree)"
  },
  "datetime" : "date_time",
  "observation_columns" :
  {
    "air_pressure_at_sea_level": 
    {
      "depth" : "depth (m)",
      "m_value_columns":
      [
        {"air_pressure_at_sea_level (hPa)": "air_pressure"}
      ]
    },
    
    "air_temperature": 
    {
      "depth" : "depth (m)",
      "m_value_columns":
      [
        {"air_temperature (C)": "air_temperature"}
      ]
    }, 
    "currents": 
    {
      "depth" : "depth (m)",
      "bin (count)" : "s_order",
      "m_value_columns":
      [
        {"direction_of_sea_water_velocity (degree)": "current_to_direction"}, 
        {"sea_water_speed (cm/s)": "current_speed"}
      ]
    }, 
    "sea_floor_depth_below_sea_surface": 
    {
      "m_value_columns":
      [
        {"sea_floor_depth_below_sea_surface (m)": "depth"}
      ]
    }, 
    "sea_water_electrical_conductivity": 
    {
      "depth" : "depth (m)",
      "m_value_columns":
      [
        {"sea_water_electrical_conductivity (mS/cm)": "water_conductivity"}
      ]
    }, 
    "sea_water_salinity": 
    {
      "depth" : "depth (m)",    
      "m_value_columns":
      [
        {"sea_water_salinity (psu)": "salinity"}
      ]
    }, 
    "sea_water_temperature": 
    {
      "depth" : "depth (m)",    
      "m_value_columns":
      [
        {"sea_water_temperature (C)": "water_temperature"}
      ]
    }, 
    "waves": 
    {
      "m_value_columns":
      [
        {"mean_wave_direction (degree)": "mean_wave_direction_peak_period"}, 
        {"principal_wave_direction (degree)": "principal_wave_direction"}, 
        {"sea_surface_swell_wave_period (s)": "dominant_wave_period"}, 
        {"sea_surface_swell_wave_significant_height (m)": "swell_height"}, 
        {"sea_surface_swell_wave_to_direction (degree)": "swell_wave_direction"}, 
        {"sea_surface_wave_mean_period (s)": "average_wave_period"}, 
        {"sea_surface_wave_significant_height (m)": "significant_wave_height"}, 
        {"sea_surface_wave_to_direction (degree)": "significant_wave_to_direction"}, 
        {"sea_surface_wind_wave_period (s)": "wind_wave_period"}, 
        {"sea_surface_wind_wave_significant_height (m)": "wind_wave_height"}, 
        {"sea_surface_wind_wave_to_direction (degree)": "wind_wave_direction"}
      ]
    }, 
    "winds": 
    {
      "depth" : "depth (m)",    
      "m_value_columns":
      [
        {"wind_from_direction (degree)": "wind_from_direction"}, 
        {"wind_speed (m/s)": "wind_speed"}, 
        {"wind_speed_of_gust (m/s)": "wind_gust"}
      ]
    }
  }
}
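
To show how a mapping file of this shape might be applied, here is a small Python sketch that maps one CSV row to Xenia observation names using only the "winds" entry. The CSV row, the station name and the map_rows helper are invented for illustration; the real processing object may work differently.

```python
import csv
import io
import json

# Cut-down copy of the mapping file, keeping only the "winds" entry.
MAPPING_JSON = """
{
  "platform_identifier": "station_id",
  "datetime": "date_time",
  "observation_columns": {
    "winds": {
      "depth": "depth (m)",
      "m_value_columns": [
        {"wind_from_direction (degree)": "wind_from_direction"},
        {"wind_speed (m/s)": "wind_speed"},
        {"wind_speed_of_gust (m/s)": "wind_gust"}
      ]
    }
  }
}
"""

# Hypothetical CSV return; the values are made up.
SAMPLE_CSV = (
    "station_id,date_time,depth (m),wind_from_direction (degree),"
    "wind_speed (m/s),wind_speed_of_gust (m/s)\n"
    "example_station,2012-03-30T12:00:00Z,-5.0,180.0,6.2,8.1\n"
)


def map_rows(csv_text, mapping, observation):
    """Return one record per mapped CSV column, in m_value_columns order."""
    obs_map = mapping["observation_columns"][observation]
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        depth = float(row[obs_map["depth"]]) if "depth" in obs_map else None
        for column_map in obs_map["m_value_columns"]:  # list keeps the order
            for csv_column, xenia_name in column_map.items():
                results.append({
                    "platform": row[mapping["platform_identifier"]],
                    "time": row[mapping["datetime"]],
                    "depth": depth,
                    "obs": xenia_name,
                    "value": float(row[csv_column]),
                })
    return results


records = map_rows(SAMPLE_CSV, json.loads(MAPPING_JSON), "winds")
for record in records:
    print(record["obs"], record["value"])
```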

NOS Processing

Script: populate_xenia_nos

Directory: /home/xeniaprod/cron

Description: Now using the IOOS Dif web service, pulls in NOS data directly from the NOS site. Currently this is scheduled to run in crontab.

Shell Script Makeup

Script: DataIngestion.py

Directory: /home/xeniaprod/scripts/postgresql/feeds/ioosdif

Command line parameters:

  --ConfigFile=/home/xeniaprod/config/nosDataIngest.ini  

Schedule: Runs twice an hour.


USGS Processing

Script: populate_xenia_usgs

Directory: /home/xeniaprod/scripts/postgresql/feeds/federal/usgs

Description: Pulls in USGS data directly from the USGS site. Currently this is scheduled to run in crontab. Cleans up previous KML, KMZ, and SQL files.

Shell Script Makeup

Script: gen_wq_obskml.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/federal/usgs

Command line parameters:

  Argument 1: Provider name – usgs 
  Argument 2: Base output directory. Argument one is appended to complete the directory. 
  Argument 3: XML platform list that details the platforms to query.
Description: Connects to the USGS webservice, pulls the data down into a KML file for importation into the database.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command Line Parameters:

 Argument 1 - The URL to the KMZ file to process.
 Argument 2 - Path where the SQL file will be written.
 Argument 3 - Filename to use for the SQL file.
 Argument 4 - Database name to connect to.
Description: Processes the obsKML files writing them into a sql file that is then imported into the database.

Schedule: 45 minutes after the hour.


FIT Processing

Script: feed_fit.sh

Directory: /home/xeniaprod/cron

Description: Shell script that pulls in data from the Florida Institute of Technology

Shell Script Makeup

Script: fit_to_obskml.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/fit

Command line parameters:

  None
  Note: There are hardcoded paths and other settings that control where the data is retrieved and saved.
  URL: my $url = 'http://my.fit.edu/coastal/DATA.HTM';
  The text data is stored: "./latest.txt"
  The obsKml is stored: "./fit.kml"
  The kml file is then zipped and copied to: "/var/www/xenia/feeds/fit/fit_metadata_latest.kmz"
Description: Process the FIT data.
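
The "zipped and copied" step can be pictured with the Python sketch below: a KMZ file is a zip archive containing a KML file. This is an illustration only, not the Perl code in fit_to_obskml.pl; it writes to a temporary directory rather than /var/www, and the KML content is an empty placeholder document.

```python
import os
import tempfile
import zipfile


def kml_to_kmz(kml_path, kmz_path):
    """Write a KMZ archive containing the single KML file."""
    with zipfile.ZipFile(kmz_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(kml_path, arcname=os.path.basename(kml_path))
    return kmz_path


# Work in a temporary directory so the sketch touches no production paths.
work_dir = tempfile.mkdtemp()
kml_path = os.path.join(work_dir, "fit.kml")
with open(kml_path, "w") as handle:
    handle.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                 '<kml xmlns="http://www.opengis.net/kml/2.2"><Document/></kml>\n')

kmz_path = kml_to_kmz(kml_path, os.path.join(work_dir, "fit_metadata_latest.kmz"))
with zipfile.ZipFile(kmz_path) as archive:
    print(archive.namelist())
```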

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command line parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.

  http://localhost/xenia/feeds/fit/fit_metadata_latest.kmz 
  ""
  "latest_in_situ" 
  "xenia"
Description: Takes the obsKML and writes it into the database.

Schedule: Runs every quarter hour.


SCCF Processing

Script: feed_sccf.sh

Directory: /home/xeniaprod/cron

Description: Shell script that pulls in data from the Sanibel-Captiva Conservation Foundation River, Estuary and Coastal Observing Network in Florida.

Shell Script Makeup

Script: sccf_to_obskml.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/sccf

Command line parameters:

  None
  Note: There are hardcoded paths and other settings that control where the data is retrieved and saved.
  URL: my $url = 'http://recon.sccf.org/latest.kml';
  The text data is stored: "./latest.txt"
  The obsKml is stored: "./sccf.kml"
  The kml file is then zipped and copied to: "/var/www/xenia/feeds/sccf/sccf_metadata_latest.kmz"
Description: Process the SCCF data.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command line parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.

  http://localhost/xenia/feeds/sccf/sccf_metadata_latest.kmz 
  ""
  "latest_in_situ" 
  "xenia"
Description: Takes the obsKML and writes it into the database.

Schedule: Runs every quarter hour.


FLDEP Processing

Script: feed_fldep.sh

Directory: /home/xeniaprod/cron

Description: Shell script that pulls in data from the Florida Department of Environmental Protection

Shell Script Makeup

Script: fldep_to_obskml.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/fldep

Command line parameters:

  None
  Note: There are hardcoded paths and other settings that control where the data is retrieved and saved.
  URL: my $url = 'http://www.fldep-stevens.com/...';
  The text data is stored: "./latest.txt"
  The obsKml is stored: "./fldep.kml"
  The kml file is then zipped and copied to: "/var/www/xenia/feeds/fldep/fldep_metadata_latest.kmz"
Description: Process the FLDEP data.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command line parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.

  http://localhost/xenia/feeds/fldep/fldep_metadata_latest.kmz 
  ""
  "latest_in_situ" 
  "xenia"
Description: Takes the obsKML and writes it into the database.

Schedule: Runs every 20 minutes.


YSI Apache Pier Processing

Script: getYSIData.py

Directory: /home/xeniaprod/scripts/postgresql/feeds/ysi

Command line parameters:

  Argument 1: Fullpath to the XML configuration file.

  /home/xeniaprod/config/ysiParseConfig.xml

Description: Pulls in the data for the YSI site which is described in the configuration file provided on the command line. This script pulls in the data for the Apache Pier.

Schedule: Runs twice an hour.


NERRS Processing

Script: populate_xenia_nerrs

Directory: /home/xeniaprod/cron

Description: Pulls in NERRS data directly from the NERRS site. This script has a command line parameter that specifies how many records to request. This is needed because once a day a backfill request of 96 records is made (15 minute interval between records, so the last 24 hours), as there is a recurring downtime in the NERRS data telemetry.

Shell Script Makeup

Script: filter_CDMO_Soap.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/nerrs

Command line parameters:

  Argument 1: Number of records to request 

  Hourly data:
  4
  Backfill:
  96
Description: Connects to the NERRS SOAP webservice requesting the observation data as XML. Each station has an xml file created.
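
The two record counts follow directly from the 15 minute interval between records. A short Python sketch of the arithmetic (the function name is illustrative):

```python
def records_needed(hours, interval_minutes=15):
    """Number of records covering `hours` at the given sampling interval."""
    return hours * 60 // interval_minutes


print(records_needed(1))   # hourly request
print(records_needed(24))  # daily backfill
```

One hour at 15 minute spacing gives 4 records, and 24 hours gives 96, matching the values passed on the command line.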

Script: gen_nerrs_obskml.pl

Directory: /home/xeniaprod/scripts/postgresql/feeds/nerrs

Command line parameters:

  Argument 1: Provider Name, nerrs  
  Argument 2: Output directory. Argument 1 is appended to it to form the complete path.  
  Argument 3: Platform list xml file.  
  Argument 4: Input file directory, should point to the location of the XML files created by the filter_CDMO_Soap.pl script above. 

  nerrs 
  /home/xeniaprod/feeds/scout 
  ./nerrs_platform_list.xml  
  /home/xeniaprod/feeds/scout/nerrs/soap 
Description: Creates the obsKML files by parsing the XML files created by filter_CDMO_Soap.pl.

Script: obskml_to_xenia_postgresql.pl

Directory: /home/xeniaprod/scripts/postgresql/import_export

Command Line Parameters:

  Argument 1 - The URL to the KMZ file to process.
  Argument 2 - Path where the SQL file will be written.
  Argument 3 - Filename to use for the SQL file.
  Argument 4 - Database name to connect to.

  http://localhost/xenia/feeds/nerrs/nerrs_metadata_latest.kmz 
  /home/xeniaprod/tmp/sqlfiles 
  "latest_nerrs" 
  "xenia"
Description: Processes the obsKML files writing them into a sql file that is then imported into the database.

Schedule: Runs twice an hour.


YSI Nerrs Processing

Script: getYSIData.py

Directory: /home/xeniaprod/scripts/postgresql/feeds/ysi

Command line parameters:

  Argument 1: Fullpath to the XML configuration file.

  /home/xeniaprod/config/nerrsYSIConfigLinux.xml

Description: Pulls in the data for the YSI site which is described in the configuration file provided on the command line. This script pulls in the data for some NERRS water quality sites.

Schedule: Runs twice an hour.


NCSU

Gliders

Script: populate_gliders_ncsu.sh

Command line parameters:

Directory: /home/xeniaprod/cron

Description: Shell script for processing the NCSU Webb gliders data.

Shell Script Makeup

Script: WebbGlider.py

Directory: /home/xeniaprod/scripts/postgresql/feeds/gliders/webb

Command Line Parameters:

  --ConfigFile=/home/xeniaprod/config/ncsuWebbEmailDataIngest.ini
Description: This setup processes emails that the Webb glider service sends. We have an email account set up to receive the emails.


Machine: Mapping

http://129.252.139.139/

Scripts: Mapping


MODIS Sea Surface Temperature

Script: get_usf_modis_sst.sh

Directory: /home/xeniaprod/cron

Description: Gets the latest modis file and updates the Xenia database on neptune.

Shell Script Makeup

Script: get_latest_data.pl

Directory: /home/xeniaprod/scripts/postgresql/remotesensing/modis_sst

Command line parameters:

  None. 
  NOTE: There are a number of hardcoded settings in the script, such as working directories and the URLs to connect to for the data.
The following are the hardcoded working directories:
  my $scratch_dir = '/home/xeniaprod/tmp/remotesensing/usf/'.$layer_name;
  my $dest_dir    = '/home/xeniaprod/feeds/remotesensing/'.$layer_name;  
  my $fetch_logs   = '/home/xeniaprod/tmp/remotesensing/usf/'.$layer_name.'/fetch_logs';    
The following are the hardcoded URLS:
  @dir_urls = (
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$yesterdayJulian/1km/pass/intermediate/",
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$yesterdayJulian/1km/pass/final/",
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$todayJulian/1km/pass/intermediate/",
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$todayJulian/1km/pass/final/"
  );    
The $product_id variable is hardcoded to the value from the product_type table for the data we want. See the Description below.
The $psql_command is also hardcoded to connect to the database to import the SQL statement.
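
For readers unfamiliar with the Julian day directories, the Python sketch below rebuilds the same four URLs for a given date. It is not the Perl code in get_latest_data.pl; zero-padding the day of year to three digits is an assumption, and the function name is illustrative.

```python
from datetime import date, timedelta

BASE = "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass"


def modis_dir_urls(today):
    """Return the yesterday/today, intermediate/final directory URLs."""
    yesterday = today - timedelta(days=1)
    urls = []
    for day in (yesterday, today):
        julian = day.strftime("%j")  # day of year, zero-padded (assumed)
        for stage in ("intermediate", "final"):
            # The excerpt uses $currentYear for both days; this sketch does the same.
            urls.append(f"{BASE}/{today.year}/{julian}/1km/pass/{stage}/")
    return urls


for url in modis_dir_urls(date(2012, 3, 30)):
    print(url)
```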

Description: Gets the modis file and updates the Xenia database on neptune. The files are downloaded, then the file name is modified to the format modis_sst_year_month_day_hour_minute.png, and the files are moved to the $dest_dir described above. This is done to have a consistent file naming scheme across the products. The database table used is "timestamp_lkp", which is a generalized table for the remote sensing products. For a given data layer, a mapping product would query the database for the most recent time entry of the specific product id (layer). The database would return the timestamp, which would then be appended to the layer name to access the most recent file. The table column layout is as follows:

  row_id - Autoincrementing integer as the primary key
  row_entry_date - Date when the row was added to the table
  row_update_date - Date the row was modified last.
  product_id - Integer that specifies what product the row is for.  These are described in the product_type table. Modis is id 1.
  pass_timestamp - the product's date, the time the data was taken.
  filepath - the child path to the file. The user would have to know the parent directory, such as /home/xeniaprod/feeds/remotesensing, then the filepath would be appended to that to then get the requested file.
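
The lookup described above can be sketched in Python as follows. An in-memory SQLite table stands in for the PostgreSQL "timestamp_lkp" table, so the column types, the zero-padded file name fields and the "layer_name/file" layout of filepath are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime


def layer_filename(layer_name, pass_time):
    """Build a name such as modis_sst_2012_03_30_17_40.png."""
    return f"{layer_name}_{pass_time:%Y_%m_%d_%H_%M}.png"


# SQLite stand-in for the timestamp_lkp table (the real table is in PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE timestamp_lkp (
    row_id INTEGER PRIMARY KEY AUTOINCREMENT,
    row_entry_date TEXT,
    row_update_date TEXT,
    product_id INTEGER,
    pass_timestamp TEXT,
    filepath TEXT)""")

# Two made-up passes for product id 1 (Modis).
for stamp in (datetime(2012, 3, 30, 14, 5), datetime(2012, 3, 30, 17, 40)):
    name = layer_filename("modis_sst", stamp)
    conn.execute(
        "INSERT INTO timestamp_lkp (product_id, pass_timestamp, filepath) "
        "VALUES (?, ?, ?)",
        (1, stamp.isoformat(sep=" "), f"modis_sst/{name}"),
    )

# Most recent entry for the product, then prepend the known parent directory.
row = conn.execute(
    "SELECT pass_timestamp, filepath FROM timestamp_lkp "
    "WHERE product_id = ? ORDER BY pass_timestamp DESC LIMIT 1",
    (1,),
).fetchone()
latest_path = "/home/xeniaprod/feeds/remotesensing/" + row[1]
print(latest_path)
```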

Schedule: Top of each hour


Interpolated Remote Sea Surface Temperature

Script: get_usf_modis_sst.sh

Directory: /home/xeniaprod/cron

Description: Gets the latest interpolated SST file and updates the Xenia database on neptune.

Shell Script Makeup

Script: get_latest_data.pl

Directory: /home/xeniaprod/scripts/postgresql/remotesensing/modis_sst

Command line parameters:

  None. 
  NOTE: There are a number of hardcoded settings in the script, such as working directories and the URLs to connect to for the data.
The following are the hardcoded working directories:

    my $scratch_dir = '/home/xeniaprod/tmp/remotesensing/usf/'.$layer_name;
    my $dest_dir    = '/home/xeniaprod/feeds/remotesensing/'.$layer_name;  
    my $fetch_logs   = '/home/xeniaprod/tmp/remotesensing/usf/oi_sst/fetch_logs';
The following are the hardcoded URLS:

  @dir_urls = (
    'http://ocgweb.marine.usf.edu/Products/OI/oidat/'
  );    
The $product_id variable is hardcoded to the value from the product_type table for the data we want. See the Description below.
The $psql_command is also hardcoded to connect to the database to import the SQL statement.

Description: Gets the interpolated SST file and updates the Xenia database on neptune. The files are downloaded, then the file name is modified to the format oi_sst_year_month_day_hour_minute.png, and the files are moved to the $dest_dir described above. This is done to have a consistent file naming scheme across the products. The database table used is "timestamp_lkp", which is a generalized table for the remote sensing products. For a given data layer, a mapping product would query the database for the most recent time entry of the specific product id (layer). The database would return the timestamp, which would then be appended to the layer name to access the most recent file. The table column layout is as follows:

  row_id - Autoincrementing integer as the primary key
  row_entry_date - Date when the row was added to the table
  row_update_date - Date the row was modified last.
  product_id - Integer that specifies what product the row is for. These are described in the product_type table. Modis is id 1.
  pass_timestamp - the product's date, the time the data was taken.
  filepath - the child path to the file. The user would have to know the parent directory, such as /home/xeniaprod/feeds/remotesensing, then the filepath would be appended to that to then get the requested file.

Schedule: Top of each hour


Advanced Very High Resolution Radiometer Sea Surface Temperature

Script: get_usf_avhrr_sst.sh

Directory: /home/xeniaprod/cron

Description: Gets the latest AVHRR SST file and updates the Xenia database on neptune. Also updates the old schema database to feed the static maps for SECOORA generated at Chapel Hill.

Shell Script Makeup

Script: get_latest_data.pl

Directory: /home/xeniaprod/scripts/postgresql/remotesensing/avhrr_sst

Command line parameters:

  None. 
  NOTE: There are a number of hardcoded settings in the script, such as working directories and the URLs to connect to for the data.
The following are the hardcoded working directories:

    my $scratch_dir = '/home/xeniaprod/tmp/remotesensing/usf/'.$layer_name;
    my $dest_dir    = '/home/xeniaprod/feeds/remotesensing/'.$layer_name;  
    my $dest_dir_2    = '/nautilus_usr2/maps/seacoos/data/usf/'.$layer_name;  
The following are the hardcoded URLS:

    @dir_urls = (
      'http://www.imars.usf.edu/husf_avhrr/products/images/fullpass/'.$yyyy_dot_mm
    );

    @final_dods_urls = (
      'http://www.imars.usf.edu/dods-bin/nph-dods/husf_avhrr/FULL_PASS_HDF_SST/'.$yyyy_dot_mm
    );
    $final_suffix = 'usf.sst.hdf.html';

    @auto_dods_urls = (
      'http://www.imars.usf.edu/dods-bin/nph-dods/husf_avhrr/FULL_PASS_HDF_SST/auto/'.$yyyy_dot_mm
    );
The $product_id variable is hardcoded to the value from the product_type table for the data we want. See the Description below.
The $psql_command is also hardcoded to connect to the database to import the SQL statement.

Description: Gets the AVHRR SST file and updates the Xenia database on neptune. The files are downloaded, then the file name is modified to the format oi_sst_year_month_day_hour_minute.png, and the files are moved to the $dest_dir described above. This is done to have a consistent file naming scheme across the products. The database table used is "timestamp_lkp", which is a generalized table for the remote sensing products. For a given data layer, a mapping product would query the database for the most recent time entry of the specific product id (layer). The database would return the timestamp, which would then be appended to the layer name to access the most recent file. The table column layout is as follows:

  row_id - Autoincrementing integer as the primary key
  row_entry_date - Date when the row was added to the table
  row_update_date - Date the row was modified last.
  product_id - Integer that specifies what product the row is for. These are described in the product_type table. Modis is id 1.
  pass_timestamp - the product's date, i.e. the time the data was taken.
  filepath - the child path to the file. The user would have to know the parent directory, such as /home/xeniaprod/feeds/remotesensing, then the filepath would be appended to that to then get the requested file.
This script also feeds entries to the old schema database. In that database there is a table per remote sensing product.
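
The lookup described above can be sketched as follows. This is only an illustration, not the production code: it uses an in-memory SQLite table as a stand-in for the PostgreSQL timestamp_lkp table, and the function names, sample rows, and sample file paths are assumptions made up for the example.

```python
import posixpath
import sqlite3

# Parent directory a client is assumed to know (taken from the text above).
PARENT_DIR = "/home/xeniaprod/feeds/remotesensing"


def latest_file(conn, product_id, parent_dir=PARENT_DIR):
    """Return (pass_timestamp, full path) for the newest row of a product, or None."""
    row = conn.execute(
        "SELECT pass_timestamp, filepath FROM timestamp_lkp "
        "WHERE product_id = ? ORDER BY pass_timestamp DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    pass_timestamp, filepath = row
    # The stored filepath is a child path, so prepend the known parent directory.
    return pass_timestamp, posixpath.join(parent_dir, filepath)


def demo_connection():
    """Build a throwaway table with the columns listed above (sample rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timestamp_lkp ("
        "row_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "row_entry_date TEXT, row_update_date TEXT, "
        "product_id INTEGER, pass_timestamp TEXT, filepath TEXT)"
    )
    conn.executemany(
        "INSERT INTO timestamp_lkp (product_id, pass_timestamp, filepath) "
        "VALUES (?, ?, ?)",
        [
            (1, "2012-03-19 14:05:00", "layer_a/layer_a_2012_03_19_14_05.png"),
            (1, "2012-03-19 16:40:00", "layer_a/layer_a_2012_03_19_16_40.png"),
            (2, "2012-03-19 18:00:00", "layer_b/layer_b_2012_03_19_18_00.png"),
        ],
    )
    return conn
```

Calling latest_file(demo_connection(), 1) returns the newest sample row for product id 1 together with its full path under the parent directory.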

Schedule: 30 minutes after the hour


MODIS RGB True Color

Script: get_usf_modis_rgb.sh

Directory: /home/xeniaprod/cron

Description: Gets the latest MODIS RGB file and updates the Xenia database on neptune. Also updates the old schema database to feed the static maps for SECOORA generated at Chapel Hill.

Shell Script Makeup

Script: get_latest_data.pl

Directory: /home/xeniaprod/scripts/postgresql/remotesensing/modis_rgb

Command line parameters:

  None. 
  NOTE: There are a number of hardcoded settings in the script, such as the working directories and the URLs to connect to for the data.
The following are the hardcoded working directories:

    my $scratch_dir = '/home/xeniaprod/tmp/remotesensing/usf/'.$layer_name;
    my $dest_dir    = '/home/xeniaprod/feeds/remotesensing/'.$layer_name;  
    my $dest_dir_2    = '/nautilus_usr2/maps/seacoos/data/usf/'.$layer_name;  
The following are the hardcoded URLs:

  @dir_urls = (
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$yesterdayJulian/1km/pass/intermediate/",
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$yesterdayJulian/1km/pass/final/",
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$todayJulian/1km/pass/intermediate/",
    "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass/$currentYear/$todayJulian/1km/pass/final/"
    #'http://modis.marine.usf.edu/products/fullpass/rgb/'
  );
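
The variable names in the list above suggest that the directory URLs are keyed by year and day of year. The following is a rough sketch of how such a list could be built, assuming the day of year is zero-padded to three digits; the function name and the padding are assumptions, not taken from get_latest_data.pl.

```python
from datetime import date, timedelta

BASE = "http://cyclops.marine.usf.edu/modis/level3/husf/fullpass"


def modis_dir_urls(today):
    """Build the four directory URLs: yesterday then today, intermediate then final."""
    yesterday = today - timedelta(days=1)
    urls = []
    for day in (yesterday, today):
        julian = day.strftime("%j")  # day of year, zero-padded to 3 digits (assumed format)
        for stage in ("intermediate", "final"):
            # The current year is used for both days, mirroring the Perl list above.
            urls.append(f"{BASE}/{today.year}/{julian}/1km/pass/{stage}/")
    return urls
```

For example, modis_dir_urls(date(2012, 3, 30)) yields URLs for days 089 and 090 of 2012.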
The $product_id variable is hardcoded to the value from the product_type table for the data we want. See the Description below.
The $psql_command is also hardcoded to connect to the database to import the SQL statement.

Description: Gets the MODIS file and updates the Xenia database on neptune. The files are downloaded, then each file name is changed to the format oi_sst_year_month_day_hour_minute.png and the file is moved to the $dest_dir described above. This gives a consistent file naming scheme across the products. The database table used is "timestamp_lkp", a generalized table for the remote sensing products. For a given data layer, a mapping product would query the database for the most recent time entry for that product id (layer). The database would return the timestamp, which would then be appended to the layer name to get access to the most recent file.

The table column layout is as follows:

  row_id - Autoincrementing integer as the primary key
  row_entry_date - Date when the row was added to the table
  row_update_date - Date the row was modified last.
  product_id - Integer that specifies what product the row is for. These are described in the product_type table. Modis is id 1.
  pass_timestamp - the product's date, i.e. the time the data was taken.
  filepath - the child path to the file. The user would have to know the parent directory, such as /home/xeniaprod/feeds/remotesensing, then the filepath would be appended to that to then get the requested file.
This script also feeds entries to the old schema database. In that database there is a table per remote sensing product.
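
The renaming step in the description can be illustrated with a small helper. This is a sketch under assumptions: the function name is hypothetical, and zero-padding of the month, day, hour, and minute is assumed since the document only gives the general pattern layer_year_month_day_hour_minute.png.

```python
from datetime import datetime


def standard_name(layer_name, pass_time):
    """Return a file name of the form layer_year_month_day_hour_minute.png."""
    return f"{layer_name}_{pass_time:%Y_%m_%d_%H_%M}.png"
```

With the prefix quoted in the description, standard_name("oi_sst", datetime(2012, 3, 19, 4, 5)) gives oi_sst_2012_03_19_04_05.png, which a mapping product could reconstruct from the layer name and the timestamp returned by the database.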

Schedule: 35 minutes after the hour


Remote Sensing Cache Reseed

Script: rs_clean_reseed.sh

Directory: /home/xeniaprod/cron

Description: Uses the tilecache scripts tilecache_clean.py and tilecache_seed.py to flush the tile cache and then reseed the primary zoom level (the view a user first sees on the map). Tilecaching the remote sensing layers allows us to serve the layers more quickly than constantly hitting mapserver.

NOTE: This job runs under the www-data crontab since the incoming layer requests are handled by this user. Otherwise we'd run into file/directory permission issues.

If the tilecache root directory is changed in the tilecache.cfg file, the directory for the clean and seed scripts would need to be changed in the shell script as well. Currently the root directory is: /usr2/data/xeniaprod/tmp/mapserver_tmp/tilecache/

Command line parameters: None

Schedule: 30 minutes after the hour. We could refine this by having the recaching done whenever we actually get new remote sensing layers.
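
The refinement mentioned in the schedule note could look roughly like the sketch below: remember the timestamp of the layer that was last reseeded and only trigger a reseed when a newer one shows up. Nothing here exists in the current scripts; the marker file, the function name, and the reseed callable (which would wrap the existing clean and seed steps) are all hypothetical.

```python
import os


def reseed_if_new(latest_timestamp, marker_path, reseed):
    """Call reseed() only when latest_timestamp differs from the one in marker_path."""
    previous = None
    if os.path.exists(marker_path):
        with open(marker_path) as fh:
            previous = fh.read().strip()
    if previous == latest_timestamp:
        return False  # no new layer since the last reseed
    reseed()
    # Record the timestamp only after the reseed succeeded.
    with open(marker_path, "w") as fh:
        fh.write(latest_timestamp)
    return True
```

The marker is written after reseed() returns, so a failed reseed is retried on the next run.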


Hourly Observation Cache Reseed

Script: hourly_obs_clean_reseed.sh

Directory: /home/xeniaprod/cron

Description: Uses the tilecache scripts tilecache_clean.py and tilecache_seed.py to flush the tile cache and then reseed the primary zoom level (the view a user first sees on the map). Tilecaching the hourly observation visualization layers allows us to serve the layers more quickly than constantly hitting mapserver.

NOTE: This job runs under the www-data crontab since the incoming layer requests are handled by this user. Otherwise we'd run into file/directory permission issues.

If the tilecache root directory is changed in the tilecache.cfg file, the directory for the clean and seed scripts would need to be changed in the shell script as well. Currently the root directory is: /usr2/data/xeniaprod/tmp/mapserver_tmp/tilecache/

Command line parameters: None

Schedule: 10 and 50 minutes after the hour.


HF Radar

mapfiles/wrappers


Machine: Squid

http://129.252.139.124/

Scripts: Squid


HF Radar

These scripts run once an hour to help aggregate and check uptime for the SECOORA HF Radar stations. The final rendering of these HF radar netcdf files is currently handled by servers at NC-Chapel Hill.

Directory: /home/xeniaprod/scripts/radar

The directory itself contains the scripts for getting Savannah HF radar data (get_latest_listing.pl, get_latest_data.pl, convert_latest.pl, cronNotify.pl).

The Savannah script checks a remote HTTP folder for the latest data file (downloaded as latest.txt), which is converted into a seacoos/secoora netcdf grid file using a netcdf template/substitution file.
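
The template/substitution idea can be sketched as below. This is an assumption-heavy illustration: the real template file, its placeholder names, and its variables are not shown in this document, so the trimmed CDL-style text and the names used here are hypothetical.

```python
from string import Template

# Hypothetical, heavily trimmed CDL-style template. The real template file's
# layout and placeholder names are not documented here.
CDL_TEMPLATE = Template(
    "netcdf $name {\n"
    "dimensions:\n"
    "    time = 1 ;\n"
    "variables:\n"
    "    int time(time) ;\n"
    "data:\n"
    "    time = $epoch_seconds ;\n"
    "}\n"
)


def fill_template(name, epoch_seconds):
    """Substitute values parsed from the latest data file into the template text."""
    return CDL_TEMPLATE.substitute(name=name, epoch_seconds=epoch_seconds)
```

The filled-in text would then be handed to whatever tool turns the template into the final netcdf grid file.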

For Long Bay, SC, a similar set of scripts is under the 'seacoos2' folder.

For Tampa/West Florida Shelf(WFS), a similar set of scripts is under the 'tampa' folder.

For Miami, a similar set of scripts is under 'miami'.

For NC Outer Banks, a similar set of scripts is under 'banks'. For 'banks' no data is collected or converted; the scripts only perform uptime/notification checks.


Machine: Nautilus

nautilus.baruch.sc.edu

Scripts Nautilus


CarolinasRCOOS (South Carolina - SUN2, CAP2, FRP2) Data Feed

Jeff Jefferson's telemetry and server push the most recent files to /usr2/carocoops_data.

A cron job checks this folder every minute and processes these files, unchanged, into the following directory:

/usr2/prod/buoys/processed

When looking for files from a certain date, supply a wildcard for that date (like /usr2/prod/buoys/processed/*20120319*) to see those records. There are many files in that directory, so don't try to list all of them.
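
The same wildcard lookup can be done from a script without listing the whole directory. A minimal sketch, assuming the date appears in file names as YYYYMMDD as in the example above; the function name is hypothetical.

```python
import glob
import os
from datetime import date

PROCESSED_DIR = "/usr2/prod/buoys/processed"


def files_for_date(day, directory=PROCESSED_DIR):
    """Return an iterator over files whose names contain the date as YYYYMMDD."""
    pattern = os.path.join(directory, f"*{day:%Y%m%d}*")
    # iglob yields matches one at a time instead of building a full list.
    return glob.iglob(pattern)
```

For example, files_for_date(date(2012, 3, 19)) iterates over the matches for /usr2/prod/buoys/processed/*20120319*.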

The two primary scripts that process those files are similar: the first loads the data into the old carocoops database on nautilus, and the second updates the pages of the old carocoops website at http://carocoops.org. If the telemetry feed changes in the number or ordering of parameters, these two scripts need to change accordingly. The telemetry feed lines are split into sections (like FCAT, WXPAK, etc.), and the indexes relative to these inline headers should be adjusted as needed.

#add to database
/usr2/prod/buoys/processed/processBuoysDB.pl
#change webpage
/usr2/prod/buoys/processed/processBuoys.pl
#process netcdf
/usr2/prod/buoys/processed/process_netcdf.pl
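
To illustrate why parameter indexes are tied to the inline section headers, here is a sketch of splitting a feed line into sections. It is not how the Perl scripts are written: the comma delimiter, the sample line, and the parameter names in the index table are all hypothetical, since the actual feed format is not shown in this document.

```python
def split_sections(line, headers=("FCAT", "WXPAK")):
    """Group comma-separated fields under the inline section header that precedes them."""
    sections = {}
    current = None
    for field in line.strip().split(","):
        if field in headers:
            current = field
            sections[current] = []
        elif current is not None:
            sections[current].append(field)
    return sections


# Hypothetical mapping of parameter name to its index within a section.
# If the feed adds or reorders parameters, these indexes are what must change.
FIELD_INDEX = {
    "WXPAK": {"wind_speed": 0, "wind_direction": 1, "air_pressure": 2},
}


def get_value(sections, section, name):
    """Look up one parameter by its index relative to the section header."""
    return float(sections[section][FIELD_INDEX[section][name]])
```

With a made-up line such as SUN2,20120319,FCAT,12.1,35.2,WXPAK,5.4,180,1013.2, the WXPAK values are addressed relative to the WXPAK header, so a change inside the FCAT section does not shift them.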
