Placing Surface Water Assessment Tool (SWAT) on DEP Server for Internal and External Use
Description
The goal of this project is to test the functionality of SWAT on an internal DEP server. Although this will not allow for external use, it will permit testing of the application ahead of eventual public dissemination. In addition, placing SWAT on a server will allow DEP staff to access SWAT immediately, without waiting for data frames to pre-load. This time savings is especially valuable now, as staff have just finished the latest biennial Integrated Report and are beginning to use SWAT for the next cycle of assessment and reporting. The project will also test whether DEP can use DEP-hosted servers for eventual public dissemination of SWAT, or whether other resources are needed.
Project Justification
NJDEP assesses the quality of all surface waters in the State every two years. To accomplish this, staff aggregate millions of data points collected from various organizations and process these data to assess whether water quality standards are met. The entirety of this analysis, which is both a Clean Water Act requirement and a USEPA Performance Partnership Grant requirement, is published as the Integrated Report, which is submitted to USEPA and made publicly available. As a result, there has been a substantial push to make the contents of this report accessible to the public through interactive web pages and ESRI StoryMaps. DEP's goal is to make both the data analysis and the results of this report as transparent as possible, to increase stakeholder engagement and understanding.
Idea
Details
Sponsoring Leadership Area
Water Resource Management
Sponsoring Leadership Area's Priority
AP-17
Program Area Lead(s)
Aaron Stoler
DOIT technical lead(s)
None
All Involved Leadership Areas
Div. of Information Technology, Water Resource Management
Created: 3 May 2022, 17:06
Updated: 14 April 2023, 11:52
@Knute Jensen A new lead has been identified for this project.
May be de-prioritized
Will the system be used by only internal (State) users or will external access be needed? Both internal and external. It would be useful to have the option, at some point in the future, to give external users (vendors) limited access. The applications are currently supported internally by the Division, but keeping them up to date has been a struggle because of staff turnover. We are beginning to consider whether having a contractor support the product for us is a better option. If a contractor were to take over some of these projects and there were security concerns, we would need someone from the Division to be able to transfer the contractor-developed projects and run them. Example projects include sharing model visualizations with input/output data, model calibration visualizations, projected scenarios, etc.
What is the overlap (if any) between the products produced in R/Shiny and the standard enterprise reporting/dashboarding products (Business Objects/Tableau), especially in the case of external access, given that DEP/OIT has made a significant investment in Tableau Public and recent legislation requires us to standardize, to the extent possible, the methods of operation for interacting with State data? One example is the application used for the statewide assessment that generates the list of impaired waters (the 303(d) list) and the Water Quality Assessment Report (the Integrated Report, or 305(b) report). These products are requirements of the federal Clean Water Act. The Integrated Report application is a customized data processing, assessment, and visualization tool that runs on thousands of lines of code. It was developed as a self-contained application that meets the program's needs from start to finish for assessing data across NJ's waters. By using R/Shiny, the program keeps the Integrated Report application in its native environment. Business Objects/Tableau could be used, but it would take development time to identify the output tables and files generated by the current Integrated Report tool so that Business Objects/Tableau could read those source datasets and create equivalent visuals. Using R/Shiny seemed to be the path of least resistance, with the fewest potential points of failure. Other examples are applications that perform simple or complex formula-based calculations, allowing the user to analyze water quality data (either ambient or wastewater discharge) and compare it to surface water quality criteria (such as the freshwater ammonia and recreational criteria) or groundwater quality criteria. Another example is incorporating automated trend analysis into the water quality data visualizations.
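As an illustration of the automated trend analysis mentioned above, the Mann-Kendall test is one commonly used nonparametric trend test for water quality series. The Python sketch below is hypothetical (the production apps are in R/Shiny, and the function name `mann_kendall` is illustrative only); it assumes observations are already ordered by sample date and uses the normal approximation with the standard tie correction.

```python
import math
from typing import Sequence


def mann_kendall(values: Sequence[float]) -> tuple[int, float, float]:
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes observations are ordered in time; applies the usual tie
    correction to the variance of S.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S sums the sign of every (later - earlier) pair of observations.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, corrected for groups of tied values.
    counts: dict[float, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    # Continuity-corrected normal approximation for Z.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p
```

For a strictly increasing series such as `[1, 2, 3, 4, 5]`, S is 10 and Z is about 2.20, which exceeds 1.96 and so indicates an upward trend at roughly the 5% level. Strongly seasonal data would call for a seasonal variant, which this sketch does not attempt.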
What is the estimated volume of data that will need to be hosted on the Shiny server? Currently there are about 3-4 million discrete and 1-2 million continuous data points, and the volume is growing (not including the glider data, which has about 100 million records and is also growing). An example of a smaller application is http://bguhaapps.shinyapps.io/ShinyApp
What is the projected growth rate of this data over time? Not including the glider data, the Integrated Report application referenced earlier is large: about 11.2 GB for 2022, of which about 5 GB is libraries/scripts and 5 GB is data. We anticipate that about 0.5 to 1 GB of additional data will be added every two years. The other application referenced above (the ammonia application at the RShiny link) is about 39 MB, which is also considered large by RShiny standards.
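For sizing storage on a server, the figures above can be turned into a rough linear projection. This is a back-of-the-envelope sketch that assumes the 11.2 GB 2022 footprint grows by a fixed increment (0.5 to 1 GB) per two-year reporting cycle and ignores the glider data; the function name `project_size_gb` is hypothetical.

```python
def project_size_gb(
    base_gb: float,
    base_year: int,
    target_year: int,
    growth_per_cycle_gb: float,
    cycle_years: int = 2,
) -> float:
    """Project total application size, assuming a fixed increment per cycle.

    Only completed cycles between base_year and target_year are counted.
    """
    if target_year < base_year:
        raise ValueError("target_year must not precede base_year")
    cycles = (target_year - base_year) // cycle_years
    return base_gb + cycles * growth_per_cycle_gb


if __name__ == "__main__":
    # Low and high ends of the stated 0.5-1 GB per cycle range, out to 2032.
    for per_cycle in (0.5, 1.0):
        print(per_cycle, round(project_size_gb(11.2, 2022, 2032, per_cycle), 1))
```

Over the five cycles from 2022 to 2032, this gives about 13.7 GB at the low end and 16.2 GB at the high end. A linear model is deliberately simple; if the growth rate itself changes, the increment would need revisiting.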
How many (and which) DEP employees will need the ability to upload dashboards/apps to the Shiny server? Biswarup Guha, Vincent Mina, Claire Huang-Lindabury (Claire.Huang-Lindabury@dep.nj.gov), and one additional staff member (to be determined once hiring is complete).
Is chrooted FTP access to the application deployment folder sufficient for deployment? Yes. All the data is stored in subfolders/subdirectories and, if needed, can be placed in one directory. Note: There may be web service calls that connect to external websites and ESRI base maps.
If not, what access does the program envision its users having at the OS level, given that OIT will not allow program-area OS administrators?
Need a strategy to load updated scripts, data files, and applications. If the program is unable to do this, it will need to submit requests to DOIT as needed.
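If OIT does provide chrooted FTP access, the folder-based layout described above could be mirrored to the deployment folder in one pass. The following is a minimal sketch under several assumptions: the server accepts explicit FTPS (Python's `ftplib.FTP_TLS`), the chroot root is the app deployment folder, and the host, account, and function names (`plan_upload`, `deploy`) are placeholders rather than anything OIT has specified.

```python
import ftplib
from pathlib import Path


def plan_upload(app_dir: Path) -> tuple[list[str], list[tuple[Path, str]]]:
    """List remote directories to create and (local file, remote path) pairs.

    Remote paths are relative to the chroot root and use forward slashes.
    Sorting guarantees each parent directory is listed before its children.
    """
    dirs: list[str] = []
    files: list[tuple[Path, str]] = []
    for path in sorted(app_dir.rglob("*")):
        rel = path.relative_to(app_dir).as_posix()
        if path.is_dir():
            dirs.append(rel)
        elif path.is_file():
            files.append((path, rel))
    return dirs, files


def deploy(app_dir: Path, host: str, user: str, password: str) -> None:
    """Mirror app_dir into the chrooted FTP folder over explicit FTPS."""
    dirs, files = plan_upload(app_dir)
    with ftplib.FTP_TLS(host) as ftp:
        ftp.login(user, password)
        ftp.prot_p()  # encrypt the data channel as well as the control channel
        for remote_dir in dirs:
            try:
                ftp.mkd(remote_dir)
            except ftplib.error_perm:
                pass  # most likely the directory already exists
        for local_file, remote_path in files:
            with local_file.open("rb") as fh:
                ftp.storbinary(f"STOR {remote_path}", fh)
```

A call such as `deploy(Path("SWAT"), "shiny-host.example", "deploy_user", password)` (all placeholder values) would create the subdirectories before uploading files into them. Keeping `plan_upload` separate from `deploy` lets staff preview exactly what would be sent without connecting to the server.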
Timeframe? There is no set timeframe. However, the program has been trying to get someone to figure this out for the last several years, so some traction now would be good. Moreover, if cloud-based deployment can be figured out, other programs will also benefit.
What would be the impacts of the system being offline for any given length of time? While not optimal, the system being offline for a day or so would not be a catastrophic event. However, we would not want the system to be offline for a week, as that would reflect poorly on both the Department and the Division.
Does it need to operate 24x7, or can it tolerate limited outages? An occasional outage will not be devastating to the program, but uptime and reliability as close to 100% as possible are certainly a goal. In addition, we do not expect most users to access the data outside working hours, assuming those hours fall between 7 am and 7 pm.
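To put "a day versus a week" in availability terms, the arithmetic is straightforward. This sketch assumes a 365-day year measured around the clock (not just the 7 am to 7 pm window), and the helper names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 1 - downtime_hours / period_hours


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of outage per period permitted by an availability target (0-1)."""
    if not 0 <= target <= 1:
        raise ValueError("target must be a fraction between 0 and 1")
    return (1 - target) * period_hours


if __name__ == "__main__":
    print(f"one-day outage:  {availability(24):.2%}")
    print(f"one-week outage: {availability(168):.2%}")
    print(f"99% target allows {allowed_downtime_hours(0.99):.1f} h/year")
```

A single day-long outage per year still leaves availability at about 99.73%, while a week-long outage drops it to about 98.08%. Conversely, a 99% target would permit about 87.6 hours of downtime per year.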
Is the data being stored in the Shiny dashboards the only repository of that data? No, the data will reside on shared drive servers within the Department in addition to various source locations from where the original datasets are being pulled.
Does it need to be backed up as a unit, or could it be reconstituted from original sources in a reasonable timeframe? It can be reconstituted from original sources if needed. The application ingests various data files and performs a series of assessments/analyses on the data. As long as the source files are stored safely somewhere, the application can reprocess all the source data as many times as needed.
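Because reconstitution depends on the archived source files being intact, one inexpensive safeguard (an assumption here, not something the program has described) is to record a SHA-256 checksum manifest of those files and verify it before a rebuild. The function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large source files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to source_dir) to its SHA-256 digest."""
    return {
        p.relative_to(source_dir).as_posix(): sha256_of(p)
        for p in sorted(source_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(source_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents have changed."""
    problems: list[str] = []
    for rel, expected in manifest.items():
        path = source_dir / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems
```

Saving the manifest alongside the archived sources (for example as JSON) and running `verify_manifest` before reprocessing would flag any source file that has gone missing or been altered, before the application spends time on a full rebuild.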
The simplest/most straightforward path would be to create a Linux VM on either AWS or Azure and install the Shiny Server package on it. We would need the product owner to answer some questions first:
Will the system be used by only internal (State) users or will external access be needed?
What is the overlap (if any) between the products produced in R/Shiny and the standard enterprise reporting/dashboarding products (Business Objects/Tableau), especially in the case of external access, given that DEP/OIT has made a significant investment in Tableau Public and recent legislation requires us to standardize, to the extent possible, the methods of operation for interacting with State data?
What is the estimated volume of data that will need to be hosted on the Shiny server? What is the projected growth rate of this data over time?
How many (and which) DEP employees will need the ability to upload dashboards/apps to the Shiny server?
Is chrooted FTP access to the application deployment folder sufficient for deployment? If not, what access does the program envision its users having at the OS level, given that OIT will not allow program-area OS administrators?
Timeframe?
Fault tolerance requirements?
Consider putting the program directly in touch with AWS leads? It is not a large cost issue, but it may move quickly. Kurt describes an easy RShiny implementation through OIT, but it would be paid for by CAR, and it is not clear how to apply the Water money to this. If this were done through RShiny, we could pay the Water money directly to them via procurement.