NACIS 2016 Presentation
Jo Ashley, OCUL Scholars Portal, University of Toronto Libraries
Amber Leahey, OCUL Scholars Portal, University of Toronto Libraries
The Ontario Council of University Libraries (OCUL) is a consortium of twenty-one university libraries in the province of Ontario, Canada that collaborates through collective purchasing and shared digital library infrastructure. OCUL's Scholars GeoPortal service (geo.scholarsportal.info) uses Esri software to provide a set of online tools for identifying, exploring, and downloading licensed geospatial datasets for academic research in Ontario. Since 2012, the usage and size of the geospatial data collections housed and showcased in Scholars GeoPortal have grown significantly, with more than 220,000 site visits and over 140TB of data, resulting in a number of challenges. This session will introduce the GeoPortal's interface and discuss various data-related issues and demands facing the current version of the GeoPortal, lessons learned, as well as future ideas and plans for continued success.
Where Do We Put It All? Lessons Learned Housing Large Geospatial Data Collections In OCUL's Scholars GeoPortal
1. Where Do We Put It All? Lessons
Learned Housing Large Geospatial Data
Collections In OCUL’s Scholars
GeoPortal
Jo Ashley
GIS Analyst, Scholars Portal
3. Context
Ontario is a large Province with diverse data needs
and interests represented by OCUL
4. Timeline of Development
Years shown: 2002 · 2002-2007 · 2008 · 2009 · 2010-2012 · 2012 · 2016
Milestones, in order:
• Founded
• Established services
• Draft proposal for GeoPortal
• GeoVisioning Workshop
• OCUL Map Group driving force
• Project grant awarded for GeoPortal
• GeoPortal development
• Official launch
• Going strong: 1,500 datasets, 120 TB and counting…
5. Overview
• How do hardware & software requirements guide our
data loading?
• How do we prioritize, manage and load large amounts of
geospatial data?
• How can we improve our overall processing workflow?
• How do we make all the data loaded discoverable to the
user?
7. Hardware & software
Current
• Encountering performance issues related to the size and sheer amount of data in ArcGIS Server 10.0.
• Updating hardware & software aims to reduce these issues.
• Currently producing clusters in architecture
Moving forward
• Will look into ArcGIS Online & Portal for ArcGIS if performance doesn't improve… early days yet.
11. Vector loading challenges
• Balancing different needs across different
schools.
• Size of data and length of time to process.
• Popularity or value to research community.
• In future we will need to consider loading
researcher data (to comply with funder
mandates and archiving policies).
13. Raster (image services)
Format | Compression | File size
TIFF | Lossless (raw); Lossy (JPEG) | Usually largest
MrSID | Lossless or Lossy | Small to Moderate
JPEG | Lossy (Lossless*) | Small
JPEG2000 | Lossless or Lossy | Small to Moderate
16. Raster loading challenges
• Be mindful of imagery size, type (e.g.
orthos vs. DSM derivative), and storage
capacity (JPEGs vs. TIFFs)
• Consider loading larger data into the
Cloud (Ontario Library Research Cloud) to
reduce redundancies and facilitate
preservation.
20. Data discovery solutions
• Work with the OCUL community to
determine preferred functionality and
portal objective(s)
• Review usage statistics and let them
assist in future development
21. Lessons learned
• Imperative to upgrade to ArcGIS 10.4 to
support continued growth of GeoPortal
• Must automate data loading process in
order to meet ongoing demands
• Work with OCUL community and analyze
usage stats to prioritize loading
• Continue to review and upgrade our
interface to improve data discovery
22. Thank You
Jo Ashley
GIS Analyst, Scholars Portal
jo@scholarsportal.info
Questions?
datagis@scholarsportal.info
Editor's notes
Many thanks for that introduction, Fritz. I’m really glad to be here. This is the title of my presentation…
Where do we put it all? Read it… let’s begin.
May I present to you a screenshot of the Scholars GeoPortal web application. This service allows students, staff and faculty at Ontario universities to discover, manipulate, and download a wide range of geospatial datasets.
This is a collaborative project involving participants from libraries across the province. The GeoPortal presents consortially licensed data collections to the academic community and has been in existence since 2012.
At this stage in our development we are dealing with some issues related to managing & showcasing our large amounts of data. One main driver of this is that we are currently making a significant upgrade to the back end of the portal, which is allowing us to determine areas of improvement for future development.
We are right in the middle of things, early days as they say, but we thought it would be beneficial to share these issues with cartographic communities like yourselves at NACIS.
To provide a little context: on the left we have the State of Colorado superimposed within the Province of Ontario, and on the right the Province of Ontario showing all 21 universities that OCUL serves (OCUL being the Ontario Council of University Libraries). Hopefully this gives you a sense that Ontario is a large province with diverse data needs and interests, represented by OCUL and in turn served by the Scholars GeoPortal.
Here is a quick timeline slide of the development of the Scholars GeoPortal.
From its inception: the original draft proposal in 2008, the two years of development in 2010-2012, and a successful launch in 2012, to the present…
Scholars GeoPortal is now a well-established service with 1,500 datasets representing 120 TB of data, and it continues to grow…
Here are the overriding points I will share during this presentation.
So let’s jump right in to the back end architecture of the GeoPortal:
The GeoPortal is not an out-of-the-box Esri ArcGIS application; it is a highly customized application with many interrelated components, which include Esri software, JavaScript, a metadata editor, databases, authentication and user accounts.
This is our set-up as it pertains to an Esri ArcGIS Server 10.0 environment, which has served us well until now.
The sheer size and amount of geospatial data in the portal is starting to cause problems with our ability to maintain our current load (servers acting up…). ArcGIS 10.0 will be obsolete soon, and we need to migrate to 10.4 for the continued sustainability of the GeoPortal.
We are also looking into other options like ArcGIS Online and Portal for ArcGIS for continued sustainability of the GeoPortal for years to come.
To sum up in the area of hardware/software we are currently…
Just trying to migrate all the services we have is causing problems…
Fewer server disruptions, smoother requests from user to server and back…
The number and size of our datasets have drastically increased in the last two years.
This year's requested collections will put us at double the number of datasets to date. Our conservative estimate of projected growth is around 10TB per year.
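As a rough illustration of that estimate, the sketch below projects storage under simple linear growth, using the 120 TB shown on the timeline slide and the 10 TB per year quoted here. The function name and the five-year horizon are assumptions made for illustration only; they are not part of the GeoPortal's planning tools.

```python
def project_storage_tb(current_tb, growth_tb_per_year, years):
    """Return (year_offset, projected_tb) pairs under linear growth."""
    return [(n, current_tb + growth_tb_per_year * n) for n in range(years + 1)]


if __name__ == "__main__":
    # 120 TB and 10 TB/year come from the talk; the 5-year horizon is assumed.
    for offset, total in project_storage_tb(120, 10, 5):
        print(f"year +{offset}: {total} TB")
```

A linear model is the simplest reading of "around 10TB per year"; if requested collections keep doubling the dataset count, actual growth could be faster.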
The different types of data we do house in the GeoPortal include Vector and Raster, Government, Private, & Open Data, consortially licensed vs. publicly available
What are the loading challenges for vector & raster data?
Let’s first have a look at our vector datasets… here is a visual sampling of some standard layers…
Route logistics showcased via our annual DMTI collection (top left)
One of our local collections; topographic layers representing the City of Guelph (top right)
DLI Statistics Canada urban built-up areas for three Census years (bottom left); updates for these occur every Census year (that is, every 5 years)
& one of our land use related datasets from the Ministry of Natural Resources (OGDE) (bottom right). (This collection represents around 300 datasets and gets updated on a regular but seemingly random basis.)
For all of these, each dataset is prepared as a single map service uploaded to the GeoPortal (e.g. the DMTI annual collection is over 100 data layers/services).
We also have a few very large vector collections that consist of many vector layers covering large geographic areas; an example of this is our Ontario Base Map digital data collection which has over 1600 tiles with multiple vector Topographic layers for each tile.
It is not efficient to produce these as individual services (layers that you can see). We dealt with this issue by producing an index that shows the availability of the OBM collection.
A user is able to select an area of interest and download a pre-packaged zip of all the relevant data (all layers for the tiles selected) rather than trying to sift through search results on an individual data layer basis.
An illustration of this process is shown in this slide:
Top left: search results for the OBM data with the index added to the map view (note the detailed metadata information given in the search results; I really like this part of the portal. No one likes producing metadata, but it’s really nice to have it)
Top right: Zooming in to desired area with the categories of the dataset shown in the maps tab.
Bottom left: downloading an area of interest, the zip file results of my selection from the download tab.
Bottom right: my result extracted, with multiple layers shown in ArcMap
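The index-and-package idea described above can be sketched in a few lines. This is a minimal illustration, not the GeoPortal's actual implementation: the index structure (a dict of tile ids with a bounding box and a list of layer files), the function names, and the bounding-box test are all assumptions made for the example.

```python
import zipfile
from pathlib import Path


def bboxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_tiles(index, aoi):
    """Return the ids of index tiles whose bounding box intersects the AOI."""
    return sorted(tid for tid, tile in index.items()
                  if bboxes_intersect(tile["bbox"], aoi))


def package_tiles(index, tile_ids, zip_path):
    """Write every layer file of the selected tiles into one zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for tid in tile_ids:
            for layer_file in index[tid]["files"]:
                # Store each file under a per-tile folder inside the archive.
                zf.write(layer_file, arcname=f"{tid}/{Path(layer_file).name}")
    return zip_path
```

The user-facing step is then a single call chain: select the tiles under the area of interest, and hand back one archive instead of one search result per layer.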
Because we have so many datasets there are some factors we have to consider to prioritize what we load first.
Some of our challenges include:
- Schools want their local collections in our portal but we must also maintain our core dataset loading commitments (i.e. DMTI annual collection)
- Determining the best way to showcase a particular dataset in the portal; the visual representation of data vs. direct downloads only (pre-packaged zips available to users upon download) OGDE link to metadata/data off site…
Take into consideration the needs of course work and student research. Engineering & Planning students like AutoCAD DTM data, Geography & GIS students conducting temporal research will like our historical Census and historical route logistics data.
This is coming in future development, BUT how will we package project & thesis research data so that it is easily discoverable to the average user/student?
Our raster data collection is mostly from the Ministry of Natural Resources (OGDE) but in recent years we have begun to process data for other main collections i.e. DMTI, local datasets and some open data.
Here I would like to demonstrate with some examples the size and coverage of these datasets and how I feel this sets our portal apart from others… you can actually see this stuff, it’s not a direct download from an index map AND we are not Google Maps with the kind of server power they have….
AND it’s also because of this that we cannot just keep loading whatever we receive as we have hardware/software limits and must find efficient ways for a sustainable repository.
For our raster datasets, we usually prepared all data types that we received for the same collection (original imagery and any derivative products such as DSM and DEM). Now we are realizing that, to save some space, we should probably just load the most efficient version that both suits user analysis and meets our portal data loading commitments.
After consultation and review of this chart we concluded that the JPEG file type is sufficient for user analysis after download while also being small enough in file size that we can maintain it on our servers. If TIFFs or derivative products exist they can be requested.
Here is a sampling of a project we are currently working on.
The Historical Topographic Digitization Project (HTDP) is a project whereby the OCUL libraries are coordinating the digitization and georeferencing of older topographic maps of Ontario from 1906 to 1977.
We have just over 1000 maps at two scales (1:25,000 and 1:63,360) with multiple editions/years represented for most map sheets. An example of this is shown in this slide.
Here’s a zoom in of the 1909 map sheet showing the care taken in the scanning/digitizing process. Showcasing the preservation nature of this project. Quite impressive.
How to display these. This project doesn’t fit into our usual raster/mosaic/image service workflow in that the schools have requested that this have a more traditional feel for discovery. Like, back in the day, when you would go to the hard copy index, find your area of interest (i.e. list of map sheets), then go to the map drawers and pull out each map to look at further. How do you translate that experience into a web-application?
So we can’t put them all in a mosaic as all the editions overlapping would be a bit confusing to the average user to discover.
We are currently preparing a service for each map sheet and plan to have these connected in some way to an index map service (like the OBM example that I showed a moment ago), but enhanced with a visual popup of each tile accessed via the identify button and search results triggered by clicking a tile on the index. Still working on this one.
So our current raster loading challenges are:
Moving forward only load JPEG versions and have the others available upon request.
OCUL has recently developed a Cloud storage service (the Ontario Library Research Cloud) and this may turn out to be a great place to put the raster datasets that we don’t load into the portal (i.e. other data types and derivatives).
OK now let’s chat about my overall service workflow process.
Here is my basic workflow for both our main data types vector and raster:
So in very general terms we have…
vector mxd map service (layer file, custom thumbnail, pre-packaged zip for download all option)
raster mosaic image service (just a layer file and custom thumbnail for this one)
This process is quite manual and involves many checks and subareas within each section outlined here. Going over all the details could take up a whole afternoon.
Areas marked with an asterisk denote significant bottlenecks and redundancies that have been identified.
For our map service process we have 4 streams where data is copied and significant portions of the process are repeated (caching, dev, production1 & production2)
For our image services it’s a little less with 3 (mosaic, dev, production) which is more tolerable but could still be improved.
The big take-home here is that the longer the process, with all its repetitive tasks, the longer it takes for the datasets to be available in the GeoPortal. A mostly manual process is just not acceptable anymore.
We are hoping to reduce or remove these redundancies when the migration is fully developed AND with the incorporation of automated scripts to reduce error and time in the overall process.
Currently in the overall data production process we have…
Explain: currently we have dev, caching, prod1 & prod2 streams in our 10.0 server architecture. In 10.4 there will just be a dev and a prod stream.
This relates to my overall QA/QC process workflow: fewer manual checks, reduced error, and a lot of time saved (e.g. pre-packaged zips for our download all option).
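One of the manual checks that lends itself to scripting is confirming that each dataset folder holds the deliverables listed in the workflow notes (layer file, custom thumbnail, and for vector data the pre-packaged zip). The sketch below is an assumption-laden illustration: the folder layout, the function name, and the file suffixes (.lyr, .png, .zip) are hypothetical choices, not the GeoPortal's actual conventions.

```python
from pathlib import Path

# Required deliverables per service type, following the workflow notes.
# The file suffixes are assumptions made for illustration.
REQUIRED = {
    "vector": {"layer file": ".lyr", "thumbnail": ".png", "download zip": ".zip"},
    "raster": {"layer file": ".lyr", "thumbnail": ".png"},
}


def missing_deliverables(dataset_dir, service_type):
    """Return the names of required deliverables not found in dataset_dir."""
    present = {p.suffix.lower() for p in Path(dataset_dir).iterdir() if p.is_file()}
    return [name for name, suffix in REQUIRED[service_type].items()
            if suffix not in present]
```

Run over every dataset folder before copying to dev or production, a check like this reports gaps up front instead of leaving them to be found by eye in each stream.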
And last but certainly not least, we must continuously review and revise our GeoPortal interface as the objectives and functionality needs for this web application change over time.
Can the average user discover data available to them via the GeoPortal easily and efficiently?
The two main areas we are working to improve throughout our migration are:
Search results and navigation.
Currently we have dealt with annual collection layers by grouping them in a series/aggregated record (top left) with a top end metadata description (top right) followed by a listing of all related layers, some that you can add to the portal and older versions that you can directly download (bottom)
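The series/aggregated record described above can be modelled as a simple grouping step. This is only a sketch under stated assumptions: the field names (collection, title, year) and the rule that the newest edition is addable to the map while older ones are download-only are hypothetical, chosen to mirror the description rather than taken from the portal's code.

```python
from collections import defaultdict


def build_series_records(layers, viewable_years=1):
    """Group layers by (collection, title) into one aggregated record each.

    The newest `viewable_years` editions are marked as addable to the map;
    older editions are listed as direct downloads only.
    """
    grouped = defaultdict(list)
    for layer in layers:
        grouped[(layer["collection"], layer["title"])].append(layer["year"])

    records = []
    for (collection, title), years in sorted(grouped.items()):
        years = sorted(set(years), reverse=True)  # newest first
        records.append({
            "collection": collection,
            "title": title,
            "add_to_map": years[:viewable_years],
            "download_only": years[viewable_years:],
        })
    return records
```

With one record per series, search results show a single entry for an annual layer instead of one hit per year.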
Review of functions.
Do we really need both print & export options, a share link & permalink?
Does anyone really use the annotations section and/or data table functionality to work with/further discover data before downloading?
So moving forward we would like to…
Through workshops, meetings and surveys.
We have collected usage statistics from our 10.0 environment and will continue with the 10.4 setup. These will all be combined and analyzed to identify areas for improvement.
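Combining the statistics from the two environments can be as simple as summing per-collection counts and sorting. The sketch below assumes each environment's statistics have already been reduced to a mapping of collection name to download count; that input shape and the function name are assumptions for illustration.

```python
from collections import Counter


def combine_and_rank(*per_environment_counts):
    """Sum per-collection download counts across environments, highest first."""
    total = Counter()
    for counts in per_environment_counts:
        total.update(counts)  # adds counts for keys that already exist
    return total.most_common()
```

The ranked list is one input to prioritizing loading; it would still need to be weighed against core commitments and community requests.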
In hindsight it would likely have been less shocking if we had upgraded more regularly so we wouldn’t be dealing with such a large change in the back end BUT time, money and other business priorities, what I like to call life, do get in the way of these things sometimes.
Can we utilize multiple layers in a single map service to cut down on server loads? Given our customization so far (the back end and the fact that we cache all our map services) we have not been able to take advantage of this… yet.
Thanks for listening to my presentation, any questions?
Any feedback greatly appreciated