By: Dora Ann Lange Canhos of CRIA

SUMMARY:

• Effective on-line sharing depends not only on the will to share data, but also on ensuring accessible infrastructure for users and data providers.
• Independent herbaria contributing data for public sharing through the Virtual Herbarium Platform reported a variety of benefits – including increased recognition by their institutions, more involvement with graduate programs, more visits to the herbarium by specialists, increased holdings due to more collaboration opportunities and more grant opportunities due to greater visibility.
• Emerging findings suggest that open data sharing, in the case of herbaria in Brazil, has substantial positive benefits both for data users and providers.

– – –

In 2001, CRIA (the Portuguese acronym for the Reference Centre on Environmental Information) began developing the speciesLink network with the aim of integrating data from biological collections in Brazil. Many technical and cultural barriers to open and free online data sharing had to be overcome in order to build an e-infrastructure of public interest. At the time, “open science” largely meant publishing research results in journals, thereby making findings accessible to one’s peers. The prospect of open and free access to data on the Internet made researchers concerned about who was going to use the data and for what purpose. Open Access was a new concept and, to many, meant losing control of their data. In the Brazilian context, this was a cultural barrier that had to be overcome.

In addition to these cultural barriers, there were also many technical and infrastructural barriers to achieving open access in the sciences, including poor internet connectivity, scarce human resources and, at the time, complex hurdles around interoperability between systems and platforms.

Since 2001, CRIA has learned that online data sharing does not depend only on the will to share data. It must be planned, and it requires funding, adequate human resources and compatible infrastructure. It must also use internationally accepted standards, protocols and a common vocabulary in order to be interoperable with other online systems. Besides being useful, data must also be usable: the format must be adequate and the quality known.

To present our own experience as a case study, the speciesLink Platform was specifically designed to integrate data on different taxonomic groups from existing collections, each of which uses different software and has a different communication infrastructure. An important premise was that each collection should have complete control over its data, so all data policy was set at the data provider’s end. At the same time, all data indexed by the network would be openly and freely shared with all interested parties. Standards, protocols and software had to be developed to guarantee that the complexity of data sharing stayed at the e-infrastructure’s end. This eventually streamlined data sharing for both data providers and users.

After a slow start, one can observe a consistent increase in the number of data providers and content over the last 13 years, as the speciesLink infrastructure has become more accessible and the idea of open data more popular in Brazil (Figure 1).

Figure 1. Evolution of the number of data providers and data content of the speciesLink network

The products and outputs of the network over the years are substantial, particularly since 2009, when the Virtual Herbarium of Plants and Fungi project was approved as one of Brazil’s National Institutes of Science and Technology, receiving funds for six years. Today, the Virtual Herbarium shares more than 5.3 million records with close to a million associated images.

CRIA’s involvement with OCSDNet offered an opportunity to study the impact of Brazil’s Virtual Herbarium on e-Science. Through this project, the aim is to study the potential outcomes and implications for data providers that choose to share their data, and to identify more concretely who is using the Virtual Herbarium and why.

It is important to understand that herbaria in Brazil are geographically dispersed (Figure 2) and vary widely in size. Of those participating in the Virtual Herbarium, 27% have fewer than 10 thousand specimens in their holdings, 40% have between 10 and 50 thousand, 17% between 50 and 100 thousand, 9% between 100 and 200 thousand, and only 7% have more than 200 thousand.

Figure 2. Geographic location of the herbaria in Brazil that are part of the Virtual Herbarium

During the first phase of the project, we began to analyse outcomes for data providers who have chosen to openly share their data through speciesLink. Interviews were conducted with selected curators to identify possible outcomes. Based on the interviews, a questionnaire was prepared and sent to 99 Brazilian herbaria participating in the network, asking about their motivations and outcomes and leaving space for free contributions. To date, 39 responses have been received.

In general, data providers indicated the following benefits from their participation in the Virtual Herbarium:

• greater recognition by their institutions (82%);
• increased involvement with graduate programs (72%);
• more visits by specialists to their herbaria (85%);
• increased holdings, due to greater involvement with research (graduate courses) and collaboration with other institutions (74%); and
• more external grants, due to greater visibility (49%).
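To put these percentages in perspective, the short Python sketch below works out the questionnaire's response rate and the approximate number of herbaria behind each figure. It assumes that every percentage is a share of the 39 respondents, which the text does not state explicitly. The function names are illustrative only, and the counts are rounded estimates, not reported data.

```python
# Figures copied from the text. ASSUMPTION: each percentage is a share
# of the 39 respondents; the article does not say so explicitly.
QUESTIONNAIRES_SENT = 99
RESPONSES_RECEIVED = 39

REPORTED_BENEFITS = {
    "greater recognition by their institutions": 82,
    "increased involvement with graduate programs": 72,
    "more visits by specialists": 85,
    "increased holdings": 74,
    "more external grants": 49,
}


def response_rate(sent: int, received: int) -> float:
    """Return the response rate as a percentage."""
    return 100 * received / sent


def approximate_counts(benefits: dict[str, int], respondents: int) -> dict[str, int]:
    """Estimate how many respondents each percentage corresponds to."""
    return {
        label: round(percent * respondents / 100)
        for label, percent in benefits.items()
    }


if __name__ == "__main__":
    rate = response_rate(QUESTIONNAIRES_SENT, RESPONSES_RECEIVED)
    print(f"Response rate: {rate:.1f}%")
    counts = approximate_counts(REPORTED_BENEFITS, RESPONSES_RECEIVED)
    for label, count in counts.items():
        print(f"about {count} of {RESPONSES_RECEIVED} herbaria: {label}")
```

Under this assumption, roughly four in ten of the herbaria contacted have replied so far, so the percentages describe the respondents and not necessarily the whole network.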

The results are encouraging and indicate true commitment to and interest in sharing data, not only because of agreements with funding agencies, but because sharing brings direct benefits to independent herbaria. Recently, Brazil’s Virtual Herbarium also shared its data with the Global Biodiversity Information Facility (GBIF). This is an important step in positioning the network at an international level and possibly enhancing opportunities for further collaboration and partnership.

Along with indicating possible outcomes of their involvement in the network, each herbarium described its perception of the Strengths, Weaknesses, Opportunities and Threats (SWOT analysis) of participating in the network. All answers were tabulated and presented at a workshop, where a new round of discussions was carried out. This material is currently being analyzed and will be presented in the near future.

The project will also examine how the data is used. Statistics show that 95% of Virtual Herbarium users are from Brazil and that, in 2014, an average of 1.4 million records per day were retrieved from the system. These statistics already show the importance of developing local e-infrastructure. In the project’s next phase, we will begin to identify who is using the system and for what purposes.