Science

The Access to Biological Collections Data (ABCD) Schema is an evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data). The ABCD Schema attempts to be comprehensive and highly structured, supporting data from a wide variety of databases. It is compatible with several existing data standards. Parallel structures exist so that either (or both) atomised data and free-text can be accommodated.

The ABCD Schema was ratified as a standard by the Biodiversity Information Standards Taxonomic Databases Working Group (TDWG) in 2005. It was developed as a community-driven effort, with contributions from CODATA, BioCASE and GBIF among other organizations.

ABCD Zoology is an application profile of ABCD tailored for use in zoological contexts. It was the first official application profile to use the RDF-based version 3.0 of ABCD.

An extension of the ABCD standard for DNA data.

An extension of the ABCD standard for Geosciences data.

A profile of ISO 19115, also mapping to the AGLS profile of Dublin Core, designed to facilitate efficient access to descriptions of information resources, particularly geographic or spatial data.

Darwin Core documentation and recommendations for herbaria.

The AVM scheme supports the cross-searching of collections of print-ready and screen-ready astronomical imagery rendered from telescopic observations (also known as ‘pretty pictures’). The scheme is compatible with the Adobe XMP specification, so the metadata can be embedded within common image formats such as JPEG, TIFF and PNG.

Such images can combine data acquired at different wavebands and from different observatories. While the primary intent is to cover data-derived astronomical images, there are broader uses as well. Specifically, the most general subset of this schema is also appropriate for describing artwork and illustrations of astronomical subject matter.

AVM is a proposed recommendation of the International Virtual Observatory Alliance and was last updated in 2011.

A simple and intuitive way to organize and describe your neuroimaging and behavioral data.

Describes the format of the generic metadata artifacts (the templates, elements, and instances) that make up the CEDAR metadata framework and allow for exchange of the metadata artifacts with external systems.

The CF standard was originally framed as a standard for data written in netCDF format, with model-generated climate forecast data particularly in mind. However, it is equally applicable to observational datasets, and can be used to describe other formats. It is a standard for “use metadata” that aims both to distinguish quantities (such as physical description, units, and prior processing) and to locate the data in space–time.

Sponsored by the NetCDF Climate and Forecast Metadata Convention, the current version dates from December 2011.
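
As a rough illustration of the "use metadata" idea, the sketch below models the attributes of one CF-style variable as a plain Python dictionary and checks that the attributes identifying the quantity are present. The attribute names `standard_name`, `units` and `cell_methods` come from the CF conventions; the variable, the choice of "required" attributes, and the helper `missing_attributes` are assumptions made for illustration, and no netCDF library is involved.

```python
# Hypothetical description of one variable, using CF attribute names.
air_temperature = {
    "standard_name": "air_temperature",  # identifies the physical quantity
    "units": "K",                        # units of measure
    "cell_methods": "time: mean",        # prior processing applied to the data
}

# Illustrative choice of attributes to check; this is not a CF conformance test.
REQUIRED = ("standard_name", "units")


def missing_attributes(attrs, required=REQUIRED):
    """Return the required attribute names that are absent or empty."""
    return [name for name in required if not attrs.get(name)]


print(missing_attributes(air_temperature))
```

A real file would carry these attributes on a netCDF variable; the dictionary only stands in for that structure.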

A well-established standard file structure for the archiving and distribution of crystallographic information, CIF is in regular use for reporting crystal structure determinations to Acta Crystallographica and other journals.

Sponsored by the International Union of Crystallography, the current standard dates from 1997. As of July 2011, a new version of the CIF standard is under consideration.
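
To give a feel for the file structure, here is a minimal sketch that reads single-line `_tag value` pairs from a CIF-like text, grouped by `data_` block. It is deliberately incomplete: `loop_` constructs and multi-line text fields are ignored, and the sample values and the function name `parse_simple_cif` are made up for illustration.

```python
import shlex

# Invented sample; the values are illustrative only.
SAMPLE = """\
data_example
_chemical_name_common   'sodium chloride'
_cell_length_a          5.6402
_cell_length_b          5.6402
"""


def parse_simple_cif(text):
    """Parse single-line `_tag value` pairs, grouped by data block.

    Loops (`loop_`) and multi-line text fields are deliberately not handled.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            parts = shlex.split(line)  # honours quoted values
            if len(parts) == 2:
                tag, value = parts
                current[tag] = value  # values are kept as strings
    return blocks


print(parse_simple_cif(SAMPLE))
```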

The Common Information Model (CIM) describes climate data, the models and software from which they derive, the geographic grids used to calculate and project them, and the experimental processes (typically simulations) that produced them.

The CIM was originally developed by the EU-funded Metafor Project. It is now maintained and developed by Earth Science Documentation (ES-DOC). The latest release dates from 2014.

Developed by the Cooperative Ocean-Atmosphere Research Data Service (COARDS), these conventions constitute a standard set of metadata to include in netCDF files, allowing them to be shared and interchanged.

The COARDS Conventions are generalized and extended by the CF (Climate and Forecast) Metadata Conventions.

CRMsci is the extension for scientific observation and is specifically designed to support the documentation and integration of scientific data. It provides a structured framework for capturing and describing scientific observations, measurements, and experiments in various domains such as environmental studies, natural sciences, and biodiversity research.

The National Oceanographic Data Centre's required format for reporting on cruises or field experiments at sea, formulated using tags from the ISO 19115 metadata standard.

An extension to the FGDC/CSDGM metadata standard providing a common terminology and set of definitions for documenting geospatial data obtained by remote sensing.

A study-data oriented model, primarily in support of the ICAT data management infrastructure software. The CSMD is designed to support data collected within a large-scale facility’s scientific workflow; however, the model is also designed to be generic across scientific disciplines.

Sponsored by the Science and Technology Facilities Council, the latest full specification available is v 4.0, from 2013.

A body of standards, including a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries.

Sponsored by Biodiversity Information Standards (TDWG), the current standard was last modified in October 2009.
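
As a small illustration of how such terms are used in practice, the sketch below builds one occurrence record keyed by Darwin Core term names and serialises it as CSV. The term names are Darwin Core terms; the record values and the helper `to_csv` are invented for this example.

```python
import csv
import io

# Term names are Darwin Core terms; the values are invented for illustration.
record = {
    "occurrenceID": "urn:example:occurrence:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Puma concolor",
    "eventDate": "2009-10-09",
    "decimalLatitude": "-41.0983",
    "decimalLongitude": "-71.5246",
}


def to_csv(records):
    """Serialise records as CSV with the term names as column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv([record]))
```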

A protocol-independent XML schema for a geospatial extension to the Darwin Core.

A widely used, international standard for describing data from the social, behavioral, and economic sciences. Two versions of the standard are currently maintained in parallel:

  • DDI Codebook (or DDI version 2) is the simpler of the two, and intended for documenting simple survey data for exchange or archiving. Version 2.5 was released in January 2012.
  • DDI Lifecycle (or DDI version 3) is richer and may be used to document datasets at each stage of their lifecycle from conceptualization through to publication and reuse. It is modular and extensible. Version 3.2 was published in March 2014.

Both versions are XML-based and defined using XML Schemas. They were developed and are maintained by the DDI Alliance.

An early metadata initiative from the Earth sciences community, intended for the description of scientific data sets. It includes elements focusing on instruments that capture data, temporal and spatial characteristics of the data, and projects with which the dataset is associated. It is defined as a W3C XML Schema.

Sponsored by the Global Change Master Directory, the DIF Writer's Guide Version 6 is from November 2010.

An extension to the Darwin Core standard, it includes additional terms required to describe plant genetic resources and in particular germplasm seed samples.

A Dublin Core Metadata Application Profile created for the eBank UK project, which provides access to the detailed results of scientific experiments in crystallography.

The European Directory of Marine Environmental Datasets metadata scheme, which is a profile of ISO 19115.

Ecological Metadata Language (EML) is a metadata specification particularly developed for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications).

A widely-used, but no longer current standard defining the information content for a set of digital geospatial data required by the US Federal Government.

CSDGM was sponsored by the US Federal Geographic Data Committee. However, in September 2010 the FGDC endorsed ISO 19115 and began encouraging federal agencies to transition to ISO metadata.

A profile of the FGDC/CSDGM metadata standard, intended to support the collection and processing of biological data.

FHIR is a set of standards for the exchange of healthcare information and data. It defines metadata schemas for describing various entities relevant to healthcare – such as patients, procedures, and clinical reasoning – as well as protocols for exchanging data and metadata records between systems. The information may be serialized as XML, JSON, ND-JSON, or RDF/Turtle.

The Infrastructure Package was approved as standard ANSI/HL7 FHIR® R4 INFRASTRUCTURE R1-2019.
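
For a sense of what the JSON serialization looks like, the following sketch builds a small Patient resource as a Python dictionary and round-trips it through JSON. The field names follow the FHIR Patient resource; the patient details are hypothetical, and no FHIR library or server is assumed.

```python
import json

# Hypothetical patient; the field names follow the FHIR Patient resource.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
    "gender": "male",
    "birthDate": "1974-12-25",
}

# Serialise to JSON text and parse it back again.
serialized = json.dumps(patient, indent=2)
restored = json.loads(serialized)

print(restored["resourceType"], restored["name"][0]["family"])
```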

FITS is an image data file format for encoding astronomical data. The WCS (World Coordinate System) conventions map elements in data arrays to standard physical coordinates in the sky. FITS has provisions for image metadata encoded in an ASCII header at the beginning of files.
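
The ASCII header is made up of fixed-width, 80-character "cards". The sketch below splits one card into keyword, value and comment. It is a simplification written for illustration: values are returned as strings, doubled quotes inside string values are not handled, and the function name `parse_card` and the sample cards are assumptions rather than part of any FITS library.

```python
def parse_card(card):
    """Split one 80-character FITS header card into (keyword, value, comment).

    Simplified: values are returned as stripped strings, and commentary
    cards (for example HISTORY) are returned with a value of None.
    """
    card = card.ljust(80)
    keyword = card[:8].strip()      # keyword occupies columns 1-8
    if card[8:10] != "= ":          # value indicator occupies columns 9-10
        return keyword, None, card[8:].strip()
    rest = card[10:]
    if rest.lstrip().startswith("'"):
        # Quoted string value: a slash inside the quotes is part of the value.
        start = rest.index("'")
        end = rest.index("'", start + 1)
        value = rest[start + 1:end].rstrip()
        comment = rest[end + 1:].partition("/")[2].strip()
    else:
        value_part, _, comment_part = rest.partition("/")
        value = value_part.strip()
        comment = comment_part.strip()
    return keyword, value, comment


print(parse_card("SIMPLE  =                    T / conforms to FITS standard"))
print(parse_card("TELESCOP= 'Example/1'          / telescope name"))
```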

An extension of FITS that defines how physical, or world, coordinates are assigned to each pixel in an image. The conventions were originally proposed in 2002 and then incorporated into the 3.0 release of the FITS standard.
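
In the simplest case of a single linear axis, the pixel-to-world mapping can be sketched as below, using the WCS keywords `CRPIXn` (reference pixel), `CRVALn` (world value at the reference pixel) and `CDELTn` (increment per pixel). This assumes a purely linear axis and ignores rotation matrices and celestial projections; the header values and the function name `pixel_to_world` are hypothetical.

```python
def pixel_to_world(pixel, header, axis=1):
    """Map a pixel coordinate to a world coordinate on one linear axis."""
    crpix = header[f"CRPIX{axis}"]  # reference pixel
    crval = header[f"CRVAL{axis}"]  # world coordinate at the reference pixel
    cdelt = header[f"CDELT{axis}"]  # world-coordinate increment per pixel
    return crval + cdelt * (pixel - crpix)


# Hypothetical header for a linear spectral axis.
header = {"CRPIX1": 1.0, "CRVAL1": 4000.0, "CDELT1": 2.0}

print(pixel_to_world(1.0, header))   # the reference pixel maps to CRVAL1
print(pixel_to_world(11.0, header))  # ten pixels further along the axis
```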

Established by a global network of countries and organizations, GBIF is a web portal promoting and facilitating the mobilization, access, discovery and use of biodiversity data. The portal uses a profile of EML; a How-to Guide and Reference Guide for using the profile are available.

Genome metadata on PATRIC consists of 61 different metadata fields, called attributes, which are organized into the following seven broad categories: Organism Info, Isolate Info, Host Info, Sequence Info, Phenotype Info, Project Info, and Others.

An extension of SDMX used to exchange statistical data and metadata.

A reference framework that provides a common terminology across and between statistical organisations; aligns with DDI and SDMX.

An extension to ABCD 2.06, it is designed to allow the storage and transmission of herbarium plant specimen data.

A simulation extension to the SPASE data model.

A profile of ISO 19115:2003, adopted in 2007 as the common metadata standard for the Infrastructure for Spatial Information in the European Community (INSPIRE). The other profiles of ISO 19115 in use in European Member States have been made compliant with INSPIRE.

The technical specifications defined by the IVOA (International Virtual Observatory Alliance) enable interoperability between and the integration of astronomical archives across the world into an international virtual observatory. They include several data models that act as metadata schemas for particular data types: for example, photometry data, simulation data, space-time coordinates, spectral lines data, spectral data, observational data, and the physical parameter space of astronomical datasets.

These data models are under active development by the IVOA Data Modelling Working Group.

Additional recommendations have been made for metadata concepts and terms necessary for the discovery and the use of astronomical data collections and services.

The Investigation/Study/Assay (ISA) tab-delimited (TAB) format is a general purpose framework with which to collect and communicate complex metadata (i.e. sample characteristics, technologies used, type of measurements made) from 'omics-based' experiments employing a combination of technologies.

Created by core developers from the University of Oxford, ISA-TAB v1.0 was released in November 2008.

An extension of ISA-TAB specifying the format for representing and sharing information about nanomaterials, small molecules and biological specimens along with their assay characterization data.

An internationally-adopted schema for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data.

Sponsored by the International Organization for Standardization (ISO), the first edition of ISO 19115 was published in 2003. It has since been split into parts: ISO 19115-1:2014 contains the fundamentals of the standard; ISO 19115-2:2009 contains extensions for imagery and gridded data; and ISO/TS 19115-3:2016 provides an XML schema implementation for the fundamental concepts compatible with ISO/TS 19139:2007 (Geographic Metadata XML, or GMD).

A common profile of ISO 19115:2003 between the United States and Canada, designed to enhance interoperability of geographic information metadata in North America.

An extension of ISO 19115 defining the schema required for describing imagery and gridded data.

LIDO is an XML schema intended for delivering metadata, for use in a variety of online services, from an organization’s online collections database to portals of aggregated resources, as well as exposing, sharing, and connecting data on the web. Its strength lies in its ability to support the typical range of descriptive information about objects of material culture. It can be used for all kinds of objects, e.g., art, architecture, cultural history, history of technology, and natural history. LIDO supports multilingual application environments. Being an application of the CIDOC Conceptual Reference Model (CRM), LIDO is the result of a collaborative effort of international stakeholders in the museum sector, starting in 2008, to create a common solution for contributing cultural heritage content to portals and other repositories of aggregated resources. LIDO is maintained under the patronage of CIDOC - ICOM International Committee for Documentation.

A profile developed in accordance with ISO 19115 rules by the Australian Ocean Data Centre that supports the documentation and discovery of marine spatial datasets.

The MEDIN Discovery Metadata Standard is a marine profile of the UK government standard GEMINI2 and also complies with other international conventions such as INSPIRE and ISO 19115.

A common portal to a group of nearly 40 checklists of Minimum Information for various biological disciplines. The MIBBI Foundry is developing a cross-analysis of these guidelines to create an intercompatible, extensible community of standards.

The concept was realized initially through the joint efforts of the Proteomics Standards Initiative, the Genomic Standards Consortium and the MGED RSBI Working Groups. The latest project to register with MIBBI is the MIABie guidelines for reporting biofilm research, as of January 2012.

A list of nearly 40 Minimum Information standards projects registered with the MIBBI initiative.

MIxS currently consists of three separate checklists: MIGS for genomes, MIMS for metagenomes, and MIMARKS for marker genes. To create a single entry point to all minimum information checklists from the GSC and to the environmental packages, we created an overarching framework, the MIxS standard (publication in Nature Biotechnology). MIxS includes the technology-specific checklists from the previous MIGS and MIMS standards, provides a way of introducing additional checklists such as MIMARKS, and also allows annotation of sample data using environmental packages.

NeXus is an international standard for the storage and exchange of neutron, x-ray, and muon experiment data. The structure of NeXus files is extremely flexible, allowing the storage of both simple data sets, such as a single data array and its axes, and highly complex data and their associated metadata, such as measurements on a multi-component instrument or numerical simulations. NeXus is built on top of the container format HDF5, and adds domain-specific rules for organizing data within HDF5 files in addition to a dictionary of well-defined domain-specific field names.

The goal of these standards is to expose the rich content in aggregations of Web resources to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. The standards support the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, with the intent to develop standards that generalize across all web-based information including the increasingly popular social networks of “Web 2.0”.

Observ-OM is founded on four basic concepts to represent any kind of observation: Targets, Features, Protocols (and their Applications), and Values. It is intended to lower the barrier for future data sharing and facilitate integrated search across panels and species. All models, formats, documentation, and software are freely available as open source (LGPLv3) at http://www.observ-om.org.

This encoding is an essential dependency for the OGC Sensor Observation Service (SOS) Interface Standard. More specifically, this standard defines XML schemas for observations, and for features involved in sampling when making observations. These provide document models for the exchange of information describing observation acts and their results, both within and between different scientific and technical communities.

Open Data for Access and Mining (ODAM) Structural Metadata is a format describing how the metadata should be formatted and what should be included to ensure ODAM compliance for a data set. To comply with this format, two metadata files in TSV format are required in addition to the data file(s). These two files describe the metadata of the dataset, which includes descriptions of measures and structural metadata like references between tables. The metadata lets non-expert users explore and visualize your data. By making data interoperable and reusable by both humans and machines, it also encourages data dissemination according to FAIR principles. The structural metadata is specified in section 'Data collection and preparation' on the website.

ODM-XML is a vendor-neutral, platform-independent data exchange standard suited for exchanging and archiving clinical and translational research data, along with their associated metadata, administrative data, reference data, and audit information. ODM-XML facilitates the regulatory-compliant acquisition, archival and exchange of metadata and data.

A specification of how to embed OME-XML metadata within a TIFF or BigTIFF image file.

OME-XML is a vendor-neutral file format for biological image data, with an emphasis on metadata supporting light microscopy. It can be used as a data file format in its own right, or as a way of encoding metadata within a TIFF or BigTIFF file (for which purpose there is the OME-TIFF specification).

The standard is maintained by the Open Microscopy Environment Consortium, and was last updated in June 2012.

OpenPMD provides naming and attribute conventions that allow the exchange of particle and mesh based data from scientific simulations and experiments. The primary goal is to define a minimal set/kernel of meta information that enables the sharing and exchange of data to achieve

  • portability between various applications and differing algorithms;
  • a unified open-access description for scientific data (publishing and archiving);
  • a unified description for post-processing, visualization and analysis.

OpenPMD suits any kind of hierarchical, self-describing data format, such as, but not limited to ADIOS1 (BP3), ADIOS2 (BP4), HDF5, JSON, and XML.

The Protein Data Bank (PDB) archive is the single worldwide archival repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies, managed by the Worldwide PDB (wwPDB). The PDB Exchange Dictionary (PDBx) is used by the wwPDB to define data content for deposition, annotation and archiving of PDB entries. PDBx incorporates the community standard metadata representation, the Macromolecular Crystallographic Information Framework (mmCIF), originally developed under the auspices of the International Union of Crystallography (IUCr). PDBx has been extended by the wwPDB to include descriptions of other experimental methods that produce 3D macromolecular structure models such as Nuclear Magnetic Resonance Spectroscopy, 3D Electron Microscopy and Tomography.

Plasma-MDS is used to provide structured disciplinary metadata to data sets in the field of plasma science and technology. Its main aim is to facilitate the discovery and exchange of research data in this field.

A draft set of data elements required by the National Institutes of Health (U.S.) for the submission of trial information to the ClinicalTrials.gov registry and results database.

The standard provides a means to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations.

Recommended Metadata for Biological Images (REMBI) provides guidelines for metadata for biological images to enable the FAIR sharing of scientific data. REMBI is the result of the bioimaging community coming together to develop metadata standards that describe the imaging data itself, together with supporting metadata such as those describing the biological study and sample.

Some repositories have decided that current standards do not fit their metadata needs, and so have created their own requirements.

Defines metadata terms and concepts necessary for discovery and use of astronomical data collections and services.

The schema is based on Dublin Core, with astronomy-specific extensions. Resource metadata are collected in resource "registries" that are populated and synchronized using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). Version 1.12 dates from March 2007. It is developed and maintained by the IVOA Resource Registry Working Group and the NVO Metadata Working Group.
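
Harvesting over OAI-PMH comes down to HTTP requests carrying a `verb` parameter. As a hedged sketch, the code below only constructs a `ListRecords` request URL and does not contact any server. The endpoint `https://registry.example.org/oai` and the helper `list_records_url` are hypothetical; `oai_dc` is the Dublin Core format that OAI-PMH repositories are required to support, and a registry may offer other metadata prefixes as well.

```python
from urllib.parse import urlencode

BASE_URL = "https://registry.example.org/oai"  # hypothetical endpoint


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2007-03-01")
print(url)
```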

The Standard for Documentation of Astronomical Catalogues is a set of conventions for archiving astronomical data. As well as path, filename and data format conventions, it also specifies how to construct a plain text description file for documenting the data files. It was developed as an alternative to FITS that would be more suited to archives, permit human inspection, and allow manipulation via standard Unix command-line tools.

SDAC was developed by CDS (Centre de Données astronomiques de Strasbourg). Version 2.0 is the most recent; it was released in February 2000.

A set of common technical and statistical standards and guidelines to be used for the efficient exchange and sharing of statistical data and metadata.

Sponsoring institutions include BIS, ECB, EUROSTAT, IMF, OECD, UN, and the World Bank. Technical Specification 2.1 was amended in May 2012.

Providing the format and content for describing data sets related to shoreline and other coastal data sets, this profile complies with the FGDC/CSDGM standard.

An ISA-Tab-based standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments.

An information model for describing the elements of the heliophysics data environment, and a set of resource types which can be used to describe data along with its scientific context, source, provenance, content and location. It is designed to support a federated data system where data may reside at different locations and may be separated from the metadata which describes it. The preferred expression form is XML.

The Space Physics Archive Search and Extract (SPASE) effort is implemented by the SPASE Consortium, which is composed of representatives of the international heliophysics data community. The current release of the data model (2.2.9) was updated in January 2018.

A profile of the CSMD model for Australian crystallographic data.

A profile of ISO 19115 designed to support the documentation and discovery of spatial datasets, dataset series and geo services within Higher and Further Education.

A specification for a set of metadata elements describing geospatial data resources for discovery purposes, based on ISO 19115.

A metadata standard for describing environmental monitoring activities, programmes, networks and facilities published by the UK Environmental Observation Framework (UKEOF).

This metadata scheme is a profile of ISO 19115 and ISO 19119 intended for use by the US Geoscience Information Network to describe a broad range of geoscience information resources. It provides guidance for the population of metadata documents to enable interoperability of catalog service clients with multiple servers conforming to this profile. It specifically targets serialization in XML compliant with ISO 19139.

The World Meteorological Organization (WMO) has defined a restrictive subset of ISO 19115 appropriate for global meteorological use.