Biological sciences

The Access to Biological Collections Data (ABCD) Schema is an evolving, comprehensive standard for accessing and exchanging data about specimens and observations (a.k.a. primary biodiversity data). It aims to be highly structured while supporting data from a wide variety of databases, and it is compatible with several existing data standards. Parallel structures exist so that atomised data, free text, or both can be accommodated.

The ABCD Schema was ratified as a standard by Biodiversity Information Standards (TDWG, formerly the Taxonomic Databases Working Group) in 2005. It was developed as a community-driven effort, with contributions from CODATA, BioCASE and GBIF among other organizations.

An extension of the ABCD standard for DNA data.

A simple and intuitive way to organize and describe your neuroimaging and behavioral data.

CIMR was developed by the Metabolomics Standards Initiative (MSI) to specify guidelines for the minimum information to include when reporting metabolomics work. It was developed in textual form, but work is underway to develop a data model, exchange format and ontology for expressing the information in machine-actionable form.

A study-data oriented model, primarily in support of the ICAT data management infrastructure software. The CSMD is designed to support data collected within a large-scale facility’s scientific workflow; however, the model is also designed to be generic across scientific disciplines.

Sponsored by the Science and Technology Facilities Council, the latest full specification available is v4.0, from 2013.

An early metadata initiative from the Earth sciences community, intended for the description of scientific data sets. It includes elements focusing on instruments that capture data, temporal and spatial characteristics of the data, and projects with which the dataset is associated. It is defined as a W3C XML Schema.

Sponsored by the Global Change Master Directory, the DIF Writer's Guide Version 6 is from November 2010.

A widely used but no longer current standard defining the information content for a set of digital geospatial data required by the US Federal Government.

CSDGM was sponsored by the US Federal Geographic Data Committee. However, in September 2010 the FGDC endorsed ISO 19115 and began encouraging federal agencies to transition to ISO metadata.

Genome metadata on PATRIC consists of 61 different metadata fields, called attributes, which are organized into the following seven broad categories: Organism Info, Isolate Info, Host Info, Sequence Info, Phenotype Info, Project Info, and Others.

GIATE is a minimum information checklist for transparently reporting the purpose, methods and results of therapeutic experiments. Resources are provided for compiling metadata records in spreadsheet form (GIATE-TAB), rather than using a machine-readable serialization, though machine readability is one of the stated objectives for the checklist.

The Investigation/Study/Assay (ISA) tab-delimited (TAB) format is a general purpose framework with which to collect and communicate complex metadata (i.e. sample characteristics, technologies used, type of measurements made) from 'omics-based' experiments employing a combination of technologies.

Created by core developers from the University of Oxford, ISA-TAB v1.0 was released in November 2008.
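As a rough illustration of the tab-delimited layout, the following Python sketch reads a small study-style table with the standard csv module. The column headers and sample values are illustrative assumptions written for this example; real ISA-TAB files carry many more columns and are split across investigation, study and assay files.

```python
import csv
import io

# Hypothetical, heavily simplified study table (assumed headers and values).
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
    "source1\tHomo sapiens\tsample collection\tsample1\n"
    "source2\tMus musculus\tsample collection\tsample2\n"
)


def read_tab_table(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_tab_table(STUDY_TABLE)

# Map each sample to the organism recorded for its source.
organisms = {row["Sample Name"]: row["Characteristics[organism]"] for row in rows}
print(organisms)
```

Keeping each row as a dictionary keyed by header means that extra columns can be added to the table without changing the reading code.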

An extension of ISA-TAB specifying the format for representing and sharing information about nanomaterials, small molecules and biological specimens along with their assay characterization data.

The MIBBI Project was an international collaboration seeking to harmonize the efforts of the various bioscience communities developing Minimum Information (MI) reporting guidelines or checklists. Approximately 40 such checklists were registered with the project.

The MIBBI Foundry was an attempt to identify common features of the various MI checklists and codify them into modules. The aim was to evolve the existing checklists towards formal intercompatibility, and to enable new checklists to be produced by selecting and extending the available modules.

The concept was realized initially through the joint efforts of the Proteomics Standards Initiative, the Genomic Standards Consortium and the MGED RSBI Working Groups.

While the MIBBI Foundry did not develop to the point where it could become a true, technical parent standard for the MI checklists, the MIBBI Project provided a useful grouping of standards that shared a common purpose, philosophy and inspiration.

A list of nearly 40 Minimum Information standards projects registered with the MIBBI initiative.

MIxS is a superset of metadata elements that can be used to compile minimum information checklists for reporting sequencing data. It was developed by the Genomic Standards Consortium (GSC) as an overarching framework that could act as a single entry point for all their minimum information checklists (as reported in Nature Biotechnology).

MIxS includes the technology-specific checklists from the previous MIGS and MIMS standards (for genomes and metagenomes respectively), provides a way of introducing additional checklists such as MIMARKS (for marker sequences), and also allows annotation of sample data using environmental packages.
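The way a checklist and an environmental package combine can be sketched in Python as a union of field sets. This is only a minimal illustration: the field names and their grouping below are assumptions chosen for the example and should not be read as the actual MIxS term list.

```python
# Illustrative field sets only; not the real MIxS definitions.
CORE_FIELDS = {"project_name", "collection_date", "lat_lon"}
CHECKLIST_FIELDS = {
    "MIGS": {"assembly_software"},
    "MIMS": {"seq_meth"},
    "MIMARKS": {"target_gene"},
}
PACKAGE_FIELDS = {
    "soil": {"depth"},
    "water": {"depth", "salinity"},
}


def required_fields(checklist, package=None):
    """Combine shared fields, checklist fields and optional package fields."""
    fields = set(CORE_FIELDS) | CHECKLIST_FIELDS[checklist]
    if package is not None:
        fields |= PACKAGE_FIELDS[package]
    return fields


def missing_fields(record, checklist, package=None):
    """Return the sorted names of required fields absent from a record."""
    return sorted(required_fields(checklist, package) - set(record))


record = {
    "project_name": "demo",
    "collection_date": "2012-06-01",
    "target_gene": "16S rRNA",
}
print(missing_fields(record, "MIMARKS", "water"))
```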

Observ-OM is founded on four basic concepts to represent any kind of observation: Targets, Features, Protocols (and their Applications), and Values. It is intended to lower the barrier for future data sharing and facilitate integrated search across panels and species. All models, formats, documentation, and software are freely available as open source (LGPLv3) at http://www.observ-om.org.
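One way to picture how the four concepts relate is the Python sketch below. The class and attribute names are assumptions made for illustration and are not taken from the Observ-OM model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Target:
    """The thing being observed, e.g. an individual or a sample."""
    name: str


@dataclass(frozen=True)
class Feature:
    """What is observed about a target, e.g. height."""
    name: str
    unit: str = ""


@dataclass
class Protocol:
    """A procedure that defines which features are measured."""
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class ProtocolApplication:
    """One concrete run of a protocol."""
    protocol: Protocol
    performed_on: str  # date string, format assumed for this sketch


@dataclass
class ObservedValue:
    """A value linking a target, a feature and a protocol application."""
    target: Target
    feature: Feature
    application: ProtocolApplication
    value: str


def values_for(target, observations):
    """Collect feature name -> value for a single target."""
    return {o.feature.name: o.value for o in observations if o.target == target}


height = Feature("height", unit="cm")
weight = Feature("weight", unit="kg")
run = ProtocolApplication(Protocol("body measurements", [height, weight]), "2012-01-15")
mouse = Target("mouse-01")
observations = [
    ObservedValue(mouse, height, run, "9.5"),
    ObservedValue(mouse, weight, run, "0.03"),
    ObservedValue(Target("mouse-02"), height, run, "10.1"),
]
print(values_for(mouse, observations))
```

Because every value points at a target and a feature, the same query works regardless of which protocol produced the data, which is the kind of integrated search the description refers to.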

A specification of how to embed OME-XML metadata within a TIFF or BigTIFF image file.

OME-XML is a vendor-neutral file format for biological image data, with an emphasis on metadata supporting light microscopy. It can be used as a data file format in its own right, or as a way of encoding metadata within a TIFF or BigTIFF file (for which purpose there is the OME-TIFF specification).

The standard is maintained by the Open Microscopy Environment Consortium, and was last updated in June 2012.
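To give a feel for the kind of metadata involved, the Python sketch below reads image dimensions from a small hand-written OME-XML fragment using the standard library. The fragment is an assumption made for this example: it has not been validated against the schema, the namespace URI shown may differ from the one used by a given schema release, and the code deliberately ignores namespaces.

```python
import xml.etree.ElementTree as ET

# Hand-written, minimal fragment for illustration (not schema-validated).
OME_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint16"
            SizeX="512" SizeY="256" SizeZ="1" SizeC="3" SizeT="10"/>
  </Image>
</OME>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def pixel_dimensions(xml_text):
    """Return the five Size* attributes of the first Pixels element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "Pixels":
            return {axis: int(element.attrib["Size" + axis]) for axis in "XYZCT"}
    raise ValueError("no Pixels element found")


dims = pixel_dimensions(OME_XML)
print(dims)
```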

The Protein Data Bank (PDB) archive is the single worldwide archival repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies, managed by the Worldwide PDB (wwPDB). The PDB Exchange Dictionary (PDBx) is used by the wwPDB to define data content for deposition, annotation and archiving of PDB entries. PDBx incorporates the community standard metadata representation, the Macromolecular Crystallographic Information Framework (mmCIF), originally developed under the auspices of the International Union of Crystallography (IUCr). PDBx has been extended by the wwPDB to include descriptions of other experimental methods that produce 3D macromolecular structure models such as Nuclear Magnetic Resonance Spectroscopy, 3D Electron Microscopy and Tomography.
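The category-and-item style of mmCIF data names can be illustrated with the Python sketch below. It is a deliberately minimal reader written for this example: it handles only single-line items of the form `_category.item value` and ignores loops, multi-line values and other parts of the format, so it is not a substitute for a real mmCIF parser. The snippet and its values are invented.

```python
# Invented example data; only simple single-line items are shown.
SNIPPET = """data_EXAMPLE
_cell.length_a   50.0
_cell.length_b   60.0
_exptl.method    'X-RAY DIFFRACTION'
"""


def parse_simple_items(text):
    """Group single-line '_category.item value' pairs by category."""
    categories = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip the data_ header and anything unsupported
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # value is on another line; not handled in this sketch
        name, raw_value = parts
        category, _, item = name[1:].partition(".")
        # Remove surrounding quotes from quoted values.
        categories.setdefault(category, {})[item] = raw_value.strip().strip("'\"")
    return categories


items = parse_simple_items(SNIPPET)
print(items["exptl"]["method"])
```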

Recommended Metadata for Biological Images (REMBI) provides guidelines for metadata for biological images to enable the FAIR sharing of scientific data. REMBI is the result of the bioimaging community coming together to develop metadata standards that describe the imaging data itself, together with supporting metadata such as those describing the biological study and sample.

Some repositories have decided that current standards do not fit their metadata needs, and so have created their own requirements.

An ISA-Tab-based standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments.