COMPADRE at Berlin’s csv,conf.

One of the COMPADRE/COMADRE core committee members (Owen Jones) recently attended the “csv,conf” in Berlin. This one-day conference was a fringe event of the bigger Open Knowledge Festival and was about data – it was for those who collect or aggregate it, those who make it available online and those who analyse and visualise it.

There were a heap of interesting talks – for example, Felienne Hermans spoke on why we should treat spreadsheets like those in Excel (which we use as our COMPADRE data entry platform) as a kind of code, and apply good coding practices to them (e.g. by building in error checking and validation at each step). Another was Karthik Ram’s talk about a new package for R called testdat, which will be a useful tool for validating our COMPADRE/COMADRE metadata. For example, it can help identify outliers that could represent data entry errors, as well as non-numeric entries in numeric columns.
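To give a flavour of the two checks mentioned above, here is a minimal sketch in Python (not R, and not the testdat API, which is not shown in this post). The column names, the sample values and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5) are all assumptions made for illustration.

```python
import csv
import io
import statistics

# Hypothetical metadata extract; column names and values are invented for illustration.
SAMPLE_CSV = """species,study_start
Species A,1995
Species B,1998
Species C,2001
Species D,1989
Species E,n/a
Species F,19999
Species G,2003
"""


def to_number(text):
    """Return text as a float, or None if it cannot be parsed as a number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def find_non_numeric(rows, column):
    """List (row number, raw value) for entries that are not numbers."""
    return [
        (i, row[column])
        for i, row in enumerate(rows, start=1)
        if to_number(row[column]) is None
    ]


def find_outliers(rows, column, threshold=3.5):
    """List (row number, value) for numeric entries far from the column median."""
    numeric = [(i, to_number(row[column])) for i, row in enumerate(rows, start=1)]
    numeric = [(i, v) for i, v in numeric if v is not None]
    if not numeric:
        return []
    median = statistics.median(v for _, v in numeric)
    # Median absolute deviation: a spread measure that one bad value cannot inflate.
    mad = statistics.median(abs(v - median) for _, v in numeric)
    if mad == 0:
        return []
    return [(i, v) for i, v in numeric if 0.6745 * abs(v - median) / mad > threshold]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(find_non_numeric(rows, "study_start"))  # [(5, 'n/a')]
print(find_outliers(rows, "study_start"))     # [(6, 19999.0)]
```

A flagged value is only a candidate error: someone still has to go back to the source publication to decide whether it is a typo or a genuinely unusual entry.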

I gave a short talk about the COMPADRE and COMADRE population matrix databases. I covered some of the history of the databases and explained why these kinds of data are so important. I also highlighted some of the issues we have had to deal with along the way – one of these is how to handle data entry and error checking/validation on what are fast becoming large and unwieldy spreadsheets. We need to balance the need for a cheap, easy-to-use tool with the need for robust, error-free output.

Excel is great because it is already familiar to the COMPADRINOs* and is easy to learn. On the other hand, the fact that it is not always “what-you-see-is-what-you-get” means that errors can creep in unnoticed. For example, a number can be stored as a text string, so that 0.00 is recorded as being different from 0, or it can sometimes be silently converted to a date. Very frustrating!
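The text-versus-number problem is easy to reproduce outside Excel. The snippet below is a small Python sketch (the helper name normalise_cell is hypothetical, and this is not the routine we actually use) showing how a zero stored as text fails to match a numeric zero until both cells are converted to a common type. It does not attempt to undo values that have already been turned into dates.

```python
def normalise_cell(value):
    """Return a cell as a float if it parses as a number, else as stripped text.

    Caveat: float() also accepts strings such as 'nan' and 'inf', so a real
    check would need to decide how to treat those.
    """
    text = str(value).strip()
    try:
        return float(text)
    except ValueError:
        return text


# A zero stored as text is not equal to a numeric zero...
print("0.00" == 0)                                  # False
# ...but the two agree once both cells are normalised.
print(normalise_cell("0.00") == normalise_cell(0))  # True
# Entries that are not numbers are kept as text so they can be flagged later.
print(normalise_cell("Jan-05"))                     # Jan-05
```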

Fortunately, we do not distribute COMPADRE/COMADRE data in its “raw” Excel form – we save the spreadsheets out as CSV files and then combine them into a structured RData list object. While doing this, Rob Salguero-Gomez and I, the supervisors of COMPADRE and COMADRE, have developed routines that carry out a range of error checks and validations on all the metadata and matrices, allowing us to identify and correct errors and inconsistencies before data distribution**.

Here’s the abstract for the talk:

Evolutionary biologists aim to make sense of population behaviour in species across the tree of life. However, the collection of animal and plant population data is laborious and costly so analyses that try to generalise across many species are not feasible unless data are shared among researchers, or obtained from the literature. I will report on the 30+ year journey of construction of two databases that collate demographic data from published literature on more than 2000 species with an aim of making it openly available to all. I will briefly outline why these data are important, describe the process of data production, and contemplate the lessons learned along the way.

Unfortunately the talk wasn’t recorded, but you can find the slides for it here at Figshare.

*The COMPADRINOs are the wonderful team of students based at the MPIDR in Rostock who do the data acquisition and data entry work for the databases.

**No doubt some errors will still creep in – please let us know if you spot any (compadre-contact AT demog DOT mpg DOT de).

COMPADRE Posters

The COMPADRE Plant Matrix Database is an international enterprise. The database contains globally distributed data, and has an international committee with representatives from all corners of the globe.

To help promote the use of COMPADRE we have produced a series of posters in several languages. So far we have them available in English, Spanish, French, German, Japanese, Italian and Chinese. Get them here and feel free to print and distribute them!

A selection of the COMPADRE Plant Matrix Database posters. These posters describe the database and advertise that it is available for anyone via the web.

What is COMPADRE?

The COMPADRE Plant Matrix Database is a repository containing demographic information on hundreds of plant species. It is a long-term enterprise initiated in 1989 by Jonathan Silvertown and Miguel Franco and is currently supported and hosted at the Max Planck Institute for Demographic Research (Rostock, Germany). The COMPADRE team consists of three sub-teams: a core committee, a science committee and a digitisation team. You can find out more about the database and its history at the main COMPADRE website.

Our goal is to make publicly available the demographic knowledge based on population projection matrices of species in the plant kingdom, and to facilitate its usage for scientific and teaching purposes.

COMPADRE is an open-access database – we ask only that users register and log in before accessing data. This is simply so we can keep track of how much the database is being used. This knowledge will be extremely useful for us as we try to secure funding to maintain and grow COMPADRE over the coming years.