Bibliometric & Big Data

The bibliometric & "big data" strand of the project aims to analyse the network of actors and institutions which were involved in sequencing the human, pig and yeast genomes. 

This strand explores how the actors involved may have changed over time. The dynamics of these networks, and how they changed, are compared across the three genome projects using “big data” analytics and social network analysis. 

This involves understanding: 

1) the collaborations across institutions;
2) the identities of the key actors and their prominence; and
3) the funding structures of sequencing work in each genome. 

The networks are mapped using data retrieved from the servers and APIs of a major sequence database (the European Nucleotide Archive) and of literature and citation databases (European PubMed Central and SCOPUS). The techniques and challenges of applying data science and “big data” methods in this context are carefully considered. 

The study identifies sequence submission records for the human genome from 1985 to 2005, for the yeast genome (1980-2000) and for the pig genome (1990-2015). The number of records ranges from 18,000 to 10 million. In addition, using an automated process developed in the R programming environment, we collect data on author affiliations, citation networks, funding and submission information where available.

This involves aggregating millions of sequence submission and publication records, in order to build a fuller picture of who the actors involved are and how they collaborated. This strand demonstrates the potential of combining digital research methods and social network analysis for research in Science and Technology Studies.
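The aggregation step can be illustrated with a small sketch. The project's own pipeline was written in R; the Python below is only an illustrative stand-in, assuming each submission or publication record has already been reduced to a list of institution names. It links every pair of institutions that appear on the same record and scores each institution by normalised degree centrality (its number of distinct collaborators divided by n - 1), one common proxy for prominence in social network analysis. The function names and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_collaboration_network(records):
    """Build an undirected, weighted institution network.

    Each record is a list of institution names (e.g. the affiliations on one
    submission or publication). Every pair of distinct institutions on the
    same record gets an edge; the edge weight counts their shared records.
    """
    nodes = set()
    edges = Counter()
    for institutions in records:
        # Deduplicate and sort so that (A, B) and (B, A) map to the same edge.
        unique = sorted(set(institutions))
        nodes.update(unique)
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return nodes, edges


def degree_centrality(nodes, edges):
    """Normalised degree: distinct collaborators divided by (n - 1)."""
    partners = {node: set() for node in nodes}
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: len(p) / (n - 1) for node, p in partners.items()}


# Hypothetical example records; institution names are placeholders.
records = [
    ["Inst A", "Inst B"],
    ["Inst A", "Inst C"],
    ["Inst A", "Inst B", "Inst D"],
    ["Inst E"],
]
nodes, edges = build_collaboration_network(records)
centrality = degree_centrality(nodes, edges)
# Print institutions from most to least central, breaking ties by name.
for node, score in sorted(centrality.items(), key=lambda item: (-item[1], item[0])):
    print(node, score)
```

Degree centrality is only one option: betweenness or weighted-degree measures would instead emphasise brokerage or the intensity of collaboration, and libraries such as networkx (or igraph in R) provide them ready-made.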

This helps address important questions about how the sequencing initiatives and translation practices developed and evolved. Initial results suggest that the genome initiatives went through significant transitions, particularly in the human strand. The human genome effort began with a “bottom-up” approach, characterised by decentralised efforts in which diverse actors and institutions worked on small, specific parts of the genome. In this approach, the medical use of sequence information was a major driver of sequencing efforts. From the 1990s onwards, however, a “top-down” approach emerged and sequencing efforts became more centralised. Large-scale sequencing centres with advanced sequencing technologies expanded and became key players in the field. Sequencing the whole genome became the main agenda, regardless of the immediate medical usability of the collected information. 
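One simple way such a shift from decentralised to centralised sequencing could be quantified is to track, period by period, what share of submissions comes from the most active submitters. The sketch below is a hypothetical illustration, not the project's method: it assumes each submission record can be reduced to a (year, submitting institution) pair, and the function name, the number of top submitters `k` and the period length are arbitrary choices.

```python
from collections import Counter, defaultdict


def top_k_share(submissions, k=5, period_length=5):
    """Fraction of each period's submissions made by its k most active submitters.

    `submissions` is an iterable of (year, institution) pairs. Years are grouped
    into periods of `period_length` years, keyed by the first year of the period.
    """
    by_period = defaultdict(Counter)
    for year, institution in submissions:
        period_start = year - year % period_length
        by_period[period_start][institution] += 1

    shares = {}
    for period_start in sorted(by_period):
        counts = by_period[period_start]
        total = sum(counts.values())
        top = sum(count for _, count in counts.most_common(k))
        shares[period_start] = top / total
    return shares


# Hypothetical toy data: four submitters with one record each in the first
# period, then one submitter dominating a later period.
submissions = [
    (1986, "A"), (1987, "B"), (1988, "C"), (1989, "D"),
    (1996, "X"), (1996, "X"), (1997, "X"), (1998, "Y"),
]
print(top_k_share(submissions, k=1))  # {1985: 0.25, 1995: 0.75}
```

A rising top-k share is consistent with centralisation but does not by itself establish it; it would need to be read alongside the network measures and the historical sources.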

This strand of the project offers a novel comparison of the genome initiatives of three species, and provides quantitative data for examining how closely the development of the other initiatives followed the narrative of the human genome.

This strand of the project is conducted by Mark T. O. Wong, University of Edinburgh. For more information about the bibliometric/“big data” strand, email Mark Wong at

Latest Blog Posts

Exploring pig genetics in France

At the end of November, I followed in the footsteps of many of the pig geneticists whose work I have been researching, and visited the Jouy-en-Josas campus of the French Institut national de la recherche agronomique (INRA), just south of Paris. There, and at other INRA sites in Paris itself, …

Advisory Board Meeting and Second Phase of Project

First Advisory Board meeting: the TRANSGENE team presented its first findings to an Advisory Board composed of Stephen Hilgartner (Cornell University), Robert Bud (Science Museum, London), Michel Morange (University Paris 6 and École Normale Supérieure), Abigail Woods (King’s College, London) and Hans-Jörg Rheinberger (Max Planck Institute for the History of …

The unusual pioneer of the Human Genome Project

When we think about prime movers who first proposed scientifically tackling the human genome, the usual suspects come to mind: renowned biomedical Nobel Prize winners such as James Watson, Walter Gilbert or Renato Dulbecco, who formulated their grand idea on the front pages of Science, …


…at the British Society for the History of Science conference

Amongst the sizeable Edinburgh contingent at the annual British Society for the History of Science conference, this year held in York, were representatives from the TRANSGENE project team. Miguel García-Sancho presented the progress on the yeast strand of the project …

June events

The TRANSGENE project operates within the Science, Technology and Innovation Studies (STIS) subject group at the University of Edinburgh. This multidisciplinary affiliation, and association with the Institute for the Study of Science, Technology and Innovation (ISSTI) research network, enables us to discuss our work with researchers that have a wide …

Classifying people, practices and institutions

Just as classifications of species, genes, stages of development, or macromolecules can shape research in biology, classification in the humanities and social sciences can condition our analyses. In the pig strand of the project I’m working on, classifying people, practices and institutions is necessary. My aim is to explain, so …

The yeast telomeres

Writing the history of the yeast genome project starting from the end, more precisely the chromosome ends, can be an instructive exercise. Chromosome ends (telomeres) are specialised structures essential for chromosome maintenance and genome stability. As yeast telomeres are similar in structure and function to the telomeres of the other …

Collaborating with the European Bioinformatics Institute (EMBL-EBI)

In March 2017, I am conducting a three-week visiting post-doctoral fellowship at the European Bioinformatics Institute (EMBL-EBI, Cambridge UK), for the bibliometric and “big data” strand of the project. I am collaborating with the EBI’s Literature Services to advance the project’s aims of mapping institutional networks in genomic sequencing initiatives …

The history of pig genome research enters the matrix

In the TRANSGENE project we are committed to using approaches from different disciplines to make sense of the historical material, and to generate new data from which to form a picture of the genomic research. One of the key approaches is the use of quantitative methods, imported from the social …

Tracing the history of European biotechnology in the HAEU

My week-long visit to the Historical Archives of the European Union (HAEU) in Florence has proved to be a remarkable opportunity for investigating the ‘behind the scenes’ of European biotechnology policies in the 1980s and 1990s. During my visit in Florence I mainly examined documents available in the Gordon Adam’s …

Scientific archives and the history of genomics

In November 2016 Miguel García-Sancho, James Lowe and I attended the Workshop on Scientific Archives organised by Anne-Flore Laloë, archivist at the European Molecular Biology Laboratory (EMBL) in Heidelberg. The meeting brought together tens of archivists from Germany, France, Switzerland, UK, US and Canada. The workshop presentations focused on best …