Bibliometric & Big Data
The bibliometric & “big data” strand of the project aims to analyse the network of actors and institutions that were involved in sequencing the human, pig and yeast genomes.
This strand explores how the actors involved may have changed over time. The dynamics and changes of the networks are compared across the three different genome projects with “big data” analytics and social network analysis.
This involves understanding:
1) the collaborations across institutions;
2) the identities of the key actors and their prominence; and
3) the funding structures of sequencing work in each genome.
The networks are mapped based on data retrieved from servers/APIs of a major sequence database (European Nucleotide Archive) and literature and citation databases (European PubMed Central and SCOPUS). Techniques and challenges of working with data science and “big data” in this context are carefully considered.
The study identifies sequence submission records for the human genome from 1985 to 2005, the yeast genome from 1980 to 2000, and the pig genome from 1990 to 2015. The number of records ranges from 18,000 to 10 million. In addition, by developing an automated process in the R programming environment, we collect data on authors' affiliations, citation networks, funding and submission information where available.
This involves aggregating millions of sequence submission and publication records, in order to build a fuller picture of who the actors involved are and how they collaborated. This strand demonstrates the potential of combining digital research methods and social network analysis for research in Science and Technology Studies.
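As a rough illustration of how aggregated records can be turned into a collaboration network, the sketch below counts how often pairs of institutions appear on the same record and then sums those counts per institution as a simple prominence measure. It is not the project's actual pipeline: the project worked in R, whereas this sketch uses Python for illustration only, and the record structure, the field names (`id`, `institutions`) and the institution names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each one lists the institutions credited on a
# single sequence submission or publication. Real records would come
# from the databases named above and would need cleaning first.
records = [
    {"id": "rec1", "institutions": ["Inst A", "Inst B"]},
    {"id": "rec2", "institutions": ["Inst A", "Inst B", "Inst C"]},
    {"id": "rec3", "institutions": ["Inst C"]},
    {"id": "rec4", "institutions": ["Inst A", "Inst C"]},
]


def collaboration_edges(records):
    """Count how many records each pair of institutions shares."""
    edges = Counter()
    for rec in records:
        # De-duplicate and sort so each pair has one canonical key.
        insts = sorted(set(rec["institutions"]))
        for a, b in combinations(insts, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum edge weights per institution (a simple prominence measure)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = collaboration_edges(records)
degree = weighted_degree(edges)

for inst, score in degree.most_common():
    print(inst, score)
```

Weighted degree is only one possible proxy for prominence. Measures such as betweenness centrality, or the same counts computed separately for each year, would be needed to trace how the networks changed over time.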
This helps address important questions regarding how the sequencing initiatives and translation practices developed and evolved. According to initial results, the genome initiatives went through significant transitions, particularly in the human strand. Human genome sequencing began with a “bottom-up” approach, which was characterised by decentralised efforts and involved diverse actors and institutions who conducted work on specific, small parts of the genome. In this approach, the medical use of sequence information was a major driver of sequencing efforts. However, from the 1990s onwards, a “top-down” approach emerged and sequencing efforts became more centralised. There was an expansion of large-scale sequencing centres, which had advanced sequencing technologies and became key players in the field. The sequencing of the whole genome became the main agenda, regardless of the immediate medical usability of the collected information.
This strand of the project offers a novel comparison of the genomic initiatives for three species, and provides quantitative data to interrogate how closely the development of the other initiatives matches the narrative of the human genome.
This strand of the project is conducted by Mark T. O. Wong, University of Edinburgh. For more information about the bibliometric/“big data” strand, email Mark Wong at Tsun.On.Wong@ed.ac.uk.