One of the many hats I'm wearing is that of a Fellow of the Software Sustainability Institute. It's a group of people who care deeply about good-quality research software, best practices in coding, and the recognition of software development as a worthwhile contribution to scientific understanding. More about the Institute's mission can be found on their website. Their slogan is simple: "Better software, better research."
One of the Software Sustainability Institute's goals is the preservation and re-use of research software and code. Many tools were originally invented to tackle specific scientific problems, but can potentially solve a wide range of "real-world" problems. Maybe you can even make software originally developed for bioinformatics part of your own data science pipeline!
A free Data Science workshop
There is now plenty of open data, frequently freed from previously closed governmental silos. For example, the Environment Agency and the Met Office both allow anyone to tap into their vast data sources. This enables citizen science, and data hackathons are now commonplace. However, just being able to deal with a tab-delimited file doesn't make one a data scientist: network inference, pattern recognition, classification and clustering - these are just some of the things that can take your open data project from quick hack to citizen science. Be advised that we're not religious about openness. If your data isn't open, but the results of your work benefit the community and not just your own pockets, please feel free to join in!
Save the date! On
9am - 5pm
we are going to bring together six scientists from different fields of computational biology at the CodeNode headquarters of
Spend a day with data science experts and see how their domain-specific tools might be applicable to your questions. Quiz each of them for ideas on what to do with your data!
9:30am: Six short talks from 'the scientists' about their research and tools
10:15am: Participants' "60 second pitch" of their data science problem (aka brief introduction)
10:45am: Coffee break
11:00am: Meet a scientist - and discuss your problem in small groups (bring demo data, print-outs, whatever it takes to explain your problem)
noon: Lunch. A free lunch!
1:00pm: Meet a scientist (continued)
2:00pm: The Software Sustainability Institute - what it is and what it does
2:15pm: Participants' "short report" on how they think the discussions have helped them
3:30pm: Coffee break
3:45pm: Hands-on - BYO data.
Boris Adryan (University of Cambridge and thingslearn Ltd.): Boris is a Royal Society University Research Fellow and leads a research group with a focus on computational method development in genomics and biomedical applications. Having experience with the integration of vast amounts of heterogeneous data, he was in a prime position to found thingslearn, a company that provides data science services with a special focus on the Internet of Things. Since 2015, Boris has been a Fellow of the Software Sustainability Institute.
Stephen Eglen (University of Cambridge): Stephen is a Lecturer in Computational Neuroscience; he builds computational models to simulate aspects of neuronal development. He has experience in time-series analysis and spatial statistics, and applies simple machine learning techniques to datasets in neuroscience. He is also a Fellow of the Software Sustainability Institute.
Enrico Ferrero (GlaxoSmithKline): Enrico's focus in pharmaceutical R&D is on the analysis and integration of genomics and transcriptomics datasets for novel target identification and validation strategies. He works across different therapeutic areas (respiratory, immunoinflammatory), analyses various types of data (transcriptomics, genetics, epigenetics, functional genomics) and applies a diverse set of computational methods (differential analysis, functional enrichment, pathway and network analysis, drug-target interactions) to support the GSK pharma pipeline. He's interested in all aspects of data science and has substantial expertise in the analysis of high-throughput, multidimensional datasets and the application of unsupervised learning and dimension reduction algorithms.
Laurent Gatto (University of Cambridge): Laurent (@lgatt0 on Twitter, @lgatto on GitHub) is a computational biologist who searches for meaning in complex biological data and develops software to help others do the same, in a reproducible and informed way. He is a Software Sustainability Institute Fellow, a Software Carpentry instructor, and is very much into R. His main technical focus is machine learning method development.
Daniel Hebenstreit (University of Warwick): Daniel is Assistant Professor of Systems Biology. His research focuses on stochastic variation in biological systems and experimental methods related to this subject. He applies a combination of experimental molecular biology and mathematical modelling. His theoretical expertise includes bioinformatics, stochastic modelling, and Markov chain Monte Carlo sampling.
Mikhail Spivakov (BBSRC Babraham Institute): Mikhail develops and applies a broad range of data mining and machine learning techniques in his research, ranging from the quite straightforward (significance testing, clustering and linear regression) to the more sophisticated (hidden Markov models, random forests and weighted correlation network analysis). He mainly deals with genomic data - currently some of the most challenging and complex types of data known to man. His interests are certainly not limited to biology, and he's very happy to help people from other fields benefit from the power of big data.