OMB # 0925-0740

Expiration Date: 07/31/2022

Public reporting burden for this collection of information is estimated to average 10 minutes per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. An agency may not conduct or sponsor, and a person is not required to respond to, a collection of information unless it displays a currently valid OMB control number. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to: NIH, Project Clearance Branch, 6705 Rockledge Drive, MSC 7974, Bethesda, MD 20892-7974, ATTN: PRA (0925-0740). Do not return the completed form to this address.

NCBI is pleased to announce a Biomedical Data Science Codeathon in collaboration with Carnegie Mellon in Pittsburgh, PA on January 8-10, 2020.

We're specifically looking for folks who have experience working with complex disease, precision medicine, and similar genomic analyses. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large-scale genomic analyses from high-throughput experiments. The event is open to anyone selected for the hackathon and willing to travel to CMU.

Potential topics include:

  • Virus Genome Graph tools
  • Image analysis pipelines
  • RNAseq pipelines
  • Cancer graph genomes
  • Complex Disease Analysis

Codeathon Logistics

The codeathon runs from 9 am – 5 pm each day, with an optional social event on the evening of the second day. Participants with various backgrounds and expertise will be formed into five to eight teams of five to six individuals, each with an experienced leader. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure. Throughout the three days, we will come together to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.

There will be no registration fee associated with attending this event.

Note: Participants will need to bring their own laptop to this program. No financial support for travel, lodging, or meals is available for this event.


After a brief organizational session, teams will spend three days addressing a challenging set of scientific problems related to a group of datasets. Participants will analyze and combine datasets in order to work on these problems.


Datasets will come from public repositories that have been ported to cloud infrastructure, such as the Sequence Read Archive (SRA), as well as contigs derived from those data. Additionally, image stacks and phenotype data may be available from a variety of labs.


All pipelines and other scripts, software, and programs generated in this hackathon will be added to a public GitHub repository designed for that purpose.

Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research or PLoS Computational Biology. 


Initial applications are due December 15th, 2019 by 3 pm ET. Participants will be selected based on the experience and motivation they describe on the form.

Prior participants and applicants are especially encouraged to apply. The first round of accepted applicants will be notified on December 18th, and have until December 18th at noon ET to confirm their participation (especially qualified applicants or those traveling internationally may receive acceptances earlier).  If you confirm, you must be willing to commit to all three days of the event, as confirming and not attending prevents other data scientists from attending this event.


Entrants retain ownership of all intellectual property rights (including moral rights) in the code submitted to, as well as developed in, the hackathon. Employees of the U.S. Government attending as part of their official duties retain no copyright in their work, and their work is in the public domain in the U.S.

The Government disclaims any rights in the code submitted or developed in the hackathon.

Participants agree to publish the code and any related data in GitHub.

For more information, or with any questions, please contact Ben Busby.
