Individuals of a species differ from one another at the genetic level to various degrees. These differences represent different genotypes, or genetic constitutions, within a species. To better understand the genetic content of each individual genome, it is important to understand similarities and differences of gene sequences and their sub-components when compared across genomes. Therefore, the Seeker is looking for a methodology to accurately identify similar gene sequences across genomes from individuals of a single species.
This is a Reduction-to-Practice Challenge that requires written documentation, output from the data analysis algorithm, and, if requested by the Seeker, submission of the source code and an executable.
To characterize the genetic content of each individual genome in depth, it is important to understand which sequences of common ancestry have been inherited, possibly in modified form, across the genomes. Existing knowledge about a gene variant from a well-characterized genome can be applied to better understand other variants, or alleles, of the same gene in different, uncharacterized genomes. Knowing which sequences represent the same genes in different individuals is necessary to understand the impact of any similarities or differences in the gene sequences of individuals from different genetic backgrounds.
The difficulty lies in determining which gene-derived sequences in the genomes are allelic. Transcription of a gene may produce many alternative transcripts that differ in sequence composition, so finding the best mapping between the transcripts of different genomes is difficult and time-consuming. Current methods rely on a combination of common software and proprietary techniques, but the reliability and accuracy of their results could be improved. Therefore, the Seeker is interested in a better methodology, based on new algorithms and/or a well-chosen combination of existing software, that can relate the transcript sets of two genotypes within a species quickly and accurately to identify allelic relationships.
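To make the mapping problem concrete, the sketch below pairs transcripts from two genotypes by reciprocal best hit on k-mer Jaccard similarity. It is only an illustration of the kind of transcript-to-transcript mapping described above, not the Seeker's current method and not a proposed solution. The function names, the k-mer size, and the `min_score` threshold are all assumptions chosen for the example, and a realistic approach would likely use alignment-based scores and handle several alternative transcripts per gene.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_hits(query, target, k, min_score):
    """Map each query transcript ID to its most similar target transcript ID.

    query and target are dicts of {transcript_id: sequence}. Queries whose
    best score falls below min_score (a hypothetical cutoff) are left out.
    """
    target_kmers = {tid: kmers(seq, k) for tid, seq in target.items()}
    hits = {}
    for qid, qseq in query.items():
        qk = kmers(qseq, k)
        best_id, best_score = None, 0.0
        for tid, tk in target_kmers.items():
            score = jaccard(qk, tk)
            if score > best_score:
                best_id, best_score = tid, score
        if best_id is not None and best_score >= min_score:
            hits[qid] = best_id
    return hits


def reciprocal_best_hits(genome_a, genome_b, k=8, min_score=0.5):
    """Keep only pairs where each transcript is the other's best hit."""
    a_to_b = best_hits(genome_a, genome_b, k, min_score)
    b_to_a = best_hits(genome_b, genome_a, k, min_score)
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


if __name__ == "__main__":
    # Toy sequences invented for this example; a1/b1 differ by one base.
    genome_a = {
        "a1": "ATGGCGTACGTTAGCCTAGGATCCA",
        "a2": "TTTTCCCCGGGGAAAATTTTCCCC",
    }
    genome_b = {
        "b1": "ATGGCGTACGTTAGCCTAGGTTCCA",
        "b2": "TTTTCCCCGGGGAAAATTTTCCCG",
    }
    print(reciprocal_best_hits(genome_a, genome_b, k=4))
```

The reciprocal requirement is a deliberately conservative choice: a transcript with no close counterpart in the other genotype stays unpaired rather than being forced onto its nearest neighbor. The all-against-all comparison is quadratic in the number of transcripts, which is one reason speed is part of the problem statement.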
Submissions to this Challenge must be received by 11:59 PM (US Eastern Time) on November 23, 2019.