Background: The ability to identify regions of the genome inherited with a dominant trait in one or more families has become increasingly valuable with the wide availability of high throughput sequencing technology. While a number of methods exist for mapping of homozygous variants segregating with recessive traits in consanguineous families, dominant conditions are conventionally analysed by linkage analysis, which requires computationally demanding haplotype reconstruction from marker genotypes and, even using advanced parallel approximation implementations, can take substantial time, particularly for large pedigrees. In addition, linkage analysis lacks sensitivity in the presence of phenocopies (individuals sharing the trait but not the genetic variant responsible). Combinatorial Conflicting Homozygosity (CCH) analysis uses high density biallelic single nucleotide polymorphism (SNP) marker genotypes to identify genetic loci within which consecutive markers are not homozygous for different alleles. This allows inference of identical by descent (IBD) inheritance of a haplotype among a set or subsets of related or unrelated individuals.
Results: A single genome-wide conflicting homozygosity analysis takes <3 seconds and parallelisation permits multiple combinations of subsets of individuals to be analysed quickly. Analysis of unrelated individuals demonstrated that in the absence of IBD inheritance, runs of no CH exceeding 4 cM are not observed. At this threshold, CCH is >97% sensitive and specific for IBD regions within a pedigree exceeding this length and was able to identify the locus responsible for a dominantly inherited kidney disease in a Turkish Cypriot family in which six out 17 affected individuals were phenocopies. It also revealed shared ancestry at the disease-linked locus among affected individuals from two different Cypriot populations.
Conclusions: CCH does not require computationally demanding haplotype reconstruction and can detect regions of shared inheritance of a haplotype among subsets of related or unrelated individuals directly from SNP genotype data. In contrast to parametric linkage allowing for phenocopies, CCH directly provides the exact number and identity of individuals sharing each locus. CCH can also identify regions of shared ancestry among ostensibly unrelated individuals who share a trait. CCH is implemented in Python and is freely available (as source code) from http://sourceforge.net/projects/cchsnp/ .