Development of algorithms and software for high-performance computing in genetic analysis of complex human traits

The goal of the project is to develop a series of linked algorithms and software programs for high-performance computing in genetic analysis of complex human traits. This software should make semi-automatic discovery of genes involved in complex diseases possible. The algorithms must take into account evidence from different levels of genetic analysis (linkage, association studies, knowledge of the sequence of the human genome, literature data about disease). We will focus on exploiting highly parallelizable computation techniques in genetic analysis, building upon our previous joint research. The software will specifically target the analysis of large pedigrees spanning 5 or more generations, as found in isolated human populations and livestock. A parallel computer system (cluster) will be constructed to support software testing and high-performance computing. The algorithms and software will be tested and validated using data simulated under various genetic models. Commercial and public-domain software, where available, will be used as a gold standard for comparison. Finally, the software will be applied to the numerous data sets that have been obtained in ongoing research projects of Erasmus and IC&G.

Project leaders

Prof. C. M. van Duijn
Erasmus MC Rotterdam
PO Box 1738 3000 DR Rotterdam
The Netherlands
Phone:+31 10 704 3394; Fax:+31 10 704 4657;

Prof. T. I. Axenovich
Institute of Cytology and Genetics SD RAS
Lavrentjeva ave 10
630090 Novosibirsk
Russian Federation
Phone:+7 383 3332813; Fax:+7 383 3331278;

Periodic reports



Technical reports


A number of programs were developed for data quality control, data management and descriptive analysis. Program RECODE_PED checks the pedigree structure for errors and converts the data to Linkage format. Program RECODE_SNP recodes alphanumerically coded SNPs to numbered alleles. Program AFFY2MEGA converts SNP data from Affymetrix format to Mega2 and Merlin formats. Program PHENO_QC tests the consistency of phenotype records, performs descriptive statistical analysis and converts alphanumeric binary and qualitative data to numeric format. Program PRE_PEDCHECK prepares pedigree and genotype data for the PEDCHECK program. Program GENOT_QC is an interface to the standard genotypic quality control program PEDCHECK. Program GENOT_QC_X tests for Mendelian errors in X chromosome genotypes. Program POOL_STR pools Short Tandem Repeat (microsatellite) data coming from different genotyping experiments. Program FCN can be used to describe complex pedigree structures. Program PEDPEEL prepares pedigree data for calculation of the Elston-Stewart likelihood function by finding an optimal way to peel. Program PEDCUT cuts deep pedigrees in which patients are distantly related into computable sub-pedigrees based on a user-specified MaxBit size. Program PED_STR cuts complex pedigrees with a large number of closely related patients into computable sub-pedigrees based on a user-specified MaxBit size.
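As an illustration of the kind of recoding RECODE_SNP is described as doing, the following minimal Python sketch maps letter-coded alleles of a single SNP to numbered alleles. It is not the RECODE_SNP source: the function name `recode_snp`, the alphabetical numbering of alleles and the use of "0" as the missing-allele code are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

MISSING = "0"  # assumed code for a missing allele (hypothetical convention)


def recode_snp(
    genotypes: List[Tuple[str, str]],
) -> Tuple[List[Tuple[int, int]], Dict[str, int]]:
    """Map letter-coded alleles of one SNP to integers 1, 2 (0 = missing)."""
    # Collect the distinct non-missing alleles and number them alphabetically.
    observed = sorted({a for pair in genotypes for a in pair if a != MISSING})
    if len(observed) > 2:
        raise ValueError(f"more than two alleles observed for a SNP: {observed}")
    codes = {allele: i for i, allele in enumerate(observed, start=1)}
    codes[MISSING] = 0
    recoded = [(codes[a], codes[b]) for a, b in genotypes]
    return recoded, codes


# Hypothetical genotypes of four individuals at one SNP.
example = [("A", "G"), ("G", "G"), ("0", "0"), ("A", "A")]
recoded, codes = recode_snp(example)
print(recoded)
print(codes)
```

Returning the allele-to-number map together with the recoded genotypes keeps the recoding reversible, which may be useful when results have to be reported in the original allele labels.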

A set of programs has been developed for breaking loops in pedigrees of arbitrary structure with multiple loops. These programs achieve high performance through parallel computation using LAM/MPI. The classical Kruskal algorithm is used in package LOOP_EDGE. Package LOOP_PED uses an algorithm that breaks loops step by step: at every step, the loop breaker is chosen according to the size of the looped part of the pedigree that remains after this breaker is removed. The algorithm described by Vitezica et al., Hum Hered 2004, 57:1-9, is used in package LOOP_STAR.
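The idea behind using Kruskal's algorithm for loop detection can be sketched as follows. This is a serial, unweighted illustration and not the LOOP_EDGE implementation: the marriage-node graph representation, the function name `find_loop_edges` and the example pedigree are assumptions. Edges are scanned in input order (the classical algorithm would first sort them by weight); every edge that joins two already connected nodes lies outside the spanning forest and therefore closes a loop.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def find_loop_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Return the edges left out of a Kruskal spanning forest (loop-closing edges)."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        # Find the representative of the node's component (with path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    loop_edges: List[Edge] = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this edge closes a loop.
            loop_edges.append((a, b))
        else:
            # The edge joins two components and becomes part of the forest.
            parent[root_a] = root_b
    return loop_edges


# Hypothetical pedigree with a first-cousin marriage. M1..M4 are marriage
# nodes; each edge links a person to a marriage as spouse or as child.
pedigree_edges = [
    ("F1", "M1"), ("F2", "M1"), ("M1", "C1"), ("M1", "C2"),
    ("C1", "M2"), ("S1", "M2"), ("M2", "G1"),
    ("C2", "M3"), ("S2", "M3"), ("M3", "G2"),
    ("G1", "M4"), ("G2", "M4"), ("M4", "X"),
]
print(find_loop_edges(pedigree_edges))
```

In this sketch each returned edge marks one place where the pedigree would have to be cut; which individual is actually chosen as the loop breaker, and how the work is distributed over MPI processes, is not covered here.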

MAN_H_PG is a program for complex segregation analysis of quantitative traits on large pedigrees without loops. MQscore_SNP is a program for multipoint parametric linkage analysis of quantitative traits and SNPs on large pedigrees without loops. Ped_Outlier is a program for automatic identification of within-family outliers. PedigreeQuery is a program for drawing pedigrees step-by-step.

GenABEL is a library for the R statistical analysis software designed for genome-wide association (GWA) analysis. ProbABEL is an R library for GWA analysis of imputed SNPs. MetABEL is an R library for GWA meta-analysis.

Program DSEC_STA computes basic descriptive statistics for samples from the normal distribution.
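The set of statistics reported by DSEC_STA is not listed here. As a rough sketch of what basic descriptive statistics for an approximately normal sample might include, the hypothetical Python function `describe` below computes the sample size, mean, standard deviation, skewness and excess kurtosis; the choice of statistics and all names are assumptions.

```python
import math
from typing import Dict, Sequence


def describe(sample: Sequence[float]) -> Dict[str, float]:
    """Basic descriptive statistics of a numeric sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(sample) / n
    # Central moments of order 2, 3 and 4 (divided by n).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    return {
        "n": n,
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # unbiased-variance standard deviation
        "skewness": m3 / m2 ** 1.5,          # close to 0 for a normal sample
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # close to 0 for a normal sample
    }


print(describe([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Skewness and excess kurtosis are included because both are expected to be near zero for normally distributed data, so large deviations give a quick hint that the normality assumption may not hold.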

Program TASK_MANAGER runs several tasks on a multiprocessor Linux platform with openMosix.

Last modified: 10 September 2010
by Anatoliy Kirichenko
