- script for detection of genotyping errors

Author: Irina Zorkoltseva, Institute Cytology and Genetics, Novosibirsk

December, 2004

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1.3 of the License, or (at your option) any later version. You can download it from our website or by ftp.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.



In large data sets obtained in genetic studies, errors are almost inevitable. The pedigree data and the genotype data are often kept in different files, and the personal ID in the pedigree data may not coincide with the person identification number (code) in the genotype file. The program pre_pedcheck combines the pedigree and genotype data.
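The combining step can be pictured as a join on the shared code column. The following is a minimal sketch in Python, not the code of pre_pedcheck itself (which is written in Perl); the function name merge_by_code, the marker names m1 and m2, and the genotype strings are hypothetical placeholders.

```python
import csv
import io

def merge_by_code(pedigree_text, genotype_text, sep=","):
    """Join pedigree rows to genotype rows on the shared 'code' column."""
    ped_rows = list(csv.DictReader(io.StringIO(pedigree_text), delimiter=sep))
    gen_rows = list(csv.DictReader(io.StringIO(genotype_text), delimiter=sep))
    # Index the genotype rows by code so each person is found in one lookup.
    genotypes = {row["code"]: row for row in gen_rows}
    merged = []
    for person in ped_rows:
        combined = dict(person)
        geno = genotypes.get(person["code"], {})
        # Copy the marker columns; people without genotype data get none.
        combined.update({k: v for k, v in geno.items() if k != "code"})
        merged.append(combined)
    return merged

# Made-up example data; the genotype strings are opaque placeholders.
PEDIGREE = """ped,id,fa,mo,sex,code
1,1,0,0,1,A100
1,2,0,0,2,A101
1,3,1,2,1,A102
"""
GENOTYPE = """code,m1,m2
A102,1/2,3/3
A100,1/1,3/4
"""
merged = merge_by_code(PEDIGREE, GENOTYPE)
```

Here the genotype rows are in a different order from the pedigree rows and one person (A101) has no genotype record, which is the situation the join has to tolerate.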

The resulting file is tested for Mendelian inconsistencies. At this stage, the external program PedCheck (O'Connell and Weeks, 1998) is used to detect inheritance errors for autosomal markers, and our program x_check is used for X-linked markers.
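To illustrate what a Mendelian inconsistency is, here is a simplified sketch in Python of a check on a single parent-offspring trio. It is not the algorithm of PedCheck or x_check, both of which work on whole pedigrees; the function names and the tuple representation of genotypes (with males carrying a single allele at X-linked markers) are assumptions made for this example, and missing genotypes are not handled.

```python
def autosomal_consistent(child, father, mother):
    """True if the child can take one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def x_linked_consistent(child, child_sex, father, mother):
    """X-linked check; sex is coded 1 for male and 2 for female.

    Assumption of this sketch: a male genotype is a one-element tuple.
    """
    if child_sex == 1:
        # A son receives his only X allele from his mother.
        return len(child) == 1 and child[0] in mother
    # A daughter receives the father's only X allele plus one from the mother.
    a, b = child
    father_x = father[0]
    return (a == father_x and b in mother) or (b == father_x and a in mother)

# Autosomal marker: child 1/2 fits parents 1/1 and 2/3; child 1/1 does not
# fit a 2/2 father.
ok_autosomal = autosomal_consistent((1, 2), (1, 1), (2, 3))
bad_autosomal = autosomal_consistent((1, 1), (2, 2), (1, 2))

# X-linked marker: a son cannot inherit the father's allele.
ok_son = x_linked_consistent((2,), 1, (1,), (2, 3))
bad_son = x_linked_consistent((1,), 1, (1,), (2, 3))
```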

The information on errors is then extracted from the output, mapped back to the original coding, and reported in table format (program rec_pedcheck_err).


This program is written in Perl, which is available for free under both Windows and Unix/Linux. After downloading the distribution file gen_qc.tar.gz, put it in a folder of your choice and type the command "tar -xzvf gen_qc.tar.gz" to unpack it. If you are a Windows user, you can unpack it with WinZip. NOTE: Windows users might need to install Perl, which can be downloaded for free, in order to run the program.

After unpacking the file, you will find a new folder, gen_qc, which contains 7 files (readme.txt, the Perl scripts, pedcheck.exe (the executable file for Windows), and pedcheck (the executable file for Linux)) and 2 folders (doc and example). The file readme.txt gives general information on the files in each folder and short instructions on how to run the program. The folder doc contains this manual (gen_qc.html) and the GNU General Public License. The folder example contains files with an example command, an example input pedigree file, and an example input genotype file.


perl [-p myfile.ped] [-d myfile.dat] [-s /sep/] [-x]


By default, the program looks for the input files pedigree.dat and genotype.dat.

These defaults can be changed with command-line options:


### Examples:

	perl -p myfile.ped -d myfile.dat

Either, both, or neither of these options may be given, and their order is irrelevant.

The input files are text files whose entries are assumed to be comma-separated (this can be changed with the -s option on the command line); missing data are indicated by empty entries.

### Examples:

	perl -d myfile.dat -s

If you are using X-linked data, you must include the command option -x.

### Examples:

	perl -x -p myfile.ped
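The behaviour of the options described above (defaults of pedigree.dat and genotype.dat, a comma as the default separator, an -x switch, any order) can be sketched with Python's argparse module. This only mirrors the documented behaviour and is not the program's own option handling; the attribute names are hypothetical, and the separator is treated as a plain string, which is an assumption.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="sketch of the documented options")
    parser.add_argument("-p", dest="pedigree", default="pedigree.dat",
                        help="pedigree file")
    parser.add_argument("-d", dest="genotype", default="genotype.dat",
                        help="genotype file")
    parser.add_argument("-s", dest="sep", default=",",
                        help="field separator (assumed to be a plain string)")
    parser.add_argument("-x", dest="x_linked", action="store_true",
                        help="markers are X-linked")
    return parser

# Only -x and -p are given; the genotype file and separator keep their defaults.
args = build_parser().parse_args(["-x", "-p", "myfile.ped"])

# The same options in a different order give the same result.
swapped = build_parser().parse_args(["-p", "myfile.ped", "-x"])
```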


You need 2 input files to run this program: a pedigree file and a genotype file. Both input files should be in comma-delimited format (this can be changed with the -s option on the command line), with missing data indicated by empty entries.

Each input file must contain a header line that names every column.

Comment lines, starting with "#", may precede the header line.
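The layout just described (optional comment lines, then a header line, then delimited rows with empty entries for missing data) can be read as in the following Python sketch. The function name read_table and the sample data are made up for illustration, and the program's own parser may differ in details such as how it treats whitespace.

```python
def read_table(text, sep=","):
    """Return (header, rows) from a delimited table with leading comments."""
    lines = text.splitlines()
    # Skip the comment lines that precede the header line.
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1
    header = lines[start].split(sep)
    rows = []
    for line in lines[start + 1:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(sep)
        # An empty entry means the value is missing.
        rows.append({name: (value if value != "" else None)
                     for name, value in zip(header, fields)})
    return header, rows

# Made-up example data; the genotype strings are opaque placeholders.
SAMPLE = "# made-up example data\n# second comment line\ncode,m1,m2\nA100,1/1,\nA101,,2/3\n"
header, rows = read_table(SAMPLE)
```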

Pedigree File

The pedigree file contains genealogy information in a standard pre-makeped LINKAGE pedigree file format.

The columns should contain the following information:

Column1: family identifier (ped)

Column2: individual ID (id)

Column3: father's ID (fa)

Column4: mother's ID (mo)

Column5: sex (1 - male, 2 - female) (sex)

Column6: unique person identification number (code)

Column7: old individual ID (old_id) (optional; this column may be absent)

Column3 and column4 are both zero when the individual is a founder. Any column after the 7th is ignored.
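The founder rule and the parent columns lend themselves to a quick sanity check before running the program. The sketch below, in Python, is a hypothetical helper and not part of the distribution: it lists founders (fa and mo both zero) and flags parents that are absent from the family or whose sex code does not match their role.

```python
def check_pedigree(rows):
    """Return (founders, problems) for rows with ped, id, fa, mo, sex keys."""
    people = {(r["ped"], r["id"]): r for r in rows}
    founders, problems = [], []
    for r in rows:
        key = (r["ped"], r["id"])
        if r["fa"] == "0" and r["mo"] == "0":
            founders.append(key)
            continue
        # Fathers are expected to have sex 1, mothers sex 2.
        for parent_col, expected_sex in (("fa", "1"), ("mo", "2")):
            parent = people.get((r["ped"], r[parent_col]))
            if parent is None:
                problems.append((key, parent_col + " not found"))
            elif parent["sex"] != expected_sex:
                problems.append((key, parent_col + " has wrong sex"))
    return founders, problems

# Made-up family: person 4 has the parents swapped, person 5 has an unknown father.
ROWS = [
    {"ped": "1", "id": "1", "fa": "0", "mo": "0", "sex": "1"},
    {"ped": "1", "id": "2", "fa": "0", "mo": "0", "sex": "2"},
    {"ped": "1", "id": "3", "fa": "1", "mo": "2", "sex": "1"},
    {"ped": "1", "id": "4", "fa": "2", "mo": "1", "sex": "2"},
    {"ped": "1", "id": "5", "fa": "9", "mo": "2", "sex": "1"},
]
founders, problems = check_pedigree(ROWS)
```

Values are compared as strings because they come straight from a text file; a row with only one parent set to zero is reported as having a missing parent.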


Genotype File

The genotype file contains at least three columns.

One column, named code, is required (a unique person identification number).

The other columns contain genotype data.

In this file, genotypes are coded as allele1 and allele2 for both homozygotes and heterozygotes.

If your genotypes are coded differently from the above, first recode them with the script that is available on this site.


All information about errors is printed to a file named pedcheck_rec.err.


To run the program, put the 2 input files and all programs into the same directory and then, from that directory, type the command line at the DOS or Unix/Linux prompt. If you transferred the input files from Windows to Unix/Linux, remember to convert their line endings (e.g. with dos2unix).
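If dos2unix is not at hand, the same line-ending conversion can be done in a few lines of Python. This is a minimal sketch that assumes the file fits in memory; the function name is made up for the example.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Replace Windows line endings (CR LF) with Unix ones (LF)."""
    return data.replace(b"\r\n", b"\n")

# Made-up file content with Windows line endings.
converted = to_unix_newlines(b"code,m1,m2\r\nA100,1/1,\r\n")
```

To convert a file, read it in binary mode ("rb"), pass the bytes through the function, and write the result back in binary mode ("wb").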


In the folder example, the file example_command.txt records the command to run the example. You can also find an example input pedigree file and an example input genotype file.

To run the example, copy the Perl scripts and the PedCheck executable (pedcheck.exe for Windows, pedcheck for Linux) to the example directory, then from the example directory type:

perl -p ped_test.dat -d data_test.dat -s


When using this software, please cite our web site.

We will provide a reference as soon as we publish a note on this software.