gen_qc.pl - script for detection of genotyping errors

Author: Irina Zorkoltseva, Institute of Cytology and Genetics, Novosibirsk

December, 2004

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. You can download it from our website (http://mga.bionet.nsc.ru/soft/gen_qc/gen_qc.zip or http://mga.bionet.nsc.ru/soft/gen_qc/gen_qc.tar.gz) or by ftp (ftp://mga.bionet.nsc.ru/gencheck/).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Contact:

zor@bionet.nsc.ru

Table of contents

INTRODUCTION

INSTALLATION

USAGE

OPTIONS

INPUT FILE FORMAT

OUTPUT FILE

RUN

EXAMPLE

REFERENCE

Introduction

In large data sets obtained in genetic studies, the presence of errors is almost inevitable. Often the pedigree data and the genotype data are kept in different files. In that case, the personal ID in the pedigree data may not coincide with the person identification number (code) in the genotype file. The program pre_pedcheck combines the pedigree and genotype data into a single file.

The resulting file is tested for Mendelian inconsistencies. At this stage, the external program PedCheck (O'Connell and Weeks, 1998) is used to detect inheritance errors in autosomal markers, and our program x_check is used for X-linked markers.

The information on errors is extracted from the output, mapped back to the original coding and reported in table format (program rec_pedcheck_err).
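
To illustrate what a Mendelian inconsistency is, here is a minimal Python sketch for one autosomal marker and one father-mother-child trio. It is not the algorithm used by PedCheck or x_check, which work on whole pedigrees; the function name and the tuple representation of genotypes are assumptions made for this illustration only.

```python
def mendel_consistent(child, father, mother):
    """Return True if the child's genotype can be built from one
    paternal and one maternal allele (autosomal marker, single trio).

    Each genotype is a tuple of two alleles; None marks a missing allele.
    Trios with any missing allele are treated as uninformative.
    """
    if any(allele is None for geno in (child, father, mother) for allele in geno):
        return True
    c1, c2 = child
    # Try both assignments of the child's alleles to the two parents.
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


print(mendel_consistent((1, 2), (1, 1), (2, 2)))  # True
print(mendel_consistent((3, 3), (1, 2), (1, 3)))  # False
```

In the second call the child carries two copies of allele 3, but the father has no allele 3 to transmit, so the trio is flagged as inconsistent.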

Installation

This program is written in Perl, which is freely available for both Windows and Unix/Linux. After downloading the distribution file gen_qc.tar.gz, put it in a folder of your choice and type the command "tar -xzvf gen_qc.tar.gz" to unpack it. If you are a Windows user, you can unpack it with WinZip. NOTE: Windows users might need to install Perl to run the program; it can be downloaded for free at http://www.activestate.com/Products/ActivePerl/.

After unpacking the file, you will find a new folder gen_qc, which contains 7 files (readme.txt, gen_qc.pl, pre_pedcheck.pl, pedcheck.exe (executable file for Windows), pedcheck (executable file for Linux), x_check.pl, and rec_pedcheck_err.pl) and 2 folders (doc and example). In readme.txt you will find general information on all files in each folder and short instructions on how to run the program. In the folder doc, you can find this manual (gen_qc.html), the documentation of this program, and the GNU General Public License. In the folder example, you can find files containing an example command, an example input pedigree file and an example input genotype file.

Usage

perl gen_qc.pl [-p myfile.ped] [-d myfile.dat] [-s /sep/] [-x]

Options

By default, the program looks for pedigree.dat and genotype.dat as input files.

These defaults may be changed using command line options:

 -p <pedigree file>
 -d <genotype file>

### Examples:

	perl gen_qc.pl -p myfile.ped -d myfile.dat

Either, both or neither of these options may be given. The order of the options is irrelevant.

The input files are text files in which entries are assumed to be comma-separated (this may be changed using the option -s on the command line) and missing data are indicated by empty entries.

### Examples:

	perl gen_qc.pl -d myfile.dat -s

If you are using X-linked data, you must include the command option -x.

### Examples:

	perl gen_qc.pl -x -p myfile.ped

INPUT FILE FORMAT

You need 2 input files to run this program: a pedigree file and a genotype file. Both input files should be in comma-delimited format (this may be changed using the option -s on the command line), with missing data indicated by empty entries.

The input files must contain a header line, providing the description of every column.

Comment lines starting with "#" may precede the header line in these files.
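
The layout rules above (optional leading "#" comment lines, one header line, comma-separated entries, empty entries for missing data) can be sketched in Python as follows. This is only an illustration of the file format, not the parser used by gen_qc.pl; the function name read_table and the marker column names m1_a1 and m1_a2 are hypothetical.

```python
import csv
import io


def read_table(handle, sep=","):
    """Skip leading '#' comment lines, take the next line as the header,
    and return (header, rows). Empty entries are returned as None."""
    lines = list(handle)
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1
    reader = csv.reader(lines[start:], delimiter=sep)
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:  # ignore blank lines
            continue
        rows.append({name: (value.strip() or None)
                     for name, value in zip(header, fields)})
    return header, rows


# Hypothetical genotype file content with one comment line.
sample = io.StringIO("# example comment\ncode,m1_a1,m1_a2\n101,1,2\n102,,\n")
header, rows = read_table(sample)
print(header)
print(rows[1])
```

For a real file, pass an open file handle instead of the in-memory sample, and set sep to the separator used in that file.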

Pedigree File

The pedigree file contains genealogy information in a standard pre-makeped LINKAGE pedigree file format.

The first 7 columns should contain the following information:

Column1: family identifier (ped)

Column2: individual ID (id)

Column3: father's ID (fa)

Column4: mother's ID (mo)

Column5: sex (1 - male, 2 - female) (sex)

Column6: unique person identification number (code)

Column7: old individual ID (old_id) (if you used recode_ped.pl; this column may be absent)

Column3 and column4 are both zero when the individual is a founder. Any column after the 7th column is ignored.
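
As a rough illustration of the column layout, the following Python sketch checks two properties of pedigree rows before they are passed to the program: that sex is coded 1 or 2, and that the father and mother IDs are either both zero (founder) or both non-zero. The second check is an assumption based on common LINKAGE practice and is not stated in this manual; the function name and the sample codes (A001 and so on) are hypothetical.

```python
def check_pedigree_rows(rows):
    """Return a list of (ped, id, message) tuples for suspicious rows.

    Each row is a list of strings in the column order
    ped, id, fa, mo, sex, code[, old_id].
    """
    problems = []
    for row in rows:
        ped, ind, fa, mo, sex = row[:5]
        if sex not in ("1", "2"):
            problems.append((ped, ind, "sex must be 1 or 2"))
        # Assumption: founders have both parents coded 0, and
        # non-founders have both parents known.
        if (fa == "0") != (mo == "0"):
            problems.append((ped, ind, "only one parent is 0"))
    return problems


rows = [
    ["1", "1", "0", "0", "1", "A001"],  # founder, male
    ["1", "2", "0", "0", "2", "A002"],  # founder, female
    ["1", "3", "1", "2", "1", "A003"],  # child of 1 and 2
    ["1", "4", "1", "0", "3", "A004"],  # bad sex code, one parent missing
]
for problem in check_pedigree_rows(rows):
    print(problem)
```

Only the last row is reported, once for each of the two checks.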

Genotype File

The genotype file contains at least three columns.

One column, named code, is required (a unique person identification number).

The other columns contain genotype data.

In this file, each genotype is coded as two alleles (allele1 and allele2), for both homozygotes and heterozygotes.

If your genotypes are coded differently, run recodesnp.pl, which is available on this site.
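
Because the two files are linked only through the code column, combining them amounts to a lookup by code. The Python sketch below shows the idea on rows already read into dictionaries; it is not the implementation of pre_pedcheck.pl, and the function name, the marker column names and the sample codes are hypothetical.

```python
def merge_by_code(pedigree_rows, genotype_rows, marker_columns):
    """Attach genotype columns to each pedigree row using the shared 'code'.

    Individuals without a genotype record get None for every marker column.
    """
    genotypes = {row["code"]: row for row in genotype_rows}
    merged = []
    for person in pedigree_rows:
        geno = genotypes.get(person["code"], {})
        combined = dict(person)
        for column in marker_columns:
            combined[column] = geno.get(column)  # None when not genotyped
        merged.append(combined)
    return merged


pedigree = [
    {"ped": "1", "id": "1", "fa": "0", "mo": "0", "sex": "1", "code": "A001"},
    {"ped": "1", "id": "2", "fa": "0", "mo": "0", "sex": "2", "code": "A002"},
]
genotype = [{"code": "A001", "m1_a1": "1", "m1_a2": "2"}]

merged = merge_by_code(pedigree, genotype, ["m1_a1", "m1_a2"])
for row in merged:
    print(row)
```

Keeping every pedigree member in the result, genotyped or not, preserves the family structure that a Mendelian check relies on.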

OUTPUT FILE

All information about errors is printed to a file named pedcheck_rec.err.

RUN

To run the program, put the 2 input files and all programs into the same directory, and then, from that directory, type the command line at the DOS or UNIX (Linux) prompt. If you transferred files from Windows to Unix/Linux, please remember to convert the line endings of the input files (e.g. with dos2unix).

EXAMPLE

In the folder example, the file example_command.txt records the command to run the example. You can also find an example input pedigree file and an example input genotype file.

To run the example, copy gen_qc.pl, pre_pedcheck.pl, pedcheck.exe (executable file for Windows), pedcheck (executable file for Linux), x_check.pl and rec_pedcheck_err.pl to the example directory, then from the example directory type:

perl gen_qc.pl -p ped_test.dat -d data_test.dat -s

REFERENCE

When using this software, please cite our web site (http://mga.bionet.nsc.ru/soft/gen_qc).

We will provide the reference as soon as we publish a note on this software.