Genetic risk calculations

Fundamental concepts and theory

Pedigree data likelihood

Let us consider data on a pedigree consisting of N individuals. The data consist of individual phenotypes: x₁ is the phenotype of person 1, x₂ the phenotype of person 2, …, x_N the phenotype of person N. Thus, the phenotypic data may be thought of as a vector X = {x₁, x₂, x₃, …, x_N}. Each individual in the pedigree also has some genotype g: g₁ is the genotype of person 1, g₂ the genotype of person 2, …, g_N the genotype of person N. Thus, the genotypic data may be thought of as a vector of genotypes G = {g₁, g₂, g₃, …, g_N}.

Following the law of total probability, the probability of X is

P(X) = ∑_G P(X|G) P(G) (1)

Here the sum is taken over all possible genotype configurations G of the N people.

If we assume that the phenotype of an individual depends only on that individual's own genotype (and not on the genotypes of other people in the pedigree), then expression (1) may be written in more detail:

P(X) = ∑_G [ ∏_{i=1}^{N} P(x_i|g_i) ] P(G) (1′)

Next, consider the probability P(G). P(G) is a product of the genotype probabilities of all N subjects. For “founder” subjects (i.e. those whose parents are not in the pedigree) these probabilities are simply the population probabilities of the genotypes. For those who are offspring, the probability is defined conditional on the parents’ genotypes. If people 1…K are founders and people K+1…N are not, then

P(X) = ∑_G [ ∏_{i=1}^{N} P(x_i|g_i) ] [ ∏_{j=1}^{K} P(g_j) ] [ ∏_{l=K+1}^{N} P(g_l | g_lm, g_lf) ] (1″)

where g_lm and g_lf denote the genotypes of the mother and the father of person l.

Thus, to calculate the probability of the pedigree data, one needs to specify three probability distributions:

“A priori” probability distribution of genotypes in the population, P(g)
Conditional transmission probability: the probability that an offspring has genotype g given the genotypes of the parents (maternal genotype g_m and paternal genotype g_f), P(g|g_m, g_f). NOTE: there is some confusion about this term. More frequently it is used to denote the probability that a particular allele is transmitted to an offspring given the parental genotype, e.g. P(A|AA) = 1 or P(a|Aa) = ½. However, once these gametic transmission probabilities are defined, the genotype transmission probability as defined above is easily calculated from them.
Penetrance function: the conditional probability that an individual has phenotype x given that its genotype is g, P(x|g)
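The remark that genotype transmission probabilities follow from gametic ones can be illustrated with a minimal sketch. It assumes an autosomal locus, Mendelian segregation (each parental allele is passed on with probability ½) and no mutation; the function names `gamete_probs` and `transmission` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product


def gamete_probs(genotype):
    """P(allele | parental genotype): each of the two alleles is transmitted
    with probability 1/2 (Mendelian segregation, assumed here)."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, Fraction(0)) + Fraction(1, 2)
    return probs


def transmission(g_m, g_f):
    """P(g | g_m, g_f) for every offspring genotype g, built from the
    gametic probabilities of the mother (g_m) and the father (g_f)."""
    out = {}
    for (a_m, p_m), (a_f, p_f) in product(gamete_probs(g_m).items(),
                                          gamete_probs(g_f).items()):
        child = "".join(sorted(a_m + a_f))  # 'aA' and 'Aa' are the same genotype
        out[child] = out.get(child, Fraction(0)) + p_m * p_f
    return out


# Two heterozygous parents give the familiar 1/4 : 1/2 : 1/4 ratio.
print(transmission("Aa", "Aa"))
```

Genotypes are written as two-letter strings, so the same sketch works for the D/N notation used later in the text.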

Example. Consider a Mendelian locus with two alleles, A and a, with the frequency of A being 0.99. The trait of interest is a dominant disease: people with genotypes Aa and aa are certain to have the condition, while people with genotype AA never develop it. Let us formalize this model of inheritance in terms of the three distributions described above.

Designation: genotype g may take the form of AA, Aa or aa; phenotype x may take the form of D (affected) or U (unaffected).

P(g): assuming HWE, the population distribution of genotypes is P(g=AA) = 0.99 × 0.99 = 0.9801, P(g=Aa) = 2 × 0.99 × 0.01 = 0.0198 and P(g=aa) = 0.01 × 0.01 = 0.0001

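P(x|g): the penetrance function is P(x=D|g=AA) = 0, P(x=D|g=Aa) = 1, P(x=D|g=aa) = 1. The probability of being unaffected given the genotype is P(x=U|g) = 1 − P(x=D|g)

P(g|g_m, g_f): for a Mendelian locus the transmission probabilities follow from each parent passing on either of its two alleles with probability ½, e.g. P(g=Aa|g_m=Aa, g_f=AA) = ½.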
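To make expression (1) concrete for this example, the following sketch computes the likelihood of a mother-father-child trio by brute-force summation over all genotype configurations. It is only an illustration under stated assumptions: the pedigree is restricted to a trio with both parents as founders, transmission is Mendelian, and the names `penetrance`, `transmission` and `trio_likelihood` are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
P_A = 0.99  # frequency of allele A, as in the example

# P(g): Hardy-Weinberg population frequencies for founders
PRIOR = {"AA": P_A ** 2, "Aa": 2 * P_A * (1 - P_A), "aa": (1 - P_A) ** 2}

# P(x = D | g): fully penetrant dominant disease
PEN_D = {"AA": 0.0, "Aa": 1.0, "aa": 1.0}


def penetrance(x, g):
    """P(x | g) for phenotype x in {'D', 'U'}."""
    return PEN_D[g] if x == "D" else 1.0 - PEN_D[g]


def transmission(g, g_m, g_f):
    """P(g | g_m, g_f) under Mendelian segregation (assumed)."""
    pm = g_m.count("a") / 2  # probability the mother transmits allele a
    pf = g_f.count("a") / 2  # probability the father transmits allele a
    table = {"AA": (1 - pm) * (1 - pf),
             "Aa": pm * (1 - pf) + (1 - pm) * pf,
             "aa": pm * pf}
    return table[g]


def trio_likelihood(x_mother, x_father, x_child):
    """P(X): sum over all 3^3 genotype configurations of penetrances
    x founder priors x transmission probability."""
    total = 0.0
    for g_m, g_f, g_c in product(GENOTYPES, repeat=3):
        total += (penetrance(x_mother, g_m) * penetrance(x_father, g_f)
                  * penetrance(x_child, g_c)
                  * PRIOR[g_m] * PRIOR[g_f]
                  * transmission(g_c, g_m, g_f))
    return total


print(trio_likelihood("U", "D", "D"))
```

The number of terms grows as 3^N, so direct enumeration of this kind is practical only for very small pedigrees.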

Penetrance function extended

If we want to incorporate some covariate, C, into our model, this is quite easy to do. However, there may be many different ways to do that. One of the most obvious ways is to re-write the penetrance function using the form:

P(x|g) = w_g e^(αC) / (1 + e^(αC))

Here w_g is the maximal penetrance for genotype g (reached as C → +∞ when α > 0), C is the value of the covariate, and α is the regression coefficient.

Figure: P(x|g) = w_g e^(αC) / (1 + e^(αC)), with w_AA = 0.1, w_Aa = 0.8, w_aa = 0.9 and α = 0.3.
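The curves described in the figure can be reproduced numerically with a short sketch. It uses the parameter values from the figure caption; the function name `penetrance` and the chosen covariate values are illustrative assumptions only.

```python
import math

W = {"AA": 0.1, "Aa": 0.8, "aa": 0.9}  # maximal penetrances from the figure
ALPHA = 0.3                            # regression coefficient from the figure


def penetrance(g, c, w=W, alpha=ALPHA):
    """P(affected | g, C) = w_g * e^(alpha*C) / (1 + e^(alpha*C)),
    written in the algebraically equivalent form w_g / (1 + e^(-alpha*C))."""
    return w[g] / (1.0 + math.exp(-alpha * c))


for c in (-10, 0, 10):  # arbitrary covariate values for illustration
    print(c, {g: round(penetrance(g, c), 3) for g in W})
```

At C = 0 each genotype has exactly half of its maximal penetrance, which is a convenient check when the covariate is centred.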

It is also easy to incorporate more covariates and to make the regression coefficients depend on the genotype (thus introducing genotype × environment interaction).

Genetic risks: general form

When assessing genetic risk we usually ask: what is the probability that some individual will have a disease, given a known model of inheritance and data on family history (generally, phenotypes and other information on relatives)? Thus, the probability we are interested in is

P(Affected|X) = ∑_g P(Affected|g) P(g|X)

The first term in this expression is the penetrance function, which is known from the model. The second term, P(g|X), is the probability that the individual has genotype g given the pedigree data. It may be calculated using Bayes’ theorem:

P(g|X) = P(X, g) / P(X)

P(X) is calculated using expression (1). P(X, g) is also obtained using expression (1), except that the genotype of the individual of interest is fixed at g, so the sum runs only over the genotypes of the other pedigree members.
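As an illustration of this recipe, the sketch below computes the risk for a child whose own phenotype is not yet known, given the phenotypes of its two parents. It assumes the fully penetrant dominant model of the earlier example (frequency of the disease allele a equal to 0.01), founders in HWE and Mendelian transmission; the names `joint` and `risk` are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
Q = 0.01  # frequency of the disease allele a (A has frequency 0.99)
PRIOR = {"AA": (1 - Q) ** 2, "Aa": 2 * Q * (1 - Q), "aa": Q ** 2}
PEN_D = {"AA": 0.0, "Aa": 1.0, "aa": 1.0}  # P(affected | g)


def penetrance(x, g):
    return PEN_D[g] if x == "D" else 1.0 - PEN_D[g]


def transmission(g, g_m, g_f):
    pm, pf = g_m.count("a") / 2, g_f.count("a") / 2
    return {"AA": (1 - pm) * (1 - pf),
            "Aa": pm * (1 - pf) + (1 - pm) * pf,
            "aa": pm * pf}[g]


def joint(x_mother, x_father, g_child):
    """P(X, g): expression (1) with the child's genotype fixed at g_child;
    the sum runs over the parents' genotypes only."""
    return sum(penetrance(x_mother, g_m) * penetrance(x_father, g_f)
               * PRIOR[g_m] * PRIOR[g_f] * transmission(g_child, g_m, g_f)
               for g_m, g_f in product(GENOTYPES, repeat=2))


def risk(x_mother, x_father):
    """P(Affected | X) = sum_g P(Affected | g) P(g | X)."""
    joints = {g: joint(x_mother, x_father, g) for g in GENOTYPES}
    p_x = sum(joints.values())  # P(X) = sum_g P(X, g)
    return sum(PEN_D[g] * joints[g] / p_x for g in GENOTYPES)


print(risk("U", "D"))  # unaffected mother, affected father
```

With a rare disease allele the result is only slightly above ½, because an affected parent is almost always heterozygous.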

Practical issues

X-linked recessive disease

Pedigrees A and B. What is the probability that e is a carrier? Person a is affected; thus, d is a carrier. The probability that the mutation is transmitted to e is ½. Next question: what is the risk that a boy born to e and an unaffected father is affected? Clearly, it is the probability that e is a carrier times the probability that the mutation is transmitted: ½ × ½ = ¼. Pedigree B differs from A only by the introduction of an extra generation. Thus, the probability that g is a carrier is (½)² = ¼ and, consequently, the risk for a boy born to g and an unaffected father is 1/8. Generally, if N generations separate an obligate carrier from the person of interest (and no information on the disease phenotypes of males in previous generations is available), then the probability of being a carrier is (½)^N.

Pedigree C. Now we come to a real counselling situation. The probability that e is a carrier (and, consequently, the risk for the next child) is calculated in the following manner:

Hypothesis:	e is carrier	e is not carrier
Prior, P(g)	½	1 − ½ = ½
Conditional, P(X|g)	½ × ½ = ¼	1
Joint, P(X|g) P(g)	1/8	½
Posterior	(1/8) / (1/8 + ½) = 1/5	(½) / (1/8 + ½) = 4/5
Risk for next boy	1/5 × ½ = 1/10

More generally, if the number of healthy sons is not two but some M, then the posterior probability of being a carrier is 1/(1 + 2^M). If we combine this with the result obtained for pedigrees A and B, then the posterior probability is 1/(2^(M+N) − 2^M + 1). However, this formula is correct only if there are no informative males in the upper part of the pedigree. Generally, this is not the case.
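The closed-form expression can be checked against the Bayesian table with a small sketch in exact arithmetic. It assumes, as in the text, a prior of (½)^N for a woman N generations below an obligate carrier, M healthy sons, no new mutations and no other informative relatives; the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def carrier_posterior(n_generations, m_healthy_sons):
    """Posterior carrier probability computed as in the table:
    prior -> conditional -> joint -> posterior."""
    prior = HALF ** n_generations                    # P(carrier)
    joint_carrier = prior * HALF ** m_healthy_sons   # each son healthy with prob. 1/2
    joint_noncarrier = (1 - prior) * 1               # sons of a non-carrier are healthy
    return joint_carrier / (joint_carrier + joint_noncarrier)


def closed_form(n_generations, m_healthy_sons):
    """1 / (2^(M+N) - 2^M + 1)."""
    n, m = n_generations, m_healthy_sons
    return Fraction(1, 2 ** (m + n) - 2 ** m + 1)


posterior = carrier_posterior(1, 2)   # pedigree C: N = 1, M = 2
print(posterior, posterior * HALF)    # carrier probability and risk for the next boy
```

Using `Fraction` rather than floating point keeps the results directly comparable with the fractions in the table.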

Pedigree D. Here the situation is more general. The simplest way to deal with this pedigree is first to calculate the probability that e is a carrier. Half of this probability is then the prior probability of being a carrier for the woman of interest, j. The probability that e is a carrier is easily calculated using the above formula:

P(e is carrier | data) = 1/(2^(M+N) − 2^M + 1) = 1/(2^(3+1) − 2^3 + 1) = 1/9

Then,

Hypothesis:	j is carrier	j is not carrier
Prior, P(g)	½ × 1/9 = 1/18	1 − 1/18 = 17/18
Conditional, P(X|g)	½	1
Joint, P(X|g) P(g)	1/36	17/18 = 34/36
Posterior	(1/36) / (1/36 + 34/36) = 1/35	(34/36) / (1/36 + 34/36) = 34/35
Risk for next boy	1/35 × ½ = 1/70
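The two-step calculation for pedigree D can be written as two successive Bayesian updates. The sketch below takes M = 3 and N = 1 for e, as read off from the formula in the text, and one healthy son for j, as implied by the conditional probability ½ in the table; the helper name `bayes_update` is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def bayes_update(prior, lik_carrier, lik_noncarrier=Fraction(1)):
    """Posterior carrier probability from a prior and the likelihood of
    the observed data under each hypothesis."""
    joint_c = prior * lik_carrier
    joint_n = (1 - prior) * lik_noncarrier
    return joint_c / (joint_c + joint_n)


# Step 1: e has prior 1/2 (N = 1) and three healthy sons (M = 3).
p_e = bayes_update(HALF, HALF ** 3)

# Step 2: j inherits the mutation from e with probability 1/2,
# then her own healthy son is taken into account.
prior_j = HALF * p_e
p_j = bayes_update(prior_j, HALF)

risk_next_boy = p_j * HALF
print(p_e, prior_j, p_j, risk_next_boy)
```

Chaining the updates in this way relies on j being a possible carrier only if e is one, which is the assumption made in the text.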

Dominant disease

Consider a dominant disease with the frequency of the mutant allele being q = 0.01. In this case, we can safely assume that any affected person observed in the population is a heterozygous carrier (the odds of heterozygous vs. homozygous carrier are ~ 0.02/0.0001 = 200). Using this assumption, what is the risk for the next child in pedigrees A-D? It is quite obvious that the genotype of the father is DN and the genotype of the mother is NN. Given these genotypes, the risk for the next child is ½.

If the disease allele is more frequent (say, q = 0.1), we cannot assume that affected people are heterozygous. In pedigree A, we need to estimate the probability that the father is DD (in which case the risk to the child is 1) or DN (in which case the risk to the child is ½). Using the Bayesian approach:

P(father is DD | father is affected) =

P(affected | DD) P(DD) / [P(affected | DD) P(DD) + P(affected | DN) P(DN) + P(affected | NN) P(NN)] =

1 × q² / [1 × q² + 1 × 2q(1−q) + 0 × (1−q)²] =

0.01 / 0.19 = 0.053

P(father is DN | father is affected) =

P(affected | DN) P(DN) / [P(affected | DD) P(DD) + P(affected | DN) P(DN) + P(affected | NN) P(NN)] =

1 × 2q(1−q) / [1 × q² + 1 × 2q(1−q) + 0 × (1−q)²] =

0.18 / 0.19 = 0.947

Thus, the risk for the child to be affected is 0.053 × 1 + 0.947 × ½ = 0.53
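The same calculation can be sketched in exact arithmetic for an arbitrary allele frequency q. The sketch assumes HWE priors, full penetrance of DD and DN, and an unaffected (hence NN) mother; the function name `affected_genotype_posterior` is hypothetical.

```python
from fractions import Fraction


def affected_genotype_posterior(q):
    """P(genotype | affected) for a fully penetrant dominant disease."""
    prior = {"DD": q * q, "DN": 2 * q * (1 - q), "NN": (1 - q) ** 2}
    pen = {"DD": 1, "DN": 1, "NN": 0}           # P(affected | genotype)
    joint = {g: pen[g] * prior[g] for g in prior}
    total = sum(joint.values())                 # P(affected)
    return {g: joint[g] / total for g in joint}


q = Fraction(1, 10)
post = affected_genotype_posterior(q)

# A DD father always transmits D; a DN father transmits it with probability 1/2.
child_risk = post["DD"] * 1 + post["DN"] * Fraction(1, 2)
print(post["DD"], post["DN"], child_risk, float(child_risk))
```

For q = 0.1 the exact posteriors are 1/19 and 18/19, which round to the 0.053 and 0.947 used above.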

Now consider pedigree B. Here, information on one previous child, who was affected, is available. How does this information change the posterior probability that the father is DD or DN?

Hypothesis:	Father is DD	Father is DN
Prior, P(g)	0.053	0.947
Conditional, P(X|g)	1	½
Joint, P(X|g) P(g)	0.053	0.4735
Posterior	0.053 / (0.053 + 0.4735) = 0.1	0.4735 / (0.053 + 0.4735) = 0.9
Risk for next child	0.1 × 1 + 0.9 × ½ = 0.55

Thus, the risk that the next child will be affected increases once an affected child has been observed (from 0.53 to 0.55).
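The effect of previously affected children on the father's genotype probabilities can be sketched for any number of such children. The sketch assumes q = 0.1, full penetrance and an NN mother, as in the text; the function names are hypothetical.

```python
from fractions import Fraction


def father_posterior(q, n_affected_children):
    """P(father's genotype | father affected, n affected children)."""
    # NN is excluded because the father is affected; the priors need not be
    # normalised, since the posterior is normalised below.
    prior = {"DD": q * q, "DN": 2 * q * (1 - q)}
    # Each child of a DD father is affected; each child of a DN father
    # (with an NN mother) is affected with probability 1/2.
    lik = {"DD": Fraction(1), "DN": Fraction(1, 2) ** n_affected_children}
    joint = {g: prior[g] * lik[g] for g in prior}
    total = sum(joint.values())
    return {g: joint[g] / total for g in joint}


def next_child_risk(q, n_affected_children):
    post = father_posterior(q, n_affected_children)
    return post["DD"] * 1 + post["DN"] * Fraction(1, 2)


q = Fraction(1, 10)
for k in range(4):
    print(k, father_posterior(q, k)["DD"], next_child_risk(q, k))
```

With no affected children the sketch reproduces the pedigree A result, and with one it reproduces the table for pedigree B.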

Incorporating marker data – simple example

Recessive disease

Consider a recessive disease with the frequency of the mutant allele D being q = 0.025. In pedigree A, what is the probability that the child in question will get the disease? The information on a, b and c provides the key: the genotypes of these individuals are DN, DN and DD, respectively. Thus, the probability that d is a heterozygous carrier is:

P(d is DN | d is unaffected) =

P(unaffected | DN) P(DN) / [P(unaffected | DD) P(DD) + P(unaffected | DN) P(DN) + P(unaffected | NN) P(NN)] =

1 × ½ / [0 × ¼ + 1 × ½ + 1 × ¼] = 2/3

Here, the priors for genotypes are the transmission probabilities given the genotypes of the parents.

What is the probability that e is a heterozygous carrier? Here the priors are the population (HWE) genotype frequencies:

P(e is DN | e is unaffected) =

P(unaffected | DN) P(DN) / [P(unaffected | DD) P(DD) + P(unaffected | DN) P(DN) + P(unaffected | NN) P(NN)] =

1 × 2q(1−q) / [0 × q² + 1 × 2q(1−q) + 1 × (1−q)²] =

0.049 / (0.049 + 0.95) = 0.049

Then, the risk for the child is ¼ × 2/3 × 0.049 = 0.008
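The recessive calculation for pedigree A can be sketched in exact arithmetic. The sketch assumes full penetrance of DD, parents a and b both DN (so the priors for d are ¼, ½, ¼) and HWE priors for e; the function name `carrier_given_unaffected` is hypothetical.

```python
from fractions import Fraction


def carrier_given_unaffected(prior):
    """P(DN | unaffected) for a fully penetrant recessive disease."""
    pen_unaffected = {"DD": 0, "DN": 1, "NN": 1}   # P(unaffected | genotype)
    joint = {g: pen_unaffected[g] * prior[g] for g in prior}
    return joint["DN"] / sum(joint.values())


q = Fraction(1, 40)  # q = 0.025

# d: offspring of two DN parents, so the priors are transmission probabilities.
prior_d = {"DD": Fraction(1, 4), "DN": Fraction(1, 2), "NN": Fraction(1, 4)}
# e: priors are the population genotype frequencies under HWE.
prior_e = {"DD": q * q, "DN": 2 * q * (1 - q), "NN": (1 - q) ** 2}

p_d = carrier_given_unaffected(prior_d)
p_e = carrier_given_unaffected(prior_e)

# Two carrier parents have an affected (DD) child with probability 1/4.
child_risk = Fraction(1, 4) * p_d * p_e
print(p_d, p_e, child_risk, float(child_risk))
```

The exact carrier probability for e is 2q/(1+q), so the text's value of 0.049 and the final risk of 0.008 are rounded versions of these fractions.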

Consider now pedigree B. For individual h the posterior probability of being a carrier is 2/3, as in the previous example. Here we assume that the frequency of the D allele in the population is very low. This means that the D allele must have come to both d and i (if i is a carrier) from one of the two ancestors a or b. The probability that D is not lost during transmission over two generations is (½)² = ¼. Thus, the risk for the child is ¼ × ¼ × 2/3 = 1/24.

More complex case

Literature

Exercises

X-linked recessive:

Dominant disease: evaluate pedigrees C and D, first analytically (with frequency of D being q), then do calculations for q=0.1.

Recessive disease: evaluate pedigree C, assume that frequency of D is very low.