I need an expert in genetics to help me with my population genetics course assignment. The assignment document, which contains 3 questions, is attached, along with a few other documents with sample calculations.
Population Genetics 4303
Chapter 6: Evolutionary Forces II
Prof. Sara V. Good
Winter 2021

Videos of Fst
• https://www.youtube.com/watch?v=1eoZG956SgY
• https://www.youtube.com/watch?v=I8RCOI7n4XI&t=5s

Whiteboard example of Fst, migration, drift

Beleza S, Johnson NA, Candille SI, Absher DM, Coram MA, et al. (2013) Genetic Architecture of Skin and Eye Color in an African-European Admixed Population. PLOS Genetics 9(3): e1003372. https://doi.org/10.1371/journal.pgen.1003372
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003372

Relating this to the example of eye colour
Need to assess genetic variation in the F2 generation
• Take pure-breeding parental lines.
• In this case, we have populations in Africa and Europe that have diverged in allele frequencies.
• Northern Europeans almost “fixed” for blue eyes (p=0.65).
• Africans almost “fixed” for brown eyes.
• F1: mostly heterozygotes.
• F2: segregating.
• Analogous to what happens under a population subdivision model (although here, blue eyes were probably selected for).
• Populations diverge in allele frequencies due to drift and (sometimes) selection.

Genome-wide association study (GWAS) of skin and eye colour

Summarising estimates of genetic variation, a.k.a. nucleotide diversity (π)
• In reality, it is usually easier to estimate genetic variation using Nei’s genetic diversity. This is an estimate of heterozygosity, and is equal to 1 – SUM(homozygotes).
• Gene diversity = $\pi = 1 - \sum_{i=1}^{q} x_i^2$
• Where $q$ is the number of alleles and $x_i$ is the frequency of the $i$th allele.

Nucleotide diversity can be measured using the population mutation parameter theta (θ)
• As discussed, genetic drift and mutation are the two forces that are always present in real populations. Populations are always finite in size and mutation is inevitable.
• Thus, the most important parameter in population genetics for describing the level of diversity expected in a population is known as θ.
• $\theta = 4N_e\mu$ for nuclear loci, AND
• $\theta = N_e\mu$ for mitochondrial loci

$\theta_W = 4N_e\mu$ estimated from the number of segregating sites (Watterson, 1975)

ACCTGAACGTAGTTCGAAG
ACCTGAACGTAGTTCGAAT
ACCTGACCGTAGTACGAAT
ACATGAACGTAGTACGAAT
ACATGAACGTAGTACGAAT
  *   *      *    *

[Figure: genealogy of sequences 1 to 5, with labels A, B, C, D]

Estimate from the number of segregating sites: $\theta_W = S_n / a_n$, where $a_n = \sum_{i=1}^{n-1} \frac{1}{i}$.
With $S_n = 4$ segregating sites (marked *) and $n = 5$ sequences: $\theta_W = 4/(1 + 1/2 + 1/3 + 1/4) = 48/25 = 1.92$

$\theta_T$ estimated from pairwise differences (same alignment and genealogy as above)
$\theta_T$ = average pairwise distance = (1+3+3+3+2+2+2+2+2+0)/10 = 2
A mutation on an interior branch will have higher weight.

Tajima’s D
• A large value indicates shortened tips.
• A small value indicates shortened deep branches.
• Deviations from the shape of the coalescent tree may be detected by Tajima’s D.
• Rough rule: D > 2 or D < −2 suggests a significant deviation (Tajima 1989).

$D = \dfrac{\theta_T - \theta_W}{\mathrm{Std}(\theta_T - \theta_W)}$

Calculating Tajima’s D statistic
$\theta_T$ = observed diversity
$\theta_W$ = expected diversity
1) If D = 0, then $\theta_T = \theta_W$: drift-mutation equilibrium.
2) If D > 0, then $\theta_T > \theta_W$:
• When D > 0, the observed nucleotide diversity is greater than the expected.
• Can happen if there is balancing selection
• Or if there is a recent bottleneck (no selection, but a sudden drop in $N_e$)
• Remember genetic diversity declines more slowly than $N_e$ when there is a sudden drop in $N_e$
3) If D < 0, then $\theta_T < \theta_W$: when D < 0, observed heterozygosity is less than expected.
• Can happen with purifying selection, or when there is positive selection for one nucleotide/allele
• Can also happen when there is a population expansion (see next slides)

Here observed diversity > expected

Change in genealogies over time following moderate or severe population bottlenecks
• In a moderate bottleneck, D > 0.
• A severe bottleneck cuts off “old diversity”, the deep branches of the tree.
• A moderate bottleneck cuts off the “tips of diversity”, but not the deep branches of the tree.

Summary of the effect of changes in population size on Tajima’s D
• If D >> 0 (> 2): indicates a moderate population bottleneck in which ancient lineages are present, but not recent mutations.
• If D ~ 0: could be Wright-Fisher equilibrium.
• If D < −2 and there are long tip (external) branches and a classical star phylogeny: a population expansion.
• Severe prolonged bottlenecks are hard to identify with Tajima’s D.

Randomly generated trees, all under Wright-Fisher equilibrium (they look different, but all are “neutral”)

Mismatch distribution can tell us whether a population is constant in size or expanding, recently or in the ancient past

Calculating the mismatch distribution from a sequence alignment (Figure 6.1 in the text)

Genome-wide data allow us to explore the genetic relationship and distance among individuals and populations in a more interesting and complete way
• Individuals and populations:
• Individuals and populations are not pure! Many individuals/populations are “admixed”, consisting of ancestry from more than one region.
• Some parts of the genome have experienced selection; many regions or genes have not.
• We want to identify the genes that are under selection vs. neutrally evolving.
• Approaches:
• Principal components analysis is a method of “data reduction”.
It allows you to collapse the information from hundreds or thousands of markers for each individual into a single point; you can then compare the distribution of individuals in two-dimensional space (i.e. X and Y axes).
• Cluster analysis: first run an analysis to determine the optimal number of “groups” in the data (parameter = k), then assign each individual a percentage of membership in each group. Finally, compare differences in ancestry among populations.
• Phylogenetic trees and dating evolutionary events with genetic data (we will cover this lightly; sections 6.4 and 6.5)
• Has selection been acting? (section 6.7); we have spoken about this a bit and will discuss it throughout.

Principal Component Analyses
• Principal component analysis is a method of data reduction.
• Think of SNP panel data. You might have 1,000,000 SNPs called for 100 individuals. How would you compare these? SNP by SNP, person by person?
• PCA is a method to reduce all of the variation in the data into vectors, called principal components, that carry the information from the SNPs contributing the most to variation among individuals.

Typically plot the first two principal components

Graphical representation of PCA analyses of 225 individuals from 11 populations in West Africa.

One of the most famous PC analyses on human genomes!
Using data from 200,000 SNPs, it shows a nearly perfect relationship between where individuals fall on the X-Y axes and their location of birth.

Zooming in on one region from the previous plot and overlaying it with language

Cluster analysis
First run an analysis to decide how many groups (k) are in your data. The number of groups, k, tells you how many different “ancestries” are present in the dataset.
Calculate the percentage of ancestry from each ancestral group that each genome is composed of.
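As a rough illustration of the last step of the cluster analysis (comparing ancestry among populations once each individual has a percentage of membership in each of the k groups), here is a minimal Python sketch. The population labels, the choice of k = 3 and every membership value are invented for illustration; they are not taken from the slides or from any real Structure run, and the function name `mean_ancestry` is hypothetical.

```python
# Toy membership table: (population label, membership proportions in k = 3 groups).
# All labels and numbers are made up for illustration only.
INDIVIDUALS = [
    ("pop_A", [0.90, 0.05, 0.05]),
    ("pop_A", [0.70, 0.20, 0.10]),
    ("pop_B", [0.10, 0.80, 0.10]),
    ("pop_B", [0.30, 0.60, 0.10]),
]


def mean_ancestry(individuals):
    """Average the per-individual membership proportions within each population."""
    sums = {}
    counts = {}
    for pop, proportions in individuals:
        if pop not in sums:
            sums[pop] = [0.0] * len(proportions)
            counts[pop] = 0
        # Add this individual's proportions to the running totals for its population.
        sums[pop] = [total + p for total, p in zip(sums[pop], proportions)]
        counts[pop] += 1
    return {pop: [total / counts[pop] for total in sums[pop]] for pop in sums}


if __name__ == "__main__":
    for pop, averages in mean_ancestry(INDIVIDUALS).items():
        print(pop, [round(a, 3) for a in averages])
```

Each individual's proportions sum to 1, so the population averages do too; differences between the averaged rows are what a bar plot of ancestry by population makes visible.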
Here each column is an individual, and each colour is an ancestral group.

Structure output: 210 individuals from nine populations in Pakistan, scored at 377 microsatellites (Figure 6.4). Analyses suggested that k=5, or five “ancestral” populations.

Adding tree-thinking to these results

Less diversity in Y-chromosome than mtDNA
Gagneux et al., Mitochondrial sequences show diverse evolutionary histories of African hominoids. Proceedings of the National Academy of Sciences, Apr 1999.
[Figure labels: Chimpanzee, Human, Neanderthal]
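The five-sequence alignment used earlier for the segregating-sites and pairwise-difference estimates of θ can be checked with a short Python sketch. The sequences are copied from the slides; the function names are my own, and the variance constants (a1, a2, b1, b2, c1, c2, e1, e2) are the standard ones from Tajima (1989), which the slides cite but do not spell out, so treat that part as an added assumption and not as the course's own worked method.

```python
from itertools import combinations
from math import sqrt

# Alignment copied from the slides (n = 5 sequences, 19 sites).
SEQS = [
    "ACCTGAACGTAGTTCGAAG",
    "ACCTGAACGTAGTTCGAAT",
    "ACCTGACCGTAGTACGAAT",
    "ACATGAACGTAGTACGAAT",
    "ACATGAACGTAGTACGAAT",
]


def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def theta_w(seqs):
    """Watterson's estimate: S / a_n, with a_n = sum of 1/i for i = 1..n-1."""
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n


def theta_t(seqs):
    """Average number of differences over all pairs of sequences."""
    diffs = [sum(x != y for x, y in zip(s1, s2))
             for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)


def tajimas_d(seqs):
    """Tajima's D = (theta_T - theta_W) / sqrt(estimated variance of the difference)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (theta_t(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print("S       =", segregating_sites(SEQS))
    print("theta_W =", round(theta_w(SEQS), 3))
    print("theta_T =", round(theta_t(SEQS), 3))
    print("D       =", round(tajimas_d(SEQS), 3))
```

For this alignment the sketch gives S = 4, θ_W = 1.92 and θ_T = 2, matching the slide values, and a D of roughly 0.27. That is well inside the rough ±2 rule, so this toy data set shows no notable deviation from drift-mutation equilibrium.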