BMC Evol Biol. 2013 Feb 26:13:54.
doi: 10.1186/1471-2148-13-54.

Probabilistic models for CRISPR spacer content evolution

Anne Kupczok et al. BMC Evol Biol. 2013.

Abstract

Background: The CRISPR/Cas system is known to act as an adaptive and heritable immune system in Eubacteria and Archaea. Immunity is encoded in an array of spacer sequences. Each spacer can provide specific immunity to invasive elements that carry the same or a similar sequence. Even in closely related strains, spacer content is very dynamic and evolves quickly. Standard models of nucleotide evolution cannot be applied to quantify its rate of change since processes other than single nucleotide changes determine its evolution.

Methods: We present probabilistic models that are specific for spacer content evolution. They account for the different processes of insertion and deletion. Insertions can be constrained to occur on one end only or are allowed to occur throughout the array. One deletion event can affect one spacer or a whole fragment of adjacent spacers. Parameters of the underlying models are estimated for a pair of arrays by maximum likelihood using explicit ancestor enumeration.
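The insertion and deletion processes described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes the ordered model (new spacers are added at the leader-proximal end only) combined with independent loss (each spacer is deleted on its own at a constant rate). The function names `simulate_array` and `simulate_pair` and all parameter names are hypothetical.

```python
import random


def simulate_array(array, insertion_rate, deletion_rate, duration, next_id, rng):
    """Evolve a spacer array for `duration` time units (Gillespie-style).

    `array` lists spacer ids with the leader-proximal end first.
    Returns the evolved array and the next unused spacer id.
    Assumption: ordered insertion, independent single-spacer loss.
    """
    array = list(array)
    elapsed = 0.0
    while True:
        total_rate = insertion_rate + deletion_rate * len(array)
        if total_rate == 0:
            return array, next_id  # nothing can happen any more
        elapsed += rng.expovariate(total_rate)
        if elapsed > duration:
            return array, next_id
        if rng.random() * total_rate < insertion_rate:
            array.insert(0, next_id)  # insertion at the leader-proximal end
            next_id += 1
        else:
            del array[rng.randrange(len(array))]  # every spacer equally likely


def simulate_pair(ancestor, insertion_rate, deletion_rate, t1, t2, seed=0):
    """Evolve two descendants of `ancestor` independently for times t1 and t2."""
    rng = random.Random(seed)
    next_id = max(ancestor, default=0) + 1
    a, next_id = simulate_array(ancestor, insertion_rate, deletion_rate, t1, next_id, rng)
    b, next_id = simulate_array(ancestor, insertion_rate, deletion_rate, t2, next_id, rng)
    return a, b
```

Because new spacer ids are never reused, any spacer shared by the two simulated descendants must come from the ancestor, which is the signal a pairwise estimator has to work with.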

Results: Simulations show that parameters are well estimated on average under the models presented here. There is a bias in the rate estimation when including fragment deletions. The models also estimate times between pairs of strains. But with increasing time, spacer overlap goes to zero, and thus there is an upper bound on the distance that can be estimated. Spacer content similarities are displayed in a distance-based phylogeny using the estimated times. We use the presented models to analyze different Yersinia pestis data sets and find that the results among them are largely congruent. The models also capture the variation in diversity of spacers among the data sets. A comparison of spacer-based phylogenies and Cas gene phylogenies shows that they resolve very different time scales for this data set.
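The vanishing overlap can be made concrete with a short calculation. This is a sketch under a simplifying assumption that the abstract does not spell out: independent loss, where every spacer is deleted at a constant rate μ. Here t_1 and t_2 are the times from the ancestor to the two descendants, τ is their sum, and n is the ancestral array length; the symbols μ, t_1, t_2 and n are introduced for illustration only.

```latex
\begin{align*}
\Pr(\text{an ancestral spacer survives in both arrays})
  &= e^{-\mu t_1}\, e^{-\mu t_2} = e^{-\mu \tau}, \qquad \tau = t_1 + t_2, \\
\mathbb{E}[\text{number of shared spacers}]
  &= n\, e^{-\mu \tau} \;\longrightarrow\; 0 \quad \text{as } \tau \to \infty .
\end{align*}
```

Once the expected number of shared spacers drops well below one, most pairs of arrays carry no information about τ, which is consistent with the upper bound on estimable distances noted above.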

Conclusions: The simulations and data analyses show that the presented models are useful for quantifying spacer content evolution and for displaying spacer content similarities of closely related strains in a phylogeny. This allows for comparisons of different CRISPR arrays or for comparisons between CRISPR arrays and nucleotide substitution rates.

Figures

Figure 1
Illustration of the instantaneous rates for an array of length 3. Leader-proximal end is on the left. The arrows display the allowed transitions for the fragment loss model. Deletions of length two are displayed in red, deletions of length three in green. For the independent loss model, only the black arrows are allowed transitions. For the unordered model, the transition with rate λ results in either 4-3-2-1, 3-4-2-1, 3-2-4-1 or 3-2-1-4 with uniform probabilities.
Figure 2
Markov chain representation of the length models. (A) Independent loss model. (B) Fragment loss model. For clarity, deletions of length 2 are red, deletions of length 3 are green, and deletions ≥4 are not displayed.
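For the independent loss case, the length chain can be explored numerically. The sketch below assumes a birth-death chain in which the length increases from n to n+1 at a constant insertion rate and decreases from n to n-1 at n times a per-spacer deletion rate; this parameterization is an assumption made for illustration, and the fragment loss chain, which has additional transitions, is not covered. The function name and the truncation at `max_length` are hypothetical.

```python
import math


def stationary_length_distribution(insertion_rate, deletion_rate, max_length):
    """Stationary distribution of array length for a birth-death chain.

    Assumed rates: n -> n+1 at insertion_rate, n -> n-1 at n * deletion_rate.
    Detailed balance gives pi[n+1] = pi[n] * insertion_rate / ((n+1) * deletion_rate).
    The state space is truncated at max_length and then normalized.
    """
    weights = [1.0]
    for n in range(max_length):
        weights.append(weights[-1] * insertion_rate / ((n + 1) * deletion_rate))
    total = sum(weights)
    return [w / total for w in weights]


# Usage: compare with a Poisson distribution whose mean is the rate ratio.
pi = stationary_length_distribution(10.0, 1.0, 100)
mean_length = sum(n * p for n, p in enumerate(pi))
poisson = [math.exp(-10.0) * 10.0 ** n / math.factorial(n) for n in range(101)]
max_gap = max(abs(p - q) for p, q in zip(pi, poisson))
```

Under these assumed rates the stationary length distribution is Poisson with mean equal to the ratio of the insertion rate to the deletion rate, so `mean_length` should be close to 10 and `max_gap` close to zero.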
Figure 3
Stationary distribution of the length models. Subscript I represents the independent loss model and subscript F the fragment loss model. ρs of the same color result in the same mean length, i.e., they are corresponding ρs.
Figure 4
Overview of the array segmentation for the likelihood calculation under the fragment loss model. This segmentation results in the inserted fragment 9-8-7, the preserved fragments 6-5 and 1 and the deleted fragment 4-3-2.
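The segmentation in this caption can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the ancestor and descendant arrays (6-5-4-3-2-1 and 9-8-7-6-5-1) are inferred from the fragments named in the caption rather than given in it, insertions are assumed to occur at the leader-proximal end only, and the function name `segment_arrays` is hypothetical. It only labels maximal runs of spacers and does not compute the likelihood.

```python
def segment_arrays(ancestor, descendant):
    """Split an ancestor/descendant pair into labeled fragments.

    Both arrays list spacer ids with the leader-proximal end first.
    Returns (label, spacers) tuples, where label is one of
    'inserted', 'preserved' or 'deleted', and adjacent spacers
    with the same label are merged into one fragment.
    Assumption: all inserted spacers precede the ancestral ones.
    """
    ancestor_set, descendant_set = set(ancestor), set(descendant)
    labeled = [(s, "inserted") for s in descendant if s not in ancestor_set]
    labeled += [
        (s, "preserved" if s in descendant_set else "deleted") for s in ancestor
    ]
    fragments = []
    for spacer, label in labeled:
        if fragments and fragments[-1][0] == label:
            fragments[-1][1].append(spacer)  # extend the current run
        else:
            fragments.append((label, [spacer]))  # start a new fragment
    return fragments


# Usage with the arrays inferred from the caption.
fragments = segment_arrays([6, 5, 4, 3, 2, 1], [9, 8, 7, 6, 5, 1])
```

A run of adjacent deleted spacers such as 4-3-2 may stem from one fragment deletion or from several smaller events, so a likelihood calculation would still have to account for the possible histories of each deleted fragment.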
Figure 5
Expected loss times for both models. αl - expected lineage loss time, αp - expected pairwise loss time. αl(ρI) = αp(ρI), thus only one is displayed. Corresponding ρs are in one column, i.e., they result in the same expected length. Each point represents 10,000 simulations.
Figure 6
Rooted tree of three taxa with branch lengths.
Figure 7
Estimation of ρ with 2 arrays. (A) Simulations under the unordered model. (B, C) Simulations under the ordered model. (D, E) Simulations under the fragment loss model. A standard boxplot is shown. 1000 replicates are simulated under each setting. If present, the number of points outside the plot is listed above.
Figure 8
Estimation of times with 2 arrays. (A) Simulations under the unordered model. (B, C) Simulations under the ordered model. (D, E) Simulations under the fragment loss model. τ is the sum of the times from the ancestor to both descendants. Only pairs with overlap are included for “overlap”; the number of pairs is given by “Count”. A standard boxplot is shown. 1000 replicates are simulated under each setting. If present, the number of points outside the plot is listed above.
Figure 9
Estimation of ρ with 10 arrays. (A) Simulations under the unordered model. (B, C) Simulations under the ordered model. (D, E) Simulations under the fragment loss model. Data were simulated on random Yule trees rescaled to a specific tree height. A standard boxplot is shown. 1000 replicates are simulated under each setting. If present, the number of points outside the plot is listed above.
Figure 10
Estimation of times with 10 arrays. (A) Simulations under the unordered model. (B, C) Simulations under the ordered model. (D, E) Simulations under the fragment loss model. A standard boxplot is shown. 1000 replicates are simulated under each setting. If present, the number of points outside the plot is listed above.
Figure 11
Phylogeny of the Cas locus from 19 Yersinia pestis genomes. Tree images are created by FigTree [39].
Figure 12
Trees using the CRISPR spacer data from data set 1. (A, C, E) NJU: Neighbor joining tree of times from the unordered model. (B, D, F) RNJF: Rooted neighbor joining tree of times from the fragment loss model. (A, B) Yp1, (C, D) Yp2, (E, F) Yp3. Branch lengths correspond to the number of events under the specific model. For clarity, the unrooted neighbor joining trees are shown with the root at the same branch as the rooted neighbor joining tree.

References

    1. Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315(5819):1709–1712. doi: 10.1126/science.1138140.
    2. Marraffini LA, Sontheimer EJ. CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science. 2008;322:1843–1845. doi: 10.1126/science.1165771.
    3. Bolotin A, Quinquis B, Sorokin A, Ehrlich SD. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology. 2005;151(Pt 8):2551–2561.
    4. Wiedenheft B, Sternberg SH, Doudna JA. RNA-guided genetic silencing systems in bacteria and archaea. Nature. 2012;482(7385):331–338. doi: 10.1038/nature10886.
    5. Makarova KS, Aravind L, Wolf YI, Koonin EV. Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biol Direct. 2011;6:38. doi: 10.1186/1745-6150-6-38.