What are the limitations of commercial-grade DNA genotyping compared to full sequencing?

I've heard about services like 23andme, which offer genetic testing to the general public. As a person who knows very little about genetics, I'm interested in the subject and would like to know what the modern "commercial grade" genotyping really does. What are the limitations of commercial-grade "genotyping"? From what I've been reading, they test DNA against ~100 different markers, but do not really sequence it.

If I understand correctly, if DNA is sequenced, genetic markers and genes can be identified in it, including newly discovered ones. However, if DNA is genotyped, the genotyping is a one-shot operation that reports whether certain genetic markers are present in the DNA. To test against newly discovered markers, it would have to be genotyped again, right?

Thank you for any clarifications!

23andme briefly describes the technology they use here. They are testing the genotype of your DNA at roughly 1 million locations. The technology they use to do this is known as a microarray.

The limitation of using a microarray, as compared to sequencing, is that you will only find what you are looking for. People often describe this disadvantage of microarrays with the streetlight effect metaphor.

Arrays can only measure regions of the genome that they were designed to probe. Technically, if they probe 1,000,000 locations, and the human genome is roughly 3.1 billion bases… you can do the math.
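To spell out "the math", here is a minimal sketch. The number of probed sites comes from the answer above; the genome size is passed in as a parameter, and the value used here (about 3.1 billion bases, the commonly cited haploid reference size) is an assumption of the example.

```python
def fraction_probed(n_sites: int, genome_size_bp: int) -> float:
    """Share of genome positions that the array interrogates directly."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_sites / genome_size_bp


ARRAY_SITES = 1_000_000        # roughly what the array probes, per the answer
GENOME_BP = 3_100_000_000      # assumed haploid genome size

frac = fraction_probed(ARRAY_SITES, GENOME_BP)
print(f"Directly probed: {frac:.4%} of positions")
```

Under these assumptions the array reads a few hundredths of a percent of all positions, which is why the choice of sites matters so much.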

In practice, SNPs tell you a bit more than just the nucleotide being interrogated, due to linkage disequilibrium (cf. tag/proxy SNPs), so the array might tell you more than you expect.
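As a rough illustration of why a tag SNP can stand in for a neighbour, the sketch below computes the standard r² measure of linkage disequilibrium between two biallelic sites. The function name and the example frequencies are made up for illustration; real analyses estimate these frequencies from reference panels.

```python
def ld_r2(p_hap_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic sites.

    p_hap_ab: frequency of the haplotype carrying allele A at site 1
              and allele B at site 2
    p_a, p_b: frequencies of allele A and allele B on their own
    """
    d = p_hap_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical frequencies: the two alleles usually travel together.
print(ld_r2(0.4, 0.5, 0.5))    # partial LD
print(ld_r2(0.5, 0.5, 0.5))    # alleles always co-occur
print(ld_r2(0.25, 0.5, 0.5))   # independent sites
```

An r² near 1 means genotyping one site tells you almost everything about the other, so an array that probes the tag SNP indirectly covers its proxies as well.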

Of course, modulo sequencing errors, whole genome sequencing will tell you "everything" as far as recovering the NTs that make up your genome, but how much information that will provide you is another thing altogether (for now, that is).

Just to be transparent, I work at a microarray company, but I've also attended high throughput sequencing workshops. I think this is a pretty fair assessment, but leave me a comment if you see anything out of line…

As pointed out by @SteveLianoglou, genotyping microarrays are best for revealing, reliably and cheaply, whether a particular known variant is present (not only SNPs but also short insertions and deletions and copy number variations (CNVs)).

High-throughput DNA sequencing has made great strides, and the latest cost estimates are in the few thousands of dollars for a full human genome. It is better for discovering new variants, but currently it's not a great way to do, say, a medical test to see if you have a particular mutation. The problem with sequencing right now is that sequencers have systematic biases: the local sequence can produce the same error over and over again. It takes significant effort to differentiate such errors from real variant data. In the workshop I attended, the data consisted of 38 full human genomes at 30-40x coverage. Despite this deep level of sequencing, systematic errors were still obvious (the proceedings are in press now - reference should show up at the link provided soon). So getting variants via sequencing can be fairly expensive once you have done all your data analysis and validation. All this could change very quickly, but I think this is accurate in the current state of things.

I want to give the OP a slight variation on Steve's answer.

It is important to note that most of the genome is the same between different people. That isn't surprising, since we belong to the same species.

For example, it is estimated that there are about 10,000,000 SNPs (single basepair positions that may vary) in the human genome. So if you check 1,000,000 SNPs with a microarray (genotyping), you are covering ~10% of the possible differences. In addition, variation in some of the SNPs is very common while others are very rare, so if you check the most common 1,000,000 SNPs with a microarray, you might actually cover a very large portion of the differences. It is also possible that if you know in advance some general information about the person you are testing, such as ethnicity, you might be able to eliminate many uninformative SNPs.

This of course only refers to variation in SNPs (but these are the most common variations).
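The point about common versus rare SNPs can be made concrete with a small sketch. It uses expected heterozygosity, 2p(1-p), as a stand-in for how much each SNP contributes to differences between two people; that weighting and the toy allele frequencies are assumptions chosen for illustration, not real data.

```python
from typing import List


def site_fraction(n_on_array: int, n_known: int) -> float:
    """Fraction of known SNP positions present on the array."""
    return n_on_array / n_known


def variation_share(minor_allele_freqs: List[float], n_on_array: int) -> float:
    """Share of expected heterozygosity captured by the n most common SNPs."""
    het = sorted((2 * p * (1 - p) for p in minor_allele_freqs), reverse=True)
    return sum(het[:n_on_array]) / sum(het)


# Figures from the answer: 1 million of about 10 million SNPs.
print(site_fraction(1_000_000, 10_000_000))

# Toy frequencies: three common SNPs and seven rare ones (hypothetical).
toy_mafs = [0.5, 0.4, 0.3, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
share = variation_share(toy_mafs, n_on_array=3)
print(f"{share:.0%} of expected differences from 30% of the sites")
```

In this toy panel, probing only the three most common SNPs captures well over half of the expected differences, which is the effect the answer describes.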

Genotyping of Genetically Monomorphic Bacteria: DNA Sequencing in Mycobacterium tuberculosis Highlights the Limitations of Current Methodologies

Because genetically monomorphic bacterial pathogens harbour little DNA sequence diversity, most current genotyping techniques used to study the epidemiology of these organisms are based on mobile or repetitive genetic elements. Molecular markers commonly used in these bacteria include Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and Variable Number Tandem Repeats (VNTR). These methods are also increasingly being applied to phylogenetic and population genetic studies. Using the Mycobacterium tuberculosis complex (MTBC) as a model, we evaluated the phylogenetic accuracy of CRISPR- and VNTR-based genotyping, which in MTBC are known as spoligotyping and Mycobacterial Interspersed Repetitive Units (MIRU)-VNTR-typing, respectively. We used as a gold standard the complete DNA sequences of 89 coding genes from a global strain collection. Our results showed that phylogenetic trees derived from these multilocus sequence data were highly congruent and statistically robust, irrespective of the phylogenetic methods used. By contrast, corresponding phylogenies inferred from spoligotyping or 15-loci-MIRU-VNTR were incongruent with respect to the sequence-based trees. Although 24-loci-MIRU-VNTR performed better, it was still unable to detect all strain lineages. The DNA sequence data showed virtually no homoplasy, but the opposite was true for spoligotyping and MIRU-VNTR, which was consistent with high rates of convergent evolution and the low statistical support obtained for phylogenetic groupings defined by these markers. Our results also revealed that the discriminatory power of the standard 24 MIRU-VNTR loci varied by strain lineage. Taken together, our findings suggest strain lineages in MTBC should be defined based on phylogenetically robust markers such as single nucleotide polymorphisms or large sequence polymorphisms, and that for epidemiological purposes, MIRU-VNTR loci should be used in a lineage-dependent manner. Our findings have implications for strain typing in other genetically monomorphic bacteria.

Citation: Comas I, Homolka S, Niemann S, Gagneux S (2009) Genotyping of Genetically Monomorphic Bacteria: DNA Sequencing in Mycobacterium tuberculosis Highlights the Limitations of Current Methodologies. PLoS ONE 4(11): e7815.

Editor: Anastasia P. Litvintseva, Duke University Medical Center, United States of America

Received: July 31, 2009 Accepted: October 15, 2009 Published: November 12, 2009

Copyright: © 2009 Comas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Medical Research Council, UK, and USA National Institutes of Health grants HHSN266200700022C and AI034238. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.


The advent of next-generation sequencing (NGS) has revolutionized genomic and transcriptomic approaches to biology. These new sequencing tools are also valuable for the discovery, validation and assessment of genetic markers in populations. Here we review and discuss best practices for several NGS methods for genome-wide genetic marker development and genotyping that use restriction enzyme digestion of target genomes to reduce the complexity of the target. These new methods — which include reduced-representation sequencing using reduced-representation libraries (RRLs) or complexity reduction of polymorphic sequences (CRoPS), restriction-site-associated DNA sequencing (RAD-seq) and low coverage genotyping — are applicable to both model organisms with high-quality reference genome sequences and, excitingly, to non-model species with no existing genomic data.

DNA Sequencing

Illumina next-generation sequencing (NGS) technology uses clonal amplification and sequencing by synthesis (SBS) chemistry to enable rapid, accurate sequencing. The process simultaneously identifies DNA bases while incorporating them into a nucleic acid chain. Each base emits a unique fluorescent signal as it is added to the growing strand, which is used to determine the order of the DNA sequence.
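A toy sketch of the idea that each incorporated base is identified by its fluorescent signal: for every cycle, pick the channel with the strongest intensity. The channel order (A, C, G, T) and the intensity values are assumptions for illustration; production base callers also correct for channel cross-talk and phasing and report a quality score per base.

```python
from typing import Sequence

BASES = "ACGT"  # assumed channel order for this sketch


def call_bases(intensities: Sequence[Sequence[float]]) -> str:
    """Return one base per cycle by choosing the brightest channel."""
    read = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("each cycle needs one intensity per base")
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical intensities for three sequencing cycles.
signal = [
    (0.90, 0.10, 0.00, 0.10),
    (0.10, 0.10, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
]
print(call_bases(signal))
```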

NGS technology can be used to sequence the DNA from any organism, providing valuable information in response to almost any biological question. A highly scalable technology, DNA sequencing can be applied to small, targeted regions or the entire genome through a variety of methods, enabling researchers to investigate and better understand health and disease.

Benefits of DNA Sequencing With NGS

  • Sequences large stretches of DNA in a massively parallel fashion, offering advantages in throughput and scale compared to capillary electrophoresis–based Sanger sequencing
  • Provides high resolution to obtain a base-by-base view of a gene, exome, or genome
  • Delivers quantitative measurements based on signal intensity
  • Detects virtually all types of genomic DNA alterations, including single nucleotide variants, insertions and deletions, copy number changes, and chromosomal aberrations
  • Offers high throughput and flexibility to scale studies and sequence multiple samples simultaneously

Common DNA Sequencing Methods

Whole-Genome Sequencing

Whole-genome sequencing is the most comprehensive method for analyzing the genome. Rapidly dropping sequencing costs and the ability to obtain valuable information about the entire genetic code make this method a powerful research tool.

Targeted Resequencing

With targeted resequencing, a subset of genes or regions of the genome is isolated and sequenced, allowing researchers to focus time, expenses, and analysis on specific areas of interest.

ChIP Sequencing

By combining chromatin immunoprecipitation (ChIP) assays and sequencing, ChIP sequencing (ChIP-Seq) is a powerful method to identify genome-wide DNA binding sites for transcription factors and other proteins.

Library Prep for DNA Sequencing

Our versatile library prep portfolio allows you to examine small, targeted regions or the entire genome. We've innovated in PCR-free and on-bead fragmentation technology, offering time savings, flexibility, and increased sequencing data performance.

Growing interest in personalized cancer therapy has led to numerous advances in the field of cancer genomics. Next-generation sequencing (NGS) is one such development that has allowed for lower cost, higher-throughput genome sequencing. However, the vast number and types of genomic aberrations found in cancer means that interpretation of the data generated by NGS requires substantial analytical complexity. Here, we discuss the clinical applications of NGS and the obstacles that must be overcome prior to widespread use in clinical decision making.

Key words: Next-generation sequencing, review, genomics, cancer


Personalized cancer therapy requires the use of molecular diagnostics to tailor treatments to individuals. At this time, only a few molecular biomarker-based therapies, such as erlotinib in EGFR-mutated lung cancer and vemurafenib in BRAF-mutated melanoma, have been widely accepted. 1,2 Next-generation sequencing (NGS) has the potential to revolutionize oncology through the classification of tumors and identification of biomarkers that can predict response to individualized therapy.

Until recently, the Sanger sequencing method was the most widely used sequencing method, and resulted in the only complete human genome sequence. 3 This technology relies on incorporation of chain-terminating dideoxynucleotides during DNA replication. 4 Fluorescently labeled terminators, capillary electrophoresis separation, and laser signal detection have improved the throughput of Sanger sequencing. 5 However, it remains labor-intensive, time-consuming, and expensive when done in large scale. 6 Therefore, the demand for faster, more accurate, and more cost-effective genomic information has led to the development of NGS methods.

NGS methods are high-throughput technologies capable of sequencing large numbers of different DNA sequences at once (massively parallel sequencing). NGS technologies monitor the sequential addition of nucleotides to immobilized DNA templates generated from target tissue. 7 Unfortunately, the increased throughput of NGS reactions comes at the cost of shorter sequences, as most sequencing platforms (Illumina, Roche, SOLiD) offer shorter read lengths (30-400 bp) than the conventional Sanger-based method. 8 These shorter sequences are then assembled into longer sequences such as complete genomes.
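The assembly step can be pictured with a toy sketch that joins two short reads when the end of one matches the start of the other. The function names, reads and minimum overlap are invented for illustration; real pipelines work with millions of reads, tolerate sequencing errors, and often align reads to a reference genome instead of overlapping them directly.

```python
from typing import Optional


def overlap_len(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if k > 0 and left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    k = overlap_len(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


# Two hypothetical reads sharing the four bases "TGCA".
print(merge_reads("ACGTTGCA", "TGCAGGAT"))
print(merge_reads("ACGTTGCA", "CCCCCCCC"))
```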

Common approaches to DNA sequencing include whole-genome sequencing, whole-exome sequencing, targeted exome sequencing, and "hotspot" sequencing. Whole-genome sequencing sequences the complete genome of a sample (ie, chromosomal DNA and mitochondrial DNA, which includes intronic and exonic regions). Whole-exome sequencing is a technique that sequences all of the protein-coding genes (ie, all exons in the genome). Targeted exome sequencing uses target-enrichment methods to capture genes of interest. This approach is becoming increasingly popular in oncology for assessing the full sequence of cancer-related gene panels. Targeted exome sequencing also facilitates sequencing at a greater depth, and thus the identification of subclonal mutations. Alternately, rather than sequencing the full sequence of selected genes, only selected regions of selected genes can be sequenced, focusing on cancer gene "hotspots" (regions with recurrent mutations). Although hotspot mutation testing facilitates large-scale sequencing of many samples, it does limit the knowledge that is acquired through sequencing because it limits the evaluation to small regions in selected genes. Consequently, it increases the possibility of omitting relevant mutations for which evaluation is not being conducted, thus limiting the clinical knowledge that is gained through NGS. Despite its drawbacks, it is becoming a widely accepted form of NGS.
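The trade-off between full-gene and hotspot panels comes down to which positions the assay looks at. The sketch below checks whether a variant position falls inside a panel's target regions; the `Region` type, the 1-based inclusive coordinates and all positions are hypothetical and do not correspond to any real gene or panel.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (a convention assumed for this sketch)
    end: int    # inclusive


def is_covered(chrom: str, pos: int, regions: List[Region]) -> bool:
    """True if the position lies inside any targeted region."""
    return any(r.chrom == chrom and r.start <= pos <= r.end for r in regions)


# Made-up coordinates: one panel targets a whole gene, the other only a hotspot.
full_gene_panel = [Region("chr7", 1_000, 5_000)]
hotspot_panel = [Region("chr7", 2_000, 2_050)]

variant = ("chr7", 3_500)  # lies in the gene but outside the hotspot
print(is_covered(*variant, full_gene_panel))
print(is_covered(*variant, hotspot_panel))
```

A variant like this one would be reported by the full-gene panel and silently missed by the hotspot panel.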

Practical Application

  • Provide a brief introduction to the methods of next-generation sequencing (NGS)
  • Identify clinical applications of NGS including the identification of various molecular aberrations in different tumor types, the resultant design of molecular biomarker-driven clinical trials, and the potential to identify molecular aberrations that lead to disease progression and resistance
  • Identify limitations of NGS, including the need for extensive analytic capabilities, the difficulties in identification of driver mutations, and the confounding factor of tumor heterogeneity
  • Identify potential future applications of NGS

In addition to nucleotide change detection (mutations and small insertions and deletions), NGS allows for DNA-copy number predictions. Further, NGS technology also can be applied to RNA in order to evaluate the transcriptome of a tumor. RNA-sequencing (RNA-seq) allows for the assessment of gene expression and transcriptional splice variant analysis in addition to detection of mutations. A typical NGS work flow from sample collection to the capture and sequencing of genes of interest and data analysis is illustrated in Figure 1.

Identification of Cancer Genomics

In recent years, NGS has been used to characterize genomic alterations such as mutations, insertions/deletions, and copy number changes, and the frequency with which they occur in various tumor types. Efforts such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) aim to catalog such genomic alterations across many tumor types. 9,10 However, the wealth of information that is generated through this process unveils potentially the largest hurdle of genomic medicine: How do we analyze the abundance of information that is generated to make informed decisions regarding therapy? Analysis of cancer genomes reveals that most tumors contain multiple alterations. 11-14 As a result, it is very important to distinguish the "driver" mutations that contribute to tumor development from the "passengers" that do not. 15

Comparison of sequenced genomes to reference genomes allows for the identification of genome alterations that may be relevant in disease development and progression. 16 However, such comparison depends on the establishment of extensive and accurate reference genomes, which is a cumbersome task. Further, the complexity of genomic aberrations in cancer makes it difficult to rely on standard reference genomes. 8 Therefore, simplified methods of identifying driver mutations are required. Several theories exist for the potential identification of driver mutations. One such hypothesis is that mutations that occur with higher frequency are more likely to contribute to tumor development and growth. 17 Genome-wide association studies (GWAS) aim to compare the incidence of commonly known single nucleotide polymorphisms (SNPs) in genomes from patients with and without a specific disease. SNPs that occur at a higher frequency in the diseased population are identified as potentially causative. If a specific mutation is not found in high frequency, but the same molecular pathway contains frequent genomic alterations, those alterations may also be relevant. Another theory is that alterations present in both germline and tumor tissue of the same patient are likely to be integral to tumor development. For example, some mutations in cancer-predisposition genes such as BRCA1/2 clearly do contribute to the development and maintenance of cancer. This, however, requires that germline tissue be collected in each patient. Yet another theory is that sequencing DNA and RNA from the same sample will identify mutations that subsequently alter expression, and are thus significant. However, all of these methods only begin to narrow the spectrum of genomic alterations that may be clinically relevant. Chromosome-scale changes and epigenomic changes cannot be evaluated in this manner. Many studies are now focusing on the development of bioinformatic tools to aid in the identification of driver mutations. 18
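The GWAS comparison mentioned above, testing whether an allele is more frequent in patients than in controls, can be sketched with a 2x2 table of allele counts. The counts below are invented; a real study would also adjust for covariates such as ancestry and correct for testing a very large number of SNPs.

```python
from typing import Tuple


def allele_association(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> Tuple[float, float]:
    """Odds ratio and Pearson chi-square statistic for a 2x2 allele count table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    if min(a + b, c + d, a + c, b + d) == 0 or b * c == 0:
        raise ValueError("table has an empty margin or undefined odds ratio")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2


# Hypothetical counts: the allele is seen 60/100 times in cases, 40/100 in controls.
odds_ratio, chi2 = allele_association(60, 40, 40, 60)
print(odds_ratio, chi2)
```

A large statistic at one SNP only flags an association; as the text notes, it narrows the candidates rather than proving that the variant drives disease.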

Clinical Decision Support

Once driver mutations have been identified in a tumor, the next step is to assess whether those mutations are "actionable." Actionable alterations affect the function of a cancer-related gene and can be targeted with approved or investigational therapies. Assessing functionality is a difficult task and requires predictive knowledge of genome alterations. Often, early-phase studies are used to assess the role of various mutations based on rates of response to targeted therapies. However, enrollment in such studies requires that physicians be aware of genome alterations and potential trials for each patient.

A survey of 160 physicians at a tertiary-care National Cancer Institute (NCI)-designated comprehensive cancer center revealed that a considerable percentage of physicians have low confidence in their genomic knowledge. 19 As a result, many institutions have instituted tumor boards to increase awareness of and access to appropriately targeted therapies. 20 Similarly, the American Society of Clinical Oncology has monthly presentations that explore current treatment strategies and novel therapeutics in various tumor types to increase knowledge of newer targeted therapies. Trials such as the NCI Molecular Analysis for Therapy Choice Program (NCI-MATCH) have streamlined decision making by designing algorithms and creating rules to designate alterations as actionable, and to prioritize targets if more than one target is identified. In this signal-seeking trial, 3000 patients will undergo tumor NGS to match genomic alterations to smaller histology-agnostic phase 2 trials of Food and Drug Administration (FDA)-approved agents (in other diseases) and investigational therapeutics (Figure 2).

If a response signal is seen in early-phase trials, the clinical relevance and therapeutic implications of actionable mutations can be assessed through thoughtful biomarker-driven research. Hypothesis-driven preclinical studies and clinical trials to assess targeted therapies in various tumor types can be designed. Such trials allow for the recruitment of selected patients into clinical trials to enhance the assessment of those targeted therapies. Ultimately, the goal is to implement randomized clinical trials to assess molecularly targeted therapy in a biomarker-selected or biomarker-stratified fashion. The Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials (ALCHEMIST) is an example of such a trial in which patients with early-stage adenocarcinoma of the lung are screened for EGFR and ALK mutations, and subsequently randomized into trials of relevant targeted therapy if mutations are found. With NGS technology, a high throughput of patients can undergo testing to assess their eligibility for clinical trials within a clinically reasonable timeframe.

Genomic Evolution and Intertumor and Intratumor Heterogeneity

Further complicating the implementation of genomic medicine is the fact that driver mutations can evolve during the course of cancer. As tumors are treated or as they grow, a variety of acquired genomic alterations may emerge. For example, melanoma treated with BRAF or MEK inhibitors has been shown to acquire BRAF amplifications and downstream alterations that lead to reactivation of the MAP kinase pathway. 21-23 Similarly, increased signaling via the phosphatidylinositol 3-kinase/Akt pathway may contribute to trastuzumab resistance in HER2-positive breast cancer. 24 Thus, the dynamic nature of cancer requires that genomic information be applic

In addition to genomic evolution, tumors may also develop intertumor and intratumor heterogeneity. Intertumor heterogeneity refers to differences in alterations of tumors at different sites, while intratumor heterogeneity refers to differences in alterations within a tumor. Both intertumor and intratumor heterogeneity can further complicate the determination of relevant mutations because it means that tissue for NGS has to be obtained from relevant sites as well as at a relevant time point in the treatment course. This can result in repeated biopsies. Additionally, metastatic sites such as bone and brain can be difficult to test. However, comparison of primary tumors with matched metastases has shown relatively high concordance in their mutational profiles, suggesting that additional biopsies may not always be necessary. 25,26

Although genomic evolution makes it difficult to identify relevant aberrations, recognizing genomic evolution is a powerful tool to better understand the progression of cancer. Genomic analysis of cancer at different stages, from precancerous lesions to localized tumors to metastatic disease, can identify genetic events that drive tumor growth. For example, genomic studies that analyze genomic alterations in breast ductal carcinoma in situ (DCIS) can help to design a predictive model for lesions that are likely to progress to carcinoma versus those that are not. 27 Similarly, NGS-based analysis of drug-resistant cells can help identify mechanisms of resistance. For instance, sequencing tumors from patients with estrogen receptor (ER)-positive breast cancer that recurred or progressed after treatment with antiestrogen therapy revealed mutations in the ESR1 gene; these mutations were constitutively active. 28 Interestingly, ESR1 gene mutations were not seen in TCGA analysis, which included primary tumors only. 11 Together, these studies suggest that activating mutations in the ESR1 gene are an acquired mechanism of resistance to antiestrogen therapy. Similarly, RNA sequencing of tamoxifen-sensitive and -resistant breast cancer cells revealed gene expression changes implicating a series of resistance mechanisms that could be grouped in ER functions, cell cycle regulation, transcription/translation, and mitochondrial dysfunction. 29

Future Applications and Directions

Several additional applications of NGS are under development. One potential future application of NGS is the evaluation of circulating tumor cells or free plasma DNA to detect early relapse or residual cancer. 20 Once tumor-specific genome alterations have been identified by NGS, PCR assays could be used to detect circulating tumor cells or free plasma DNA harboring the same alterations. Disease status, drug responsiveness, and relapse could be serially assessed. The monitoring strategy would, however, require that the mutation being tested be present in all tumor cells and remain present throughout the course of disease. As discussed previously, due to genomic evolution and tumor heterogeneity, such mutations are difficult to identify. Optimally, mutations used for monitoring would be truncal mutations (mutations in the "trunk-branch" model of heterogeneity, and thus representing ubiquitous driver mutations present in every tumor subclone and region). 31 However, serial monitoring could also identify new alterations that occur under the selection pressure of treatment, which could give insights into mechanisms of acquired resistance.
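One simple way to picture truncal mutations is as the alterations shared by every sampled region or subclone of a tumor. The sketch below takes that view and intersects the mutation sets; the sample names and mutation labels are placeholders, and a real analysis would also have to account for sampling depth and detection sensitivity.

```python
from typing import Dict, Set


def truncal_mutations(samples: Dict[str, Set[str]]) -> Set[str]:
    """Mutations detected in every sampled region or subclone."""
    if not samples:
        return set()
    return set.intersection(*samples.values())


# Placeholder labels, not real variants.
tumor_regions = {
    "primary_region_1": {"mutA", "mutB", "mutC"},
    "primary_region_2": {"mutA", "mutB", "mutD"},
    "metastasis": {"mutA", "mutB", "mutE"},
}
print(sorted(truncal_mutations(tumor_regions)))
```

Only the shared alterations would be reasonable candidates for serial monitoring; the region-specific ones could disappear from view as the tumor evolves.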

Another potential application of NGS is to improve the diagnosis of cancer. Poor tissue sampling and processing can often make a histological diagnosis difficult. Additionally, mixed tumor phenotypes can sometimes make it difficult to determine the origin of the tumor. However, NGS-based analysis of tissue can be performed on small amounts of viable tissue and is accurate when sufficient information regarding causative mutations is known. An evaluation of 143 benign and malignant thyroid nodules revealed that genotyping of fine-needle aspiration (FNA) samples of the nodules using a broad NGS panel provided high sensitivity and specificity in the diagnosis of these samples. 32 Such diagnoses would require clinical validation prior to widespread use. Furthermore, as the genomics of different tumors become apparent, NGS can be used to identify different molecular subtypes, which is already becoming commonplace with sarcoma fusion proteins.

Finally, NGS can identify molecular aberrations that render tumors exquisitely sensitive to certain therapies, resulting in exceptional responses. Such extraordinary outcomes can improve our understanding of molecular features that can predict response to certain drugs. For this purpose, the NCI has undertaken the Exceptional Responders Initiative, through which tumors of exceptional responders will undergo DNA and RNA sequencing to define genetic alterations that might have resulted in such responses.

Next-generation sequencing has opened a broad new area of research with the potential to revolutionize personalized cancer medicine. However, further development of this field requires real-time knowledge of genome alterations that can be used in clinical decision making. This requires a robust data infrastructure, continuous improvement in sequencing technology, development of analytical tools, and ongoing biomarker-driven preclinical and clinical trials. Ultimately, however, NGS data have the potential to guide clinicians in tailoring treatment to dynamic genomic changes in individual tumors.

Potential Effects of Biases on Inferences

The issues described above may shape data sets in ways that make them more or less appropriate or biased for downstream shallow systematics analyses. Sequence capture and RAD-Seq data sets yield broadly concordant results for phylogenetic analyses among species, depending on the steps used for data set assembly (Leaché et al. 2015; Collins and Hrbek 2015; Manthey et al. 2016), but their relative utility for population genetic and phylogeographic analyses that are applied within species is largely unexplored. In this section, we discuss how these issues might impact a range of typical population genetic, phylogeographic, and phylogenetic analyses that are often applied at shallow timescales. The results of analyses of the empirical data sets presented here are not intended as a direct comparison of the applicability of RAD-Seq and sequence capture data, which in reality would probably not be examined with the same methods, but rather to demonstrate how the issues discussed above can result in divergent inferences between methods.

Genome-wide scans to identify signatures of selection or gene flow are often conducted in studies using RAD-Seq loci due to their dense distribution across the genome (Hohenlohe et al. 2010). Conserved regions targeted by sequence capture may be insufficiently dispersed across the genome for use in genome-wide scans. As discussed above, mapping RAD-Seq loci to divergent genomes is challenging, thus RAD-Seq may not be appropriate for identifying the genomic context of outlier loci in species without available genome assemblies. As with many markers, RAD-Seq loci may come from heterogeneous genomic regions impacted by diverse neutral and non-neutral processes, so scans will need to account for alternative explanations of outlier loci or migrant alleles.

Demographic inference is popular in population genetics and phylogeography, and may be affected by the distribution of allele frequencies in a data set. Purifying selection on conserved regions may leave signatures, such as an excess of rare alleles, that complicate estimation of neutral demographic histories. Allele loss and heterozygote deficiencies in RAD-Seq data sets may also affect estimates of demographic parameters including theta ($\theta = 4N_e\mu$) and admixture. We estimated demographic parameters using gene trees in BP&P v.3.2 (Yang and Rannala 2010) and using SNP frequency spectra in ∂a∂i v.1.7.0 (Gutenkunst et al. 2009) with both RAD-Seq and sequence capture data from X. minutus. The demographic model used included two daughter populations comprising the four samples from west of the Andes Mountains and the four samples from east of the Andes Mountains, both of which diverged from a common ancestral population. We compared estimates of effective population size by normalizing the divergence time estimates from RAD-Seq and sequence capture data sets. We found that in both BP&P and ∂a∂i results, effective population sizes in the daughter populations were fairly similar between data sets (Supplementary Tables S3 and S4, available on Dryad), but the estimate of ancestral effective population size was lower from sequence capture than from RAD-Seq data (Fig. 4b and Supplementary Fig. S7, available on Dryad). The higher ancestral population size in the RAD-Seq data could be due either to the loss of shared variation among the daughter populations as a result of allele dropout in the RAD-Seq data set, or to the high frequency of rare alleles restricted to a single population in the sequence capture alignments. In addition, heterozygote deficiencies in the RAD-Seq data set may underlie the somewhat lower population sizes estimated in the daughter populations than those estimated in the sequence capture data set.
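To make the relationship between theta and effective population size concrete, the sketch below rearranges theta = 4 * N_e * mu to recover N_e. The theta and mutation-rate values are placeholders chosen for round numbers; they are not estimates from the study and are not the values used by BP&P or ∂a∂i.

```python
def theta_from_ne(ne: float, mu: float) -> float:
    """Mutation-scaled effective population size: theta = 4 * Ne * mu."""
    return 4 * ne * mu


def ne_from_theta(theta: float, mu: float) -> float:
    """Effective population size implied by theta for a per-site mutation rate."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4 * mu)


# Placeholder values for illustration only.
THETA = 0.004      # per-site theta
MU = 2.5e-9        # assumed mutations per site per generation

ne = ne_from_theta(THETA, MU)
print(f"Implied Ne: {ne:,.0f}")
```

Because theta scales linearly with N_e for a fixed mutation rate, any bias that inflates or deflates theta in one data type carries straight through to the population size estimate.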

Figure 4. Impacts of data set biases on inferences from systematic analyses of Xenops minutus data from RAD-Seq and sequence capture. a) Relative pairwise JC69-corrected distances between individuals, b) mutation-scaled effective population size (theta) estimates for daughter and ancestral populations, c) BUCKy tree from sequence capture and d) BUCKy tree from RAD-Seq, with node values representing the proportion of gene trees from that data set containing each clade.

Phylogenetic tree estimation to reconstruct the relationships between populations is commonly used in shallow systematics studies. Phylogeny estimation may be complicated if allele loss results in a downward bias in the mutational spectrum (Huang and Knowles 2014). This bias may produce shallower gene trees and lower genetic distances (Harvey et al. 2015), particularly between the most divergent individuals in a study. We examined branch lengths from X. minutus trees inferred using BUCKy v.1.4.3 (Larget et al. 2010), which are estimated in coalescent units based on quartet concordance factors for each branch. As observed in prior studies (Leaché et al. 2015), internal branch lengths from BUCKy trees estimated from RAD-Seq data were short relative to those estimated from sequence capture data in X. minutus, perhaps as a result of the loss of the most divergent alleles (Fig. 4c,d). Terminal branches in BUCKy trees for X. minutus are determined by the gene trees from loci in which individuals are homozygous for rare alleles. These branch lengths are longer in the RAD-Seq tree than in the sequence capture tree, consistent with the high levels of homozygosity we observed in the RAD-Seq data set. The difference in relative branch lengths between RAD-Seq and sequence capture trees was not evident in trees estimated from SNPs using SNAPP (Bryant et al. 2012), likely because SNAPP removes sites with missing data, which would bias overall tree depth rather than relative branch lengths (Supplementary Fig. S8, available on Dryad). Despite the differences in phylogenetic branch lengths, relative genetic distances corrected using a JC69 model (Jukes and Cantor 1969) among individuals were highly correlated between RAD-Seq and sequence capture X. minutus data sets (CADM test coefficient of concordance = 0.935, p < 0.001; Fig. 4a).
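For readers unfamiliar with the JC69 correction mentioned above, here is a minimal sketch of how a pairwise distance can be computed from two aligned sequences. The Jukes-Cantor formula itself, d = -(3/4) ln(1 - 4p/3), is standard; the function names, the handling of gaps and ambiguous bases, and the toy sequences are assumptions made for illustration and are not the pipeline used in the study.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    skipped (an assumed convention for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (1969) corrected distance from an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment: one difference across ten comparable sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, jc69_distance(p))
```

The corrected distance is always at least as large as the raw proportion of differences, because the model accounts for multiple substitutions at the same site.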

Both RAD-Seq and, to a lesser extent, sequence capture loci have low per-locus information relative to many of the genes traditionally targeted for Sanger sequencing in systematics. Low per-locus information content complicates analyses that depend on accurate parameter estimates from individual loci. It may be challenging to fit models of molecular evolution to loci due to their low information content, and poorly resolved gene trees may complicate analyses such as gene tree–species tree estimation (Lanier et al. 2014). Concordance analysis of gene trees from RAD-Seq and sequence capture in X. minutus using BUCKy (Larget et al. 2010) revealed that consensus relationships were supported by relatively few loci (Fig. 4c,d). Most gene trees contained polytomies as a result of low information content in alignments. Concordance was lower among RAD-Seq loci than among sequence capture loci, presumably due to the lower resolution of RAD-Seq gene trees. The consensus trees inferred across loci from both methods were topologically identical, however, both using BUCKy (Fig. 4c,d) and SNAPP (Supplementary Fig. S8, available on Dryad). Moreover, nearly all nodes had high support in the SNAPP trees from both RAD-Seq and sequence capture. Methods that successfully integrate across the small amounts of information present in many loci, including methods that examine independent SNPs, may be desirable for sequence capture and particularly RAD-Seq data sets.

The large data sets produced by RAD-Seq and sequence capture raise computational concerns. Although the sizes of both RAD-Seq and sequence capture data sets can be tailored according to researcher needs, RAD-Seq data sets are generally larger. Depending on the question being addressed, very large data sets may not be needed and additional data may unnecessarily complicate analyses (Davey et al. 2011). Conversely, evolutionary events that are difficult to estimate may require large amounts of data to address, and larger data sets also offer the ability to subsample loci informing a research question post-hoc. To take advantage of the information in large data sets, computationally demanding methods may have to take a back seat to faster summary methods (e.g., Liu et al. 2009; Larget et al. 2010; Chaudhary et al. 2014).

Multifactorial Inheritance and Complex Diseases

11.4.8 Marker Allele Frequency and Hardy–Weinberg Equilibrium Filter

The Hardy–Weinberg equilibrium (HWE) test compares the observed genotype proportions at a marker with the proportions expected under equilibrium. Deviation from HWE at a marker locus can be due to population stratification, inbreeding, selection, nonrandom mating, genotyping error, actual association with the disease or trait under study, or a deletion or duplication polymorphism. In practice, however, the HWE test is typically used to detect genotyping errors. SNPs that deviate from HWE beyond a chosen significance threshold are usually excluded from further association analysis. It is also important to discard SNPs based on minor allele frequency (MAF). Most GWAS are powered to detect a disease association with common SNPs (MAF ≥ 0.05). Rare SNPs may lead to spurious results due to the small number of homozygotes for the minor allele, genotyping errors, or population stratification.
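The two filters described above can be sketched in a few lines of Python. This is a minimal illustration using a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts; the function names and the HWE significance threshold of 1e-6 are assumptions made for this example (only the MAF ≥ 0.05 cutoff comes from the text), and exact tests are often preferred over the chi-square approximation when minor-allele counts are small.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from counts of AA, AB and BB genotypes."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1.0 - p)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a chi-square test (1 df) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int,
              min_maf: float = 0.05, hwe_alpha: float = 1e-6) -> bool:
    """Keep a SNP only if it is common enough and not far from HWE.

    hwe_alpha is a hypothetical threshold chosen for illustration.
    """
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha)


print(passes_qc(25, 50, 25))  # counts in exact HWE proportions
print(passes_qc(50, 0, 50))   # no heterozygotes at all
print(passes_qc(98, 2, 0))    # rare variant
```

A SNP with no heterozygotes despite two common alleles is the kind of pattern the HWE filter is meant to catch, while the rare variant is removed by the MAF filter even though its genotype counts are compatible with equilibrium.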

Genotyping by sequencing (GBS) and SNP marker analysis of diverse accessions of pecan (Carya illinoinensis)

Pecan (Carya illinoinensis) is an outcrossing, highly heterozygous, and slow-to-mature tree native to North America. In order to better understand cultivar characteristics, appreciate regional adaptation, and improve selection in pecan breeding programs, improved genomic tools that are cost-effective and capable of high-throughput screening are necessary. A diverse panel of 108 cultivars and accessions from the National Collection of Genetic Resources for Pecans and Hickories (NCGR-Carya) was selected to represent regionally adapted native pecans, controlled cross progeny and their parents, selected wild relative species, and interspecific hybrids between those species and pecans. We implemented a genotyping-by-sequencing (GBS) technique to discover 87,446 informative single nucleotide polymorphisms (SNPs) throughout the pecan genome. SNPs were used to develop genomic profiles to confirm, refute, or inform questions of cultivar origin. Native accessions show strong genetic relationships by geographic region of origin. Matrices were developed to facilitate evaluation of pedigree relationships between cultivars. A genome-wide association study (GWAS) was performed to discover 17 SNPs from a contiguous region significantly associated with the expression of the simply inherited trait controlling flowering type (dichogamy). The information, techniques, and resources developed will benefit the pecan community by improving the ability to characterize germplasm and use marker data for marker-assisted breeding. This should reduce breeding time by facilitating more informed and efficient selection of parents and progeny.



The main utility of NGS in microbiology is to replace conventional characterisation of pathogens by morphology, staining properties and metabolic criteria with a genomic definition of pathogens. The genomes of pathogens define what they are, may harbour information about drug sensitivity, and reveal how different pathogens are related to each other, which can be used to trace sources of infection outbreaks. The latter application recently received media attention when NGS was used to reveal and trace an outbreak of methicillin-resistant Staphylococcus aureus (MRSA) on a neonatal intensive care unit in the UK [4]. What was most remarkable was that routine microbiological surveillance did not show that the cases of MRSA that occurred over several months were related. NGS of the pathogens, however, allowed precise characterisation of the MRSA isolates and revealed a protracted outbreak of MRSA which could be traced to a single member of staff.

Nebula Genomics: Your whole genome and all of your data, too

Nebula Genomics does not rely on a blood sample, offers full genome sequencing, and makes it easy to access all of your own data.

Many services will provide consumers with no-blood testing, but Nebula Genomics is unique in that we offer a 30x Whole Genome Sequencing service that sequences 100% of your DNA. The other services compared above use DNA microarray technology to profile your genome, characterizing only some of your genome. For example, it is common for microarray genotyping to characterize about 500,000 positions, which is less than 0.1% of the whole human genome. Microarray-based genotyping may be a bit more affordable, but it misses a lot of important information. Our 30x Whole Genome Sequencing sample collection is safe for women who are pregnant, will not be compromised by the genetic material of the fetus, and can provide you with complete DNA sequencing. The results are processed in CLIA/CAP-accredited laboratories, and all of your data remains secure and yours.
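The "less than 0.1%" figure is easy to check with a back-of-the-envelope calculation. The sketch below assumes a haploid human genome of roughly 3.1 billion base pairs; that genome size and the function name are assumptions made for illustration, and the conclusion is not sensitive to the exact value chosen.

```python
def fraction_genotyped(n_positions: int, genome_size_bp: float = 3.1e9) -> float:
    """Fraction of genome positions that an array assays directly.

    genome_size_bp defaults to an assumed, approximate haploid human
    genome size; substitute another estimate if preferred.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_positions / genome_size_bp


for n in (500_000, 1_000_000):
    print(f"{n:>9,} positions: {fraction_genotyped(n):.4%} of the genome")
```

This counts only the positions assayed directly. It says nothing about nearby variants that can sometimes be inferred from the genotyped ones.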

If you are interested in obtaining medical advice, you need access to your own data. Nebula makes it easy to bring your test results, including your raw data, to your physician or genetic counselors who can use our data to counsel you. We include several industry-standard file types (FASTQ, BAM, VCF) that will allow you to get the most out of your test. As always, individuals who believe they are at risk for a genetic disorder should consult a healthcare provider before taking a DNA test while pregnant.

For more information on genetic sequencing technology, check out our Intro to DNA Testing Methods and our Test for Paternity post. HomeDNA is a popular at-home paternity test available at most major retailers.

If you want to focus on your maternal and/or paternal lineages, you can look at YFull or YSeq, services which analyze mtDNA or the Y chromosome to determine specific lineage haplotypes. Full Genomes also offers Y chromosome sequencing and analysis.

Other DNA testing companies that do carrier screenings for diseases that may be passed down to children include:


