Ending the SARS-CoV-2 / COVID-19 pandemic requires that we have robust diagnostic tests, effective antiviral drugs, and a vaccine. The front line is testing. As learned in the last post, molecular diagnostic tests are the best way to identify active COVID-19 infections, and the number of available tests has been growing at a rapid rate. Tests can be made by laboratories for their own use, or by companies to be used in many laboratories.
When companies make a test for distribution, the ability to distribute and sell their tests is controlled by the United States Food and Drug Administration (US FDA). In a crisis, like the one we are experiencing, the FDA provides a rapid path to approve tests for a limited period of time. This approval process, the Emergency Use Authorization (EUA), reduces the validation burden. Between mid-March and early May, 55 molecular diagnostic EUAs were granted, with more being granted daily. Some of the tests work well, and others not so well. The tests employ a wide variety of methods, but how do they work?
To answer the how-do-they-work question, we developed a brief history of molecular diagnostics (tests) infographic (right), and a longer than usual blog to explain the panels (below).
PCR - Polymerase Chain Reaction Provides the Foundation
Nearly all molecular tests require a DNA amplification step. This is commonly done via PCR (Polymerase Chain Reaction). PCR was a game changer in molecular biology, because it allowed us to identify DNA molecules that were present at concentrations too low to be detected by conventional means. PCR increases the concentration of DNA by making millions upon millions of copies of DNA molecules in a sequence specific manner, allowing the invisible to be seen.
PCR was made possible through two fundamental genetic concepts, two technological advancements, and some math.
Genetic concept one: The bases that make up DNA and RNA (nucleic acids) are ordered 5' to 3' and form antiparallel double strands in a sequence specific manner through hydrogen bonds. A (adenine) pairs with T (thymine) and C (cytosine) pairs with G (guanine). In RNA, U (uracil) replaces T. Complementary bases pair with each other through hydrogen bonds that are weak, but cooperative. The strength of the interaction is proportional to the number of hydrogen bonds. When paired, A and T form two hydrogen bonds and C and G form three, so CG base pairs are stronger than AT (or AU) base pairs. As more bases become involved in pairing via longer strands, the strands hold together more tightly. The formation of base pairs is called hybridization or annealing.
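To make the pairing arithmetic concrete, here is a minimal sketch that tallies hydrogen bonds for a fully paired strand using the two-bond and three-bond rule described above. The function name and the example sequences are made up for illustration.

```python
# Hydrogen bonds contributed by each base when it sits in a Watson-Crick pair:
# A pairs with T (or U) through two bonds, C pairs with G through three.
BONDS_PER_BASE = {"A": 2, "T": 2, "U": 2, "C": 3, "G": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds when `strand` is fully paired with its complement."""
    return sum(BONDS_PER_BASE[base] for base in strand.upper())


at_rich = "ATATATAT"  # hypothetical 8-base sequence
gc_rich = "GCGCGCGC"  # hypothetical 8-base sequence

print(count_hydrogen_bonds(at_rich))
print(count_hydrogen_bonds(gc_rich))
```

The two strands have the same length, yet the GC-rich one is held together by half again as many bonds (24 versus 16), which is why GC content matters for how tightly strands hybridize.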
As hydrogen bonds are weak bonds, base pairing can be disrupted by heat and other factors. That is, at a certain temperature a double stranded DNA (dsDNA) molecule can "melt" into separate single stranded (ssDNA) molecules. The melting temperature (Tm) is a function of the numbers and kinds of (AT, AU, or GC) base pairs. Tm defines a state where half of the molecules are hybridized and half are not. At a molecular level, the dsDNA and ssDNA states are rapidly interchanging, like breathing. In addition to heat, chemicals, proteins, and competition from other ssDNA molecules can disrupt base pairing. DNA, and RNA, molecules can also form intramolecular double strands where local regions in a sequence fold together to form stem-loop structures.
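For short synthetic DNAs, one widely used rule of thumb for estimating Tm is the Wallace rule: roughly 2 °C for every A or T and 4 °C for every G or C. It is not part of this post's material, so treat it as an assumption brought in for illustration. It ignores salt, strand concentration, and neighboring-base effects, and it only gives a ballpark figure for oligos of about 14 to 20 bases.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (degrees C) of a short oligo, Wallace rule.

    Tm ~= 2 * (number of A/T/U) + 4 * (number of G/C).
    This is a rule of thumb, not a thermodynamic calculation.
    """
    seq = oligo.upper()
    at_count = sum(seq.count(base) for base in "ATU")
    gc_count = sum(seq.count(base) for base in "GC")
    return 2 * at_count + 4 * gc_count


# Hypothetical 18-base oligos with different base compositions.
print(wallace_tm("ATATATATATATATATAT"))
print(wallace_tm("ACGTACGTACGTACGTAC"))
print(wallace_tm("GCGCGCGCGCGCGCGCGC"))
```

The estimate climbs from 36 to 54 to 72 as GC content rises, matching the idea that Tm depends on both the number and the kind of base pairs.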
Thus the first concept of PCR is that nucleic acids "find" other nucleic acids and form hybrids in solution through complementary base pairing (hybridization). Hybridization relies on hydrogen bonds that are weak. Forming and melting double stranded regions is dynamic and is controlled by temperature and other factors.
Genetic concept two: DNA and RNA can be copied in a template directed fashion by enzymes called polymerases. Polymerases add bases to a nucleic acid strand through a reaction that combines a new nucleoside triphosphate with a free 3'-OH (hydroxyl group) on an existing DNA or RNA strand. Base pairing rules, and the sequence of the nucleic acid template, determine which nucleoside triphosphate is added. T's (U's) are added across from A's, C's are added across from G's, and vice versa. In short, polymerases read a template to write a sequence that is complementary to that template.
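The read-a-template, write-the-complement idea can be sketched in a few lines. The function below is an illustration only (its name is made up); it takes a DNA template written 5' to 3' and returns the strand a polymerase would write, also 5' to 3'. Because the strands are antiparallel, the new strand is the reverse complement of the template.

```python
# Base pairing rules for copying a DNA template into DNA.
PAIRS_WITH = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_template(template: str) -> str:
    """Return the newly written strand (5' to 3') for a DNA template (5' to 3')."""
    # The polymerase reads the template 3' to 5', so walk it in reverse.
    return "".join(PAIRS_WITH[base] for base in reversed(template.upper()))


template = "ATGC"  # hypothetical template
new_strand = copy_template(template)
print(new_strand)
print(copy_template(new_strand))
```

Copying the copy returns the original sequence, which is what allows both strands to serve as templates in every PCR cycle.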
Technical Advancement One: the discovery of Taq polymerase. Briefly, Taq (Thermus aquaticus) is a bacterium that grows in the hot springs of Yellowstone National Park. Because it grows at high temperature, its enzymes can tolerate high, near boiling, heat. At such temperatures, hybridized DNA molecules, of any length, often melt into single stranded molecules, unless they are protected by protein.
Technical Advancement Two: the ability to chemically synthesize DNA in automated ways. From above, polymerases "read" a template to add complementary bases to a 3'-OH group. In other words, base addition reactions must be primed. This is why some synthetic DNAs are called primers. A fragment of DNA hybridizes to a DNA template and the polymerase forms a complex that can read the template and add bases. Automated chemical DNA synthesis made it possible to make new DNA molecules from DNA sequences at scale. Thus, any DNA can be copied from any environment if we know its sequence.
Some Math: In PCR, CR stands for chain reaction. A chain reaction can occur because each strand of DNA can be copied into complementary strands, doubling the number of DNA molecules with each cycle of heating (melting), priming (hybridizing), and copying (polymerizing). The geometric (2^n) increase results in approximately 1000 times more DNA than the original amount after 10 cycles, a million times more after 20 cycles, and a billion times more after 30 cycles. Each cycle of heating, cooling, and other adjustments takes about 5 min; we'll come back to this later.
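The doubling arithmetic is easy to check. This short sketch assumes perfect doubling in every cycle (real reactions fall short of that) and uses the approximate five minutes per cycle quoted above; the function name is illustrative.

```python
MINUTES_PER_CYCLE = 5  # approximate figure quoted in the text


def fold_amplification(cycles: int) -> int:
    """Ideal fold increase in DNA after `cycles` rounds of perfect doubling."""
    return 2 ** cycles


for cycles in (10, 20, 30):
    minutes = cycles * MINUTES_PER_CYCLE
    print(f"{cycles} cycles: {fold_amplification(cycles):,} fold in about {minutes} min")
```

The exact values are 1,024, 1,048,576, and 1,073,741,824, which is where the round "thousand, million, billion" figures come from. Thirty cycles at five minutes each is already two and a half hours of cycling.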
RT-PCR - Reverse Transcriptase PCR, Detect RNA by Converting it to DNA
PCR is great for identifying specific DNA molecules. What about RNA? Many viruses, including SARS-CoV-2, use RNA as their genetic material. In nature, DNA is copied (transcribed) into RNA as genes are expressed. RNA can also be copied into DNA via reverse transcription. The enzyme reverse transcriptase, discovered in retroviruses (another kind of RNA virus), makes it possible to copy RNA into DNA in the laboratory. One just needs to prime the reaction with a DNA primer. RT-PCR (reverse transcriptase PCR) thus has two steps: 1. Convert RNA into DNA; 2. Use PCR to amplify the desired DNA. It is possible to combine these steps in a single reaction tube.
RT-PCR - Real Time PCR, Measure Initial [DNA] by Coupling Amplification and Detection
PCR and RT-PCR make it possible to detect very small amounts of DNA, or RNA, in samples from hair, bits of bone, swabs from noses and throats, drops of blood, urine, feces, surfaces, and other sources. In some applications we need to know how much DNA or RNA is being detected, for example, to see if the number of virions is increasing. In these cases, PCR needs to be quantitative. If we could couple measurement of the DNA concentration (noted as [DNA]) with the number of PCR cycles, in real time, we could use the relative growth of signal intensity to infer the number of initial DNA molecules that are being copied. The basic principle of RT-PCR (also expressed as qPCR, so we do not confuse it with the above RT-PCR) is that it takes fewer PCR cycles to amplify a larger number of starting molecules to a given number. A sample with twice as many molecules as another is one full cycle ahead in the doubling of its molecules. This trend can be observed by plotting the signal intensity against cycle number for each sample.
To be quantitative, the signals between samples must be compared during the log phase of [DNA] growth, and also be compared to control samples with a known [DNA]. While the amount of DNA doubles with each cycle, it does not do so forever. Enzyme denaturation, reaction byproduct accumulation, and other issues place a practical cap on the number of cycles in which [DNA] doubles. This cap produces the S curves that are observed when signals are plotted as a function of cycle number.
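A toy simulation can show both ideas: the S-shaped curve and the one-cycle head start for a sample with twice the starting material. Everything here is an assumption made for illustration: the plateau value, the detection threshold, the simple growth model that slows as copies approach the plateau, and all function names. Real instruments fit curves and standards far more carefully.

```python
def amplification_curve(initial_copies, cycles=40, plateau=1e12):
    """Copies after each cycle: near-doubling early on, leveling off at `plateau`."""
    copies = float(initial_copies)
    curve = []
    for _ in range(cycles):
        # Growth slows as copies approach the plateau, giving the S shape.
        copies += copies * (1.0 - copies / plateau)
        curve.append(copies)
    return curve


def threshold_cycle(curve, threshold=1e9):
    """First cycle (1-based) at which the curve reaches `threshold`, or None."""
    for cycle, copies in enumerate(curve, start=1):
        if copies >= threshold:
            return cycle
    return None


def estimate_initial_copies(ct_sample, ct_control, control_copies):
    """Each cycle of head start means a two-fold difference in starting copies."""
    return control_copies * 2.0 ** (ct_control - ct_sample)


# A control with a known 1,000 starting copies and a sample with 2,000.
control_ct = threshold_cycle(amplification_curve(1_000))
sample_ct = threshold_cycle(amplification_curve(2_000))
print(control_ct, sample_ct)
print(estimate_initial_copies(sample_ct, control_ct, control_copies=1_000))
```

The threshold is deliberately set well below the plateau, so the comparison happens while the reactions are still doubling. In this sketch the sample crosses the threshold one cycle before the control, and the estimate recovers its 2,000 starting copies.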
So, how do we make those signals?
Intercalating Dyes: There are many ways to measure the concentration of DNA. One approach is to use a fluorescent dye, like SYBR green, that becomes more fluorescent when it binds double stranded DNA. SYBR green binds to DNA through intercalation, a process where the dye molecules slide between DNA bases. When intercalating dyes bind DNA they move into a hydrophobic environment which removes the water that would otherwise quench their fluorescent signal. Intercalation increases the dye's signal intensity (1000 times for SYBR green). Thus as [dsDNA] increases, so does the sample's fluorescence value.
Probes: An issue with intercalating dyes is that they are non-specific. They bind to any dsDNA. Recall that the only rule for a polymerase is a template and a free 3'-OH. Just a few bases can serve as a primer. Hence, PCR can have artifacts ranging from DNA folding on itself to primers self annealing (primer dimers) or hybridizing to other sequences. Once copied, artifacts can be copied more, and increase the assay's noise. If we have a way to discriminate between specific and non-specific DNA amplification, assay artifacts would be reduced. Probes, synthetic DNAs designed to hybridize to specific sequences within a desired target (an internal site to the primers), are a way to limit measurement to the molecules we want to measure.
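Because only a few paired bases next to a free 3'-OH are needed to start copying, a crude way to flag possible primer dimers is to ask whether the last few bases of one primer can pair anywhere on another. The check below is a simplified sketch, not a primer design tool: the four-base tail length and the example primers are arbitrary assumptions, and it ignores temperature and mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_prime_on(primer: str, other: str, tail_length: int = 4) -> bool:
    """True if the 3' tail of `primer` can base pair somewhere on `other`."""
    tail = primer.upper()[-tail_length:]
    # Antiparallel pairing means `other` must contain the tail's reverse complement.
    return reverse_complement(tail) in other.upper()


forward = "AGCTTGACCGAATTC"  # hypothetical primer
reverse = "TTGAATGGCCTAGCA"  # hypothetical primer
print(can_prime_on(forward, reverse))
print(can_prime_on(forward, "CCCCCCCCCCCC"))
print(can_prime_on(forward, forward))
```

The last call checks a primer against itself. It reports a possible self-dimer here because the made-up forward primer ends in GAATTC, a sequence that reads the same on both strands.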
Two examples of commonly used probes are TaqMan probes and molecular beacons. Yes, TaqMan is named after PacMan, in part, because we like to draw enzymes like PacMan. Taq also rhymes with Pac, and, because Taq DNA polymerase also eats the DNA in its path (its 5'-exonuclease activity) as it copies a template, TaqMan becomes apropos. Both TaqMan and molecular beacons have the property that the fluorescence value increases as a function of [DNA]. But the way fluorescence is generated in these assays is based on two very different mechanisms.
In both assays fluorescence quenching is used to suppress the signal of unbound probes. As noted above, quenching can be non-specific such as water quenching the fluorescence of intercalating dyes. Quenching can also occur through a specific molecular interaction where a quencher molecule absorbs the light emitted from a fluorescent molecule (a fluor). In this latter case, the fluor and its quencher must be in close proximity.
TaqMan probes rely on the 5'-exonuclease activity of DNA polymerases. The fluor and its quencher are both attached to a probe. If the probe is bound to its DNA target or free in solution, the fluor is quenched. When the strands of target DNA are melted during the PCR heat phase, the TaqMan probe hybridizes to its internal target sequence. As the DNA polymerase copies new DNA, it digests the bound probe (via the 5'-exonuclease) and releases the fluor, producing a signal. The amount of released fluor (signal) corresponds to [DNA].
Molecular beacons operate in an opposite way. They are constructed so that the synthetic DNA forms a stem-loop structure with the probe DNA sequence contained in the loop. In the unbound state, the stem forms to hold the quencher near the fluor. When the probe hybridizes to its DNA target, the stem melts, because the number of probe bases is greater than the number of stem bases, so the probe-target hybrid is the stronger interaction. The signal is produced when the fluor and quencher move far enough apart. While the probe will be displaced as DNA is copied, it is hybridized long enough to measure a signal.
Digital PCR - Increase Precision By Making Individual Reactions
Each cycle of quantitative PCR measures a two-fold change in DNA concentration. There are many examples in genetic analysis and gene expression where we want to count the numbers of initial nucleic acid molecules with greater precision. Because we cannot just count molecules, we need a way to infer the starting value. This is commonly done by preparing a limiting dilution.
Limiting dilution is a process where a sample is diluted to the point where a container is likely to contain a single molecule. For example, if I have three molecules, and my container holds one microliter, then diluting my sample nine-fold will give me three molecules in nine microliters. If they are divided into nine containers, six will have zero molecules and three will have one molecule. Occasionally a container will have two molecules. When each container has at most one molecule, we can run as many PCR cycles as needed to get good signal, and simply count the chambers that are positive and negative to obtain a binary 1 or 0 (digital) output.
With the advent of micro-fluidics and oil emulsion methods we can make containers out of very small drops of water. Each drop contains either a molecule of DNA or nothing due to limiting dilutions. PCR reactions are run to a certain number of cycles using either TaqMan or molecular beacon probes. Once reactions are completed, drops flow, single file, through a capillary that is attached to a light source and detector to measure signal. The numbers of positive drops are tallied and compared between samples.
Digital PCR also gives us new acronyms. In this case, dPCR stands for digital PCR and ddPCR stands for droplet digital PCR. Both dPCR and ddPCR mean essentially the same thing but, because PCR is big business, the terms are used interchangeably for branding.
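Since an occasional drop will hold two or more molecules, a raw tally of positive drops slightly undercounts. A standard way to correct for this, not covered in this post and so offered here as an assumption, is to treat the molecules as landing in drops at random (a Poisson distribution) and to work backward from the fraction of empty drops. The function names and droplet counts below are made up.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson estimate of the mean number of molecules per partition."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    empty_fraction = (total - positive) / total
    # Under a Poisson model, P(empty) = exp(-mean), so mean = -ln(P(empty)).
    return -math.log(empty_fraction)


def estimate_total_copies(positive: int, total: int) -> float:
    """Estimated number of starting molecules across all partitions."""
    return copies_per_partition(positive, total) * total


# Hypothetical run: 300 positive drops out of 20,000.
print(round(estimate_total_copies(300, 20_000), 1))
```

The estimate comes out a little higher than the 300 positive drops, because some of those drops are expected to have held more than one molecule. The correction breaks down when every drop is positive, which is why the sample must be dilute enough to leave some drops empty.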
Isothermal PCR - Light Under the LAMP
PCR requires cycles of heat, cooling, and other adjustments. Each cycle requires approximately five minutes with standard equipment, and the full testing process (from sample to result) can take four or more hours. Cycle times can be shortened by using specialized microfluidic devices that employ tiny "pipes" to flow reactions through temperature gradients. If PCR cycles could be run at a constant temperature that optimized the "dynamism" of hybridization, melting, and enzyme activity, DNA amplification could proceed quickly with concentrations increasing significantly faster than PCR's doubling rates, because there would be no "stops." This process, called isothermal amplification, is no longer PCR, because there is no chain reaction, just massive amplification. The 13 minute test developed by Abbott Laboratories that has been used in the White House is an isothermal amplification assay with molecular beacon probes.
There are many kinds of isothermal amplification. For example, Loop-mediated isothermal AMPlification (LAMP) is growing in popularity. LAMP assays require four primers and six target sites (see infographic). Two of the primers (F2, R2c) have adaptors (regions of sequences on a primer that enhance various processes) with sequences (F1c, R1) that are complementary to sites (F1, R1c) within the DNA target. In a later phase of the reaction, these adaptors hybridize with those sites to form stem-loop structures within the target. Two other primers (F3, R3c) complementary to the end sites (F3c, R3) of the target are used to amplify the original target, like PCR.
When the reagents, primers, nucleotides, enzyme (with strand displacement properties), and other components are mixed with sample DNA and the reaction temperature is set, all kinds of DNA synthesis happen at once and run in a continual process. We draw the picture in three steps, so we can understand it, but it is still confusing. So, I like to watch this video.
In the first step, primers F2 and R2c hybridize to middle target sites (F2c, R2). The resulting polymerization produces DNA molecules with ends (F1c, R1) that can form stem-loop structures with the newly synthesized F1 and R1c sites. Next, the F3 and R3c primers hybridize to the end sites (F3c, R3) and prime the synthesis of the original target strands, which are now available for step one. In the third step, the stem-loop region forms, adding an additional 3'-OH end that is used to initiate more DNA synthesis. Other primers can also anneal to repeat the synthesis, looping, and synthesis cycles. Within minutes, enough DNA is copied to make it easily detected. Detection can be accomplished as in qPCR by fluorescence or by colorimetric assays (not discussed).
An advantage of LAMP is short assay times. Another is that the method can be used outside of the laboratory in so called point-of-care (POC) or point-of-need (PON) formats.
A CRISPR PCR
Molecular tests are continually evolving and improving. CRISPR enzymes are a recent addition to the molecular testing toolkit. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems are a form of adaptive immune system used by bacteria that combine CRISPR nucleic acid units with a DNA (or RNA) cutting enzyme (Cas). When the CRISPR/Cas complex binds to foreign phage (bacterial virus) nucleic acids, via hybridization, the Cas enzyme cuts the invader's nucleic acids. In molecular diagnostics, the CRISPR system is harnessed to increase assay specificity and produce fluorescent signals when specific DNA or RNA sequences are detected.
The current COVID-19 assay -- Sherlock (specific high-sensitivity enzymatic reporter unlocking) CRISPR SARS-CoV-2 test -- has two steps. The first step combines reverse transcriptase and LAMP to amplify DNA. Some of the LAMP primers include adaptors that contain T7 RNA polymerase promoter sequences that are used in the second step. In that step, RNA target sequences are produced by the T7 RNA polymerase. As these RNA molecules are made they can bind to a Cas13 enzyme (one of the many CRISPR nucleases) that includes a probe containing a CRISPR-like sequence with a region that is complementary to the target sequence. When the probe binds to its target, the Cas13 enzyme cuts and releases the target. This cutting stimulates an additional non-specific nuclease (RNA cutting) activity which cleaves TaqMan-like synthetic RNA reporter molecules, releasing a fluor from its quencher and producing a signal (the enzymatic reporter unlocking step of Sherlock). In the CRISPR test, the signal is general, but is the result of a highly specific sequence dependent molecular interaction.
55 FDA EUAs Show High Diversity in SARS-CoV-2/COVID-19 Molecular Tests
The current COVID-19 molecular diagnostic tests are diverse and not without controversy. Each of the above methods is used in one way or another in at least one test, otherwise I would not have spent so much time covering the different assay formats. Indeed, we can learn a lot from the EUA Instructions For Use (IFU), or EUA summary, PDF files that are posted by the FDA. Each file contains information about the test format, how to run the test, and other data (limits of detection; performance with controls), with the caveat that there is a mix of well disclosed methods and black box descriptions where the manufacturer did not want to reveal proprietary techniques. The description formats all differ and report critical data in several ways, so it is difficult to systematically analyze the reports. But we can learn something from key words.
I used two methods to analyze the key words in the EUAs. The first was to look at "PCR" terms. Many terms (PCR, RT-PCR, RT-qPCR, RTqPCR, ddPCR, RT-ddPCR, rRT-PCR, dPCR, RT-dPCR, QPCR, micPCR, and RealPCR) were found in one or more documents. A few EUA documents did not contain a PCR term, because they employ isothermal methods to amplify target DNA. All tests use reverse transcriptase, whether they say so or not. To find the PCR terms and count how many tests use a term, I wrote a Python script that used the Natural Language Toolkit (NLTK) library and Python PDF libraries to extract all unique words from each document. I ran this script on a directory of files (all EUA IFUs/Summaries) and piped the output to grep "PCR" to produce a list of all PCR terms. Each term, listed on a per document basis, was then counted with a simple "word count" script to give the number of documents it appears in. This list was manually edited to remove various term "fragments" that resulted from hyphens in names. The last step was to make a word cloud for the infographic.
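The counting step can be sketched with the Python standard library alone. This is not the script described above: it skips PDF extraction and NLTK entirely, assumes each document's text is already in hand as a string, and uses a regular expression in place of grep. The file names and snippets are invented for illustration.

```python
import re
from collections import Counter
from typing import Dict, Set

# Words that end in "PCR", allowing hyphenated prefixes such as "RT-qPCR".
PCR_TERM = re.compile(r"\b[\w-]*PCR\b")


def pcr_terms(text: str) -> Set[str]:
    """Unique PCR terms found in one document's text."""
    return set(PCR_TERM.findall(text))


def document_counts(documents: Dict[str, str]) -> Counter:
    """For each PCR term, the number of documents that contain it."""
    counts = Counter()
    for text in documents.values():
        # Each document contributes at most once per term.
        counts.update(pcr_terms(text))
    return counts


# Hypothetical stand-ins for text extracted from EUA documents.
documents = {
    "eua_a.txt": "The assay is a real-time RT-PCR test. RT-PCR uses TaqMan probes.",
    "eua_b.txt": "This ddPCR test builds on RT-PCR chemistry.",
}

for term, n_docs in document_counts(documents).most_common():
    print(term, n_docs)
```

Counting per document, instead of counting every occurrence, is what gives "how many tests use a term" as opposed to "how often a term is typed."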
PCR terms alone provide little detail. To learn more about the specific technologies used, word searches were performed in DEVONthink. First, I used the terms TaqMan, beacon, CRISPR, isothermal, digital, and TaqPath independently to identify the number of EUAs containing that particular term. More than half of the documents used at least one of the terms. Next, "and not" queries (!TaqMan & !beacon & !CRISPR & !isothermal & !digital & !TaqPath) were tested. Reading through the remaining 23 documents identified words like "exonuclease," "degrades" (as in the polymerase degrades the probe), and "degraded" which I then included in searches and used to infer test formats. Adding !exonuclease & !degrades & !degraded to the list of "& not" terms left 12 documents. Of these remaining documents a few more inferences could be made, but five documents were still too opaque and were thus classified as undefined. I plotted the resulting data in the infographic pie chart to show the diversity of methods.
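The "and not" narrowing can be imitated with plain substring tests. The sketch below only mimics the logic of those queries; it is not how DEVONthink searches, and the document names and text are hypothetical.

```python
from typing import Dict, Iterable, List


def documents_without_any(documents: Dict[str, str], terms: Iterable[str]) -> List[str]:
    """Names of documents that mention none of `terms` (case-insensitive)."""
    lowered_terms = [term.lower() for term in terms]
    return [
        name
        for name, text in documents.items()
        if not any(term in text.lower() for term in lowered_terms)
    ]


# Hypothetical stand-ins for text extracted from EUA documents.
documents = {
    "eua_a.txt": "Detection uses TaqMan probes.",
    "eua_b.txt": "An isothermal assay with molecular beacon probes.",
    "eua_c.txt": "The polymerase degrades the bound probe.",
    "eua_d.txt": "Proprietary detection chemistry.",
}

first_pass = ["TaqMan", "beacon", "CRISPR", "isothermal", "digital", "TaqPath"]
second_pass = first_pass + ["exonuclease", "degrades", "degraded"]

print(documents_without_any(documents, first_pass))
print(documents_without_any(documents, second_pass))
```

Each pass shrinks the list of unexplained documents. Whatever remains at the end still has to be read by a person, as with the five documents classified as undefined.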
Since PCR's discovery nearly 40 years ago, the technology has advanced in many ways. RT-PCR is a predominant method in molecular diagnostic testing, and, as evidenced by the growing list of available SARS-CoV-2 tests, PCR concepts and components are continually refined and built upon as the industry searches for new, faster, more sensitive, more specific, and easier to run formats. Of course, technology alone does not win. As recently noted in the news, the tests must also be run correctly with good controls. We'll save that discussion for another post.
Acknowledgements: This work was supported in part by the "START Immuno Biotech" NSF grant (DUE 1700441, Shoreline Community College). Dr. Sandra Porter (Digital World Biology) provided helpful comments and edits.