Drug Discovery: A Beginner’s Overview


Strange Helix


Image Credit: Naumova Marina/Shutterstock

In the previous post, we explored how biology and genetics gave birth to the Biotech industry, transformed healthcare, and enabled applications in computational biology, bioinformatics, synthetic biology, and immune cell therapy.

In this post, we will look at the drug discovery process, understand the main steps, and get the first glimpse into how computational methods can revolutionize drug discovery in the future.

What is Drug Discovery?

Drug discovery should not be confused with drug development. Rather, drug discovery is the first stage of the complex drug development process, which also includes preclinical development, three phases of clinical trials, FDA review, and post-market monitoring.


Phases of Drug Development

The goal of drug development is to take a new drug all the way from the start to the market. The goal of drug discovery is narrower: to identify promising drug candidates, improve their properties, and move them on to the next stages, such as preclinical development and clinical trials.

The Cost and Risks of Drug Development

Various studies estimate that bringing a drug from the initial stages to market costs between $1.3 and $2.8 billion on average and takes 10–15 years.

Drug development is also risky, with an overall failure rate of 96%. AstraZeneca published a five-dimensional framework describing five reasons why drug development fails at different stages:

  1. Wrong biological target
  2. Failure of the drug candidate to engage with the target
  3. Failure to demonstrate safety
  4. Incorrect design of clinical studies
  5. Poor product-market fit

Many of these problems can be caught early, during the drug discovery stage, which makes getting this stage right critically important.

Two important classes of drugs, small molecule drugs and biologics, go through very different discovery processes. Before studying the drug discovery process, we therefore need to understand the difference between them.

Small molecule drugs


Image Credit: Kovalchuk Oleksandr/Shutterstock

Most drugs in your medicine cabinet, such as aspirin, Tylenol, Advil, and others, are small molecule drugs. They have a simple, well-defined chemical structure and are manufactured by chemical synthesis.

Because of their relative simplicity, it is easier to understand their mechanism of action and to predict how they are absorbed and distributed in the body.

Small molecule drugs are also generally very stable and orally bioavailable (they can be absorbed when swallowed), which makes them easy to take, often as a pill.

On the flip side, they may accidentally interact with non-target proteins and cause harmful side effects.

The majority of drugs on the market today are small molecule drugs.

Biologic drugs


Image Credit: AmazeinDesign/Shutterstock

Biologics are drugs that are produced by living organisms or extracted or synthesized from biological sources.

The first recombinant biologic to reach the market was human insulin, produced in genetically modified E. coli bacteria using technology developed by Genentech and approved in 1982. Other biologics include vaccines, monoclonal antibodies, blood components, allergenics, somatic cells, gene therapies, and tissues.

Biologics are very target-specific and less likely to interact with non-target molecules or cause harmful side effects. However, structurally, biologics are not as simple as small molecules and are challenging to characterize.

The manufacturing process for biologics is also much more complex, expensive, and harder to reproduce, which suits the pharmaceutical company that makes them but is bad for competitors and for patients’ pockets.

Nevertheless, biologics have opened new opportunities for creating novel, better treatments. As a result, investment in biologic drug development has grown steadily over the last few decades. In 2020, an analysis by EvaluatePharma showed that 60% of the top 20 best-selling drugs were biologics.

As mentioned above, the drug development processes for small molecule drugs and biologics differ drastically; in this article, we will focus on small molecule drugs.

The drug discovery process for small molecule drugs depends on two main pharmacological approaches: classical and reverse. Let’s briefly look at both.

Classical Pharmacology vs. Reverse Pharmacology


Image credit: Visual Generation/Shutterstock

For thousands of years, humans have relied on their knowledge of plants and minerals to treat diseases and illnesses. Influenced by cultural beliefs and experiences, this knowledge has been passed down from one generation to another. As a result, different knowledge systems were formed in Africa, China, Europe, India, Iran, and other regions and are still in use today.

Early “pharmacology” was primarily descriptive and empirical until the mid-19th century, when it started to resemble an experimental science. Drugs were still mainly extracted from plants, and traditional knowledge often inspired studies. Some drugs, such as penicillin, were discovered accidentally during scientific research.

In the 1960s, advances in molecular biology led to a better understanding of cellular processes and proteins that improved the drug discovery process, making it more scientific and rational. Ibuprofen, propranolol, diazepam, ketamine, fentanyl, and many other drugs still in use today were discovered at that time.

The approach mostly used at that time is now called classical pharmacology, or forward pharmacology. The idea is to take a promising compound (for example, an extract from a medicinal plant), test it on animal or cellular models, and check for a desirable effect. If the compound works, a study is performed to determine its mechanism of action. Although it is now considered an expensive and time-consuming process, this approach produced many valuable drugs. In addition, studies have shown that classical pharmacology has an advantage in discovering drugs with novel mechanisms of action.

Another approach, called reverse pharmacology or target-based drug discovery, was first established in the 1970s and has been a main drug discovery method for the past two decades. In classical pharmacology, researchers use trial and error to find what works and only then try to understand how it works. In reverse pharmacology, this process is, well, reversed. Researchers first try to understand the disease and determine the series of molecular interactions (biological pathways) involved in causing a pathological condition. Based on this understanding, they then try to identify a potential biological target, usually a protein such as an enzyme or a receptor, that can be modified to produce a desirable therapeutic effect.

After they find the target, the next step is to find a drug compound that can bind to this target and trigger a change. This process of finding a new drug compound that will bind to a known biological target is called drug design or rational design. Its fundamental goal is to predict how strongly (if at all) a given molecule will bind to a target.
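How strongly a molecule binds is usually expressed as a dissociation constant, Kd (lower means tighter binding), which is related to the standard binding free energy by ΔG° = RT ln Kd, with Kd in mol/L relative to a 1 M standard state. A minimal Python sketch of that conversion, assuming a temperature of 25 °C; the function names are illustrative only:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / 1 M); more negative values mean tighter binding.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def dissociation_constant(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Inverse conversion: Kd in mol/L from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


for kd in (1e-6, 1e-9):  # a micromolar and a nanomolar binder
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Because of the logarithm, each tenfold improvement in Kd makes ΔG only about 1.36 kcal/mol more negative at 25 °C, so small errors in predicted energies translate into large errors in predicted binding strength.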

Now, let’s look at reverse pharmacology and the main approaches scientists use to choose drug candidates.

Target Discovery and Validation


Image Credit: Alina.Alina/Shutterstock

In reverse pharmacology, the first step in the drug discovery process is identifying a target (usually a protein) that plays an essential role in a medical condition. Promising targets are carefully validated to ensure they satisfy clinical and commercial requirements. A target should be safe, efficacious, and druggable, meaning there is a good chance of finding a molecule that can bind to it and change its activity. Drugs often fail because of a poorly chosen target, so this step is critical for drug development success. At this step, all experiments are performed in cell cultures in “test tubes” (in vitro), in animal models (in vivo), or on computers using a wide range of computational methods (in silico). Researchers do not perform studies on humans at this point.

At the beginning of this process, researchers need to understand the molecular mechanism of the disease. They start by reviewing the scientific literature and biomedical data to select and prioritize potential molecular targets that would satisfy efficacy, safety, and commercial requirements. Several approaches help choose and validate targets; genetic association studies, gene expression profiling, protein-protein interaction screening, and functional genomics screening are among the most common. Let’s briefly review them.

A genetic association study allows scientists to compare genotypes directly between healthy individuals and people with a specific disease. Finding genes that contribute to the condition can help scientists understand its molecular mechanism and identify potential drug targets. The target may be the protein encoded by such a gene or the gene itself, in which case a drug would regulate the gene’s activity. Many genomic databases support this research, such as GWAS Catalog, GWAS Central, NCBI dbGaP, and PharmGKB.

Enabling technologies: DNA microarray, next-generation sequencing (NGS).

Online tools: DisGeNET, Pharos, ADReCS-Target, Open Targets.
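The comparison at the heart of a genetic association study can be illustrated for a single genetic variant. A minimal sketch of an allelic case-control test that computes an odds ratio and a 1-degree-of-freedom chi-square p-value from a 2×2 table of allele counts; the counts and the function name are hypothetical, and real genome-wide studies add quality control, population-structure correction, and strict multiple-testing thresholds:

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio, chi-square statistic, and p-value for a 2x2 allele-count table.

    Rows are cases and controls; columns are alternative and reference allele counts.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Pearson chi-square: sum of (observed - expected)^2 / expected over all four cells
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles per group)
or_, chi2, p = allelic_association(case_alt=600, case_ref=1400, ctrl_alt=450, ctrl_ref=1550)
print(f"OR = {or_:.2f}, chi2 = {chi2:.1f}, p = {p:.1e}")
```

An odds ratio above 1 means the alternative allele is more common in patients, which flags the surrounding gene as a candidate worth following up, not yet a validated target.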

Gene expression profiling provides a view of the cellular state by measuring which genes are expressed in a cell. This measurement helps researchers see how the cell state changes under different conditions (healthy or diseased) and how the cell reacts to various drugs. Data from gene expression profiling is often combined with genetic association data to help identify genes that play an essential role in the disease.

Gene expression profiling data is also instrumental in evaluating the potential adverse effects of drug targets and assessing risks early in the drug development process. Researchers may draw on gene expression datasets derived from various experiments, such as DrugMatrix, TG-GATEs, LINCS L1000, ArrayExpress, and the GEO repository.

Enabling technologies: DNA microarray, RNA-seq.

Online tools: DrugBank, ChEMBL, TTD, DisGeNET, Pharos, Open Targets.
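A basic building block of expression analysis is asking, gene by gene, whether expression differs between healthy and diseased samples. A minimal sketch, assuming expression values that are already normalized, that computes a log2 fold change and a Welch's t statistic for each gene; the gene names and values are hypothetical, and dedicated tools (for example, limma or DESeq2) use more robust statistical models:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def differential_expression(healthy, diseased):
    """Per-gene log2 fold change (diseased vs. healthy) and Welch's t statistic.

    Both arguments map a gene name to a list of normalized expression values.
    """
    results = {}
    for gene in healthy:
        h, d = healthy[gene], diseased[gene]
        log2_fc = math.log2(mean(d) / mean(h))
        results[gene] = (log2_fc, welch_t(d, h))
    return results


# Hypothetical normalized expression values, three samples per condition
healthy = {"GENE_A": [10.0, 12.0, 11.0], "GENE_B": [50.0, 48.0, 52.0]}
diseased = {"GENE_A": [42.0, 45.0, 40.0], "GENE_B": [49.0, 51.0, 50.0]}

for gene, (fc, t) in differential_expression(healthy, diseased).items():
    print(f"{gene}: log2FC = {fc:+.2f}, t = {t:.1f}")
```

Here GENE_A is roughly four times more highly expressed in the diseased samples, while GENE_B does not change; genes like GENE_A would be candidates for cross-checking against genetic association data.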

Protein-protein interaction (PPI) screening allows us to study the full set of protein-protein interactions within the cell. Such interactions are crucial for cellular processes, and understanding them can help discover proteins that drive certain conditions (and can potentially be drug targets). PPIs can be modeled with protein-protein interaction networks (PPINs): graphs in which nodes represent proteins and edges connect pairs of interacting proteins. Important proteomics databases that may support this research include PRIDE Archive, ProteomicsDB, the Human Proteome Map, and the Human Protein Atlas.

Enabling technologies: mass spectrometry, flow cytometry.

Online tools: DrugBank, Pharos.
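Since a PPIN is just a graph, even simple graph measures can hint at which proteins matter most. A minimal sketch in plain Python, using a small hypothetical network, that ranks proteins by degree (their number of interaction partners), one common heuristic for spotting highly connected "hub" proteins; real analyses start from curated interaction databases and often use richer measures such as betweenness centrality:

```python
from collections import defaultdict


def build_ppin(interactions):
    """Build an undirected protein-protein interaction network as an adjacency map."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def rank_by_degree(graph):
    """Return (protein, degree) pairs, most connected first (ties broken by name)."""
    return sorted(((p, len(nbrs)) for p, nbrs in graph.items()), key=lambda x: (-x[1], x[0]))


# Hypothetical interactions between protein identifiers
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
ppin = build_ppin(edges)
print(rank_by_degree(ppin))
```

Storing neighbors in sets means a duplicated interaction record does not inflate a protein's degree.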

Functional genomics screening is another widely used approach to determine the molecular mechanism of a disease. The goal is to understand a gene’s function and its impact on cellular processes by turning the gene down (knockdown), turning it off (knockout), or turning it up (overexpression). Thanks to the CRISPR-Cas9 genome editing technique, functional genomics screening has become an essential tool in drug discovery research. It is used in experimental studies of target identification, drug resistance, host-pathogen interactions, and biological pathways.

Enabling technologies: CRISPR-Cas9, RNA interference (RNAi).
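In a pooled CRISPR knockout screen, each guide RNA (sgRNA) disables one gene, and sequencing counts reveal which guides become depleted over time. A minimal sketch, assuming raw sgRNA read counts before and after selection, that normalizes to reads per million, computes a log2 fold change per guide, and summarizes each gene by the median of its guides; the counts, guide names, and gene names are hypothetical, and dedicated tools (for example, MAGeCK) apply proper statistical models:

```python
import math
from statistics import median


def gene_scores(counts_before, counts_after, guide_to_gene, pseudocount=1.0):
    """Median per-gene log2 fold change of sgRNA abundance (after vs. before selection)."""
    total_before = sum(counts_before.values())
    total_after = sum(counts_after.values())
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        # Normalize to reads per million; the pseudocount avoids log(0) for guides that drop out
        rpm_before = counts_before[guide] / total_before * 1e6 + pseudocount
        rpm_after = counts_after[guide] / total_after * 1e6 + pseudocount
        per_gene.setdefault(gene, []).append(math.log2(rpm_after / rpm_before))
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}


# Hypothetical screen with two guides per gene
guide_to_gene = {"g1": "GENE_X", "g2": "GENE_X", "g3": "CTRL", "g4": "CTRL"}
before = {"g1": 500, "g2": 450, "g3": 520, "g4": 480}
after = {"g1": 40, "g2": 60, "g3": 700, "g4": 650}

scores = gene_scores(before, after, guide_to_gene)
print(scores)  # in a proliferation screen, a strongly negative score suggests the gene is needed for growth
```

Taking the median across several guides per gene makes the score less sensitive to a single guide that behaves unusually.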

An Innovation Opportunity for Software Engineers


Image Credit: Good Studio/Shutterstock

Target discovery and validation is the first step in the drug discovery process and may only take 2–6 months. However, it is difficult, costly, and high-risk. A mistake made here can reveal itself much later in clinical trials — after many years of development and hundreds of millions of dollars spent.

Although various computational platforms have been created to assist with target discovery, this process is still largely manual and lacks a developed ecosystem of software tools and services.

In their overview of current target discovery platforms, Paananen and Fortino highlight the following gaps:

  • The lack of tools to integrate data from the growing number of available datasets — accessing data from many providers can incur additional costs and complexities
  • The shortage of in-silico tools for target safety assessment
  • Limited methods for the comparative analysis of different efficacy and safety estimates for drug target prioritization
  • The lack of tools for the systematic identification of multiple drug targets and selection of optimal therapeutic strategies

Deep Learning (DL) algorithms, which have been tremendously successful at capturing highly complex relationships in large-scale data, have attracted a surge of interest. Such algorithms can help with target identification and with efficacy and safety estimates.

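Researchers have applied DL methods such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), variational autoencoders (VAEs), and generative adversarial networks (GANs) to study bioactivity profiles based on microscopy images, to predict molecular properties, and to design new molecules.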

There is also a need for better tools that can efficiently extract meaningful biological information from the many datasets produced by experiments, clinical trials, and the scientific literature.

We will cover these gaps, along with current and possible future technology applications, in more detail in the following posts. In the meantime, let’s delve into the next step in the drug discovery process.

Lead compound identification


Image Credit: Naumova Alina / Shutterstock

After target discovery, the next step is to find a chemical compound that will interact with the target. The structure of this compound will then be used as a foundation for chemically engineering a drug candidate with the necessary drug-like properties.

One of the most common methods for finding a chemical compound that interacts with a chosen target is high-throughput screening (HTS). This method uses robots and data analysis software that allow scientists to quickly conduct large numbers of chemical, genetic, or pharmacological tests. Such systems can test up to 100,000 compounds a day, and some can test even more. A compound that produces a desirable reaction is called a hit. The hit rate in most HTS experiments is typically below 1%, so large chemical libraries are required to produce enough hits. This dramatically increases the cost of the screening process, which can easily run to thousands of dollars.
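Calling hits usually means normalizing raw plate readouts against control wells and applying a threshold. A minimal sketch, assuming an inhibition assay with positive-control wells (a reference inhibitor, full inhibition) and negative-control wells (no inhibitor), that checks assay quality with the widely used Z'-factor, converts each signal to percent inhibition, and flags hits at an assumed 50% cutoff; the readouts, compound IDs, and cutoff are hypothetical:

```python
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z'-factor assay-quality metric: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    return 1 - 3 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def percent_inhibition(signal, pos, neg):
    """Scale a raw signal so the negative control is 0% and the positive control is 100%."""
    return 100 * (mean(neg) - signal) / (mean(neg) - mean(pos))


def call_hits(compound_signals, pos, neg, cutoff=50.0):
    """Return compounds at or above the (assumed) inhibition cutoff, plus the hit rate."""
    inhibition = {cid: percent_inhibition(s, pos, neg) for cid, s in compound_signals.items()}
    hits = {cid: pi for cid, pi in inhibition.items() if pi >= cutoff}
    return hits, len(hits) / len(compound_signals)


# Hypothetical plate readouts (e.g., luminescence units)
neg_ctrl = [1000, 980, 1020, 1010]  # no inhibitor: full enzyme activity
pos_ctrl = [100, 110, 90, 95]       # reference inhibitor: activity blocked
compounds = {"C1": 990, "C2": 300, "C3": 1005, "C4": 970}

print(f"Z' = {z_prime(pos_ctrl, neg_ctrl):.2f}")
hits, rate = call_hits(compounds, pos_ctrl, neg_ctrl)
print(hits, f"hit rate = {rate:.0%}")
```

A Z'-factor above about 0.5 is commonly taken to mean the controls are well separated enough to trust the hit calls. With only four toy compounds, the 25% hit rate here is far higher than in a real screen.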


HTS Robots. Photo credit: Maggie Bartlett, National Human Genome Research Institute

The screening process requires several iterations. After the first hits have been found, it is crucial to test whether they interact with other, unrelated biological targets (cross-screening). The more targets a compound hits, the more likely it is to cause unpredictable side effects.

Researchers also perform initial tests of the compounds’ pharmacodynamic and pharmacokinetic properties. Pharmacodynamics is mainly concerned with the drug’s mechanism of action, while pharmacokinetics studies what are called ADME (or sometimes ADMET) properties: absorption, distribution, metabolism, excretion, and toxicity. These properties account for the failure of about 60% of drugs in clinical trials, so getting this step right is critical for the initial selection.
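Pharmacokinetics is often summarized with compartment models. A minimal sketch of the simplest one, a one-compartment model after a single intravenous bolus dose with first-order elimination, where plasma concentration follows C(t) = (Dose / Vd) · e^(−k·t) and the elimination half-life is ln 2 / k; the dose, volume of distribution Vd, and elimination rate constant k below are hypothetical:

```python
import math


def concentration(t_h, dose_mg, vd_l, k_per_h):
    """Plasma concentration (mg/L) at time t (hours) in a one-compartment IV bolus model."""
    return dose_mg / vd_l * math.exp(-k_per_h * t_h)


def half_life(k_per_h):
    """Elimination half-life (hours) for a first-order elimination rate constant k."""
    return math.log(2) / k_per_h


# Hypothetical parameters: 500 mg dose, 50 L volume of distribution, k = 0.1 per hour
dose, vd, k = 500.0, 50.0, 0.1
print(f"half-life = {half_life(k):.1f} h")
for t in (0, 6, 12, 24):
    print(f"t = {t:>2} h: C = {concentration(t, dose, vd, k):.2f} mg/L")
```

A compound that is eliminated very quickly (a short half-life) would need inconveniently frequent dosing, which is one reason these properties are checked early.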

At the end of these tests, scientists identify a few good candidates to move to the next step. The main candidate is called the lead compound, and the others are called backups.

Another method, used either on its own or together with HTS, is virtual screening. As the name suggests, virtual screening experiments are performed on a computer using computational software. This approach is significantly cheaper and allows tens of millions of compounds from virtual chemical libraries to be screened in a day. There are two main techniques in virtual screening: ligand-based methods and structure-based methods.

A ligand is a molecule that binds to another, usually larger, molecule. Ligands often act as chemical messages that one cell can send without a specific address: any cell with a matching receptor (or binding site) may receive the “message”. In ligand-based methods, researchers start from the structures of known ligands that bind to a binding site of interest and then use a set of techniques to find new ligands with similar structures, expecting them to bind to the site with comparable affinity. Such techniques include comparing molecular shapes, molecular fingerprinting, and building pharmacophore models. Following the general trend, Machine Learning and Deep Learning models have also been proposed and tested to predict whether a particular molecule will bind to a target.
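Fingerprint-based similarity search is one of the simplest ligand-based techniques: each molecule is encoded as a set of structural features (bits), and library compounds are ranked by their Tanimoto similarity to a known active ligand. A minimal sketch in plain Python, where the fingerprints are hypothetical sets of feature indices rather than fingerprints computed by a cheminformatics toolkit such as RDKit, and the 0.5 threshold is an assumption:

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by bits set in either fingerprint."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def similarity_search(query_fp, library, threshold=0.5):
    """Rank library compounds by similarity to a known active; keep those at or above threshold."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


# Hypothetical fingerprints: sets of "on" bit positions
known_active = {1, 4, 7, 9, 12, 15}
library = {
    "cmpd_A": {1, 4, 7, 9, 12, 20},  # shares 5 of 7 distinct bits with the active
    "cmpd_B": {2, 3, 5, 8},          # shares no bits
    "cmpd_C": {1, 4, 9, 15, 30, 31}, # shares 4 of 8 distinct bits
}
print(similarity_search(known_active, library))
```

The underlying assumption is the similar-property principle: structurally similar molecules tend to have similar activity, which works often but not always.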


Image credit: Khan Academy

In structure-based methods, the focus is on the structural features of the molecular target’s receptor (binding site). The most widely used structure-based technique is molecular docking, an optimization problem that can be framed as a “lock and key”: the ligand is the key, and the molecular target is the lock, with the receptor as its keyhole. The goal is to find the relative orientation of the protein and ligand that minimizes the overall free energy of the system. Knowing this orientation can help us predict how strongly the two molecules bind to each other.
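Docking programs combine a scoring function, which estimates how favorable a pose is, with a search algorithm that explores possible poses. A toy 2D sketch of that idea, not a real docking engine: the binding-site and ligand atoms are hypothetical point sets, the score is a simplified Lennard-Jones-style pair energy standing in for a real force field or free-energy estimate, and a seeded random local search over rotation and translation keeps any move that lowers the score:

```python
import math
import random


def pair_energy(r, r0=1.0, eps=1.0):
    """Simplified Lennard-Jones 12-6 energy with a minimum of -eps at distance r0."""
    x = (r0 / r) ** 6
    return eps * (x * x - 2 * x)


def place(ligand, angle, dx, dy):
    """Rigidly rotate the ligand about the origin, then translate it by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in ligand]


def score(pocket, ligand_pose):
    """Total pocket-ligand interaction energy (lower is better); distances floored to avoid 1/0."""
    return sum(pair_energy(max(math.dist(p, q), 0.1)) for p in pocket for q in ligand_pose)


def dock(pocket, ligand, start=(0.0, 3.0, 3.0), steps=5000, seed=0):
    """Greedy random local search over poses (angle, dx, dy), keeping only improvements."""
    rng = random.Random(seed)
    pose = start
    best = score(pocket, place(ligand, *pose))
    for _ in range(steps):
        trial = tuple(v + rng.gauss(0, 0.2) for v in pose)
        energy = score(pocket, place(ligand, *trial))
        if energy < best:
            pose, best = trial, energy
    return pose, best


pocket = [(-1.5, 0.0), (-1.5, 1.0), (0.0, -1.2), (1.5, 0.0), (1.5, 1.0)]  # hypothetical site atoms
ligand = [(0.0, 0.0), (0.0, 0.8)]                                           # hypothetical 2-atom ligand
start_energy = score(pocket, place(ligand, 0.0, 3.0, 3.0))
pose, energy = dock(pocket, ligand)
print(f"start = {start_energy:.3f}, docked = {energy:.3f}, pose = {pose}")
```

Real docking tools work in 3D, treat ligand flexibility, and use far better search strategies; greedy local search like this can get stuck in local minima.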

Just as with ligand-based methods, researchers have started applying ML and DL methods to molecular docking. One exciting approach is a method called Deep Docking. In this approach, DL models are trained on the docking scores of a small sample of compounds that have already been docked and are then used to predict docking scores for the not-yet-docked compounds in the library, so that only the most promising ones need to be docked explicitly.
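The core loop behind this kind of approach can be sketched without any deep learning: dock a small random sample of the library, fit a cheap model to those scores, and use its predictions to choose which remaining compounds deserve a real docking run. In the minimal sketch below, `expensive_dock` is a hypothetical stand-in for a real docking calculation, each compound is a hypothetical two-number descriptor vector, and a k-nearest-neighbor regressor replaces the deep neural network that Deep Docking itself uses:

```python
import random


def expensive_dock(features):
    """Hypothetical stand-in for a slow docking run (lower score = better predicted binder)."""
    return -3.0 * features[0] + 1.5 * features[1]


def knn_predict(train, x, k=5):
    """Predict a score as the mean score of the k most similar already-docked compounds."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def surrogate_screen(library, sample_size=200, top_fraction=0.05, seed=0):
    """Return indices of undocked compounds predicted to score best."""
    rng = random.Random(seed)
    sample_ids = rng.sample(range(len(library)), sample_size)
    # 1) Dock only a small random sample with the expensive method
    train = [(library[i], expensive_dock(library[i])) for i in sample_ids]
    # 2) Predict scores for every other compound with the cheap surrogate model
    sampled = set(sample_ids)
    rest = [i for i in range(len(library)) if i not in sampled]
    ranked = sorted(rest, key=lambda i: knn_predict(train, library[i]))
    # 3) Keep only the predicted top fraction for full docking
    return ranked[: int(top_fraction * len(rest))]


rng = random.Random(42)
library = [(rng.random(), rng.random()) for _ in range(2000)]  # hypothetical descriptors
shortlist = surrogate_screen(library)
print(f"{len(shortlist)} of {len(library)} compounds selected for full docking")
```

The savings come from step 3: only the shortlist pays the full docking cost. Deep Docking repeats this loop over several iterations on libraries of billions of compounds.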


Molecular Docking. Image Credit: Wikipedia.

You may notice that ligand-based and structure-based methods answer a similar question using different techniques. The former focuses on the features of the ligand (a molecular message) and the latter focuses on the structure of the receptor. Researchers have been successfully experimenting with Machine Learning and Deep Learning methods to speed up the discovery process. This trend will likely continue, and we may see an increasing number of tech startups bringing innovation to virtual screening.

Lead compound optimization

After selecting promising compounds during the lead discovery phase, researchers move on to the next step — lead compound optimization. The goal here is to improve the potency of hit compounds and reduce their side effects.

At this step, additional HTS runs may be performed as chemists alter the chemical structure of the compounds to increase their activity against the chosen target and decrease their activity against unrelated targets.
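Potency and selectivity are usually tracked with simple numbers across a series of structural analogs. A minimal sketch, assuming IC50 values (the concentration that gives 50% inhibition) measured against the intended target and a panel of off-targets, that reports pIC50 = −log10(IC50 in mol/L) and the fold selectivity over the most potently hit off-target; all compound names and values are hypothetical:

```python
import math


def pic50(ic50_molar: float) -> float:
    """Convert IC50 in mol/L to pIC50; higher values mean a more potent compound."""
    return -math.log10(ic50_molar)


def selectivity_fold(target_ic50, off_target_ic50s):
    """Most potent off-target IC50 divided by on-target IC50 (higher is more selective)."""
    return min(off_target_ic50s) / target_ic50


# Hypothetical analogs from one optimization series (IC50 values in mol/L)
series = {
    "lead":     {"target": 2e-7, "off_targets": [4e-7, 1e-5]},
    "analog_1": {"target": 5e-8, "off_targets": [2e-6, 3e-5]},
    "analog_2": {"target": 1e-8, "off_targets": [5e-6, 8e-6]},
}
for name, data in series.items():
    fold = selectivity_fold(data["target"], data["off_targets"])
    print(f"{name}: pIC50 = {pic50(data['target']):.1f}, selectivity = {fold:.0f}x")
```

In this toy series each modification improves both potency and selectivity; in practice, chemists often have to trade one property against another.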

Another goal of lead compound optimization is to improve the ADMET properties of the compounds while preserving their potency and selectivity. Experiments are conducted in 3D cell culture systems (in vitro) and then in animal efficacy models (in vivo) to optimize the dosage and route of administration (for example, oral or injection). Various computational (in silico) tools also attempt to predict ADME properties and help scientists focus on the most promising candidates. DL has been successfully applied to this problem and shows increasing promise for ADME prediction.
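One of the oldest and simplest in silico filters for oral absorption is Lipinski’s Rule of Five, which flags compounds that break more than one of four criteria: molecular weight of at most 500 Da, calculated logP of at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. A minimal sketch, assuming the descriptors were already computed (for example, with a cheminformatics toolkit); the class name and compound values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CompoundDescriptors:
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def rule_of_five_violations(d: CompoundDescriptors) -> int:
    """Count how many of Lipinski's four criteria a compound violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: CompoundDescriptors) -> bool:
    """The rule flags compounds with more than one violation as likely poorly absorbed."""
    return rule_of_five_violations(d) <= 1


candidates = [
    CompoundDescriptors("cmpd_1", 350.4, 2.8, 2, 5),
    CompoundDescriptors("cmpd_2", 620.7, 6.1, 3, 9),
    CompoundDescriptors("cmpd_3", 510.2, 4.2, 1, 7),
]
for c in candidates:
    print(c.name, rule_of_five_violations(c), passes_rule_of_five(c))
```

This is a rule of thumb rather than a hard cutoff; Lipinski himself noted exceptions such as compounds that are substrates for biological transporters, which is why learned ADME models are attractive.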

The drug discovery phase ends when scientists find at least one promising drug candidate. They then move the candidate to the next stages: preclinical development and clinical development, which includes three phases of clinical trials.

Conclusion

The goal of drug discovery is to identify chemical compounds that have the potential to become effective drugs that treat a specific pathological condition. The main steps of drug discovery are target discovery, target validation, lead compound identification, and lead compound optimization.

A good drug candidate interacts only with relevant biological targets and binds them strongly enough to meet efficacy requirements. It should also be safe and produce few side effects, and its ADME properties should meet specific criteria. Getting this process right is vital, as mistakes made here can cause future clinical trials to fail.

Many computational tools are available to researchers, and Machine Learning and Deep Learning methods have become increasingly popular at every step of the drug discovery process.

In future posts, we will take a closer look at all stages of the drug development process, investigate the computational tools available for each one, and examine where there’s room for innovation.