
TRENDS IN HIGH THROUGHPUT SCREENING

Updated: Sep 12, 2022




November 8, 2021


by 20/15 Visioneers

September 2021

Contents

1. Executive summary

2. Preface

3. Historical perspective

4. Main actors in the HTS play

4.1 Targets on screening

4.2 Assay modalities and technologies

4.3 Bioreagents

4.4 Compound collections and chemical diversity

5. Automation trends

5.1 Robots

5.2 Smart and Cloud Labs environments

6. Data Sciences and Information Technologies

7. Social commitment in HTS

7.1 Public-private partnerships

7.2 Sustainable screening

8. A look to the future: AI and HTS

9. Conclusions

10. Bibliography

1. EXECUTIVE SUMMARY

High throughput screening (HTS) has been a paramount tool in the drug discovery process over many decades. After years of maturation, the discipline has evolved from the initial “game of numbers” principle, based on the belief that the larger the number of samples being analyzed, the higher the probability of success, to a more information-driven strategy. Enzymes (kinases included) and GPCRs remain the most popular target classes in the discipline, but novel biological systems like targeted protein degradation (TPD) have been incorporated into the HTS arena, opening the possibility of modulating physiological processes previously considered inaccessible to therapeutic intervention. Innovative chemical strategies such as DNA-encoded library technology (DELT) are being widely used to expand the chemical diversity of the molecules being screened, and molecular biology tools have enabled expression systems suited for the preparation of multiprotein complexes at an adequate scale for screening purposes. Those advances are facilitating the modulation of targets that remained elusive until now. Flexibility in the design of robotic instruments, with modular systems contributing to a fluid layout of HTS labs, allows for rapid implementation of novel assays in a flexible environment, distant from the rigid factory-like settings so common in the first decade of this century. This expansion of HTS boundaries comes along with a democratization of HTS that makes it more accessible to research groups with financial constraints. For instance, Cloud Labs enable the use of expensive equipment at reasonable pay-for-use costs. Likewise, initiatives to promote public-private partnerships facilitate massive data access to research groups and foster fruitful collaborations among them. Artificial intelligence tools are starting to enable the massive screening of molecules with only a limited number of real assay points, intertwining virtual and physical HTS. The Internet of Things enables more powerful, real-time quality control of operations. New liquid handlers facilitate the miniaturization of assays, yielding substantial savings that will also reduce the environmental impact of HTS, an area of concern that has recently been the subject of intense efforts. Finally, improved data management has proved essential to maximizing the proficiency of the HTS process, with tools to harmonize and unify data structure being developed. Notably, such tools will be critical to make the data available for data science and analytics like artificial intelligence that will clearly drive the HTS of the future. Overall, as HTS has evolved, it continues to be a cornerstone of drug discovery, and will likely remain so for years to come.

Figure 1: How HTS advances have positively impacted issues traditionally associated with the discipline.

2. PREFACE

With their focus on finding new molecules for therapeutic purposes, pharmaceutical companies embraced high throughput screening (HTS, i.e., the simultaneous evaluation of a vast number of molecules in biological assays expressing a given functionality, with the goal of identifying modulators of such functionality) several decades ago as a key strategy to identify new chemical matter. Starting from modest efforts intended for screening hundreds or thousands of molecules, technological evolution drove the discipline toward ultra-high throughput screening at the dawn of the 21st century, where millions of molecular entities were screened for every target in a short period of time. This sudden revolution transitioned in the last two decades to a calm evolution where quantitative changes were replaced by qualitative ones. Novel strategies in different fields have converged (and are converging) to slowly transform the HTS scenario, as it becomes more mature, to increase its success rate. In this document we attempt to summarize the current status of HTS, highlighting the opportunities to further improve its efficiency with a view to future directions for the discipline. It focuses mainly on target-based HTS, leaving phenotypic screening to be discussed in a separate document. Likewise, it will not cover genomic screening, an emerging therapeutic approach that has been reinvigorated with recent CRISPR advances (an excellent review on this topic can be found in a recent article [1]).

3. HISTORICAL PERSPECTIVE

In the last two decades of the 20th century a number of discoveries caused a significant leap forward in the drug discovery process. Until then, drug discovery was based primarily on serendipity and empirical observations in rudimentary physiological models, using animal tissues at first, and assays based on cell cultures and purified proteins later. These latter assays, though more efficient than the animal tissue models, were burdensome and only allowed the evaluation of a few tens of chemical compounds per week. Therefore, the number of opportunities to develop novel therapeutics was rather limited. But advances in recombinant DNA technologies during the 1980s fostered the availability of recombinant proteins that could be used as targets in assays intended to find ligands capable of modulating their function, either in cell-free or in cell-based models. This was the seminal point that gave rise to high throughput screening as a discipline in the following decade.

Advances in fluorescence-based reagents in the 1990s enabled the development of more efficient assays that could be performed in 96-well plates, thereby allowing the simultaneous screening of many molecules. Likewise, the low volume of the assays brought substantial savings in chemical compounds, moving from spending 5-10 mg per assay tube to 10-100 µg per assay well. The first published articles visible in PubMed with the term “high throughput screening” date from 1991 [2-4], using radioactive tracers to monitor ligand binding. However, most of the HTS work performed in pharmaceutical companies was not disclosed at that time since it was perceived as a competitive advantage and therefore its results were considered highly sensitive information. Actually, HTS campaigns were first initiated in the 1980s with a throughput that grew from fewer than 1,000 compounds screened per week in 1986 to nearly 10,000 in 1989 [5]. As throughput increased, corporate compound collections typically containing 10,000-20,000 samples were too small to suffice for the nature of the HTS endeavor, conceived as a game of numbers: the larger the number of samples being screened, the higher the chances of finding an interesting molecule. In addition, those collections were usually a reflection of the previous activity of the company and often included poorly developable molecules such as dyes. These facts contributed to limited chemical diversity in the set of compounds being screened, hence reducing the chances of finding novel chemotypes. However, the explosion of combinatorial chemistry and parallel synthesis methods by the mid-1990s, and a new appreciation for the developability properties of molecules brought on by Lipinski [6] in 1997, changed the landscape completely, and in the early 2000s compound collections scaled up to include at least half a million different molecules.

The aforementioned advances in fluorescence chemistry gave rise to a plethora of reagents suitable for all forms of fluorescence readouts: fluorescence intensity (FLINT), fluorescence polarization (FP), Förster resonance energy transfer (FRET), time-resolved fluorescence (TRF), fluorescence intensity distribution analysis (FIDA) or confocal fluorescence lifetime analysis (cFLA). All these technologies enabled the development of highly efficient assay formats in homogeneous “mix and read” mode, avoiding cumbersome washing steps and therefore considerably increasing screening throughput. Those advances led to a further decrease in assay volumes, transitioning from 96-well plates in the mid-1990s (dealing with 100-200 µL assays) to 384-well plates (10-50 µL assays) at the end of the decade and 1536-well plates (3-10 µL assays) in the early 2000s. This was accompanied by significant progress in automation, with the development of more proficient robotic systems that made low-volume liquid handling more reliable. The founding of the Society for Biomolecular Screening (SBS, currently Society for Laboratory Automation and Screening, SLAS) prompted the adoption of microplate standards that were paramount to harmonize the development and manufacturing of new instrumentation, from liquid handlers to plate readers or incubators, and helped their integration into large robotic platforms. The bottleneck shifted from the laboratory to the information technology (IT) desks as new tools capable of dealing with thousands of data points were demanded. The first versions of these tools were initially rudimentary, customized systems developed within each pharma company (usually as sophisticated macros operating in Excel) and later more robust commercial LIMS platforms capable of capturing, processing and integrating all the information in corporate databases in a data integrity-compliant manner. Simultaneously, new quality control criteria were implemented to monitor and ensure assay performance, avoiding unwanted waste of costly reagents, and sound statistical criteria started to be used for the appropriate selection of hits.
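To make the statistical quality control mentioned above concrete, the minimal sketch below computes the Z'-factor (Zhang et al., 1999) for one plate and applies a simple three-standard-deviation hit cutoff. The control layout, signal values and thresholds are illustrative assumptions, not a prescription for any particular assay.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated raw signals from one 384-well plate: 16 positive (full effect)
# and 16 negative (no effect) control wells, plus 352 compound wells.
pos = rng.normal(loc=100.0, scale=5.0, size=16)     # positive controls
neg = rng.normal(loc=1000.0, scale=40.0, size=16)   # negative controls
samples = rng.normal(loc=1000.0, scale=60.0, size=352)

# Z'-factor (Zhang et al., 1999): assay quality from control separation.
# Z' > 0.5 is the usual acceptance criterion for an HTS-ready assay.
z_prime = 1 - 3 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

# Percent inhibition of each compound well, normalized to the controls.
inhibition = 100 * (neg.mean() - samples) / (neg.mean() - pos.mean())

# Simple statistical hit rule: activity beyond 3 standard deviations of the
# sample population (robust variants use median/MAD instead of mean/SD).
cutoff = inhibition.mean() + 3 * inhibition.std(ddof=1)
hits = np.flatnonzero(inhibition > cutoff)

print(f"Z' = {z_prime:.2f}, hit cutoff = {cutoff:.1f}% inhibition, {hits.size} hits")
```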

These advances contributed to the industrialization of HTS, which evolved into “ultra-HTS” (uHTS), defined as the ability to screen more than 100,000 compounds every day against every single target. The availability of the human genome sequence [7] in 2001 nourished the identification of novel therapeutic targets, and all these facts triggered the definitive explosion of HTS in the pharmaceutical industry during the 2001-2005 period. And, as happens with any sudden explosion of technology, this generated a hype of expectation that shortly afterwards led to disappointment as some of the unfounded (and somewhat naive) expectations did not pay off, at least in the short term. Most of the reproaches against HTS were elegantly discussed and refuted in a Nature Reviews Drug Discovery article [8] published in 2011.

The fruits delivered by HTS are now becoming more apparent. It is very difficult to trace the origin of drugs already in the market or in late clinical trials to find out if they originated from an HTS campaign, but surrogate approaches are demonstrating that the contribution of screening strategies to the discovery of novel molecular entities has increased in the last few years and therefore the discipline seems to be paying off, as observed in Figure 2. In the 1990-99 period, 244 novel molecular entities (NMEs) were approved by the FDA and 105 of them (43%) were new in their scaffold and shape (i.e., they were not fast followers, nor inspired by known chemotypes). In the 2000-09 period, this percentage grew modestly to 51% (84 out of the 164 NMEs approved) but the number of these novel molecules decreased (from 105 to 84). However, in the 2010-19 period the percentage expanded to reach 67% (164 out of the 245 NMEs approved) and the figure rose drastically (from 84 to 164), with screening believed to be responsible for this growth [9].

Figure 2: Evolution of drug discovery outcomes in the last three decades. NMEs, novel molecular entities. The term “innovative molecules” refers to those not inspired by known chemotypes, thus being new in their scaffold and shape. The percentage of innovative molecules is calculated with respect to the number of NMEs.

As often happens with hype cycles, the disappointment following the hype was superseded by the degree of maturation and productivity that HTS has reached in the last few years. This maturation is reflected in the adoption of convergent, flexible strategies and the renunciation of previous myths and dogmas that were common in the early 2000s. For instance, diversity screens are combined with fragment-based screenings and rational drug design, complementing each other. Medium- or low-throughput assays, banned in screening platforms two decades ago as too inefficient to screen large compound collections, are nowadays accepted using small subsets of the compound libraries or iterative approaches if those assays are a more valid reflection of the physiology of the target or if they are the only viable option, thus enabling the screening of targets previously considered intractable. These are just a few examples showing how HTS strategies have evolved and matured. In the following chapters, a more detailed analysis of the different aspects of HTS is elaborated, with some considerations about future trends. As already mentioned in the Preface chapter, most of these reflections will focus exclusively on target-based HTS, leaving the rich field of opportunities brought about by phenotypic screenings to be analyzed in a future document.

4. MAIN ACTORS IN THE HTS PLAY

4.1 TARGETS ON SCREENING

Given the sensitive nature of HTS campaigns in pharma companies, determining accurately the nature of the targets being screened in those companies is an almost impossible task since in many cases this information might not be fully disclosed. An approximation can be taken by looking at screening data publicly available in databases like PubChem BioAssay, with most of these data being deposited by academic and public institutions (https://www.ncbi.nlm.nih.gov/pcassay/). Alternatively, HTS data published by pharma companies in the most relevant journal in the field (SLAS Discovery, formerly Journal of Biomolecular Screening) can be retrieved for this analysis, but this likely would be incomplete since not all companies publish comprehensive data from their HTS campaigns. Therefore, the information available in the PubChem BioAssay database has been used as the most feasible approach to evaluate current trends in the type of targets being screened.
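As an illustration of how such a census might be approached programmatically, the hedged sketch below counts PubChem BioAssay records via the NCBI E-utilities esearch endpoint. The endpoint is real, but the query terms are hypothetical placeholders rather than the exact queries behind the analysis described below, which relied on the database's Classification Browser.

```python
import requests

# NCBI E-utilities search against the PubChem BioAssay database (db=pcassay).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def count_assays(term: str) -> int:
    """Return the number of PubChem BioAssay records matching a query term."""
    params = {"db": "pcassay", "term": term, "retmode": "json", "retmax": 0}
    reply = requests.get(ESEARCH, params=params, timeout=30)
    reply.raise_for_status()
    return int(reply.json()["esearchresult"]["count"])

if __name__ == "__main__":
    # Illustrative example terms; a real analysis would group assays by
    # IUPHAR/BPS target class as described in the text.
    for term in ('"high throughput screening"', "kinase AND luciferase"):
        print(term, "->", count_assays(term), "assays")
```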

Figure 3: Frequency distribution of target types (human and non-human). A: Distribution in HTS-related assays (N=332,549 assays) registered in the PubChem BioAssay database. Targets were grouped according to the classification of the IUPHAR/BPS Guide to Pharmacology. The group “enzymes” includes kinases and the group “receptors” comprises all types of receptors, including GPCRs and nuclear receptors. B: Distribution of targets among the FDA-approved drugs until December 2015 according to the Supplemental Information available in the original article from Santos et al. [10]. This frequency distribution is done on the total number of different targets (N=813), not on the number of drugs targeting them (i.e., each individual target is only counted once even if it is targeted by more than one drug). C: A closer look at the distribution in B, focused only on small molecule drugs approved in the 2011-2015 period. See main text for further explanation.

An exploration of this dataset, performed using the “Classification Browser” tool and arranging the information by target type according to the IUPHAR/BPS Guide to Pharmacology (https://www.guidetopharmacology.org/), reveals that more than 300K HTS-related assays (either primary HTS, secondary and confirmation assays, dose-response assays or mechanistic biology assays for HTS hits) have been deposited in the database since 2004. Figure 3A shows how these assays are distributed by target class, with enzyme assays being the most frequent (49%, kinases being included in this group), followed by receptors (40%, including G-protein coupled receptors (GPCRs), nuclear receptors and other membrane receptors). Worth noting, some of the assays considered in this analysis aimed to identify or characterize toxicity issues, hence the significant presence of transporter assays (2%). It is interesting to see how this distribution is similar to that observed in the panel of targets of FDA-approved drugs until 2015 (Fig. 3B), although receptor assays are more poorly represented in the latter panel. This difference is likely due to the fact that this target class includes many GPCRs considered intractable until the late 1990s. However, a number of events in the 2000s increased their perceived tractability: the development of novel molecular biology tools capable of generating transiently and stably transfected cell lines, thus making these targets amenable to screening campaigns; advances in crystallography techniques that uncovered the 3D structure of many of these proteins, therefore enabling rational drug design and virtual screening approaches; and the development of novel assay technologies. Among the latter, it is worth mentioning significant breakthroughs like 1) the discovery of fluorescent calcium indicators [11] that led to the development of FLIPR™ technology, which revolutionized GPCR drug discovery in the late 1990s [12], 2) the design of luminescence-based technologies (e.g., “enzyme fragment complementation” systems used to monitor β-arrestin recruitment as a generic method for all GPCR types or the formation of cAMP in Gs receptors) and 3) the development of non-radioactive, FRET-based, ligand-binding assay formats. These advances caused a boost in GPCR-targeted HTS campaigns and consequently they are largely represented in Fig. 3A. These increased efforts in GPCR drug discovery are starting to pay off. A closer look at the FDA-approved drugs in the 2011-2015 period (Fig. 3C) shows that the presence of GPCR targets is significantly higher in this period than in the whole historical series (21% vs. 11%), i.e., a significant part of the approved GPCR-targeted drugs is concentrated in recent years. With this encouraging success, and considering the relevance of GPCRs in cell physiology, it is likely that this target class will continue to be prevalent in the drug discovery portfolios of many pharmaceutical companies and therefore continuous improvement in assay formats is likely to occur. Particularly, it is expected that cheaper and more accessible instruments to monitor intracellular calcium release (the preferred methodology when screening for Gq receptors) will become available, especially now that public institutions, with more restricted budgets, are increasingly active in HTS. Indeed, less expensive alternatives to FLIPR™ like FlexStation™ from Molecular Devices (San Jose, CA) are available and it is foreseeable that this trend will continue and expand in the coming years.

Also included in the receptor group, there are many nuclear receptors that were extensively pursued as drug targets in the early 2000s. Interest in this target class has somewhat declined in the last decade. Nevertheless, there are many proficient assay platforms allowing the development of high throughput assays for nuclear receptors, most of them being functional and based on the transcription event triggered by the receptor to lead to the expression of a reporter gene, which is easy to monitor in a high throughput-friendly format.

Enzymes are the largest target class being screened and the one with the largest representation in the target set of FDA-approved drugs. It is worth mentioning that kinases have been included within this group in the analysis shown in Fig. 3. As happened earlier with GPCRs, kinases were the subject of an explosion in drug discovery efforts in the first decade of the present century, especially for cancer drug discovery. However, many other non-kinase enzymes are responsible for the large impact of this target class. Indeed, a closer look at the group of enzyme targets of FDA-approved drugs shows that only 16% of them were kinases (including non-protein kinases of human and microbial origin) and the remaining 84% were other enzymes. Nonetheless, and consistent with the explosion mentioned above, many of the drugs targeting human protein kinases were approved in the 2011-15 period; likewise, kinases are largely represented in the target sets of HTS-related assays within the PubChem BioAssay database: 43% of the screened enzymes are kinases. Among the non-kinase enzymes there are many proteases, phosphodiesterases, reductases, dehydrogenases, phospholipases and, interestingly, a large number of enzymes involved in the transfer of methyl groups, consistent with the burst in epigenetics drug discovery programs from 2005. Unlike GPCRs, where the preferred assay technologies involve the measurement of intracellular calcium release, there is no universal method for enzymes. Some groups of enzymes (kinases, phosphatases, NAD(P)H-dependent dehydrogenases), sharing common substrates or products, may benefit from a common assay format, but this is not applicable to all enzymes. Therefore, bespoke assays for individual enzymes are not unusual. On the other hand, in the hit selection and optimization phases of many drug discovery programs, especially enzyme-targeted ones, there is a growing trend to consider not only the thermodynamics of drug-target interactions (measured through surrogates of the equilibrium constant: EC50 and the degree of effect at a given drug concentration) but also their kinetics (i.e., off-rates and target residence times) [13]. Consequently, it would not be surprising to observe a trend towards assay formats enabling continuous, on-line readouts instead of formats exclusively allowing end-point detection, since the former are more suitable for the acquisition of kinetic data.
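As a concrete example of the “surrogates of the equilibrium constant” mentioned above, the minimal sketch below fits a four-parameter logistic (Hill) model to a synthetic concentration-response series to extract an EC50. The data, starting values and model choice are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(conc, bottom, top, log_ec50, hill):
    """Four-parameter logistic (Hill) model; EC50 is fitted on a log10 scale."""
    return bottom + (top - bottom) / (1.0 + (10.0 ** log_ec50 / conc) ** hill)

# Synthetic 10-point, half-log dilution series (molar) with added noise;
# the "true" EC50 is 200 nM with a Hill slope of 1.
conc = 1e-9 * 10.0 ** (0.5 * np.arange(10))          # 1 nM to ~31.6 uM
rng = np.random.default_rng(1)
response = four_pl(conc, 0.0, 100.0, np.log10(2e-7), 1.0) + rng.normal(0, 4, conc.size)

# Fit with rough starting values (bottom, top, log10 EC50, Hill slope).
popt, _ = curve_fit(four_pl, conc, response, p0=[0.0, 100.0, -7.0, 1.0])
print(f"EC50 = {10.0 ** popt[2] * 1e9:.0f} nM, Hill slope = {popt[3]:.2f}")
```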

Ion channels are the third most frequently considered target class in drug discovery. Since the patch-clamp methods typically used in classical pharmacology studies for these targets are not amenable to high throughput efforts, alternative assay formats were developed. One of the most frequent approaches includes the use of ligand-binding assays, but such assays are not considered truly functional, and they are biased towards the identification of compounds binding to the same site as the probe ligand. Functional alternatives were developed, and the most popular one includes the use of voltage sensor probes (VSPs), which consist of FRET probe pairs that are disrupted upon membrane depolarization. Likewise, ion-specific fluorescent probes (like those used for GPCRs) are used to monitor changes in intracellular ions, e.g., calcium for calcium channels and for non-selective cation channels. Efflux assays are also used in several drug discovery programs, although to a much lower extent than those mentioned above since they use radioactive tracers (usually 86Rb+) or demand low-throughput atomic absorption spectroscopy instruments to detect non-radioactive rubidium; in addition, they offer low temporal resolution. However, efforts in the last decades to adapt the gold-standard patch-clamp technology to high-throughput settings are maturing with the development of planar patch clamps [14]. IonWorks™ from Molecular Devices was the first platform to be developed (superseded by the newer versions IW Quattro™ and IW Barracuda™) and new instruments are currently available incorporating the precision and accuracy of manual electrophysiology: PatchXpress™ (Molecular Devices), IonFlux™ (Fluxion Biosciences; Alameda, CA), Qube™ and QPatch HT/HTX™ (Sophion; Ballerup, Denmark) and Patchliner™ and SynchroPatch (Nanion; Munich, Germany) are among those. Although their throughput is incomparably higher than that of manual patch-clamp, it is still low when compared to other HTS-friendly technologies. However, they are very useful when screening small compound libraries or for screening campaigns using iterative approaches (like those described in chapter 8) to analyze a smaller number of compounds. Still, the price of this instrumentation is nearly prohibitive for many medium or small companies that will have to rely on alternative methods. It is anticipated that prices will decrease in the near future and the predictable trend for ion channel screening will likely include a combination of these technologies.

Most of the targets considered in drug discovery programs were selected with the intention of modulating cellular processes that were deemed critically involved in the onset of pathological conditions. Deciphering cell signaling pathways in the 1980s and 1990s preceded the interest in GPCRs, kinases and nuclear receptors playing critical roles in those pathways. In the 2000s, unveiling the importance of epigenetics in modulating cell physiology fostered the interest in many targets involved in the process, and therefore a large number of enzymes acting as “writers” (DNA methyltransferases, lysine histone methyltransferases), “erasers” (histone deacetylases, lysine demethylases) and “readers” (bromodomains) of epigenetic changes have been included in pharma early drug discovery portfolios in the last 10-15 years. This effort has partially paid off with the first and second waves of FDA approvals for drugs targeting epigenetic targets, namely DNA methyltransferases (DNMTs) and histone deacetylases (HDACs). Six drugs were approved across the two waves, all of them for the treatment of hematological malignancies: the first wave delivered azacitidine (2004) and decitabine (2006), targeting DNMTs, as well as vorinostat (2006) and romidepsin (2009), targeting HDACs; the second wave delivered belinostat (2014) and panobinostat (2015), both targeting HDACs. Since then, more efforts have focused on other epigenetic targets, but several issues are forestalling further success. One of them is the observed poor translation of in vitro results into in vivo efficacy data, the main reason being that most epigenetic targets work in multiprotein complexes that are not being included in the design of the in vitro assays (this point and its possible solutions will be discussed in section 4.3). In addition, concerns about toxicity usually arise during the development of epidrugs due to both their poor selectivity versus the intended target and the interplay among epigenetic targets and chromatin-related proteins. Finally, a lack of sufficient target validation also contributes to the poor translational success of many epigenetic drug discovery programs. Nonetheless, several epidrug candidates are currently in clinical trials and efforts within the scientific community to overcome the challenges described above will surely increase success in this field.

Targeted protein degradation (TPD) is the latest approach on the drug discovery radar and the one leading the current trends at the dawn of the 2020s. The main conceptual difference in this case is that TPD is not only being considered as a physiological process to be modulated for therapeutic purposes but rather as cellular machinery to be exploited to modulate protein targets that were neither druggable by small molecule ligands nor accessible by biological agents. Basically, TPD is a strategy intended to label specific target proteins for proteasomal degradation. It requires finding a suitable ligand for the target protein that can be conjugated through a suitable linker to a signal molecule that recruits a ubiquitin E3 ligase. The resulting molecule, called a PROTAC (“PROteolysis Targeting Chimera”), is a heterobifunctional small molecule formed by a linker and two warheads: one warhead binds to the target protein, and the other recruits the E3 ligase. In this way, the target protein can be ubiquitinated and degraded by the proteasome. The first successful use of PROTACs was published in 2001 using a peptide conjugated to a covalent ligand of the protein targeted for degradation [15]. Since then, the field has rapidly evolved to the design of non-peptide small molecules acting as sensors for recruiting E3 ubiquitin ligases. The discovery of the E3 ligase cereblon as the target for thalidomide and its analogues, and the clinical success of these molecules in treating multiple myeloma, has propelled a significant amount of work in the field. Such efforts have mostly tried to exploit the E3 family in the search for the ideal ligase cognate for each target, while looking for the most appropriate ligands. This is a nascent field predicted to evolve and grow in the next few years, and it will benefit from efficient HTS strategies. Since new E3 ligases will likely be discovered (as of 2019, only 7 out of circa 600 E3 ligases in the human genome had been exploited for TPD [16]), novel ligase sensors will be needed and hence suitable binding assays will have to be developed. Likewise, new assay systems will have to be designed to screen for the ligands most efficient at inducing degradation. Companies like Amphista (London, UK), Lycia (San Diego, CA), C4 Therapeutics (Watertown, MA), Captor (Wroclaw, Poland), Arvinas (New Haven, CT) or Cullgen (San Diego, CA) are actively working in the field using their own platforms and designing assays to investigate the efficiency of the hits found, considering not only the affinity for the intended target and the cognate E3 ligase, but also the extent and duration of the induced degradation. It is therefore predictable that the pharmaceutical industry will devote more resources to the field and significant progress will be made in upcoming years. Notably, as outlined below in section 4.4, TPD is a perfect match for DNA-encoded library technology (DELT, explained in the same section), and this fact will undoubtedly lead to successful outcomes.

4.2 ASSAY MODALITIES AND TECHNOLOGIES

In the previous section, some assay modalities were discussed concerning particular target classes. However, it is worth mentioning that target-agnostic technologies will likely influence the evolution of HTS in the near future. Particularly, biophysical, label-free technologies are endowed with features that make them attractive from a HTS perspective as they do not need artificially created ligands or labelled molecules that may be slightly but significantly different to natural ligands. Such technologies would therefore enable the development of assays that resemble more closely the physiological function of the target. In addition, label-free technologies are not susceptible to interferences with labeled reagents that usually lead to unwanted false positives and negatives. Therefore, they do not need orthogonal secondary assays to eliminate such false positives, resulting in valuable savings in time and costs.

Among the most widespread label-free assay formats amenable to HTS, it is worth mentioning those based on waveguides, impedance or surface plasmon resonance (SPR). The latter was one of the first technologies to be developed for drug discovery in the mid-1990s, although its use for screening purposes is relatively recent, following the availability of appropriate instruments. Though they rely on different underlying phenomena, SPR and waveguide technologies are both based on changes in the properties of reflected light following a change in the mass of a target previously immobilized on a suitable surface. Analogously, impedance-based methods register changes in the electrical conductivity of an electrode array located at the bottom of a well containing cells, once such cells have undergone morphological changes as a consequence of treatment with a given compound (Figure 4). Besides enabling the development of “less artificial” assays, these technologies provide biophysical data that can illuminate the mode of action of identified hits, thus offering a means for hit prioritization. Despite these benefits, the implementation of these technologies in the HTS arena has not been as widespread as was predicted ten years ago. Several factors can explain the low uptake, the most important being their lower throughput compared to other HT technologies, the cumbersome preparation of reagents and plates (which also impacts throughput) and the high costs of instruments and, especially, consumables. Nonetheless, they are still used in secondary screenings, as described below for thermal-based methods.
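To illustrate the kind of biophysical information these sensorgram-based technologies deliver, the sketch below simulates a simple 1:1 Langmuir binding interaction of the type typically fitted to SPR data. The rate constants, analyte concentration and response units are arbitrary illustrative values, not measurements.

```python
import numpy as np

# 1:1 Langmuir binding model underlying a typical SPR sensorgram:
# the association phase approaches an equilibrium level set by the analyte
# concentration C, k_on and k_off; the dissociation phase decays with k_off alone.
k_on, k_off, r_max = 1e5, 1e-3, 100.0     # M^-1 s^-1, s^-1, response units (assumed)
conc = 1e-6                               # 1 uM analyte
t_assoc = np.linspace(0, 300, 301)        # 5 min association phase
t_dissoc = np.linspace(0, 600, 601)       # 10 min dissociation phase

k_obs = k_on * conc + k_off
r_eq = r_max * k_on * conc / k_obs
association = r_eq * (1 - np.exp(-k_obs * t_assoc))
dissociation = association[-1] * np.exp(-k_off * t_dissoc)

print(f"KD = k_off/k_on = {k_off / k_on:.1e} M, "
      f"equilibrium response at 1 uM = {r_eq:.1f} RU")
```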

Figure 4: Fundamentals of traditional label-free technologies used in HTS. A: Surface plasmon resonance. B: Waveguide. C: Impedance. Changes in mass (A, B) or in morphology (C) due to biological interactions (lower panels) induce a change in the reflection angle (A), the reflected wavelength (B) or the electrical impedance (C).

Following these pioneering technologies, other approaches have become part of the screening armamentarium. Mechanistic assays based on shifts in thermal stability upon ligand binding are frequently considered in drug discovery programs: isothermal titration calorimetry (ITC), cellular thermal shift assay (CETSA) and microscale thermophoresis (MST) are becoming increasingly popular (the latter is included in this group although, properly speaking, it is not a label-free technology). However, they are also low throughput and therefore are more likely to be used for screening small libraries or in fragment-based screening exercises, given the useful biophysical information they provide.

To date, the most successful label-free technology for HTS purposes is mass spectrometry (MS). Initially, MS was coupled to multiplexed liquid chromatography to increase throughput, but this approach still required cycle times of more than 30 s per sample, which were unacceptably high for HTS purposes. Changing to solid phase extraction caused a threefold increase in throughput, still considered low. However, the development of highly efficient MALDI systems such as the RapifleX® MALDI PharmaPulse® (Bruker; Billerica, MA), capable of working with low volumes in 1536-well plates, has caused a dramatic decrease in cycle times to 0.5-1 s per sample, and therefore this technology is now being successfully applied in HTS campaigns [17]. Still, interferences caused by salts are a significant problem when using MS in HTS. The emergence of self-assembled monolayers for MALDI (“SAMDI”) has overcome this important issue. SAMDI (developed by SAMDI Tech; Chicago, IL) utilizes biochip arrays with self-assembled monolayers (SAMs) of alkanethiolates on gold, allowing assay execution under any condition: once the assay is completed, the reaction mixture is transferred to the SAM biochip, where the analyte of interest is captured and the rest of the reaction mixture is washed away. Finally, a suitable matrix is applied to the biochip and the sample is analyzed in a MALDI instrument [18]. Therefore, the only limitation for running screening assays using SAMDI is that the analyte has to be immobilized on the SAM biochip, but a vast number of immobilization chemistries are available and can be easily developed for most analytes.
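A back-of-the-envelope calculation makes clear why cycle time dominates the feasibility of MS-based HTS. The sketch below uses the cycle times quoted above and an assumed one-million-compound campaign read one sample per cycle with no overheads, which is an obvious simplification.

```python
# Impact of MS cycle time on a hypothetical screening campaign.
library_size = 1_000_000

cycle_times_s = {
    "LC-MS (multiplexed)": 30.0,
    "Solid-phase extraction MS": 10.0,     # roughly threefold faster than LC-MS
    "MALDI (RapifleX-class)": 0.75,        # midpoint of the 0.5-1 s range
}

for name, cycle in cycle_times_s.items():
    days = library_size * cycle / 86_400   # seconds per day
    print(f"{name:28s} {cycle:5.2f} s/sample -> {days:7.1f} instrument-days")
```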

Recent advances allow the acquisition of bidimensional arrays of hundreds of thousands of MS spectra in cells and tissue samples, enabling MS imaging, which is already being used for screening [19]. The successful implementation of MS for HTS purposes is being followed by other spectroscopy modalities like nuclear magnetic resonance or Raman spectroscopy, and it is not unusual to find reports of HTS campaigns run with these technologies. These advances offer a promising field to be developed in the near future, increasing the arsenal of screening technologies and enabling the HTS of targets difficult to analyze with conventional technologies.

4.3 BIOREAGENTS

The provision of bioreagents has been one of the bedrocks of a successful HTS campaign. As explained in the introduction, advances in recombinant DNA technology in the 1980s contributed to the availability of constructs that allowed the preparation of large quantities of recombinant proteins with a high degree of purity, as well as transfected (transient or permanent) cell lines either overexpressing the target of interest or showing a target-driven specific phenotype, thus providing a suitable readout for high throughput assays. At the beginning of this century it was quite common in big pharma to have in-house resources and large production facilities equipped with bioreactors and protein purification instrumentation to prepare their own reagents. There is now a growing trend to outsource the preparation of bioreagents, a decision which is more common in small biopharma and academia. Classical suppliers of life science reagents (Sigma Aldrich, Merck, Promega, VWR, TebuBio, etc.) can provide bespoke bioreagents at reasonable prices depending on scale. Likewise, small companies have flourished in many countries offering similar services. Therefore, it is not difficult to find a solution for bioreagent provision at all scales, from large HTS premises in big pharmaceutical companies to small groups in biotech startups or academic consortia. Furthermore, many commercial suppliers offer tailored cell lines expressing selected targets, ready to use for screening assays after a few optimization steps. Companies like Promega (Madison, WI) or Eurofins DiscoverX (Fremont, CA) possess a rich catalogue of cell lines for screening modulators of the most demanded targets according to scientific trends. A good example these days is the availability of cell lines for a wide number of targets in immuno-oncology.

That said, the availability of recombinant proteins or transfected cell lines should never be taken for granted. Some proteins are hard to express in functional form and in enough quantities for HTS purposes. Improvements in assay technologies and miniaturization trends in HTS have reduced the amount of bioreagents needed, but still some proteins can be difficult to express in a functional state. Nevertheless, advances in molecular biology have helped reduce those challenges to a minimum, and savvy molecular biologists usually find a way to overcome expression and/or production issues. Such advances in molecular biology currently permit scientists to express multiprotein complexes instead of individual entities, a significant leap forward allowing experimental in vitro systems to reflect more accurately disease pathophysiological processes.

As outlined in section 4.1 for epigenetic targets, many cellular functions are possible due to the activity of multiprotein complexes performing in a coordinated way, with individual components modulating the activity of others. Though the presence of large multiprotein complexes in cellular systems has been known for some time, it has recently been discovered that there are many small (no more than five components) protein complexes and, interestingly, many disease target proteins are part of these small protein complexes [20]. Given the prevalence of disease targets in such complexes, it follows that monitoring their activity in isolation seems an excessively simplistic approach that may fail to find the right therapeutic modulators and, therefore, may contribute to attrition in drug discovery. Powerful proteomic tools are helping to understand the nature of these complexes, including subunit composition, stoichiometry, and post-translational modifications. All this information is crucial to inform the correct cloning and expression strategy in appropriate systems. The selection of such systems must consider their suitability in terms of speed, reproducibility, scalability and economic feasibility.

Figure 5: The expression of multiprotein complexes involving disease target proteins helps the design of functional assays in vitro with physiological relevance to be used in HTS.

Recent advances in molecular biology have made available several expression systems suitable for the preparation of multiprotein complexes at an adequate scale for drug discovery and screening purposes. There are tools that utilize tandem recombinant strategies for simultaneous co-expression of proteins or protein subunits in a single host. For instance, ACEMBL [21] in E. coli (also applicable to eukaryotic hosts [22]), MultiBac [23] for insect cells and MultiMam [24] for mammalian expression hosts, as well as several tools for co-expression in plants (reviewed in [25]), demonstrate that these strategies extend to all types of hosts. They can be used both for transient and stable expression, giving molecular biologists the opportunity to select the most appropriate option depending on the features of the target being considered.

These tools are clearly pointing to new trends in bioreagent supply for HTS and drug discovery which will deliver the necessary material to investigate the function of disease targets in a more physiological environment. These targets will not only be the subject of structural studies needed to afford rational design of drugs or to run fragment-based screening campaigns, but also constitute the right target for high throughput assays that should deliver more viable hits than those obtained in the past with isolated targets.

4.4 COMPOUND COLLECTIONS AND CHEMICAL DIVERSITY

As explained in the introduction, the size of corporate compound collections for HTS grew drastically in the late 1990s in parallel with the deployment of combinatorial chemistry and subsequent array synthesis technology. Consequently, compound collections in big pharmas reached a size of more than 1M samples in the early 2000s. Nonetheless, a concern remained among all scientists involved in HTS regarding the diversity of such collections. Having been nurtured mostly by combinatorial chemistry applied to a limited number of chemistry projects (and thus to a few chemical structures), the concern was that the collections were richer in depth than in breadth, i.e., they contained a vast number of variations of a limited number of defined chemical scaffolds (chemotypes). In other words, the collections were poorly representative of the enormous diversity of the chemical space. Although most chemical spaces are too large (above 10^60 molecules) to consider covering with a full representation, having as accurate a depiction as possible, or at least covering the biologically relevant portions of the space, however those might be defined, is the unmet dream of most pharma company chemistry departments. Therefore, most have devoted significant resources to complementary strategies intended to explore novel regions of chemical space while trying to expand the diversity of their collections as much as possible by investing in the synthesis of novel compounds.
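One common way to quantify the “depth versus breadth” concern is to count unique Bemis-Murcko scaffolds per compound, as in the minimal sketch below. It assumes the open-source RDKit toolkit is available and uses a toy set of SMILES purely for illustration; a real collection would be read from an SD file or database export.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# Toy library of illustrative SMILES (not real corporate compounds).
library = [
    "c1ccccc1CCN",              # phenethylamine
    "c1ccccc1CCNC(=O)C",        # same scaffold, decorated
    "c1ccc2[nH]ccc2c1",         # indole
    "c1ccc2[nH]ccc2c1CC(=O)O",  # substituted indole
    "C1CCNCC1",                 # piperidine
]

scaffolds = set()
for smi in library:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        continue
    core = MurckoScaffold.GetScaffoldForMol(mol)
    scaffolds.add(Chem.MolToSmiles(core))

# A low scaffold-to-compound ratio signals a collection that is
# "richer in depth than in breadth".
print(f"{len(scaffolds)} unique Murcko scaffolds for {len(library)} compounds "
      f"(ratio {len(scaffolds) / len(library):.2f})")
```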

The first approach includes several alternatives, all of them sharing rational, structure-driven strategies. Virtual screening, utilizing large in silico compound libraries, has been the most common strategy. It is generally applied in two different forms: ligand-based screening, which tries to find compounds matching the chemical features of known ligands, and structure-based screening, in which the virtual libraries are interrogated for docking fitness to the binding regions of known 3D protein structures. Many in silico libraries are available, some of them particularly extensive like ZINC (http://zinc15.docking.org/), comprising nearly 1 billion different compounds. In recent years, on-demand in silico libraries have been developed applying known and validated chemical reactions to a divergent set of reagents. Because they are based on validated reactions, these libraries offer the advantage of being realistic, i.e., their virtual compounds are synthesizable. With the obvious advantage of generating novel proprietary molecules, many pharmaceutical companies have expanded their internal diversity by leveraging their own synthetic experience and novel building blocks, like the Pfizer Global Virtual Library (PGVL), the Proximal Lilly Collection (PLC) from Eli Lilly, Merck's Accessible Inventory (MASSIV), or Boehringer Ingelheim's BI-Claim [26]. This trend has also reached academic groups, and two good examples of virtual libraries developed in academia are SCOBIDOO [27] and CHIPMUNK [28], the latter being particularly rich in compounds well suited to modulate protein-protein interactions. Finally, chemical vendors have also built on-demand libraries, the most extensive being REAL (https://enamine.net/compound-collections/real-compounds/real-database) developed by Enamine (Kyiv, Ukraine), with nearly 2 billion virtual compounds. Interestingly, these libraries are not only being used for virtual screening in the two classical forms described above, but also in a third approach that takes advantage of the benefits of machine learning (ML): the so-called generative models, which can build new molecules with desired properties based on the continuous vector representation of a set of molecules used to train the algorithm sustaining the model [29].
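A minimal sketch of the ligand-based form of virtual screening described above is shown below: it ranks a toy virtual library by Tanimoto similarity of Morgan fingerprints to a known active, again assuming RDKit. The query and library SMILES are stand-ins, not real campaign data.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Known active used as the query (illustrative structure only).
query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin as a stand-in

# Tiny stand-in for an on-demand virtual library such as ZINC or REAL.
virtual_library = {
    "analog_1": "CC(=O)Oc1ccccc1C(=O)N",
    "analog_2": "OC(=O)c1ccccc1O",
    "decoy_1": "CCCCCCCCCC",
}

fp_query = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)

ranked = []
for name, smi in virtual_library.items():
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    ranked.append((DataStructs.TanimotoSimilarity(fp_query, fp), name))

# Highest similarity first: these would be cherry-picked for purchase
# or physical screening.
for score, name in sorted(ranked, reverse=True):
    print(f"{name}: Tanimoto = {score:.2f}")
```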

Fragment-based screening (FBS) is another alternative strategy to explore the chemical space with relatively small physical compound libraries. This approach was initiated in the early 2000s, when it became evident that compounds in corporate collections were similar in size to drugs, a feature that might be a disadvantage (in addition to other considerations related to their ADME-Tox profile). Even though small fragments might not be endowed with all the properties needed to display high affinity for their targets, it followed that screening them at higher concentrations and using them as starting points could be a valid and fruitful approach [30]. The strategy has a thermodynamic background based on the ligand efficiency concept [31] and also on entropic savings leading to an increase in affinity of several orders of magnitude when combining fragment-sized molecules with different enthalpic contributions into a larger molecule preserving such contributions. This fragment-based approach has been pursued by several companies, and so far there are two drugs on the market that originated from it: Vemurafenib, for the treatment of metastatic melanoma, and Venetoclax, for chronic lymphocytic leukemia. In addition to validating the strategy, these successful cases have encouraged drug discovery scientists to consider FBS as a must-do activity in their projects, particularly when the structure of the biological target is suited for this approach because it contains binding pockets with cavities amenable to binding small fragments. This determination is propelling the evolution of FBS, led by two major drivers. The first is aimed at scoring the suitability of a given target for FBS, and several computational approaches have been developed to that end [32]: this allows focusing on those targets with the highest likelihood of success, avoiding wasting resources on targets not suitable for this strategy. The second relies on the development of new chemical approaches to synthesize novel and easy-to-grow fragments, with particular emphasis on structural diversity and three-dimensionality [33]. Computing tools will be paramount in this regard to mine the interface between chemical synthesis precepts and structural information from the pocketome, as well as to exploit the biological data generated in order to guide and prioritize the synthesis of new fragments [34].
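The ligand efficiency concept referenced above reduces to a one-line calculation, as sketched below; the IC50 values and heavy-atom counts are invented solely to show why a weakly potent fragment can be a more efficient starting point than a larger, more potent HTS hit.

```python
import math

def ligand_efficiency(ic50_molar: float, heavy_atoms: int, temp_k: float = 300.0) -> float:
    """Ligand efficiency in kcal/mol per heavy atom: LE = -dG / HA,
    approximating dG from the IC50 as dG = RT * ln(IC50)."""
    r_kcal = 1.987e-3                                   # gas constant, kcal/(mol*K)
    delta_g = r_kcal * temp_k * math.log(ic50_molar)    # negative for sub-molar IC50
    return -delta_g / heavy_atoms

# Illustrative comparison: a weak fragment versus a potent but large HTS hit.
print(f"fragment: 200 uM, 13 heavy atoms -> LE = {ligand_efficiency(200e-6, 13):.2f}")
print(f"HTS hit:    1 uM, 38 heavy atoms -> LE = {ligand_efficiency(1e-6, 38):.2f}")
```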

While FBS is a useful alternative strategy to increase the diversity of HTS compound collections, the most important revolution in this area has been the development of DNA-encoded library technology (DELT), an approach initiated 15 years ago that has matured over this decade and is now playing a major role in most pharmaceutical companies, delivering successful results as exemplified by the RIP-1 kinase inhibitor GSK2982772, currently undergoing clinical development for psoriasis. The fundamentals of DELT (depicted in Figure 6) are connected to the original concepts of combinatorial chemistry and the “split and pool” procedures capable of generating a large number of molecules using multiple combinations of building blocks, followed by affinity screening of the resultant pools of molecules to identify putative hits. However, since each individual molecule is present in very small amounts, its correct identification is predicated on amplification of the DNA tag that encodes the molecule's recipe or synthetic sequence. Labeling individual building blocks with DNA oligos of different sequences and lengths enabled encoding the resulting molecule in a unique manner (Fig. 6A), thus granting its identification by amplification and sequencing of its DNA tag (Fig. 6B). This identification is unequivocal and highly sensitive, enabling the detection of molecules present in tiny amounts that could not previously be identified by conventional analytical methods. The availability of next-generation DNA sequencing, with its massive capabilities, has increased the throughput of the identification step with respect to the early days of the technology, therefore sharpening and improving the vast potential of DELT in drug discovery.

Figure 6: Fundamentals of DELT. A: split and mix of building blocks and labeling of the resulting combination with unique DNA tags. B: the identification of individual hits in an HTS against an immobilized biological target (from [35] and [36]).

The implementation of DELT provides rapid access to large compound collections (with sizes that may reach billions of compounds) for big and small pharma alike, thus expanding their diversity. Since only minuscule amounts of each compound are necessary, the physical volume of these libraries is significantly smaller than that of conventional compound collections, despite their being orders of magnitude larger. Likewise, the savings in preparing such libraries are huge compared to conventional ones: estimates are that the cost per compound in conventional libraries is 1,000 USD, whereas in DELT libraries it is reduced to 0.02 US cents [37]. This makes DELT libraries available even to academic labs or small startups with budget restrictions. In addition, since each individual molecule is easily identified, libraries are screened in mixtures of compounds, hence fewer assay points are needed to analyze the complete library, resulting in further savings. Furthermore, compounds are tested in binding assays against immobilized targets and, consequently, there is no need to develop functional assays, which can be challenging for some targets. Such functional assays are necessary, however, in a further step to confirm the biological activity of the identified molecule lacking its DNA tag, but since the number of entities to be confirmed will be relatively low it may not be imperative to have the assays configured in high throughput format. As mentioned in section 4.1, another added advantage of DELT is that it fits very nicely with TPD: once a ligand of the protein to be degraded has been identified, it is very simple to replace the DNA tag with the ligase-recruiting sensor. Hence, DELT and TPD seem to be a perfect match for many drug discovery efforts. Unfortunately, failure rates usually reach 50% in the confirmation assays mentioned above, where many putative hits, prepared synthetically without the DNA tag, fail mainly due to possible DNA interference in the primary binding assay or artifactual enrichment due to incomplete chemical synthesis or tag degradation [35]. This is not the only inconvenience brought about by the DNA tag. It also limits the chemical reactions that can be used to prepare target molecules, precluding those reactions requiring harsh conditions where DNA may be labile. In addition, the resynthesis of some individual molecules at large scale, required for subsequent biological tests, could be challenging as some rare building blocks may not be available in sufficient quantity to carry out the synthesis. These are some of the major challenges DELT is currently facing, which future work is expected to address. The catalogue of chemical reactions tolerated by DNA, however, continues to grow as chemists develop alternative conditions to carry out key chemical transformations. This trend will benefit from the addition of enzyme-catalyzed reactions and the incorporation of multifunctional building blocks. Progress is also expected in the DNA-labeling field to make this step more efficient. Finally, ML tools will likely contribute to the design of improved building blocks best suited for this technology.
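The identification step can be pictured as a simple counting exercise over sequencing reads, as in the hedged sketch below. The barcodes, read counts and pseudocount-based enrichment score are all illustrative assumptions; real DEL decoding pipelines are considerably more involved.

```python
from collections import Counter

# Hypothetical sequencing output after affinity selection: one DNA barcode per
# library member (in reality a concatenation of building-block codes).
target_reads = ["AAGT", "AAGT", "CCTA", "AAGT", "GGAC", "CCTA", "AAGT"]
control_reads = ["CCTA", "GGAC", "AAGT", "CCTA", "TTGA", "GGAC"]   # no-target beads

target_counts = Counter(target_reads)
control_counts = Counter(control_reads)
total_t, total_c = sum(target_counts.values()), sum(control_counts.values())

# Enrichment of each barcode relative to the control selection; a pseudocount
# avoids division by zero for barcodes absent from the control.
for barcode, count in target_counts.most_common():
    freq_t = count / total_t
    freq_c = (control_counts.get(barcode, 0) + 1) / (total_c + 1)
    print(f"{barcode}: {count} reads, enrichment ~ {freq_t / freq_c:.1f}x")
```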

Increasing the diversity of corporate compound collections remains an important objective for medicinal chemistry departments, but not the only one. Quality should never be compromised for quantity, and increasing the quality of the collections is equally paramount. Indeed, there has been growing concern about the presence of undesirable compounds (the so-called PAINS: pan-assay interference compounds [38]), so efforts have been undertaken to identify them as early as possible in HTS campaigns. These compounds usually generate false positives due to interference with the assay readouts (e.g., fluorescent compounds or quenchers of emitted fluorescence), but they may also display apparent biological activity due to unwanted, nonspecific mechanisms like aggregation (forming micelles that trap the biological target or its ligands) or promiscuous chemical reactivity, including strong redox potential. Although secondary assays are commonly included within screening programs to filter out these molecules early in the process, the accumulation of data regarding their nature and the appropriate exploitation of such data will help prevent their inclusion in future collections.
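In practice, the published PAINS substructure rules can be applied computationally before compounds ever reach a plate. The sketch below uses RDKit's FilterCatalog for this purpose with two illustrative molecules; whether a given structure is flagged depends on the exact PAINS family definitions, so the example prints the outcome either way.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build a filter catalogue loaded with the published PAINS substructure rules.
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog(params)

# Illustrative molecules: a quinone-type, interference-prone structure
# and a benign drug-like one.
candidates = {
    "suspect": "O=C1C=CC(=O)C=C1",      # p-quinone, a classic interference motif
    "benign": "CC(=O)Nc1ccc(O)cc1",     # paracetamol
}

for name, smi in candidates.items():
    mol = Chem.MolFromSmiles(smi)
    entry = catalog.GetFirstMatch(mol)
    flag = entry.GetDescription() if entry is not None else "no PAINS alert"
    print(f"{name}: {flag}")
```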

All the strategies described above to improve chemical diversity and ultimately discover novel chemical matter have advantages and limitations, and none can be considered a universal panacea. The wisest option for pharmaceutical companies, assuming they are not resource limited, is a multi-pronged approach integrating all these strategies in different proportions depending on the nature of the target being considered. Computational approaches and ML tools will play a significant role not only in improving these strategies, but also in deciding how to combine them for each individual target.

5. AUTOMATION TRENDS

As explained in the introduction chapter, automated platforms and robotic systems played a prominent role in the expansion of HTS at the end of the 1990s. Different instruments from different vendors (e.g., the first versions of liquid handlers, plate readers, plate grippers or robotic arms moving plates from one device to another, and multiple peripheral components specialized in specific tasks like removing plate lids or reading barcodes) were integrated using bespoke drivers and tools enabling communication and coordination among them. This early configuration was soon replaced by bulky workstations, inspired by manufacturing robotics, which included all these functions in a single instrument demanding considerable space in the lab and requiring huge investments.

Over the intervening decades, robotic units have evolved towards more flexible, modular configurations that can be tailored to each laboratory's needs for each specific situation. The large robotic workstations are once again being replaced by small subunits that are easily integrated and interchanged. In addition, in recent years the outsourcing of certain tasks to specialized laboratory automation companies has become a strong trend in the field with the emergence of cloud labs.

5.1 ROBOTS

Budget restrictions and, more importantly, the need to adapt to different project types with very different needs in a continuously evolving environment, are motivating pharma companies to change their investment criteria regarding instrumentation. Large investments in big, static workstations have been replaced by the purchase of smaller instruments, covering different individual tasks, which could easily be moved, interchanged, and utilized flexibly.

Robotic arms are a good example of individual pieces offering flexible integration covering a wide number of activities. In the initial days of HTS, robotic arms were simple plate movers with no degrees of freedom, only capable of transporting plates in a linear motion from one device to another one next to it. Soon thereafter, limited access arms, endowed with 2-4 degrees of freedom, were launched. Those degrees of freedom enabled these instruments to move plates among peripheral instruments arranged in a bidimensional layout. Finally, articulated robotic arms, with 5 or 6 degrees of freedom became available and their flexibility has put them at the heart of many robotic platforms, not only in HTS environments but also in many automated labs.

Figure 7: Examples of articulated arms. A: F7 from Thermo Fisher Scientific (Waltham, MA). B: Spinnaker from the same vendor. C: UR3 from Universal Robots. D: PF400 from Precise Automation.

The first models of these articulated arms presented some inconveniences that, although common to the three types of robotic arms, were more pronounced for articulated arms given their freedom of movement. First, the arms had to be taught the positions of the peripherals and to acquire some spatial orientation. This learning process was complex and usually involved software packages with cumbersome interfaces, making it excessively time consuming and arduous for the average operator. In addition, since the arms are heavy pieces moving at high speed, they pose a serious safety risk, and hence they were confined in closed platforms behind safeguarding barriers. Those barriers impeded flexible interaction of the operator with the HTS system for common tasks like replenishing reagents or fixing errors, so operators were sometimes tempted to circumvent the barriers, violating safety rules and putting themselves at risk. The most recent models of articulated arms (presented in Figure 7) have overcome these issues. They can acquire spatial orientation using a specific teaching command, enabling them to learn on the fly without the need for programming or sophisticated software instructions. More importantly, they are equipped with sensors able to detect the presence of external elements (humans or instruments) in their trajectory and to respond by gently slowing down and eventually coming to a complete stop when such presence is too close. This new generation of articulated robotic arms is commonly known as “cobots” (short for “collaborative robots”), and their use is expanding beyond the HTS arena to other laboratory operations. Indeed, they have become common tools in many laboratories progressively enjoying the benefits of automation, providing a good example of the blurring edges between classical HTS environments and modern automated labs not necessarily intended for screening purposes. Alongside companies that have been active in the field since the early days, new ones have emerged in recent years developing novel instrumentation like the latest generation of robotic arms described above. Companies like Precise Automation (Fremont, CA) or Universal Robots (Odense, Denmark) are some of the leading players in the field, and the recent launch of the Universal Robots UR10, equipped with a dual gripper, clearly demonstrates the progress being made.

In addition to robotic arms and plate readers, liquid handlers are among the most critical pieces of instrumentation in the automated HTS environment. Considering only the technology supporting their operation, seven types of liquid handlers are available: those based on air displacement, positive displacement, peristaltic pumps, capillarity, acoustic technologies, piezoelectric stacks, and solenoid valves. Together they cover a wide range of dispensing volumes, from pL to mL (see Figure 8).

Figure 8: Range of volumes dispensed by the seven different types of liquid handlers, depending on the technology supporting their operation.

However, not all liquid handlers using these technologies are suitable for common HTS tasks like dispensing reagents during assay execution; high prices (e.g., for acoustic, piezoelectric or solenoid-based instruments) or their own designs limit the functionality of some. For instance, acoustic instruments are used exclusively for dispensing compounds, and their high price and the need for special plates capable of transducing the sound waves limit their wider use. Capillarity-based systems are also used for compound distribution by replication of a source plate. Some solenoid-based systems, like the D300e from Tecan (Männedorf, Switzerland), are available at a reasonable price; these incorporate Hewlett Packard digital dispensing technology to deliver volumes as low as 12 pL. Again, this instrument is intended for compound addition, although its high accuracy at such small volumes makes it ideal for preparing concentration-response curves or compound combinations to monitor synergistic effects.

Liquid handlers based on air displacement or positive displacement are the most common types for routine reagent dispensing, together with the popular (and less expensive) peristaltic pumps, which are typically used for bulk additions across the entire plate. The continuous need to save precious reagents dictates the evolution of these instruments, with a two-pronged goal: to minimize dead volumes and to dispense the small amounts of reagents required by highly miniaturized assays. Air displacement liquid handlers have traditionally been the most popular instruments for this purpose because the air cushion between the moving piston and the liquid, dispensed through disposable tips, prevents contamination. However, they show poor accuracy below 2 µL and are severely limited when dealing with viscous solutions. In contrast, positive displacement liquid handlers are liquid-agnostic and highly accurate in the sub-µL range, but they typically use fixed tips that pose a risk of cross-contamination and hence demand washing steps that work against speed and efficiency. In addition, they are usually not amenable to working with 1536-well plates. However, SPT Labtech (formerly TTP Labtech; Melbourn, UK) has recently launched the Dragonfly 2 model (Figure 9), which addresses both issues: it uses disposable tips equipped with plungers coupled to the moving piston, so no cross-contamination occurs, and it is designed to work with 384- and 1536-well plates. On top of that, it requires very low dead volumes (200 µL), so it is perfectly suited for challenging assays requiring highly efficient use of costly reagents.

Figure 9: Dragonfly 2 from SPT Labtech, a positive displacement liquid handler addressing most of the issues associated with this kind of instrument (as detailed in the text). A: Overall view of the instrument. B: Detail of the disposable tips, showing the internal plunger coupled to the moving piston.

In the last few years, new firms have appeared offering liquid handlers (usually air displacement-based) with flexible configurations, intended for HTS-related tasks such as assay development or lead optimization. Opentrons (New York, NY) launched the OT-2 instrument, which runs open-source software enabling customized protocols through its Python-based application programming interface. The configuration of the OT-2 deck is flexible, allowing different peripherals to be assembled. Another example is the Andrew+ pipetting robot (from Andrew Alliance, now part of Waters; Milford, MA), which uses conventional electronic pipettes to perform repetitive pipetting operations. Although these instruments are not genuinely intended for HTS, they can be used in the HTS-related tasks described above and are undoubtedly key players in modern automated labs.
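
To give a flavor of what such Python-driven control looks like, the sketch below follows the general shape of an Opentrons Python Protocol API (version 2) script; the labware definitions, pipette model, deck slots and volumes are illustrative assumptions rather than a validated protocol, and a real bulk addition would more likely use a multichannel pipette or a dedicated dispenser:

```python
# Illustrative OT-2 protocol sketch (Opentrons Python Protocol API v2).
# Labware names, pipette model, deck slots and volumes are assumptions for illustration only.
from opentrons import protocol_api

metadata = {"protocolName": "Reagent addition sketch", "apiLevel": "2.13"}

def run(protocol: protocol_api.ProtocolContext):
    plate = protocol.load_labware("corning_384_wellplate_112ul_flat", 1)
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", 2)
    tips = protocol.load_labware("opentrons_96_tiprack_20ul", 3)
    p20 = protocol.load_instrument("p20_single_gen2", "right", tip_racks=[tips])

    # Bulk addition of one assay reagent to every well of the 384-well plate,
    # reusing a single tip because the source reagent is the same throughout.
    p20.pick_up_tip()
    for well in plate.wells():
        p20.transfer(10, reservoir["A1"], well, new_tip="never")
    p20.drop_tip()
```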

Flexibility has become a key feature of modern HTS laboratories. Instead of fixed instruments and large robotic machines resembling those of traditional factories, the current trend is to use modular systems that are easily interchangeable, so that the HTS lab layout can vary from day to day, adapting to changing demands. Thermo Fisher's InSPIRE™, launched in 2018, was the first such modular platform. It can be configured for a wide variety of workflows and is governed by the Momentum™ Workflow Scheduling Software using touch-based devices. Thermo Fisher is expected to launch SmartCarts by the end of 2021, a system intended to facilitate docking of instruments within the platform and to help reconfigure InSPIRE™ smartly through visual and tactile tools. Likewise, companies like HighRes Biosolutions (Woburn, MA) have specialized in developing modular high-end lab automation systems and instruments; in 2015 they partnered with AstraZeneca to develop and deploy several new modular automated screening systems.

With the growing implementation of the Internet of Things (IoT), novel pieces of equipment are being incorporated into HTS labs and into automated labs in general. Sensors and cameras enabling remote control and surveillance of operations are starting to be used, and machine vision will likely be exploited in the short term for automatic inspection, process control, and robotic guidance. For instance, scientists at the Scripps Florida HTS facility have developed an IoT-based system to continuously monitor liquid dispensing accuracy39. It is based on the weight of the reservoir containing the liquid to be dispensed, which sits on a precision balance connected to a microcontroller that captures the data and sends them to the corresponding database. In addition, the liquid handler is monitored by an infrared sensor that detects when a dispense operation starts and finishes, so data are captured only during dispensing and conveniently recorded for display next to the robotic unit. Dispensing errors can easily be identified and narrowed down to the plate position if alterations are observed in the decreasing weight readout (Figure 10).
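
The data-capture logic behind such a monitor is conceptually simple. The hypothetical Python sketch below (not the published Scripps implementation; the serial port and message format are invented for illustration) shows the core idea of logging balance readings only while the infrared sensor reports that a dispense is in progress:

```python
# Hypothetical sketch of an IoT dispense monitor: log reservoir weight while dispensing.
# Serial port, message format and output file are invented assumptions for illustration.
import csv
import time
import serial  # pyserial; the microcontroller is assumed to send lines like "WEIGHT:123.45;DISPENSING:1"

def monitor(port="/dev/ttyACM0", outfile="dispense_log.csv"):
    link = serial.Serial(port, baudrate=9600, timeout=1)
    with open(outfile, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["timestamp_s", "reservoir_weight_g"])
        while True:
            line = link.readline().decode(errors="ignore").strip()
            if not line:
                continue
            fields = dict(item.split(":") for item in line.split(";"))
            # Record only while the infrared sensor flags an active dispense, so the
            # weight trace corresponds to a single plate being filled.
            if fields.get("DISPENSING") == "1":
                writer.writerow([time.time(), float(fields["WEIGHT"])])

if __name__ == "__main__":
    monitor()
```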

Figure 10: IoT-based monitoring system developed by Scripps Florida HTS scientists for quality control of liquid dispensing during HTS operations. A: Overview of the process, with data being read from a balance, captured by an Arduino microcontroller, stored in a database, and made available for review via a web interface. B: Graph of the decreasing weight readout from the balance over the course of a dispense; the dispenser state (right-hand vertical axis) oscillates between 0 (off) and 1 (on). C: Data captured during a typical HTS process. (From 39.)

Voice assistants are also becoming useful tools in common laboratories, and they will likely be implemented for HTS operations. Companies like LabVoice (Research Triangle Park, NC), LabTwin (Berlin, Germany) and HelixAI (Atlanta, GA) have already launched the first voice assistants for the lab. It will be interesting to see how they evolve, not only for lab operations but also for data handling and connectivity. All these innovations have already been implemented in automated labs and will likely make their way into HTS settings, again demonstrating the connections and blurred boundaries between these two closely related environments.

5.2 SMART AND CLOUD LABS ENVIRONMENTS

As already outlined above, lab automation, pioneered a few decades ago by HTS facilities, has spread beyond the HTS arena to other laboratories that are progressively embracing its benefits. The emergence in the last few years of so-called “cloud labs”, which have learned from the experience of HTS labs, is fostering further innovation in automation technologies. Although they can perform screening tasks, they are not necessarily intended for that purpose and may also be involved in other activities strongly related to HTS such as target validation, assay development, chemical lead optimization and biological lead profiling, to cite a few. HTS settings and cloud labs are therefore strongly intertwined, with the latter focused on running a mix of complex, ever-changing protocols rather than high-speed execution of a single fixed protocol.

Cloud-based platforms promote accessibility, usability, and sustainability. They also offer vast improvements in experimental reproducibility, a matter of concern in modern science. A recent article40 shows that nearly 40% of the results published in selected papers from high-impact journals like Nature or Science failed to be replicated when the studies were repeated by independent groups. An earlier report in Nature41 showed that 70% of scientists have failed to reproduce data generated by others and, even worse, more than 50% have failed to reproduce their own work. This lack of reproducibility not only has negative economic consequences (obvious in the drug discovery world, since wrong data leading to wrong decisions can cause huge late-stage losses) but also damages society by eroding credibility and increasing distrust in science. Setting aside rare cases of fabrication and falsification, poor reproducibility can often be traced to human error, either in “wet” experimental tasks or in data handling and processing, or to vague and incomplete documentation of experimental methods. Moral hazard also plays a role, particularly under the perverse incentives often found in academic settings, the pressure to publish being among the most important. It follows that minimizing human interventions and replacing them with robotic actions that follow experimental methods described in executable code will reduce these errors and improve reproducibility.

In addition to reducing human error, transparency is another feature of cloud labs that fosters reproducibility. Because protocols are written as executable code, when data become public so do those protocols, enabling their execution by others using the same instruments and materials. Likewise, data reside in the cloud in compliance with the FAIR (findable, accessible, interoperable and reusable) principles. Consequently, published data can become fully auditable, as permitted by the originating scientist depending on confidentiality constraints.

Beyond improved reproducibility, lower costs, more rapid experimental progress, more methodical troubleshooting, and higher operating efficiency are driving the growth of cloud labs. A recent analysis by Emerald Cloud Lab (ECL; San Francisco, CA), one of the pioneering cloud lab companies, compares the costs for a startup running its experiments in a cloud lab with those of investing in its own laboratory facilities and instrumentation. The results are clear: costs per experiment can be more than 4-times lower when run in a cloud lab, and savings due to more rapid milestone achievement may exceed 70%42. This cost analysis underscores one of the most attractive aspects of robotic cloud labs: access to millions of dollars' worth of lab equipment on a subscription basis scaled to utilization levels, without having to bear the capital expense. In addition, these robotic labs can operate 24/7/365, enabling unprecedented efficiency. Teams of scientists distributed around the world are already analyzing their results and designing new experiments during normal working hours, dispatching those protocols to run overnight in a non-stop production cycle. Finally, remote 24/7/365 operation gives scientific teams great flexibility, making their productivity insensitive to interruptions like the COVID-19 lockdowns.

Figure 11: Emerald Cloud Lab facilities in San Francisco, CA (image from https://www.emeraldcloudlab.com/).

Founded in 2015, ECL is the pioneering company in the robotic cloud lab field. Strateos (Menlo Park, CA) emerged afterwards, following the merger of Transcriptic with 3Scan. ECL and Strateos offer vast facilities equipped with state-of-the-art robotic instrumentation covering diverse scientific disciplines including organic synthesis, analytical chemistry, and biochemistry. ECL is expanding its portfolio to include microbiology and cell culture in the short term. Other companies are focused on specific disciplines, like Synthego (Redwood City, CA) in genomic engineering and Arctoris (Oxford, UK) in drug discovery, combining HTS and AI approaches. As stated above, the flexibility of these robotic cloud labs makes them suitable to run many tasks involved in drug discovery that complement HTS, so it seems foreseeable that they will play a prominent role in the field within the next few years.

6. DATA SCIENCES AND INFORMATION TECHNOLOGIES

As mentioned in chapter 3, at the dawn of HTS in the 1990s data analysis was performed using customized systems developed within each company. As the amount of data grew exponentially, more robust commercial LIMS platforms became available. ActivityBase® (developed by IDBS; Guildford, UK) was one of those LIMS platforms used by many companies to capture, process, analyze and store their HTS data. Since then, other products with improved performance have appeared on the market.

Screener® (from Genedata; Basel, Switzerland) is currently one of the most popular products. It enables users to optimize experiments and data analyses without wasting time on lengthy data capture, processing, and management tasks. One of the Screener® features most appreciated by its users is its protocol development capability, which is more versatile and user friendly than that of other platforms. Data visualization is integrated in the software without the need for separate interface development; its sophisticated data viewers allow users to assess the quality of available data across complete campaigns and determine improvement strategies with advanced data analysis algorithms. Likewise, it seamlessly integrates the whole screening workflow, including compound management, without the need to combine multiple tools. In addition, the product is easy to integrate into corporate IT environments.

Dotmatics Studies® (from Dotmatics; Bishops Stortford, UK) is one of the newest tools for HTS data analysis and management. Its features extend beyond HTS to DMPK studies, integrating all the data in a single suite. Besides supporting very high data volumes and providing out-of-the-box and custom protocols and analyses (including plate- and non-plate-based assays), it allows manual or automated processing, a flexibility that is highly appreciated by customers. Another important advantage lies in its collaborative nature, including cloud-hosted deployments supporting external collaborations. Dotmatics also offers other tools like Register® for compound registration and Vortex® for data visualization and mining, all of them easily integrated with Studies®.

Tools for data visualization like Spotfire® (from Tibco; Palo Alto, CA) or those developed by Tableau (Seattle, WA) are also critical in the HTS process. It is therefore common for many software tools, often from different vendors, to end up in the HTS data workflow. Proper integration of all those systems into a common network becomes imperative to prevent experimental data sets from sitting in independent silos, accumulating data that ultimately becomes unusable. Needless to say, such integration must be automatic, avoiding manual data entry and manipulation steps that would put data integrity at risk. TetraScience (Boston, MA) offers cloud-based solutions to integrate all these tools. In addition, it harmonizes and transforms the data, unifying data structures in a centralized platform that makes the data available for data science and analytics such as AI, hence favoring compliance with the FAIR guidelines (Figure 12).
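
As a toy illustration of what such harmonization involves (this is not a depiction of any vendor's actual platform, and the field names and records are invented), the sketch below maps raw rows from two hypothetical plate readers onto a single tidy, analysis-ready structure of the kind that downstream data science and AI tools can consume:

```python
# Toy sketch of data harmonization: map instrument-specific fields onto one unified schema.
# The field names and example records are invented for illustration.
from dataclasses import dataclass, asdict
import json

@dataclass
class WellResult:                     # unified, analysis-ready record
    campaign_id: str
    plate_barcode: str
    well: str
    readout: float
    readout_unit: str
    instrument: str

# Each source system gets a small adapter that knows its own column names.
def from_reader_a(row: dict) -> WellResult:
    return WellResult(row["Campaign"], row["Barcode"], row["Well"],
                      float(row["RFU"]), "RFU", "reader_A")

def from_reader_b(row: dict) -> WellResult:
    return WellResult(row["study"], row["plate_id"], row["well_position"],
                      float(row["signal"]), row["unit"], "reader_B")

raw_a = {"Campaign": "HTS-001", "Barcode": "PB0001", "Well": "A01", "RFU": "5321"}
raw_b = {"study": "HTS-001", "plate_id": "PB0002", "well_position": "B12",
         "signal": "0.87", "unit": "OD"}

harmonized = [from_reader_a(raw_a), from_reader_b(raw_b)]
print(json.dumps([asdict(r) for r in harmonized], indent=2))
```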

Figure 12: Scheme of the integrated solutions provided by TetraScience to favor adherence to the FAIR guidelines. A: Typical data workflow with data silos generated by independent systems using manual data entry and manipulation. B: Improved data workflow using TetraScience integrative solutions (images from https://www.tetrascience.com).

One of the major issues associated with these informatics systems is the difficulty that small enterprises and academic institutions have in accessing them due to budget limitations. Several factors are at play here. The first is the mindset of upper management, which very often considers these tools “useful” rather than “absolutely necessary”, so their acquisition falls low on the scale of priorities. In addition, these systems demand periodic updates and support, so they are perceived as a continuous source of expenditure. Finally, public funding for these informatics solutions is not as straightforward to obtain as for other investments, e.g., in instrumentation or personnel. Agencies usually demand extensive and carefully built justifications for investing in software tools, again owing to a misinformed mindset that considers them accessories rather than pillars on which to build the knowledge that will support future growth.

The critical role of data management and the need for an optimized data workflow have been highlighted by many experts in recent times. The blossoming of AI and ML approaches has made evident the absolute need for model-quality data fulfilling the FAIR principles mentioned above. Early implementation of tools like the ones described is critical for adopting data and process standards, as well as for harmonizing and optimizing processes, thereby avoiding costly corrections in the future. It is estimated that in the E.U. alone the cost of data wrangling in R&D environments (i.e., the process by which data are identified, extracted, cleaned and integrated to yield a data set suitable for exploration and analysis) exceeds 28 billion USD per year. The economic impact of poor data management in the HTS field, where massive amounts of data are generated, is therefore significant. There is doubtless an urgent need to change old mindsets and ensure proper implementation of data management strategies in scientific organizations as a way to raise the value of their most important scientific asset: experimental results and data.

7. SOCIAL COMMITMENT IN HTS

A notable aspect of the recent evolution of HTS is the growing awareness among practitioners of the negative environmental externalities generated by the disposal of vast amounts of consumables, along with a desire to collaborate more frequently with less economically resourced groups. The corporate goal is to improve the long-term sustainability of HTS, as well as the reputation of pharmaceutical companies, in the face of mounting societal pressures.

7.1 PUBLIC-PRIVATE PARTNERSHIPS

Given the high costs associated with HTS, the last decades have seen the emergence of many partnerships between pharmaceutical companies and academic groups. While the former contribute costly instrumentation and drug discovery know-how, the latter contribute deep knowledge of the biology of the disease being pursued. These public-private partnerships (PPPs) encompass the whole drug discovery process, from target validation to clinical trials. The approach was pioneered as early as 1999 by the Medicines for Malaria Venture (MMV) initiative. The MMV example was followed in the U.S. by the Critical Path Initiative (CPI), launched in 2004, and the Accelerating Medicines Partnership (AMP), launched in 2014, and in the E.U. by the Innovative Medicines Initiative (IMI), launched in 2008. Many HTS collaborations have been forged under the IMI and CPI umbrellas. Before some of these efforts, other PPP initiatives had bloomed, like the Molecular Libraries Program (MLP), launched in 2003 by the National Institutes of Health (NIH), which funded a U.S.-wide screening center network between 2004 and 2013 targeting chemical probe development. Indeed, the MLP gave rise to the PubChem BioAssay database mentioned in section 4.1, intended to archive the HTS data subsequently generated.

Several examples of PPPs for HTS exist on both sides of the Atlantic. In Europe, the paradigmatic PPP is the European Lead Factory (ELF), founded in 2013 under the IMI auspices (https://www.europeanleadfactory.eu/). The ELF is a pan-European consortium composed of seven pharmaceutical companies and several academic groups and SMEs. It has two major components: the ESCulab Compound Collection (ECC) and the European Screening Center (ESC).

The ECC is an evolution of the seminal Joint European Compound Library, which ran from 2013 to 2018 and grew to 500K compounds endowed with an attractive physicochemical profile (average MW of 350 Da and logP of 2-3) and predicted to show activity against a diverse array of biological targets43. In addition, these compounds extend into chemical space not previously accessible, as they show a higher 3D character (fraction sp3 >0.4 for 86% of the library core) and include structurally distinct scaffolds (Tanimoto coefficient <0.2 for intercollection similarity)44. As of 2021, the ECC includes 535K compounds, with nearly 300K coming from large pharma and chemistry companies such as Grünenthal (Aachen, Germany), Servier (Suresnes, France) or Bayer (Leverkusen, Germany), and more than 200K from academic institutions with proven expertise in biological chemistry or library design, as well as from SMEs that are leaders in the field of contract chemistry services. These compounds are available at no cost to SME and academic researchers, as explained below. The ESC is composed of three units: the Compound Management unit located at Newhouse (UK); the Hit Characterization unit based in Dundee (UK) (both operated by BioAscent, a private company based in Newhouse, UK); and the HTS unit, operated by the Pivot Park Screening Centre in Oss (Netherlands) (Figure 13). The whole ESC ecosystem is managed and coordinated by Lygature (Utrecht, Netherlands), a private company focused on providing partnership management to foster innovation in drug discovery and medical technology.
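
Descriptors like those quoted above (molecular weight, logP, fraction of sp3 carbons and Tanimoto similarity between structures) can be reproduced with open-source cheminformatics tools; the short RDKit sketch below, run on two arbitrary example structures rather than ELF compounds, shows how such a library profile is typically computed:

```python
# Minimal RDKit sketch of the library-profiling descriptors quoted in the text.
# The two SMILES are arbitrary examples, not ELF compounds.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Descriptors, rdMolDescriptors

smiles = ["CC(C)Cc1ccc(cc1)C(C)C(=O)O",      # ibuprofen
          "O=C(NC1CCCCC1)C1CCCO1"]           # invented, more saturated example

mols = [Chem.MolFromSmiles(s) for s in smiles]
for s, m in zip(smiles, mols):
    print(s,
          "MW=%.0f" % Descriptors.MolWt(m),
          "cLogP=%.1f" % Descriptors.MolLogP(m),
          "Fsp3=%.2f" % rdMolDescriptors.CalcFractionCSP3(m))

# Pairwise Tanimoto similarity on Morgan (ECFP4-like) fingerprints, the kind of
# metric used to judge how structurally distinct two scaffolds or collections are.
fps = [AllChem.GetMorganFingerprintAsBitVect(m, radius=2, nBits=2048) for m in mols]
print("Tanimoto similarity =", round(DataStructs.TanimotoSimilarity(fps[0], fps[1]), 2))
```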

Figure 13: Laboratory at the Pivot Park Screening Centre in Oss, Netherlands (image from https://www.ppscreeningcentre.com/).

Academic and SME researchers can apply to access the services provided by the ELF if they have a therapeutic target and an HTS-friendly assay available. Successful applicants run the HTS at no cost, the exercise being funded by the IMI. At the end of the screening campaign, the participant receives a “Qualified Hit List” (QHL) of up to 50 compounds and is granted exclusive access rights for 3 years to exploit the screening results included in the QHL; these access rights include resynthesis of the most promising compounds. In the 2013-2018 period, 154 proposals covering 8 target classes in 7 therapeutic areas were submitted to the ELF, 88 of which were accepted, leading to the execution of 72 HTS campaigns with their corresponding follow-up work45.

In addition to the ELF, EU-OPENSCREEN (https://www.eu-openscreen.eu/) was founded in 2018 as a non-profit research infrastructure by a group of European countries (Czech Republic, Denmark, Finland, Germany, Latvia, Norway, Poland and Spain) with the support of the European Commission. It provides access to the European Chemical Biology Library (comprising 100K diverse compounds), assay development and screening facilities, medicinal chemistry and informatics platforms, and associated supporting facilities for protein production, cell line generation, computational and structural biology, and structure-based drug design. Academic institutions are the main players in EU-OPENSCREEN, with the support of SMEs and industrial partners like AstraZeneca. Compound structures and primary screening data are made public in the European Chemical Biology Database (ECBD), although there is an option for a three-year grace period after a patent application is filed or a manuscript is published.

In the U.S., the NIH fosters PPPs under the AMP and MLP programs described above, similar to those mentioned in Europe. In addition, many pharmaceutical companies have created their own hubs for PPPs, like Pfizer's Centers for Therapeutic Innovation. And many public institutions possess state-of-the-art HTS facilities, Scripps Research (previously known as The Scripps Research Institute) being the most paradigmatic case, offering a promising scenario for fruitful PPPs.

There has been some debate about how best to assess the performance of PPPs. Given the lengthy nature of the drug discovery process, it is difficult to define a single outcome without underestimating the beneficial impact of intangibles such as knowledge transfer, ongoing collaborations, patents, spin-off companies and educational benefits. Specific indicators will have to be developed to evaluate such performance and to guide the creation of new partnerships in the future.

7.2 SUSTAINABLE SCREENING

As mentioned above, HTS consumes vast amounts of consumables, from chemical and biological reagents to plastic labware. Scientists involved in HTS have been concerned about this from the very early days and, indeed, the huge miniaturization efforts carried out since then have delivered not only clear economic savings but also less environmental waste. These initiatives also led to the progressive abolishment of radioactivity-based assays and their replacement by fluorescence-based ones. The trend towards miniaturized assays has not stopped and is still being fostered by advances in liquid handlers, as explained in section 5.1.

Although miniaturization delivers significant savings in reagents, it has little impact on the consumption of plastic labware, especially tips and microplates. As of today, plastic cannot be replaced by any other material in HTS tasks. However, the intensive use of plastic, and the interest in finding alternatives, is not an endeavor exclusive to HTS, as it affects our society as a whole. New biodegradable plastic surrogates are becoming available for everyday objects: stone paper (made of calcium carbonate) is replacing plastic in supermarket bags; bioplastic (made of waste products from corn production) is used to produce bottles and food packaging; and a combination of casein and clay yields a sturdy polymer used in packaging containers. Unfortunately, the requirements for plastic surrogates to be used in HTS are much more stringent than for consumer products. Among other properties, the material must be strongly resistant to a variety of chemicals, from organic solvents like DMSO to the salts and chemicals used in buffers, and even to the harsher substances used to develop optical signals in some assays. In addition, materials must be biocompatible and amenable to cell culture, and must show low (or no) binding to avoid assay interference. Likewise, the materials cannot interfere with common readouts like fluorescence, absorbance or luminescence, avoiding spurious effects such as quenching, crosstalk between wells, or light diffraction, to cite a few. And, needless to say, they must be mechanically robust enough to be handled by robotic arms and liquid handlers. Despite all these challenges, it is not inconceivable that sustainable biopolymers for HTS will be developed soon, and initiatives within the HTS community to foster their discovery would be welcome.

While these materials are being sought, HTS scientists should try to reduce their environmental footprint. As mentioned above, miniaturization helps, since each move from 96- to 384- to 1536-well plates reduces the number of plates roughly 4-fold (about 16-fold overall), as illustrated in the short sketch below. Unfortunately, this does not apply to plastic tips, since the number of wells to be dispensed remains the same. Designing assays with the fewest possible addition steps (such as single-addition assays) would have a positive impact, but single-addition assays are uncommon and not possible for many assay modalities (in practice they are only feasible for binding assays based on ligand displacement, provided the kinetics of ligand dissociation are acceptable for that purpose). Grenova (Richmond, VA) has recently launched TipNovus®, a tip washing instrument intended to recycle plastic tips for multiple uses. TipNovus® (Figure 14) comes in two configurations: TipNovus®, capable of washing 4 tip racks per wash-dry cycle and processing 16-24 racks per hour, and TipNovus® Mini, which handles one tip rack per cycle and processes 6-10 racks per hour. While the former is intended as a standalone instrument, though fully integrable with robotic arms such as the cobots described in section 5.1, the latter is suited for robotic integration with liquid handlers.
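
To make the plate arithmetic referenced above explicit, the short sketch below counts the plates needed to screen a hypothetical one-million-compound collection in single point at the three common densities (illustrative numbers only, ignoring control wells):

```python
# Illustrative arithmetic: plates needed to screen a hypothetical 1M-compound library
# in single point at different plate densities (control wells ignored for simplicity).
import math

library_size = 1_000_000          # assumed collection size
for wells_per_plate in (96, 384, 1536):
    plates = math.ceil(library_size / wells_per_plate)
    print(f"{wells_per_plate:>4}-well plates: {plates:,}")

# Prints 10,417 plates at 96-well density, 2,605 at 384 and 652 at 1536:
# each step cuts plate consumption roughly 4-fold, about 16-fold overall.
# The number of tips, in contrast, tracks the number of wells and does not shrink.
```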

Figure 14: TipNovus®, a tip washer instrument from Grenova, integrated with a robotic arm (image from https://grenovasolutions.com/).

The TipNovus® instruments perform high-pressure washing and ultrasonic cleaning using the cleaning solution of choice (Grenova also offers proprietary detergents). Tip agitation improves the cleaning process and facilitates the final drying step, while UV sterilization and temperature control optimize tip drying. Although many scientists will undoubtedly have to change their mindset to accept tip recycling, and thorough validation in customer labs will be needed to ensure that no cross-contamination occurs (in addition to the validation work already performed by Grenova), tip washing looks like a sensible and useful way to reduce tip consumption and minimize plastic waste. Indeed, this technology was widely implemented in 2020, when the COVID-19 pandemic and the need to perform massive PCR analyses pushed many labs to acquire tip washers to avoid supply chain shortages. According to Grenova figures, more than 730 million pipette tips were washed and recycled with TipNovus® in 2020, and Grenova projects that this will reach close to 2.5 billion in 2021 and exceed 6.5 billion in 2022. It is therefore conceivable that similar solutions to recycle microplates and other laboratory plasticware like tubes may appear in the future, leaving Grenova in an excellent position to provide such technology: indeed, Grenova has recently announced the launch of the first microplate washer (Purus®) to wash and recycle 96-well plates.

In addition to reducing reagent and plasticware consumption, other initiatives must be considered to lower the environmental footprint of HTS. Energy saving is one of the most substantial, making the selection of appropriate equipment paramount. Vendors and manufacturers must be prepared for a likely scenario in the not-so-distant future in which their instruments will need to be certified for optimized energy use, similar to what happened with common household appliances. Likewise, the robotic cloud labs discussed in section 5.2 may play a beneficial role in this regard, since centralizing activities in shared labs will consume significantly less energy and have a lower environmental footprint than carrying out all the activities in different labs scattered across the globe. Initiatives like the “My Green Lab” certification (https://www.mygreenlab.org/) are intended to promote sustainability, and HTS labs will likely soon be expected to pursue such accreditation.

8. A LOOK TO THE FUTURE: AI AND HTS

Artificial Intelligence (AI) and Machine Learning (ML) tools are increasingly being applied to diverse areas within drug discovery. Exploitation of preclinical and clinical data to improve clinical trial design and outcomes, management of toxicology data from preclinical models, and mining of preclinical data to select the best candidate to progress to the clinic are areas where AI and ML development is currently intensely focused. In the very early stages of drug discovery (i.e., those related to the identification of the seminal molecules, the “hits”, that modulate a given therapeutic target), AI and ML approaches focus either on the design of novel molecules exploring the features of potential binding sites in known therapeutic targets (in silico drug generation) or on the exploitation of pharmacology data from known molecules to repurpose them for novel therapeutic applications, including the design of analogues with improved features. Automated image analysis, of clear application in phenotypic screening, can be applied to the analysis of dose-response curves to predict pharmacological behaviors, contributing to the selection of the best candidates. Likewise, the use of AI algorithms to improve the performance of virtual screening is already in place, having first been introduced in 201046. But the direct application of AI/ML tools to improve the efficiency of classical, “wet” HTS is still in its infancy, far from being fully exploited. While it is clear that the power of AI/ML will lead to significant improvements in HTS, precisely how this will happen remains to be seen.

As outlined in chapter 3, as well as in section 4.1 when discussing ion channel assays, iterative screening (also known as “stepwise screening”) is a sensible strategy that several companies have recently been considering. It consists of randomly selecting a small subset of compounds from the global screening collection and then using the screening results from this subset to select, via ML algorithms, the next subset predicted to have the highest probability of delivering hits. The new subset is tested in the same assay and the process is repeated several times in a continuous learning loop, with each step giving a higher hit rate. After several iterations, the algorithm should be trained well enough to predict with high accuracy which compounds within the remaining untested collection will be active (Figure 15).

Figure 15: A diagram illustrating the fundamentals of iterative (or stepwise) screening. A subset of the compound collection is selected for screening, and the actives identified (red dots) are used by a suitable algorithm to predict which compounds in the remaining collection will be active. This prediction is used to build a second set, which undergoes the same process to refine the selection of compounds. The process is repeated n times, so that at the end of the nth round the system is able to accurately predict the actives in the whole collection.

Although this strategy was mentioned in 2009 by Mayr and Bojanic47, the first publication describing data from an iterative screening exercise came from Novartis in 201648. The authors tested the approach through a retrospective analysis of 34 different HTS assays. They concluded that by screening only 1% of the collection iteratively they were able to identify compounds from the top 0.5% of actives (according to the historical data obtained when the complete HTS had been run on those 34 assays), with the majority of the compounds selected in most assays being among the top 5% of actives. These results showed for the first time the potential of iterative screening with real data. Since that publication, other retrospective studies have confirmed the usefulness of iterative screening and compared algorithms for the purpose. One of the most recent publications49 retrospectively analyzed HTS data from PubChem and demonstrated that machine learning models that can be run on a desktop (e.g., random forest, RF) can deliver excellent results: screening only 35% of the collection in three iterative cycles returned 70% of the active compounds that had been identified previously by screening the complete collection. The percentage of actives recovered increased steadily with either the number of iterations or the fraction of the collection screened, with RF always delivering the best results among the algorithms tested.
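
For readers curious about what the core of such a loop looks like, the sketch below is a simplified, hypothetical version of one iterative-selection cycle, assuming RDKit Morgan fingerprints as compound descriptors and a scikit-learn random forest as the model; the `smiles` list and `assay()` function are placeholders for the collection and the “wet” screening step, and production systems add batch-size tuning, diversity constraints and probability calibration:

```python
# Simplified, hypothetical sketch of iterative (stepwise) screening.
# `smiles` is the collection; `assay(i)` is a placeholder returning an active/inactive
# label (1/0) for compound i once it has been physically screened.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def fingerprints(smiles, n_bits=1024):
    X = np.zeros((len(smiles), n_bits), dtype=np.int8)
    for row, s in enumerate(smiles):
        fp = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s),
                                                   radius=2, nBits=n_bits)
        arr = np.zeros((n_bits,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)
        X[row] = arr
    return X

def iterative_screen(smiles, assay, n_rounds=3, batch=1000, seed=0):
    X = fingerprints(smiles)
    rng = np.random.default_rng(seed)
    tested = list(rng.choice(len(smiles), size=batch, replace=False))  # random starting subset
    labels = [assay(i) for i in tested]            # assumes actives and inactives both appear
    for _ in range(n_rounds):
        model = RandomForestClassifier(n_estimators=200, random_state=seed)
        model.fit(X[tested], labels)
        untested = np.setdiff1d(np.arange(len(smiles)), tested)
        scores = model.predict_proba(X[untested])[:, 1]      # predicted probability of activity
        picks = untested[np.argsort(scores)[::-1][:batch]]   # most promising untested compounds
        labels += [assay(i) for i in picks]                  # "wet" screen only this small batch
        tested += picks.tolist()
    return tested, labels
```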

Beyond these retrospective studies, no papers have yet been published on iterative screening of novel targets that had not previously been screened conventionally. Given the excellent accuracy demonstrated by the retrospective analyses, iterative screening promises to be an elegant strategy that will not only deliver significant savings in reagents and consumables (and hence improved HTS sustainability) but also enable the screening of difficult targets that can only be tested using cumbersome, non-HTS-friendly assays discarded in the past. It therefore seems just a matter of time before prospective results appear in the literature, once the commercially sensitive nature of such campaigns and normal publication delays are worked through.

It is tempting to speculate how iterative screening will evolve and shape HTS in the future. It does not seem unreasonable to believe that deep learning and transfer learning will give rise to new algorithms requiring fewer iterative cycles. Such algorithms may exploit not only the chemical information from the compounds but also structural information from the protein target and the likelihood of interaction with specific domains. Virtual and “wet” HTS will likely converge at this point. It should be noted that the 3D structures of a significant number of target proteins are already deposited in the public domain, and the recent deployment of tools like AlphaFold from DeepMind (London, UK) has made available the predicted structures of more than 350K proteins, including 98.5% of the known human proteome50, making such information fully exploitable. In an ideal scenario, the process would be limited to a single screening cycle with a unique compound set (Figure 16). Such a compound set could be used for all the targets being screened as long as it fulfills several requirements. First, the set should be highly representative of the full compound library to ensure that the results obtained are highly predictive of those expected with the whole collection. Second, it must display a high degree of chemical diversity to feed the algorithm with the widest possible chemical information. It would also be desirable for the compounds included to be synthetically accessible, to ensure easy and continuous replenishment and avoid shortages of certain chemotypes. The compound set must have a balanced size, a compromise between being large enough to ensure diversity and small enough to minimize the screening effort. Needless to say, it must be richly annotated to improve the algorithm's proficiency. Finally, it must be expandable to keep up with growth and changes in the compound collection.
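
Building such a representative yet diverse subset is itself a well-studied cheminformatics problem. The sketch below illustrates one common approach, MaxMin diversity picking over Morgan fingerprints as implemented in RDKit; `collection_smiles` stands in for the full (physical or virtual) collection, and treating diversity alone as a proxy for representativeness is a deliberate simplification:

```python
# Sketch: selecting a diverse subset of a compound collection with RDKit's MaxMin picker.
# `collection_smiles` is an assumed input holding the SMILES of the full collection.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

def diverse_subset(collection_smiles, subset_size, seed=42):
    mols = [Chem.MolFromSmiles(s) for s in collection_smiles]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, radius=2, nBits=2048) for m in mols]

    def distance(i, j):
        # 1 - Tanimoto similarity: MaxMin greedily maximizes the minimum pairwise distance
        return 1.0 - DataStructs.TanimotoSimilarity(fps[i], fps[j])

    picker = MaxMinPicker()
    picked = picker.LazyPick(distance, len(fps), subset_size, seed=seed)
    return [collection_smiles[i] for i in picked]
```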

Figure 16: A hypothetical future scenario where AI tools may enable the use of a unique, reduced compound set fulfilling several requirements in a single screening process to predict the outcome of screening the full corporate collection. Target structural information can also be exploited for a more accurate prediction of the outcome.

It should not go unnoticed that, since only a subset of the compounds will be screened, there will be no need for a dedicated “screening library” of millions of samples. Instead, the subset should be representative of the whole corporate collection, i.e., the former screening library (which would then be devoted to confirmatory assays) as well as compounds from all the historical medicinal chemistry programs. Indeed, such a reduced set would become the new screening library. Furthermore, in an extreme evolution of this scenario, there may be no need for physical collections at all, since virtual libraries would be used to nurture the compound set, provided that all compounds in such virtual collections are synthetically accessible. As mentioned in section 4.4, many recently developed virtual libraries fulfill the requirements described in the previous paragraph and appear well suited for this purpose. This will definitively blur the lines between virtual and physical HTS, intertwining both activities. In such a scenario, the most successful “hit hunters” would likely be the companies with access not only to the widest and most diverse corporate collections (virtual and physical) but, more especially, to the best training set: the cheminformatics tools and criteria used to build the training set will therefore be paramount for success in HTS.

Nonetheless, as stated at the beginning of this chapter, the paragraphs above are speculation based on current trends and available results. Other tools and strategies may appear in the coming years that will define the precise role of AI in screening, which will ultimately influence (if not dictate) the future of HTS.

9. CONCLUSIONS

Decades after its emergence in the pharmaceutical industry, HTS has reached a degree of maturity that keeps it a key piece of the drug discovery process. That HTS is in good shape is demonstrated by recent studies projecting that the global HTS market will be worth 26.4 billion USD by 2025, with a compound annual growth rate (CAGR) of 11.5% over the next five years51. The innovations described in this document are joining the traditional tools of the discipline to continuously improve the efficiency of the process. These new tools are helping to create novel scenarios far from dogmatic principles, combining multiple plans in a single strategy in order to find the most relevant molecules, even for targets traditionally considered intractable. Approaches once run in parallel (e.g., virtual screening, fragment screening and HTS) are nowadays combined in a convergent strategy, with each approach nurturing the others for improved efficiency. The increasing involvement of academic institutions interacting with private companies is helping incorporate new ideas in a cross-fertilized environment, which will foster the discovery of novel drugs for diseases of low commercial return but considerable social impact. Cloud labs will provide new models of collaboration between groups and will help reduce experimental costs for startups and small companies, accelerating the blossoming of novel initiatives and discovery programs that will eventually deliver innovative drugs. New AI-based tools will benefit from FAIR data to fully exploit the potential of HTS and improve its efficiency at the lowest possible expense and with the minimum environmental impact. Overall, the future of HTS looks even more promising than it appeared two decades ago during the hype created at its origins, and hopefully its contribution to the discovery of novel medicines will continue to be important in the years to come.

10. BIBLIOGRAPHY

1- le Sage C, Lawo S, Cross BCS (2020) CRISPR: A Screener's Guide. SLAS Discov. 25: 233-240
2- Harris SR, Garlick RK, Miller JJ Jr, Harney HN, Monroe PJ (1991) Complement C5a receptor assay for high throughput screening. J. Recept. Res. 11: 115-128
3- Burch RM, Kyle DJ (1991) Mass receptor screening for new drugs. Pharm. Res. 8: 141-147
4- Maloff BL, Delmendo RE (1991) Development of high-throughput radioligand binding assays for interleukin-1 alpha (IL-1 alpha) and tumor necrosis factor (TNF-alpha) in isolated membrane preparations. Agents Actions 34: 132-134
5- Pereira DA, Williams JA (2007) Origin and evolution of high throughput screening. Br. J. Pharmacol. 152: 53-61
6- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Deliv. Rev. 46: 3-26
7- Venter JC, Adams MD, Myers EW, et al. (2001) The sequence of the human genome. Science 291: 1304-1351 (Erratum in Science (2001) 292: 1838)
8- Macarron R, Banks MN, Bojanic D, et al. (2011) Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10: 188-195
9- Wills TJ, Lipkus AH (2020) Structural Approach to Assessing the Innovativeness of New Drugs Finds Accelerating Rate of Innovation. ACS Med. Chem. Lett. 11: 2114-2119
10- Santos R, Ursu O, Gaulton A, et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 16: 19-34
11- Minta A, Kao JP, Tsien RY (1989) Fluorescent indicators for cytosolic calcium based on rhodamine and fluorescein chromophores. J. Biol. Chem. 264: 8171-8178
12- Hansen KB, Bräuner-Osborne H (2009) FLIPR assays of intracellular calcium in GPCR drug discovery. Methods Mol. Biol. 552: 269-278
13- Schuetz DA, de Witte WEA, Wong YC, et al. (2017) Kinetics for Drug Discovery: an industry-driven effort to target drug residence time. Drug Discov. Today 22: 896-911
14- Comley J (2014) Automated patch clamping finally achieves high throughput! Drug Discov. World 15: 45-56
15- Sakamoto KM, Kim KB, Kumagai A, et al. (2001) Protacs: chimeric molecules that target proteins to the Skp1-Cullin-F box complex for ubiquitination and degradation. Proc. Natl. Acad. Sci. U S A. 98: 8554-8559
16- Schapira M, Calabrese MF, Bullock AN, Crews CM (2019) Targeted protein degradation: expanding the toolbox. Nat. Rev. Drug Discov. 18: 949-963
17- Beeman K, Baumgärtner J, Laubenheimer M, et al. (2017) Integration of an In Situ MALDI-Based High-Throughput Screening Process: A Case Study with Receptor Tyrosine Kinase c-MET. SLAS Discov. 22: 1203-1210
18- Scholle MD, Liu C, Deval J, Gurard-Levin ZA (2021) Label-Free Screening of SARS-CoV-2 NSP14 Exonuclease Activity Using SAMDI Mass Spectrometry. SLAS Discov. 26: 766-774
19- de Rond T, Danielewicz M, Northen T (2015) High throughput screening of enzyme activity with mass spectrometry imaging. Curr. Opin. Biotechnol. 31: 1-9
20- Havugimana PC, Hart GT, Nepusz T, et al. (2012) A census of human soluble protein complexes. Cell 150: 1068-1081
21- Bieniossek C, Nie Y, Frey D, et al. (2009) Automated unrestricted multigene recombineering for multiprotein complex production. Nat. Methods 6: 447-450
22- Nie Y, Chaillet M, Becke C, et al. (2016) ACEMBL Tool-Kits for High-Throughput Multigene Delivery and Expression in Prokaryotic and Eukaryotic Hosts. Adv. Exp. Med. Biol. 896: 27-42
23- Bieniossek C, Imasaki T, Takagi Y, Berger I (2012) MultiBac: expanding the research toolbox for multiprotein complexes. Trends Biochem. Sci. 37: 49-57
24- Kriz A, Schmid K, Baumgartner N, et al. (2010) A plasmid-based multigene expression system for mammalian cells. Nat. Commun. 1: 120
25- Ferrer A, Arró M, Manzano D, Altabella T (2016) Strategies and Methodologies for the Co-expression of Multiple Proteins in Plants. Adv. Exp. Med. Biol. 896: 263-285
26- Saldívar-González FI, Huerta-García CS, Medina-Franco JL (2020) Chemoinformatics-based enumeration of chemical libraries: a tutorial. J. Cheminform. 12: 64-88
27- Chevillard F, Kolb P (2015) SCUBIDOO: A Large yet Screenable and Easily Searchable Database of Computationally Created Chemical Compounds Optimized toward High Likelihood of Synthetic Tractability. J. Chem. Inf. Model. 55: 1824-1835
28- Humbeck L, Weigang S, Schäfer T, Mutzel P, Koch O (2018) CHIPMUNK: A Virtual Synthesizable Small-Molecule Library for Medicinal Chemistry, Exploitable for Protein-Protein Interaction Modulators. ChemMedChem. 13: 532-539
29- Schneider G (2018) Generative Models for Artificially-intelligent Molecular Design. Mol. Inform. 37(1-2). doi: 10.1002/minf.201880131
30- Hann MM, Leach AR, Harper G (2001) Molecular complexity and its impact on the probability of finding leads for drug discovery. J. Chem. Inf. Comput. Sci. 41: 856-864
31- Hopkins AL, Groom CR, Alex A (2004) Ligand efficiency: a useful metric for lead selection. Drug Discov. Today 9: 430-431
32- Pérot S, Sperandio O, Miteva MA, Camproux AC, Villoutreix BO (2010) Druggable pockets and binding site centric chemical space: a paradigm shift in drug discovery. Drug Discov. Today 15: 656-667
33- Kidd SL, Osberger TJ, Mateu N, Sore HF, Spring DR (2018) Recent Applications of Diversity-Oriented Synthesis Toward Novel, 3-Dimensional Fragment Collections. Front. Chem. 6: 460-467
34- Jacquemard C, Kellenberger E (2019) A bright future for fragment-based drug discovery: what does it hold? Expert Opin. Drug Discov. 14: 413-416
35- https://cen.acs.org/articles/95/i25/DNA-encoded-libraries-revolutionizing-drug.html
36- Neri D, Lerner RA (2018) DNA-Encoded Chemical Libraries: A Selection System Based on Endowing Organic Compounds with Amplifiable Information. Annu. Rev. Biochem. 87: 479-502
37- Goodnow RA Jr, Dumelin CE, Keefe AD (2017) DNA-encoded chemistry: enabling the deeper sampling of chemical space. Nat. Rev. Drug Discov. 16: 131-147
38- Baell J, Walters MA (2014) Chemistry: Chemical con artists foil drug discovery. Nature 513: 481-483
39- Shumate J, Baillargeon P, Spicer TP, Scampavia L (2018) IoT for Real-Time Measurement of High-Throughput Liquid Dispensing in Laboratory Environments. SLAS Technol. 23: 440-447
40- Serra-Garcia M, Gneezy U (2021) Nonreplicable publications are cited more than replicable ones. Sci. Adv. 7(21): eabd1705. doi: 10.1126/sciadv.abd1705
41- Baker M (2016) 1,500 scientists lift the lid on reproducibility. Nature 533: 452-454
42- https://blog.emeraldcloudlab.com/laboratory-startup-costs-incubator-vs-cloud-lab/
43- Besnard J, Jones PS, Hopkins AL, Pannifer AD (2015) The Joint European Compound Library: boosting precompetitive research. Drug Discov. Today 20: 181-186
44- Karawajczyk A, Giordanetto F, Benningshof J, et al. (2015) Expansion of chemical space for collaborative lead generation and drug discovery: the European Lead Factory Perspective. Drug Discov. Today 20: 1310-1316
45- Honarnejad S, van Boeckel S, van den Hurk H, van Helden S (2021) Hit Discovery for Public Target Programs in the European Lead Factory: Experiences and Output from Assay Development and Ultra-High-Throughput Screening. SLAS Discov. 26: 192-204
46- Mballo C, Makarenkov V (2010) Using machine learning methods to predict experimental high-throughput screening data. Comb. Chem. High Throughput Screen. 13: 430-441
47- Mayr LM, Bojanic D (2009) Novel trends in high-throughput screening. Curr. Opin. Pharmacol. 9: 580-588
48- Paricharak S, IJzerman AP, Bender A, Nigsch F (2016) Analysis of Iterative Screening with Stepwise Compound Selection Based on Novartis In-house HTS Data. ACS Chem. Biol. 11: 1255-1264
49- Dreiman GHS, Bictash M, Fish PV, Griffin L, Svensson F (2021) Changing the HTS Paradigm: AI-Driven Iterative Screening for Hit Finding. SLAS Discov. 26: 257-262
50- Jumper J, Evans R, Pritzel A, et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589
51- https://www.drugtargetreview.com/news/84394/high-throughput-screening-market-set-to-be-worth-26-4bn-by-2025/
