Molecular docking-based virtual screening: Challenges in hits identification for Anti-SARS-Cov-2 activity

Krisyanti Budipramana; Frangky Sangande

doi:10.3897/pharmacia.69.e89812

Review Article

Krisyanti Budipramana^‡, Frangky Sangande^§

‡ University of Surabaya, Surabaya, Indonesia

§ Universitas Prisma, Manado, Indonesia

Corresponding author: Frangky Sangande ( frangky.sangande@gmail.com )

Academic editor: Georgi Momekov

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Budipramana K, Sangande F (2022) Molecular docking-based virtual screening: Challenges in hits identification for Anti-SARS-Cov-2 activity. Pharmacia 69(4): 1047-1056. https://doi.org/10.3897/pharmacia.69.e89812

Abstract

The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2) requires finding new drugs or repurposing existing drugs for clinical use. Molecular docking, a structure-based drug design method, provides a fast way to identify hit compounds with antiviral activity against SARS-Cov-2. However, the inherent weaknesses of docking, compounded by the limited crystallographic information and the scarcity of reference drugs owing to the novelty of this virus, present challenges in identifying anti-SARS-Cov-2 hits. In this review, we highlight several aspects that need to be considered to obtain reliable docking results, especially those related to the target structure, docking validation, and virtual hit selection. We discuss several cases pertaining to these issues and approaches that could be used to address them.

Keywords

consensus docking, hit criteria, homology, ligand efficiency, SARS-Cov-2

Introduction

Drug discovery and development is commonly time-consuming, taking almost 20 years, and very costly. Drug discovery can be initiated by screening millions of compounds down to progressively smaller sets until the best compound, the hit or lead compound, is found. New methods are still being developed to shorten the time and reduce the cost (Talele et al. 2010), and for this purpose virtual screening has gained considerable popularity.

Virtual screening comprises two approaches: structure-based drug design (SBDD) and ligand-based drug design (LBDD). LBDD is based on the principle that compounds with similar structures tend to exhibit similar biological activity. Thus, the screening will focus on finding compounds similar to the known active compounds. Meanwhile, in SBDD, it is assumed that biologically active compounds will be able to bind to target molecules (proteins, enzymes, DNA, RNA). Molecular docking is one of the structure-based virtual screening methods commonly used in drug discovery (Vázquez et al. 2020).

The emergence of novel coronavirus disease at the end of 2019 has opened a new chapter in the drug discovery field. Many studies have been carried out to find anti-SARS-Cov-2 drugs. Interestingly, the current treatment of SARS-Cov-2 is dominated by reusing existing drugs such as chloroquine/hydroxychloroquine, lopinavir/ritonavir, ribavirin, oseltamivir, remdesivir, and favipiravir (Singh et al. 2020), indicating that the need for Covid-19 drugs is urgent. Therefore, rapid strategies to shorten the hit identification process are needed, and molecular docking is one of the most frequently used strategies, probably because the method is relatively simple, fast, and does not require many tools, especially if the compounds being screened are already available for clinical use.

According to our search in PubMed using “docking screening covid” as the query, there were 398 articles in 2020, increasing to 709 articles in 2021, indicating that scientists are interested in applying molecular docking to anti-SARS-Cov-2 drug discovery. However, several articles have suggested that caution is required when drawing conclusions from molecular docking results because the method has some limitations (Kolb and Irwin 2009; Scior et al. 2012; Chen 2015). Hence, we discuss some aspects that can affect the quality of docking results, including the target structure, docking protocol validation, and virtual hit selection, when dealing with a novel case such as SARS-Cov-2. Out of the 1107 articles from 2020–2021, we analyzed only the 318 that met the inclusion criteria: free full-text access, so that the complete methodology could be obtained, and a topic of virtual screening against a single target. We hope this review will provide useful information for developing molecular docking-based virtual screening, especially when dealing with a novel target.

Target structure selection

In structure-based drug design, molecular docking models the interaction between a ligand and the target molecule and then calculates the binding energy of the complex to distinguish binders from non-binders. For good results, this method requires a high-resolution 3D structure of the target generated from X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or homology modeling (Cerqueira et al. 2009). X-ray structures are considered more accurate than NMR structures, so they are highly recommended for use in molecular docking simulation. Since NMR structures are determined in solution, there will usually be up to ten conformers, and the most representative one must be chosen for docking simulation. Huang and Zou (2007) reported that this approach resulted in poor prediction of binding modes and docking scores. On the other hand, a homology structure is usually used when X-ray and NMR structures are not yet available. Homology modeling generally consists of three major stages: finding the template structure, aligning the target sequence to the template, and building the model. Errors may occur at each stage (Rockey and Elcock 2006; Bordogna et al. 2011), which makes the use of homology models in docking studies riskier.

In the case of SARS-Cov-2, the availability of crystal structures for some proteins is still limited due to their novelty. Thus, the structures of these proteins were ordinarily generated using homology modeling based on the SARS-Cov structure as the template (Sharma et al. 2020; Hosseini et al. 2021; Liu et al. 2021). According to the origin of the molecule being targeted, there are two categories of SARS-Cov-2 drug targets: those from the host, including angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2), and those from the virus itself, consisting of the structural proteins, namely membrane (M), spike (S), envelope (E), and nucleocapsid (N), and the 16 non-structural proteins (nsp1–16), such as main protease (Mpro), papain-like protease (Plpro), and RNA-dependent RNA polymerase (RdRp) (Wu et al. 2020). Fig. 1 shows that, of all the articles reviewed, Mpro was the most used target. This might be because its co-crystal complex structure has been solved and is available in the Protein Data Bank (PDB), thus providing useful information about the binding site and interaction profile for selecting the virtual hits.

Figure 1.

The number of articles for each target of antiviral SARS-Cov-2.

According to our findings, 11.6% of articles used a homology structure in their docking studies for several targets such as TMPRSS2 (Barge et al. 2021; Idris et al. 2021; Mahmudpour et al. 2021), S (Mathew et al. 2021), ACE2 (Srivastava et al. 2021), and helicase (White et al. 2020). Indeed, the similarity between the protein sequences of SARS-Cov-2 and SARS-Cov, the template, is high (Dong et al. 2020). The degree of similarity between the query and template sequences largely determines the quality of a homology structure: the lower the similarity, the worse the quality (Robinson et al. 2014). However, a study by McGovern and Shoichet (2003) demonstrated that three homology structures with 80% similarity to the template showed the worst enrichment factors (EF; see the section on validation of molecular docking) when used in molecular docking-based screening, indicating poor screening power.

Meanwhile, Bordogna et al. (2011) examined the general rule that only homology models built from templates with a sequence identity > 50% to the target are suitable for docking studies. They revealed that in many cases models with sequence identity > 50% showed distance root-mean-square deviation (dRMSD) values in the range of 2–8 Å, whereas in some cases models with sequence identity < 50% showed acceptable dRMSD values (< 2 Å). dRMSD is the deviation in the position of the ligand relative to the binding site residues after superimposition of the model onto the target structure. These results are in line with a study by Fernandes et al. (2004) that used EF values as the parameter, indicating that sequence identity does not guarantee the quality of a homology model used in docking studies. Moreover, Bordogna et al. (2011) pointed out that the success rate of docking simulation depends on the accuracy with which the binding site is modeled. Because the binding site geometry of the template is transferred to the target, it can contribute to the screening accuracy. For this purpose, ligand binding information from a holo template (a protein structure complexed with a ligand) might play an important role. Rockey and Elcock (2006) demonstrated that kinase homology models generated from other kinases complexed with staurosporine as the template gave better results for redocking of staurosporine than models built from apo templates (protein structures without any ligand) or from templates complexed with a different ligand.

Concerning SARS-Cov-2, we found that several studies had been carried out before the target structures became available in the Protein Data Bank. As a result, homology structures were used in their virtual screening process. Two studies compared their homology models of Mpro (Jiménez-Alberto et al. 2020) and RdRp (Narayanan and Nair 2020) with the corresponding experimental structures, which were solved and deposited in the PDB by other groups before publication. The authors reported that their models overlapped with the experimental structures. We noted that the templates had sequence identity > 90% to the targets. Nevertheless, as in the other studies in this review that used homology models, the performance of their docking protocols was not tested; they relied solely on the model’s quality in terms of sequence identity. The lowest sequence identity (35.2%) was found in the modeling of TMPRSS2 using serine protease hepsin as the template (Durdaği 2020; Chikhale et al. 2021).

On the other hand, although some studies in our review used experimental target structures, this still poses challenges, especially when dealing with the apo-form (Seeliger and De Groot 2010). McGovern and Shoichet (2003) demonstrated that the screening power of the apo-form in distinguishing actives from decoys was generally weaker than that of the holo-form. Moreover, the native ligand in the holo-form provides binding pose information in addition to docking scores, which can serve as a reference for selecting virtual hits. Mpro was the most used target in this review, with several holo-forms available in the PDB (Table 1).

Table 1.

Crystal structure of several targets used in the articles reviewed.

No	Targets	PDB ID
1	Spike	6LZG, 6M0J, 6M17, 6VSB, 6W41, 6X6P, 7BZ5
2	ACE2	1R4L*, 1R42, 6LZG, 6M0J, 6VSB, 2AJF, 6VW1
3	TMPRSS2	7MEQ*
4	RdRp	6M71, 7BV2*, 7BW4
5	Nucleocapsid	7M4R, 6VYO, 6ZCO
6	Mpro	6LU7, 4MDS, 6Y2E, 6Y2F, 6Y2G, 6Y7M, 6Y84, 6YB7, 5R7Y, 5R80, 5R82, 5R84, 5RF7, 5RFS, 6LZE, 6M03, 6M0K, 6M2N, 6W63, 7BQY, 7BRP, 7JYC*
7	Plpro	4OVZ, 4OW0, 6W9C, 6WX4, 7JN2, 7JRN*
8	Nsp3	6W02, 7BF6
9	Nsp9	6W4B
10	Nsp10	6W4H, 6YZ1
11	Nsp15	6VWW, 6W01*
12	Nsp16	6W4H, 6YZ1, 6WKQ*

Naturally, a protein may adopt several binding site shapes depending on the type of ligand bound to it, as a result of the induced-fit effect and side-chain flexibility. For example, the crystal structure of monoamine oxidase B (MAO-B) complexed with pioglitazone (PDB: 4A79) has a binding site volume of 97 Å³, whereas when MAO-B is complexed with 1,4-diphenyl-2-butene (PDB: 1OJ9) the binding site volume increases to 289 Å³ (Ramírez and Caballero 2018). Therefore, in addition to considering the resolution of the crystal structure, using a holo-structure whose native ligand is similar to the ligand to be docked will increase the probability of successful docking (Fan et al. 2009; Tuccinardi et al. 2010). Most of the Mpro articles in this review used 6LU7 in their docking studies, in line with the finding by Llanos et al. (2021). When one study analyzed the docking results of natural compounds against 6LU7 (holo) and 6Y2E (apo), the authors found that the selected compounds exhibited different binding poses, as well as different ranks based on the docking score, on each target (Hastantram et al. 2020), confirming that the target structure can affect docking results, especially for screening purposes.

Validation of molecular docking

Molecular docking consists of two main stages: sampling the binding poses of the ligand in the active site of the macromolecule, and predicting the binding energy, expressed as a docking score, for each pose using scoring functions (Meng et al. 2011). Existing docking tools are generally able to generate the correct binding pose. Unfortunately, the correct binding pose is not always scored with the lowest energy. Verkhivker et al. (2000), in their docking simulation of the transthyretin-thyroxine complex (PDB: 1ETA) using two scoring functions, found that the lowest-energy conformations were at RMSD values of 8.97 and 6.74 Å, respectively, from the native state, indicating that both scoring functions failed to identify the correct binding pose. If such docking protocols are still used for virtual screening, ranking compounds by docking score becomes an unreliable guide for virtual hit selection.

The inaccuracy of the scoring function in molecular docking could be caused by neglecting some parameters (e.g., solvation effects, flexibility, polarization effects) required for the binding energy calculation (Pantsar and Poso 2018). If all these parameters were included in the molecular docking process, the computation would become costly and time-consuming, and molecular docking would no longer be effective for screening millions of compounds. Here, validation before docking simulations, known as redocking, plays an important role in confirming the docking power, that is, whether the docking algorithm can reproduce the experimental binding pose (RMSD < 2 Å) and whether the scoring function can rank it at the top position (lowest energy) compared to other conformations (Meng et al. 2011). To perform this step, a crystal structure in holo-form is required.
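The redocking criterion can be illustrated with a minimal sketch. It assumes that the crystal pose and the docked poses are in the same receptor frame and list the same heavy atoms in the same order; the function names and the toy coordinates are hypothetical, and no superposition or symmetry correction is applied.

```python
import math

def pose_rmsd(ref, pose):
    """RMSD (in Å) between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom order and a shared receptor frame
    (no alignment, no symmetry correction).
    """
    if len(ref) != len(pose) or not ref:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq = sum((a - b) ** 2 for p, q in zip(ref, pose) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))

def redocking_passes(ref, ranked_poses, cutoff=2.0):
    """True if the top-ranked pose (index 0, lowest energy) is within cutoff of ref."""
    return pose_rmsd(ref, ranked_poses[0]) < cutoff

# Toy three-atom example (hypothetical coordinates)
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_pose = [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0), (1.8, 1.5, 0.0)]  # shifted 0.3 Å in x
far_pose = [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0), (6.5, 1.5, 0.0)]   # shifted 5 Å in x

print(redocking_passes(crystal, [near_pose, far_pose]))  # near pose ranked first
print(redocking_passes(crystal, [far_pose, near_pose]))  # far pose ranked first
```

The second call shows the failure mode discussed above: the correct pose was sampled, but the scoring function did not place it at the top.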

In addition to docking power, another validation type commonly used in molecular docking is the screening power assessment, a retrospective study in which a docking protocol is used to screen a database consisting of active and inactive (decoy) compounds (Empereur-Mot et al. 2015). The screening results are ranked by docking score, and then the area under the curve (AUC) of the receiver operating characteristic (ROC) curve or the enrichment factor (EF) is calculated. An AUC value > 0.7 is considered acceptable for virtual screening. It is noteworthy that even when two docking protocols show the same AUC value, they may differ in their ability to recognize the active compounds at the beginning of the ranking list (Braga and Andrade 2013). In this case, EF can be relied on to select the best protocol: the greater the EF value, the more active compounds are found within a certain top fraction, generally 1%, of the ranked database (Huang et al. 2006). Here, EF_1% is defined according to the equation below (A_{sampled (1%)} = number of active compounds found at 1% of the database screened, N_{sampled (1%)} = number of compounds screened at 1% of the database, A_total = number of active compounds in the entire database, N_total = number of compounds in the entire database) (Braga and Andrade 2013).

\[ \mathrm{EF}_{1\%} = \frac{A_{\mathrm{sampled}\,(1\%)} \, / \, N_{\mathrm{sampled}\,(1\%)}}{A_{\mathrm{total}} \, / \, N_{\mathrm{total}}} \]
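As an illustration of why AUC and EF can disagree, the sketch below computes both from a ranked list of labels (1 = active, 0 = decoy, best docking score first). The function names and the two toy rankings are hypothetical, and ties in docking score are ignored.

```python
def enrichment_factor(labels_ranked, fraction=0.01):
    """EF at the given top fraction of a ranked list of 1 (active) / 0 (decoy) labels."""
    n_total = len(labels_ranked)
    a_total = sum(labels_ranked)
    if a_total == 0:
        raise ValueError("at least one active compound is required")
    n_sampled = max(1, round(n_total * fraction))
    a_sampled = sum(labels_ranked[:n_sampled])
    return (a_sampled / n_sampled) / (a_total / n_total)

def roc_auc(labels_ranked):
    """ROC AUC as the fraction of (active, decoy) pairs in which the active ranks first."""
    n_act = sum(labels_ranked)
    n_dec = len(labels_ranked) - n_act
    decoys_seen = 0
    pairs_lost = 0  # (active, decoy) pairs in which the decoy outranks the active
    for label in labels_ranked:
        if label == 1:
            pairs_lost += decoys_seen
        else:
            decoys_seen += 1
    return 1.0 - pairs_lost / (n_act * n_dec)

# Two toy rankings of 1000 compounds containing 10 actives each
model_a = [0] * 100 + [1] * 10 + [0] * 890            # all actives just outside the top 10%
model_b = [1] * 5 + [0] * 200 + [1] * 5 + [0] * 790   # half the actives at the very top

for name, ranking in (("A", model_a), ("B", model_b)):
    print(name, round(roc_auc(ranking), 3), enrichment_factor(ranking))
```

Both toy rankings give the same AUC (about 0.90), yet only the second places actives in the top 1%, so its EF at 1% is far larger.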

As shown in Fig. 2, models A and B give the same AUC value. However, model B has a steeper ROC curve than model A, indicating a larger EF value. Therefore, model B is superior to model A.

Figure 2.

ROC curve of model A and model B.

In this review, we found that 71% of the articles performed their docking screening without validation data (Fig. 3). This was not surprising where the targets were in the apo-form or were homology models, for which redocking is not possible. Even so, articles with holo-form targets also did not perform this validation. Only 25% of articles involved redocking, with Mpro the most commonly used target. Regarding screening power, only 2% of articles used this type of validation, and another 2% used a combination of docking power and screening power. The lack of known inhibitors could explain why screening power was infrequently tested. Without validation, the results of docking screening are questionable, especially when a homology structure is used, since it adds layers of prediction compared to an experimental structure. Rather than relying solely on the docking score to select virtual hits, additional confirmation is required, such as molecular dynamics (MD) simulation followed by rescoring of the binding free energy using molecular mechanics/Poisson–Boltzmann or generalized Born surface area (MM/PB(GB)SA).

Figure 3.

Distribution of validation types used.

MD simulation can overcome some of the limitations of molecular docking mentioned previously. A ligand-protein complex with an incorrect binding pose obtained from docking can be identified by MD simulation, because such a complex produces an unstable trajectory characterized by an increasing ligand-RMSD profile. Moreover, improved hit enrichment in retrospective virtual screening experiments was obtained when the docking poses were rescored using MM/PB(GB)SA after MD simulation (De Vivo et al. 2016). Chen (2015) reported articles in which docking results were inconsistent with MD, with the top-scored docking pose showing large differences after a 20 ns MD simulation. MD simulation thus provides further computational validation before conclusions are drawn from docking results.

On the other hand, MD can also generate multiple conformations of the protein target of interest, in addition to multiple experimental structures, for use in ensemble docking, which aims to account for the flexibility of the binding site in molecular docking (Huang and Zou 2010; Salmaso and Moro 2018). It has been shown that ensemble docking predicts binding affinity better than docking to a single structure (Yan and Zou 2015). In the case of docking with a homology structure, ensemble docking can be performed by generating multiple conformations of the target from multiple templates. Fan et al. (2009) generated 222 homology models based on 222 templates for 38 proteins and performed consensus ensemble docking. They then compared the resulting EF with the EF obtained from docking to the holo and apo X-ray structures of the same protein. They found that the EFs of consensus ensemble docking were better than those of the holo X-ray structure in 15 of the 38 targets and comparable in 9. Its performance was also still better than docking to an apo X-ray structure, indicating that this might be an alternative strategy to improve the performance of docking with homology models. In the current review, we found that 65.7% of the articles involved MD simulations in their screening process. Most of them used MD for predicting the stability of the ligand-protein complex and calculating the binding free energy.
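One simple way to merge ensemble docking results is to keep, for each ligand, the best score obtained over all receptor conformations. The sketch below illustrates this rule only; it is an assumption for illustration and not necessarily the consensus scheme used by Fan et al. (2009). The conformer and ligand identifiers and the scores are hypothetical.

```python
def ensemble_best_scores(scores_by_conformer):
    """Merge per-conformer docking scores, keeping the best (lowest) score per ligand.

    scores_by_conformer: {conformer_id: {ligand_id: docking_score}}
    Returns {ligand_id: (best_score, conformer_id)}.
    """
    best = {}
    for conformer, scores in scores_by_conformer.items():
        for ligand, score in scores.items():
            if ligand not in best or score < best[ligand][0]:
                best[ligand] = (score, conformer)
    return best

# Hypothetical scores (kcal/mol) for two ligands docked to two receptor conformations
scores = {
    "conf_A": {"lig1": -7.0, "lig2": -9.0},
    "conf_B": {"lig1": -8.5, "lig2": -6.0},
}

best = ensemble_best_scores(scores)
ranking = sorted(best, key=lambda ligand: best[ligand][0])  # most negative score first
print(best)
print(ranking)
```

Other merging rules, such as averaging over conformations, are equally possible; which rule performs best would have to be checked retrospectively with known actives and decoys.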

Hits criteria

After a molecular docking-based screening against a database of ligands, the next important step is post-processing of the docking results to select the virtual hits, which are ideally the top-ranked compounds. Generally, the top-ranked 0.1–2.5% of compounds are considered for experimental confirmation (Slater and Kontoyianni 2019). The next challenge is deciding which of the top-ranked compounds to choose, especially when the experimental testing capacity is not sufficient to cover all of them. To illustrate, for a database of 1 billion compounds, the top 0.1% alone amounts to 1,000,000 virtual hits. If only 100 assays are available for experimental testing, the compounds must be filtered further. As previously discussed, the scoring function is not completely accurate in predicting binding energy. Thus, relying on the docking score as the only criterion will increase false positive hits. Here, we discuss several additional filters, namely binding pattern, consensus docking, and ligand efficiency, that can be applied not only to reduce the number of virtual hits but also to minimize false positives.

Binding pattern

In docking simulations, it is important to take into account the binding pattern of a ligand to the target protein. Several proteins are known to have key residues in their active site that can interact with a ligand to produce or improve biological activity. For example, Met769 in the hinge region of EGFR (PDB: 1M17) has been reported to be an important residue for ligand inhibitory activity, particularly via a hydrogen bond. The presence of this bond in inhibitors can increase their activity (Sangande et al. 2022). Meanwhile, four histidine residues (His361, His366, His541, His545) and Ile663 around Fe are the important residues in lipoxygenase (PDB: 1LOX). Binding or blocking 1–2 of them by a ligand can prevent the catalytic process (Rissyelly et al. 2022). According to these criteria, ligands whose interactions involve the key residues should be prioritized. A retrospective kinase virtual screening demonstrated that, in most cases, the false positive hits had no interaction with certain key residues of the kinase (Perola 2006). These key residues can generally be recognized from crystallographic data of the inhibitor/activator-protein complex, so again a holo-structure is superior to an apo or homology structure for this purpose.

In the case of SARS-Cov-2, several targets have no crystallographic data, especially in holo-form. To address this issue, several articles docked existing drugs that showed clinical benefit in SARS-Cov-2 treatment to the target of interest and took their binding pattern as a reference. This strategy is valid as long as the reference drugs have been confirmed to act on the intended target. Surprisingly, we found that several studies docked hydroxychloroquine and remdesivir to Mpro and used their binding patterns for selecting hits. To the best of our knowledge, remdesivir is an RdRp inhibitor (Yin et al. 2020; Kokic et al. 2021), while the mechanism of action of hydroxychloroquine is not yet clear (Satarker et al. 2020). This might cause errors in hit selection. On the other hand, several articles set docking score cutoffs, especially when there were no reference drugs for comparison (Liu et al. 2021; Majeed et al. 2021). One study screened 2201 compounds using Glide and reduced the set to 1% of the total for further analysis by applying a docking score cutoff of -8.5 kcal/mol. This cutoff was chosen because it roughly corresponds to 1 µM (Wang 2020).
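The correspondence between a score cutoff and an affinity can be checked with the standard relation ΔG = RT ln Kd. The sketch below assumes a temperature of 298.15 K and, more importantly, assumes that the docking score can be read as a binding free energy in kcal/mol, which is itself an approximation. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol·K)

def kd_to_delta_g(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) for a dissociation constant given in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)

def delta_g_to_kd(delta_g, temp_k=298.15):
    """Dissociation constant (mol/L) for a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temp_k))

print(round(kd_to_delta_g(1e-6), 2))       # free energy equivalent to Kd = 1 µM
print(round(delta_g_to_kd(-8.5) * 1e6, 2)) # Kd in µM equivalent to -8.5 kcal/mol
```

Under these assumptions, 1 µM corresponds to about -8.2 kcal/mol and -8.5 kcal/mol to about 0.6 µM, consistent with the "roughly 1 µM" reading of the cutoff.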

Consensus docking

In molecular docking, there are three types of scoring functions (SF): force field, empirical, and knowledge-based (Huang and Zou 2010; Maia et al. 2020). Each SF may produce a different ranking order when applied in virtual screening. In a previous study, four 2-substituted 4-aminoquinazoline derivatives docked to human epidermal growth factor receptor 2 (HER2) exhibited different ranking orders with DOCK6 (force field SF) and iGemdock (empirical SF) (Sangande et al. 2022). Several studies have therefore combined docking tools with different SFs, a method called traditional consensus docking, to obtain more dependable results (Aliebrahimi et al. 2018; Kim et al. 2021).

In virtual screening, the consensus docking method can be applied by averaging the score of compounds in each docking tool to generate a new ranking order or by selecting the compounds that are top-scored by all docking tools used (Palacio-Rodríguez et al. 2019). Garcia-Sosa et al. (2008) conducted a virtual screening study in which they used two docking tools, Autodock and Glide. In this study, they calculated the consensus score for each compound, which is the mean of the two scores obtained from Autodock and Glide. In another study, Li et al. (2016) performed consensus docking by selecting four compounds that entered the top ten rankings in both result groups DOCK and Vina (Table 2).

Table 2.

The top ten compounds from each docking tool. Four compounds (ZINC67912770, ZINC67912780, ZINC28882432, and ZINC72320169; highlighted in the original table) are found on both ranking lists.

Rank	Compound (DOCK)	DOCK score	Compound (Vina)	Vina score
1	ZINC67912533	-49.206	ZINC67912780	-12.1
2	ZINC67912770	-42.197	ZINC67912765	-11.9
3	ZINC67912536	-41.430	ZINC67912770	-11.6
4	ZINC67912780	-38.277	ZINC67912773	-11.4
5	ZINC67912532	-37.362	ZINC49823152	-11.2
6	ZINC72320416	-36.851	ZINC28882432	-11.2
7	ZINC67912525	-36.756	ZINC67902892	-11.1
8	ZINC72320169	-35.085	ZINC08234294	-11
9	ZINC28882432	-33.335	ZINC77269187	-10.8
10	ZINC03643476	-33.309	ZINC72320169	-10.7
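The intersection-based consensus described for Li et al. (2016) can be reproduced from the two top-ten lists in Table 2. In the sketch below the function name is hypothetical, and ordering the shared compounds by mean rank is an added assumption for illustration, not part of the cited study.

```python
# Top-ten compounds from Table 2, in rank order
dock_top10 = [
    "ZINC67912533", "ZINC67912770", "ZINC67912536", "ZINC67912780", "ZINC67912532",
    "ZINC72320416", "ZINC67912525", "ZINC72320169", "ZINC28882432", "ZINC03643476",
]
vina_top10 = [
    "ZINC67912780", "ZINC67912765", "ZINC67912770", "ZINC67912773", "ZINC49823152",
    "ZINC28882432", "ZINC67902892", "ZINC08234294", "ZINC77269187", "ZINC72320169",
]

def consensus_hits(list_a, list_b):
    """Compounds present in both ranked lists, ordered by mean rank (ties broken by name)."""
    rank_a = {compound: i for i, compound in enumerate(list_a, start=1)}
    rank_b = {compound: i for i, compound in enumerate(list_b, start=1)}
    shared = set(rank_a) & set(rank_b)
    return sorted(shared, key=lambda c: ((rank_a[c] + rank_b[c]) / 2, c))

hits = consensus_hits(dock_top10, vina_top10)
print(hits)
```

Working with ranks rather than raw scores avoids mixing the DOCK and Vina score scales, which differ by several-fold in Table 2.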

In the current review, we found that 9.1% of articles used consensus docking. Interestingly, one article applied a stricter consensus method by also taking into account the binding poses of the ligands in three docking tools: Glide, FRED, and Vina. In the two studies mentioned above, it is not clear whether the two docking tools produced similar binding poses. Briefly, Gimeno et al. (2020) defined hits as the compounds that had an equivalent binding pose (RMSD < 1.5 Å) in all three docking tools and the highest mean docking score (Fig. 4). With this method, the resulting docking score is more likely to represent the correct pose and is therefore a more reliable basis for compiling a ranking list.

Figure 4.

Illustration of the consensus method performed by Gimeno et al. (2020) (CS: Consensus score; DS: Docking score).
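The pose-consensus filter can be sketched as follows: a ligand is kept only if every pair of tools agrees on the pose within the RMSD threshold, and the mean of the scores is then reported. This is a minimal illustration and not the implementation of Gimeno et al. (2020). It assumes the same atom order and receptor frame for all tools, it averages raw scores without any rescaling between tools, and the coordinates and scores are hypothetical.

```python
import math
from itertools import combinations

def rmsd(pose_a, pose_b):
    """RMSD (in Å) between two poses with identical atom order, without alignment."""
    sq = sum((a - b) ** 2 for p, q in zip(pose_a, pose_b) for a, b in zip(p, q))
    return math.sqrt(sq / len(pose_a))

def pose_consensus(poses, scores, cutoff=1.5):
    """Mean score if all pairwise pose RMSDs are below cutoff, otherwise None.

    poses:  {tool_name: [(x, y, z), ...]}
    scores: {tool_name: docking_score}
    """
    for tool_1, tool_2 in combinations(poses, 2):
        if rmsd(poses[tool_1], poses[tool_2]) >= cutoff:
            return None
    return sum(scores[tool] for tool in poses) / len(poses)

# Hypothetical two-atom ligand docked with three tools
agreeing = {
    "glide": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "fred": [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "vina": [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)],
}
disagreeing = dict(agreeing, vina=[(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
tool_scores = {"glide": -8.0, "fred": -7.0, "vina": -9.0}

print(pose_consensus(agreeing, tool_scores))     # poses agree, mean score returned
print(pose_consensus(disagreeing, tool_scores))  # one pose deviates, ligand rejected
```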

Ligand efficiency

Docking also has a drawback related to the appearance of a biased score caused by the molecular weight (MW) of the ligand since, in most cases, the docking score is directly proportional to the MW (Cosconati et al. 2010), thereby potentially increasing false positives due to the accumulation of large molecules in the top rank. In a study using a series of 2,4-diamino-8-quinazoline carboxamide derivatives with a known IC₅₀ docked to the human cluster of differentiation 38 (CD38), Boittier et al. (2020) compared the docking score and IC₅₀ of each compound. Generally, the results revealed that the scoring function of several docking tools tends to overpredict compounds with a large R₂ group and vice versa (Fig. 5). One approach that might be used to control the balance of the MW and docking score is ligand efficiency (LE). Originally, LE which belongs to the efficiency indices is calculated by dividing the experimental binding affinity by the number of heavy atoms. However, in molecular docking, the binding affinity can be replaced by the docking score (Hetényi et al. 2007; García-Sosa et al. 2010).

Figure 5.

Overprediction for a derivative with a large R₂ group (A); underprediction for a derivative with a small R₂ group (B).

In virtual screening, ligands with high LE values use each atom efficiently to bind to the target and should be prioritized as hits. Garcia-Sosa et al. (2008) suggested five compounds (represented by NSC154829) as interesting candidates for inhibiting the wild-type H5N1 neuraminidase (PDB: 2HU0) according to their LE values, even though other compounds (represented by amikacin) had better consensus scores but low LE values (Fig. 6). As a comparison, oseltamivir, the known inhibitor and native ligand of 2HU0, has a low MW and a high LE value of 0.363 kcal/mol/heavy atom. This may be the reason for the efficiency of oseltamivir against this target. LE values may vary depending on the type of target. However, many researchers use an LE value of 0.3 kcal/mol/heavy atom or better as a criterion for selecting hits because this value is equivalent to a drug candidate with a Kd of 10 nM and an MW of 500 Da (~38 heavy atoms). To achieve this level, a starting hit for optimization should have an MW of 350 Da and an activity of 0.5 µM or better, because the optimization process often involves adding new groups to the ligand scaffold, thus increasing the MW. Notably, an MW of 500 Da is the maximum recommended limit for an orally administered drug according to Lipinski’s Rule of Five (Zhu et al. 2013). Unfortunately, LE and other efficiency indices were rarely used in the virtual screening studies covered by the current review. We noted that only 4.1% of articles involved the LE parameter in their docking-based virtual screening study.

Figure 6.

The chemical scaffold of oseltamivir, NSC154829, and amikacin with their consensus scoring (CS), molecular weight (MW), number of heavy atoms (NHA), and ligand efficiency (LE).
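The LE threshold quoted above can be checked numerically. The sketch below assumes LE = -ΔG divided by the number of heavy atoms, with ΔG = RT ln Kd at 298.15 K, or with the docking score used in place of ΔG. The function names and the two example ligands are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol·K)

def ligand_efficiency(score_kcal, n_heavy_atoms):
    """LE in kcal/mol per heavy atom from a docking score or binding free energy."""
    return -score_kcal / n_heavy_atoms

def le_from_kd(kd_molar, n_heavy_atoms, temp_k=298.15):
    """LE from an experimental dissociation constant given in mol/L."""
    delta_g = R_KCAL * temp_k * math.log(kd_molar)
    return ligand_efficiency(delta_g, n_heavy_atoms)

# Reference point from the text: Kd = 10 nM with about 38 heavy atoms
print(round(le_from_kd(1e-8, 38), 2))

# Hypothetical ligands: the larger one scores better but uses its atoms less efficiently
large = ligand_efficiency(-12.0, 48)
small = ligand_efficiency(-9.0, 24)
print(large, small)
```

Under these assumptions the 10 nM, 38-heavy-atom reference gives an LE of about 0.29 kcal/mol/heavy atom, in line with the commonly used 0.3 threshold, and the smaller hypothetical ligand would be prioritized despite its weaker raw score.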

Conclusion

Molecular docking provides an effective method for identifying hits in the early stages of drug discovery. So far, docking scores have generally been used as the criterion for selecting hits. Unfortunately, the simplifications made in molecular docking decrease the accuracy of the docking score and thus increase false positives. In the current review, we have discussed several aspects that may affect docking performance as a result of this inaccuracy.

Whenever possible, an experimental structure (X-ray or NMR) should be used, especially in holo-form, because redocking validation and information on ligand binding are only available with a holo-structure. For apo and homology structures, as well as holo-structures, screening power validation can be performed if known inhibitors are available. Fortunately, several studies have reported the in vitro activity of their hits against the SARS-Cov-2 target of interest, so these data might be used to generate active and decoy sets.

Because the virus is novel, information on its target proteins is limited. Relying solely on the docking score to pick hits has the potential to increase false positives, especially without validation or MD confirmation. Therefore, strict virtual hit selection should be applied. Many strategies are available for this purpose. In the current review, we suggest considering ensemble docking, consensus pose and scoring, and ligand efficiency to improve the accuracy of docking results, especially against a novel target.

References

Aliebrahimi S, Montasser Kouhsari S, Ostad SN, Arab SS, Karami L (2018) Identification of phytochemicals targeting c-met kinase domain using consensus docking and molecular dynamics simulation studies. Cell Biochemistry and Biophysics 76: 135–145. https://doi.org/10.1007/s12013-017-0821-6

Barge S, Jade D, Gosavi G, Talukdar NC, Borah J (2021) In-silico screening for identification of potential inhibitors against SARS-CoV-2 transmembrane serine protease 2 (TMPRSS2). European Journal of Pharmaceutical Sciences 162: 105820. https://doi.org/10.1016/j.ejps.2021.105820

Boittier ED, Tang YY, Buckley ME, Schuurs ZP, Richard DJ, Gandhi NS (2020) Assessing molecular docking tools to guide targeted drug discovery of CD38 inhibitors. International Journal of Molecular Sciences 21(15): 1–19. https://doi.org/10.3390/ijms21155183

Bordogna A, Pandini A, Bonati L (2011) Predicting the accuracy of protein–ligand docking on homology models. Journal of Computational Chemistry 32(1): 81–98. https://doi.org/10.1002/jcc.21601

Braga RC, Andrade CH (2013) Assessing the performance of 3d pharmacophore models in virtual screening: how good are they? Current Topics in Medicinal Chemistry 13(9): 1127–1138. https://doi.org/10.2174/1568026611313090010

Cerqueira NMFSA, Sousa SF, Fernandes PA, Ramos MJ (2009) Virtual Screening of Compound Libraries. In: Roque A (Ed.) Ligand-macromolecular interactions in drug discovery. Methods in Molecular Biology. Humana Press, Totowa, 57–70. https://doi.org/10.1007/978-1-60761-244-5_4

Chen YC (2015) Beware of docking! Trends in Pharmacological Sciences 36(2): 78–95. https://doi.org/10.1016/j.tips.2014.12.001

Chikhale RV, Gupta VK, Eldesoky GE, Wabaidur SM, Patil SA, Islam MA (2021) Identification of potential anti-TMPRSS2 natural products through homology modelling, virtual screening and molecular dynamics simulation studies. Journal of Biomolecular Structure and Dynamics 39(17): 6660–6675. https://doi.org/10.1080/07391102.2020.1798813

Cosconati S, Forli S, Perryman AL, Harris R, Goodsell DS, Olson AJ (2010) Virtual screening with AutoDock: Theory and practice. Expert Opinion on Drug Discovery 5(6): 597–607. https://doi.org/10.1517/17460441.2010.484460

De Vivo M, Masetti M, Bottegoni G, Cavalli A (2016) Role of molecular dynamics and related methods in drug discovery. Journal of Medicinal Chemistry 59(9): 4035–4061. https://doi.org/10.1021/acs.jmedchem.5b01684

Dong S, Sun J, Mao Z, Wang L, Lu YL, Li J (2020) A guideline for homology modeling of the proteins from newly discovered betacoronavirus, 2019 novel coronavirus (2019-nCoV). Journal of Medical Virology 92(9): 1542–1548. https://doi.org/10.1002/jmv.25768

Durdaği S (2020) Virtual drug repurposing study against SARS-CoV-2 TMPRSS2 target. Turkish Journal of Biology 44(3): 185. https://doi.org/10.3906/biy-2005-112

Empereur-Mot C, Guillemain H, Latouche A, Zagury JF, Viallon V, Montes M (2015) Predictiveness curves in virtual screening. Journal of Cheminformatics 7: 52. https://doi.org/10.1186/s13321-015-0100-8

Fan H, Irwin JJ, Webb BM, Klebe G, Shoichet BK, Sali A (2009) Molecular docking screens using comparative models of proteins. Journal of Chemical Information and Modeling 49(11): 2512–2527. https://doi.org/10.1021/ci9003706

Fernandes MX, Kairys V, Gilson MK (2004) Comparing ligand interactions with multiple receptors via serial docking. Journal of Chemical Information and Computer Sciences 44(6): 1961–1970. https://doi.org/10.1021/ci049803m

García-Sosa AT, Siid S, Maran U (2008) Design of multi-binding-site inhibitors, ligand efficiency, and consensus screening of avian influenza H5N1 wild-type neuraminidase and of the oseltamivir-resistant H274Y variant. Journal of Chemical Information and Modeling 48(10): 2074–2080. https://doi.org/10.1021/ci800242z

García-Sosa AT, Hetényi C, Maran UKO (2010) Drug efficiency indices for improvement of molecular docking scoring functions. Journal of Computational Chemistry 31(1): 174–184. https://doi.org/10.1002/jcc.21306

Gimeno A, Mestres-Truyol J, Ojeda-Montes MJ, Macip G, Saldivar-Espinoza B, Cereto-Massagué A, Pujadas G, Garcia-Vallvé S (2020) Prediction of novel inhibitors of the main protease (M-pro) of SARS-CoV-2 through consensus docking and drug reposition. International Journal of Molecular Sciences 21(11): 3793. https://doi.org/10.3390/ijms21113793

Hastantram SRM, Vishwakarma R, Uma Shaanker R (2020) Molecular docking analysis of selected natural products from plants for inhibition of SARS-CoV-2 main protease. Current Science 118(7): 1087–1092. https://doi.org/10.18520/cs/v118/i7/1087-1092

Hetényi C, Maran U, García-Sosa AT, Karelson M (2007) Structure-based calculation of drug efficiency indices. Bioinformatics [Oxford, England] 23(20): 2678–2685. https://doi.org/10.1093/bioinformatics/btm431

Hosseini M, Chen W, Xiao D, Wang C (2021) Computational molecular docking and virtual screening revealed promising SARS-CoV-2 drugs. Precision Clinical Medicine 4(1): 1–16. https://doi.org/10.1093/pcmedi/pbab001

Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. Journal of Medicinal Chemistry 49(23): 6789–6801. https://doi.org/10.1021/jm0608356

Huang SY, Zou X (2007) Efficient molecular docking of NMR structures: Application to HIV-1 protease. Protein Science 16(1): 43–51. https://doi.org/10.1110/ps.062501507

Huang SY, Zou X (2010) Advances and challenges in protein-ligand docking. International Journal of Molecular Sciences 11(8): 3016–3034. https://doi.org/10.3390/ijms11083016

Idris MO, Yekeen AA, Alakanse OS, Durojaye OA (2021) Computer-aided screening for potential TMPRSS2 inhibitors: a combination of pharmacophore modeling, molecular docking and molecular dynamics simulation approaches. Journal of Biomolecular Structure & Dynamics 39(15): 5638–5656. https://doi.org/10.1080/07391102.2020.1792346

Jiménez-Alberto A, Ribas-Aparicio RM, Aparicio-Ozores G, Castelán-Vega JA (2020) Virtual screening of approved drugs as potential SARS-CoV-2 main protease inhibitors. Computational Biology and Chemistry 88: 107325. https://doi.org/10.1016/j.compbiolchem.2020.107325

Kim SS, Alves MJ, Gygli P, Otero J, Lindert S (2021) Identification of novel cyclin A2 binding site and nanomolar inhibitors of cyclin A2-CDK2 complex. Current Computer-Aided Drug Design 17(1): 57–68. https://doi.org/10.2174/1573409916666191231113055

Kokic G, Hillen HS, Tegunov D, Dienemann C, Seitz F, Schmitzova J, Farnung L, Siewert A, Höbartner C, Cramer P (2021) Mechanism of SARS-CoV-2 polymerase stalling by remdesivir. Nature Communications 12(1): 279. https://doi.org/10.1038/s41467-020-20542-0

Kolb P, Irwin JJ (2009) Docking screens: right for the right reasons? Current Topics in Medicinal Chemistry 9(9): 755–770. https://doi.org/10.2174/156802609789207091

Li J, Zhou N, Liu W, Li J, Feng Y, Wang X, Wu C, Bao J (2016) Discover natural compounds as potential phosphodiesterase-4B inhibitors via computational approaches. Journal of Biomolecular Structure & Dynamics 34(5): 1101–1112. https://doi.org/10.1080/07391102.2015.1070749

Liu C, Zhu X, Lu Y, Zhang X, Jia X, Yang T (2021) Potential treatment with Chinese and Western medicine targeting nsp14 of SARS-CoV-2. Journal of Pharmaceutical Analysis 11(3): 272–277. https://doi.org/10.1016/J.JPHA.2020.08.002

Llanos MA, Gantner ME, Rodriguez S, Alberca LN, Bellera CL, Talevi A, Gavernet L (2021) Strengths and weaknesses of docking simulations in the SARS-CoV-2 era: The main protease (Mpro) case study. Journal of Chemical Information and Modeling 61(8): 3758–3770. https://doi.org/10.1021/acs.jcim.1c00404

Mahmudpour M, Nabipour I, Keshavarz M, Farrokhnia M (2021) Virtual screening on marine natural products for discovering TMPRSS2 inhibitors. Frontiers in Chemistry 9: 722633. https://doi.org/10.3389/fchem.2021.722633

Maia EHB, Assis LC, de Oliveira TA, da Silva AM, Taranto AG (2020) Structure-based virtual screening: from classical to artificial intelligence. Frontiers in Chemistry 8: 343. https://doi.org/10.3389/fchem.2020.00343

Majeed A, Hussain W, Yasmin F, Akhtar A, Rasool N (2021) Virtual screening of phytochemicals by targeting HR1 domain of SARS-CoV-2 S protein: Molecular docking, molecular dynamics simulations, and DFT studies. BioMed Research International 2021: 6661191. https://doi.org/10.1155/2021/6661191

Mathew SM, Benslimane F, Althani AA, Yassine HM (2021) Identification of potential natural inhibitors of the receptor-binding domain of the SARS-CoV-2 spike protein using a computational docking approach. Qatar Medical Journal 2021(1): 12. https://doi.org/10.5339/qmj.2021.12

McGovern SL, Shoichet BK (2003) Information decay in molecular docking screens against holo, apo, and modeled conformations of enzymes. Journal of Medicinal Chemistry 46(14): 2895–2907. https://doi.org/10.1021/jm0300330

Meng X-Y, Zhang H-X, Mezei M, Cui M (2011) Molecular docking: A powerful approach for structure-based drug discovery. Current Computer-Aided Drug Design 7(2): 146. https://doi.org/10.2174/157340911795677602

Narayanan N, Nair DT (2020) Vitamin B12 may inhibit RNA‐dependent‐RNA polymerase activity of nsp12 from the SARS‐CoV‐2 virus. IUBMB Life 72(10): 2112–2120. https://doi.org/10.1002/iub.2359

Palacio-Rodríguez K, Lans I, Cavasotto CN, Cossio P (2019) Exponential consensus ranking improves the outcome in docking and receptor ensemble docking. Scientific Reports 9(1): 1–14. https://doi.org/10.1038/s41598-019-41594-3

Pantsar T, Poso A (2018) Binding affinity via docking: Fact and fiction. Molecules 23(8): 1899. https://doi.org/10.3390/molecules23081899

Perola E (2006) Minimizing false positives in kinase virtual screens. Proteins 64(2): 422–435. https://doi.org/10.1002/prot.21002

Ramírez D, Caballero J (2018) Is it reliable to take the molecular docking top scoring position as the best solution without considering available structural data? Molecules 23(5): 1038. https://doi.org/10.3390/molecules23051038

Rissyelly, Aziz S, Sangande F, Yodha AWM, Budipramana K, Elfahmi, Sukrasno (2022) The inhibition of 15-lipoxygenase by Blechnum orientale leaves and its glycoside-flavonoid isolates: In vitro and in silico studies. Hayati Journal of Biosciences 29(3): 353–359. https://doi.org/10.4308/hjb.29.3.353-359

Robinson SW, Afzal AM, Leader DP (2014) Bioinformatics: Concepts, methods, and data. In: Padmanabhan S (Ed.) Handbook of pharmacogenomics stratified medicine. Academic Press, New York, 259–287. https://doi.org/10.1016/B978-0-12-386882-4.00013-X

Rockey WM, Elcock AH (2006) Structure selection for protein kinase docking and virtual screening: Homology models or crystal structures? Current Protein & Peptide Science 7(5): 437–457. https://doi.org/10.2174/138920306778559368

Salmaso V, Moro S (2018) Bridging molecular docking to molecular dynamics in exploring ligand-protein recognition process: An overview. Frontiers in Pharmacology 9: 923. https://doi.org/10.3389/fphar.2018.00923

Sangande F, Julianti E, Tjahjono DH (2022) 2-substituted 4-aminoquinazoline derivatives as potential dual inhibitors of EGFR and HER2: An in silico and in vitro study. Medicinal Chemistry Research 31: 762–771. https://doi.org/10.1007/s00044-022-02876-0

Satarker S, Ahuja T, Banerjee M, E VB, Dogra S, Agarwal T, Nampoothiri M (2020) Hydroxychloroquine in COVID-19: Potential mechanism of action against SARS-CoV-2. Current Pharmacology Reports 6: 203. https://doi.org/10.1007/S40495-020-00231-8

Scior T, Bender A, Tresadern G, Medina-Franco JL, Martínez-Mayorga K, Langer T, Cuanalo-Contreras K, Agrafiotis DK (2012) Recognizing pitfalls in virtual screening: A critical review. Journal of Chemical Information and Modeling 52(4): 867–881. https://doi.org/10.1021/ci200528d

Seeliger D, De Groot BL (2010) Conformational transitions upon ligand binding: Holo-structure prediction from apo conformations. PLOS Computational Biology 6(1): e1000634. https://doi.org/10.1371/journal.pcbi.1000634

Sharma A, Tiwari V, Sowdhamini R (2020) Computational search for potential COVID-19 drugs from FDA-approved drugs and small molecules of natural origin identifies several anti-virals and plant products. Journal of Biosciences 45(1): 100. https://doi.org/10.1007/s12038-020-00069-8

Singh TU, Parida S, Lingaraju MC, Kesavan M, Kumar D, Singh RK (2020) Drug repurposing approach to fight COVID-19. Pharmacological Reports 72(6): 1479. https://doi.org/10.1007/s43440-020-00155-6

Slater O, Kontoyianni M (2019) The compromise of virtual screening and its impact on drug discovery. Expert Opinion on Drug Discovery 14(7): 619–637. https://doi.org/10.1080/17460441.2019.1604677

Srivastava N, Garg P, Srivastava P, Seth PK (2021) A molecular dynamics simulation study of the ACE2 receptor with screened natural inhibitors to identify novel drug candidate against COVID-19. PeerJ 9: e11171. https://doi.org/10.7717/peerj.11171

Talele T, Khedkar S, Rigby A (2010) Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Current Topics in Medicinal Chemistry 10(1): 127–141. https://doi.org/10.2174/156802610790232251

Tuccinardi T, Botta M, Giordano A, Martinelli A (2010) Protein kinases: Docking and homology modeling reliability. Journal of Chemical Information and Modeling 50(8): 1432–1441. https://doi.org/10.1021/ci100161z

Vázquez J, López M, Gibert E, Herrero E, Javier Luque F (2020) Merging ligand-based and structure-based methods in drug discovery: An overview of combined virtual screening approaches. Molecules 25(20): 4723. https://doi.org/10.3390/molecules25204723

Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Arthurs S, Colson AB, Freer ST, Larson V, Luty BA, Marrone T, Rose PW (2000) Deciphering common failures in molecular docking of ligand-protein complexes. Journal of Computer-Aided Molecular Design 14(8): 731–751. https://doi.org/10.1023/a:1008158231558

Wang J (2020) Fast identification of possible drug treatment of Coronavirus Disease-19 (COVID-19) through computational drug repurposing study. Journal of Chemical Information and Modeling 60(6): 3277–3286. https://doi.org/10.1021/acs.jcim.0c00179

White MA, Lin W, Cheng X (2020) Discovery of COVID-19 inhibitors targeting the SARS-CoV2 nsp13 helicase. Journal of Physical Chemistry Letters 11(21): 9144–9151. https://doi.org/10.1021/acs.jpclett.0c02421

Wu C, Liu Y, Yang Y, Zhang P, Zhong W, Wang Y, Wang Q, Xu Y, Li M, Li X, Zheng M, Chen L, Li H (2020) Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods. Acta Pharmaceutica Sinica B 10(5): 766–788. https://doi.org/10.1016/j.apsb.2020.02.008

Yan C, Zou X (2015) MDock: An ensemble docking suite for molecular docking, scoring and in silico screening. In: Zhang W (Ed.) Computer-aided drug discovery. Methods in pharmacology and toxicology. Humana Press, New York, 153–166. https://doi.org/10.1007/7653_2015_62

Yin W, Mao C, Luan X, Shen DD, Shen Q, Su H, Wang X, Zhou F, Zhao W, Gao M, Chang S, Xie YC, Tian G, Jiang HW, Tao SC, Shen J, Jiang Y, Jiang H, Xu Y, Zhang S, Zhang Y, Xu HE (2020) Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir. Science 368(6498): 1499–1504. https://doi.org/10.1126/science.abc1560

Zhu T, Cao S, Su PC, Patel R, Shah D, Chokshi HB, Szukala R, Johnson ME, Hevener KE (2013) Hit identification and optimization in virtual screening: Practical recommendations based on a critical literature analysis. Journal of Medicinal Chemistry 56(17): 6560–6572. https://doi.org/10.1021/jm301916b
