The discovery of potential tumor necrosis factor alpha converting enzyme inhibitors via implementation of K Nearest Neighbor QSAR analysis

Tumor necrosis factor-alpha converting enzyme (TACE) is considered as a pro-inflammatory cytokine which catalyzes the formation of TNF-α from membrane bound TNF-α precursor protein. It has important role in various pathological diseases such as inflammation, anorexia, rheumatoid arthritis, cancer and viral replication. Despite the reporting of a variety of TACE inhibitors of diverse chemical scaffolds whether peptide, peptide-like compounds or others, the bioavailability and pharmacokinetic problems are reflected strongly on its clinical effectiveness. In this effort we employed a novel combination of k-nearest neighbor QSAR modeling to select the best critical ligand-TACE contacts capable of elucidation of TACE inhibitory bioactivity among a long list of inhibitors. The recurrence of one valid pharmacophore hypothesis among the successful models emerged the pharmacophore to be used as 3D search queries to screen the National Cancer Institute’s structural database for new TACE inhibitors. Eight potent TACE inhibitors were discovered with inhibition percentage exceeding 50% at 100 μM inhibitor concentration.


Introduction
Tumor Necrosis Factor (TNF)-α is a pro-inflammatory cytokine produced by different types of cells, i.e., monocytes, macrophages, neutrophils, T-cells, mast cells, epithelial cells, osteoblasts and dendritic cells. Several pathological conditions such as Crohn's disease, ulcerative colitis, diabetes, multiple sclerosis, atherosclerosis, psoriasis and stroke are caused by over-expression of TNF-α. and able to jump the species barriers from 40 animal host to humans and also to spread efficiently among humans causing severe respiratory complications, sometimes fatal (Siddell et al. 2019). In those patients, as a result of severe inflammatory-induced lung injury attributable to an uncontrolled overproduction and release of soluble markers of inflammation, as cytokines and several other mediators, referred as cytokine storm syndrome (CSS), sustaining an aberrant systemic inflammatory response, triggered by abnormal immunologic response to SARS-CoV-2 infection (Batista 2020;Gorbalenya et al. 2020;Guan et al. 2020).
Additionally, obesity is well-recognized as a major risk factor of insulin resistance and the number of obese people is increasing universally. The expression of tumor necrosis factor (TNF)-α, is increased in obese humans and animals (Hotamisligil et al. 1994).
Currently, five TNF-α inhibitors which are infliximab, etanercept, certolizumab pegol, adalimumab and golimumab are approved by Food and Drug Administration (FDA) in the USA and in other countries for the treatment of various diseases such as psoriatic arthritis, ankylosing spondylitis, and rheumatoid arthritis (RA) (Maekawa et al. 2019). The mentioned drugs bind selectively to TNF-α and inhibit its interaction with the TNF receptor. An approach for the regulation of TNF-α levels by means of the inhibition of TNF-α converting enzyme (TACE). TACE is a membrane bound zinc metalloprotease which converts the 26-kDa transmembrane bound pro-TNF-a to the mature 17-kDa soluble form of TNF (Sheppeck et al. 2007a, b;Yu Guo et al. 2010). TNF-α biological response is mediated through two dissimilar receptors; namely TNF-R1 and TNF-R2. Both receptors are transmembrane glycoproteins similar in their extracellular domains sharing structural and functional homology, while their intracellular domains are distinct. TNF-R1 is constitutively expressed in most tissues while TNF-R2 is characteristically found in immune system. Experimental validation proved that under physiological conditions, TNF-R1 signaling seems to be mainly responsible for pro-inflammatory properties of TNF-α, while TNF-R2 has been hypothesized that it acts as ligand passer, at least in some cells. (Park et al. 2006;Sheppeck et al. 2007c;Maekawa et al. 2019;Batista 2020).
Furthermore, TACE has been associated with hypertension by the elevation of Angiotensin II levels (De Queiroz et al. 2020). Inflammatory cytokines are accompanied with increased risk of cancer disease, so the inhibition of inflammatory mediators will inhibit immune related cancer cases (Lebrec et al. 2015).

Several drugs have been generated based on the inhibition of TACE activity or TNF-α
So our interest is the discovery of new effective small molecule leads that could modulate TNF-α levels is of high interest. (Park et al. 2006;Govinda Rao et al. 2007;Bandarage et al. 2008;Zhang et al. 2009).
The majority of known TACE inhibitors include a hydroxamic acid as the zinc chelating group. Since hy-droxamic acids are often poorly absorbed and are susceptible to metabolic degradation and glucuronidation, the interest is discovering alternative groups to the hydroxamic acid Lombart et al. 2007;Duan et al. 2008;Mazzola et al. 2008;Udechukwu et al. 2017).
As the catalytic site of TACE and matrix metalloproteinase (MMP) has high amino acid sequence similarity, the early MMP inhibitors, such as marimastat, prinomastat, and CGS27023A showed TACE inhibitory activity. TACE inhibitors or dual TACE and MMP inhibitors without MMP-1 inhibition are desirable (Janser et al. 2006;Adrian et al. 2007).
On our ongoing discovery program of potent TACE inhibitors, more exploration and understanding of the relationship between the structure and biological activity based on quantitative structure-activity relationship (QSAR) is useful tool for design of more potent inhibitors (Janser et al. 2006;Levin et al. 2006;Adrian et al. 2007;Duan et al. 2008).

Molecular modeling Software and hardware
The molecular modeling software applied in the current research are:

Data set
Structures of 115 TACE inhibitors with similar in vitro bioactivities (Suppl. material 1: table A) were collected from different articles (Duan et al. 2007;Duan et al. 2008;Lu et al. 2008;Ott et al. 2008a, b, c). The bioactivities were expressed as (IC 50 ), which represents the concentration of the test compound that inhibited the activity of TACE by 50%. For pharmacophore modeling and QSAR analysis purposes, the logarithm of measured IC 50 (nM) values were used to ensure correlating the data linear to the free energy change.
The chemical structures of TACE inhibitors (1-115, Suppl. material 1: table A) were drawn as 2D format in ChemDraw Ultra (Version 12.0). Rule based methods implemented in Discovery Studio were used for the conver-sion into 3D structures followed by energy minimization to the closest local minimum energy by means of the molecular mechanics CHARMm force field. Subsequent saving as SD format for following docking experiments. The Log(IC50) (nM) values were applied in modeling analysis, hence linearly correlating the biological data to the free energy change. The obtained 3D structures were used as starting conformers for conformational analysis for pharmacophore modeling.

Data sets
HYPOGEN module from the CATALYST software package was employed to construct numerous plausible binding hypotheses for TACE inhibitors. (Duan et al. 2007;Duan et al. 2008;Lu et al. 2008;Ott et al. 2008a, b, c). The conformational space of each inhibitor (1-115, Suppl. material 1: table A) was explored adopting the ''CAESAR'' option within CATALYST (Smellie and Teig 1995;Khanfar and Taha 2013). Detailed experimental and theoretical explanations of pharmacophore modeling and conformational analysis are provided in the Suppl. material 1: Section SM-1, SM2, SM-3.
An ''Uncertainty'' value of 3 was reported for the inhibitors biological data, which means that the actual bioactivity of a particular inhibitor is assumed to be within an interval ranging from one-third to three-times the reported bioactivity value of that inhibitor (Sutter et al. 2000). Afterwards, three structurally diverse training subsets were carefully selected from the collection for pharmacophore modeling: sets A, B and C (Suppl. material 1: table B) as demonstrated in Scheme 1.

Pharmacophoric hypotheses generation
Three structurally diverse training subsets were selected from the collected 115 inhibitors (Suppl. material 1: table B). For each subset group, eight modeling runs were performed to explore the pharmacophoric space of TACE inhibitors. The difference between the resulted hypotheses were based on altering the interfeature spacing and the number of allowed features in the resulting pharmacophores (Suppl. material 1: table C).
Eventually, pharmacophore exploration (8 automatic runs, Suppl. material 1: tables C, D) culminated in 240 pharmacophore models of variable qualities (See Suppl. material 1: SM-2 for details about CATALYST Pharmacophore Generation Algorithm) (Li and Hoffmann 2000;Güner et al. 2004) and the summary of the experimental part is summarized in Scheme 1.

Assessment and clustering of the generated pharmacophore hypotheses
Upon the generation of hypotheses, CATALYST seeks to reduce a cost function consisting of three terms: Weight cost, error cost and configuration cost (Sutter et al. 2000;Catalyst 4.11 User Guide 2005; Discovery Studio 2.5.5 User Guide 2010). Clustered 46 pharmacophores, out of 240 generated models, were found to possess Fisher confidence values ≥ 90% (see section SM-3). Suppl. material 1: tables C, D show the success criteria of representative pharmacophores from each run. Detailed theoretical explanations of CATALYST's assessment of binding hypotheses are provided in the Suppl. material 1: SM-3.
CATALYST based method was applied for clustering of the successful models (240) which were clustered into 46 groups applying the hierarchy average linkage. Among 240 hypotheses, the closely-related pharmacophores were gathered in three-membered clusters. Accordingly, the highest-ranking representatives based on their fit-to-bioactivity correlation r 2 -values (calculated against collected compounds 1-115), were designated to represent their corresponding clusters in subsequent QSAR modeling (Suppl. material 1: table D).

KNN-based descriptor selection
The kNN-QSAR methodology relies on a distance learning approach such that the activity value of an unknown member is calculated from the activity values of certain number (k) of nearest neighbors (kNNs) in the training set. The equations and the details of calculated distances for kNNs are explained in the Suppl. material 1: SM-4.

ROC curve analysis
Successful GFA-kNN selected pharmacophore models were validated by assessing their abilities to selectively capture diverse TACE inhibitors from a large list of decoys employing ROC analysis as described by Verdonk and co-workers. (Verdonk et al. 2004;Triballeau et al. 2005;Kirchmair et al. 2008). For each active compound in the testing set an average of 35 decoys were randomly chosen from the ZINC database (Irwin and Shoichet 2005). See Suppl. material 1: SM-5 for detailed experimental and theoretical explanations of ROC analysis.

Addition of exclusion volumes
To account for the steric constraints of the binding pocket and to optimize the ROC curves of our QSAR-selected pharmacophores, it was decided to add exclusion volumes to the successful GFA-kNN selected pharmacophore models employing the HIPHOP-REFINE module of CATALYST. HIPHOP-REFINE uses inactive training compounds to add exclusion spheres to resemble the steric constraints of the binding pocket. It identifies spaces occupied by the conformations of inactive compounds and free from active ones. These regions are then filled with excluded volumes (Clement and Mehl 2000). More details are provided in the Suppl. material 1: SM-6).

Preparation of TACE protein structure
The 3D structure of TACE was downloaded from the Protein Data Bank (TACE, PDB code: 2i47, resolution: 1.9 A°). Addition of hydrogen atoms to the protein was performed utilizing Discovery Studio 2016 templates for protein residues. No further energy minimization of the protein structure was applied before performing the docking experiments. It was very difficult to gain a selective TACE protein over various Matrix metalloproteinases (MMPs), mainly MMP-2 and MMP-13 due to high similarity in these proteins and TACE structure. The selection of this protein structure (PDB code: 2i47) has preferences over other available protein crystallographic structures due to high resolution value, in addition to the selectivity of this protein structure with a well-known binding pocket (Condon et al. 2007).

TACE ligands libdocking
LibDock docks the ligands into a binding site directed by binding hot spots. It aligns docked ligand conformations to polar and nonpolar receptor interactions sites, i.e., hotspots (Diller and Mertz 2001).
In the current docking experiments, certain parameters were employed for docking and scoring of the total list (115) TACE inhibitor as mentioned under Suppl. material 1: SM-7.

Ligandfit docking
Ligandfit docking engine considers receptor as rigid structure and treat the ligands as flexible compounds (Venkatachalam et al. 2003). Steps of Ligandfit and the corresponding protocol settings are mentioned under Suppl. material 1: SM-8. It aligns docked ligand conformations to polar and nonpolar receptor interactions sites, i.e., hotspots. The binding site is determined from the co-crystalline ligand (KGY, PDB Code: 2i47). Scoring of the high ranking docked conformers/poses was performed using 7 scoring functions, namely: Jain, Lig-Score1, LigScore2, PLP1, PLP2, PMF and PMF04 (Venkatachalam et al. 2003).
In the current docking experiments, certain parameters were employed for docking and scoring of the total list (115) TACE inhibitor as mentioned under Suppl. material 1: SM-8.

In-silico screening of NCI database
The sterically-refined version of Hypo(A-T6-4) selected in the optimal kNN model was employed as 3D search query to screen the National Cancer Institute list of compounds (includes 265,242 structures). Screening was performed employing the "Best Flexible Database" search option implemented within Discovery Studio 2016. NCI hits were subsequently filtered based on Lipinski's rule such that only hits of H-bond donors less < 5, .molecular weights < 500 dalton, H-bond acceptors less < 10, H-bond acceptors < 5, and logP < 5 were retained (Lipinski et al. 2001;Veber et al. 2002). Filtered hits were fitted against Hypo(A-T6-4) by the "Best Fit" option in Discovery Studio 2016 and their fit values together with other relevant molecular descriptors were used to predict the activity of these selected hits. High-ranking hits were ordered from National Cancer Institute (NCI) for subsequent in vitro assay (see Suppl. material 1: table E).

In vitro TACE inhibition bioassay
The assay kit was purchased from Abcam's TACE inhibitor screening Kit, TACE hydrolyzes the specific FRET sub-strate to release the quenched fluorescent group, which can be detected fluorometrically at Ex/Em = 318/449 nm.
In the presence of the tested TACE inhibitor, the hydrolyzation of substrate will be impeded.
The assay protocol is described as follows: an aliquot of (50 μL) of diluted TACE enzyme solution was mixed with (25 μL) of a testing diluted compound, the positive control, enzyme control (without inhibitor) or the buffer solution in the case of negative control. Mix well, and incubate for 5 minutes at 37 °C. Next, add (25 μl) of diluted substrate solution into each well and mix well. Then the mixture was incubated at 37 °C for 30 minutes protected from light. The fluorescence intensity (Excitation : 318 nm; Emission : 449 nm) was read in FLX800TBI Microplate Fluorimeter (BioTek Instruments, Winooski, VT, USA). The selected hits were dissolved in DMSO yielding 10 mM stock solutions followed by dilution to the required concentration using distilled deionized water. DMSO concentration was adjusted to 0.1%. The percentage of remaining TACE activity was identified in the presence and absence of the tested molecules. TACE activity is not affected by DMSO. The negative control samples were used as a contrast background. The carried experimental protocol and measurements were in duplicates. The % inhibition of TACE by the selected hits was calculated using the following equation (1 Where, ∆RFU of EC: The difference in fluorescence generated by hydrolyzation of substrate for enzyme control, ΔRFU of S: The difference in fluorescence generated by hydrolyzation of substrate in presence of tested hit. The measurements were carried out in duplicates.

QSAR modeling
After the generation of high quality 240 pharmacophores from 8 various runs for each subset (Suppl. material 1:

kNN-based QSAR modeling
We employed kNN-based QSAR modeling for TACE target. The kNN-QSAR methodology is based on a distance-based learning approach, which calculates the activity value of an unknown member is based on the activity values of certain number (k) of nearest neighbors (kNNs) in the training set. The distance metric similarity is measured by Euclidean distance. We applied the following kNN workflow: (1) Compute Euclidean distances between an unknown object (x) and all the objects in the training set with respect to certain descriptor(s); (2) choose from the training set k objects which are most similar to object x; (3) compute the distance-weighted average bioactivities of k objects as predicted bioactivity of x. (4) Correlate between the predicted bioactivities with the experimental ones to define the optimal k value and illustrative descriptors via leave-20% out cross-validation (Sharaf and Kowalski 1986;Zheng and Tropsha 2000). Table 3 shows the selected descriptors, nearest neighbors and statistical criteria of the top 5 kNN-based QSAR models. We selected model number 1 (Table 3) as the best representative for subsequent virtual screening and QSAR-based predictions because it exhibits excellent overall explanatory power with the least number of descriptors and nearest neighbors (Sharaf and Kowalski 1986;Zheng and Tropsha 2000). kNN-QSAR model (1) selected four descriptors encoding for dipole Z that ensures the importance of polarity and the formation of hydrogen bonding with TACE biding site. The second descriptor is ES_Count_sssCH which indicates the number of methantriyl groups. Obviously, the presence of the ES_Count_sssCH descriptor in Model 1 indicates the participation of bulky substituents of quinoline and other heterocyclic compounds in steric interactions aromatic and olefinic carbon atoms (ES_Count_ sssCH). The topology, the connectivity and the bulkiness of the structure represented by(CHI_V_3_P), in addition to Jurs_SASA is calculated by mapping atomic partial charges on total solvent accessible surface areas of individual atoms. This descriptor has positively influenced the binding affinity of the inhibitors (Buglak et al. 2019).
Additionally, the appearance of dipole Z descriptor in model (1) which indicates The components of the dipole moment along z-axis, in addition to 3χP is path connectivity index. Moreover, it selected one pharmacophores as ad-ditional explanatory descriptors, namely, Hypo(A-T6-4). Interestingly, many of the descriptors in kNN-QSAR model (1), repeatedly emerged in other best performing kNN-QSAR models (Table 3), including the pharmacophore model which add further weight to these descriptors.
Repeated appearance of Hypo(A-T6-4) among the optimal kNN-QSAR models indicates the essential pharmacophoric features.
The repeated appearance of dipole Z and ES count descriptor in top ranking kNN-QSAR models, including model 1 (Table 3), is suggestive of significant role played by aromatic or aliphatic nitrogen atoms in ligand binding within TACE probably through hydrogen bond attractive forces or trough ionic attractive forces to acidic amino acid moieties in the binding pocket. Similarly, aliphatic and aromatic carbon atoms count descriptors probably encode for affinity interactions connecting different training ligands and hydrophobic moieties within TACE binding pocket. TACE binding site contains several hydrophobic and aromatic moieties capable of π-π-stacking and hydrophobic interactions with various ligands including, Pro437, Lys315, Leu350, Leu318 and Tyr352 (Fig. 1). Fig. 2 shows Hypo(A-T6-4) and how it maps co-crystallized ligand within the closely-homologous protein (PDB Code: 2i47), while Table 1 shows the X, Y, and Z coordinates of Hypo(A-T6-4).

Addition of exclusion volumes and ROC curve analysis
To further validate our kNN-QSAR-selected pharmacophores, we subjected them to ROC curve analysis to assess their abilities to selectively capture diverse TACE inhibitors from a large list of decoys as clearly mentioned in Suppl. material 1: SM-5. The ROC performance was accepted for the selected pharmacophore with ROC-AUC value of approximately 0.8 as shown in Table 2. Consequently, for the improvement of the ROC-AUC curve parameter, addition of exclusion spheres was performed using HipHop refining list that is selected in Table 4 and according to the procedure mentioned under Suppl. mate-   rial 1: SM-6. Ten exclusion spheres were added and the obtained Refined Hypo (A-T6-4) ROC-AUC was increased to 0.929 as obviously illustrated in Table 2 and Suppl. ma-terial 1: fig. A, SM-6, which represents the shapes of ROC curves for the unrefined and refined Hypo (A-T6-4). Despite low values of TPR = 0.25, selectivity (SCC = 0.99) is high which increases the hits inhibition against TACE over MMPs . Also low false negative rate (FNR = 0.007) decreases the loss of potential active hits.

Ligandfit and libdock
Additionally code: 2I47) simulates the pharmacophoric features of Hypo(A-T6-4) as hydrogen bonding was formed with various amino acids according to the binding mode of the NCI hit such as Glu 406 and H2O 927 as in the case of hit 118 (Fig. 4B), hydrophobic attractive forces with His 405, Lys 315, Gly 346 as presented with several hits (Fig. 4A, E, F). Also aromatic π-π stacking is shown with aromatic amino acid residues as shown with His 405, His 409 and His 415 with hit 140 (Fig. 4G).
According to the settings of Libdock (Suppl. material 1: SM-7), the docked active NCI hits poses are presented in Fig. 5, for example hit 128 forms hydrogen bonding with amino acid residues His 415 and Asp 418, hydrophobic forces are formed with Glu319, Leu 318and Leu 350 (Fig. 5C). Additionally, the most potent hit 131 has more obvious attractive forces with TACE binding site (Fig. 5E), that hydrogen bonding is clear with H 2 O 923, H 2 O 990 and Ala 351, aromatic π-π stacking is presented with Tyr 351 and His 415 which simulates the aromatic ring pharmacophoric feature of Hypo(A-T6-4). Finally, hydrophobic attractive forces with Leu350, Lys315 and Leu318. Furthermore, to ensure the correct selection of docking settings performed above, the co-crystalline ligand KGY (PDB: 2i47) was re-docked again within TACE binding site and according to the previous scoring functions. Clearly from the figure, the applied docking settings closely reproduced the bound structure with RMSD value of 1.14 A° which support the confidence in our docking experiment.

NCI search query and in vitro biossay
The filtered hits activity was predicted according to selected kNN model, 28 out of the best predicted NCI compounds were tested as TACE inhibitors using the bioassay kit as mentioned in the experimental part. The inhibitory activity of the selected NCI hits was performed at 100 µM concentration and the obtained results were tabulated with the corresponding structure for each hit as shown in Suppl. material 1: table E.
Mapping of the most potent hits against Hypo(A-T6-4) are presented in Fig. 3 which shows that 8 active hits with TACE inhibition activity exceeding 50% at 100 µM concentration. Obviously, the active hits mapped with Hypo(A-T6-4) without any missing features which explains the activity of the captured hits.
As explained before, the active hits mapped with the selected pharamacophore in our successful model (Hypo(A-T6-4) as obviously illustrated in Fig. 3. The dif-ferences in the obtained activity of the NCI hits can be explained by the variations in the docking poses of these hits and the types of the forces formed with amino acid residues as presented in Figs 4, 5. The better the matching between the docking hit forces with the pharmacophoric feature of Hypo(A-T6-4), the higher the activity of our captured hit as mentioned for hit 118 and 131

Conclusion
TNF-α converting enzyme is considered as a main cause of various serious diseases, so the discovery of new TACE inhibitors is remarkable step especially with drawbacks of the marketed known inhibitors. Ordinary QSAR modeling process was not efficient in the creation of normal QSAR equation comprising the most significant pharmacophores and 2D descriptors which manipulates the inhibitory activity of TACE inhibitors. Another approach via kNN-QSAR modeling procedure was a successful alternate for obtaining a validated models which clearly selected the essential pharmacophoric features needed for TACE inhibitors among 240 generated pharmacophores using three various subsets, in addition to properly picking out of crucial 2D descriptors in the model. ROC validated confirms the success of the selected phamacophore upon its validation and the subsequent active Refined Hypo(A-T6-4) was applied as search query on NCI compound. Selected NCI com-pounds inhibitory activity was examined and achieved TACE inhibitory activity exceeding 80% at 100 μM concentration for some NCI hits.