Research Article
Corresponding author: Anastas Pashov (ansts@yahoo.com). Academic editor: Georgi Momekov
© 2023 Shina Pashova-Dimova, Peter Petrov, Sena Karachanak-Yankova, Anastas Pashov.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Pashova-Dimova S, Petrov P, Karachanak-Yankova S, Pashov A (2023) Neurodegenerative diseases associated antibody repertoire signatures in mimotope arrays based on cyclic versus linear peptides. Pharmacia 70(4): 1439-1447. https://doi.org/10.3897/pharmacia.70.e115179
The role of peptide probes’ conformational flexibility in extracting immunosignatures has not been sufficiently studied. Immunosignatures profile the antibody diversity and prove promising for early cancer detection and multi-disease diagnostics. A novel tool for modeling antibody repertoires, the concept of antibody reactivity graphs, proved instrumental in this respect. Serum samples from patients with Alzheimer’s disease (AD), frontotemporal dementia (FTD), dementia of unknown etiology (DUE), and healthy controls were probed using a set of 130 7-mer peptides relevant to neurodegenerative diseases. Results show that linear peptides probed with IgM yielded higher graph density compared to IgG, indicating different levels of polyspecificities. Additionally, the impact of peptide topology and antibody isotype on feature selection was studied using recursive feature elimination. Findings reveal that IgM assays on linear peptides offer superior diagnostic differentiation of neurodegenerative diseases and define the degree of agreement between IgG and IgM immunosignatures with linear or cyclic peptides.
igome, immunosignatures, cyclic peptides, mimotopes, antibody repertoires, graphs, peptide arrays
Cyclic and linear peptides have different applications in the field of vaccines and immunoassays. Cyclic peptides can be engineered to resist non-specific degradation in the body and can be activated upon exposure to target-specific environmental factors, making them suitable for targeted drug delivery (
Extracting immunosignatures is a technology that can be used to evaluate vaccines and predict vaccine effectiveness (Joseph Barten and Stephen Albert Johnston 2013;
Patients were recruited at the Neurology Clinic of the Medical University, Sofia. Serum samples (0.5 ml) from patients with AD, FTD, DUE or no signs of dementia (n = 4 for each group) were collected after informed consent. The collection and the following studies were approved by the Human Studies Ethics Committee of the Medical University, Sofia. The anonymized samples were kept frozen at -15 °C until processing.
Each serum sample was thawed and incubated at 37 °C for 30 min to dissolve IgM complexes. Next, the sera were centrifuged, diluted 30-fold with PBS and ultrafiltered (Amicon-Ultra 100 kDa membranes, Millipore) for initial fractionation of serum proteins above 100 kDa. The high-molecular-weight fraction of serum proteins was applied sequentially to columns (HiTrap Protein G High Performance and HiTrap IgM Purification, GE Healthcare) for affinity purification of IgM and IgG according to the manufacturer’s instructions.
Custom peptide microarray chips containing 130 linear or cyclic peptides, each 7 amino acid residues long, produced by PEPperPRINT (Heidelberg, Germany), were used. The peptides were synthesized in situ in an oriented array, attached to the chip’s surface through their C-terminus (linear) or C- and N-termini (cyclic) with a spacer sequence GSGSG. The microarray layout consisted of the peptide spots duplicated in random positions. The sequences of the peptides were selected from a larger IgOme library (
Sequences of peptide mimotope probes with over/under expressed reactivity in AD and FTD used in this study.
| AD, over expressed | AD, over expressed (continued) | AD, under expressed | FTD, over expressed | FTD, over expressed (continued) | FTD, under expressed |
|---|---|---|---|---|---|
| ADDACPR | GTIPGQP | DAEGFTK | AATQLWW | MTDMSLL | AAYKGEE |
| AEECNIC | GYPGLWS | DAHVRLA | ADPGYHS | NLAPRPH | ARSVHPI |
| DAGPCRP | HDYENRG | DKAEIWH | ADVARTH | NPHHVTR | AWKWDFI |
| DGASNLP | HEIGSQL | DQPHVWN | AGVAPRL | NPVQAHY | DRCCVLD |
| DGGLIRI | HPLRHSG | DSGCGHQ | AHNWWFD | QDQICHC | EPVTSYL |
| DHCFARR | MEPQVII | EANSIAF | AQSMEFV | QFTMATF | EQSAWRE |
| DHRNSIR | MQCPNDC | EDCKWCR | ARPAEMS | QSSMLER | GHLPVWS |
| EAHYRGP | MYGVDQN | EEGLIRG | ATRADYF | QVIPFNH | HPDFWPI |
| EHVPRIL | NGEPLIP | EEVQIPV | AVDGTDR | RAADEYS | HPPAGIL |
| GAPKHWL | QDMPRLP | EHSLETE | AWARHES | RDVLDVY | HTRADVV |
| GATGSLP | QMQINLD | EPVIPRS | CCLAWDP | RSTDLYT | KPVEWRV |
| GHARLSP | QTVEWYR | ERLTCEF | GASLRPG | RTTPPHY | MDTDALT |
| GIVSYPG | RIAQNHP | ETVFWRM | GATGAYN | RWDPFPA | MGTPKED |
| GKHITMW | RWIDKVP | ETWIGPI | GATGSYP | SGWNEMV | MGVQTEV |
| GLLRPSP | SMHLGFI | GPAVTTS | GCCGADP | SPIDTWS | MIHDKRY |
| GMHLSNW | SPDDLRV | GPGSQAT | GDEARDG | SQGYSMH | MLRTADT |
| GMPTRTF | TLEEFPF | GPPGVSR | GEESYGW | TGVTRDS | MPHKNDF |
| GNRVAYV | TQEYWRG | GPPLTWK | GHCRMNM | TIWGADF | MVKNYAD |
| GQAGGLI | VERMYTP | HPGWAWQ | GLENLSH | TKTVTER | RFPVDQH |
| GQIALSS | VWPQIIG | NPALWCC | GRWSDSY | TNPHGDT | RPFVYEY |
| GQIDKIP | WDRNIHL | RLPHPLP | GTPVLSH | TQGFQTM | TDEIHQM |
| GQNVTAP | WGTTRVA | RMEITNL | HDLMWHR | TTDARIH | TELKEMI |
| GQVFTYP | WHGVQNI | TGSSWLV | HKVTDVF | TTDIPAR | THLAQDV |
| GSIIFHR | WPLMLMP | TQNYAAI | HMATHPW | TTDRTMM | TTELLVA |
| GTATTLP | WRDASMP | | HWEPMRN | TTFRLPD | TTLPLPT |
| | | | IANRAEQ | VERTLSY | VQNMWPV |
| | | | LDGPRPH | WERDCCT | |
| | | | MPIRGPM | WTKGEHF | |
The microarray fluorescence images were acquired with an InnoScan 1100 scanner (Innopsys, Carbonne, France). The densitometry was performed using MAPIX software. All further analysis was performed using publicly available packages of the R statistical environment (Bioconductor, Biostrings, limma, pepStat, sva, e1071, uwot, clValid, etc.) as well as in-house developed R scripts. The data underpinning the analysis reported in this paper and the analysis scripts are deposited at https://github.com/ansts/cyclic. The details of the analysis procedures are described elsewhere (
Further, the logarithms of the data were normalized with respect to amino acid residue composition (
is the coefficient of variation of their concatenation, then:
is a function which tends to ρ + 1 as cv → 1 and to ρ + 2 as cv → 0. The value k = 0.3 was found to maximize the area under the ROC curve of the CCV criterion when classifying cross-reactive peptides. The CCV criterion remains relatively high even for low correlation if the coefficient of variation is also very low. In this way, reactivity profiles which are flat and very similar in mean values are classified as cross-reactive. The criterion was tested using the algorithm and the sequence set from (
(A) ROC curve illustrating the capacity of the CCV criterion to classify 4150 pairs of peptides overlapping in 11/15 positions as a model of cross-reactive peptide pairs compared to a set of 10,000 pairs of dissimilar peptides (sharing a longest common subsequence of fewer than 3/15). The analysis is done on the basis of the dataset from [1]. The logarithms of the values of CCV are used. (B) Distributions of the log CCV values for dissimilar sequences (black) vs. cross-reactive sequences (red). The optimal tradeoff between sensitivity and specificity was found for CCV = 2 (specificity = 0.926, sensitivity = 0.69, AUC = 0.904)
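The evaluation summarized in this caption (sensitivity, specificity and AUC of a score used to separate cross-reactive from dissimilar pairs) can be illustrated with a minimal Python sketch. The function names are hypothetical, and choosing the threshold by maximizing sensitivity + specificity (Youden's J) is an assumption: the text does not state which rule defined the optimal tradeoff.

```python
def sens_spec(pos, neg, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(s >= threshold for s in pos)
    tn = sum(s < threshold for s in neg)
    return tp / len(pos), tn / len(neg)


def auc(pos, neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(pos, neg):
    """Assumed rule: pick the observed score maximizing sensitivity + specificity."""
    candidates = sorted(set(pos) | set(neg))
    return max(candidates, key=lambda t: sum(sens_spec(pos, neg, t)))
```

Here `pos` would hold the criterion values of the model cross-reactive pairs and `neg` those of the dissimilar pairs.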
In our previous reports we demonstrated the utility of antibody reactivity graphs as a tool for system level studies of the repertoires of antibody specificities. The reactivity graphs represent the antibody cross-reactivity relations between the peptide probes used. They are weighted and undirected. Also, they are usually highly connected due to antibody cross-reactivity especially when the IgM repertoire is probed with short peptides. Their capacity to encode diagnostically relevant information depends on the degree of cross-reactivity and specificity, the public nature of the repertoire features addressed (to ensure generalization), the diversity of the probe array, the use of targeted arrays with known relevance to the problem at hand, etc.
Here we address the question of the relative utility in binding assays of an array of peptide probes in linear (free) or cyclic (constrained) topology (conformation) in IgG or IgM reactivity graphs. Serum IgG and IgM from patients with AD, FTD, DUE or controls without dementia (n = 4 for each diagnosis) were tested using peptide arrays of 130 probes preselected for their significant over or under expression in AD and FTD. The binding of the antibodies was detected by a fluorochrome conjugated anti-IgG or anti-IgM secondary antibody and quantitated by scanning the fluorescence intensity. After acquisition, cleaning, background subtraction and normalization, the fluorescence data were used to construct 16 separate graphs based on the similarities of the binding profiles (see Materials and methods).
The graphs represent data grouped by 3 factors: isotype (IgG or IgM), peptide topology (cyclic or linear) and diagnosis (n = 4), with 4 different sera for each diagnosis. The graphs were studied either separately or combined in multigraphs, e.g. grouped by isotype and/or topology, as well as after simplifying the multigraphs by summing the weights of the parallel edges so that there is only one edge between any two vertices. The overall graph produced as the union of all 16 graphs is shown on Fig.
Overview of the general reactivity graph constructed as the union of the reactivity graphs under different conditions (IgG or IgM binding; cyclic or linear peptides; AD, FTD, DUE and Control – altogether 16 different graphs based on the same vertices). The edge weights of the separate original graphs were summed. To outline only the strongest similarities, the edges were kept if their weight exceeded 17.4 (range 2–36). The vertices (peptide sequences) are color coded according to the source mimotope library (red – over expressed in AD, orange – under expressed in AD, dark blue – over expressed in FTD, light blue – under expressed in FTD). The color of the edges is a mixture of the colors of their incident vertices. The thickness of the edges is proportional to their weight interpreted as strength of cross-reactivity. The layout of the graph is an embedding based on the 16 eigenvectors corresponding to the lowest nonzero eigenvalues, further projected to 2 dimensions using the UMAP algorithm. The modularity of the graph with respect to the partition by mimotope libraries equaled the 0.949 quantile of the bootstrapped modularity. When the modularity was bootstrapped dichotomously for each library against the rest, the quantiles were: 0.969 for AD high, 0.861 for FTD low, 0.647 for AD low and 0.626 for FTD high.
There was a weak but discernible separation between the peptide libraries with respect to their reactivities with the tested sera. When the modularity was calculated dichotomously for each library against the rest, the simulated distribution quantiles were respectively: 0.969 for AD high, 0.861 for FTD low, 0.647 for AD low, and 0.626 for FTD high. In all cases, the simulation was done by generating 1000 random partitions of the same sizes as those tested.
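The comparison of an observed modularity with a simulated distribution over random partitions of the same sizes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the modularity is the standard weighted Newman modularity, random partitions are generated by shuffling the group labels (which preserves group sizes), and the names `modularity` and `modularity_quantile` are hypothetical.

```python
import random


def modularity(edges, membership):
    """Weighted Newman modularity.

    edges: dict mapping (u, v) -> weight for an undirected simple graph.
    membership: dict mapping each vertex to its group label.
    """
    m = sum(edges.values())  # total edge weight
    within, strength = {}, {}
    for (u, v), w in edges.items():
        gu, gv = membership[u], membership[v]
        strength[gu] = strength.get(gu, 0.0) + w
        strength[gv] = strength.get(gv, 0.0) + w
        if gu == gv:
            within[gu] = within.get(gu, 0.0) + w
    return sum(within.get(g, 0.0) / m - (s / (2 * m)) ** 2
               for g, s in strength.items())


def modularity_quantile(edges, membership, n_sim=1000, seed=0):
    """Fraction of label-shuffled partitions with modularity <= the observed one."""
    rng = random.Random(seed)
    nodes = sorted(membership)
    labels = [membership[n] for n in nodes]
    observed = modularity(edges, membership)
    below = 0
    for _ in range(n_sim):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # same group sizes, random assignment
        if modularity(edges, dict(zip(nodes, shuffled))) <= observed:
            below += 1
    return below / n_sim
```

For the dichotomous comparisons, `membership` would map each peptide to either the library of interest or a single "rest" label.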
When the graphs were grouped by topology/isotype, the four resultant multigraphs had significantly different mean intensities and graph densities (Fig.
A. Mean intensity of the binding data for the graphs grouped by the topology of the peptides (cyclic or linear) and by the isotype of the tested antibodies; B. Graph density of graphs grouped by the topology and isotype. The graph density is the ratio of the number of edges to the theoretical maximum for each graph. Among the graphs based on linear peptides, those tested with patients’ IgG showed lower density, and those tested with IgM higher density, than the corresponding cyclic peptide graphs. This is interpreted as lower and higher cross-reactivity, respectively.
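The density defined in this caption is a simple ratio and can be written as a short Python sketch. The function name and the edge representation (pairs of vertex names) are assumptions for illustration.

```python
def graph_density(n_vertices, edges):
    """Density of a simple undirected graph: number of edges / (n * (n - 1) / 2).

    edges: iterable of (u, v) pairs; direction and duplicates are ignored,
    self-loops are dropped.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    max_edges = n_vertices * (n_vertices - 1) // 2
    return len(unique) / max_edges
```

For an array of 130 probes the theoretical maximum is 130 × 129 / 2 = 8385 edges.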
To study in more detail the commonality between the images of the IgG vs IgM repertoires probed with linear vs cyclic peptides, the sixteen original graphs were aggregated in 4 topology/isotype (T/I) graphs named linear_IgM, cyclic_IgM, linear_IgG and cyclic_IgG. These multigraphs were simplified by summing the parallel edges’ weights. The significance of the overlap of the individual T/I graphs was estimated by calculating the sum of the weights of the parallel edges in all multigraphs generated by uniting the different combinations of the four T/I graphs. These were compared to the weight sums of 1000 random graphs generated by scrambling the existing edges and their weights. The scrambling was done among all edges existing in the overall multigraph, some of which are not found in individual graphs and thus are initially assigned weight 0. Each of the graphs obtained by uniting a combination of T/I graphs was further stratified into 3 subgraphs based on edge weight using the ranges [2, 2.3), [2.3, 2.6) and ≥ 2.6.
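The simplification of a multigraph by summing the weights of parallel edges can be sketched in Python as follows. The representation of edges as `(u, v, weight)` triples and the name `simplify_multigraph` are assumptions made for this illustration.

```python
from collections import defaultdict


def simplify_multigraph(edges):
    """Collapse parallel edges of an undirected multigraph by summing weights.

    edges: iterable of (u, v, weight) triples; (u, v) and (v, u) denote the same edge.
    Returns a dict mapping a sorted (u, v) tuple to the summed weight.
    """
    summed = defaultdict(float)
    for u, v, w in edges:
        summed[tuple(sorted((u, v)))] += w
    return dict(summed)
```

Applied to the four graphs of one topology/isotype condition (one per diagnosis), this would yield a single weighted simple graph such as linear_IgM.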
The magnitudes of the weight sums which are outside the 0.05–0.95 quantile range of the simulated values were considered significant. The significant weight sums are shown on Fig.
Graph overlaps. The four T/I graphs and their various combinations had their edge weights categorized as low – [2, 2.3), medium – [2.3, 2.6) and high – ≥ 2.6, indicating the respective levels of cross-reactivity (pattern similarity). The sums of the edge weights which were outside the 0.05–0.95 quantile range of the simulated distribution are illustrated. The thickness of the connecting strips corresponds to the sums of weights of the overlapping edges. For some of the graphs the number of overlapping edges is significantly increased (red) or decreased (green) relative to the simulated randomly connected graphs. The distribution is skewed towards high overlap among multiple subgraphs, which indicates a considerable consensus between the different conditions, including between arrays of peptides in linear vs cyclic topology.
Thus, despite the differences both in the repertoire of reactivities and in the binding mode, IgG and IgM repertoires are partially comparable in their cross-reactivity with the tested array of peptide probes with the distinction of repertoire compartments with high and low overlap of reactivities.
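The overlap test used above (weight sums of shared edges compared with those of graphs whose edges and weights were scrambled) can be sketched in Python. This is one possible reading of the procedure, stated here as an assumption: an edge counts as overlapping when it has a nonzero weight in every graph of the combination, and scrambling permutes each graph's weights, including the zeros of absent edges, over all edge positions of the overall graph. All function names are hypothetical.

```python
import random


def overlap_weight(graphs):
    """Sum of weights over edges that are present (weight > 0) in every graph."""
    shared = set.intersection(
        *({e for e, w in g.items() if w > 0} for g in graphs)
    )
    return sum(g[e] for g in graphs for e in shared)


def scrambled(graph, all_edges, rng):
    """Permute a graph's weights (absent edges count as 0) over all edge positions."""
    weights = [graph.get(e, 0.0) for e in all_edges]
    rng.shuffle(weights)
    return dict(zip(all_edges, weights))


def overlap_quantile(graphs, all_edges, n_sim=1000, seed=0):
    """Fraction of scrambled replicates whose overlap weight is <= the observed one."""
    rng = random.Random(seed)
    observed = overlap_weight(graphs)
    sims = [
        overlap_weight([scrambled(g, all_edges, rng) for g in graphs])
        for _ in range(n_sim)
    ]
    return sum(s <= observed for s in sims) / n_sim
```

Under this reading, a quantile above 0.95 or below 0.05 would correspond to the significantly increased or decreased overlaps reported in the text.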
An efficient machine learning model based on repertoire patterns implies selecting a relevant subset among ~10³ peptide reactivities. Typically, fewer than a hundred peptides are selected which separate well the diagnostic groups of patients (
In the present study, the size of the groups does not allow building a generalizing model. Nevertheless, RFE can still be used to measure the performance of the different data sets in a possible classifier. The effect of the tested factors can be compared using the maximal value of the clustering criterion achieved by the optimal feature set since the different assays are performed on the same set of peptide sequences. The distribution of the different subsets of peptides selected in RFE using the four T/I graphs is shown in Fig.
A. Venn diagram of the overlap of peptide sequences selected by recursive feature elimination as a minimal set of peptide reactivities which optimally separates the 4 diagnoses; B. Comparison of the quality of separation based on the different graphs. The separation of the cases of each pair of diagnoses was estimated using the clustering criterion, and the six values from these comparisons were used to further compare the different feature sets. IgM based immunosignature patterns were more efficient. For IgG based patterns, the topology seemed to have a greater effect (opposite to that in the IgM assays), but it did not reach statistical significance. (C) and (D) multidimensional scaling projections of the different patients’ sera profiles with the optimal feature sets (peptide sequences) for linear_IgM (C – best separation) and linear_IgG (D – worst separation).
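The general idea of recursive feature elimination driven by a separation criterion can be illustrated with a minimal Python sketch. It is a generic greedy backward elimination, not the authors' implementation. The `separation` function (mean between-group distance divided by mean within-group distance) is a stand-in chosen for this example, since the clustering criterion used in the study is not specified in this section.

```python
from itertools import combinations
from math import dist


def separation(profiles, labels, features):
    """Stand-in criterion: mean between-group / mean within-group Euclidean distance."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = dist([profiles[i][f] for f in features],
                 [profiles[j][f] for f in features])
        (within if labels[i] == labels[j] else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within) + 1e-12)


def rfe(profiles, labels, criterion=separation):
    """Greedy backward elimination; returns the best feature subset seen and its score.

    profiles: list of samples, each a list of feature values (peptide reactivities).
    labels: group label (diagnosis) of each sample.
    """
    features = list(range(len(profiles[0])))
    best_set, best_score = features[:], criterion(profiles, labels, features)
    while len(features) > 1:
        # Drop the feature whose removal leaves the highest criterion value.
        score, drop = max(
            (criterion(profiles, labels, [f for f in features if f != g]), g)
            for g in features
        )
        features.remove(drop)
        if score >= best_score:  # prefer smaller sets on ties
            best_set, best_score = features[:], score
    return best_set, best_score
```

The maximal criterion value reached by the optimal feature set is what allows conditions assayed on the same peptides to be compared.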
The IgG and the cyclic peptide conditions led to selecting larger subsets of peptides (cyclic_IgG – 42, linear_IgG – 38, cyclic_IgM – 29, and linear_IgM – 28 sequences). This correlated inversely with the quality of the separation (Fig.
These findings indicate that IgM assays on linear peptides differentiate better the diagnoses especially with respect to the IgG conditions (Fig.
The present study explores the effects of peptide probe cyclisation on the performance of repertoire level binding assays. The primary tools of our functional repertoire studies are reactivity graphs. They are based on the concept of cross-reactivity of pairs of probes. The probability that each of n repertoires contains two different sets of antibodies which exhibit the same level of reactivity to the two probes is inversely proportional to n and very small. Thus, if two peptides’ reactivities with a set of n repertoires correlate, they are most probably recognized by largely overlapping sets of antibodies in each repertoire, i.e. they are cross-reactive (or isospecific). Using reactivity graphs, a tendency for a higher cross-reactivity of the IgM repertoire was found on the linear probes as compared to the cyclic ones (p = 0.11), as well as a significant increase as compared to IgG on linear probes. An intriguing finding is the opposite effect of cyclisation on the cross-reactivity of IgG and IgM antibodies.
It was tempting to interpret these findings in terms of diversity of the tested repertoires since the simulation (Suppl. material
Previously, cyclic peptide mimotopes have been found to bind with higher affinity even compared to the nominal antigen (
A major difference between the two isotypes is the valency of the antibodies. Under the conditions of the peptide array, IgM antibodies can, and IgG mostly cannot, bind the peptide molecules in a multivalent manner. As a rule, IgG antibodies have higher intrinsic affinity to their nominal epitopes than IgM of similar specificity, but IgM compensates by avidity (multiple binding sites). In the case of igome mimotope arrays (
The slight increase in cross‑reactivity in linear IgM vs cyclic IgM may be due to the higher flexibility of most of the IgM paratopes (
Thus, with regard to repertoire immunosignature assays, interrogating the IgM repertoire with linear probes seems to have some limited advantage over probing IgG and over the use of cyclic peptides. This conclusion is limited by circumstances in which the IgG repertoire is of particular interest as predominantly immune memory associated and pathogen selected. The advantage of using IgM was also confirmed by the efficiency of a feature selection algorithm. The superiority of IgM assays may be due to a difference between the IgG and the IgM repertoires with respect to the specificities which differentiate the neurodegenerative diseases. For an in-depth analysis, it would be better to test two sets of probes, selected using the IgG and the IgM disease specific repertoires, respectively.
These results indicate that linear peptide based immunosignature probes provide more information and a more efficient extraction of features for a subsequent machine learning based design of biomarkers than their cyclic version, especially in terms of testing the IgM repertoire.
Supplementary method
Data type: docx
Explanation note: Description of the simulation of repertoire binding and reactivity graph.