1. INTRODUCTION
Proteins are essential biological macromolecules that support the growth, reproduction, and physiological activities of living organisms [1]. Through conformational changes, proteins perform a range of functions that precisely control biological processes; therefore, a single static structure cannot explain their molecular mechanisms. Protein function is inherently associated with protein structure, and distinct conformational states are required for specific functions [2, 3]. Accordingly, protein binding sites interact with different partners through distinct conformations, and functional sites differ between active and inactive states. Proteins typically undergo conformational changes in response to signals such as ligand binding, chemical modifications, or environmental changes. Such “switchable” proteins have crucial biological functions and are valuable in engineering applications. Through the identification of functional mechanisms, some of these proteins have been effectively used as drug targets and as key elements in innovative biosensors. Studying the dynamic conformational changes in proteins is critical for accurately understanding their functional mechanisms, which may facilitate the design of novel therapeutic strategies and the development of highly sensitive and specific diagnostic tools [4, 5].
Experimental techniques such as X-ray crystallography and cryo-electron microscopy have substantially improved protein structure determination. However, these techniques capture only static snapshots of specific protein conformations, thus hindering the characterization of multiple conformational states and the comprehensive understanding of functional mechanisms. AlphaFold2/3, an artificial intelligence system developed by DeepMind, has recently been reported to predict the overall structures of proteins in the human proteome with high accuracy [6, 7]. However, although this technique overcomes the time-consuming nature of traditional methods, it predicts only one conformation for a given protein sequence. Other possible protein conformations have not been observed in previous predictions. Recently, one study has predicted multiple conformations of proteins via sequence clustering and AlphaFold2 (AF2) [8].
Notably, mutations in the amino acid (AA) sequences of many proteins are closely associated with the onset of various diseases. Accurate prediction of the conformational changes induced by these mutations is essential for understanding the mechanisms of mutation-related diseases, and may accelerate the development of small-molecule and therapeutic antibody drugs [9]. However, despite advancements in predicting protein conformations with AF2, substantial progress has not been made in predicting conformational changes due to disease-related mutations. Some protein mutations lead to considerable changes in protein structure, yet AF2 shows minimal differences between its pre- and post-mutation structure predictions [10]. Conversely, minor mutations at specific protein sites have been found to cause considerable and extensive shifts in AF2’s predicted structures [11]. Therefore, AF2 still faces limitations and inconsistencies in handling the dynamic conformational changes caused by mutations, and it cannot directly identify crucial sites associated with structural changes. Our goal was to develop a universal approach to identify residues whose mutation triggers substantial conformational changes according to AF2 predictions, and to explore their characteristics. We aimed to investigate whether a general method might be used to assess AF2’s robustness regarding mutations. Historically, adversarial learning techniques have been used to gauge the robustness of deep learning models. In this vein, we used adversarial attacks to identify residue mutations that substantially change AF2’s predicted structures. Discovering residues critical for protein conformational changes and accurately predicting mutated protein structures are highly valuable in structural biology and structure-based drug development.
Numerous adversarial attack techniques are currently used, primarily in the fields of computer vision [12–15] and natural language processing (NLP) [16–18]. Because protein sequences are discrete, many gradient-based adversarial attack methods [19, 20], although powerful, might not be well suited to them. AF2’s “recycling” method further complicates gradient computation. Consequently, a gradient-free evolutionary attack might offer a more streamlined and powerful solution. Here, we present an effective method called AF2-Mutation for producing adversarial mutation sequences for wild-type (WT) proteins (Figure 1). We used the differential evolution algorithm to identify replacement, deletion, and insertion mutations that influence the predicted structure. For proteins, replacing AAs with functionally related alternatives is similar to substituting characters with common misspellings or replacing words with semantically related words in NLP. Inserting and deleting AAs in protein sequences is analogous to inserting or deleting letters or words in NLP, where such operations do not affect readability after manipulation [21, 22]. Strategies applied in NLP usually require ensuring minimal semantic change in sentences before and after the attack, thus maintaining readability [21, 23]. This process often entails the introduction of constraints, such as ensuring that the replaced letters closely resemble the original letters, and that the substituted words have the same part of speech as the original words. Such linguistic constraints do not carry over directly to protein sequences. Instead, we limited the number of AA changes to three and evaluated structural similarity with the local Distance Difference Test (lDDT) score [24].
The lDDT is a metric used to assess the local accuracy of predicted protein structures by comparing interatomic distances with those in a reference structure, with scores expressed as percentages (%), where higher values indicate greater structural similarity. In this study, we analyze changes in lDDT as variations in score points, omitting the percentage symbol (%) for simplicity, and refer to these variations as units. Because our goal was achieving substantial conformational changes in the protein structure before and after the attack, we configured the differential evolution’s objective function to be the difference in lDDT with respect to the WT, both before and after the attack. We finally used the best obtained solution to produce the mutated protein sequence.
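To make the metric concrete, the following is a minimal sketch of a simplified, Cα-only lDDT. It is not the scoring pipeline used in this study: the reference lDDT is computed over all atoms and includes stereochemical checks. The function name `lddt_ca` is hypothetical; the 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerance thresholds are the usual lDDT defaults and are assumed here.

```python
import numpy as np

def lddt_ca(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT in score points (0-100).

    ref, model: (N, 3) arrays of matched C-alpha coordinates.
    Assumption: residues are already paired one-to-one between the structures.
    """
    ref = np.asarray(ref, dtype=float)
    model = np.asarray(model, dtype=float)
    # All pairwise distances within each structure.
    d_ref = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    d_mod = np.linalg.norm(model[:, None, :] - model[None, :, :], axis=-1)
    # Score only pairs of distinct residues that are close in the reference.
    mask = (d_ref < cutoff) & ~np.eye(len(ref), dtype=bool)
    if not mask.any():
        return 0.0
    diff = np.abs(d_ref - d_mod)[mask]
    # Fraction of distances preserved, averaged over the tolerance thresholds.
    preserved = np.mean([(diff < t).mean() for t in thresholds])
    return 100.0 * float(preserved)
```

Because only internal distances are compared, the score needs no superposition: translating or rotating the model leaves it unchanged. A change in lDDT between a WT prediction and a mutant prediction, each scored against the same reference, is what the text calls a difference in units.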

Figure 1. Method overview. The differential evolution algorithm was used to derive adversarial sequences for wild-type (WT) proteins, with the aim of maximizing the lDDT divergence between the predicted structure of the original sequence and that of its mutant.
We conducted extensive tests on the effects of our adversarial attack by using the CASP14 dataset. Remarkably, altering only three AAs led to a considerable shift in the lDDT score, with a difference as high as 36.78 units. Moreover, when we used a mixed attack strategy combining the replacement, deletion, and insertion operations, the difference in lDDT further increased, to 46.61 units.
To understand the real-world implications of these changes, we performed a visual comparison of the protein structures before and after mutation. Our observations revealed substantial structural alterations, thus confirming the efficacy of our adversarial methods in inducing meaningful conformational changes in the protein.
To further demonstrate the credibility of our approach, we conducted a focused case study on the SPNS2 protein, which has crucial roles in immune regulation, vascular development, and endothelial barrier function. Accurate prediction of SPNS2’s conformational changes through computational methods could potentially guide experimental study phases, thus decreasing the resources required for protein structure determination and aiding in the development of targeted therapies. In this case, our strategy successfully identified key AAs in SPNS2 that effectively alter the protein’s structural conformation.
In summary, the primary contributions of this study are as follows:
We propose what is, to our knowledge, the first universally applicable adversarial attack method targeting AF2 for all proteins.
We validated the efficiency of this framework on CASP14 targets, demonstrating that adversarial samples effectively misled the AF2 model through missense mutations.
Through a case study on SPNS2, by comparing the protein structural conformational changes before and after mutation, we demonstrated that our method can predict mutations altering protein structural conformations.
2. METHODS AND MATERIALS
2.1 Problem definition
Consider a protein sequence of length L, denoted as $X = \{r_1, \dots, r_i, \dots, r_L\}$, where each $r_i$, $i = 1, \dots, L$, indicates the i-th residue of the sequence. Let Y and Ŷ denote the sequence’s native and predicted structures, respectively. A pre-existing model $F: X \rightarrow Y$, such as AF2, predicts the structure of the protein sequence. The accuracy of the predicted structure is measured by comparison with the native structure via the lDDT metric. The aim of adversarial attacks is to craft an adversarial example, $X_{mut}$, that minimizes the objective function (Eq. 1):
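As a sketch only, one form that Eq. 1 could take, consistent with the verbal description in this paper, is given below. The signed lDDT difference, the edit-count function $d$, and the budget constraint are assumptions made for illustration, not a formula confirmed by the text:

```latex
\begin{equation*}
X_{mut}^{*} \;=\; \arg\min_{X_{mut}} \Big[ \mathrm{lDDT}\big(F(X_{mut}),\, Y\big) \;-\; \mathrm{lDDT}\big(F(X),\, Y\big) \Big]
\quad \text{s.t.} \quad d(X, X_{mut}) \le b
\end{equation*}
```

Here $d(X, X_{mut})$ counts the replacement, insertion, and deletion operations separating the mutant from the WT sequence, and $b$ is the mutation budget (three in this study). Under this reading, a more negative value corresponds to a larger drop in lDDT after the attack.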
Our approach to examining AF2’s resilience relied on the proposition that an adversarial example can yield a structure markedly divergent from the original. A comprehensive depiction of our strategy is provided in Figure 2.
2.2 The AF2-Mutation algorithm
This section details our approach to defining the attack method as an optimization challenge. Our objective is to select the best AA positions for enhancing the lDDT difference between the adversarial and original structures. We set up this optimization problem and offer an optimal solution vector.
Given the vast search space associated with this problem, both exhaustive and random search methods fall short. An exhaustive search is impractical, whereas a random search is not guaranteed to reach even a local optimum. Therefore, we use the differential evolution method, a gradient-free, population-based evolutionary optimization technique that searches for a high-quality solution without requiring the gradient with respect to the AA embedding.
Our search space construction incorporates three distinct attack strategies against AF2, as shown in Figure 2(a), left:
Replacement: exchange AAs in the protein sequence with any of the other 19 AAs.
Deletion: omit a residue from the protein sequence.
Insertion: introduce any of the 20 AAs into the protein sequence.
To attack a sequence, we combine these three tactics and express the mutations through a solution vector of size N = 3b, where b is the budget, i.e., the maximum number of AA mutations. The factor 3 is the length of the mutation vector for each mutated position, as shown in Figure 2(a), right: its three components indicate the mutation method (ind), the mutation’s location (the AA index in the WT sequence), and the AA designated for substitution or insertion.
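The encoding above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the integer codes for the mutation method, the 0-based positions, the ordering of `AMINO_ACIDS`, and the function name `apply_solution` are all assumptions. Edits are applied from the highest position downward so that an insertion or deletion does not shift the WT indices of the remaining edits.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard AAs
REPLACE, DELETE, INSERT = 0, 1, 2     # hypothetical encoding of "ind"

def apply_solution(wt, solution):
    """Apply a flat solution vector of length 3*b to a wild-type sequence.

    Each consecutive triple is (method, position, aa_index), where position is
    a 0-based index into the WT sequence. The aa_index is ignored for deletions.
    """
    if len(solution) % 3 != 0:
        raise ValueError("solution length must be a multiple of 3")
    triples = [tuple(solution[i:i + 3]) for i in range(0, len(solution), 3)]
    seq = list(wt)
    # Highest position first, so earlier edits never shift later WT indices.
    for method, pos, aa in sorted(triples, key=lambda t: t[1], reverse=True):
        if method == REPLACE:
            seq[pos] = AMINO_ACIDS[aa]
        elif method == DELETE:
            del seq[pos]
        elif method == INSERT:
            seq.insert(pos, AMINO_ACIDS[aa])
        else:
            raise ValueError(f"unknown mutation method: {method}")
    return "".join(seq)

# One replacement (K -> R at index 1), one deletion (index 4), and one
# insertion (G before index 6) with a budget of b = 3.
mutant = apply_solution("MKTAYIAK", [0, 1, 14, 1, 4, 0, 2, 6, 5])
```

A real search would additionally need to prevent a replacement from selecting the residue already present, and to decide how two edits at the same position are ordered; both are left out of this sketch.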
Our method uses inputs of the protein sequence X, evolutionary iteration count m, population size s, crossover probability CR, and differential weight W. These parameters are standard in differential evolution [25]. At its core, this method initializes a candidate solution set and performs continual refinement until an optimal or near-optimal solution emerges.
Figure 2(b) shows the creation of the initial set of candidate solution vectors $P = \{p_{0,G}, \dots, p_{i,G}, \dots, p_{s,G}\}$ in step 1. Here, each $p_{i,G}$, $i = 0, \dots, s$, is chosen at random, and G represents the generation index. Subsequently, iterative refinement continues until either the optimum solution is found or m iterations are completed.
With every iteration, new solution vectors arise from the differential evolution method’s mutation, crossover, and selection processes. Figure 2(b), step 2, illustrates the mutation operator, which generates new vectors as follows:
$$v_{i,G+1} = p_{a,G} + W \, (p_{b,G} - p_{c,G}),$$
where $v_{i,G+1}$ is the mutant solution vector, W is the differential weight, and $p_{a,G}$, $p_{b,G}$, and $p_{c,G}$ are three distinct vectors chosen randomly from the preceding generation, each of which is 3b in length.
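Step 3 in Figure 2(b) highlights the crossover operator, which promotes diversity among solution vectors. After mutation in each cycle, a descendant vector $u_{i,G+1}$ is formed component by component:
$$u_{k,i,G+1} = \begin{cases} v_{k,i,G+1}, & \text{if } r_i \le CR \text{ or } k = R, \\ p_{k,i,G}, & \text{otherwise,} \end{cases}$$
where $k \in \{1, \dots, 3b\}$ indexes the k-th component of a vector, $r_i$ is a randomly drawn value from a uniform distribution, CR is the crossover rate, and R is an integer selected at random from 1 to 3b.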
Figure 2(b), step 4, delineates the selection procedure for the next generation. The newly formed child solution vector $u_{i,G+1}$ is compared with its parent vector $p_{i,G}$. If the child vector achieves a better objective function value, it replaces its parent; otherwise, the parent is retained.
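Steps 1 to 4 can be sketched as a generic differential evolution loop in the standard DE/rand/1/bin form. This is an illustration under stated assumptions rather than the authors’ code: the objective is any black-box function to be minimized (in this study it would wrap an AF2 prediction and an lDDT comparison, which is not reproduced here), candidates are real vectors in the unit cube that would still have to be decoded into discrete mutation triples, and the default values `CR=0.5` and `W=0.8` are placeholders, since the text does not state them.

```python
import numpy as np

def differential_evolution(objective, dim, s=32, m=4, CR=0.5, W=0.8, seed=0):
    """Minimize a black-box `objective` over [0, 1]^dim with DE/rand/1/bin.

    s: population size (needs s >= 4), m: number of generations,
    CR: crossover probability, W: differential weight.
    """
    rng = np.random.default_rng(seed)
    pop = rng.random((s, dim))                      # step 1: random initial set
    fit = np.array([objective(p) for p in pop])
    for _ in range(m):
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(s):
            # Step 2: mutation from three distinct members other than i.
            others = [j for j in range(s) if j != i]
            a, b, c = rng.choice(others, 3, replace=False)
            v = np.clip(pop[a] + W * (pop[b] - pop[c]), 0.0, 1.0)
            # Step 3: binomial crossover; component R always comes from v.
            R = rng.integers(dim)
            take_v = rng.random(dim) <= CR
            take_v[R] = True
            u = np.where(take_v, v, pop[i])
            # Step 4: greedy selection between child and parent.
            fu = objective(u)
            if fu < fit[i]:
                new_pop[i], new_fit[i] = u, fu
        pop, fit = new_pop, new_fit
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])

def toy_objective(x):
    """Toy stand-in for the lDDT-based objective (lower is better)."""
    return float(np.sum((x - 0.3) ** 2))

best_x, best_f = differential_evolution(toy_objective, dim=9, m=30, seed=1)
```

Because selection is greedy, the best objective value never gets worse from one generation to the next, which is why even a small number of generations can improve on the initial random population.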
Finally, as detailed in the final step in Figure 2(b), we use the lDDT discrepancy between the predicted and original protein structures as our objective function (Eq. 1). The aim of differential evolution is to make this discrepancy as large as possible. When the algorithm converges or reaches the maximum iteration count, it returns the adversarial missense mutation sample together with its objective function value. Although AlphaFold3 has impressive performance in protein structure prediction, its closed-access nature and restriction to only 20 runs per day limit its feasibility for large-scale comparisons. In our differential evolution search process, which potentially requires thousands of predictions, this constraint makes a direct comparison of our model’s precision with AlphaFold3’s predictions impractical.
2.3 Evolutionary process and multiple sequence alignment approximation
Integrating co-evolutionary data from multiple sequence alignments (MSAs) is critical for AF2’s precision in predicting protein structures. Given the enormous volume of sequences, identifying homologs for an MSA in a sequence repository is an expensive operation. Consequently, regenerating the MSA for each mutated sequence during the evolutionary procedure is inefficient. Therefore, we devised an approximation technique that keeps the sequence length consistent between the MSA and the adversarial sample, and also limits the mutations’ effects on the MSA. As demonstrated in Figure 3, the proposed approximations fall into three distinct categories:

Figure 3. Schematic representation of the devised approximation technique, which eliminates the need for frequent MSA realignments.
Replacement (scenario 1): after an AA substitution, the homologs of the altered sequence remain intact, because the mutation does not disturb the alignment elsewhere.
Insertion (scenario 2): with the introduction of a new residue in the primary sequence, a space character is slotted at the analogous position across other sequences within the MSA, thereby maintaining the alignment’s integrity for subsequent sequences.
Deletion (scenario 3): erasing AAs from a protein sequence prompts the deletion of matching positions in the associated sequences within the MSA.
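The three scenarios can be sketched as a small function that edits an existing alignment instead of rebuilding it. This is a simplified illustration, not the authors’ code: it assumes the query is the first row and contains no gap columns (so a residue index equals a column index), and it writes the inserted placeholder as `-`, whereas the text describes it as a space character. The function name `approximate_msa` is hypothetical.

```python
def approximate_msa(msa, method, pos, aa=None, gap="-"):
    """Return an MSA adjusted for one mutation in the query (row 0).

    msa: list of equal-length aligned strings, query first.
    method: "replace", "insert", or "delete"; pos: 0-based column index.
    """
    query, homologs = msa[0], list(msa[1:])
    if method == "replace":
        # Scenario 1: only the query changes; homolog rows stay as they are.
        return [query[:pos] + aa + query[pos + 1:]] + homologs
    if method == "insert":
        # Scenario 2: new residue in the query, placeholder column elsewhere.
        return [query[:pos] + aa + query[pos:]] + [
            h[:pos] + gap + h[pos:] for h in homologs
        ]
    if method == "delete":
        # Scenario 3: drop the matching column from every row.
        return [row[:pos] + row[pos + 1:] for row in msa]
    raise ValueError(f"unknown method: {method}")

msa = ["MKTA", "MRTA", "M-TA"]
replaced = approximate_msa(msa, "replace", 1, "R")
inserted = approximate_msa(msa, "insert", 2, "G")
deleted = approximate_msa(msa, "delete", 3)
```

In every case all rows keep the same length as the mutated query, which is the property the approximation is meant to preserve between full realignments.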
The mutation strategy selection, as shown in Eq. 2, involves replacement, insertion, and deletion, each chosen with equal probability. This balanced approach, in contrast to biological mutation probabilities, is aimed at supporting convergence and optimizing the objective within the search space. The equal distribution ensures a uniform contribution from each strategy, thereby fostering effective exploration and efficient convergence.
To ensure the highest accuracy in AF2 predictions, the MSA is realigned before the structure of the final adversarial sample is predicted.
3. RESULTS
3.1 AF2-Mutation performs well on CASP14
The objective of our study was to use the newly devised replacement and mixed attack methods targeting AF2’s protein structure prediction without access to the gradient of AA embedding. The CASP14 dataset served as the basis of our experiments, and the lDDT served as the primary measurement for gauging the similarity between pairs of protein structures. Every protein within the CASP14 dataset underwent adversarial testing in this research.
Our search for the optimal solution for every type of attack relied on the differential evolution method (Figure 2). The experimental parameters, including the population size and the evolutionary iteration count, were fixed at s = 32 and m = 4, respectively. Notably, increasing these values might have refined the solutions but would have simultaneously increased the computational demand.
To minimize the potential influence of the template on the final results, we used the third model from OpenFold’s AF2 version [26], which is distinguished by its template omission. Furthermore, we disabled random masks and dropout, to ensure that they did not affect the experimental outcomes.
We successfully generated adversarial sequences that, on average, shifted the lDDT by more than 4 units and 10 units when the replacement and mixed strategies, respectively, were used to modify three AAs in the WT sequences. These findings indicated extensive alterations in 3D spatial folding.
In Figure 4, the left and right panels show outcomes from the replacement and mixed attacks, respectively. Panel (a) shows the lDDT difference between the original and adversarial structures, before and after realignment. A comparison of AF2’s lDDT scores for the adversarial structure and the unmodified sequence confirmed the substantial power of our mixed attack algorithm. More than half of these sequences exhibited a divergence greater than 10 units (represented by dark maroon markers). Notably, the mixed strategy, which averaged a change of −10.76 units, consistently outperformed the replacement strategy, which averaged −4.4 units. Given its broader search space, the superior performance of the mixed approach was not unexpected. Interestingly, in two instances, the replacement attacks led to more accurate predictions than for the WT, a finding that warrants further exploration.

Figure 4. Outcomes for the replacement attack (a–c) and the mixed attack (d–f): (a and d) illustrate the lDDT difference between the original and adversarial structures, both before and after realignment; (b and e) show the association between plDDT and lDDT; and (c and f) compare our proposed replacement and mixed attacks with a random approach, showing the distribution for each.
Figure 4(b) shows the relationship between predicted lDDT (plDDT) and lDDT under both attack forms. Panel (c) compares our replacement and mixed strategies with a randomized approach, and additionally shows the outcome distribution for each. Our approach outperformed the random attack in both scenarios. Virtually every crafted adversarial sample surpassed its randomly generated counterpart. Specifically, the mixed strategy exhibited an average decrease of 12.33 units more than its random counterpart, whereas the replacement strategy outperformed the random strategy by an average of 7.03 units.
We conducted additional experiments examining mutations at one and two positions by applying the proposed method (replacement and mixed attacks). Our method achieved substantial lDDT decreases with two mutation positions. For sequences with mixed mutation positions, the lDDT decreased by up to 42.15 units (T1033-D1) after realignment, with an average decrease of 11.39 units across all tested sequences (Figure A1.a). For sequences with replacement mutation positions, the maximum lDDT decrease reached 40.45 units (T1082-D1), and the average decrease was 4.97 units (Figure A1.b). These experiments were evaluated on the CASP14 dataset, and detailed results are provided in the Appendix. With a single mutation position, our method also achieved extensive lDDT decreases. For sequences with a mixed mutation position, the lDDT decreased by up to 40.60 units (T1030-D2) after realignment, and the average decrease was 8.53 units across all tested sequences (Figure A2.a). For sequences with a replacement mutation position, the maximum lDDT decrease was 51.31 units (T1043-D1), and the average decrease was 4.51 units (Figure A2.b).
3.2 AF2-Mutation does not disrupt the relationship between plDDT and lDDT
The plDDT, a metric generated by AF2, is aimed at simulating lDDT without having access to the actual (ground-truth) structure and can be used to gauge the model’s confidence in its predictions. Consequently, sequence mutations influence the plDDT values because of the change in input.
Typically, a plDDT score above 70 indicates a reliable prediction. Within the dataset of 61 samples analyzed for CASP14, only five of the scores generated by AF2 were below this threshold. After a replacement attack, this number increased to seven, and four of these proteins already exhibited low plDDT scores before the attack. The incidence of low scores increased to 20 after a mixed attack, and 5 of these 20 proteins had low plDDT scores before the attack. Consequently, the reliability of predictions, particularly after a replacement attack, remained high.
Figure 4(b) indicates a notable positive association between plDDT and lDDT under the replacement (left) and mixed (right) attacks, with correlation coefficients of 0.66 and 0.76, respectively. However, a discernible difference in plDDTs was observed between techniques: for the replacement strategy (left), most plDDT changes were near zero, whereas those for the mixed strategy (right) spanned a broader range on the x-axis. These findings suggest that AF2 has higher confidence with AA substitutions than with mixed mutations, potentially as a result of its masked language model objective during the training and prediction stages.
3.3 AF2-Mutation outperforms the random attack baseline
Approximately $(L \times 19)^3$ possibilities exist for making three substitutions, where L represents the length of the sequence, thus underscoring the importance of an effective search algorithm, in this case, the differential evolution strategy. The efficacy of this chosen search algorithm was demonstrated through a comparative analysis with a baseline method known as random attack, wherein targeted positions and substituting residues are randomly determined. For a fair comparison, both the differential evolution approach and the random attack method used an identical number of iterations, far fewer than $(L \times 19)^3$.
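The scale of this search space can be checked with a short calculation. The sketch below compares the estimate quoted above with the exact number of three-substitution mutants at distinct positions. The sequence length of 100 is a hypothetical value chosen only for illustration, and the evaluation budget assumes one structure prediction per candidate for the initial population and for each generation, which the text does not state explicitly.

```python
from math import comb

def substitution_space(L, k=3, alternatives=19):
    """Return (exact, estimate) counts of k-substitution mutants of a length-L sequence.

    exact:    choose k distinct positions, then one of `alternatives` residues at each.
    estimate: the (L * alternatives) ** k figure, which counts ordered picks and
              allows the same position to be picked more than once.
    """
    exact = comb(L, k) * alternatives ** k
    estimate = (L * alternatives) ** k
    return exact, estimate

L = 100  # hypothetical sequence length, for illustration only
exact, estimate = substitution_space(L)

# Assumed evaluation budget: one prediction per candidate, for the initial
# population and for each of m generations (s = 32, m = 4 as in this study).
s, m = 32, 4
budget = s * (m + 1)
print(f"exact={exact:.3e}  estimate={estimate:.3e}  budget={budget}")
```

For a 100-residue protein both counts are on the order of 10^9, several orders of magnitude beyond a budget of a few hundred predictions, which is why the choice of search strategy matters.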
After multiple random attacks, the mutation yielding the lowest lDDT was taken as the baseline’s output. A superposition of the proposed mixed attack and its randomized counterpart is presented in Figure 4(c), which illustrates the effectiveness of our method over the random strategy. Most adversarial sequences created with our technique (replacement on the left and mixed on the right) surpassed their random equivalents. Specifically, the mixed strategy achieved an average decrease in lDDT that exceeded the decrease achieved by the random attack strategy by 12.33 units.
Under equivalent search times, our approach yielded superior results to the mixed and replacement random attacks, outperforming them by an average of 9.83 units and 4.38 units, respectively, across all tested sequences (Figure A3.a and Figure A3.b). These decreases highlight the efficacy of our method in maximizing structural deviation, which is crucial for accurate protein modeling. Additionally, our method achieved an average improvement of 6.66 units over mixed random attacks and 1.73 units over replacement random attacks (Figure A3.c and Figure A3.d), thus demonstrating its robustness in inducing structural misalignment, even with minimal mutations.
3.4 AF2-Mutation re-aligns only after differential evolution
Earlier discussions suggested the importance of leveraging newly aligned MSAs to enhance AF2’s predictive precision. This section underscores the essential nature of such pre-evaluation alignment.
The construction of MSAs is facilitated by the JackHMMER algorithm [27], which searches a collection of sequences within MGnify [28] and UniRef90 [29]. In parallel, HHBlits [30] conducts searches within UniClust30 [31] and the extensive Big Fantastic Database (BFD). After these search operations, the candidate sequences are aligned with the reference sequence. Considering the extensive nature of MSA alignment creation, the process is restricted to the preliminary forecast of the WT sequence and the terminal assessment, thereby ensuring the attainment of a precise protein fold.
To simplify the attack mechanism, we introduced an approximation technique that emulates new MSAs after mutation, thereby avoiding an exhaustive search of massive sequence databases. Whereas altering a single AA might not be expected to markedly alter MSAs, approximation inaccuracy can compound with each mutation and culminate in significant differences. In addition, given the substantial time required to rescan the entire dataset for each mutation candidate to locate the MSA within the context of search algorithms, we elected not to perform reMSA during the search for mutations. Instead, we conducted reMSA after the final mutation candidates were identified at the conclusion of the search, thus ensuring the accuracy of the final results.
Figure 4(a) shows the lDDT variance between estimated and realigned MSAs for both replacement (left) and mixed (right) strategies. For the mixed attack, a pronounced difference emerged between the predicted structures for the estimated MSA (light maroon indicators, with −33.28 units) and the realigned versions (darker maroon indicators, with −10.76 units). This finding underscores the cumulative nature of approximation errors over iterations and the need for MSA realignment to achieve reliable evaluations. Analogous outcomes arise for the replacement attacks, with −8.66 units for estimated MSAs and −4.42 units for their realigned counterparts.
3.4.1 Effective implementation of AF2-Mutation in a case study of SPNS2
From our assessment, we selected the two most powerful targets from both the replacement and mixed attack contexts (detailed visual outcomes in Figure 5). In the mixed attack context in Figure 5(a) and (b), the primary chains in both targets exhibited significant deflections or rotations. The replacement attack context in Figure 5(c) and (d) also showed structural shifts.

Figure 5. Superposition of native structures with AF2’s predictions, encompassing both the initial prediction and the post-attack prediction.
To detail the changes in prediction throughout the evolutionary process, we analyzed a four-generation evolution of T1068, a target with high AF2 prediction accuracy (an lDDT of 91.25 units). Across the initial three generations, the lDDT decreased by 15, 17, and 19 units, respectively, and the final generation showed a steeper decrease of 56.76 units. Even after realignment, a difference of 34.69 units persisted. As shown in Figure 6, each generation induced pronounced conformational shifts, thus underscoring the power of the proposed algorithm and the approximated MSA technique.
To conduct a deeper investigation, we used SPNS2, a member of the major facilitator superfamily (MFS), to validate the biologically significant positions identified by the algorithm. The MFS, the largest superfamily of secondary active transporters, comprises numerous membrane proteins that mediate the transmembrane transport of small molecules across both plasma membranes and inter-organellar spaces. These transporters are responsible for transporting a diverse range of substrates, including ions, sugars, AAs, and other small molecules. SPNS2 is particularly recognized for its role in transporting sphingosine-1-phosphate (S1P), a signaling lipid crucial in various physiological processes, such as vascular development, immune cell trafficking, and the maintenance of endothelial barrier function. SPNS2 has attracted research interest for its involvement in the immune system, by providing the necessary S1P gradient for normal migration of T cells and B cells.
First, through the described strategy, we identified three predicted perturbation points, corresponding to residues D137, A140, and E497 of the primary sequence. Subsequently, we obtained the predicted structure model of SPNS2 carrying the three mutations (D137K, A140R, and E497S). The predicted model of mutated SPNS2 showed excellent agreement with the experimental electron density map of mutated SPNS2 (Figure 7(a)). Additionally, the gap between the extracellular ends of TM1 and TM7 measures approximately 9.3 Å, thus suggesting an outward-open conformation, as depicted in Figure 7(b). This outward-open conformation resembles a recently reported SPNS2 outward-open structure [32]. For comparison, the reported inward-open structure of SPNS2 fitted to its experimental electron density map is shown in Figure 7(c) [33]. The distance between the extracellular termini of TM1 and TM7 in the inward-open structure is approximately 5.9 Å (Figure 7(d)), thus indicating that its central cavity remains sealed from the extracellular side. In contrast to the determined inward-open structure, the predicted mutated structure indicates an intracellular closed conformation and a newly formed extracellular opening. Prior studies have indicated a salt bridge between D137 and R342 on SPNS2’s extracellular side, potentially anchoring the inward-open conformation [34]. Furthermore, a potential interaction between E497 and R342 might stabilize the extracellular-closed state of this conformation. The D137K mutation might perturb the salt bridge between D137 and R342. Together, the D137K, A140R, and E497S substitutions might disrupt the interactions that stabilize the inward-open conformation and thereby facilitate formation of the outward-open conformation. Accurate prediction of SPNS2’s varied conformations can enrich understanding of its functional mechanisms.

Figure 7. Conformational change in SPNS2’s predicted structure after attack. (a) Predicted structure model of the mutated SPNS2 (D137K-A140R-E497S) in an outward-open conformation, fitted to this mutated protein’s low-resolution cryoEM map. (b) Comparison of the native SPNS2 outward-open structure 8EX5 with the predicted structure model of SPNS2 (D137K-A140R-E497S). (c) Inward-open structure and map. (d) Features of the SPNS2 inward-open structure 8JHQ.
4. CONCLUSION
In recent years, substantial progress has been made in protein structure prediction. In this study, we investigated black-box, gradient-free adversarial attacks on the recently proposed AF2 protein structure prediction model. Our approach successfully generated missense mutation adversarial samples, as evidenced by our extensive experimental results. The predicted structures of certain protein sequences were dramatically altered by modification of only three residues, thus leading to a sharp decrease in lDDT. This finding highlights the vulnerability of AF2 to attacks involving as few as three residues. In addition, our proposed method was able to predict protein conformational changes and identify critical locations for these changes. This method was successfully applied to SPNS2 and could be extended to other proteins. Identifying how specific mutations affect protein conformations would advance understanding of the disease mechanisms associated with protein dysfunction and aid in the development of strategies to address these issues. Our approach might provide a valuable reference and guidance for experimental study phases, thus potentially decreasing the time, effort, and resources required for protein structure determination.