      Is Open Access

      Developing Project-Specific Consent Documents: A Registered Report for a Multistep Approach Using LLMs

      Published
      brief-report

            Abstract

            Within the scope of clinical trials, developing participant information sheets and informed consent forms is a complex task that demands clarity, precision, and compliance with regulatory standards. Developing these documents is crucial for ensuring that participants are fully informed about the research in which they are involved. However, the process is often time-consuming and resource-intensive. In this context, we present the development of a methodology enabling the use of Large Language Models to assist in the creation of information sheets and informed consent forms for clinical trials according to a predesigned template. This research is being conducted within the framework of the project REPO4EU (Precision drug REPurpOsing For EUrope and the world).

            Main article text

            INTRODUCTION

            As one of the most critical ethical constructs, informed consent is a prerequisite for all research involving human subjects: it respects autonomy, mitigates potential harm, and upholds research integrity. 1 Today, clinical trials often involve intricate procedures and novel therapies, and the accompanying consent documents often prioritize legal compliance over participant understanding. Complex informed consent forms can hinder participant comprehension, leading individuals to agree to participate without fully grasping the implications, which undermines the trial’s ethical validity and raises concerns about how consent is currently obtained. 2–5 Therefore, to achieve truly informed consent, researchers must provide clear and accessible information tailored to participants’ understanding. Patient information leaflets and informed consent forms (ICFs), which are essential components of the informed consent process, must be adequate to guarantee that participants can comprehend the information and make free decisions about their involvement.

            To address this challenge, researchers are exploring innovative ways to improve the clarity and accessibility of informed consent materials. Large Language Models (LLMs), with their ability to process and generate human-like text, show promise in creating more effective informed consent documents. 6 This study aims to answer the question: “How can we utilize LLMs to assist in developing participant information sheets (PISs) and ICFs for clinical trials?”

            LLMs can simplify medical terminology, tailor information to individual needs, and identify potential ambiguities, ultimately promoting greater participant understanding. Results from recent studies indicate that LLMs can effectively rephrase complex medical jargon into more accessible language, enhancing readability for patients and research participants, particularly those with lower health literacy. 7,8 Additionally, LLMs can identify and address potential ambiguities, ensuring clarity and reducing the risk of misunderstanding. Finally, they can automate the analysis of existing ICFs, identifying areas for improvement and ensuring compliance with ethical and regulatory standards. 9–11

            Furthermore, recent research demonstrates that digital tools improve participants’ and patients’ understanding of, and satisfaction with, the informed consent (IC) process. Digital tools, particularly interactive multimedia tools, may help develop more personalized IC processes tailored to an individual’s socio-cultural characteristics. 12 A systematic review 13 showed that, compared with patients using paper-based consenting, patients using eConsent better understood clinical trial information, showed greater engagement with the content, and rated the consenting process as more acceptable and usable.

            Although these tools increase participants’ comprehension and decision-making capacity, they are, unlike LLMs, time- and resource-consuming. 14 Our work is innovative because it proposes an automated solution to generate PISs and ICFs, potentially saving time and resources while maintaining high quality and ethics compliance. The regulatory challenges of using eConsent would also apply to LLM-assisted strategies for preparing consent documents, and would likely be even more complex. 15 Despite efforts to harmonize the rules on data protection and clinical trials in the EU, the legal acceptance of eConsent differs significantly among the Member States. LLMs may encounter even more hurdles regarding clinical trial regulations and the one-size-fits-all requirements for PISs and ICFs. These models also raise concerns about misinformation, biases in the training data, and the potential for discrimination (such as ageism) in their application. Caution is necessary because of their inherent imprecision and propensity for disseminating misinformation. 16

            This study will investigate the potential of LLMs to assist in creating and analyzing PISs and ICFs within the REPO4EU project (Precision drug REPurpOsing For EUrope and the world). We will use REPO4EU clinical trials to test the feasibility and effectiveness of LLMs for generating and analyzing informed consent documents.

            METHODOLOGY

            Participant Information Sheets and Informed Consent Forms
            Approach

            Standardized tools for informed consent ensure transparency and enable autonomous participant decision-making. These templates aim to ensure clear and consistent communication with potential research participants, providing them with all the necessary information to make informed decisions about their involvement. This task aimed to develop PIS and ICF templates specific to the REPO4EU project. To prepare information sheets and informed consent templates for one of the trials under the REPO4EU project (REPO-HYPER II), we followed a multistep methodological approach and conducted two pilot exercises:

            1. Bibliographic Review: We comprehensively searched scientific literature to identify the essential elements of PISs for clinical trials. This search included peer-reviewed articles and relevant guidelines. Additionally, “gray literature” sources, such as reports and informed consent templates from regulatory bodies and other institutions, were explored.

            2. Checklist Development: The bibliographic review informed the creation of a preliminary checklist table to capture the key information required for both documents. We then validated this initial checklist by analyzing 15 PISs and ICFs from ongoing clinical trials registered in the EU Clinical Trials database. The analysis focused on how these trials met the information needs of potential participants. The initial checklist was subsequently modified based on the analysis of existing clinical trials.

            3. Template Design: After ensuring the text included information on all the previously checked requirements, we considered the recommendations outlined in Coleman et al. 17 to improve readability and, consequently, enhance the participants’ decision-making capacity. We used the PIS and ICF templates from the Swedish regulatory agency responsible for approving the REPO4EU trial as the main template. We modified this template using other templates in the corpus of documents used in this study and the requirements from our checklist.

            We performed two pilot exercises to prepare PIS and consent templates specific to REPO4EU. These exercises included the PIS and ICF templates for the REPO4EU HYPER II trial. The templates provided in this report are provisional and require further refinement in close collaboration with the relevant partners and in accordance with any specific requirements of local or national Ethics Committees or Institutional Review Boards. However, manually creating PISs and ICFs that meet these requirements can be time-consuming 14 and prone to errors, prompting the exploration of automated solutions based on LLMs.

            We amended the templates following the recommendations outlined in the study by Coleman et al., 17 summarized in Table 1.

            Table 1.

            Recommendations for Participant Information/Informed Consent Form

            Category      Recommendation
            Layout        Use leaflet format
                          Use line spacing (1.2–1.5)
                          If appropriate to support the main message, use simple images or illustrations
                          Use text boxes if highlight is needed
                          Align text left
            Formatting    Type size 12
                          Use a sans serif font (e.g., Arial, Verdana, Tahoma)
                          “All Capitals” should be avoided
                          Avoid underlining
                          Avoid italics
                          Use clear headings
                          Use clear contrast between text and background
            Language      Avoid long sentences
                          Use short paragraphs
                          Use of questions in headings
                          Use bullet points or numbered lists instead of long sentences
                          Minimize technical language or jargon
                          Specify numbers, avoiding the use of words like “multiple”
                          Use words for numbers 0–9; for 10+ use the digit
                          Use whole numbers (avoiding percentage) for risk or benefits
                          Avoid subordinate clauses

            The table is adapted from the study by Coleman et al. 17
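Several of the Language and Formatting recommendations in Table 1 can be checked mechanically on a draft. The following is a minimal sketch of such a check, not part of the method described in this report. The function name `check_style`, the 20-word sentence limit, and the list of vague quantity words are all illustrative assumptions.

```python
import re

# Illustrative thresholds and word lists; none of these values come from the article.
MAX_WORDS_PER_SENTENCE = 20
VAGUE_QUANTITY_WORDS = {"multiple", "several", "many", "various"}


def check_style(text):
    """Return a list of human-readable warnings for one block of text."""
    warnings = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        words = re.findall(r"[A-Za-z0-9'%-]+", sentence)
        if len(words) > MAX_WORDS_PER_SENTENCE:
            warnings.append(f"long sentence ({len(words)} words): {sentence[:40]}...")
        for word in words:
            # Flag fully capitalised words of four or more letters (short acronyms pass).
            if word.isalpha() and word.isupper() and len(word) >= 4:
                warnings.append(f"all capitals: {word}")
            if word.lower() in VAGUE_QUANTITY_WORDS:
                warnings.append(f"vague quantity word: {word}")
            if word.endswith("%"):
                warnings.append(f"percentage used: {word}")
    return warnings
```

A check like this only flags candidate problems for a human editor. It cannot judge whether a sentence is actually clear to a participant.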

            Large Language Models

            As explained in the previous section, drafting PISs and ICFs for clinical trials is complex and time-consuming, demanding meticulous attention to detail. LLMs can potentially solve this by automating tedious and error-prone tasks. We aim to leverage LLMs to create a tool that automatically generates PISs and ICFs for various clinical trials, adhering to the templates and checklists designed and described in the previous section.

            LLMs are deep learning models trained on large quantities of data to learn and generate text that mimics natural language, as well as other forms of content. For text, during training, the models learn to predict the next word of a sentence by considering the preceding words and assigning a probability to each candidate word. Once trained, LLMs can generate text by predicting a sequence of words from the input given as context and the knowledge acquired during training. 18 A well-known framework that facilitates the application of LLMs is LangChain. 19,20 Existing work shows LLMs’ excellent performance in a plethora of tasks, 21 including assisting in patient communication and simplifying documentation tasks. 16
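The idea of predicting the next word from the preceding words can be illustrated with a toy bigram model. This is a deliberately simplified sketch: real LLMs use neural networks conditioned on long contexts rather than word-pair counts, and the three-sentence corpus below is invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, how often each following word appears."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follower counts for `word` into a probability distribution."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


# Invented example sentences, not taken from any real consent form.
corpus = [
    "you may withdraw your consent",
    "you may ask questions",
    "you can withdraw at any time",
]
model = train_bigram(corpus)
probs = next_word_probabilities(model, "may")
```

Here `probs` holds the estimated probability of each word that followed "may" in the toy corpus. Generation would repeatedly pick a next word from such a distribution.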

            Because they are trained with general information from multiple sources and therefore lack domain-specific knowledge, 22 LLMs alone can be inconsistent and inaccurate for specific applications. A Retrieval Augmented Generation (RAG) framework 23 addresses this limitation by retrieving relevant documents to provide context for the LLMs. RAG systems consist of two major components: the retriever, which gathers relevant documents or other information based on the provided input, and the generator, which produces suitable answers according to the retrieved information. Studies have shown that retrieval augmentation effectively helps LLMs to surpass knowledge boundaries when supplementary context is required and enhances LLMs’ capacity to answer questions. 24 By incorporating questions from a checklist, the RAG framework identifies relevant documents, which the generator can use to produce a more reliable and accurate final output.
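The retriever and generator roles can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system proposed here: the bag-of-words `embed` function stands in for a learned embedding model, `llm` is any hypothetical callable that maps a prompt string to a text answer, and all function names are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Retriever: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def answer(question, documents, llm, k=2):
    """Generator step: pass the question plus retrieved context to an LLM callable."""
    context = "\n".join(retrieve(question, documents, k))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```

In practice the retriever would query a vector store of regulatory documents and `llm` would call an actual model. The separation of the two functions mirrors the two components that are evaluated separately.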

            The RAG system forms the second major component of our methodology. It significantly enhances the accuracy and compliance of generated documents. This system maintains a knowledge base of regulatory documents using vector storage and embeddings, allowing for real-time verification of compliance requirements. The RAG implementation ensures that all generated content aligns with current regulatory standards while maintaining context-specific accuracy.

            In generating the PISs and the ICFs with the LLM, we plan to use a prompt template to ensure all crucial elements of a clinical trial—trial purpose, procedures, and risk factors—are consistently included. The templates maintain a balance between regulatory compliance and readability, addressing one of the critical challenges identified in the current process.
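A prompt template of this kind could be sketched as follows, using Python's standard `string.Template`. The template wording, the field names `purpose`, `procedures`, and `risks`, and the function `render_prompt` are assumptions made for illustration; the actual template is not specified in this report.

```python
from string import Template

# Hypothetical template text; the real prompt wording is not given in the article.
PIS_PROMPT = Template(
    "You are drafting a participant information sheet.\n"
    "Trial purpose: $purpose\n"
    "Procedures: $procedures\n"
    "Risk factors: $risks\n"
    "Write in plain language with short sentences."
)

REQUIRED_FIELDS = ("purpose", "procedures", "risks")


def render_prompt(**fields):
    """Fill the template, refusing to proceed if a required element is blank or absent."""
    missing = [name for name in REQUIRED_FIELDS if not str(fields.get(name, "")).strip()]
    if missing:
        raise ValueError(f"missing required trial elements: {', '.join(missing)}")
    return PIS_PROMPT.substitute(**{name: fields[name] for name in REQUIRED_FIELDS})
```

Failing loudly on a missing field is one way to make sure that no crucial element is silently dropped from a generated document.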

            Figure 1 summarizes our proposed methodology. We will use the tables with the required PIS and ICF information as checklists of questions. Each question is a query that serves as a prompt for the LLM supplemented with the context in the knowledge database. This additional knowledge enhances the accuracy of generated text. Therefore, with the prompt and retrieved documents, the LLM may generate a new PIS and ICF and a completed checklist.

            Figure 1.

            Proposed methodology for creating PISs (participant information sheets) and ICFs (informed consent forms) using LLMs (Large Language Models).

            To evaluate the results of our RAG-based LLM application, we need to address two main components: the retriever and the generator of the RAG framework. 25 Hence, we will assess metrics such as the following:

            • Contextual relevance, which evaluates if the information in the retrieval context is relevant given a specific input;

            • Contextual precision, which evaluates if the retriever properly ranks relevant information;

            • Contextual recall, which evaluates if the retrieved information complies with the expected output;

            • Faithfulness, which evaluates if the generator’s output complies with the information presented in the retrieval context;

            • Answer relevance, which evaluates if the generator’s output is relevant given a particular context.
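Two of the retriever metrics above can be sketched numerically once relevance judgments are available. The formulas below are one common formulation and are an assumption here, since exact definitions vary between evaluation frameworks. The boolean labels would in practice come from a human or model-based judge.

```python
def contextual_precision(relevance):
    """Rank-aware precision: rewards relevant items that appear early.

    `relevance` is a list of booleans for the retrieved items, in rank order.
    The score averages precision-at-rank over the positions of relevant items.
    """
    hits = 0
    score = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0


def contextual_recall(supported):
    """Share of expected-output statements that the retrieved context supports.

    `supported` holds one boolean per statement in the expected output.
    """
    return sum(supported) / len(supported) if supported else 0.0
```

With these definitions, a retriever that places the relevant documents first scores higher on contextual precision than one that returns the same documents in a worse order.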

            Because purely statistical scorers do not consider semantics and can therefore be inaccurate, we plan to use scorers that are model-based, or that combine statistical and model-based approaches, for better results. 26 We will also consider user feedback to validate the resulting documents in real-life clinical trials.

            Our methodology also incorporates a robust memory component that maintains consistency across documents and versions. This system tracks document history, maintains audit trails, and ensures version control, which are crucial for regulatory compliance. Additionally, the memory component helps maintain consistency in terminology and explanations across different sections of the documents.
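One possible shape for the document history and audit trail is an append-only log in which each entry is chained to the previous one by a hash. This is a sketch of a design option, not the memory component itself. The class `DocumentHistory`, its fields, and the use of hash chaining are all assumptions introduced for illustration.

```python
import hashlib
from datetime import datetime, timezone


class DocumentHistory:
    """Append-only version log; each entry hashes the text and the previous entry."""

    def __init__(self):
        self.entries = []

    def add_version(self, text, author):
        """Record a new version and return its log entry."""
        previous = self.entries[-1]["entry_hash"] if self.entries else ""
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        entry_hash = hashlib.sha256((previous + text_hash).encode("utf-8")).hexdigest()
        self.entries.append({
            "version": len(self.entries) + 1,
            "author": author,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text_hash": text_hash,
            "entry_hash": entry_hash,
        })
        return self.entries[-1]

    def verify(self):
        """Recompute the chain; False means an entry was altered after the fact."""
        previous = ""
        for entry in self.entries:
            expected = hashlib.sha256((previous + entry["text_hash"]).encode("utf-8")).hexdigest()
            if entry["entry_hash"] != expected:
                return False
            previous = entry["entry_hash"]
        return True
```

Chaining the hashes means that changing any earlier version invalidates every later entry, which makes tampering detectable during an audit.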

            The generated documents may not always meet the desired quality. A quality control step therefore validates multiple aspects of the generated documents. This process includes automated assessment of readability levels, regulatory compliance verification, and consistency checking. The system uses a feedback loop to continuously improve document quality based on expert review and user feedback. This iterative improvement process helps maintain high standards while adapting to specific trial requirements.
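An automated readability assessment could, for example, compute the Flesch Reading Ease score, where higher values indicate easier text. The choice of this particular metric is an assumption, as the report does not name one. The syllable counter below is a rough heuristic, and the formula is calibrated for English, so documents in other languages would need a different measure.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups and discount a trailing silent 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))
```

A quality control step might compare this score against a target agreed with the ethics committee and send drafts below the target back for simplification.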

            DATA AVAILABILITY

            The authors are committed to sharing the data used, provided it is not subject to proprietary or privacy regulations, upon completion of the study.

            CODE AVAILABILITY

            The authors are committed to sharing the code, provided it is not subject to proprietary or privacy regulations, upon completion of the study.

            AUTHOR CONTRIBUTIONS

            FL contributed to the study design, literature review, synthesis, discussion of the findings, and preparation of the manuscript. CT contributed to the study design, especially regarding LLMs, discussion, and the preparation of the manuscript. TC contributed to the study design, especially regarding LLMs, discussion, and the preparation of the manuscript. MSA contributed to the study design, discussion of findings, and the preparation of the manuscript. ASC coordinated the project and contributed to the study design, literature review, synthesis, discussion of the findings, and preparation of the manuscript. All authors reviewed and approved the manuscript prior to submission.

            CONFLICTS OF INTEREST

            The authors declare no conflict of interest.

            REFERENCES

            1. Nardini C. The ethics of clinical trials. Ecancermedicalscience. 2014. Vol. 8:387.

            2. Shafiq N, Malhotra S. Ethics in clinical research: Need for assessing comprehension of informed consent form? Contemp Clin Trials. 2011. Vol. 32(2):169–172.

            3. Feinberg IZ, Gajra A, Hetherington L, McCarthy KS. Simplifying informed consent as a universal precaution. Sci Rep. 2024. Vol. 14:13195.

            4. Wisgalla A, Hasford J. Four reasons why too many informed consents to clinical research are invalid: A critical analysis of current practices. BMJ Open. 2022. Vol. 12:050543.

            5. Pietrzykowski T, Smilowska K. The reality of informed consent: Empirical studies on patient comprehension-systematic review. Trials. 2021. Vol. 22:57.

            6. Mirza FN, Wu E, Abdulrazeq HF, et al. The literacy barrier in clinical trial consents: A retrospective analysis. EClinicalMedicine. 2024. Vol. 75:102814.

            7. Artsi Y, Sorin V, Konen E, Glicksberg BS, Nadkarni G, Klang E. Large language models in simplifying radiological reports: Systematic review. medRxiv [Preprint]. 2024 [cited 2024 Nov 15].

            8. Doshi R, Amin K, Khosla P, Bajaj S, Chheang S, Forman HP. Utilizing large language models to simplify radiology reports: A comparative analysis of ChatGPT3.5, ChatGPT4.0, Google Bard, and Microsoft Bing. medRxiv [Preprint]. 2023 [cited 2024 Nov 15].

            9. Directive 2001/20/EC of the European Parliament and of the Council of 4 April 2001 on the approximation of the laws, regulations and administrative provisions of the Member States relating to the implementation of good clinical practice in the conduct of clinical trials on medicinal products for human use. OJEU. 1 May 2001. Vol. L 121:34–44. https://eur-lex.europa.eu/eli/dir/2001/20/oj

            10. Regulation (EU) No 536/2014 of the European Parliament and of the Council of 16 April 2014 on clinical trials on medicinal products for human use, and repealing Directive 2001/20/EC. OJEU. 27 May 2014. Vol. L 158:1–76. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32014R0536

            11. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). OJEU. 4 May 2016. Vol. L 119:1–88. https://eur-lex.europa.eu/eli/reg/2016/679/oj

            12. Gesualdo F, Daverio M, Palazzani L, et al. Digital tools in the informed consent process: A systematic review. BMC Med Ethics. 2021. Vol. 22:18.

            13. Cohen E, Byrom B, Becher A, Jörntén-Karlsson M, Mackenzie AK. Comparative Effectiveness of eConsent: Systematic Review. J Med Internet Res. 2023. Vol. 25:e43883.

            14. O’Sullivan L, Sukumar P, Crowley R, McAuliffe E, Doran P. Readability and understandability of clinical research patient information leaflets and consent forms in Ireland and the UK: A retrospective quantitative analysis. BMJ Open. 2020. Vol. 10:e037994.

            15. De Sutter E, Meszaros J, Borry P, Huys I. Digitizing the informed consent process: A review of the regulatory landscape in the European Union. Front Med (Lausanne). 2022. Vol. 9:906448.

            16. Clusmann J, Kolbinger FR, Muti HS, et al. The future landscape of large language models in medicine. Commun Med. 2023. Vol. 3:141.

            17. Coleman E, O’Sullivan L, Crowley R, et al. Preparing accessible and understandable clinical research participant information leaflets and consent forms: A set of guidelines from an expert consensus conference. Res Involv Engagem. 2021. Vol. 7:31.

            18. Zhao WX, Zhou K, Junyi L, et al. A survey of large language models. arXiv:2303.18223 [Preprint]. 2024 [cited 2024 Nov 15].

            19. LangChain, Inc. LangChain Documentation [Internet] [cited 2024 Nov 15]. https://python.langchain.com/docs/introduction/

            20. GitHub. LangChain-AI/Langchain: Build Context-aware Reasoning Applications [Internet] [cited 2024 Nov 15]. https://github.com/langchain-ai/langchain

            21. Bubeck S, Chandrasekaran V, Eldan R, et al. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712 [Preprint]. 2023 [cited 2024 Nov 15].

            22. Chen J, Lin H, Han X, Sun L. Benchmarking large language models in retrieval-augmented generation. Proc AAAI Conf Artif Intell. 2024. Vol. 38(16):17754–17762.

            23. Lewis P, Perez E, Piktus A, et al. Retrieval-augmented Generation for Knowledge-intensive NLP Tasks [Internet]. https://github.com/huggingface/transformers/blob/master/

            24. Ren R, Wang Y, Qu Y, et al. Investigating the factual knowledge boundary of large language models with retrieval augmentation. arXiv:2307.11019 [Preprint]. 2023 [cited 2024 Nov 15].

            25. Yu H, Gan A, Zhang K, Tong S, Liu Q, Liu Z. Evaluation of retrieval-augmented generation: A survey. arXiv:2405.07437 [Preprint]. 2024 [cited 2024 Nov 15].

            26. Liu Y, Iter D, Xu Y, Wang S, Xu R, Zhu C. G-Eval: NLG Evaluation using GPT-4 with better human alignment. arXiv:2303.16634 [Preprint]. 2023 [cited 2024 Nov 15].

            Author and article information

            Journal
            Drug Repurposing
            ScienceOpen (Berlin )
            2941-2528
            18 December 2024
            : 1
            : 2
            : e20240015
            Affiliations
            [1 ] ICBAS – School of Medicine and Biomedical Sciences, University of Porto, Porto, Portugal ( https://ror.org/043pwc612)
            [2 ] FCUP – Faculty of Sciences, University of Porto, Porto, Portugal;
            Author notes
            *Correspondence to: Ana Sofia Carvalho, ICBAS – School of Medicine and Biomedical Sciences, University of Porto, Porto, Portugal. E-mail: anasofiapintodecarvalho@gmail.com; aacarvalho@icbas.up.pt
            Author information
            https://orcid.org/0000-0003-3032-8851
            https://orcid.org/0009-0007-3950-2533
            https://orcid.org/0000-0002-7700-1955
            https://orcid.org/0000-0002-8032-7390
            https://orcid.org/0000-0003-1132-8880
            Article
            10.58647/DRUGREPO.24.2.0015
            2024 The Author(s).

            This work has been published open access under Creative Commons Attribution License (CC BY) 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

            History
            : 11 October 2024
            : 25 November 2024
            Page count
            Figures: 1, Tables: 1, References: 26, Pages: 5
            Funding
            Funded by: European Union
            Award ID: 101057619
            REPO4EU is funded by the European Union under grant agreement no. 101057619. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency (HADEA). Neither the European Union nor the granting authority can be held responsible for them.
            Categories

            automation, AI-driven consent, ethics and privacy-by-design, Large Language Models (LLMs), clinical trials, informed consent
