      Improving ascertainment of suicidal ideation and suicide attempt with natural language processing

      research-article

          Abstract

          Methods relying on diagnostic codes to identify suicidal ideation and suicide attempt in Electronic Health Records (EHRs) at scale are suboptimal because suicide-related outcomes are heavily under-coded. We propose to improve the ascertainment of suicidal outcomes using natural language processing (NLP). We developed information retrieval methodologies to search over 200 million notes from the Vanderbilt EHR. Suicide query terms were extracted using word2vec. A weakly supervised approach was designed to label cases of suicidal outcomes. The NLP validation of the top 200 retrieved patients showed high performance for suicidal ideation (area under the receiver operating characteristic curve [AUROC]: 98.6, 95% confidence interval [CI] 97.1–99.5) and suicide attempt (AUROC: 97.3, 95% CI 95.2–98.7). Case extraction produced the best performance when combining NLP and diagnostic codes and when accounting for negated suicide expressions in notes. Overall, we demonstrated that scalable and accurate NLP methods can be developed to identify suicidal behavior in EHRs to enhance prevention efforts, predictive models, and precision medicine.
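
The abstract reports that case extraction worked best when NLP was combined with diagnostic codes and negated suicide expressions were accounted for. The Python sketch below illustrates that general idea only; it is not the authors' pipeline. The query terms, negation cues, five-word window, and the simple OR rule for combining text evidence with codes are all assumptions made for illustration, and `count_mentions` and `is_case` are hypothetical names. The study derived its query terms with word2vec, and those terms are not listed on this page.

```python
import re

# Hypothetical query terms and negation cues, chosen only for illustration.
QUERY_TERMS = ["suicidal ideation", "suicide attempt", "suicidal thoughts"]
NEGATION_CUES = ["denies", "denied", "no", "not", "without", "negative for"]

TERM_RE = re.compile("|".join(re.escape(t) for t in QUERY_TERMS), re.IGNORECASE)
CUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def count_mentions(note, window=5):
    """Return (affirmed, negated) counts of query-term mentions in one note.

    A mention counts as negated when a negation cue occurs within the
    `window` words that precede it in the same sentence (an assumed rule).
    """
    affirmed = negated = 0
    for sentence in re.split(r"[.!?\n]+", note):
        for match in TERM_RE.finditer(sentence):
            preceding = sentence[:match.start()].split()[-window:]
            if CUE_RE.search(" ".join(preceding)):
                negated += 1
            else:
                affirmed += 1
    return affirmed, negated


def is_case(notes, has_diagnostic_code):
    """Flag a patient when any note holds an affirmed mention or a code exists.

    The OR combination is an assumption; the abstract does not state the rule.
    """
    affirmed = sum(count_mentions(note)[0] for note in notes)
    return affirmed > 0 or has_diagnostic_code
```

With these assumptions, a note such as "Patient denies suicidal ideation." contributes one negated mention and no affirmed mention, so it would flag the patient only if a diagnostic code were also present.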

                Author and article information

                Contributors
                adi.bejan@vanderbilt.edu
                Journal
                Sci Rep
                Scientific Reports
                Nature Publishing Group UK (London)
                ISSN: 2045-2322
                Published: 7 September 2022
                Volume: 12
                Article number: 15146
                Affiliations
                [1] GRID grid.152326.1, ISNI 0000 0001 2264 7217, Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 1500, Nashville, TN 37232, USA
                [2] GRID grid.412807.8, ISNI 0000 0004 1936 9916, Department of Medicine, Vanderbilt University Medical Center, Nashville, USA
                [3] GRID grid.412807.8, ISNI 0000 0004 1936 9916, Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
                [4] GRID grid.412807.8, ISNI 0000 0004 1936 9916, Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
                Article
                Publisher ID: 19358
                DOI: 10.1038/s41598-022-19358-3
                PMCID: PMC9452591
                PMID: 36071081
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 15 April 2022
                Accepted: 29 August 2022
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000025, National Institute of Mental Health;
                Award ID: R01 MH121455
                Keywords: data processing, data mining, machine learning