1. INTRODUCTION
Medicine and healthcare are important parts of China’s national economy and key industries that protect people’s lives and health. Given the spread of the coronavirus and other diseases, many countries have faced problems such as shortages of medical resources and medical personnel. Combining artificial intelligence (AI) with medical care can assist physicians in virus screening [1] and disease diagnosis [2], thereby reducing the misdiagnosis rate and improving the efficiency of diagnosis and treatment. Recently, large language models (LLMs), represented by ChatGPT [3] and GPT-4 [4], have attracted attention from academia and industry. Many Chinese technology companies have also launched LLMs to compete at an internationally advanced level. With strong fluency in dialogue, semantic understanding, inductive reasoning, and other abilities, LLMs have rapidly penetrated all walks of life.
In this context, the combination of LLMs and medicine has opened a new direction in the medical field. Because ChatGPT, GPT-4, and similar models require high computing power and labor costs, many companies and research teams have released a variety of open source LLMs, and this initiative has promoted the rapid development of LLMs. In the current study we analyze and discuss the advantages and disadvantages, technical solutions, and application scenarios of open source LLMs in the medical field by reviewing the current state of the technology, with the aim of promoting the mutual integration and development of open source models and medicine.
2. AI LLMs
AI LLMs are also referred to as foundation models [5]. They are trained on massive, diverse datasets and can handle a variety of downstream tasks [6]. LLMs support multi-turn dialogue and can understand user intentions. They also have better versatility and generalization, overcoming the poor versatility of traditional task-specific models.
The Transformer was proposed by Vaswani et al. [7] in 2017. With excellent scalability and parallel computing capability, the Transformer quickly replaced the recurrent neural network (RNN) and long short-term memory (LSTM) as the mainstream architecture in natural language processing (NLP), and it has since been extended to computer vision (CV). Transformer-based models with more than 100 billion parameters can be designed and trained, and such models generalize well. Figure 1 shows the AI LLMs with more than 10 billion parameters that have emerged since 2019.
With the release of GPT-3 [9], ChatGPT, and GPT-4, prompt learning [10], instruction learning [11], and reinforcement learning from human feedback (RLHF) [12] have become common training methods. Prompt learning unifies downstream tasks with the pre-training task by converting them into natural language with specific templates. Instruction learning better elicits the model’s comprehension ability than prompt learning: it uses instructions to guide the model toward the correct action, which strengthens generalization. RLHF evaluates model outputs through human feedback and uses that feedback as a training signal to optimize the model, which makes the outputs more harmless.
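As an illustration of the difference between the two formulations (not taken from the cited works), the short Python sketch below recasts a hypothetical symptom-classification task first as a prompt-learning template and then as an instruction; the task wording and function names are assumptions for demonstration only.

```python
# Illustrative sketch: recasting a downstream classification task as
# (a) a prompt-learning template and (b) an instruction-style input.

def prompt_template(symptom_text: str) -> str:
    # Prompt learning: the task is rewritten so the model only needs to
    # fill in a blank, matching its pre-training objective.
    return f"Patient report: {symptom_text} The likely condition is ___."

def instruction_format(symptom_text: str) -> str:
    # Instruction learning: an explicit directive tells the model what to do,
    # which generalizes more easily to unseen task descriptions.
    return (
        "Instruction: Read the patient report and name the most likely condition.\n"
        f"Input: {symptom_text}\n"
        "Response:"
    )

if __name__ == "__main__":
    report = "Persistent dry cough, fever of 38.5 °C, and loss of smell for three days."
    print(prompt_template(report))
    print(instruction_format(report))
```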
LLMs can be divided into decoder-only, encoder-only, and encoder-decoder structures [13]. Models with different structures are suitable for different downstream tasks ( Table 1 ). Most early LLMs, such as BERT [28], ERNIE [29], T5 [30], and BART [31], are open source; they use an encoder or encoder-decoder as the main structure and have strong encoding capabilities. In recent years, GPT-3, ChatGPT, and GPT-4 have adopted the decoder-only structure, which has become the most popular choice because of its excellent generation ability. Given the high research cost of LLMs, many decoder-only models are not open source.
Table 1. Summary of mainstream large language models.

| Structure | Publisher | Model |
|---|---|---|
| Encoder-only | Google | BERT, ALBERT [14] |
| Encoder-only | Baidu | ERNIE, ERNIE 2.0 [15] |
| Encoder-only | Meta | RoBERTa [16] |
| Encoder-only | Microsoft | DeBERTa [17] |
| Encoder-decoder | Google | T5, Flan-T5 [18] |
| Encoder-decoder | Tsinghua University | GLM [19], GLM-130B [20] |
| Decoder-only | OpenAI | GPT-1 [21], GPT-2 [22], GPT-3, InstructGPT, ChatGPT, GPT-4 |
| Decoder-only | Google | XLNet [23], LaMDA [24], Bard, PaLM [25] |
| Decoder-only | Meta | LLaMA [26], Galactica [27] |
Although ChatGPT and GPT-4 can be used at little or no cost to the user, their developers have not disclosed the implementation details of the models, which creates substantial technical barriers. It is difficult for individual developers, small companies, and research institutions to build more innovative models on top of them, and these barriers hinder the promotion and application of LLMs in more fields.
In February 2023, Meta open sourced LLaMA. LLaMA derivatives such as Alpaca [32] and Vicuna [33] can be trained at a lower cost and can even approach the ability of ChatGPT, which fueled a wave of LLM open sourcing. A number of open source LLMs for the medical field have now been released. BioMedLM, a domain-specific LLM for biomedical text, was released by the Center for Research on Foundation Models (CRFM) in January 2023; it was trained on a dataset including 16 million medical abstracts and 5 million studies and achieved state-of-the-art results on the USMLE medical question-answering test. In April 2023, Tsinghua University open sourced BioMedGPT [34], whose training data include multi-scale, cross-modal biomedical data; the model can predict drug properties and perform natural language processing tasks. Visual Med-Alpaca, released in April 2023 by the Language Technology Laboratory at the University of Cambridge, recognizes and analyzes chest X-rays and generates diagnostic conclusions. The health intelligence research team at the Harbin Institute of Technology (HIT) constructed a Chinese medical instruction dataset from a medical knowledge graph using the InstructGPT application programming interface (API) and trained HuaTuo, an LLM for intelligent consultation based on LLaMA [35], which addressed the limited Chinese-language ability of existing LLMs.
Open sourcing promotes the rapid development of LLMs in vertical domains, producing medical models with low deployment costs, high professionalism, and strong comprehension ability. Compared with traditional medical models, the capabilities of these LLMs are much improved.
3. SUMMARY OF LLM OPEN SOURCE ECOSYSTEM
The term open source was formally proposed with the founding of the Open Source Initiative in 1998. After decades of development, open source has become the main driving force for innovation in emerging technologies: it minimizes repetitive labor, saves development resources, promotes technological breakthroughs, lowers development thresholds, and accelerates the promotion and application of new technologies. The term ecosystem originated in biology and refers to the natural system formed by organisms and their environment [36]. We regard the LLM open source ecosystem as centered on open source models and supported by AI technology, training platforms, and datasets; together, these elements constitute a technical ecosystem.
3.1 Classification
LLMs can be classified by modality and by fine-tuning method. We also introduce two new types of LLM-based products.
When classified by modality, open source LLMs can be divided into single-modality, bimodal, and multimodal models. Single-modality models handle only NLP, CV, or audio tasks; examples include Alpaca, BLOOM [37], ChatGLM, and GPT-2. Language models can be further subdivided by output or language, such as code generation models (StarCoder [38]), Chinese dialogue models (Chinese-Vicuna), multilingual dialogue models (ChatGLM-6B), and medical advice generation models (MedicalGPT-zh and ChatDoctor [39]). Bimodal models handle two types of data and can be divided into text-to-image (CogView [40] and consistency models [41]), text-image mutual generation (UniDiffuser [42]), image-text matching (BriVL [43]), text-to-speech (Massively Multilingual Speech [44]), speech-to-text (Whisper [45]), and text-speech mutual generation (AudioGPT [46]) models. Multimodal LLMs process data involving three or more modalities (text, image, and speech); for example, ImageBind can understand and convert between six modalities (text, image, audio, depth, inertial measurement unit, and thermal data) [47].
By fine-tuning method, models can be divided into models that have not been fine-tuned (LLaMA), models fine-tuned with instructions (WizardLM [48], Dolly 2.0, and Chinese-LLaMA-Alpaca), and RLHF models (StableVicuna, ChatYuan-large-v2, and OpenAssistant [49]). Fine-tuning refers to initializing a target network with pre-trained parameters and then training it on a dedicated dataset. Instruction tuning uses supervisory signals to guide the model to perform tasks described in the form of instructions so that it can respond correctly to new tasks. WizardLM-7B uses Evol-Instruct to automatically generate open-domain instructions of varying difficulty and skill range, and part of its output reaches quality similar to ChatGPT. RLHF relies on manually labeled data and the support of open source frameworks. StableVicuna uses Vicuna as the base model, follows the three-stage RLHF training proposed by OpenAI, and is capable of fluent conversation.
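A minimal sketch of the instruction-tuning step is shown below, assuming the Hugging Face transformers library with GPT-2 as a stand-in base model; the instruction-response example and the prompt format are illustrative assumptions, not details of the models named above. The key idea is that only the response tokens are supervised.

```python
# Minimal instruction-tuning (SFT) sketch: supervise only the response tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any decoder-only base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

example = {
    "instruction": "Explain in one sentence what hypertension is.",
    "response": "Hypertension is persistently elevated blood pressure in the arteries.",
}

# Build prompt + response; mask the prompt positions with -100 so the
# cross-entropy loss ignores them and the model learns only the response.
prompt = f"Instruction: {example['instruction']}\nResponse: "
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + example["response"] + tokenizer.eos_token,
                     return_tensors="pt").input_ids

labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # do not learn to predict the prompt

outputs = model(input_ids=full_ids, labels=labels)
outputs.loss.backward()  # one supervised fine-tuning step (optimizer omitted)
```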
In addition to the above types of LLMs, autonomous AI agents and LLMs with plug-in systems are two new types of AI product. Autonomous AI is represented by AutoGPT, AgentGPT, and BabyAGI; these products can use the GPT-4 interface and other models to complete tasks given by humans on their own, compensating for GPT-4’s inability to search the web. The NLP Group at Fudan University released MOSS in April 2023, which can use plug-ins such as search engines and calculators to complete specific tasks. A plug-in system makes a model more flexible, enhances its expertise, and improves its interpretability and robustness.
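To make the plug-in idea concrete, here is a toy Python sketch of a dispatch step under an assumed, made-up tool-request convention; it is not the actual interface of MOSS or any other system.

```python
# Toy plug-in dispatch: inspect the model's reply and, if it requests a tool,
# run that tool and return its result instead of the raw reply.
from typing import Callable, Dict

def calculator(expression: str) -> str:
    # Hypothetical calculator plug-in; eval is restricted for this toy example.
    return str(eval(expression, {"__builtins__": {}}, {}))

def search(query: str) -> str:
    # Placeholder search plug-in; a real system would call a search engine API.
    return f"[top search results for: {query}]"

PLUGINS: Dict[str, Callable[[str], str]] = {"calculator": calculator, "search": search}

def dispatch(model_reply: str) -> str:
    # Assumed convention: the model asks for a tool as "TOOL:<name>:<argument>".
    if model_reply.startswith("TOOL:"):
        _, name, argument = model_reply.split(":", 2)
        return PLUGINS[name](argument)
    return model_reply

if __name__ == "__main__":
    print(dispatch("TOOL:calculator:(120 * 0.85) + 12"))   # -> 114.0
    print(dispatch("TOOL:search:drug interactions of warfarin"))
```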
After several years of development, open source LLMs offer diverse model types, comprehensive functions, and wide coverage of usage scenarios. Fine-tuning these models has become the most popular way to develop LLMs for the medical field. For example, HuaTuo, PMC-LLaMA [50], and ChatDoctor are fine-tuned from LLaMA; MedicalGPT-zh, DoctorGLM, and ChatGLM-Med are based on ChatGLM; and BioMedLM is based on GPT-2. While open sourcing model code, most research institutions also provide models at different parameter scales so that developers can reproduce them under different hardware resources, and they publish the relevant training data, which lowers the entry threshold for LLMs.
3.2 Open source framework
Open source frameworks encapsulate the commonly used training paradigms (instruction tuning and RLHF) into services or interfaces, which greatly reduces the amount of manually written code and saves GPU memory. These frameworks lower the difficulty of training and combine efficiency with economy.
Instruction tuning frameworks include OpenGPT and LMFlow. OpenGPT can create training samples from domain data, and NHS-LLM, trained with this framework, has achieved more accurate results than ChatGPT on several tests. RLHF frameworks include trlX, DeepSpeed-Chat, ColossalAI, and Lamini; this type of framework popularizes RLHF training. For example, DeepSpeed-Chat can train a model with more than 13 billion parameters on a single GPU, which enables researchers to create more powerful models under limited conditions, and Lamini packages time-consuming and complex fine-tuning as a service.
In addition to frameworks that optimize, integrate, and encapsulate the LLM training process, there are a number of new research directions. Self-Instruct, released by the University of Washington, lets models generate instructions autonomously; this method effectively reduces the cost of manually labeled data and improves the model’s ability to follow instructions [51]. LoRA, a fine-tuning method proposed by Microsoft, reduces the number of trainable parameters without sacrificing performance [52]. Alpaca-LoRA uses this method to fine-tune LLaMA-7B and matches Alpaca with far fewer trainable parameters.
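As a sketch of how LoRA is typically applied in practice, the following example uses the Hugging Face peft library with GPT-2 standing in for LLaMA-7B; the rank, scaling, and target modules are illustrative assumptions rather than the settings used by Alpaca-LoRA.

```python
# LoRA fine-tuning sketch with the Hugging Face peft library.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for LLaMA-7B

lora_config = LoraConfig(
    r=8,                          # rank of the low-rank update matrices
    lora_alpha=16,                # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["c_attn"],    # attention projection layer in GPT-2
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Only the small LoRA matrices are trainable; the frozen base weights are
# reused, which is why memory and compute requirements drop sharply.
```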
With the support of open source frameworks and new methods, the hardware resource requirements and development difficulties have been continuously reduced, and model performance has continued to improve.
3.3 Open source dataset
The capabilities of LLMs arise from their datasets, and LLM training relies on sufficiently large and diverse training data. For example, GPT-1 was trained on BookCorpus, a corpus of free books by unpublished authors, from which the model acquired broad world knowledge and the ability to handle long-range dependencies. When institutions open source their models, they usually open source the training data as well; for example, CRFM released the self-instruct dataset generated by text-davinci-003 when it open sourced Alpaca. Open sourcing datasets improves the utilization of resources and has a positive impact on academic research into LLMs.
Medical data include clinical datasets (the MIMIC-II Clinical Database), doctor-patient dialogue datasets (HealthCareMagic-100k, icliniq-10k, GenMedGPT-5k, and alpaca-52k used in ChatDoctor training), Chinese medical dialogue datasets (the data used by DoctorGLM), self-built datasets of medical paper abstracts and full texts, and medical image datasets (DDSM, MIAS, and MURA). Open source medical datasets in Chinese are relatively scarce and often rely on instructions generated by ChatGPT, a method that is inaccurate and uncertain. To build a healthy, high-quality field of Chinese medical open source models, there is an urgent need to gather real and reliable medical data to improve data quality. Evaluation sets that assess the capabilities of medical LLMs are also necessary.
In summary, the development of the LLM open source ecosystem is in a rapid growth phase. Models, frameworks, and methods emerge in an endless stream, providing researchers with a broad range of models and technologies to choose from. However, a highly versatile open source model comparable to GPT-4 is still lacking; limited model capability remains a problem that cannot be dismissed, and the gap between closed and open source models persists. There is also no unified framework that integrates instruction tuning and RLHF, and no professional, systematic evaluation benchmarks guide dataset construction. The LLM open source ecosystem needs to develop toward generalization, specialization, and systematization.
4. OPEN SOURCE LLMS IN THE MEDICAL FIELD
The following sections describe the application of open source LLMs in the medical field from three aspects: an analysis of advantages and disadvantages, feasible technical solutions, and application scenarios.
4.1 Advantages of open source models in the medical field
The advantages of open source LLMs in the medical field can be summarized as low-cost deployment, variety of functions, and diverse interactions.
First, LLMs usually perform inference on server clusters. The WebLLM project moves inference to the client and runs it in the browser, which minimizes server overhead and is friendlier to users, who no longer need complex commands to run the model. Localized deployment is also better suited to scenarios such as hospitals with limited hardware resources and strict data security requirements.
Second, open source LLMs offer a wide variety of functions, and mature open source products exist for medical image processing and text generation. ChatDoctor can conduct consultations in text form, and ImpressionGPT can summarize and optimize radiology reports [53]. To mitigate the problem of outdated information in LLMs, Tsinghua University released WebCPM in May 2023, which can interact with search engines and collect answers [54], making the generated content more up to date.
Third, online medical consultation, the most common application scenario for LLMs, requires a high level of Chinese dialogue ability. Linly-Chinese-LLaMA, BELLE, Chinese-Vicuna, and Baize are trained on Chinese datasets and have reached a high level of Chinese communication.
4.2 Disadvantages of open source models in the medical field
The healthcare industry has low fault tolerance, yet most open source models are trained on open community corpora whose content has not been manually corrected, and open source models are also limited by parameter scale and hardware resources. Such models may generate biased, toxic, or inaccurate content, which poses a threat to patient safety. Beaver, a highly modular RLHF training framework open sourced by a Peking University team, significantly reduces biased and discriminatory output through constrained value alignment (CVA), but this type of method is still under development. In medical scenarios, physicians are also needed to evaluate and give feedback on the professionalism of model output to reduce errors and inaccurate information.
4.3 Feasible technical solutions
Deploying LLMs in medical scenarios can follow three technical solutions:
1) Use the ChatGPT and GPT-4 APIs to solve professional tasks in the medical field, an approach similar to the AutoGPT and HuggingGPT technical solutions [55]. Medical institutions can use the interfaces provided by LangChain or similar frameworks to combine the capabilities of multiple models and complete a large number of tasks (a minimal sketch follows this list). This method is easy to develop and deploy; the disadvantages are that frequent use of the ChatGPT and GPT-4 APIs may incur substantial expense, the degree of customization is low, and the data security risk is high.
2) Because medical data are sensitive and cloud services cannot easily guarantee data security, medical institutions or teams can rely on open source or medical-domain datasets to develop medical LLMs independently. The advantage of this solution is that the model fits medical purposes precisely and is highly customizable; the disadvantage is that independent development of LLMs consumes manpower and financial resources that only top institutions can afford.
3) Pre-training and fine-tuning open source models for medical use is a compromise between the two solutions above. Researchers can choose the more popular decoder-only structure, which has stronger generation capability, and follow the steps of pre-training, supervised fine-tuning, and RLHF. The datasets can include open source data, manually produced data, and self-instruct data. The open source model can be customized and developed at a controllable cost; however, some popular models (LLaMA and Alpaca) do not permit commercial use, so infringement risks must be avoided. Powerful open source models that currently allow commercial use include ChatGLM2, Baichuan2, and LLaMA 2, each available at different parameter sizes.
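For solution 1, the sketch below shows a minimal API-based call, assuming the OpenAI Python client; the model name, system prompt, and question are placeholders, and any compatible chat-completion service could be substituted.

```python
# Sketch of solution 1: calling a hosted model through its API for a
# medical question-answering task.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a clinical assistant. Answer cautiously and "
                    "recommend seeing a physician for diagnosis."},
        {"role": "user",
         "content": "A patient reports chest tightness and shortness of breath "
                    "after mild exercise. What examinations are usually considered?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```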
Faced with so many models, frameworks, and technologies, some basic steps for building a medical large model are provided for reference:
1. Clarify the types of model users, including patients, medical institutions, physicians, and medical regulatory departments.
2. Clarify the requirements and objectives based on the first step. For example, requirements and objectives can be divided by breadth of coverage (enhancement of a single function, intelligence for one process, or intelligence across multiple processes) and into single-modal and multimodal requirements.
3. Collect, filter, and standardize training data to form a high-quality supervised fine-tuning (SFT) dataset, including high-quality physician-patient dialogues, medical knowledge question-and-answer data, and a human preference dataset (a possible record format is sketched after this list). Steps 2 and 3 interact: the requirements of the current stage are generally clarified in light of the available data.
4. Among the many open source models, select a model version of appropriate size based on the data the model was trained on, the prepared data, the hardware resources, and the available funding.
5. Train or fine-tune the model.
6. Evaluate the capability of the new model on publicly available evaluation datasets.
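For step 3, the sketch below shows one possible JSON Lines record format for an SFT dataset together with a trivial quality filter; the field names, example text, and length threshold are assumptions rather than an established standard.

```python
# Possible SFT record schema for physician-patient dialogue data, plus a
# simple quality filter before writing to a JSON Lines file.
import json

record = {
    "instruction": "Answer the patient's question as a licensed physician would.",
    "input": "I have had a dull headache and blurred vision for two days.",
    "output": "These symptoms can have many causes, including elevated blood "
              "pressure; please measure your blood pressure and see a doctor "
              "promptly if it is high or the symptoms worsen.",
    "source": "doctor_patient_dialogue",
}

def keep(example: dict) -> bool:
    # Discard records missing required fields or with very short answers.
    return all(k in example for k in ("instruction", "input", "output")) \
        and len(example["output"]) >= 20

if keep(record):
    with open("sft_dataset.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```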
4.4 Potential application
The development of medical AI began in 1972 with the AAPHelp system released by the University of Leeds [56]. In the era of LLMs, the computing power and overall performance of models have continuously improved and have reached human-level performance in many fields, so LLMs will play an increasingly important role in medicine. Some typical application scenarios are as follows:
1. Open source LLMs can be used as analytic tools for medical images. Unlike traditional CV models that label and recognize images in a single domain, LLMs are more versatile; for example, SAM has strong generalization ability and can achieve zero-shot transfer to new tasks [57] (a usage sketch follows this list). An LLM can also describe the disease information in a medical image in text form, enabling rapid diagnosis.
2. Open source LLMs can serve as daily medical assistants that provide medical consultation and drug recommendation services for patients. Patients input their symptoms and medical history, and the model searches and summarizes existing medical knowledge or search-engine results to form diagnostic recommendations; finally, suitable drugs are recommended based on medical and clinical data.
3. Using open source LLMs to generate or retrieve medical reports can reduce physician workload. Medical reports are usually written manually by physicians; because the reports are highly formatted, systematic text, LLMs are well suited to generating them.
4. Open source LLMs can be applied to clinical research to improve the efficiency of data analysis and literature investigation. Researchers can use autonomous AI products such as AutoGPT to complete preliminary research by generating tasks independently and searching online, and can use BioGPT [58] or similar tools for classification, summarization, and text generation.
5. In future research, training with a small amount of data, or even no labeled data, may yield a general medical model capable of various medical tasks. Generalist medical artificial intelligence (GMAI) was proposed by Topol and Rajpurkar in 2022 [59]. An idealized GMAI can be trained on large and diverse datasets, flexibly handle multimodal tasks, and have advanced medical reasoning abilities that support clinical decision-making and even generate protein amino acid sequences.
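For scenario 1, the following sketch illustrates how SAM’s promptable segmentation interface might be invoked on a medical image, assuming the segment-anything package; the checkpoint path, the placeholder image, and the prompt point are assumptions for illustration.

```python
# Prompting SAM with a single foreground point to obtain candidate masks.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a loaded X-ray (RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 300]]),  # a point placed on the region of interest
    point_labels=np.array([1]),           # 1 = foreground prompt
    multimask_output=True,
)
print(f"{len(masks)} candidate masks, best score {scores.max():.2f}")
```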
The capabilities currently displayed by LLMs have many potential applications in the medical field; however, risks in multiple dimensions (ethics, harmlessness, and public acceptance) must be considered, and more mature technical support is needed to continuously improve the reliability of the models.
5. SUMMARY
As one of the most important technical branches of AI, LLMs have penetrated all aspects of society in less than a year, but we need to be cautious when applying this technology in the medical field. The laws and regulations in China related to medical AI are not yet well established, and the LLM open source ecosystem is still at an early stage. Medical LLMs should build on open source products and continuously deepen research on the professionalism, humanistic care, and accuracy of model output; supporting tools also need to be completed to promote the healthy development of this field. In the current study, by sorting out and analyzing the state of open source ecosystem development for LLMs, we hope to provide a reference for promoting the application of LLMs in the medical field.