- Adoption of Large Language Model AI Tools in Everyday Tasks: Multisite Cross-Sectional Qualitative Study of Chinese Hospital Administrators
Background: Large language model (LLM) artificial intelligence (AI) tools have the potential to streamline health care administration by enhancing efficiency in document drafting, resource allocation, and communication tasks. Despite this potential, the adoption of such tools among hospital administrators remains understudied, particularly at the individual level. Objective: This study aims to explore factors influencing the adoption and use of LLM AI tools among hospital administrators in China, focusing on enablers, barriers, and practical applications in daily administrative tasks. Methods: A multicenter, cross-sectional, descriptive qualitative design was used. Data were collected through semistructured face-to-face interviews with 31 hospital administrators across 3 tertiary hospitals in Beijing, Shenzhen, and Chengdu from June 2024 to August 2024. The Colaizzi method was used for thematic analysis to identify patterns in participants’ experiences and perspectives. Results: Adoption of LLM AI tools was generally low, with significant site-specific variations. Participants with higher technological familiarity and positive early experiences reported more frequent use, while barriers such as mistrust in tool accuracy, limited prompting skills, and insufficient training hindered broader adoption. Tools were primarily used for document drafting, with limited exploration of advanced functionalities. Participants strongly emphasized the need for structured training programs and institutional support to enhance usability and confidence. Conclusions: Familiarity with technology, positive early experiences, and openness to innovation may facilitate adoption, while barriers such as limited knowledge, mistrust in tool accuracy, and insufficient prompting skills can hinder broader use. LLM AI tools are now primarily used for basic tasks such as document drafting, with limited application to more advanced functionalities due to a lack of training and confidence. 
Structured tutorials and institutional support are needed to enhance usability and integration. Targeted training programs, combined with organizational strategies to build trust and improve accessibility, could enhance adoption rates and broaden tool use. Future quantitative investigations should validate the adoption rate and influencing factors.
- Analysis of Metabolic and Quality-of-Life Factors in Patients With Cancer for a New Approach to Classifying Walking Habits: Secondary Analysis of a Randomized Controlled Trial
Background: As the number of people diagnosed with cancer continues to increase, self-management has become crucial for patients recovering from cancer surgery or undergoing chemotherapy. Technology has emerged as a key tool in supporting self-management, particularly through interventions that promote physical activity, which is important for improving health outcomes and quality of life for patients with cancer. Despite the growing availability of digital tools that facilitate physical activity tracking, high-level evidence of their long-term effectiveness remains limited. Objective: This study aimed to investigate the effect of long-term physical activity on patients with cancer by categorizing them into active and inactive groups based on step count time-series data using the mobile health intervention, the Walkon app (Swallaby Co, Ltd.). Methods: Patients with cancer who had used the Walkon app in a previous randomized controlled trial were selected for this study. Walking step count data were acquired from the app users. Biometric measurements, including BMI, waist circumference, blood sugar levels, and body composition, along with quality of life (QOL) questionnaire responses (European Quality of Life 5 Dimensions 5 Level version and Health-related Quality of Life Instrument with 8 Items), were collected during both the baseline and 6-month follow-up at an outpatient clinic. To analyze step count patterns over time, the concept of sample entropy was used for patient clustering, distinguishing between the active walking group (AWG) and the inactive walking group (IWG). Statistical analysis was performed using the Shapiro-Wilk test for normality, with paired t tests for parametric data, Wilcoxon signed-rank tests for nonparametric data, and chi-square tests for categorical variables.
Results: The proposed method effectively categorized the AWG (n=137) and IWG (n=75) based on step count trends, revealing significant differences in daily (4223 vs 5355), weekly (13,887 vs 40,247), and monthly (60,178 vs 174,405) step counts. Higher physical activity levels were observed in patients with breast cancer and younger individuals. In terms of biometric measurements, only waist circumference (P=.01) and visceral fat (P=.002) demonstrated a significant improvement exclusively within the AWG. Regarding QOL measurements, aspects such as energy (P=.01), work (P<.003), depression (P=.02), memory (P=.01), and happiness (P=.05) displayed significant improvements solely in the AWG. Conclusions: This study introduces a novel methodology for categorizing patients with cancer based on physical activity using step count data. Although significant improvements were noted in the AWG, particularly in QOL and specific physical metrics, differences in 6-month change between the AWG and IWG were not statistically significant. These findings highlight the potential of digital interventions in improving outcomes for patients with cancer, contributing valuable insights into cancer care and self-management. Trial Registration: Clinical Research Information Service by Korea Centers for Diseases Control and Prevention, Republic of Korea KCT0005447; https://tinyurl.com/3zc7zvzz
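To illustrate the sample entropy concept mentioned in the Methods above, the following is a minimal sketch, not the authors' implementation. The embedding dimension `m=2`, the tolerance `r = 0.2 × SD`, and the example step count series are all assumptions chosen for illustration; the abstract does not report the parameters used or how entropy values were mapped to the AWG and IWG.

```python
import math
import random


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r), with r = r_factor * population SD.

    Lower values indicate a more regular, self-similar series.
    Parameter defaults are common conventions, assumed here for illustration.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * sd

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is within r.
        # The same n - m starting points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                dist = max(abs(series[i + k] - series[j + k]) for k in range(length))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # no matches: entropy is undefined, reported as infinite
    return -math.log(a / b)


# Hypothetical daily step counts: a strictly alternating pattern vs. an irregular one.
regular_steps = [4000, 8000] * 20
rng = random.Random(42)
irregular_steps = [rng.randint(1000, 12000) for _ in range(200)]

print(sample_entropy(regular_steps))
print(sample_entropy(irregular_steps))
```

In this sketch the strictly alternating series yields an entropy of zero, while the irregular series yields a larger value, which is the property that makes sample entropy usable as a feature for grouping step count patterns.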
- Exploring Technical Features to Enhance Control in Videoconferencing Psychotherapy: Quantitative Study on Clinicians’ Perspectives
Background: The COVID-19 pandemic required psychologists and other mental health professionals to use videoconferencing platforms. Previous research has highlighted therapists’ hesitation toward adopting the medium since they find it hard to establish control over videoconferencing psychotherapy (VCP). An earlier study provided a set of potential features that may help enhance psychologists’ control in their videoconference sessions, such as screen control functionality, emergency call functionality, eye contact functionality, zooming in and out functionality, and an interactive interface with other apps and software. Objective: This study aims to investigate whether introducing technical features might improve clinicians’ control over their video sessions. Additionally, it seeks to understand the role of the video in therapists’ VCP experience from a technical and relationship point of view. Methods: A total of 121 mental health professionals responded to the survey, but only 86 participants provided complete data. Exploratory Factor Analysis was used to scrutinize the data collected. A total of three factors were identified: (1) “challenges in providing VCP,” (2) “features to enhance the therapeutic relationship,” and (3) “enhancing control.” Path analysis was used to observe the relationship between factors on their own and with adjustment to participants’ areas of expertise and year in practice. Results: This study highlighted a relationship between the three identified factors. It was found that introducing certain features reduced therapists' challenges in the provision of VCP. Moreover, the additional features provided therapists with enhanced control over their VCP sessions. A path analysis was conducted to investigate the relationships between the factors loaded. 
The results of the analysis revealed a significant relationship between “challenges in providing VCP” and “features to enhance the therapeutic relationship” (hereafter “features to enhance TR”; adjusted beta [Adjβ]=0.54, 95% CI 0.29-0.79; P<.001). Additionally, a significant positive relationship was found between “features to enhance TR” and “enhancing control” (Adjβ=0.25, 95% CI 0.15-0.35; P<.001). Furthermore, there was an indirect effect of “challenges in providing VCP” on “enhancing control” (Adjβ=0.13, 95% CI 0.05-0.22; P=.001) mediated by “features to enhance TR.” The analysis identified the factor “features to enhance TR” (effect size=0.25) as key for improving clinicians’ performance and control. Conclusions: This study demonstrates that technology may help improve therapists’ VCP experiences by implementing features that respond to their need for enhanced control. By augmenting therapists’ control, clinicians can effectively serve their patients and facilitate successful therapy outcomes. Moreover, this study confirms the video as a third agent that prevents therapists from affecting clients’ reality due to technical and relational limits. Additionally, this study supports the general system theory, which allowed for the incorporation of video in our exploration and helped explain its agency in VCP.
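As a quick consistency check on the mediation result above, the indirect effect in a simple path model is conventionally estimated as the product of the two path coefficients. The sketch below applies that convention to the magnitudes reported in the abstract; whether the authors computed the indirect effect in exactly this way is an assumption.

```python
# Magnitudes of the path coefficients reported in the abstract.
a_path = 0.54  # "challenges in providing VCP" -> "features to enhance TR"
b_path = 0.25  # "features to enhance TR" -> "enhancing control"

# Product-of-coefficients estimate of the indirect (mediated) effect.
indirect_effect = a_path * b_path
print(round(indirect_effect, 3))
```

The product (about 0.135) is close to the reported indirect effect of 0.13.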
- Validation of Ecological Momentary Assessment With Reference to Accelerometer Data: Repeated-Measures Panel Study With Multilevel Modeling
Background: There is growing interest in the real-time assessment of physical activity (PA) and physiological variables. Accelerometer data, particularly those collected through wearable sensors, have been increasingly adopted as an objective measure of PA. However, sensor-based measures often pose challenges for large-scale studies due to their associated costs, inability to capture contextual information, and restricted user populations. Smartphone-delivered ecological momentary assessment (EMA) offers an unobtrusive and undemanding means to measure PA to address these limitations. Objective: This study aimed to evaluate the usability of EMA by comparing its measurement outcomes with 2 self-report assessments of PA: Global Physical Activity Questionnaire (GPAQ) and a modified version of Bouchard Physical Activity Record (BAR). Methods: A total of 235 participants (137 female, 98 male, and 94 repeated) participated in one or more 7-day studies. Waist-worn sensors provided by ActiGraph captured accelerometer data while participants completed 3 self-report measures of PA. The multilevel modeling method was used with EMA, GPAQ, and BAR as separate measures, with 6 subdomains of physical activity (overall PA, overall excluding occupational, transport, exercise, occupational, and sedentary) to model accelerometer data. In addition, EMA and GPAQ were further compared with 6 domains of PA from the BAR as outcome measures. Results: Among the 3 self-reporting instruments, EMA and BAR exhibited better overall performance in modeling the accelerometer data compared to GPAQ (eg, EMA daily: β=.387, P<.001; BAR daily: β=.394, P<.001; GPAQ: β=.281, P<.001, based on repeated-only participants with step counts from accelerometer as dependent variables). Conclusions: Multilevel modeling on 3 self-report assessments of PA indicates that smartphone-delivered EMA is a valid and efficient method for assessing PA.
- Artificial Intelligence Performance in Image-Based Cancer Identification: Umbrella Review of Systematic Reviews
Background: Artificial intelligence (AI) has the potential to transform cancer diagnosis, ultimately leading to better patient outcomes. Objective: We performed an umbrella review to summarize and critically evaluate the evidence for the AI-based imaging diagnosis of cancers. Methods: PubMed, Embase, Web of Science, Cochrane, and IEEE databases were searched for relevant systematic reviews from inception to June 19, 2024. Two independent investigators abstracted data and assessed the quality of evidence, using the Joanna Briggs Institute (JBI) Critical Appraisal Checklist for Systematic Reviews and Research Syntheses. We further assessed the quality of evidence in each meta-analysis by applying the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) criteria. Diagnostic performance data were synthesized narratively. Results: In a comprehensive analysis of 158 included studies evaluating the performance of AI algorithms in noninvasive imaging diagnosis across 8 major human system cancers, the accuracy of the classifiers for central nervous system cancers varied widely (ranging from 48% to 100%). Similarities were observed in the diagnostic performance for cancers of the head and neck, respiratory system, digestive system, urinary system, female-related systems, skin, and other sites. Most meta-analyses demonstrated positive summary performance. For instance, 9 reviews meta-analyzed sensitivity and specificity for esophageal cancer, showing ranges of 90%-95% and 80%-93.8%, respectively. In the case of breast cancer detection, 8 reviews calculated the pooled sensitivity and specificity within the ranges of 75.4%-92% and 83%-90.6%, respectively. Four meta-analyses reported the ranges of sensitivity and specificity in ovarian cancer, and both were 75%-94%. Notably, in lung cancer, the pooled specificity was relatively low, primarily distributed between 65% and 80%. 
Furthermore, 80.4% (127/158) of the included studies were of high quality according to the JBI Critical Appraisal Checklist, with the remaining studies classified as medium quality. The GRADE assessment indicated that the overall quality of the evidence was moderate to low. Conclusions: Although AI shows great potential for achieving accelerated, accurate, and more objective diagnoses of multiple cancers, there are still hurdles to overcome before its implementation in clinical settings. The present findings highlight that a concerted effort from the research community, clinicians, and policymakers is required to overcome existing hurdles and translate this potential into improved patient outcomes and health care delivery. Trial Registration: PROSPERO CRD42022364278; https://www.crd.york.ac.uk/PROSPERO/view/CRD42022364278
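For readers less familiar with the diagnostic metrics summarized above, the following minimal sketch shows how sensitivity and specificity are derived from a 2×2 confusion matrix. The counts are hypothetical and are not taken from any of the included reviews.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cancers the classifier detects.
    specificity = TN / (TN + FP): share of non-cancers correctly ruled out.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=160, fp=40)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A relatively low specificity, as reported above for lung cancer, corresponds to a larger share of false positives among cases without cancer.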
- Design and Baseline Evaluation of Social Media Vaping Prevention Trial: Randomized Controlled Trial Study
Background: Electronic cigarette (e-cigarette) use is a major public health problem and young adults aged 18-24 years are at high risk. Furthermore, oral nicotine products (ONPs) are growing in popularity in this population. Poly-use is widespread. New methodologies for rigorous online studies using social media have been developed, and such studies have shown reductions in nicotine use. Objective: We report on the design and baseline evaluation of a large-scale social media–based randomized controlled trial to evaluate the effects of antivaping social media on young adult vaping and determinants of use. Methods: Using the Virtual Lab social media platform, participants were recruited using an artificial intelligence chatbot and social media advertising, completed a baseline survey, and were randomized to 1 of 4 study arms. The design was to achieve specific numbers of impressions per arm over 3 survey time points. We recruited 8347 participants, stratified by vaper (n=5026) and nonvaper (n=3321) status. Questionnaire data were collected using the Qualtrics survey platform. Future analyses will examine the effects of social media content on vaping at the endline. Our data analysis describes the 2 cohort samples, examines balance across the 4 study arms on baseline variables in each of the cohorts, and evaluates the internal consistency of several multi-indicator measures of psychosocial constructs. Results: Among vapers, almost three-fourths were current vapers, >40% were current smokers (using in the past 30 days), and >48% were current poly-users (using e-cigarettes and ≥1 other tobacco products). Substantial numbers of current vapers also currently use some other product, including cigars (n=1520, 30.2%), hookah (n=794, 15.8%), smokeless devices (n=462, 9.2%), and ONPs (n=578, 11.5%). The average age of participants was 21.2 (SD 2) years.
Just less than 45% of participants were non-Hispanic White (n=3728, 44.7%), just less than 47% (n=3913, 46.9%) of the sample was male, more than 44% (n=3704, 44.4%) reported completing high school, and 79.3% reported meeting basic needs or better. There were no significant differences between arms and strata by any of these demographics. We calculated scale scores for depression and covariates related to nicotine use and found high alphas. Finally, participants who reported having seen antitobacco brand advertising were more likely to have higher levels of these variables and scales than participants who reported not having seen the advertisements. These results will be examined in future studies. Conclusions: Social media can be used as a platform at scale for longitudinal randomized controlled trials over extended periods, which extends previous research on short-term trials. Interventions delivered by social media can be used with large samples to evaluate social media health behavior change interventions. Future studies based on this research will evaluate the intervention and dose-response effects of social media exposure on vaping behavior and determinants. Trial Registration: ClinicalTrials.gov NCT04867668; https://clinicaltrials.gov/study/NCT04867668
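The "high alphas" reported above presumably refer to Cronbach α, a standard index of internal consistency for multi-item scales; that reading is an assumption, since the abstract does not name the statistic. The sketch below computes Cronbach α on hypothetical item responses (rows are participants, columns are scale items).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of per-participant item-response lists.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-participant, 3-item responses for illustration only.
responses = [
    [3, 4, 3],
    [1, 2, 1],
    [4, 4, 5],
    [2, 2, 3],
    [5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items of a scale vary together, which is the sense in which a multi-indicator measure is internally consistent.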
- Impact of Digital Engagement on Weight Loss Outcomes in Obesity Management Among Individuals Using GLP-1 and Dual GLP-1/GIP Receptor Agonist Therapy: Retrospective Cohort Service Evaluation Study
Background: Obesity is a global public health challenge. Pharmacological interventions, such as glucagon-like peptide-1 (GLP-1) receptor agonists (eg, semaglutide) and dual GLP-1/gastric inhibitory polypeptide (GIP) receptor agonists (eg, tirzepatide), have led to significant weight loss among users. Digital health platforms offering behavioral support may enhance the effectiveness of these medications. Objective: This retrospective service evaluation investigated the impact of engagement with an app-based digital weight loss program on weight loss outcomes among individuals using GLP-1 receptor agonists (semaglutide) and dual GLP-1/GIP receptor agonists (tirzepatide) in the United Kingdom over 5 months. Methods: Data were collected from the Voy weight loss digital health platform between February 2023 and August 2024. Participants were adults aged 18-75 years with a BMI ≥30 kg/m² or ≥27.5 kg/m² with obesity-related comorbidities who initiated a weight management program involving semaglutide or tirzepatide. Engagement was defined based on attendance at coaching sessions, frequency of app use, and regular weight tracking. Participants were categorized as “engaged” or “nonengaged” accordingly. Weight loss outcomes were assessed over a period of up to 5 months. Statistical analyses included chi-square tests, independent t tests, Kaplan-Meier survival analysis, and calculations of Cohen d for effect sizes. Results: A total of 57,975 participants were included in the analysis, with 31,407 (54.2%) classified as engaged and 26,568 (45.8%) as nonengaged. Engaged participants achieved significantly greater weight loss at each time point. At month 3, engaged participants had a mean weight loss of 9% (95% CI 9% to 9.1%) compared with 5.9% (95% CI 5.9% to 6%) in nonengaged participants (P<.001), representing a mean difference of 3.1 percentage points (95% CI 3.1% to 3.1%). A Cohen d effect size of 0.89 indicated a large effect.
At month 5, engaged participants had a mean weight loss of 11.53% (95% CI 11.5% to 11.6%) compared with 8% (95% CI 7.9% to 8%) in the nonengaged participants (P<.001). A Cohen d effect size of 0.56 indicated a moderate effect. Participants using tirzepatide achieved greater weight loss than those using semaglutide at month 5 (13.9%, 95% CI 13.5% to 14.3% vs 9.5%, 95% CI 9.2% to 9.7%; P<.001). The proportions of engaged participants achieving ≥5%, ≥10%, and ≥15% weight loss were significantly higher than those in the nonengaged group at the corresponding time points from months 3 to 5 (P<.001). Conclusions: Engagement with a digital weight management platform significantly enhances weight loss outcomes among individuals using GLP-1 receptor agonists. The combination of pharmacotherapy and digital behavioral support offers a promising strategy to promote the supported self-care journey of individuals seeking clinically effective obesity management interventions.
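The Cohen d values above can be read against the conventional benchmarks of roughly 0.2 (small), 0.5 (medium), and 0.8 (large). The following minimal sketch computes Cohen d with a pooled SD for two independent groups; the weight loss percentages are hypothetical, and the pooled-SD variant is an assumption, since the abstract does not state which formula was used.

```python
import math
from statistics import mean, variance


def cohen_d(group_a, group_b):
    """Cohen d for two independent groups using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical percentage weight loss values for illustration only.
engaged = [9.5, 12.0, 8.0, 11.0, 10.5, 13.0]
nonengaged = [6.0, 9.0, 5.5, 8.5, 7.0, 10.0]
print(round(cohen_d(engaged, nonengaged), 2))
```

Because d scales the mean difference by the spread of the data, the same difference in percentage points can correspond to different effect sizes at different time points if variability changes.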
- Exploring the Capacity of Large Language Models to Assess the Chronic Pain Experience: Algorithm Development and Validation
Background: Chronic pain, affecting more than 20% of the global population, has an enormous pernicious impact on individuals as well as economic ramifications at both the health and social levels. Accordingly, tools that enhance pain assessment can considerably impact people suffering from pain and society at large. In this context, assessment methods based on individuals’ personal experiences, such as written narratives (WNs), offer relevant insights into understanding pain from a personal perspective. This approach can uncover subjective, intricate, and multifaceted aspects that standardized questionnaires can overlook. However, WNs can be time-consuming for clinicians to evaluate. Therefore, a tool that uses WNs while reducing the time required for their evaluation could have a significantly beneficial impact on people’s pain assessment. Objective: This study is the first evaluation of the potential of applying large language models (LLMs) to assist clinicians in assessing patients’ pain expressed through WNs. Methods: We performed an experiment based on 43 WNs made by people with fibromyalgia and qualitatively evaluated in a prior study. Focusing on pain severity and disability, we prompted GPT-4 (with the temperature parameter set to 0 or 1) to assign scores, along with explanations of those scores, to these WNs. Then, we quantitatively compared GPT-4 scores with experts’ scores of the same narratives, using statistical measures such as Pearson correlations, root mean squared error, the weighted version of the Gwet agreement coefficient, and Krippendorff α. Additionally, 2 experts specialized in chronic pain conducted a qualitative analysis of the scores’ explanations to assess their accuracy and the potential applicability of GPT’s analysis for future pain narrative evaluations. Results: Our analysis reveals that GPT-4’s performance in assessing pain narratives yielded promising results.
GPT-4 was comparable in terms of agreement with experts (with a weighted percentage agreement higher than 0.95), correlations with standardized measurements (for example, in the range of 0.43 to 0.49 between the Revised Fibromyalgia Impact Questionnaire and GPT-4 with temperature 1), and low error rates (root mean squared error of 1.20 for severity and 1.44 for disability). Moreover, experts generally deemed the ratings provided by GPT-4, as well as the scores’ explanations, to be adequate. However, we observe that GPT-4 has a slight tendency to overestimate pain severity and disability, with a lower SD than expert estimates. Conclusions: These findings underline the potential of LLMs in facilitating the assessment of WNs of people with fibromyalgia, offering a novel approach to understanding and evaluating patient pain experiences. Integrating automated assessments through LLMs presents opportunities for streamlining and enhancing the assessment process, paving the way for improved patient care and tailored interventions in the chronic pain management field.
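Two of the comparison measures named above, root mean squared error and the Pearson correlation, can be sketched as follows. The expert and model scores are hypothetical values on an assumed 0-10 scale, and the function names are illustrative; this is not the authors' analysis code.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equal-length score lists."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical pain severity scores for four narratives (illustration only).
expert_scores = [2, 5, 7, 8]
model_scores = [3, 5, 8, 9]

print(round(rmse(model_scores, expert_scores), 2))
print(round(pearson_r(model_scores, expert_scores), 2))
```

The two measures are complementary: in this toy example the model tracks the experts' ordering closely (high correlation) while still scoring slightly higher on average, which shows up in the error rather than in the correlation.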