Nutan B., our Vice President for Pharma Consulting, is excited about implementing Gen AI in pharma operations. With over 15 years of experience handling pharma data, Nutan has led multiple projects deploying advanced analytics and Gen AI solutions.
His recent interactions with leaders from the pharma industry raised a few questions about NLP’s impact on clinical reporting, innovation in drug development, overcoming pitfalls in pharma innovation, and more.
How is Gen AI Utilized in the Pharmaceutical Industry, and What Capabilities Does it Offer?
Nutan: Having worked firsthand with these applications, I can say that Generative AI has been revolutionary within the pharmaceutical and life sciences industry, much as it has been in many other sectors. In an industry marked by stringent regulations, adoption is still in its nascent stages, yet Generative AI holds immense potential to transform pharmaceutical R&D, clinical trials, patient engagement, and numerous other crucial areas.
- Innovation in Product and Service Development: One of the key areas where Generative AI has showcased its potential is the creation of novel products and services. It can be harnessed to develop new drugs, generate innovative medical device technologies, and explore previously unknown diagnostic possibilities, thereby creating new opportunities for intervention and cure.
- Operational Efficiency and Administrative Streamlining: This technology has wide-ranging applicability: it can automate processes, conduct analyses, and facilitate document creation, essentially serving as a valuable co-pilot across operations.
- Personalized Experiences and Enhanced Customer Satisfaction: While this is an area that has gained traction across industries for optimizing commercial content creation, it holds particular value within the life sciences sector. Here, the focus is on personalizing approaches to patient treatments and tailoring marketing campaigns to provide personalized medical care, ultimately enhancing the overall patient experience.
How are Researchers Addressing Ethical Concerns Related to the Use of GenAI?
Nutan: In highly regulated industries, adopting analytics as a capability has always been challenging. The reason is that analytics relies on probabilities, which is quite different from the deterministic approaches that these industries are accustomed to. With the introduction of generative AI, the situation becomes even more complex.
However, it’s crucial to start thinking about the implications of this technology early, because in some cases generative AI itself offers solutions to the challenges it raises.
There are key dimensions to consider in this context:
- Data Privacy and Security: This is a hot topic in the research community, and solutions are emerging from a technological and infrastructural standpoint. For instance, Microsoft is enabling the creation of enterprise-level tenants for GPT systems, which helps organizations maintain better control over their data. Generative AI also plays a role in addressing this challenge by generating synthetic data and anonymizing data for downstream use.
- Bias and Fairness: While this discussion is still in its early stages, it’s extremely important, especially in the life sciences industry. Unaddressed biases can lead to significant disparities in disease diagnosis or drug recommendations. Organizations like the Partnership on Ethical AI are actively working on ways to mitigate the impact of bias. However, it’s up to the individual organizations and teams working with generative AI to build strategies for ensuring fairness into their solutions.
- Intellectual Property and Copyright, Informed Consent: Legal precedents are still being established in these areas. In clinical and healthcare settings, for example, it is critical to establish when and how patients are informed about the role of technology in their treatment, clinical trials, or diagnostic recommendations.
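To make the synthetic-data idea concrete, here is a deliberately minimal Python sketch, assuming a small list of tabular records. It samples each column independently from the values observed in the real data. This naive approach is a swapped-in illustration, not a recommended generator: it preserves per-column distributions but discards correlations between columns, can still reproduce a real row by chance, and carries no formal privacy guarantee. The function name and the example records are hypothetical.

```python
import random

def synthesize_independent_marginals(records, n, seed=0):
    """Generate n synthetic records by sampling each column independently
    from the values observed in `records` (a hypothetical helper).

    Per-column distributions are kept; cross-column correlations are not,
    and a real row can still be reproduced by chance.
    """
    rng = random.Random(seed)
    columns = list(records[0].keys())
    observed = {c: [r[c] for r in records] for c in columns}
    return [{c: rng.choice(observed[c]) for c in columns} for _ in range(n)]

# Hypothetical example records.
real = [
    {"age_band": "40-49", "sex": "F", "condition": "T2D"},
    {"age_band": "50-59", "sex": "M", "condition": "HTN"},
    {"age_band": "60-69", "sex": "F", "condition": "T2D"},
]
synthetic = synthesize_independent_marginals(real, n=5)
for row in synthetic:
    print(row)
```

Production-grade generators model the joint distribution and are evaluated for both utility and privacy leakage; this sketch only shows the basic idea.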
It’s very important for organizations to develop comprehensive ethical guidelines and engage with the scientific and technology community to address these implications.
How can Inherent Biases Negatively Affect Clinical Trials or Drug Development?
Nutan: Inherent biases pose a big threat to the integrity of clinical trials and drug development, potentially resulting in severe consequences if left unaddressed.
The root of this issue lies in skewed data, which has been well-documented in various studies, particularly with regard to the underrepresentation of specific demographics and gender in datasets related to conditions such as cancer and heart disease.
This bias is passed on to Generative AI and its applications if not carefully managed.
Bias can enter at multiple stages of a clinical trial, from patient recruitment and protocol design to data analysis. A lack of diversity among trial participants can skew our understanding of how drugs affect different ethnic groups, genders, and demographics, ultimately undermining the applicability of the trial’s results.
Publication and reporting biases further complicate the problem. It’s widely acknowledged that research incentives tend to favor the publication of positive outcomes over negative ones, painting an incomplete picture of the overall efficacy and safety of drugs.
To mitigate these biases, a combination of expert-designed solutions, sampling techniques, synthetic data, and data augmentation can be applied.
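As one example of the sampling techniques mentioned above, the minimal Python sketch below applies simple random oversampling: records from underrepresented groups are sampled with replacement until each group matches the size of the largest one. The record layout and the `sex` attribute are assumptions for illustration. Oversampling only rebalances data that already exists; it cannot add information about patients who were never enrolled, which is why process-level measures still matter.

```python
import random
from collections import Counter, defaultdict

def oversample_to_balance(records, group_key, seed=0):
    """Randomly duplicate records from smaller groups until every group
    is as large as the largest group (simple random oversampling)."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw extra copies, with replacement, from the smaller group.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical, skewed cohort: 8 male and 2 female records.
records = (
    [{"id": i, "sex": "M"} for i in range(8)]
    + [{"id": 100 + i, "sex": "F"} for i in range(2)]
)
balanced = oversample_to_balance(records, "sex")
print(Counter(r["sex"] for r in balanced))
```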
Certain biases are deeply ingrained in the data itself, necessitating a more balanced approach from a process standpoint.
Implementing measures such as data safety monitoring boards and even straightforward practices like mandating the publication of all clinical trial results can contribute significantly to creating more equitable and unbiased outcomes that prioritize the well-being of patients.
Can GenAI Detect Biases Related to Race, Gender, etc., and if not, What Manual Methods are Used to Manage this Issue? What do these Methods Entail?
Nutan: This is a very important question. Generative models, like any other tools, cannot inherently detect biases on their own. Ultimately, it comes down to the researchers, scientists, and developers involved to create processes and solutions that mitigate these biases.
As we discussed earlier, generative AI primarily learns from the data it’s trained on. If the data itself contains biases, there’s a high likelihood that the AI will replicate these biases in its outputs.
To tackle these, various strategies need to be used:
- Testing and Evaluation Techniques to Identify Biases: There are multiple datasets designed to evaluate fairness across attributes like race, gender, and age. By leveraging these datasets, we can assess AI solutions for biases and pinpoint areas for improvement.
- Solution Design: This is a crucial stage where we can apply techniques such as under-sampling, data augmentation, and testing on diverse subgroups to quantify biases. Solutions can also embrace a “human-in-the-loop” approach, which incorporates a review and audit process to ensure the outcomes align with defined guidelines. It is also essential to continuously monitor content generated by generative AI; establishing feedback loops allows us to make ongoing improvements and address biases.
Finally, defining endpoints and processes is extremely crucial, too. Multi-disciplinary teams help recognize potential blind spots early in the process. There could also be review boards that can establish guidelines and governance mechanisms for assessing the outputs.
All of these processes can be time-consuming, add overhead, and dampen some of the excitement of implementing generative applications, but they are essential for generative AI to be used sustainably in the future.
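The subgroup testing described above can start with something as simple as comparing a model’s behavior across groups. The Python sketch below computes, per group, the selection rate (share of positive predictions) and accuracy, plus the demographic parity gap (largest minus smallest selection rate). The labels, predictions, and the 0.1 review threshold are hypothetical; which fairness metric and tolerance are appropriate depends on the use case.

```python
from collections import defaultdict

def rates_by_group(groups, y_true, y_pred):
    """Per-group selection rate and accuracy for binary (0/1) predictions."""
    stats = defaultdict(lambda: {"n": 0, "pos": 0, "correct": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        s = stats[g]
        s["n"] += 1
        s["pos"] += p
        s["correct"] += int(t == p)
    return {
        g: {"selection_rate": s["pos"] / s["n"], "accuracy": s["correct"] / s["n"]}
        for g, s in stats.items()
    }

def demographic_parity_gap(report):
    """Largest difference in selection rate between any two groups."""
    rates = [r["selection_rate"] for r in report.values()]
    return max(rates) - min(rates)

groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

report = rates_by_group(groups, y_true, y_pred)
gap = demographic_parity_gap(report)
print(report)
print(f"demographic parity gap: {gap:.2f}")  # 0.75 - 0.25 = 0.50
if gap > 0.1:  # hypothetical review threshold
    print("flag for human review")
```

Both groups have the same accuracy (0.75) in this example while their selection rates differ, which illustrates why a single aggregate metric can hide bias.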
How can AI Algorithms be Better Trained to Avoid these Pitfalls?
Nutan: GPT-4 is reported to have on the order of a trillion parameters, although OpenAI has not officially disclosed the figure.
Retraining models at this scale is practically out of reach for individual healthcare and pharmaceutical organizations, which is one reason these models serve as foundation models: organizations build on them rather than train them from scratch.
So if these models learn from biased data, correcting them requires a meticulous approach and a deliberately defined set of processes to mitigate those biases.
There are fine-tuning techniques leveraging representative training data prioritizing diversity in factors like race, gender, age, and geography. These can correct the biases to a certain extent.
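One common way to make fine-tuning data represent groups more evenly, without collecting new data, is to reweight examples. The Python sketch below assigns each example a weight inversely proportional to its group’s frequency, so every group contributes the same total weight. Reweighting is a swapped-in illustration rather than a technique the interview specifies, the age bands are hypothetical, and how the weights are consumed (for example, as per-example loss weights) depends on the training framework.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Weight each example by n / (k * count[group]), where n is the number of
    examples and k the number of groups. Every group then carries the same
    total weight (n / k), and the weights average to 1."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Hypothetical, skewed age distribution in a fine-tuning set.
groups = ["18-40"] * 6 + ["41-65"] * 3 + ["65+"] * 1
weights = inverse_frequency_weights(groups)

totals = Counter()
for g, w in zip(groups, weights):
    totals[g] += w
print({g: round(t, 3) for g, t in totals.items()})  # each group totals 10/3 ≈ 3.333
```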
Solution approaches like those discussed in earlier answers, such as data augmentation and human-in-the-loop designs that gather feedback from experts, along with techniques like adversarial training, can also help address this challenge.
Explainability and interpretability also play crucial roles, both in the adoption of the technology and in addressing the bias problem.
Overarching frameworks from regulatory bodies like the FDA or WHO, and from organizations themselves, can serve as a roadmap for developing these algorithms in ways that prioritize ethical principles, fairness, and transparency.
How does NLP Technology Differentiate Between Patient-Specific Data, Company Confidential Information, and Generic Trial Data to Ensure Full Anonymity and Protect Proprietary Details?
Nutan: NLP technology has made remarkable progress in recent years and, when strategically applied, can effectively differentiate and protect patient-specific or company data. This is crucial not only for regulatory compliance but also for maintaining trust in the life sciences industry.
Let’s break down the process into two parts – what and how to anonymize.
Identification of What to Anonymize:
- Entity Recognition: NLP techniques like entity recognition categorize text into specific entities such as names, locations, and more. This aids in identifying patient-specific Personally Identifiable Information (PII) or organizational names.
- Relationship Extraction: Another valuable technique is relationship extraction, which identifies relationships between identified entities. This is particularly useful when dealing with multiple patients’ data and the need to attribute PII to specific individuals. It’s also helpful when patient data is referenced across different contexts in a document.
- Contextual Classification: Contextual training and classification techniques help understand the context of the text and classify it as sensitive or generic. They can even identify the specific contexts in which sensitive information is mentioned, for instance distinguishing generic demographic information from sensitive medical details in a clinical trial summary.
- Entity Linking: Entity linking techniques connect identified entities to real-world entities like people, places, or organizations. This ensures accurate identification and anonymization.
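The identification step can be illustrated with a minimal Python sketch. A trained NER model would normally do the detection; here a few regular expressions stand in for it, and the identifier formats (including the `SUBJ-####` subject ID) are hypothetical. The sketch also shows the linking idea in its simplest form: every occurrence of the same identifier maps to the same placeholder, so one patient’s mentions remain connected after redaction without revealing who they are.

```python
import re

# Hypothetical patterns for illustration; a production system would rely on a
# trained NER model rather than hand-written regular expressions.
PATTERNS = {
    "SUBJECT_ID": re.compile(r"\bSUBJ-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def pseudonymize(text, mapping=None):
    """Replace detected identifiers with typed placeholders. The same surface
    string always maps to the same placeholder, so repeated mentions of one
    patient stay linked after anonymization."""
    mapping = {} if mapping is None else mapping
    for label, pattern in PATTERNS.items():
        def replace(match, label=label):
            value = match.group(0)
            if value not in mapping:
                # Number placeholders per entity type: [SUBJECT_ID_1], [SUBJECT_ID_2], ...
                index = sum(p.startswith(f"[{label}_") for p in mapping.values()) + 1
                mapping[value] = f"[{label}_{index}]"
            return mapping[value]
        text = pattern.sub(replace, text)
    return text, mapping

note = ("SUBJ-0042 reported nausea on 2023-03-14. Contact: jdoe@example.org. "
        "SUBJ-0042 recovered; SUBJ-0107 did not.")
redacted, mapping = pseudonymize(note)
print(redacted)
```

Passing the same `mapping` across documents keeps placeholders consistent across a whole submission; the mapping itself then becomes sensitive and must be stored securely or discarded.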
Approaches for How to Anonymize:
- Differential Privacy: Widely used in the tech industry and adopted by the US Census Bureau for the 2020 Census, differential privacy adds calibrated statistical noise to obscure individual details while still allowing data to be used for downstream analysis and insights.
- Anonymization Algorithms: Popular algorithms like k-anonymity, l-diversity, and t-closeness can be applied to anonymize data effectively while preserving its utility.
- Customized Risk Algorithms: Organizations can develop customized risk algorithms to quantify the risk of re-identification. This is especially important for compliance with regulations like those from the EMA and Health Canada. Statistical techniques can estimate the risk, and then NLP techniques can iteratively optimize and anonymize documents to meet an organization’s or individual’s risk tolerance.
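The risk-based approach can be sketched in Python as follows, assuming tabular records with a small set of quasi-identifiers (here age and sex). Records sharing the same quasi-identifier values form an equivalence class; k-anonymity is the size of the smallest class, and under a simple "prosecutor" risk model a record in a class of size f has a re-identification risk of 1/f. The loop mirrors the iterative idea described above by coarsening age into wider bands until the worst-case risk falls within the tolerance. The data, band widths, and the 0.5 threshold are illustrative only; real thresholds are set by the organization and the applicable regulator.

```python
from collections import Counter

def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def k_anonymity(records, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    return min(equivalence_classes(records, quasi_identifiers).values())

def max_reidentification_risk(records, quasi_identifiers):
    """Prosecutor-style worst case: a record in a class of size f has risk 1/f."""
    return 1 / k_anonymity(records, quasi_identifiers)

def generalize_age(records, band):
    """Replace exact ages with bands of the given width, e.g. 34 -> '30-39' for band=10."""
    out = []
    for r in records:
        low = (r["age"] // band) * band
        out.append({**r, "age": f"{low}-{low + band - 1}"})
    return out

def anonymize_to_threshold(records, quasi_identifiers, threshold, bands=(5, 10, 20)):
    """Try progressively wider age bands until the worst-case risk <= threshold."""
    for band in bands:
        candidate = generalize_age(records, band)
        risk = max_reidentification_risk(candidate, quasi_identifiers)
        if risk <= threshold:
            return candidate, band, risk
    raise ValueError("threshold not met: generalize other fields or suppress records")

QIS = ("age", "sex")
patients = [
    {"age": 34, "sex": "F"}, {"age": 36, "sex": "F"},
    {"age": 41, "sex": "M"}, {"age": 47, "sex": "M"},
    {"age": 52, "sex": "F"}, {"age": 58, "sex": "F"},
]
print("original k:", k_anonymity(patients, QIS))  # every record is unique: k = 1
released, band, risk = anonymize_to_threshold(patients, QIS, threshold=0.5)
print(f"band={band} years, max risk={risk}")     # band=10 years, max risk=0.5
```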
Leveraging these techniques while customizing them to a company’s internal documents and fine-tuning them on proprietary corpora can provide a resilient and reliable approach to ensuring full anonymity and protection.
It’s important to note that this pertains to the analytics context alone.
To ensure comprehensive data protection, other elements such as access controls, encryption, and best practices should also be implemented in conjunction with NLP-based anonymization techniques.
These measures collectively safeguard sensitive information and maintain the trust essential in the life sciences industry.
Given the Global Nature of Clinical Trials, How does NLP Technology Adapt to Comply with Diverse International Policies and Data Protection Laws?
Nutan: In addressing the question of how NLP technology facilitates compliance with international data protection regulations, we can divide the discussion into two parts: the “what” and the “how.”
Identification of What to Anonymize:
NLP technology is highly adaptable to multilingual setups, allowing it to process documents and text in many languages natively. Prominent offerings such as Azure’s and Google’s language services, as well as multilingual BERT-based models, handle over 100 languages, and GPT-style large language models are versatile across many languages as well.
Furthermore, NLP models can process geographical information, including organizational details and geographical identifiers such as city names and street addresses. This adaptability enables the identification of sensitive information across diverse linguistic and geographic contexts.
While NLP technology can identify what needs to be anonymized, ensuring fairness and unbiased outcomes requires meticulous model training, data preparation, and rigorous testing procedures.
Approaches for How to Anonymize:
The “how” of anonymization is heavily influenced by regional and country-specific regulations.
As discussed earlier, techniques like differential privacy, anonymization algorithms (e.g., k-anonymity, l-diversity), and customized risk algorithms remain relevant. However, they must be customized and configured to align with the applicable requirements, such as those of the EMA in the EU and HIPAA in the US.
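Configuring differential privacy largely comes down to choosing the privacy budget epsilon: smaller values add more noise and give stronger protection. The Python sketch below shows the classic Laplace mechanism for a count query, where adding or removing one patient changes the count by at most 1 (sensitivity 1), so noise is drawn with scale sensitivity/epsilon. The count and epsilon values are hypothetical, and which epsilon satisfies a given jurisdiction is a policy decision outside this sketch. It is for illustration only: Python’s `random` module is not cryptographically secure, and naive floating-point noise sampling has known weaknesses, so production releases should use a vetted differential privacy library.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale): the difference of two independent
    exponential variables with mean `scale` is Laplace-distributed."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(7)
true_count = 120  # hypothetical number of patients with a given adverse event
for epsilon in (0.1, 1.0, 5.0):
    print(epsilon, round(dp_count(true_count, epsilon, rng), 1))

# The expected absolute error equals the noise scale (sensitivity / epsilon).
errors = {
    eps: sum(abs(dp_count(true_count, eps, rng) - true_count) for _ in range(20000)) / 20000
    for eps in (0.5, 2.0)
}
print(errors)  # roughly {0.5: 2.0, 2.0: 0.5}
```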
In a multi-country environment, data processing, storage, and best practices must also adapt to comply with various regulatory frameworks. For instance, the European Union’s General Data Protection Regulation (GDPR) imposes stringent rules on data privacy, dictating where data can be processed and stored.
NLP can also play indirect roles in supporting compliance with international data protection:
- Compliance Reporting: NLP can generate reports on compliance, highlighting areas where improvement is needed. It serves as a valuable “copilot” in the ongoing effort to maintain regulatory adherence.
- Training and Education: NLP-based tools can help train personnel on international data protection regulations, disseminating knowledge and ensuring that teams are well-versed in compliance requirements.
- Regulatory Monitoring: NLP systems can be programmed to monitor and adapt to changes in international regulations. By staying up to date with amendments, organizations can proactively adjust their data protection practices to remain in compliance.
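As a minimal illustration of regulatory monitoring, the Python sketch below uses the standard library’s `difflib` to flag clauses that were added or removed between two versions of a policy text. A plain text diff stands in here for the NLP layer; in practice, flagged clauses could then be routed to a language model or a compliance expert for interpretation. Both policy snippets are hypothetical.

```python
import difflib

def changed_clauses(old_text, new_text):
    """Return (added, removed) clauses between two versions of a text,
    treating each line as one clause."""
    added, removed = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are ndiff hints.
    return added, removed

# Hypothetical policy versions.
old_policy = """Personal data must be processed within the region.
Records are retained for 5 years.
Subjects may request deletion."""
new_policy = """Personal data must be processed within the region.
Records are retained for 10 years.
Subjects may request deletion.
Breaches must be reported within 72 hours."""

added, removed = changed_clauses(old_policy, new_policy)
print("added:", added)
print("removed:", removed)
```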
In summary, NLP technology’s adaptability to multilingual and geographic contexts makes it a powerful tool for identifying sensitive information. However, the “how” of anonymization must be meticulously customized to align with country-specific regulations.
How has NLP Revolutionized the Generation of Clinical Study Reports and the Portrayal of Patient Narratives, Ensuring Accuracy and Comprehensiveness without Compromising Sensitive Information?
Nutan: Natural Language Processing (NLP) has enabled a transformative shift in clinical study report generation, particularly with the advent of large language models like GPT.
This transformation, which began earlier this year, has significantly enhanced the accuracy and comprehensiveness of clinical reports while preserving the confidentiality of sensitive information.
One of the primary advancements enabled by NLP models like GPT is their ability to extract and process data from various structured and unstructured sources. This includes medical reports, patient profiles, trial records, and adverse event reports. The extracted information can then be used to construct coherent natural language reports, which can be further personalized to highlight the treatment’s impact on specific patient groups.
A notable feature of this process is its adaptability. Updates to these reports, based on new information, can be largely automated. New reports can be generated, and information can be anonymized using techniques discussed earlier.
These reports, generated through NLP, often serve as mature drafts. They can then be reviewed by domain experts for corrections and validation before finalization. This hybrid “human-in-the-loop” approach brings efficiency, time savings, and comprehensiveness to the report generation process while simultaneously reducing manual effort and minimizing the risk of human error.
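The draft-then-review workflow can be sketched in Python as a small state machine: generated narratives start as drafts and cannot be finalized until a domain expert approves them, optionally with corrections. A fixed template stands in for the language-model generation step, and the record fields, class, and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class NarrativeDraft:
    text: str
    status: str = "DRAFT"  # moves to APPROVED only through expert review
    review_notes: list = field(default_factory=list)

    def approve(self, reviewer, corrected_text=None):
        """A domain expert signs off, optionally replacing the generated text."""
        if corrected_text is not None:
            self.text = corrected_text
        self.review_notes.append(f"approved by {reviewer}")
        self.status = "APPROVED"

def draft_narrative(event):
    """Turn a structured adverse-event record into a first-draft sentence.
    The template stands in for the language-model generation step."""
    return NarrativeDraft(
        f"Subject {event['subject']} experienced {event['term']} "
        f"(grade {event['grade']}) on study day {event['day']}; "
        f"outcome: {event['outcome']}."
    )

def finalize(draft):
    """Refuse to release any narrative that has not been reviewed."""
    if draft.status != "APPROVED":
        raise PermissionError("narrative must be reviewed before finalization")
    return draft.text

# Hypothetical, already-pseudonymized adverse-event record.
event = {"subject": "[SUBJECT_ID_1]", "term": "nausea", "grade": 2,
         "day": 15, "outcome": "resolved"}
draft = draft_narrative(event)
print(draft.status, "|", draft.text)
draft.approve("medical reviewer")
print(finalize(draft))
```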
Furthermore, NLP technologies can be continuously refined through learning and improvement. They can assimilate user feedback to adapt over time, resulting in more accurate and informative clinical documentation.