Is your feature request related to a problem? Please describe.
I’m working with Presidio in a context where users may input medical and dietary information into a chatbot.
Currently, Presidio does not have built-in support for detecting medical entities such as diseases, medications, and clinical procedures. To fill this gap, I implemented a custom recognizer for medical PII detection.
Below is an example of the custom recognizer I’ve built:
from presidio_analyzer import EntityRecognizer, RecognizerResult
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline


class ClinicalBERTRecognizer(EntityRecognizer):
    def __init__(self):
        # Download the model from Hugging Face's model hub
        model_name = "blaze999/Medical-NER"
        # Load the model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForTokenClassification.from_pretrained(model_name)
        # Create a pipeline for named entity recognition
        self.ner_pipeline = pipeline("ner", model=self.model, tokenizer=self.tokenizer)
        # Define the supported entities
        self.supported_entities = [
            "BIOLOGICAL_ATTRIBUTE",
            "BIOLOGICAL_STRUCTURE",
            "CLINICAL_EVENT",
            "DISEASE_DISORDER",
            "FAMILY_HISTORY",
            "HISTORY",
            "MEDICATION",
            "THERAPEUTIC_PROCEDURE",
        ]
        super().__init__(supported_entities=self.supported_entities)

    def analyze(self, text, entities, nlp_artifacts=None):
        results = []
        # Perform named entity recognition on the input text
        ner_results = self.ner_pipeline(text)
        for entity in ner_results:
            # Strip the B-/I- tagging prefix to get the bare entity type
            entity_type = entity["entity"].replace("B-", "").replace("I-", "")
            # Check if the entity type is in the list of supported entities
            if entity_type in self.supported_entities:
                # Create a RecognizerResult object for the entity
                recognizer_result = RecognizerResult(
                    entity_type=entity_type,
                    start=entity["start"],
                    end=entity["end"],
                    score=entity["score"],
                )
                results.append(recognizer_result)
        return results
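One thing worth noting about the recognizer above: without an aggregation setting, a token-classification pipeline returns one prediction per (sub)token, so a single word such as a drug name can come back as several short results. The sketch below shows one possible way to merge adjacent B-/I- tagged tokens into entity-level spans before building `RecognizerResult` objects. The function name `merge_token_predictions`, the one-character gap tolerance, and the score averaging are all assumptions made for illustration; they are not part of Presidio or of the original recognizer.

```python
from typing import Dict, List


def merge_token_predictions(tokens: List[Dict]) -> List[Dict]:
    """Merge adjacent B-/I- token predictions into entity-level spans.

    Each input dict is expected to have the keys "entity", "start", "end"
    and "score", mirroring the per-token output used in the recognizer.
    """
    spans: List[Dict] = []
    for tok in tokens:
        label = tok["entity"]
        if label == "O":
            continue  # token is outside any entity
        prefix, sep, entity_type = label.partition("-")
        if not sep or prefix not in ("B", "I"):
            # Label has no B-/I- prefix: treat it as the start of a span.
            prefix, entity_type = "B", label
        last = spans[-1] if spans else None
        continues_previous = (
            prefix == "I"
            and last is not None
            and last["entity_type"] == entity_type
            # Assumption: allow at most one character (e.g. a space) between tokens.
            and tok["start"] - last["end"] <= 1
        )
        if continues_previous:
            last["end"] = tok["end"]
            last["scores"].append(tok["score"])
        else:
            spans.append(
                {
                    "entity_type": entity_type,
                    "start": tok["start"],
                    "end": tok["end"],
                    "scores": [tok["score"]],
                }
            )
    # Assumption: report the mean token score as the span score.
    return [
        {
            "entity_type": s["entity_type"],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),
        }
        for s in spans
    ]


# Hand-written token predictions for "takes metformin daily for diabetes".
sample_tokens = [
    {"entity": "B-MEDICATION", "start": 6, "end": 9, "score": 0.9},
    {"entity": "I-MEDICATION", "start": 9, "end": 15, "score": 0.7},
    {"entity": "B-DISEASE_DISORDER", "start": 26, "end": 34, "score": 0.8},
]
merged = merge_token_predictions(sample_tokens)
```

Alternatively, the Hugging Face `pipeline` accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens itself. In that case each result carries an `entity_group` key instead of `entity`, so the `analyze` method would need to read that key.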
Describe the solution you'd like
Would there be interest in incorporating a medical domain recognizer into Presidio? If so, I’d be happy to submit a PR with this implementation or a more generalized version.
The recognizer uses a transformer-based model from Hugging Face to identify clinical entities such as diseases, medications, and procedures. This would allow Presidio to support medical and healthcare use cases out of the box.
Describe alternatives you've considered
Additional context
RKapadia01 changed the title from "Presidio Medical Recognizer" to "Feature Request - Presidio Medical Recognizer" on Nov 28, 2024
Hi, I think that would be a great addition!
It would be valuable for those interested in PHI and not just PII. Having said that, we should also consider computational performance and development complexity. Our suggestion would therefore be to add this to the repo, but not run it by default. In addition, we don't install transformers by default, to reduce development complexity, so the code should first check whether transformers is installed and skip the recognizer if it isn't, so that it doesn't break the rest of the package.
For example:
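A minimal sketch of the kind of optional-dependency check described above, using only the standard library. It is not code from the Presidio repository: the helper names `is_installed` and `create_if_available` are hypothetical, and the factory argument stands in for whatever would construct the medical recognizer.

```python
import importlib.util
import logging
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def create_if_available(
    factory: Callable[[], T], required_package: str = "transformers"
) -> Optional[T]:
    """Call the factory only if the optional dependency is installed.

    Returns None (and logs a warning) when the package is missing, so the
    caller can skip the recognizer without breaking the rest of the package.
    """
    if not is_installed(required_package):
        logger.warning(
            "Skipping %s: optional dependency '%s' is not installed",
            getattr(factory, "__name__", repr(factory)),
            required_package,
        )
        return None
    return factory()
```

With a guard like this, a call such as `create_if_available(ClinicalBERTRecognizer)` would return `None` when transformers is missing. For that to work, the `from transformers import ...` line would have to move inside the recognizer's `__init__` (or an equivalent lazy import), since a module-level import would fail before the check runs.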