Hi! I would like to ask you something about the splitting of text into sentences during the annotation phase.
I thought that the sentences were split by considering dots at the end of them, but it is not always the case. Sometimes sentence separators are ":" or a term in uppercase.
I would like to ask:
What is the rule for sentence splitting?
Is it possible to set the separator? For instance, split a sentence only when a dot is found.
I’m using the udpipe package in R.
Below is an example text where I find that sentences are separated by an uppercase term:
library(udpipe)
model <- udpipe_download_model(language = "english")
txt <- c("No previous
study has investigated the influence of governance and organizational AHCs configurations
on the productivity and scientific impact of AHCs.")
# Annotate the text; sentence boundaries appear in the sentence_id column
df <- udpipe(txt, object = udpipe_load_model(model$file_model))
Thank you!!
Sentence splitting is based on a statistical classification model trained on CoNLL-U data from Universal Dependencies. For each character in the text, it predicts whether a new sentence starts at that position, given the surrounding context. This is why a split can occur at a ":" or before an uppercase term and not only after a dot.
If you want to use another way of splitting, you could use udpipe::strsplit.data.frame or strsplit from base R in order to define your own hardcoded sentence splitting criteria.
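As a rough illustration of the base R route, here is a minimal sketch that splits only after a dot followed by whitespace. It assumes line breaks inside the text are not meaningful and can be collapsed to single spaces. The second sentence in `txt` is added purely for illustration, and the variable names `flat`, `marked`, and `sentences` are hypothetical. A rule this simple will also split after abbreviations such as "e.g.", so treat it as a starting point rather than a finished splitter.

```r
# Example text from the question, with one extra sentence added for illustration
txt <- paste0(
  "No previous\nstudy has investigated the influence of governance and ",
  "organizational AHCs configurations\non the productivity and scientific ",
  "impact of AHCs. A second sentence follows here."
)

# Collapse line breaks and repeated whitespace into single spaces
flat <- gsub("[[:space:]]+", " ", txt)

# Put a newline marker after every dot that is followed by a space,
# then split on that marker so each dot stays attached to its sentence
marked <- gsub("\\. ", ".\n", flat)
sentences <- strsplit(marked, "\n", fixed = TRUE)[[1]]

print(sentences)
```

Each element of `sentences` could then be passed to `udpipe()` as its own document. Note that the model may still decide to split further inside an element, so this hardcoded rule does not override the statistical splitter.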