-
-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reducing the number of API calls on large batch translations #198
base: master
Are you sure you want to change the base?
Conversation
…ng against a dictionary of already translated identical sentences
deep_translator/base.py
Outdated
for i, text in enumerate(batch): | ||
translated = self.translate(text, **kwargs) | ||
for _i, text in enumerate(batch): | ||
if text in self._already_translated.keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it .keys or .keys()?
btw I think you can even remove the .keys since it checks keys by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nidhaloff @NorthOC , Just a thought, wouldn't it be nice to use the python builtin lru_cache
system here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiaulislam Great idea, yes it would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it .keys or .keys()? btw I think you can even remove the .keys since it checks keys by default
Removed redundant keys method on dict when iterating.
@nidhaloff @NorthOC , Just a thought, wouldn't it be nice to use the python builtin
lru_cache
system here ?
No idea how caching works :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiaulislam @NorthOC Alright, I think it's ok to merge this.
Maybe @jiaulislam can take a look at caching in a separate PR, if you have time of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiaulislam @NorthOC Alright, I think it's ok to merge this. Maybe @jiaulislam can take a look at caching in a separate PR, if you have time of course.
yes, certainly. I will do it after PR #202
deep_translator/base.py
Outdated
@@ -27,6 +27,7 @@ def __init__( | |||
payload_key: Optional[str] = None, | |||
element_tag: Optional[str] = None, | |||
element_query: Optional[dict] = None, | |||
already_translated: dict[str, str] = {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you need this declaration here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, I ran the deep-translator for days.
In my case, I ran into API errors due to LibreTranslator mirror server resets (to prevent bots from running 24/7).
I had already translated many batches, so instead of doing API calls again, when the server reset, I added the translated batches into already_translated
and saved a shit ton of time (from 5 days of runtime for a full translation to less than 24 hours of runtime).
We're talking like hundreds of API calls per hour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok but is it really needed to pass already_translated in the constructor? wouldn't it be sufficient to hide this from the user and just define it inside the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NorthOC unit tests are failing. I also asked a question about passing already_translated
as a parameter and if this is needed
@nidhaloff Moved |
An API call is made for each item in translate_batch, even if the same item is repeated a hundred of times. This is slow. That is why my pull request aims to fix that:
Even with hundreds of items in a dictionary, checking against its keys is significantly faster than an API call.
The best part is that the already_translated dictionary will be used for the next big batch because it is initialized with the BaseTranslator class. In short, this means that only unique items will require an API call, while repeated items will be translated swiftly.
example: