Speed up the detection process #20
Comments
Hi @sasi143, thanks for your interest in spaczz. I am very interested in improving the speed of the fuzzy matching process. However, for the reasons I'll outline below, I unfortunately do not think this will happen in the near future without additional contributor(s).

I believe the performance bottleneck(s) in spaczz's fuzzy matching come from the amount of time the code spends in pure Python, iterating through and processing text and potential matches. I do not believe the fuzzy comparisons themselves are a bottleneck, because they are done with RapidFuzz, which is already written in C++. I do need to do some profiling to confirm this, though.

The pattern that spaCy proper uses to achieve its speed is dropping most of its internal text processing code down to C, and that is the pattern I would eventually like to follow. Unfortunately, I have almost no experience with C/C++, so without help from additional contributor(s) with C/C++ experience, my progress will be quite slow. That being said, I do intend to work on it myself; I just can't make any promises about timelines.

Also, while I think incorporating GPU support is an interesting idea, to be honest I have even less of an idea about how to implement that than I do about dropping portions of the code down to C/C++. In the shorter term, here are some possible actions:
I'm sorry I can't give you a more definite solution or timeline right now. Hopefully, as people continue to discover and use spaczz, some more experienced programmers may become interested in contributing. As things stand, I will slowly be working on these speed improvements myself. I'll use this issue as a place to keep track of these updates as they come.
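For anyone who wants to test the bottleneck hypothesis above before any code is moved to C/C++, here is a minimal profiling sketch using only the standard library. It is not spaczz's actual code. `scan`, `score`, and `profile_scan` are hypothetical names made up for illustration, and `difflib.SequenceMatcher` is only a stand-in for the RapidFuzz comparison (it is much slower than RapidFuzz, so the split between "loop" and "comparison" time will look different in the real library). The point is just the method: wrap the search in `cProfile` and read where the cumulative time goes.

```python
import cProfile
import difflib
import io
import pstats
from typing import List, Tuple


def score(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 100]. Assumption: replaces a RapidFuzz call."""
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


def scan(
    tokens: List[str], pattern: List[str], min_score: float = 80.0
) -> List[Tuple[int, int, float]]:
    """Slide a pattern-sized window over the tokens in pure Python (toy example)."""
    width = len(pattern)
    target = " ".join(pattern)
    matches = []
    for start in range(len(tokens) - width + 1):
        candidate = " ".join(tokens[start:start + width])
        s = score(candidate, target)
        if s >= min_score:
            matches.append((start, start + width, s))
    return matches


def profile_scan(tokens: List[str], patterns: List[List[str]]):
    """Run scan() for every pattern under cProfile and return (results, report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    results = [scan(tokens, p) for p in patterns]
    profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so the most expensive call paths are listed first.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return results, buf.getvalue()


if __name__ == "__main__":
    tokens = ("the quick brown fox jumps over the lazy dog " * 50).split()
    patterns = [["quick", "brown", "fox"], ["lazy", "dgo"]]
    _, report = profile_scan(tokens, patterns)
    print(report)
```

In the report, comparing the `tottime` of `scan` itself (the Python-level looping and string joining) with the time spent inside `score` gives a rough idea of which side dominates. Running the same kind of profile against the real matcher would be needed to draw any conclusion about spaczz.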
@gandersen101 I'm really thankful for your thorough explanation and I appreciate your time. Keep up the good work and stay safe.
Issue #41 has turned into a performance discussion and I am planning to make some performance improvements very soon. I will provide a summary of those changes on this thread soon.
@gandersen101 Thanks for the inspiration. I started a low-level integration of rapidfuzz into spaCy to attempt to improve performance: explosion/spaCy#11359
@kwhumphreys very cool. Best of luck! Obviously I have not put much time into I'm going to add an announcement to the README - essentially I intend to address some issues/ add some functionality with
First of all, I really appreciate your work and time.
With a small number of input patterns it does a good job, but when the number of input patterns exceeds 1 lakh (100,000), it takes too much time. Is there any way to speed up the process (maybe by using a GPU)?