⚠️ I have just open-sourced Buzee, and the documentation is still sparse. If you get stuck, please open an issue and I'll be happy to help! ⚠️
The OCR capabilities in Buzee are built on top of Textra on Mac and WinOCR on Windows. Do check these repos out! I feel the native OCR capabilities of Mac and Windows are really good and should be used more!
Buzee (pronounced boozey) is an eight-year-old labrador retriever who can't play fetch but can love you like no other.
Buzee is also a full-text search application for your life. It helps you find your files, effortlessly.
Download v0.2.0 from the Releases page.
- Fast, full-text search for all your documents, images, audio, video, folders, and browser history.
- Search all local documents and folders by keyword, time, type, or any combination of these.
- Exclude specific files or folders from indexing, or exclude only their content.
- Global shortcut: press `⎇ / Alt + Space` anywhere to show or hide the app. Modify it in the settings.
- In the app, press `⌘ / Ctrl + F` or `⌘ / Ctrl + K` to go to the search bar from any screen.
- In the app, press `⌘ / Ctrl + Shift + S` to go to the scratchpad from any screen.
- View statistics about your files and get your Unique Document Profile.
- Sub-features:
  - Extract text from PDFs and images using OCR.
  - Use the scratchpad to quickly jot down notes.
  - Stay in sync automatically with changes on your filesystem.
  - Lightweight installation package and low memory usage.
  - Supports these default file types:
    - Documents: csv, docx, key, md, numbers, pages, pdf, pptx, txt, xlsx, xls
    - Images: jpg, jpeg, png, gif
    - Books: epub, mobi, azw3, pdf
    - Audio: mp3, wav, aac, flac, ogg
    - Video: mp4, mkv, avi, mov, wmv
- Use the Filetype filter or simply type the file type in the search (like `invoice pdf`).
- Put quotes around keyword(s) to search for the exact phrase (like `"annual report"`).
- Put a hyphen in front of a keyword to exclude it from the search (like `"annual report" -2022 -pdf`).
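The three query rules above (bare file types act as filters, quotes mark exact phrases, a leading hyphen excludes) can be illustrated with a small tokenizer. This is a minimal sketch under stated assumptions: `FILE_TYPES`, `ParsedQuery` and `parseQuery` are hypothetical names, not Buzee's actual code, and the extension list simply mirrors the default file types listed above.

```typescript
// Hypothetical sketch: FILE_TYPES, ParsedQuery and parseQuery are illustrative
// names, not taken from the Buzee codebase.

// Default extensions from the README list. `pdf` appears under both
// Documents and Books, so a Set drops the duplicate.
const FILE_TYPES = new Set<string>([
  "csv", "docx", "key", "md", "numbers", "pages", "pdf", "pptx", "txt", "xlsx", "xls",
  "jpg", "jpeg", "png", "gif",
  "epub", "mobi", "azw3",
  "mp3", "wav", "aac", "flac", "ogg",
  "mp4", "mkv", "avi", "mov", "wmv",
]);

interface ParsedQuery {
  keywords: string[];      // plain terms, e.g. invoice
  phrases: string[];       // exact phrases, e.g. "annual report"
  fileTypes: string[];     // bare extensions used as filters, e.g. pdf
  excludedTerms: string[]; // terms or phrases prefixed with '-'
  excludedTypes: string[]; // extensions prefixed with '-'
}

function parseQuery(input: string): ParsedQuery {
  const q: ParsedQuery = {
    keywords: [], phrases: [], fileTypes: [], excludedTerms: [], excludedTypes: [],
  };
  // Alternative 1: an optional '-' followed by a "quoted phrase".
  // Alternative 2: any run of non-whitespace characters.
  const tokenRe = /(-?)"([^"]*)"|(\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(input)) !== null) {
    if (m[2] !== undefined) {
      // Quoted phrase: never treated as a file type, so "pdf" in quotes is a keyword.
      const phrase = m[2].trim().toLowerCase();
      if (phrase === "") continue;
      if (m[1] === "-") q.excludedTerms.push(phrase);
      else q.phrases.push(phrase);
      continue;
    }
    let word = m[3].toLowerCase();
    const negated = word.startsWith("-") && word.length > 1;
    if (negated) word = word.slice(1);
    if (FILE_TYPES.has(word)) {
      (negated ? q.excludedTypes : q.fileTypes).push(word);
    } else {
      (negated ? q.excludedTerms : q.keywords).push(word);
    }
  }
  return q;
}

// Example: parseQuery('"annual report" -2022 -pdf') yields
// phrases ["annual report"], excludedTerms ["2022"], excludedTypes ["pdf"].
```

One consequence of treating bare extensions as filters is that searching for the literal word "pdf" needs quotes, which matches the quoting rule the README already uses for dates.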
Use the Date Range filter or simply mention the date or time period you are looking for in your search. For example:
- `last month pdf invoice`
- `annual report ppt this year`
- `q2 2023 to q3 2023 retail report xlsx`
- `prelim findings from 2017 to 2022`
- `cv docx from march 2 2020 to aug 15 2020 -pdf`
- `scope study 14/02/2015 to 10/08/2015`

If you want a phrase treated as a search keyword rather than a time period, wrap it in quotes, like this: `invoice "March 2022"`.
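To make the period examples above concrete, here is a minimal sketch that turns an already-isolated period string into a UTC date range. It is an illustration only: `DateRange`, `resolveSingle` and `resolvePeriod` are hypothetical names, Buzee's real parser is not shown in this README and handles more forms (such as `march 2 2020`), and the day-first reading of `14/02/2015` is an assumption suggested by that example.

```typescript
// Hypothetical sketch: DateRange, resolveSingle and resolvePeriod are illustrative
// names. Only a few of the README's period forms are handled here.

interface DateRange {
  start: Date; // inclusive
  end: Date;   // exclusive
}

// Date.UTC rolls over out-of-range months/days, e.g. month -1 is December of the previous year.
const utc = (y: number, monthIndex: number, day = 1): Date => new Date(Date.UTC(y, monthIndex, day));

// Resolve one period expression relative to `now`, all in UTC.
function resolveSingle(text: string, now: Date): DateRange | null {
  const t = text.trim().toLowerCase();
  const y = now.getUTCFullYear();
  const mo = now.getUTCMonth();
  if (t === "this year") return { start: utc(y, 0), end: utc(y + 1, 0) };
  if (t === "last month") return { start: utc(y, mo - 1), end: utc(y, mo) };
  // Quarter, e.g. "q2 2023": Q1 starts in January, Q2 in April, and so on.
  let m = /^q([1-4])\s+(\d{4})$/.exec(t);
  if (m) {
    const q = Number(m[1]);
    const year = Number(m[2]);
    return { start: utc(year, (q - 1) * 3), end: utc(year, q * 3) };
  }
  // Whole year, e.g. "2017".
  m = /^(\d{4})$/.exec(t);
  if (m) {
    const year = Number(m[1]);
    return { start: utc(year, 0), end: utc(year + 1, 0) };
  }
  // Day-first date, e.g. "14/02/2015" (assumption: DD/MM/YYYY).
  m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(t);
  if (m) {
    const day = Number(m[1]);
    const month = Number(m[2]) - 1;
    const year = Number(m[3]);
    return { start: utc(year, month, day), end: utc(year, month, day + 1) };
  }
  return null;
}

// Resolve "A to B" (optionally prefixed with "from") into one range covering both ends.
function resolvePeriod(text: string, now: Date): DateRange | null {
  const t = text.trim().toLowerCase().replace(/^from\s+/, "");
  const parts = t.split(/\s+to\s+/);
  if (parts.length === 2) {
    const a = resolveSingle(parts[0], now);
    const b = resolveSingle(parts[1], now);
    return a && b ? { start: a.start, end: b.end } : null;
  }
  return resolveSingle(t, now);
}
```

For instance, `q2 2023 to q3 2023` resolves to 2023-04-01 (inclusive) through 2023-10-01 (exclusive). Using an exclusive end avoids having to know how many days the last month or quarter has.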
Buzee works best on Mac. You may run into some issues on Windows because I haven't had a chance to test it properly. Linux is entirely untested, so you're on your own there.
- Clone the repository.
- Install Rust and Node.js.
- Run `npm install` in the root directory.
- Run `cargo build` in the `src-tauri` directory to fetch and compile the Rust dependencies.
- Run `npm run tauri dev` in the root directory to run the app in development mode.
- Run `npm run tauri build` in the root directory to build the app for production.
Building on Windows requires a few changes. Follow these steps:
- Remove `drag = { path = "./crates/drag", version = "0.4.0", features = [ "serde" ] }` from `Cargo.toml`.
- Comment out `crate::drag::start_drag,` in `ipc.rs`.
- Comment out `mod drag;` in `main.rs`.
- Finally, replace `binaries/textra` with `binaries/winocr` in `tauri.conf.json`.
Legend:
(~) : partly implemented
(+) : has to be built from scratch
(?) : not sure if it will add great value
- (~) Show matching text for search results by reading from the `body` table.
- (~) Browser history search should support complex queries the way document search does.
- (~) Icon view should load thumbnails in an efficient, non-blocking manner. Thumbnails should show up on the page as they are loaded.
- (~) Enable adding 'comments' to documents.
- (~) Enable pinning documents/folders to the top of search results.
- (~) Allow users to add or remove supported file types.
- (~) Allow users to switch between profiles on Arc and Chrome. (Currently uses the default profile.)
- (~) Test for Linux.
- (~) Speed up parsing of PDFs, images, and XLSX files, especially OCR operations.
- (+) Enable adding 'tags' to documents.
- (+) Create a 'Dashboard' view that shows statistics, pinned documents, and recent searches.
- (+) Add tests to the codebase.
- (?) Record frecency of documents and use it to sort search results.
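The frecency item above is still an open idea, so any scoring scheme is a design choice. One common option is exponential decay: each time a document is opened it contributes a weight that halves every `halfLifeDays`, so documents opened often and recently rank highest. This is a simpler variant than Firefox's bucketed frecency. `DocStats`, `openedAt`, `frecency` and `sortByFrecency` are hypothetical names, and the 14-day half-life is an arbitrary assumption.

```typescript
// Hypothetical sketch: frecency is only on the roadmap, so nothing here reflects
// Buzee code. Each past open adds 0.5^(ageDays / halfLifeDays) to the score.

interface DocStats {
  path: string;
  openedAt: Date[]; // timestamps of past opens (hypothetical field)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function frecency(openedAt: Date[], now: Date, halfLifeDays = 14): number {
  let score = 0;
  for (const t of openedAt) {
    // Clamp future timestamps (e.g. clock skew) to age 0.
    const ageDays = Math.max(0, (now.getTime() - t.getTime()) / MS_PER_DAY);
    score += Math.pow(0.5, ageDays / halfLifeDays);
  }
  return score;
}

// Return a new array sorted by frecency, highest first; the input is not mutated.
function sortByFrecency(docs: DocStats[], now: Date, halfLifeDays = 14): DocStats[] {
  return docs
    .map((d) => ({ d, s: frecency(d.openedAt, now, halfLifeDays) }))
    .sort((a, b) => b.s - a.s)
    .map((x) => x.d);
}
```

With a 14-day half-life, a file opened three times a week ago (score about 2.12) outranks one opened once today (1.0), which in turn outranks one opened three times a month ago (about 0.68). In practice this score would probably be blended with the full-text relevance score rather than replace it.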
Back-end:
- Rust
- Tauri v2
- SQLite
- Tantivy
Front-end:
- Svelte 4 using TypeScript
- shadcn-svelte
- TailwindCSS
See all dependencies in the Cargo.toml and package.json files.
- All file metadata is stored in SQLite in the `document` table. A central `metadata` table stores the metadata from files and, eventually, cloud services, emails, etc.
- A full-text index is created on `metadata` and stored as the `metadata_fts` table.
- Parsed text from documents is stored in the `body` table.
- A full-text index is created in Tantivy at the same time.
- The Firefox, Chrome, and Arc history is searched by querying their respective history databases directly.
- All front-end code is in the `src` directory. All back-end code is in the `src-tauri` directory.
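Searching browser history databases directly means dealing with each browser's timestamp format, which matters for the date-range filters. Chromium-based browsers (Chrome, Arc) store `urls.last_visit_time` as microseconds since 1601-01-01 UTC, while Firefox stores `moz_places.last_visit_date` as microseconds since the Unix epoch. The helpers below are a minimal sketch of the conversion; their names are hypothetical and the README does not say how Buzee itself handles this.

```typescript
// Hypothetical helpers: fromChromiumTime, fromFirefoxTime and toChromiumTime are
// illustrative names. Column semantics follow the browsers' own history schemas.

// Seconds between 1601-01-01T00:00:00Z (Chromium's epoch) and 1970-01-01T00:00:00Z.
const CHROMIUM_EPOCH_OFFSET_SECONDS = 11644473600;

// Chromium (Chrome, Arc): microseconds since 1601-01-01 UTC.
// Values exceed Number.MAX_SAFE_INTEGER, so the last microsecond may round,
// which is harmless at JavaScript's millisecond Date resolution.
function fromChromiumTime(micros: number): Date {
  return new Date(micros / 1000 - CHROMIUM_EPOCH_OFFSET_SECONDS * 1000);
}

// Firefox (PRTime): microseconds since 1970-01-01 UTC.
function fromFirefoxTime(micros: number): Date {
  return new Date(micros / 1000);
}

// Inverse conversion, e.g. for a `last_visit_time >= ?` bound when applying a date filter.
function toChromiumTime(d: Date): number {
  return (d.getTime() + CHROMIUM_EPOCH_OFFSET_SECONDS * 1000) * 1000;
}
```

Converting the filter bounds into each browser's native unit (rather than converting every row) lets SQLite compare raw integers in the `WHERE` clause.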
Read the Vision and Roadmap.
I have spent two years building this project. It started as an Electron app; then I switched to Tauri for performance gains. When I started, I barely knew JavaScript and Svelte. Over the course of development, I learned Node.js, TypeScript, SQLite, Rust, Tauri, Tantivy, and many other technologies. I learned so much about managing a project of this size and complexity. I am proud of what I have built, but I am more proud of what I have learned.
I am now letting go of this project because I have other priorities. Please feel free to do with this project as you wish. I am happy to help you get started with the codebase.
If nothing else, this project can serve as an example of how to build a full-text search engine using Tauri and Tantivy. There are several tiny features and performance workarounds that I have implemented that you might find useful.
If you do do something with this project, please let me know. I would love to see what you build!
License: MIT