This repository has been archived by the owner on Sep 18, 2020. It is now read-only.

Caching and stuff #1

Open
alphapapa opened this issue Jan 4, 2020 · 5 comments

Comments

@alphapapa

Hi,

I just stumbled upon this repo. It looks really interesting, and I'm surprised I haven't seen it before. Have you been trying to keep it secret? =)

I see that you use org-ql and ts, which is cool!

I also see that you have made your own caching mechanism. I guess you know that org-ql has two kinds of caches already (actually 3, counting the tags cache, but I've yet to merge that with the node-value cache), so I would love to hear about why you implemented your own. I had the idea recently to factor out org-ql's cache into a separate library, and Nicolas Goaziou has also discussed the idea of having some kind of cache built into Org someday, so it would be great if a single implementation could meet all of our needs.

Also, I see that you have some kind of query language or system, and that you have a Xapian backend. One of the long-term ideas I've had for org-ql, since I was working on helm-org-rifle years ago, is to have an indexed backend for files that aren't already open in Emacs. I have a branch on the org-rifle repo that implements a PoC SQLite index for Org files based on John Kitchin's work, but I haven't worked on it in a long time. Ideally that would be a separate package that could provide an org-ql backend.

Anyway, it looks like you're doing some really interesting work here. If there are any shortcomings of org-ql that could be addressed to meet your needs better, please let me know. Maybe we could collaborate on some solutions that would help everyone.

BTW, a couple of quick tips or ideas from looking at a small bit of your code:

  • ;; TODO: There has to be a better way
    There isn't exactly an officially better way; several packages do things like this. It is Lisp, after all. But there are some packages that provide some solutions, e.g. pcache. There's another one I'm trying to think of--I think it was by Radon Rosborough, so I guess I'm thinking of https://github.com/raxod502/prescient.el (the package's primary purpose is not to provide that functionality, but he wrote some code to do it that he wasn't interested in factoring out into a library).
  • (org-mode)
    You might want to use delay-mode-hooks here.
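For illustration, a minimal sketch of the `delay-mode-hooks` tip, assuming the `(org-mode)` call is there to parse a file in a temporary buffer. The helper name `my/org-file-headings` is hypothetical. `delay-mode-hooks` defers `run-mode-hooks`, so `org-mode-hook` (and whatever users have put on it) is queued instead of run, and in a throwaway buffer it never runs at all.

```elisp
;; Hypothetical helper; only the Org/Emacs functions it calls are real.
(require 'org)
(require 'org-element)

(defun my/org-file-headings (file)
  "Return the raw titles of all headings in FILE, in document order.
FILE is parsed in a temporary buffer; `delay-mode-hooks' keeps the
user's `org-mode-hook' from running there."
  (with-temp-buffer
    (insert-file-contents file)
    ;; Mode hooks are deferred and, since no other mode is enabled
    ;; in this buffer afterwards, never run.
    (delay-mode-hooks (org-mode))
    ;; Parse only down to headline granularity, which is all we need.
    (org-element-map (org-element-parse-buffer 'headline) 'headline
      (lambda (hl) (org-element-property :raw-value hl)))))
```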

Thanks!

@l3kn
Owner

l3kn commented Jan 4, 2020

Hi!

> Have you been trying to keep it secret? =)

For now, the API is too unstable to be usable by anyone besides myself.
I've made a lot of progress in the last few days but there are a few TODO items left before I'd announce it through some (official) channel.

It would be nice to split the project into smaller, more reusable packages and use this repo to provide an example of how these packages could be combined.

> I guess you know that org-ql has two kinds of caches already (actually 3, counting the tags cache, but I've yet to merge that with the node-value cache), so I would love to hear about why you implemented your own.

I like reinventing wheels (NIH syndrome) to get a better understanding of what the challenges and problems are. It's not the most efficient way of writing software but maybe I've come up with some new ideas that are useful for other packages.

> If there are any shortcomings of org-ql that could be addressed to meet your needs better, please let me know.

Thanks! I'll take a look at the source code in the next few days.

I've written my own tabulated-list-based mode for listing headlines.
It would be nice to have some way to pass a list of "headline" objects
to org-ql (I'm not sure what the best representation would be or which fields are needed)
and use the org-ql view code for rendering.

> Maybe we could collaborate on some solutions that would help everyone.

That would be great. The most abstract thing I've come up with yet would be an "org-cache"
that is persisted on disk, takes care of renamed/deleted files, and allows users to register functions that get called with the output of org-element-parse-buffer each time a buffer is saved or loses focus.

This would be useful for another package of mine where I'm currently using awk to quickly search for headlines with a specific property.
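A rough sketch of just the registration part of that "org-cache" idea, assuming saving is the only trigger. Persistence, renamed/deleted files and the loses-focus trigger are left out, and every `my/org-cache-*` name is hypothetical. The point it shows is that each buffer is parsed once with `org-element-parse-buffer` and the same tree is handed to every registered function.

```elisp
;; Hypothetical sketch: the `my/org-cache-' names are not from any package.
(require 'org)
(require 'org-element)

(defvar my/org-cache-functions nil
  "Functions called with (FILE TREE) each time an Org buffer is saved.")

(defvar my/org-cache (make-hash-table :test #'equal)
  "Map from file name to an alist of (FUNCTION . RESULT).")

(defun my/org-cache-register (fn)
  "Register FN so it runs on the parse tree of every saved Org buffer."
  (add-to-list 'my/org-cache-functions fn t))

(defun my/org-cache-update ()
  "Parse the current Org buffer once and cache each registered function's result."
  (when (and buffer-file-name (derived-mode-p 'org-mode))
    (let ((file buffer-file-name)
          (tree (org-element-parse-buffer)))
      (puthash file
               (mapcar (lambda (fn) (cons fn (funcall fn file tree)))
                       my/org-cache-functions)
               my/org-cache))))

(add-hook 'after-save-hook #'my/org-cache-update)
```

Sharing one parse tree keeps the cost of many registered consumers close to the cost of a single `org-element-parse-buffer` call per save.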

@l3kn l3kn closed this as completed Jan 4, 2020
@l3kn l3kn reopened this Jan 4, 2020
@l3kn
Owner

l3kn commented Jan 4, 2020

Regarding Xapian / a sqlite backend:

I've started writing a PoC Emacs module using https://github.com/ubolonton/emacs-module-rs,
https://github.com/tantivy-search/tantivy and https://github.com/PoiScript/orgize, which might be a good solution for building a fast "plain-text database" that uses Org files as its storage format.

For now, keeping the (headline/file) index in memory as Lisp objects, persisting it at regular intervals or on save, and using buffer hashes to update changed files on startup seems like a good compromise.
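A minimal sketch of that compromise, assuming the index only stores headline titles; all `my/index-*` names are hypothetical. `buffer-hash` (Emacs 26+) decides whether a file needs re-parsing, and the index is written with `prin1` and read back with `read`.

```elisp
;; Hypothetical sketch: the `my/index-' names are not from any package.
(require 'org)
(require 'org-element)

(defvar my/index (make-hash-table :test #'equal)
  "Map from file name to (HASH . HEADLINE-TITLES).")

(defun my/index-update (file)
  "Re-parse FILE only if its contents changed since it was indexed.
Return non-nil if FILE was (re-)parsed."
  (with-temp-buffer
    (insert-file-contents file)
    (let ((hash (buffer-hash))
          (entry (gethash file my/index)))
      (unless (equal hash (car entry))
        ;; Parse without running the user's `org-mode-hook'.
        (delay-mode-hooks (org-mode))
        (puthash file
                 (cons hash
                       (org-element-map (org-element-parse-buffer 'headline)
                           'headline
                         (lambda (hl) (org-element-property :raw-value hl))))
                 my/index)
        t))))

(defun my/index-save (path)
  "Write `my/index' to PATH as a readable alist."
  (let (alist)
    (maphash (lambda (file entry) (push (cons file entry) alist)) my/index)
    (with-temp-file path
      (let ((print-length nil) (print-level nil))
        (prin1 alist (current-buffer))))))

(defun my/index-load (path)
  "Replace the contents of `my/index' with the alist stored in PATH."
  (clrhash my/index)
  (dolist (pair (with-temp-buffer
                  (insert-file-contents path)
                  (read (current-buffer))))
    (puthash (car pair) (cdr pair) my/index)))
```

On startup one would call `my/index-load` and then `my/index-update` on each file; only files whose hash changed get parsed again.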

@alphapapa
Author

alphapapa commented Jan 7, 2020

> It would be nice to split the project into smaller, more reusable packages and use this repo to provide an example of how these packages could be combined.

Sounds good.

> I guess you know that org-ql has two kinds of caches already (actually 3, counting the tags cache, but I've yet to merge that with the node-value cache), so I would love to hear about why you implemented your own.

> I like reinventing wheels (NIH syndrome) to get a better understanding of what the challenges and problems are. It's not the most efficient way of writing software but maybe I've come up with some new ideas that are useful for other packages.

Cool. Rereading my quoted comment now, I realize that may have sounded as if I was rudely questioning your decision. I didn't mean it that way, and hopefully you realized that. :) I'm just very interested in what you learned from writing your own cache implementations.

> I've written my own tabulated-list-based mode for listing headlines.
> It would be nice to have some way to pass a list of "headline" objects
> to org-ql (I'm not sure what the best representation would be or which fields are needed) and use the org-ql view code for rendering.

I've thought several times about using tabulated-list-mode for an agenda-like view, but while the column-based sorting would be handy, it wouldn't easily support grouping like that provided by org-super-agenda, so I've not pursued the idea. It's still an interesting idea, though; maybe grouping support could be added...

BTW, I have a prototype of a Magit-like view implementation for org-ql-view: https://github.com/alphapapa/org-ql/tree/wip/view-section I'd like it to eventually support arbitrary grouping similar to magit-todos's recursive grouping, which is built on magit-section. The branch also uses structs for passing Org headings/entries around (the org-ql-item.el file).

> That would be great. The most abstract thing I've come up with yet would be an "org-cache"
> that is persisted on disk, takes care of renamed/deleted files, and allows users to register functions that get called with the output of org-element-parse-buffer each time a buffer is saved or loses focus.

That's an interesting idea. BTW, have you seen Nicolas Goaziou's org-element--cache feature? It's disabled by default, but it's already built into Org. Apparently it needs some more testing and polishing.

> This would be useful for another package of mine where I'm currently using awk to quickly search for headlines with a specific property.

That also sounds like a useful utility!

@l3kn
Owner

l3kn commented Jan 27, 2020

> Cool. Rereading my quoted comment now, I realize that may have sounded as if I was rudely
> questioning your decision. I didn't mean it that way, and hopefully you realized that. :) I'm just very
> interested in what you learned from writing your own cache implementations.

No worries, I didn't read that as a critique and, considering the time I spent on it vs. how useful it has been so far, it's a decision that deserves to be questioned ;)

After looking through the code for org-ql and org-agenda, I found one big problem with caching/persisting data derived from org-element and using it with org-ql or org-agenda:

Both org-ql (through its use of the org-agenda-set-... functions) and org-agenda use markers to jump to a headline and make edits, and markers can't be persisted to disk.

The workarounds I've come up with so far:

  1. Adding IDs to every TODO heading and writing a custom version of the org-agenda-set-... functions that uses this ID to jump to the heading (probably the best solution)
  2. Not persisting the cached data to disk, which, for my collection of Org files, adds more than 15 s to Emacs startup time
  3. Creating markers when the cache is loaded from disk by visiting each file (seems very hacky)
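To make workaround 1 more concrete, a minimal sketch, assuming the cache stores Org IDs instead of markers. `my/add-todo-ids` and `my/marker-for-id` are hypothetical names; `org-map-entries`, `org-get-todo-state`, `org-id-get-create` and `org-id-find` are existing Org functions. A fresh marker is only created at the moment an entry is actually needed, so nothing unpersistable has to live in the cache.

```elisp
;; Hypothetical sketch of workaround 1: persist IDs, not markers.
(require 'org)
(require 'org-id)

(defun my/add-todo-ids (file)
  "Give every TODO heading in FILE an ID, save FILE, and return the IDs.
Unlike markers, these ID strings can be written to disk."
  (with-current-buffer (find-file-noselect file)
    (prog1 (delq nil (org-map-entries
                      (lambda () (when (org-get-todo-state)
                                   (org-id-get-create)))
                      nil 'file))
      (save-buffer))))

(defun my/marker-for-id (id)
  "Return a new marker pointing at the heading with ID, or nil if not found."
  ;; With a non-nil second argument, `org-id-find' returns a marker
  ;; instead of a (FILE . POSITION) cons.
  (org-id-find id 'markerp))
```

A custom replacement for one of the org-agenda-set-... functions could then call `my/marker-for-id` on the cached ID and operate at that marker, which is the part org-ql and org-agenda currently get from live markers.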

For now, I'm using the cache only to generate a list of a few interesting files
(ones with active projects and timestamps), then using org-ql and org-agenda for the rest.

@l3kn
Owner

l3kn commented Feb 8, 2020

For anyone interested, I've uploaded a first version of this cache to https://github.com/l3kn/org-el-cache.
