File descriptors lifetime bound to query not to message #27

Open
andir opened this issue Feb 19, 2020 · 3 comments
andir commented Feb 19, 2020

I am currently trying to iterate through all new mail that matches tag:new and, for each of those messages, apply a set of processing rules to add/remove tags.

While doing that with a large amount of email for the first time, I ran into trouble: the program was running out of file descriptors.

It seems like the notmuch::Query struct dictates the lifetime of those file descriptors. That isn't ideal if you want to iterate through a million mails.

The following code illustrates when it happens:

{
  let query = db.create_query("tag:new").unwrap();
  for message in query.search_messages().unwrap() {
      // So far no file descriptor has been opened; one is only allocated
      // once you try to access the headers (and potentially other fields).

      let list_id = message.header("List-Id"); // this allocates an FD and fails once the FD limit has been reached

  } // the FD should be freed here / after each loop iteration / whenever `message` goes out of scope
}
// instead, they are only freed when `query` goes out of scope
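
For now the only mitigation I can see on the caller side is to keep each Query short-lived and handle the messages in bounded batches, so the descriptors accumulated for one batch are released as soon as that query is dropped. A rough sketch of that pattern (assuming `db` is the same handle as above, and leaving out the retagging needed so that the next batch doesn't return the already-handled messages again):

let batch_size = 500; // keep this well below `ulimit -n`
{
    let query = db.create_query("tag:new").unwrap();
    for message in query.search_messages().unwrap().take(batch_size) {
        let _list_id = message.header("List-Id"); // opens one FD per message
        // ... apply the add/remove-tag rules here ...
    }
} // `query` is dropped here and all FDs opened for this batch are released
// repeat with a fresh query until no messages are left
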
vhdirk (Owner) commented Feb 29, 2020

Could you provide an example of how this would be remedied in C?

andir (Author) commented Mar 2, 2020

I just wrote the following short C program; for each message it opens the file, reads the List-Id header and closes the file again. It also counts the mails found and how many List-Ids it was able to read. I've successfully run this against 100k mails, ~99k of which had a List-Id extracted. ulimit -n shows 1024.

#include <stdio.h>
#include <notmuch.h>

int main(int argc, char* argv[]) {
        notmuch_database_t* database;

        if (argc < 2) {
                printf("usage:\n\t%s <search-term>\n", argc > 0 ? argv[0] : "<bin>");
                return 1;
        }

        notmuch_status_t rc = notmuch_database_open("/home/andi/Maildir", NOTMUCH_DATABASE_MODE_READ_ONLY, &database);

        if (rc != NOTMUCH_STATUS_SUCCESS) {
                return 1;
        }

        notmuch_query_t * query = notmuch_query_create(database, argv[1]);
        notmuch_messages_t* messages;
        notmuch_message_t* message;


        unsigned int count_total = 0;
        unsigned int count_list_id = 0;

        for (rc = notmuch_query_search_messages(query, &messages);
             rc == NOTMUCH_STATUS_SUCCESS &&
             notmuch_messages_valid(messages);
             notmuch_messages_move_to_next(messages))
        {
                message = notmuch_messages_get(messages);

                if (message == NULL)
                        break; // OOM

                count_total++;


                const char* header = notmuch_message_get_header(message, "List-Id");
                if (header != NULL && header[0] != '\0') {
                        printf("list-id: %s\n", header);
                        count_list_id++;
                } else if (header == NULL) {
                        printf("failed to get header\n");
                }

                notmuch_message_destroy(message); // frees the message, closing the mail file it had open
        }

        printf("total: %u\n", count_total);
        printf("list_id: %u\n", count_list_id);


        notmuch_query_destroy(query);

        notmuch_database_close(database);

        return 0;
}
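
In the bindings I would expect the equivalent fix to be calling notmuch_message_destroy from the message wrapper's Drop impl, instead of leaving everything to be freed together with the query. Purely as an illustration (the struct and field names here are made up, not the crate's actual internals):

use std::marker::PhantomData;

// Opaque libnotmuch type and its destructor, as declared in notmuch.h.
#[allow(non_camel_case_types)]
enum notmuch_message_t {}

extern "C" {
    fn notmuch_message_destroy(message: *mut notmuch_message_t);
}

// Illustrative wrapper: still lifetime-bound to the query it came from,
// but it releases its own resources (including the open mail file) as soon
// as it goes out of scope, mirroring the explicit notmuch_message_destroy()
// call in the C program above.
struct Message<'q> {
    ptr: *mut notmuch_message_t,
    _owner: PhantomData<&'q ()>,
}

impl<'q> Drop for Message<'q> {
    fn drop(&mut self) {
        unsafe { notmuch_message_destroy(self.ptr) }
    }
}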

vhdirk (Owner) commented Mar 13, 2020

Thanks, I'll look into this soon. Probably early next week.
