Cron job stops after random hours/days #805
Comments
Is the Node process still running when the cron job stops? Is it only the cron part that fails?
Yes, the Node process keeps running.
I have a NestJS codebase running in production for more than 8 months (restarted once in a while for deployments) that runs a lot of cron jobs handling financial transactions, one of them running every second, and I've never encountered this issue. However, I do have more memory on my server, 2 GB I believe, so maybe that's the difference. Have you ever encountered a situation where your container exceeds its maximum memory usage?

Could you try updating your package.json with the following overrides entry?

```json
"overrides": {
  "@nestjs/schedule": {
    "cron": "2.4.4"
  }
}
```

This kind of issue is typically difficult to pinpoint without detailed information from those who have experienced it. Unfortunately, reports of this nature are quite rare and often lack critical details. If you can provide any detail or hint that would help us reproduce the issue, we would appreciate it and do our best to find a solution.
I understand how difficult it is to reproduce these kinds of bugs; I can't even reproduce it myself, and it has happened twice so far. For details: the container running the cron jobs usually sits at around 70% memory and <5% CPU, and there was no usage spike when the bug occurred. Thank you @sheerlox for the suggestion, but I think that blindly upgrading the dependency wouldn't tell us much without understanding the root cause. Again, I'm not even sure how to debug this, so I'm available to give more details if possible.
I took a good look at #232, and it actually contains a lot of information! From what I understand, the issue looks to be with line 191 of the file discussed there. Please add the patch file to your project, then follow the set-up instructions.
Once the issue manifests itself again, you should be able to extract the interesting output; the 15 lines above the failure should give us a pretty good picture of the job's previous runs. If you can get us this debug information, I'll make sure to free up some time to dig deeper into the issue and find a fix. Thank you for re-opening an issue about this to let us know it still happens.
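As a rough illustration of the kind of instrumentation being discussed (a hypothetical sketch, not the actual patch file; names like `createTrackedJob` are mine), the wrapper below logs every run of a cron v3 `CronJob` together with the gap since the previous run, so the last executions can be pulled out of the container logs after the job goes silent:

```ts
import { CronJob } from 'cron';

// Hypothetical stand-in for the debug patch discussed above: wrap the job
// callback so every run leaves a timestamped log line with the gap since the
// previous run. When the job silently stops, the last few lines can be
// grepped out of the container logs.
function createTrackedJob(cronTime: string, onTick: () => void, name = 'job'): CronJob {
  let lastRun: number | null = null;
  return new CronJob(cronTime, () => {
    const now = Date.now();
    const gapMs = lastRun === null ? 'n/a' : String(now - lastRun);
    lastRun = now;
    console.log(`[cron-debug] ${name} fired at ${new Date(now).toISOString()} (gap: ${gapMs} ms)`);
    onTick();
  });
}

// Usage: same schedule as the reporter's every-second job.
const job = createTrackedJob('* * * * * *', () => {
  // real work goes here
}, 'every-second');
job.start();
```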
Thanks for the update, will make sure to add this and get back to you if the issue manifests itself again 🙏.
A little update:

Will that be enough to debug the issue in case it happens?
That makes sense, I didn't consider that when writing the patch! Knowing the exact time the callback was started would be really useful, so instead of my first version, can I suggest saving the exact time the callback was started as well?
Sounds good to me, will add it 👌. Updated patch:
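A minimal sketch of the idea being agreed on here, recording when each callback started and keeping only the last few entries, could look like the following; the names are illustrative and not taken from the actual patch:

```ts
// Keep the start times of the most recent runs so they can be dumped once the
// job is found to have stopped (e.g. from a health-check endpoint).
const runHistory: string[] = [];
const MAX_HISTORY = 15;

export function recordRunStart(): void {
  runHistory.push(new Date().toISOString());
  if (runHistory.length > MAX_HISTORY) {
    runHistory.shift();
  }
}

export function dumpRunHistory(): readonly string[] {
  return [...runHistory];
}

// Inside the cron callback, call recordRunStart() before doing the real work.
```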
Just stopping by to say that it's still happening in version 3.1.7, using Node 14.19.3 (node:14-alpine). I had to switch to node-cron.
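For readers weighing the same workaround, a minimal schedule with the separate node-cron package might look like the sketch below, assuming node-cron's documented `schedule()` API and a six-field expression for second-level granularity:

```ts
import * as cron from 'node-cron';

// Every second, mirroring the schedule discussed in this thread.
const task = cron.schedule('* * * * * *', () => {
  console.log(`[node-cron] tick at ${new Date().toISOString()}`);
});

// node-cron tasks can be stopped and restarted explicitly if needed.
// task.stop();
```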
As I previously said, this issue is hard to reproduce and we're still looking for help from someone encountering it. I have critical production code running this library for more than a year and I've never been able to observe the issue, so we welcome anyone who would like to help us debug it with more information. Please find the patch to apply for the necessary debug information here: #805 (comment).

Regarding the transition to node-cron: note that this issue also seems to exist there.
There's an interesting blog post that might be relevant to this issue: https://lucumr.pocoo.org/2024/6/5/node-timeout/. In it, Armin describes how setTimeout usage in Node can leak memory. I see the same pattern in a couple of places in this library, so the described memory leak may very well apply here.
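As a side note on Node timer behavior that scheduling code can trip over (my own illustration, not taken from the blog post): out-of-range delays passed to setTimeout do not throw, they are clamped, so a miscalculated delay fires almost immediately instead of failing loudly.

```ts
// Node clamps out-of-range setTimeout delays instead of throwing.
setTimeout(() => console.log('negative delay fired'), -1); // delay is treated as 1 ms

setTimeout(() => console.log('overflowing delay fired'), 2_147_483_648);
// Node emits a TimeoutOverflowWarning for the call above and sets the delay to 1 ms,
// so a timeout meant to be ~24.8 days away fires right away.
```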
Hello guys. I would like to add my case, which happened a few days ago. I have a payment gateway application implemented with NestJS, in production, that has a cron job (from NestJS) running to trigger webhooks. The cron job runs every 10 seconds, using the CronExpression from the library, which resolves to */10 * * * * *. The application runs on GCP using Cloud Run. In my implementation, the retry counter increases by one on every execution until the webhook is processed. The API had been running for at least 20 days without an interruption or reset.

Moving on to what I have in GCP: I checked whether the application was reaching its memory limit, but it wasn't, and Cloud Run is configured to keep at least one instance. According to the Cloud Run documentation on instances, an instance can go idle but is not deactivated. The Cloud Run configuration has 512 MB and 1 CPU allocated. The CPU is not set to always be allocated, but that cannot be the problem, as the other webhooks were processed in time.

Your Environment:
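One lightweight way to catch this kind of silent stop in environments like Cloud Run is an in-process watchdog that logs when the job has not ticked for longer than expected. A hypothetical sketch (names and thresholds are mine), which of course only helps while the container's CPU is actually running:

```ts
// Record the last time the cron callback ran, and complain loudly if it goes
// quiet for much longer than its 10-second schedule.
let lastTick = Date.now();

// Call this at the top of the cron job callback.
export function markTick(): void {
  lastTick = Date.now();
}

const WATCHDOG_INTERVAL_MS = 30_000;
const MAX_SILENCE_MS = 60_000;

setInterval(() => {
  const silentForMs = Date.now() - lastTick;
  if (silentForMs > MAX_SILENCE_MS) {
    console.error(`[cron-watchdog] no tick for ${silentForMs} ms; the job may have stopped`);
  }
}, WATCHDOG_INTERVAL_MS).unref();
```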
I have the same issue: the cron job stops after a random time.

Description: I encountered an issue where the cron job stops running after a random amount of time. After monitoring the behavior for a while, I noticed that the job unexpectedly stops without any specific error message or pattern.

Steps to Reproduce: Set up a cron job using "cron": "^3.1.7".

Expected Behavior: The cron job should continue running consistently according to the schedule.

Actual Behavior: The cron job stops after a random period of time. There is no specific pattern or error log that explains the stoppage.

What I Did to Resolve the Issue: I forked the project and made modifications to the code. After testing the changes over the course of a few weeks, I no longer experienced the issue.

Changes Made:

Conclusion: Since the changes I made have resolved the issue for me, I wanted to report this in case others are facing the same problem. I'm happy to provide further details or submit a pull request if needed.
Hi @lmcuong25, thanks a lot for the effort. However, I'm afraid I might be missing something: except for returning more data for logging purposes, I don't see any changes to the getTimeout function. Could you point out what you think solves the issue in your patch?
Hi @sheerlox,
Hi guys, I spent some time trying to understand what could cause the cron job to stop running. I used https://github.com/wolfcw/libfaketime to try to replicate a situation where the cron job could stop. I don't know if this tool is the best way to replicate such a scenario, but I could stop the cron job: if the multiplier used to speed up time is too big, the timeout calculated by getTimeout() can come out as -1.

Another curiosity: for example, with the libfaketime ratio set to speed time up 1000x (in seconds), sometimes the job could not start at all (the timeout was -1 in the first loop), other times the code started and kept running, and in other cases it ran for N loops, then got -1 as the timeout and started the stop flow. I know it is inconclusive, but I think the problem could be with the way getTimeout() works, or with the environment, which is not related to the library and could explain why some people never experience this bug.

I don't understand why a timeout of -1 should stop the cron job from running. Maybe to avoid compromising the cron job's execution interval and causing unexpected behavior for the user? But in my opinion, stopping the cron job because of this already leads to unexpected behavior. My suggestion would be to throw a specialized error instead of silently stopping the job. As mentioned in other issues, like #232, the -1 timeout could be the problem. In this comment, #232 (comment), the person points out that Node is running very slowly, which could break the logic inside the library. If this is correct, the library should state the minimum specs it needs to run reliably, and throwing an error would be a better choice in these situations. This is the repository with my fork and the changes I made: https://github.com/droderuan/node-cron
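To make that failure mode concrete, here is a simplified sketch (not the library's actual getTimeout implementation) of how the remaining delay can come out negative, together with the "fail loudly" alternative suggested above:

```ts
// Simplified model: the delay until the next run is derived from the next
// scheduled date. If the clock jumps forward (libfaketime, NTP, VM pause) or
// the event loop stalls past that date, the result goes negative.
function getRemainingDelayMs(nextDate: Date): number {
  return nextDate.getTime() - Date.now();
}

// The suggestion in the comment above: surface the condition instead of
// silently stopping the job.
function scheduleNext(nextDate: Date, run: () => void): void {
  const delayMs = getRemainingDelayMs(nextDate);
  if (delayMs < 0) {
    // Alternatives: throw a specialized error, or clamp to 0 and run late.
    throw new Error(`Computed cron delay is negative (${delayMs} ms)`);
  }
  setTimeout(run, delayMs);
}
```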
Thanks so much to all of you for the work you've put into this. Thanks to your input, I think I might have a solution to the issue. I'll be working on it ASAP and keep you updated!
I'm using Docker. Here's my log output:
At the time of logging, CPU and RAM usage weren't particularly high. However, it's possible that a process was blocking the thread at that moment (since Node.js is single-threaded). To address the issue, I modified the code to call localTimeToMillis first, as shown in this comparison: main...lublue:node-cron-track:main. Updated code:
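Separately from the fork linked above, one way to check the "something blocked the thread" hypothesis is Node's built-in event-loop delay monitor from perf_hooks: logging its maximum periodically shows whether a long synchronous task coincided with the missed ticks. A diagnostic sketch, assuming plain Node APIs:

```ts
import { monitorEventLoopDelay } from 'node:perf_hooks';

// Sample how long the event loop was blocked; values are reported in nanoseconds.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  const maxMs = Math.round(histogram.max / 1e6);
  const p99Ms = Math.round(histogram.percentile(99) / 1e6);
  console.log(`[loop-delay] max=${maxMs} ms p99=${p99Ms} ms over the last minute`);
  histogram.reset();
}, 60_000).unref();
```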
🎉 This issue has been resolved in version 3.3.2-beta.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
Hey everyone, we just released a beta version with an idea for a fix so we don't let this issue go stale. Basically, it will execute the job even if it fires with some delay. There's still a maximum delay of 1000 ms (1 s) to avoid potentially disruptive behavior; past that, it will behave as before (stop the job). In any case, a few logs have been added that should give us more info on the best way to go about this issue.

We would highly appreciate it if some of you could use the beta and report back. All relevant logs start with a dedicated prefix.

This version should be production-safe and all tests are still passing, but feel free to take a look at the code before you commit. Happy Holiday Season! 🎄
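For reference, a rough sketch of the tolerance behavior described above (illustrative only; the real change ships in 3.3.2-beta.1 and may differ in detail):

```ts
const MAX_DELAY_TOLERANCE_MS = 1000;

// If a tick fires late but within the tolerance, run it anyway and log the
// delay; beyond the tolerance, fall back to the previous behavior (stop).
function handleTick(expectedAtMs: number, run: () => void, stop: () => void): void {
  const lateByMs = Date.now() - expectedAtMs;
  if (lateByMs <= MAX_DELAY_TOLERANCE_MS) {
    if (lateByMs > 0) {
      console.warn(`[cron] tick ran ${lateByMs} ms late`);
    }
    run();
  } else {
    console.error(`[cron] tick is ${lateByMs} ms late, exceeding tolerance; stopping job`);
    stop();
  }
}
```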
Description
I have a Node app running on a machine with 512 MB of memory and 256 CPU units (0.25 vCPU) on AWS, and I have a cron job (actually using NestJS, which in turn uses this package) that runs every second.
After a few days (it doesn't always happen), the cron job just stops working with no error or log.
In fact, an identical issue was reported before (#232) and was closed despite people still reporting the same behaviour.
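For context, a typical NestJS declaration of the kind of every-second job described above might look like this (illustrative; not the reporter's actual code):

```ts
import { Injectable, Logger } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class EverySecondService {
  private readonly logger = new Logger(EverySecondService.name);

  // CronExpression.EVERY_SECOND resolves to '* * * * * *'.
  @Cron(CronExpression.EVERY_SECOND)
  handleTick(): void {
    this.logger.debug('tick');
  }
}
```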
Expected Behavior
The cron job should not stop working, or at the very least it should throw an error.
Actual Behavior
Possible Fix
No response
Steps to Reproduce
Context
Your Environment
- cron version: 2.4.1
- Docker image: node:18.18.2-alpine3.17