Celery not working in develop branch #2554

Open
arakakiv opened this issue Oct 25, 2024 · 31 comments
Labels
bug Something isn't working

Comments

@arakakiv

What happened

The Celery components aren't working as expected; no job can be executed successfully from the GUI.

Environment

Latest changes in the develop branch.

What did you expect to happen

No errors, just as in the prod environment.

How to reproduce your issue

Just run the start script in test mode.

Error messages and logs

For both celery_beat and celery_default:

Error: Invalid value for '-A' / '--app': 
Unable to load celery application.
@arakakiv arakakiv added the bug Something isn't working label Oct 25, 2024
@mlodic
Member

mlodic commented Oct 25, 2024

🤔 You're right, let me check.

@mlodic
Member

mlodic commented Oct 25, 2024

No, never mind, I can't replicate the issue. Things work as expected. Have you tried again?

Did you change anything in the env_file_app file, for instance the queue names? That could have broken this.

If you haven't managed to solve it, sharing your configuration could help identify the issue.

@arakakiv
Author

I haven’t changed anything except the AWS-related variables, and only those for secrets and IAM. I am running on an Ubuntu machine in the AWS infrastructure; the web application runs normally, but because of the Celery errors the jobs never finish. The secrets are retrieved correctly, so Celery is the only problem. I am starting the application with the command ./start test up -- --build. Finally, I'm cloning the develop branch.

@arakakiv
Author

It looks like aws_sqs is not on the settings object either.

@mlodic
Copy link
Member

mlodic commented Oct 25, 2024

Can you share the exact config you have in env_file_app (redacting secrets, obviously)?

Ah, one thing: which kind of SQS queues have you configured? FIFO queues are required; standard ones won't work.
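Since only FIFO queues work here, a quick sanity check is whether every configured queue name is really a FIFO queue. AWS requires FIFO queue names to end in ".fifo", so a minimal sketch can simply test that suffix. The function name and the example queue names are hypothetical and are not taken from IntelOwl's configuration.

```python
# Hypothetical pre-flight check (not part of IntelOwl): find configured SQS
# queue names that are not FIFO queues.
FIFO_SUFFIX = ".fifo"  # AWS requires FIFO queue names to end with this suffix


def non_fifo_queues(queue_names):
    """Return the queue names that lack the FIFO suffix."""
    return [name for name in queue_names if not name.endswith(FIFO_SUFFIX)]


if __name__ == "__main__":
    # Illustrative names only.
    configured = ["intelowl-default.fifo", "intelowl-long"]
    bad = non_fifo_queues(configured)
    if bad:
        print("Standard (non-FIFO) queues found:", ", ".join(bad))
```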

@arakakiv
Copy link
Author

arakakiv commented Oct 25, 2024

I didn't configure any SQS queues... Could this be the problem? Are SQS queues required when running on the AWS infrastructure? When running the prod version, no errors are thrown. Here is my env_file_app:

# Required Secrets
DJANGO_SECRET=
DB_HOST=postgres
DB_PORT=5432
DB_USER=
DB_PASSWORD=
DB_SSL=False
DB_NAME=intel_owl_db

# Additional Config variables
# jobs older than this would be flushed from the database periodically. Default: 14 days
OLD_JOBS_RETENTION_DAYS=14
# used for generating links to web client e.g. job results page; Default: localhost
INTELOWL_WEB_CLIENT_DOMAIN=localhost
# used for automated correspondence from the site manager
DEFAULT_FROM_EMAIL=
# used for correspondence with users
DEFAULT_EMAIL=
# Storage
LOCAL_STORAGE=True

# OAuth2
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=

# SMTP backend
EMAIL_HOST= 
EMAIL_HOST_USER= 
EMAIL_HOST_PASSWORD= 
EMAIL_PORT=
EMAIL_USE_TLS=False
EMAIL_USE_SSL=False

# AWS
## S3 storage
AWS_STORAGE_BUCKET_NAME=
AWS_IAM_ACCESS=True
### to use if no IAM credentials are provided
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
## secrets, broker and region
AWS_SECRETS=True
AWS_SQS=False
AWS_USER_NUMBER=
AWS_REGION=us-east-1
# IAM ROLE for RDS
AWS_RDS_IAM_ROLE=False
## to use for sending mail with SES
AWS_SES=False

# Uploads
SLACK_TOKEN=
DEFAULT_SLACK_CHANNEL=

# Elastic Search Configuration
ELASTICSEARCH_DSL_ENABLED=False
ELASTICSEARCH_DSL_HOST=
# consult to: https://django-elasticsearch-dsl.readthedocs.io/en/latest/settings.html
ELASTICSEARCH_DSL_NO_OF_SHARDS=1
ELASTICSEARCH_DSL_NO_OF_REPLICAS=0

ELASTICSEARCH_BI_ENABLED=False
ELASTICSEARCH_BI_HOST=
ELASTICSEARCH_BI_INDEX=intelowl-bi

# Test tokens
TEST_IP=8.8.8.8
TEST_DOMAIN=www.google.com
TEST_URL=https://www.google.com/search?test
TEST_MD5=446c5fbb11b9ce058450555c1c27153c

# other variables
STAGE="production"
DEBUG=False
LDAP_ENABLED=False
DISABLE_LOGGING_TEST=False
MOCK_CONNECTIONS=False
HTTPS_ENABLED=not_enabled
RADIUS_AUTH_ENABLED=False
# True for public deployment, False for internal deployment
PUBLIC_DEPLOYMENT=False
# broker configuration
BROKER_URL=redis://redis:6379/1
WEBSOCKETS_URL=redis://redis:6379/0

FLOWER_USER=flower
FLOWER_PWD=flower
DJANGO_SECRET=

@mlodic
Member

mlodic commented Oct 28, 2024

Are SQS queues required when running on the AWS infrastructure?

No.

Anyway, I tried running IntelOwl from the develop branch with the test option and your exact env_file_app, and Celery does not break.

Is it possible that you changed something by hand in some files?

@arakakiv
Author

arakakiv commented Oct 28, 2024

Well, then I'm going to try again, cloning and changing only env_file_app and env_file_postgres. Won't any logs appear with the --debug-build option?

@mlodic
Member

mlodic commented Oct 28, 2024

--debug-build adds more information at Docker image build time. It could help, though I'm not sure it applies here.

@arakakiv
Author

Sorry for not mentioning this detail. Yes, I actually get this error when I use this option.

@mlodic
Member

mlodic commented Oct 28, 2024

That flag should not affect the output. It only helps in understanding possible problems at build time, in case the image did not build properly.

Anyway, I have just tried building with that flag, and in my test env it works. 🤔

It is very difficult to help if I can't replicate it. The affected code is here, and I thought the queue names could have affected it. Otherwise, there is no reason why it would not start.

Last questions to see if that could be the case:

  • docker version
  • docker compose version
  • OS version

@arakakiv
Author

I will clone again and paste any errors I get. Also, are you able to create jobs? I can create them, but they never get analyzed. Here are the answers to your questions:

  • Docker version 27.3.1, build ce12230
  • Docker Compose version v2.29.7
  • Ubuntu 22.04.1

@arakakiv
Author

Just cloned the develop branch and changed only the env_file_app and env_file_postgres files. Started up the application using ./start test up. The error still occurs:

[screenshot of the error]

@mlodic
Member

mlodic commented Oct 29, 2024

Did you build the image after moving to the develop branch?

./start test up -- --build is required after you check out the develop branch, so that the right image is used. If you launch up alone, you still use the previously built image, and the error will show up regardless.

I think you are still using the old image containing the develop-branch bug that I already fixed.

@arakakiv
Author

I'm literally removing all Docker data (docker system prune -a) and the IntelOwl directory (and cloning it again from the develop branch). Yes, I used the --build argument; I just didn't mention it, sorry. I follow these steps every time:

  1. Deleting the old cloned IntelOwl repository.
  2. Cloning it again.
  3. Cleaning up everything in Docker (removing every container, image, and volume, and also running docker system prune -a).
  4. Modifying env_file_app and env_file_postgres files.
  5. Running with ./start test up -- --build.

Some other errors I get:
[screenshot of the errors]

I tested again and saw that the error occurs only when using AWS secrets and IAM access. I'm only modifying the AWS_IAM_ACCESS, AWS_SECRETS and AWS_REGION variables (and leaving DB_USER and DB_PASSWORD as empty strings).

@arakakiv
Author

arakakiv commented Nov 1, 2024

It looks like the variables in the settings object are not being set in the intel_owl/celery.py file. I am trying to understand why. Any thoughts?
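One way to narrow this down is to check how each flag from env_file_app is parsed, since a flag that is unset, empty, or not parsed into a real boolean can leave a setting missing or wrong. Below is a minimal sketch, assuming the flags arrive as the plain True/False strings used in env_file_app. env_bool is a hypothetical helper for illustration only, not IntelOwl's actual settings code.

```python
import os

TRUTHY = {"true", "1", "yes"}


def env_bool(name: str, default: bool = False) -> bool:
    """Hypothetical helper: parse a True/False flag such as AWS_SQS.

    An unset or empty variable falls back to ``default``, so the setting is
    always defined instead of being absent from the settings object.
    """
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    return raw.lower() in TRUTHY


if __name__ == "__main__":
    os.environ["AWS_SQS"] = "False"
    os.environ["AWS_SECRETS"] = "True"
    print(env_bool("AWS_SQS"), env_bool("AWS_SECRETS"))
```

Comparing what such a helper returns inside the celery_beat container with the values in env_file_app would show whether the variables reach the container at all.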

@mlodic
Member

mlodic commented Nov 5, 2024

Any updates?

The error in the build happens because you don't have any local build of the IntelOwl project, which is expected if you have pruned everything locally. In that case, run ./start test build first; the error should then disappear when you call ./start test up.

I tried again with your env variables, and this time I managed to replicate your error. An exception raised by the AWS Python library is not handled correctly by the code. I will open a PR about that soon.
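The thread does not say which exception the AWS library raised or how #2567 handles it. As an illustration only, a common pattern is to keep a failing remote lookup from aborting the settings import (and with it the Celery app load) by catching the error and falling back to the environment. get_secret and the fetch_remote callable are hypothetical names; the sketch assumes a broad except is acceptable because the exact AWS error type is unknown here.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def get_secret(
    name: str,
    fetch_remote: Callable[[str], str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: try a remote secret store, then the environment.

    Any exception from the remote lookup (for example one raised by the AWS
    client library) is logged instead of propagating, so importing settings,
    and therefore loading the Celery app, does not abort.
    """
    try:
        return fetch_remote(name)
    except Exception as exc:  # broad on purpose: the exact AWS error type is not known here
        logger.warning("Remote lookup of %s failed (%s); using environment", name, exc)
        return os.environ.get(name, default)


if __name__ == "__main__":
    def unavailable_store(name: str) -> str:
        # Simulates the AWS library raising during secret retrieval.
        raise RuntimeError("simulated AWS client error")

    os.environ["DB_NAME"] = "intel_owl_db"
    print(get_secret("DB_NAME", unavailable_store))
```

The trade-off is that a misconfigured secret store no longer fails loudly, so the logged warning is what surfaces the problem.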

@mlodic
Member

mlodic commented Nov 5, 2024

Reference: #2567

I merged it in develop. I tested locally and it should work now. Can you please have a try?

@arakakiv
Author

arakakiv commented Nov 5, 2024

Another exception occurred, this time with the variable “ELASTIC_HOST”, which is not found in the settings object. This is likely why the Celery errors persist. Only the beat had the error.

 AttributeError: 'Settings' object has no attribute 'ELASTIC_HOST'. Did you mean: 'EMAIL_HOST'?

@mlodic
Member

mlodic commented Nov 5, 2024

Can you please add ELASTIC_DSL_ENABLED=False to your env file and retry? I just added it to the template.

@mlodic
Member

mlodic commented Nov 5, 2024

Ah, and also rebuild from the latest commit; I added a small change.

@arakakiv
Author

arakakiv commented Nov 5, 2024

Did it again, see the errors below. Only the beat has this problem.
[screenshot of the errors]

@mlodic
Member

mlodic commented Nov 5, 2024

Honestly, I don't get it; it should work the same for all the Celery containers. Can you try populating the env var AWS_USER_NUMBER with a random number and see if that goes away?

@drosetti
Contributor

drosetti commented Nov 5, 2024

Hi arakakiv,

I changed the env var for Elastic from ELASTICSEARCH_DSL_HOST to ELASTIC_HOST. That is why you get the error:
AttributeError: 'Settings' object has no attribute 'ELASTIC_HOST'. Did you mean: 'EMAIL_HOST'?
When the project starts up and tries to read the env var, it doesn't find ELASTIC_HOST, so Python suggests something similar (EMAIL_HOST).

I reverted the env var names and the names you are using are correct again, so your problem should be solved. Let me know if you have other problems.
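The mechanism behind that message can be reproduced in a few lines. Below is a minimal sketch, using a SimpleNamespace as a hypothetical stand-in for the Django settings object. It shows the missing attribute raising AttributeError and a defensive getattr read that treats an absent optional setting as "not configured". This is only an illustration, not how IntelOwl reads its Elastic settings.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the Django settings object: EMAIL_HOST exists,
# ELASTIC_HOST does not, mirroring the traceback quoted above.
settings = SimpleNamespace(EMAIL_HOST="", ELASTICSEARCH_DSL_ENABLED=False)

try:
    host = settings.ELASTIC_HOST  # attribute was never set, so this raises
except AttributeError as exc:
    # The "Did you mean: ...?" hint comes from the traceback display
    # (Python 3.10+); str(exc) itself only reports the missing attribute.
    print("AttributeError:", exc)

# Defensive read: treat a missing optional setting as "not configured".
elastic_host = getattr(settings, "ELASTIC_HOST", None)
elastic_enabled = bool(getattr(settings, "ELASTICSEARCH_DSL_ENABLED", False) and elastic_host)
print(elastic_host, elastic_enabled)
```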

@arakakiv
Author

arakakiv commented Nov 5, 2024

It looks like aws_sqs is not on the settings object either.

I just cloned everything again and removed all containers, volumes and images. Now this error is occurring again. I started the application with the following commands:

./start test build
./start test up -- --build

I am using the same environment configuration as stated above. For some reason, this error is occurring again.

@drosetti
Contributor

drosetti commented Nov 6, 2024

This is quite strange; I didn't modify anything related to AWS. Just one question: analyses are working correctly, aren't they?

@arakakiv
Author

arakakiv commented Nov 6, 2024

Cleaned everything again (containers, images, volumes). Cloned the develop branch and used the following commands to start the application:

./start test build
./start test up -- --build

There's the error again.

This is quite strange; I didn't modify anything related to AWS. Just one question: analyses are working correctly, aren't they?

No, jobs never finish.

[screenshot]
Should the environment really be production when running with the commands stated above?

@arakakiv
Author

arakakiv commented Nov 6, 2024

Honestly, I don't get it; it should work the same for all the Celery containers. Can you try populating the env var AWS_USER_NUMBER with a random number and see if that goes away?

I tried again, but the error with the aws_sqs variable has returned. I’m not sure what’s causing it; I checked the latest commits, and it doesn’t seem like this variable was modified at all.

@drosetti
Contributor

drosetti commented Nov 8, 2024

Should the environment really be production when running with the commands stated above?

No, please set these env vars in env_file_app like this:

STAGE="local"
DEBUG=True

@arakakiv
Author

The same error continues, even when using STAGE="local".

@drosetti
Contributor

I ran docker system prune -a and then docker volume prune -a (some volumes were still present). Then I copied your config into env_file_app and changed:

DJANGO_SECRET
DB_USER
DB_PASSWORD
GOOGLE_CLIENT_ID
GOOGLE_CLIENT_SECRET
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_USER_NUMBER
removed the duplicate DJANGO_SECRET in the last line

Then I ran ./start test build and ./start test up -- --build.

It works; let me know if this solves your problem.
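The duplicate DJANGO_SECRET in the posted env_file_app was easy to miss by eye. Below is a minimal sketch for spotting repeated keys in KEY=VALUE lines before starting the stack. duplicate_keys is a hypothetical helper, and the sample lines are taken from the config posted earlier in this thread. The sketch only detects repeats; it makes no claim about which value Docker Compose ends up using.

```python
from collections import Counter


def duplicate_keys(lines):
    """Return keys defined more than once in KEY=VALUE env-file lines."""
    keys = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue  # skip blanks, comments and lines without an assignment
        keys.append(stripped.split("=", 1)[0].strip())
    return [key for key, count in Counter(keys).items() if count > 1]


if __name__ == "__main__":
    # Excerpt of the env_file_app posted above.
    sample = [
        "# Required Secrets",
        "DJANGO_SECRET=",
        "DB_HOST=postgres",
        "FLOWER_PWD=flower",
        "DJANGO_SECRET=",
    ]
    print("Duplicate keys:", ", ".join(duplicate_keys(sample)) or "none")
```

To check a real file, pass Path("env_file_app").read_text().splitlines() (with from pathlib import Path), adjusting the path to wherever the file lives in your checkout.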


3 participants