No such file or directory: '<my/folder>/metadata/variables.csv' #14

Closed
wmpay opened this issue Aug 9, 2018 · 14 comments

Comments

wmpay commented Aug 9, 2018

Trying to run python manage.py load_metadata from the "Adding the metadata" part of the setup. Is there a workaround for this? Where do I get the variables file?

Owner

jsfenfen commented Aug 9, 2018

Hi @wmpay could you say what OS you're working on, what version of python you're using, and what the entire verbatim script output is? Are the files present in your file system? How did you install stuff? What do you get if you enter something like $ irsx --format=csv 201533089349301428 from the command line? Does it work or freak out about missing .csv files?

This db is really a wrapper/data structure around the irsx program, so if irsx isn't configured, the database loader won't work either.

The variables.csv file is from the metadata repo here: https://github.com/jsfenfen/990-xml-metadata/
The metadata should have been installed as a dependency of irsx... it's packaged as a git submodule here: https://github.com/jsfenfen/990-xml-reader/tree/master/irs_reader. Not quite sure what's going on, will take a look at this ahead of the next release.
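
One way to check whether the metadata actually shipped with the installed package is to locate the package directory and look for the CSV there. This is a minimal sketch, not part of irsx: it assumes the package is importable as `irs_reader` and that the file sits at `metadata/variables.csv` inside it, both inferred from the repo links above rather than confirmed. The helper name `find_packaged_file` is made up for illustration.

```python
import importlib.util
from pathlib import Path


def find_packaged_file(package_name, relative_path):
    """Return the path to relative_path inside an installed package, or None."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumed names: package "irs_reader", file "metadata/variables.csv".
    found = find_packaged_file("irs_reader", "metadata/variables.csv")
    print(found if found else "variables.csv not found in the installed package")
```

Run it inside the same virtualenv as manage.py (for example via pipenv run python), since a different interpreter may see a different set of installed packages.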

Author

wmpay commented Aug 9, 2018

I'm working on macOS High Sierra 10.13.6. I installed most of the software with Homebrew (aws cli, postgres, python3). I'm using Python 3.7. I installed the Python packages using pipenv, which is just a tool that combines pip and virtualenv. When I do pip freeze, it shows irsx version 0.2.2 installed. How can I configure it if it is installed as a Python package? The full error message is below:

Running metadata load on variables.
Deleting variables.
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 96, in handle
    self.reload_variables()
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 21, in reload_variables
    infile = open(infile, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/mpaymar/foo/metadata/variables.csv'

Thanks for your help.

@jsfenfen
Owner

Thanks for posting @wmpay.

It's looking in the '/Users/mpaymar/foo/' directory because that's where irsx says the metadata is--see this line:
https://github.com/jsfenfen/990-xml-database/blob/master/irsdb/irsdb/settings.py#L132

Is the command line program irsx installed, and does it work? You should be able to tell by typing irsx --format=csv 201533089349301428. Does that produce the output of a 990, or does it error out? If it errors out, can you post the complete error message?

If you run something like pipenv run pip freeze does it say that irsx is installed, and if so, what version is it? The files it is looking for are part of the irsx installation process.

Author

wmpay commented Aug 13, 2018

Hi @jsfenfen. The irsx program works when I run the command you sent. The version is 0.2.2. I'm going to try to troubleshoot some more and I'll let you know if I solve the issue.

Author

wmpay commented Aug 13, 2018

I think I had added my own METADATA_DIRECTORY for troubleshooting purposes. When I remove it, I get this error:

(990-xml-database-YbfOuN9i) bash-3.2$ python manage.py load_metadata
/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Running metadata load on variables.
Deleting variables.
Created 0 rows
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 96, in handle
    self.reload_variables()
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 27, in reload_variables
    if CANONICAL_VERSION in row['versions']:
KeyError: 'versions'
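
The KeyError above means the rows read from the CSV had no 'versions' column, which points at the file being loaded rather than at the database. A minimal sketch of a header check that could be run before load_metadata follows. It is not part of the project; the helper name `missing_columns` is made up, and 'versions' is the only column name taken from the traceback.

```python
import csv


def missing_columns(csv_path, required=("versions",)):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="") as infile:
        reader = csv.DictReader(infile)
        # fieldnames is None for an empty file, so fall back to an empty list.
        header = reader.fieldnames or []
    return [name for name in required if name not in header]
```

An empty list means the header has every required column. A result of ['versions'] would suggest the loader is reading a different or outdated variables.csv than the one it expects.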

Author

wmpay commented Aug 13, 2018

hmm I ran it again from the beginning and it seems like it's working. No idea what I did differently this time. Thanks anyway.

wmpay closed this as completed Aug 13, 2018
Author

wmpay commented Aug 13, 2018

One last thing - how do I download the actual filings themselves? Do I need to do that manually with the irsx tool or is there a script to do it? Where should I download them to? Thanks again.

Owner

jsfenfen commented Aug 14, 2018 via email

Author

wmpay commented Aug 14, 2018

Hi - I was able to download the files with the aws cli but I'm still not sure how to actually load them into the DB. The load_monthly_filings and enter_yearly_submissions commands don't seem to load anything, so I think the tool doesn't know where the files were downloaded to. I tried setting the environment variables for irsx but that didn't seem to work. Sorry to keep spamming the issue board; would it be easier to direct message you somehow?

@jsfenfen
Owner

@wmpay don't worry about spamming the issue board, it's useful for me to see what's confusing / tripping folks up. At some point I hope to rewrite the docs to include SQL queries that'll confirm that each step worked.

In general, each loading step is sequential, so if the annual index files aren't loaded it decides that there aren't any filings that need to be processed because there aren't any filings in the index files. Probably each script should have more verbose output. I don't have a clear timeline for when I'm gonna get to that though.
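
Since each step depends on the one before it, a row count per table, checked in order, can show which step came up empty. The sketch below only illustrates that idea: the table names are hypothetical placeholders (the real names are not given in this thread), and it uses an in-memory SQLite database so it runs on its own. The same function should work with any DB-API connection, such as one from psycopg2.

```python
import sqlite3


def first_empty_step(connection, steps):
    """Return the name of the first step whose table has no rows, or None.

    steps is an ordered list of (step_name, table_name) pairs.
    """
    cursor = connection.cursor()
    for step_name, table_name in steps:
        # Table names come from our own trusted list, never from user input.
        cursor.execute(f'SELECT COUNT(*) FROM "{table_name}"')
        (count,) = cursor.fetchone()
        if count == 0:
            return step_name
    return None


if __name__ == "__main__":
    # Hypothetical table names, created here only for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata_variable (id INTEGER)")
    conn.execute("CREATE TABLE filing_index (id INTEGER)")
    conn.execute("INSERT INTO metadata_variable VALUES (1)")
    steps = [
        ("load_metadata", "metadata_variable"),
        ("enter_yearly_submissions", "filing_index"),
    ]
    print(first_empty_step(conn, steps))  # prints: enter_yearly_submissions
```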

Author

wmpay commented Aug 15, 2018

As far as I know I've downloaded the index files for 2014 to 2018. I've downloaded about a gigabyte of the xml files from s3 into the directory I set as FILE_SYSTEM_BASE. I'm assuming at least some of those files would be present in the index files. However, load_filings still just says Done for any year I enter. I think I just have things configured incorrectly... is there a specific place I need to do the s3 sync to?
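
One way to test the assumption that the downloaded files overlap with the index is to compare a handful of object IDs from an index file against the download directory. This is a rough sketch under two unconfirmed assumptions: that each filing is stored as `<object_id>_public.xml`, and that the files sit directly under the directory set as FILE_SYSTEM_BASE. The function name `split_by_presence` is made up for illustration.

```python
from pathlib import Path


def split_by_presence(base_dir, object_ids, suffix="_public.xml"):
    """Split object IDs into (present, missing) based on files under base_dir."""
    base = Path(base_dir)
    present, missing = [], []
    for object_id in object_ids:
        # Assumed layout: base_dir/<object_id><suffix>
        target = base / f"{object_id}{suffix}"
        (present if target.is_file() else missing).append(object_id)
    return present, missing
```

If every ID lands in the missing list, the files may be in a different directory or use a different naming scheme than the one assumed here.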

@Georg-coder

I get an error similar to your initial one when loading the metadata:
  File "/Users/Georg/PycharmProjects/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 31, in reload_variables
    infile = open(infilepath, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/Georg/PycharmProjects/990-xml-database/irsdb/generated_schemas/variables.csv'

How did you fix it? I also reran everything, but the error remains.

Author

wmpay commented Jan 30, 2020 via email

@Georg-coder

I think these folks solved it in #28, but I couldn't implement it yet... :/
