fedspendingtransparency / data-act-broker-backend

Services that power the DATA Act's central data submission platform

Error in setup

azvaska opened this issue · comments

Hello, I followed the README, but once I issued the command
docker exec -it dataact-broker-backend python dataactcore/scripts/initialize.py -i

this error appears:

2022-04-07 21:34:18,828 INFO:dataactvalidator.scripts.load_cfda_data:Fetching CFDA file from new-url.com/cfda.csv
Traceback (most recent call last):
  File "dataactcore/scripts/initialize.py", line 249, in <module>
    main()
  File "dataactcore/scripts/initialize.py", line 180, in main
    load_domain_value_files(validator_config_path, args.force)
  File "dataactcore/scripts/initialize.py", line 93, in load_domain_value_files
    load_cfda_program(base_path)
  File "/data-act/backend/dataactvalidator/scripts/load_cfda_data.py", line 82, in load_cfda_program
    r = requests.get(S3_CFDA_FILE, allow_redirects=True)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 466, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 316, in prepare
    self.prepare_url(url, params)
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 390, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'new-url.com/cfda.csv': No schema supplied. Perhaps you meant http://new-url.com/cfda.csv?

Reading the code, I see I need to set these:

    usas_public_reference_url: new-url.com
    usas_public_submissions_url: new-url.com

but I don't know what values to use; usaspending.gov doesn't seem to have these files.
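
From what I can tell (I may be reading the code wrong), the loader joins usas_public_reference_url with cfda.csv, and requests then rejects the resulting URL because it has no http:// or https:// scheme. A rough sketch of what I think is happening (the names are mine, not the repo's actual code):

    import requests

    usas_public_reference_url = 'new-url.com'  # placeholder value from the config example
    s3_cfda_file = '{}/cfda.csv'.format(usas_public_reference_url)

    try:
        requests.get(s3_cfda_file, allow_redirects=True)
    except requests.exceptions.MissingSchema as exc:
        print(exc)  # Invalid URL 'new-url.com/cfda.csv': No schema supplied. ...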

Since I don't have direct access to all the data, how can I update my DB with the new data without having to rebuild it from historical data every time?

Hi @azvaska, we keep these fields defaulted to new-url.com with the intention of having our operations scripts populate them on deploy, since they can change over time. We can update our config examples to use the proper URL, but in the meantime you can use https://files.usaspending.gov/reference_data for usas_public_reference_url and that should get you going. Apologies for the inconvenience.
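
For example, something along these lines in your broker config (the exact file and section depend on your setup, and only the reference URL above is confirmed; the submissions default is left as-is here):

    usas_public_reference_url: https://files.usaspending.gov/reference_data
    usas_public_submissions_url: new-url.com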

I modified the settings and now it's giving me this error:

2022-04-08 14:09:59,682 WARNING:dataactcore.interfaces.db:No current_app, falling back to non-threadsafe database connection
2022-04-08 14:09:59,682 INFO:dataactvalidator.scripts.load_tas:Working with local cars_tas.csv
2022-04-08 14:10:00,399 WARNING:dataactcore.interfaces.db:No current_app, falling back to non-threadsafe database connection
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3540, in _ensure_valid_index
    value = Series(value)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/series.py", line 316, in __init__
    data = SingleBlockManager(data, index, fastpath=True)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1516, in __init__
    block = make_block(block, placement=slice(0, len(axis)), ndim=1)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3284, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2792, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 128, in __init__
    "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
ValueError: Wrong number of items passed 25, placement implies 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dataactcore/scripts/initialize.py", line 249, in <module>
    main()
  File "dataactcore/scripts/initialize.py", line 182, in main
    load_tas_lookup()
  File "dataactcore/scripts/initialize.py", line 66, in load_tas_lookup
    load_tas()
  File "/data-act/backend/dataactvalidator/scripts/load_tas.py", line 235, in load_tas
    update_tas_lookups(sess, tas_file, update_missing=update_missing, metrics=metrics_json)
  File "/data-act/backend/dataactvalidator/scripts/load_tas.py", line 133, in update_tas_lookups
    old_data['display_tas'] = old_data.apply(lambda x: concat_display_tas_dict(x), axis=1)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3487, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3563, in _set_item
    self._ensure_valid_index(value)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3543, in _ensure_valid_index
    "Cannot set a frame with no defined index "
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

I basically want to run the usaspending.gov API locally, but I don't understand how to update it with the newest data. I can see where you are pulling it from (https://www.usaspending.gov/about), but I don't see any links on that site to a CSV or anything like that.
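
In case it helps with diagnosing the second traceback: I think the frame built from cars_tas.csv ended up with zero rows, since the same pair of errors can be reproduced on an empty DataFrame with the pandas version from the container. I may be wrong about the root cause; this is just a guess (the column names below are made up):

    import pandas as pd

    # An empty frame, as if the CSV load produced no usable rows.
    old_data = pd.DataFrame(columns=['ata', 'aid', 'main'])

    # On an empty frame, apply(axis=1) returns a copy of the frame when the
    # function fails on an empty Series, so this assignment raises
    # "Wrong number of items passed 3, placement implies 0" and then
    # "Cannot set a frame with no defined index ...".
    old_data['display_tas'] = old_data.apply(
        lambda x: '{}-{}-{}'.format(x['ata'], x['aid'], x['main']), axis=1
    )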