IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu and an issue tracker for the IQSS Dataverse team's operational work, for better tracking on https://github.com/orgs/IQSS/projects/34

Upgrade Data Explorer to v2

landreev opened this issue

Will need to be tested carefully to make sure all the new features are working.

It's in prod. now. Scholars Portal requests that the tool be hosted locally (with the old version, v1, we were just using it hosted at https://scholarsportal.github.io/Dataverse-Data-Explorer/), so it is installed statically under Apache on both server nodes.
I saved the time it would take to build the tool from source by using a snapshot of the installation at UNC that @donsizemore kindly shared.
There are some questions already about how categories are displayed (for discrete String vars, for example, when categories are not explicitly defined in the metadata). So I'll keep this issue open for a while longer, to collect feedback and see if anything needs to be fixed.

Hmm. So the problem with categories appears to be real and specific to our installation.
Here's the screenshot of the "View Categories" popup from the UNC installation for the file https://holodeck.irss.unc.edu/dataverse-data-explorer-v2/index.html?fileId=7504327&fileMetadataId=3782249&dvLocale=en&siteUrl=https://dataverse.unc.edu, variable zipcode:
Screen Shot 2023-10-27 at 1 54 39 PM
Here's the DDI metadata record for the variable:

<var ID="v39982342" name="zipcode" intrvl="discrete">
  <location fileid="f7504327"/>
  <labl level="variable">
    D.11 What is your five-digit ZIP code at your home address? [IF ZIP GIVEN IS INV
  </labl>
  <varFormat type="character"/>
  <notes subject="Universal Numeric Fingerprint" level="variable" type="VDC:UNF">UNF:6:a2XEqpOGWuHGFTuQMUEAGQ==</notes>
</var>

i.e., the categories are not pre-defined in the metadata - this is a "simple" character variable. The categories as shown in the screenshot above must be calculated on the fly.
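
To illustrate what "calculated on the fly" could look like, here is a minimal Python sketch that tallies the unique values of one column from tab-delimited text. This is not the Data Explorer's actual code (which runs as JavaScript in the browser); the function name `tally_categories` and the assumption that the downloaded column has a header row are hypothetical.

```python
import csv
import io
from collections import Counter


def tally_categories(tab_text, column):
    """Count how often each distinct value occurs in one column.

    Hypothetical helper: assumes tab-delimited text whose first row
    holds the variable names.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    counts = Counter(row[column] for row in reader)
    # Most frequent categories first; ties keep first-seen order.
    return counts.most_common()


if __name__ == "__main__":
    sample = "language\nen\nfr\nen\n"
    for value, freq in tally_categories(sample, "language"):
        print(value, freq)
```

With no categories in the DDI record, a tally like this over the raw column is the only source for the list, so a failed data download would leave the popup empty.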

Here's a similar variable in Phil's dataset:

<var ID="v19500785" name="language" intrvl="discrete">
  <location fileid="f3371438"/>
  <labl level="variable">language</labl>
  <varFormat type="character"/>
  <notes subject="Universal Numeric Fingerprint" level="variable" type="VDC:UNF">UNF:6:2UG0lAfsl9idD6tBDK4E9A==</notes>
</var>

... but, attempting to get a view of the categories in our instance of Data Explorer results in an empty box:
Screen Shot 2023-10-27 at 2 09 44 PM

Whatever it is, it must be happening in the browser/javascript - according to the access log, the tool successfully downloaded the data column for the variable from the tab file, so it got everything it needs to generate the list of unique values etc. ...
This does not appear to be unique to Explorer v2; the same thing is observed in v1 (still installed in parallel in prod.)

So what do we do with this issue?

It's weird, if I manually hack the tool URL and add my API token like this...

https://dataverse.harvard.edu/dataverse-data-explorer-v2/?fileId=6867331&fileMetadataId=6747643&dvLocale=en&siteUrl=https://dataverse.harvard.edu&key=REDACTED

I get a nice plot when I click "view categories" on the "language" variable:

Screenshot 2023-10-27 at 3 10 25 PM

... but this is non-restricted public data so it shouldn't need any API token at all. 🤔

OK, so the problem appears to be that sometime between 5.9 and 6.0 the API auth started rejecting calls with invalid tokens (such as key=null), even if the file in question is public. I.e., this is no longer working:
https://dataverse.harvard.edu/api/access/datafile/6867331?key=null
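
On the client side, the failure above could be avoided by not sending a `key` parameter at all when no token is available. A minimal Python sketch of that idea follows; the helper name `build_access_url` is hypothetical, and the real tool is JavaScript, so this only illustrates the logic, not the Data Explorer's code.

```python
from urllib.parse import urlencode


def build_access_url(site_url, file_id, api_token=None):
    """Build a datafile access URL, adding ?key=... only for a real token.

    Hypothetical helper: treats None, "" and the literal string "null"
    as "no token", so public files are requested without a key.
    """
    url = f"{site_url.rstrip('/')}/api/access/datafile/{file_id}"
    if api_token and api_token.lower() != "null":
        url += "?" + urlencode({"key": api_token})
    return url


if __name__ == "__main__":
    print(build_access_url("https://dataverse.harvard.edu", 6867331))
    print(build_access_url("https://dataverse.harvard.edu", 6867331, "null"))
```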

So, sounds like closing this issue and opening a simple main repo issue should be the proper course.

I created an issue on the Data Explorer v2 side:

Some URLs we've been playing with:

Note that the 6.0 URL works now because of a workaround we (ok, @landreev) put into Apache to strip out key=null.

I will create an issue for the "junk key when no auth is required" on Monday.
In the meantime, I worked around the issue in IQSS prod. with an Apache rewrite rule (see the comment above), and removed v1 of the Explorer.

Closing the issue, now that the Explorer is working.
The rewrite rule that was added in prod. to strip "key=null" from incoming requests:

RewriteCond %{QUERY_STRING} ^(.+?&|)key=null(?:&(.*)|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1%2 [PT]

For information, we have adapted the rule to handle a case that was generating an error: downloading the "Tab-delimited file" from Data Explorer, where the generated URL is &key=null:

RewriteCond %{QUERY_STRING} ^(.*?&|)key=null(?:&(.*)|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1%2 [N,R]
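
To see how the two conditions above differ, here is a small Python sketch that applies both patterns to sample query strings. It assumes Python's `re` module treats these particular patterns the same way Apache's PCRE does, and it only emulates the `%1%2` substitution, not the rule flags; the helper name `strip_null_key` is hypothetical.

```python
import re

# The two RewriteCond patterns, with [NC] mapped to re.IGNORECASE.
ORIGINAL = re.compile(r"^(.+?&|)key=null(?:&(.*)|)$", re.IGNORECASE)
ADAPTED = re.compile(r"^(.*?&|)key=null(?:&(.*)|)$", re.IGNORECASE)


def strip_null_key(pattern, query):
    """Emulate the %1%2 substitution: return the query without key=null.

    If the condition does not match, the query string is left unchanged.
    """
    m = pattern.match(query)
    if m is None:
        return query
    # An unmatched group is None in Python; treat it as an empty string.
    return (m.group(1) or "") + (m.group(2) or "")


if __name__ == "__main__":
    for q in ["fileId=1&key=null&dvLocale=en", "key=null", "&key=null"]:
        print(repr(q), "->",
              repr(strip_null_key(ORIGINAL, q)),
              repr(strip_null_key(ADAPTED, q)))
```

The only difference in the patterns is `.+?` versus `.*?` in the first group: the original requires at least one character before the `&`, so a query string that starts with `&key=null` is not matched, while the adapted pattern matches it.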