twbxless
twbxless makes data exported in Tableau packaged workbook files (.twbx) available at CSV URLs for other tools, e.g. Excel data models or Google Sheets IMPORTDATA. This is possible thanks to Tableau's Hyper API.
It's possibly also useful for sustaining a community of Tableau visualization builders, where creators make new public workbooks from data exported to other public workbooks, rather than having to copy those workbooks and deal with new copies to get new data. Using twbxless this way may require web data connector or Google Sheets IMPORTDATA, as it doesn't seem Tableau, as of 2020.2, can fetch CSV data from the web on its own.
Build
To build twbxless as a container, provide
Docker
then docker build . -t twbxless
.
If you want to dev/build outside a container,
- provide JDK 14 and gradle
- download Hyper API
- extract lib from the Hyper API package and put it along side src
gradle build
Run
After building the container, run
docker run -it -p8080:8080 twbxless:latest
and you should see output like:
2020-05-14 23:39:50.695 INFO 1 --- [main] com.rationalagents.twbxless.Application : Starting Application
That means it's running, and you're ready to move onto Use. You can Ctrl+C to stop.
If you're thinking "yeah, I already did hard way once, gradle build
, it definitely built, let me keep on this
so-called hard way, and I'm on Windows, so whatcha got for me dawg", do something like this:
set HYPEREXEC=lib\hyper
java -jar build/libs/twbxless.jar
Config (optional)
twbxless supports 3 configuration environment variables:
PORT: lingua franca in servlerless, the port to bind to (default is 8080
)
HYPEREXEC: path to the executables (e.g. hyperd or hyperd.exe) packaged with Hyper API (default is /hyperapi/lib/hyper
since that's where Dockerfile puts them)
URLPREFIX: required prefix for any URL retrieved (default is https://public.tableau.com/
)
Use
First you'll need to identify the data extracts within a .twbx that's published on the web. To do that,
use the /filenames
endpoint, specifying the url
to the workbook.
For example, for this workbook featured on Viz of the Day
, the url
we need's the one backing Tableau Public's "Download" button. Use /filenames
with that url
:
http://localhost:8080/filenames?url=https://public.tableau.com/workbooks/FemaleDirectors.twb
and in CSV format you get a list of .hyper extract filenames within that workbook (there's just 1 in FemaleDirectors.twb):
filenames
Data/Fuentes de datos/Hoja1 (genderOverall).hyper
Then we switch to the /data
endpoint, using the same url
, adding filename
:
http://localhost:8080/data?url=https://public.tableau.com/workbooks/FemaleDirectors.twb&filename=Data/Fuentes de datos/Hoja1 (genderOverall).hyper
and we get the data from that file
genre,year,gender,freq,percent,filter
Total,2000,female,17,0.096,0
Total,2000,male,161,0.904,0
Total,2001,female,15,0.075,0
Total,2001,male,186,0.925,0
Total,2002,female,15,0.069,0
Total,2002,male,201,0.931,0
Total,2003,female,16,0.076,0
...
Enhancements & limitations
- This doesn't support all column types, for example geography. It'll include unsupported columns in the CSV, but
non-null values will be
TYPE?
. Please file an issue with an example workbook if you'd like support for a particular type. - Only supports single schema & table per .hyper file. I've seen plenty of workbooks with multiple .hyper files (one per data source), but never a workbook where there was >1 schema/table in a file. If you need this please provide an example workbook for enhancement #4.
- Unfortunately the .hyper filenames within .twbx files are rather opaque. Even after trying
/filenames
you might not know which file has the data you want! It'd be nice to improve on that. If interested please stop by enhancement #7.
FAQ
Does twbxless provide access to the external data sources used to make a workbook?
No. It only reads the data that's directly in the .twbx file.
Does Google Sheets' IMPORTDATA work with http://localhost:8080 URLs?
No, Google Sheets needs twbxless to be accessible from the internet. Run twbxless from some serverless somewhere, e.g. Google Cloud Run, Azure Containers, AWS, or Heroku, instead of on your computer.