TutteInstitute / thisnotthat

A visual labeling system implemented in Jupyter widgets.

[FEATURE REQUEST] SummaryDataPane

jc-healy opened this issue · comments

This is a function that takes a summary function and outputs a data frame pane. The summary function we pass in takes a selection (and potentially other information) and generates a data frame to display to the user.

We will build a small set of pre-canned summary functions to make it easy for users to get started with this pane. These functions will be packaged into their own module to keep things tidy.
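As a rough illustration of the contract described above, the sketch below models the pane as a thin wrapper around a user-supplied summary function. The names `SummaryDataPaneSketch`, `update`, and `mean_summarizer` are hypothetical and not part of thisnotthat; a real pane would presumably render the frame in a widget rather than just store it.

```python
from typing import Callable, Sequence

import pandas as pd

# Hypothetical type: a summarizer maps selected row positions to a DataFrame.
Summarizer = Callable[[Sequence[int]], pd.DataFrame]


class SummaryDataPaneSketch:
    """Holds a summarizer and the most recent summary frame (illustrative only)."""

    def __init__(self, summarizer: Summarizer):
        self.summarizer = summarizer
        self.dataframe = pd.DataFrame()

    def update(self, selected: Sequence[int]) -> pd.DataFrame:
        # Recompute the summary whenever the selection changes.
        self.dataframe = self.summarizer(list(selected))
        return self.dataframe


# "Other information" (here, the data itself) is bound through a closure.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})


def mean_summarizer(selected):
    # One row per column of `data`, holding its mean over the selection.
    return data.iloc[selected].mean().to_frame(name="mean")


pane = SummaryDataPaneSketch(mean_summarizer)
summary = pane.update([0, 1])
```

Binding extra inputs with a closure (or `functools.partial`) keeps the pane's interface down to a single argument, the selection, which is one possible way to accommodate the "potentially other information" mentioned above.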

  1. value_counts_summarizer
  • Takes a selection and a series
  • Pass a column (the series); the summarizer restricts it to the selection and computes a value count
  2. Sparse matrix largest columns, passed to a value_counts summarizer.
  • Takes a selection, sparse matrix, column_index_dictionary
  • Column sum the selected rows from a passed in sparse matrix.
  • Then find the k columns with the largest sums.
  • Compute a value_counts to display how many of these columns are present within our selection.
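The two pre-canned summarizers above might look roughly like the following. This is a minimal sketch, not the project's implementation: the function signatures, the `k` parameter default, the output column names, and the reading of `column_index_dictionary` as a mapping from column index to a readable name are all assumptions. "Present within our selection" is interpreted here as the number of selected rows with a non-zero entry in that column.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def value_counts_summarizer(selected, series):
    """Count how often each value of `series` occurs among the selected rows."""
    counts = series.iloc[selected].value_counts()
    return pd.DataFrame({"value": counts.index, "count": counts.to_numpy()})


def sparse_top_columns_summarizer(selected, matrix, column_index_dictionary, k=3):
    """Summarize the k columns with the largest sums over the selected rows."""
    sub = csr_matrix(matrix)[selected]  # keep only the selected rows
    col_sums = np.asarray(sub.sum(axis=0)).ravel()
    # Indices of the k largest column sums (stable sort keeps ties in column order).
    top = np.argsort(-col_sums, kind="stable")[:k]
    # Stored entries per column; assumes the matrix holds no explicit zeros.
    present = sub.getnnz(axis=0)
    return pd.DataFrame({
        "column": [column_index_dictionary[i] for i in top],
        "sum": col_sums[top],
        "rows_present": present[top],
    })


# Toy data, made up for illustration.
labels = pd.Series(["cat", "dog", "cat", "bird"])
label_counts = value_counts_summarizer([0, 1, 2], labels)

matrix = csr_matrix(np.array([
    [1, 0, 2, 0],
    [0, 0, 3, 1],
    [4, 0, 0, 0],
]))
names = {0: "apple", 1: "banana", 2: "cherry", 3: "date"}
top_columns = sparse_top_columns_summarizer([0, 1], matrix, names, k=2)
```

With rows 0 and 1 selected, the column sums are 1, 0, 5, 1, so the two reported columns are `cherry` and `apple`.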

Extra summary functions:
3. Weighted sparse matrix summarizer, for when we are working with a deduplicated data set and need to pass in counts along with the selection.
4. Cluster interpretability summarizer
  • Regularized logistic regression on an interpretable feature space, comparing the selected points against a sample of the other points
5. Cluster interpretability high space centroid (top2vec summarizer)
  • Compute the centroid of the selected points in the high-dimensional space and return a weighted nearest neighbour set from an interpretable joint embedding.
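The centroid-based idea in the last item could be sketched as below. Everything here is an assumption made for illustration: the function name, its parameters, the use of cosine similarity as the neighbour weight, and the premise that the interpretable items (for example, words) already live in the same vector space as the points. Vectors are assumed to be non-zero.

```python
import numpy as np
import pandas as pd


def centroid_neighbour_summarizer(selected, point_vectors, feature_vectors,
                                  feature_names, k=3):
    """Return the k interpretable features closest to the selection's centroid."""
    centroid = point_vectors[selected].mean(axis=0)
    # Cosine similarity between the centroid and every feature vector.
    norms = np.linalg.norm(feature_vectors, axis=1) * np.linalg.norm(centroid)
    sims = feature_vectors @ centroid / norms
    # Most similar features first; the similarity doubles as the weight.
    order = np.argsort(-sims, kind="stable")[:k]
    return pd.DataFrame({
        "feature": [feature_names[i] for i in order],
        "similarity": sims[order],
    })


# Toy two-dimensional "joint embedding", made up for illustration.
point_vectors = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
feature_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feature_names = ["sports", "politics", "mixed"]

neighbours = centroid_neighbour_summarizer(
    [0, 1], point_vectors, feature_vectors, feature_names, k=2
)
```

For a large joint embedding, the brute-force similarity computation would likely be replaced by an approximate nearest neighbour index; the sketch keeps the dense version for clarity.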