Cook4986 / Longhand

Text corpora in virtual reality

Longhand

Longhand is a word cloud generator, but the words are 3D models surrounding the user. The models chosen (and their placement) represent text token frequencies in a target corpus. Longhand exposes humanities researchers to the specific benefits of immersive visualization, including depth cues and embodied interfacing.

throughput diagram

While objects of study associated with academic disciplines “whose primary dimensions are spatial” (i.e. STEM) are regularly deployed in virtual reality (VR) to support research and instruction, immersive visualization technology has yet to see consistent uptake in text-centric humanities, like History, Philosophy, and Literature. This is due in part to the nature and quality of source material, which often defies visualization, transcends media, and precludes close reading by virtue of sheer scale. Enter Longhand.

Longhand takes a plain-text "bag of words" as input and subjects it to natural language processing. Once the bag of words has been parsed, the "x" most common words (where "x" is the chosen number of words) are added to a dictionary of lists based on their relative frequency in the transcribed corpus. That frequency is written to a second output document (the "...objects.txt" output) along with URLs, 3D model names, UIDs, and poly counts, all obtained via the Sketchfab API. The dictionary and its values are used to place objects in a Blender scene, which can be exported to one of many available metaverse creation tools (Hubs, Spatial, Frame, etc.) and displayed as a multi-user virtual environment for virtual exploration by teams of digital humanists.
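The frequency step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in Longhand_notebook.ipynb: the function names, the regex tokenizer, the optional stopword set, and the three-bin grouping rule are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def top_token_frequencies(text, x=10, stopwords=frozenset()):
    """Return the x most common tokens paired with their relative frequency.

    Hypothetical helper: the tokenizer is a simple lowercase regex,
    standing in for whatever NLP pipeline the notebook actually uses.
    """
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stopwords]
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(x)]


def group_by_frequency(ranked, bins=3):
    """Build a dictionary of lists; bin 0 holds the most frequent tokens.

    The binning rule (scale against the top token, then bucket) is an
    assumption chosen for illustration only.
    """
    if not ranked:
        return {}
    groups = defaultdict(list)
    top = ranked[0][1]
    for word, freq in ranked:
        # Tokens close to the top frequency land in low-numbered bins.
        level = min(bins - 1, int((1 - freq / top) * bins))
        groups[level].append(word)
    return dict(groups)


ranked = top_token_frequencies("flour flour flour flour salt salt water", x=3)
print(group_by_frequency(ranked))
```

A dictionary keyed by frequency bin is convenient here because a later placement step can iterate over the bins in order and treat every word in a bin the same way.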

The motivation for Longhand was a pattern of technology consultations taking place in an academic library. In many of these consultations, researchers are working with voluminous, sometimes handwritten text materials from multiple sources but lack the programming background to analyze their large collections using the digital humanities methodologies associated with NLP. Longhand was designed to give these non-technical researchers a sense of what they have: to virtually explore the contents of their heretofore opaque corpora, and to do so naturalistically, as embodied agents traversing a virtual research environment populated with 3D objects.

"First year of Astounding Stories (now Analog) sci-fi magazine, in Longhand"

Usage

Customize the "declarations" and "input/output" lines in Longhand_notebook.ipynb to generate "objects" documents in the target working directory. The notebook concludes with a terminal command that launches Blender and runs the first of three Blender Python ("bpy") scripts, Longhand_downloader.py. Once the Blender GUI has launched and the scene is populated with 3D models from the objects.txt document, use the Blender text editor to run Longhand_aligner.py and Longhand_exporter.py, in that order, to generate a binary .GLB file, which includes model textures.
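For orientation, the launch step at the end of the notebook might look roughly like the sketch below. It assumes a `blender` executable on the system PATH and that the script sits in the working directory; the helper names are hypothetical, and the actual command in the notebook may differ. Blender's `--python` command-line option runs the given script after startup.

```python
import shlex
import subprocess


def blender_command(script_path, blender_exe="blender"):
    """Build the argument list that opens the Blender GUI and runs one script.

    Hypothetical helper; blender_exe is assumed to be resolvable on PATH.
    """
    # --python tells Blender to execute the given script file after startup.
    return [blender_exe, "--python", str(script_path)]


def launch_blender(script_path, blender_exe="blender"):
    """Start Blender without blocking the calling notebook cell."""
    return subprocess.Popen(blender_command(script_path, blender_exe))


# Show the equivalent terminal command without actually starting Blender.
cmd = blender_command("Longhand_downloader.py")
print(shlex.join(cmd))
```

Calling `launch_blender("Longhand_downloader.py")` would then open the GUI, after which the aligner and exporter scripts are run by hand from the Blender text editor as described above.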

Usage Notes:

  • Lists, inventories, genre fiction, and recipes are all ideal content for Longhand visualization because they are not centered on inherently ambiguous language. Conversely, emotionally grounded literature and fiction (for example) are inherently ambiguous with regard to visual comprehension. These materials may benefit from a (now-deprecated) named entity recognition (NER) branch of Longhand.

Benefits

  • Supports “raw” text input data
  • Represents tokens as 3D objects, which are recognizable in a cluttered scene and from novel perspectives.
  • Leverages existing asset collection (Sketchfab) as an object dictionary
  • Exposes text-centric fields to the benefits of XR, like:
    • Volumetric (i.e. 3D) representation space, with depth cues
    • Tracked head-mounted displays (HMDs) allow for highly intuitive, embodied interfacing

Next Steps

  • Vision science-based object positioning
  • Model collision detection
  • Collect/deploy image covers
  • 100MB max, automatic decimation
  • Streamlit deployment
  • Text-to-3D (AI) for Longhand 2.0
    • e.g. Stable Dreamfusion
  • DearPyGUI-based interfacing

Core Technologies

Further Reading

Early Test Images

army_cookbook

"Manual for Army Cooks" - United States War Department, 1896

diary entries1

The personal diaries of one J.R. Coolidge - 1921

UFOs

Esotericism and UFO Research - Blomquist, 2017

Evidence Locker

"McLennan County Sheriff's Department Inventory of Evidence Locker" - 2021

Kitchen Science

"Science in the Kitchen" - Kellogg, 1904

Religion

"Comparative Religion and the Religion of the Future" - Martin, 1926

Matt Cook, mncook.net - 2022

About

License: MIT License


Languages

Python 54.8%, Jupyter Notebook 45.2%