RamySaleem / GEOSCIENTIFIC-WORD-EMBEDDINGS-on-CONFERENCE-ABSTRACTS

Analysing Unstructured Geosciences Data for a Changing World.

Introduction

This study was carried out within the framework of understanding how natural language processing (NLP) can help in the geosciences. The project investigates a set of geoscientific word embeddings and identifies the term most similar to each of five given terms; these terms included Salt, Ghost, Gather and Elastic. It also computes the term nearest to the result of several word-vector arithmetic (analogy) queries: P-wave - compressional + shear, seal - mudstone + sandstone, PSTM - time + depth and, finally, Kirchhoff - ray + wavefield.
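Each vector query above is a word analogy. The vector of the first term, minus the second, plus the third, is compared with every other word in the vocabulary, and the closest word by cosine similarity is reported. The sketch below illustrates that arithmetic in plain Python. It assumes hand-made three-dimensional toy vectors: `TOY_VECTORS` and the helper names are hypothetical and are not the project's trained embeddings. With a trained gensim model, a comparable query would be along the lines of `model.wv.most_similar(positive=["p-wave", "shear"], negative=["compressional"])`, although the exact token spellings depend on how the corpus was preprocessed.

```python
import math

# Hypothetical toy 3-dimensional vectors, chosen by hand purely for illustration.
# They are NOT embeddings trained on the geoscience corpus.
TOY_VECTORS = {
    "p-wave":        [0.9, 0.1, 0.8],
    "compressional": [1.0, 0.0, 0.1],
    "shear":         [0.0, 1.0, 0.1],
    "s-wave":        [0.1, 0.9, 0.8],
    "salt":          [0.2, 0.2, -0.9],
}


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def analogy(vectors, positive, negative, topn=1):
    """Rank words by cosine similarity to sum(positive) - sum(negative).

    Inputs are unit-normalised first, and the query words themselves are
    excluded from the ranking.
    """
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, normalize(vectors[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, normalize(vectors[word]))]
    excluded = set(positive) | set(negative)
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in excluded]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# P-wave - compressional + shear: with these toy vectors the top-ranked word is "s-wave"
print(analogy(TOY_VECTORS, positive=["p-wave", "shear"], negative=["compressional"]))
```

Normalising each input before adding and subtracting keeps one long vector from dominating the query. Excluding the query words matters because the nearest neighbour of `p-wave - compressional + shear` is often one of its own inputs.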

Dataset

The dataset consists of summaries of geoscience conference abstracts and journal papers. It was loaded using an access token stored in an environment variable.
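The README does not say which variable name or data service was used. The sketch below therefore shows only the general pattern of keeping the token out of the notebook: read it from the environment at run time and attach it to the request. The variable name `GEO_ABSTRACTS_TOKEN`, the placeholder URL and the `Bearer` authorisation scheme are all assumptions.

```python
import os
import urllib.request

# Hypothetical names: neither the variable name nor the URL comes from the project.
TOKEN_ENV_VAR = "GEO_ABSTRACTS_TOKEN"
DATA_URL = "https://example.com/geoscience-abstracts.json"


def read_token(env_var=TOKEN_ENV_VAR):
    """Return the access token stored in env_var, failing loudly if it is unset."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before loading the data.")
    return token


def build_request(url=DATA_URL, env_var=TOKEN_ENV_VAR):
    """Build (but do not send) an HTTP request that carries the token in a header.

    The "Bearer" scheme is an assumption; the real service may expect another format.
    """
    token = read_token(env_var)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Usage (after `export GEO_ABSTRACTS_TOKEN=...` in the shell):
#   with urllib.request.urlopen(build_request()) as response:
#       raw = response.read()
```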

Generalised Workflow

To generate word embeddings from geoscientific texts, use the following workflow:

  1. Read in the corpus (geoscientific text).
  2. Perform any necessary preprocessing of the corpus.
  3. Compute the word vectors.
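Steps 1 and 2 can be sketched with the Python standard library alone. This is a minimal illustration that assumes a simple regex tokeniser, a small hypothetical stop-word list and a `min_count` vocabulary cut-off. The sample sentences are invented, and the project's actual preprocessing choices are not documented here. For step 3, token lists like these are the kind of input a library such as gensim accepts, for example `Word2Vec(sentences, sg=1)` for a skip-gram model trained on a real corpus.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the project's actual preprocessing is not documented.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "with", "on", "by"}

# Keep hyphenated terms such as "p-wave" together as single tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def read_corpus(texts):
    """Step 1: read the corpus, skipping empty documents."""
    return [t for t in texts if t.strip()]


def preprocess(docs, min_count=1):
    """Step 2: lowercase, tokenise, drop stop words and rare tokens.

    Returns the filtered token lists and the token counts (before filtering).
    """
    tokenised = [
        [tok for tok in TOKEN_RE.findall(doc.lower()) if tok not in STOP_WORDS]
        for doc in docs
    ]
    counts = Counter(tok for doc in tokenised for tok in doc)
    filtered = [[tok for tok in doc if counts[tok] >= min_count] for doc in tokenised]
    return filtered, counts


# Invented stand-in documents, not taken from the dataset.
abstracts = [
    "The P-wave velocity of the salt body is high.",
    "Shear-wave splitting in the salt overburden.",
    "",
]
sentences, counts = preprocess(read_corpus(abstracts), min_count=1)
print(sentences)
```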

Conclusions

  1. Word-level text analysis can be valuable for understanding and interpreting knowledge transfer.
  2. Numerical analysis of word and sentence lengths can help to highlight misunderstandings across disciplines efficiently.
  3. The analysis can be focused on user-defined terms, which allows data strategies to be tested with the aim of reducing overall model uncertainty.
  4. The automated approach provides a quick, easy and user-controlled assessment of the language used in the geological and geophysical (G&G) disciplines.
  5. The skip-gram approach performs better for analysing G&G language data.
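The skip-gram model mentioned in the last conclusion learns by predicting each surrounding (context) word from a target word. The sketch below shows only how those training pairs are generated from a tokenised sentence, assuming a fixed symmetric window. Real word2vec implementations typically also shrink the window at random, subsample frequent words and train with negative sampling or a hierarchical softmax; none of that is shown here. The function name is hypothetical.

```python
def skipgram_pairs(sentence, window=2):
    """Generate (target, context) pairs for skip-gram training.

    Each word in the sentence acts as a target, and every other word within
    +/- window positions of it becomes a context word to be predicted.
    """
    pairs = []
    for i, target in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs


print(skipgram_pairs(["salt", "diapir", "seal"], window=1))
# [('salt', 'diapir'), ('diapir', 'salt'), ('diapir', 'seal'), ('seal', 'diapir')]
```

Because every target yields one pair per context word, infrequent terms generate several training examples each. This is one commonly cited reason why skip-gram tends to handle rare, domain-specific vocabulary well.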

Future work

The project could be compared with the Continuous Bag of Words (CBOW) model. The CBOW architecture tries to predict the current target word (the centre word) from the source context words (the surrounding words), whereas the skip-gram model does the reverse and predicts the context words from the target word. https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-cbow.html
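For the proposed comparison, CBOW training examples run in the opposite direction to skip-gram pairs. The context words around a position become the input, and their vectors are averaged or summed, while the centre word is the prediction target. The hypothetical helper below generates those (context, target) examples, assuming a fixed symmetric window. In gensim, the choice between the two architectures is the `sg` parameter of `Word2Vec` (`sg=0` for CBOW, `sg=1` for skip-gram).

```python
def cbow_examples(sentence, window=2):
    """Generate (context_words, target) examples for CBOW training.

    The words within +/- window positions of each centre word form the
    input context; the centre word itself is the prediction target.
    """
    examples = []
    for i, target in enumerate(sentence):
        context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


print(cbow_examples(["salt", "diapir", "seal"], window=1))
# [(['diapir'], 'salt'), (['salt', 'seal'], 'diapir'), (['diapir'], 'seal')]
```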

References

  1. What is the shear equivalent of a P-wave? https://www.earthdoc.org/docserver/fulltext/fb/38/7/fb2020051.pdf?expires=1619893516&id=id&accname=fromqa190&checksum=9E55711AF8CF1D67250F04B959D084CD

  2. Geoscientific word embeddings: https://github.com/cebirnie92/KAUST-Iraya_SummerSchool2021

By Ramy Abdallah

About

License: GNU General Public License v3.0


Languages

Language: Jupyter Notebook 100.0%