petrobras / waid

Introduction

The “Wellbore Acoustic Image Database” (WAID) project is part of PETROBRAS' efforts to promote innovation worldwide in the Oil and Gas industry.

The WAID project belongs to the PETROBRAS program Conexões para Inovação - Módulo Open Lab (Connections for Innovation - Open Lab Module) and is available on the PETROBRAS Reservoir GitLab repository.

Motivation

The “Wellbore Acoustic Image Database” (WAID) aims to promote the development of applications based on Machine Learning, particularly Deep Learning, for automating tasks related to interpreting acoustic image logs representing the wellbore surface. Such solutions involve the segmentation of structures, filling of voids in the image, event detection and generation of new synthetic data, among others.

The WAID repository contains a dataset composed of image data with associated conventional open-hole log data, together with a set of Jupyter notebooks for basic handling and early exploration of the data.

Strategy

The “Wellbore Acoustic Image Database” project belongs to the PETROBRAS program Conexões para Inovação - Módulo Open Lab. This is an open project composed of the following parts:

  • A dataset composed of acoustic image data from 7 wells with associated conventional open-hole logs;
  • A set of Jupyter notebooks, written in Python, for basic handling and visualization of this data.

Our strategy is to make these resources available to the global community and develop the WAID project collaboratively.

Ambition

Acoustic image logs are a class of logging acquisition that produces wellbore images with rich, intuitive geological features for human experts to analyze. However, they consist of very high-resolution measurements and carry considerably more information than conventional open-hole logs. For this reason, extracting information from them demands time-consuming routines from petrophysicists.

This high information density makes acoustic images particularly well suited to benefit from artificial intelligence-based techniques. Incorporating such techniques will allow petrophysical interpreters to speed up routine procedures, discover new applications, and extract more knowledge from this data source.

With this project, PETROBRAS intends to foster the incorporation of ML/AI techniques not only to speed up routine procedures but especially to promote:

  • the development of new methods to improve classical applications such as identifying and discriminating geological structures (like fractures or vugs) from artifacts (like breakouts) and petrophysical parameter estimation;
  • the development of new applications based on image logs;
  • the discovery of new knowledge extracted from image logs.

Contributions

We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.

Before you can contribute to this project, you need to read and agree to the following documents:

It is also very important to follow and participate in the discussions; see the Discussions section.

Licenses

All the code of this project is licensed under the Apache 2.0 License and all dataset data files (CSV files in the subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.

Datasets

All datasets comprise '.csv' files whose values are expressed in Brazilian numeric format (i.e., the decimal symbol is a comma ',' and the column separator is a semicolon ';').

Acoustic amplitude image dataset

To load the acoustic amplitude image data from a given well (for example, TATU-22), one can use the following Pandas command:

img_data = pd.read_csv('tatu22_IMG.csv',
                        sep = ';',
                        decimal = ',',
                        ...)

Due to size restrictions on files uploaded to GitHub, the original image CSV files had to be split into several subfiles. We provide a Python function to reassemble them:

import os

import numpy as np
import pandas as pd


def concat_IMG_data(well_id, data_path):
    # Due to file size limitations, the original AMP '.csv' file
    # has been split into several sub-files.
    # concat_IMG_data() concatenates them back into a single data
    # object and returns 'image_df', a Pandas dataframe indexed by
    # DEPTH and whose columns are the azimuthal coordinates of the
    # AMP image log.

    # Name of the initial '00' file
    initial_file = well_id + "_AMP00.csv"

    # Read the initial file to capture header information
    initial_file_path = os.path.join(data_path, initial_file)
    image_df = pd.read_csv(initial_file_path, sep=';',
                           index_col=0,
                           na_values=-9999, na_filter=True,
                           decimal=',',
                           skip_blank_lines=True).dropna()

    # Read and append data from the remaining AMP sub-files;
    # iterating in sorted order keeps the depth index monotonic.
    for file in sorted(os.listdir(data_path)):
        if file.startswith(well_id + "_AMP") and file != initial_file:
            file_path = os.path.join(data_path, file)
            df_temp = pd.read_csv(file_path, sep=';',
                                  header=None, index_col=0,
                                  na_values=-9999, na_filter=True,
                                  decimal=',', skip_blank_lines=True,
                                  dtype=np.float32
                                  ).dropna()

            # Adjust the temporary df's header to match the image header
            df_temp.columns = image_df.columns

            # Concatenate the dataframes
            image_df = pd.concat([image_df, df_temp])
    return image_df

After defining img_data_path and well_identifier, the above function returns the well image log data in a single Pandas dataframe, for example:

# Whole image data
img_data = concat_IMG_data(well_identifier, img_data_path)
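
For a quick visual check of the resulting dataframe, the image can be rendered with matplotlib; this is only a minimal sketch, and the 0-360 degree azimuth range and the color map are illustrative assumptions rather than dataset conventions:

import matplotlib.pyplot as plt

# Display the amplitude image with depth increasing downwards and the
# azimuthal columns spread along the horizontal axis.
fig, ax = plt.subplots(figsize=(4, 10))
im = ax.imshow(img_data.values,
               aspect='auto',
               cmap='viridis',
               extent=[0, 360,                    # assumed azimuth range (deg)
                       img_data.index.max(),      # deepest sample at the bottom
                       img_data.index.min()])
ax.set_xlabel('Azimuth (deg)')
ax.set_ylabel('Depth')
fig.colorbar(im, ax=ax, label='Acoustic attenuation (dB)')
plt.show()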

Basic logs dataset

To load basic log data from a given well (for example, TATU-22), one can use the following Pandas command:

bsc_data = pd.read_csv('tatu22_BSC.csv',
                        sep = ';',
                        decimal = ',',
                        ...)

The chosen nomenclature is as follows:

  • for image data files: <well_name>_AMP.csv (AMP comes from the amplitude of the acoustic signal captured by the imaging tool. The values in the file express acoustic attenuation measures in dB.)
  • for basic logging data files: <well_name>_BSC.csv (BSC comes from the word basic). The basic curves present in the basic dataset are:
    • Caliper (CAL)
    • Gamma Ray (GR)
    • Bulk Density (DEN)
    • Neutron Porosity (NEU)
    • Sonic Compressional Slowness (DTC)
    • Sonic Shear Slowness (DTS)
    • Photoelectric Factor (PE)
    • NMR Total Porosity (nmrPhiT)
    • NMR Effective Porosity (nmrPhie)
    • NMR Permeability (nmrPerm)
    • NMR Free Fluid (nmrFF)
    • Shallow Formation Resistivity (RES10)
    • Deep Formation Resistivity (RES90)

It is important to highlight that the caliper log is often used as a data quality indicator.
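
As an illustration of that use, the sketch below loads the basic logs and flags depths where the hole appears enlarged. The 'CAL' mnemonic comes from the list above, the read_csv parameters follow the earlier examples, and the nominal bit size, tolerance and units are placeholder assumptions that must be adjusted per well:

import pandas as pd

# Load the basic logs (parameters follow the examples above; -9999 is
# assumed to be the null value, as in the image-loading function).
bsc_data = pd.read_csv('tatu22_BSC.csv', sep=';', decimal=',',
                       index_col=0, na_values=-9999)

# Placeholder values: replace with the actual bit size of the well.
NOMINAL_BIT_SIZE_IN = 8.5
TOLERANCE_IN = 0.5

# Depth intervals where the hole is enlarged (possible washouts) and
# where log and image quality may therefore be degraded.
enlarged = bsc_data[bsc_data['CAL'] > NOMINAL_BIT_SIZE_IN + TOLERANCE_IN]
print(f"{len(enlarged)} samples with caliper above "
      f"{NOMINAL_BIT_SIZE_IN + TOLERANCE_IN:.1f} in")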

Missing values

Some isolated curve values, or even an entire curve (the DTS curve in the COALA-88 well), are missing in some of the wells. We encourage users to try statistical and machine learning imputation techniques to impute both the missing values and the missing curves.
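
As a starting point for the simpler case of isolated gaps, the sketch below applies scikit-learn's IterativeImputer to the basic curves; the file name, null value and imputer settings are illustrative assumptions, not project recommendations:

import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Load the basic logs of a well (file name and parameters follow the
# examples above; -9999 is assumed to be the null value).
bsc_data = pd.read_csv('tatu22_BSC.csv', sep=';', decimal=',',
                       index_col=0, na_values=-9999)

# Keep only curves with at least some observed samples; an entirely
# missing curve (e.g. DTS in COALA-88) is better handled by a model
# trained on wells where that curve exists.
observed = bsc_data.dropna(axis=1, how='all')

# Fill isolated gaps using the statistical relationships
# between the remaining curves.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(observed),
                       index=observed.index,
                       columns=observed.columns)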

Jupyter Notebooks (Python)

To illustrate the potential of the dataset, the Jupyter notebook plot_segment_acoustic_image.ipynb is provided, showing basic handling of the image log data and an application of image segmentation based on amplitude value thresholds.
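
For reference, the core idea of threshold-based segmentation can be reproduced in a few lines on the img_data dataframe built above; this is only a sketch, and the threshold value is an arbitrary placeholder rather than the one used in the notebook:

import numpy as np

# Arbitrary placeholder threshold in dB; the notebook derives its own value(s).
AMP_THRESHOLD = 20.0

# Binary mask: pixels whose attenuation exceeds the threshold are
# treated as candidate features, the rest as background.
segmentation_mask = (img_data.values > AMP_THRESHOLD).astype(np.uint8)

# Fraction of the image classified as "feature" at this threshold.
print(f"Segmented fraction: {segmentation_mask.mean():.1%}")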

Published work using WAID data

In this section, we aim to keep an updated list of published papers (from journals or conferences) and other academic/technical works that have used data from this database.

  • Rewbenio A. Frota, Marley M. B. R. Vellasco, Guilherme A. Barreto and Candida M. de Jesus, "Heteroassociative Mapping with Self-Organizing Maps for Probabilistic Multi-output Prediction", 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-6, DOI: 10.1109/IJCNN60899.2024.10650225.
  • Frota, Rewbenio A., Barreto, G.A., Vellasco, Marley M.B.R., de Jesus, Candida M. (2024). "New Cloth Unto an Old Garment: SOM for Regeneration Learning". In: Villmann, T., Kaden, M., Geweniger, T., Schleif, FM. (eds) Advances in Self-Organizing Maps, Learning Vector Quantization, Interpretable Machine Learning, and Beyond. WSOM+ 2024. Lecture Notes in Networks and Systems, vol 1087. Springer, Cham. DOI: 10.1007/978-3-031-67159-3_1
