aarong1 / open-carbon-viz

Building LLMs on top of Verra project design documents

open-carbon-viz

Description

open-carbon-viz is an open-source project for generating interactive visualizations of the voluntary carbon markets. It combines web scraping, Streamlit, and large language models (LLMs), namely OpenAI's GPT-3.5 Turbo and Anthropic's Claude-2, to analyze and rate carbon offset projects listed in the Verra Registry.

Screenshot of Streamlit Demo 2

Screenshot of Streamlit Demo 1

Features

  • Data Extraction: The fetch_and_rate.py script fetches project details and Project Design Document (PDD) PDFs directly from the Verra registry.

  • PDF Analysis: The tool evaluates PDDs, extracting valuable insights to assess and rate different aspects of each project.

  • OpenAI Integration: The fetch_and_rate.py script uses OpenAI's GPT-3.5 Turbo model to interpret and analyze fetched JSON data.

  • Anthropic Integration: Anthropic's Claude-2 model critically rates each section of the PDD and scores it out of 10.

  • Interactive Visualizations: The visualisation.py Streamlit app provides an intuitive interface for visualizing and understanding the analyzed data. It offers various visualizations including pie charts, bar charts, scatter plots, heatmaps, and tables, along with multiple filters for selective data exploration.

Leveraging Language Models for Project Rating

The open-carbon-viz project uses large language models (LLMs), OpenAI's GPT-3.5 Turbo and Anthropic's Claude-2, to generate ratings from the Project Design Document (PDD) PDFs obtained from the Verra Registry. Both models are trained on large text corpora, which lets them read long-form project documentation and produce written assessments of it.

Understanding the Process

  1. PDF Analysis: The fetch_and_rate.py script fetches the PDD PDFs from the Verra Registry. These PDFs contain detailed information about the carbon offset projects. The script extracts textual data from the PDFs using techniques such as optical character recognition (OCR).

  2. Interpreting the Data: The extracted textual data is then fed into the GPT-3.5 Turbo LLM. The LLM uses natural language processing capabilities to interpret and understand the contents of the PDD. It identifies key sections and extracts important information from the text.

  3. Rating Criteria: The LLM is prompted to analyze each section of the PDD against predefined rating criteria. For example, the project details section may be evaluated for completeness, inclusion of key details, and accuracy of information. The safeguards section may be assessed for adherence to environmental impact assessments and stakeholder consultation protocols.

  4. Scoring and Comments: The LLM assigns a score to each section based on its analysis. These scores reflect the quality and compliance level of each aspect. Additionally, the LLM generates comments that provide insights and suggestions for improvement in areas where the projects may fall short.

  5. Anthropic Integration: In addition to the ratings generated by the GPT-3.5 Turbo LLM, the project incorporates Anthropic's Claude-2 model. Claude-2 critically evaluates each section of the PDD and assigns a score out of 10 based on a rigorous analysis.

  6. Overall Score and Comments: The individual section scores are combined to calculate an overall score for the project. This score provides an overall assessment of the project's quality and compliance. Along with the overall score, the LLM-generated comments help in understanding the strengths and weaknesses of the project.
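
The rating steps above can be sketched in a few functions. This is a minimal illustration, not the code in fetch_and_rate.py: the criteria dictionary, the prompt wording, the "Score: n/10" reply format, and the unweighted mean used for the overall score are all assumptions made for the example, and the call to the model itself is left out.

```python
import re
from statistics import mean

# Hypothetical criteria, paraphrased from the examples in the text above.
CRITERIA = {
    "project_details": "completeness, inclusion of key details, accuracy",
    "safeguards": "environmental impact assessment, stakeholder consultation",
}

# Assumed reply format: "Score: <n>/10" followed by a free-text comment.
SCORE_RE = re.compile(r"Score:\s*(\d+(?:\.\d+)?)\s*/\s*10", re.IGNORECASE)


def build_prompt(section: str, text: str) -> str:
    """Compose a rating prompt for one PDD section."""
    return (
        f"Rate the '{section}' section of this project design document "
        f"out of 10 on: {CRITERIA[section]}.\n"
        "Reply in the form 'Score: <n>/10' followed by a short comment.\n\n"
        f"{text}"
    )


def parse_reply(reply: str) -> tuple[float, str]:
    """Extract the numeric score and the trailing comment from a model reply."""
    match = SCORE_RE.search(reply)
    if match is None:
        raise ValueError("no 'Score: n/10' found in reply")
    score = float(match.group(1))
    if not 0 <= score <= 10:
        raise ValueError(f"score {score} is outside 0-10")
    return score, reply[match.end():].strip()


def overall_score(section_scores: dict[str, float]) -> float:
    """Combine section scores with an unweighted mean (an assumed rule)."""
    return round(mean(section_scores.values()), 2)
```

Parsing the reply strictly, and raising when the expected pattern is absent, keeps malformed model output from silently entering the overall score.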

Benefits and Considerations

Using LLMs for generating project ratings offers several benefits:

  • Efficiency: The automated analysis provided by LLMs significantly reduces the manual efforts required for evaluating and scoring projects.

  • Consistency: Applying the same predefined rating criteria to every project makes the evaluation more uniform than case-by-case manual review.

  • Insights and Suggestions: The LLM-generated comments provide valuable insights and suggestions for improving project documentation and adherence to standards.

However, it is important to note that LLMs have their limitations. They rely on the data they were trained on and may not capture certain nuances or context-specific considerations. Therefore, it is essential to review and validate the ratings and comments generated by the LLMs to ensure accuracy and informed decision-making.
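
One way to support that review step is to flag ratings that look suspicious before a person reads them. The sketch below is an assumption about how this could be done, not part of the project: the field names score and comment, the 20-character threshold, and the choice to flag extreme scores are all illustrative.

```python
def needs_review(rating: dict) -> list[str]:
    """Return the reasons a rating should be checked by a person."""
    reasons = []
    score = rating.get("score")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        reasons.append("score is missing or not numeric")
    elif not 0 <= score <= 10:
        reasons.append("score outside 0-10")
    elif score in (0, 10):
        reasons.append("extreme score")
    comment = (rating.get("comment") or "").strip()
    if len(comment) < 20:  # arbitrary threshold for this sketch
        reasons.append("comment missing or too short to justify the score")
    return reasons
```

An empty list means none of these simple checks fired. It does not mean the rating is correct, so spot checks of unflagged ratings are still worthwhile.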


Please refer to the sample ratings.csv to see an example of the ratings generated by the LLMs for the projects. These ratings and comments can assist users in understanding the strengths and areas for improvement of each project listed in the Verra Registry.
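
A ratings file of this kind can be summarized with the standard library alone. In the sketch below the column names (project_id, section, score, comment) and the rows are made up for illustration; check the header of the actual ratings.csv before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for ratings.csv; real columns may differ.
SAMPLE = """project_id,section,score,comment
1001,project_details,8,Complete and specific
1001,safeguards,5,Consultation records are thin
1002,project_details,6,Location data missing
"""


def mean_score_by_project(handle) -> dict[str, float]:
    """Average the section scores for each project ID."""
    scores = defaultdict(list)
    for row in csv.DictReader(handle):
        scores[row["project_id"]].append(float(row["score"]))
    return {pid: sum(vals) / len(vals) for pid, vals in scores.items()}


print(mean_score_by_project(io.StringIO(SAMPLE)))
# prints {'1001': 6.5, '1002': 6.0}
```

To read a file on disk instead of the inline sample, pass an open file handle, for example open("ratings.csv", newline="").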

Installation & Setup

  1. Clone the repository:
    git clone https://github.com/ttwj/open-carbon-viz
    
  2. Navigate into the cloned directory:
    cd open-carbon-viz
    
  3. Install the necessary packages:
    pip install -r requirements.txt
    

Fetching Data from Verra Registry

To fetch the data from Verra Registry yourself, follow these steps:

  1. Visit the Verra Registry website at registry.verra.org.
  2. Navigate through the website to search for the specific project details or Project Design Document (PDD) PDFs you are interested in.
  3. Explore the different search options and filters available on the website to narrow down your search.
  4. Once you have identified the project or documents you want to fetch, click on the relevant links to access the details or download the PDFs.
  5. You can save the fetched data or PDFs for further analysis or integration into your own application or project.

Fetching data from the Verra Registry requires compliance with the Verra Registry Terms of Use and any other terms and conditions set by Verra. Review these terms before accessing the data, and use what you fetch responsibly and in accordance with applicable laws and regulations.

Usage

Fetch and Rate

To run the Fetch and Rate script:

    python fetch_and_rate.py

This will generate a CSV file with the fetched data from the PDDs.

Visualization App

To run the Streamlit web app:

    streamlit run visualisation.py

This opens the dashboard in your web browser, where you can interact with it.

Contributing

open-carbon-viz is open-source and contributions are welcome. If you want to contribute to this project, kindly follow these steps:

  1. Fork the project.
  2. Create your feature branch: git checkout -b feature/myFeature
  3. Commit your changes: git commit -m 'Add myFeature'
  4. Push to the branch: git push origin feature/myFeature
  5. Open a Pull Request.

Please make sure to update tests as appropriate.

License

open-carbon-viz is licensed under the MIT license.

Disclaimer

The data obtained from Verra Registry is provided for informational purposes only. The open-carbon-viz project and its contributors do not guarantee the accuracy, completeness, or timeliness of the data. The use of this data is at your own risk. The project does not endorse or promote any specific project or carbon offset listed on Verra Registry. You should conduct your own due diligence and research before making any decisions based on the data obtained from Verra Registry. The open-carbon-viz project and its contributors shall not be liable for any losses, damages, or claims arising out of or in connection with the use of this data.
