ianyu93 / datahour-gen-ai-nlp-chaining

Data Hour Gen AI NLP Chaining

Overview

This repository contains the code for the DataHour session on AI NLP Chaining, tailored for beginner to intermediate practitioners. This README documents concepts used in the repository that the session did not cover. See:

NLP

Basics

This session briefly touched on Huggingface Tasks. To further your NLP understanding, it's recommended to go through these three courses:

DevOps Related

Dev Containers

This repository was developed with dev containers, which let you develop inside a containerized environment and keep the development environment consistent across machines.

To use dev containers on your machine, you need to have Docker installed. You can install Docker here. To get started, watch this YouTube video: Get Started with Dev Containers in VS Code.

You can also develop from your browser with GitHub Codespaces. You can learn more about Codespaces here. You can also watch this YouTube video: Codespaces configuration with dev containers.

If you're unfamiliar with Docker, you can watch the following YouTube videos:

GitHub Actions and Workflows

This repository has a single GitHub Actions workflow, located in .github/workflows/requirements.yml. It exports the Python dependencies to requirements.txt via Poetry, which this repository uses for dependency management. Streamlit Cloud also uses this requirements.txt to deploy the app. We export with the --with dev option so that dev dependencies are included for Codespaces; this is not needed for production.
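
To reproduce the export step locally, here is a minimal Python sketch. It assumes Poetry is on your PATH with its export command available, and that you run it from the project root. The helper names build_export_command and export_requirements are hypothetical and are not part of this repository; the actual workflow file may use different flags.

```python
import subprocess


def build_export_command(include_dev: bool = True) -> list[str]:
    """Build a `poetry export` command that writes requirements.txt."""
    command = [
        "poetry", "export",
        "--format", "requirements.txt",
        "--output", "requirements.txt",
    ]
    if include_dev:
        # Also export the "dev" dependency group (useful for Codespaces,
        # not needed for production).
        command += ["--with", "dev"]
    return command


def export_requirements(include_dev: bool = True) -> None:
    """Run the export; needs Poetry installed and a pyproject.toml in the cwd."""
    subprocess.run(build_export_command(include_dev), check=True)
```

Calling export_requirements(include_dev=False) would produce a production-only requirements.txt, while the default mirrors the --with dev behaviour described above.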

This workflow relates to Continuous Integration and Continuous Deployment (CI/CD), a DevOps practice that automates building, testing, and deploying software. You can learn more about CI/CD here, or watch this YouTube video: The IDEAL & Practical CI / CD Pipeline - Concepts Overview.

Due to time constraints, this repository doesn't include common workflows, but some of the things you'd want to think about include:

  • Linting
  • Testing
  • Deployment

You can learn more about GitHub Actions here. You can also watch this YouTube video: GitHub Actions Tutorial - Building Your First CI/CD Pipeline.

Post Create Commands

In the dev container, we use postCreateCommand to install the packages from requirements.txt globally in the container. If you'd rather keep developing with Poetry, follow Using Python and Poetry inside a Dev Container and use a post-create script that sets up the Poetry environment instead.
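
To make the shape of that setting concrete, the sketch below builds a minimal devcontainer.json in Python and prints it. The name, the base image, and the exact install commands are assumptions for illustration and may differ from this repository's actual configuration. The Poetry variant also assumes Poetry is already installed in the image.

```python
import json


def build_devcontainer_config(use_poetry: bool = False) -> dict:
    """Return a minimal devcontainer.json payload as a Python dict."""
    if use_poetry:
        # Assumed alternative: let Poetry create and populate its environment.
        post_create = "poetry install"
    else:
        # Assumed default: install the exported requirements with pip.
        post_create = "pip install -r requirements.txt"
    return {
        "name": "datahour-example",  # hypothetical container name
        "image": "mcr.microsoft.com/devcontainers/python:3.11",  # assumed base image
        # Runs once, after the container has been created.
        "postCreateCommand": post_create,
    }


# Print the JSON you would place in .devcontainer/devcontainer.json.
print(json.dumps(build_devcontainer_config(), indent=2))
```

Passing use_poetry=True swaps only the postCreateCommand value, which is the single setting that changes between the two approaches in this sketch.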

Streamlit

Basics

Streamlit is a Python library that allows you to build interactive web apps. You can learn more about Streamlit here. You can also watch this YouTube video: Build 12 Data Science Apps with Python and Streamlit - Full Course.

A common alternative is Gradio, another Python library for building interactive web apps. You can learn more about Gradio here.

Personally, I find Gradio the fastest choice for a simple input-and-output demo, while Streamlit offers more customization options. That said, neither is a good option for serious web development. Both are popular among data scientists and machine learning practitioners because they make quick demos easy to build.

Deployment

Streamlit Cloud is a service that allows you to deploy your Streamlit app. You can learn more about Streamlit Cloud here. Read documentation:

You can also deploy it on Huggingface Spaces, a service for hosting machine learning demo apps, including Streamlit apps. You can learn more about Huggingface Spaces here.

Advanced Features

Here are some key features worth mentioning:

About

License: MIT License

