HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

Home Page: https://labelstud.io

Question about Label Studio DB

jrobbl opened this issue

Maybe a naive question, but since Label Studio is a web application, I was wondering whether the files I load into a project are stored locally or on an external server. I'd also like to ask what the DB structure is.

Thanks a lot!

Hey @jrobbl, thanks for the question.

Questions like this are perfect for our Discourse.

To answer your question:

Label Studio can handle data storage in different ways depending on your configuration:

  • Local Storage: You can store files locally on the server where Label Studio is running. This is typically used for smaller projects or testing purposes. To set this up, you need to configure the LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT and LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED environment variables. More details can be found in the Label Studio documentation.
  • Cloud Storage: For larger projects or when dealing with sensitive data, it's recommended to use cloud storage solutions like AWS S3, Google Cloud Storage, or Azure Blob Storage. Label Studio generates pre-signed URLs to access this data, so the data does not reside on the Label Studio server. This setup allows you to keep your data secure and accessible only through encrypted connections.
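
As a rough illustration of the local storage option, here is a minimal Python sketch that builds the two environment variables named above and a task that references a file by URL instead of embedding it. The document root path, the `image` data key, and both helper functions are made up for the example, and the `/data/local-files/?d=` URL form is my understanding of how local files are referenced, so please check it against the docs for your version.

```python
import os
from urllib.parse import quote

# Hypothetical path: replace with the directory that holds your files.
DOCUMENT_ROOT = "/home/user/label-studio-data"


def local_files_env(document_root: str) -> dict:
    """Return the two environment variables that enable local file serving."""
    return {
        "LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED": "true",
        "LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT": document_root,
    }


def local_file_task(relative_path: str, data_key: str = "image") -> dict:
    """Build a task that points at a file under the document root by URL.

    Assumption: local files are addressed as /data/local-files/?d=<path>,
    and `data_key` matches the variable used in your labeling config.
    """
    return {"data": {data_key: "/data/local-files/?d=" + quote(relative_path)}}


# Merge the settings into a copy of the current environment.
env = {**os.environ, **local_files_env(DOCUMENT_ROOT)}
task = local_file_task("images/cat 01.jpg")
print(task)
```

The resulting `env` dictionary could be passed to whatever starts Label Studio (for example via `subprocess.run(..., env=env)`), and the task only carries a link to the file, not the file itself.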

Regarding the database structure:

  • SQLite: By default, Label Studio uses SQLite for storing project data and configurations. This is suitable for small projects and demos. All data is stored in a single file in the specified directory of the admin user.
  • PostgreSQL: For larger projects with more than 100,000 tasks or more than five users, it's recommended to use PostgreSQL. This setup provides better performance and scalability. You can configure PostgreSQL by setting the appropriate environment variables (POSTGRE_NAME, POSTGRE_USER, POSTGRE_PASSWORD, POSTGRE_PORT, POSTGRE_HOST).
  • Data Storage: Project settings and configuration details are stored in Label Studio's internal database. Input data (texts, images, audio files) is hosted by external data storage and provided to Label Studio using URI links. The data is not stored in Label Studio directly; it is retrieved client-side only. Project annotations are stored in the internal database and can optionally be stored in a local file directory, a Redis database, or cloud storage buckets.
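
To make the PostgreSQL option a bit more concrete, here is a minimal Python sketch that collects the environment variables listed above into one place. The `postgres_env` helper and the connection values are hypothetical, and the extra `DJANGO_DB=default` setting is an assumption on my side (it is not mentioned above), so verify it against the documentation before relying on it.

```python
import os


def postgres_env(name: str, user: str, password: str,
                 host: str = "localhost", port: int = 5432) -> dict:
    """Return environment variables for a PostgreSQL-backed Label Studio."""
    return {
        # Assumption: this selects the PostgreSQL backend instead of SQLite.
        "DJANGO_DB": "default",
        "POSTGRE_NAME": name,
        "POSTGRE_USER": user,
        "POSTGRE_PASSWORD": password,
        "POSTGRE_PORT": str(port),  # environment values must be strings
        "POSTGRE_HOST": host,
    }


# Placeholder credentials for illustration only.
db_env = postgres_env(name="labelstudio", user="ls_user", password="change-me")

# Merge into a copy of the current environment without mutating os.environ.
full_env = {**os.environ, **db_env}

# Print the settings with the password masked.
for key in sorted(db_env):
    value = "***" if key == "POSTGRE_PASSWORD" else db_env[key]
    print(f"{key}={value}")
```

With this in place, `full_env` can be handed to the process that launches Label Studio, so the database settings stay out of the shell profile.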

For more detailed information, you can refer to the Label Studio Security Guide.

Feel free to move the discussion over to Discourse or our Slack!