xyqyear / osu-beatmap-xplorer

OSU Beatmaps Scraper and API

Overview

This project is a web application that interacts with the OSU API to retrieve information about OSU Beatmaps and stores the data in a SQLite database. Additionally, it exposes an API endpoint for retrieving random beatmaps from the collected data.

The application is designed to run continuously. Every hour it pulls new data from the OSU API so the database stays up to date with the latest available beatmaps. It also checks for duplicates before storing data, so the same beatmap is not stored twice.
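
The hourly update described above can be sketched with a plain asyncio loop. This is only an illustration of the idea, not the project's actual scheduler: run_periodically, fetch_and_store, and the max_runs parameter are hypothetical names introduced for this example.

```python
import asyncio


async def run_periodically(task, interval_seconds, max_runs=None):
    """Await `task()` repeatedly, sleeping `interval_seconds` between runs.

    `max_runs` exists only so the sketch can be stopped in a demo;
    a long-running service would leave it as None.
    Returns the number of runs that were attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await task()
        except Exception as exc:
            # A failed update is reported, and the loop keeps going.
            print(f"update failed: {exc!r}")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs


async def fetch_and_store():
    # Hypothetical placeholder for "query the OSU API and write new rows to SQLite".
    print("updating beatmap database")


if __name__ == "__main__":
    # 3600 seconds matches the hourly schedule; max_runs=1 makes the demo exit at once.
    asyncio.run(run_periodically(fetch_and_store, interval_seconds=3600, max_runs=1))
```

Catching exceptions inside the loop means a single failed update does not stop later updates, which is one possible way to approach the "don't crash the program" item in the TODO list.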

Key Features

  1. Scraping OSU API: The application authenticates and communicates with the OSU API to fetch details about the beatmaps.

  2. Storing Data: Fetched data is stored in a SQLite database for persistence. The application uses a structured data model that includes relevant information about the beatmaps.

  3. Periodic Updates: The application automatically re-fetches data from the OSU API every hour to update the database with any new beatmaps that have been added.

  4. Exposing API: The application provides a RESTful API that returns a set of random beatmaps from the database. The number of beatmaps to be returned can be specified in the request.
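
As a rough illustration of the duplicate check mentioned above, one common SQLite approach is to make the beatmap id the primary key and use INSERT OR IGNORE. The schema below is an assumption made for this sketch (only the table name beatmaps and the field names difficulty_rating and bpm appear in this README), and store_beatmaps is a hypothetical helper, not a function from the project.

```python
import sqlite3


def store_beatmaps(conn, beatmaps):
    """Insert beatmaps, skipping ids that are already stored.

    `beatmaps` is an iterable of dicts with the keys id, difficulty_rating, bpm.
    Returns the number of newly inserted rows.
    """
    # Assumed, simplified schema; the real data model may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS beatmaps ("
        " id INTEGER PRIMARY KEY,"
        " difficulty_rating REAL,"
        " bpm REAL)"
    )
    before = conn.total_changes
    # OR IGNORE silently skips rows whose primary key already exists.
    conn.executemany(
        "INSERT OR IGNORE INTO beatmaps (id, difficulty_rating, bpm)"
        " VALUES (:id, :difficulty_rating, :bpm)",
        beatmaps,
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [
        {"id": 1, "difficulty_rating": 5.4, "bpm": 180.0},
        {"id": 2, "difficulty_rating": 3.1, "bpm": 140.0},
    ]
    print(store_beatmaps(conn, batch))  # first run inserts both rows
    print(store_beatmaps(conn, batch))  # second run inserts nothing new
```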

Running the Application

To run this application, make sure you have Python 3.11 and Poetry installed. Then, install the dependencies using Poetry:

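poetry install --without dev

(The --no-dev flag was deprecated in Poetry 1.2 in favor of --without dev; on older Poetry versions, use poetry install --no-dev instead.)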

Please note that this script requires a config.yml file in the same directory containing OSU client_id and client_secret for authenticating with the OSU API. Example config file:

client_id: <client_id>
client_secret: <client_secret>

Once the dependencies are installed, you can run the application using the following command:

poetry run serve

Accessing the API

The application API can be accessed by sending a GET request to http://localhost/random_beatmaps/<num_beatmaps>, where <num_beatmaps> is the number of random beatmaps you wish to retrieve.

You can apply filters to your request by sending them in the body of the HTTP request. The body should contain a JSON array of filter criteria, each one being an object following the format { "type": <field>, "compare": <operator>, "value": <value> }. The <field> is the name of the field you want to filter by, <operator> is the comparison operator, and <value> is the value you want to compare the field against.

Supported operators are:

  • = for exact equality,
  • > for greater than,
  • < for less than,
  • >= for greater than or equal to,
  • <= for less than or equal to,
  • ~ for a substring match on text fields.

Filter fields can be divided into three categories:

  • Beatmapset fields: These are fields that belong to the beatmapsets table.
  • Beatmap fields: These are fields that belong to the beatmaps table.
  • Text fields: The special field text matches against several columns of the beatmapsets table at once: 'artist', 'artist_unicode', 'creator', 'source', and 'tags'.

Here are some examples of filters:

[
  { "type": "mode_int", "compare": "=", "value": 0 },
  { "type": "difficulty_rating", "compare": ">", "value": 5 },
  { "type": "difficulty_rating", "compare": "<", "value": 6 },
  { "type": "text", "compare": "~", "value": "maimai" }
]

This will retrieve beatmaps with mode_int exactly 0, difficulty_rating strictly between 5 and 6, and at least one text field containing "maimai".

[{ "type": "bpm", "compare": ">=", "value": 180 }]

This will retrieve beatmaps with bpm greater than or equal to 180.

[{ "type": "text", "compare": "~", "value": "pop" }]

This will retrieve beatmaps where any of the text fields (artist, artist_unicode, creator, source, tags) contains the word "pop".
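
To make the filter format more concrete, here is a minimal sketch of how such a filter list could be turned into a parameterized SQL WHERE clause. It is not the project's implementation. The build_where function, the ALLOWED_FIELDS whitelist, the table each field is mapped to, and the rule that text filters accept only ~ are all assumptions made for this example.

```python
# Hypothetical whitelist mapping filter fields to qualified column names.
ALLOWED_FIELDS = {
    "mode_int": "beatmaps.mode_int",
    "difficulty_rating": "beatmaps.difficulty_rating",
    "bpm": "beatmaps.bpm",
}
# Columns searched by the special "text" field, as listed in this README.
TEXT_COLUMNS = ["artist", "artist_unicode", "creator", "source", "tags"]
COMPARISON_OPERATORS = {"=", ">", "<", ">=", "<="}


def build_where(filters):
    """Return (where_sql, params) for a list of filter dicts.

    Field names and operators are checked against whitelists, and values
    are passed only as bound parameters, never spliced into the SQL text.
    """
    clauses, params = [], []
    for item in filters:
        field, op, value = item["type"], item["compare"], item["value"]
        if field == "text":
            if op != "~":
                raise ValueError("text filters only support '~'")
            ors = " OR ".join(f"beatmapsets.{col} LIKE ?" for col in TEXT_COLUMNS)
            clauses.append(f"({ors})")
            params.extend([f"%{value}%"] * len(TEXT_COLUMNS))
        else:
            if field not in ALLOWED_FIELDS:
                raise ValueError(f"unknown field: {field!r}")
            if op not in COMPARISON_OPERATORS:
                raise ValueError(f"unsupported operator: {op!r}")
            clauses.append(f"{ALLOWED_FIELDS[field]} {op} ?")
            params.append(value)
    # With no filters, "1" is a condition that matches every row.
    return (" AND ".join(clauses) if clauses else "1"), params


if __name__ == "__main__":
    where, params = build_where([
        {"type": "bpm", "compare": ">=", "value": 180},
        {"type": "text", "compare": "~", "value": "pop"},
    ])
    print(where)
    print(params)
```

Whitelisting field names and operators while binding values as parameters is one way to approach the "mitigate SQL injection attacks" item in the TODO list. Note that % and _ inside a text value still act as LIKE wildcards in this sketch.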

To use these filters, send them in the body of your HTTP GET request as a JSON array.

With curl, such a request looks like this:

curl -X GET -H "Content-Type: application/json" -d '[{"type": "bpm", "compare": ">=", "value": 180}, {"type": "text", "compare": "~", "value": "pop"}]' http://localhost/random_beatmaps/50

This will return up to 50 beatmaps that have a bpm of at least 180 and contain "pop" in any of their text fields.
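
The same request can be built from Python using only the standard library. This is a sketch under a few assumptions: build_random_beatmaps_request is a hypothetical helper, the server is taken to be reachable at http://localhost as in the curl example, and the response is assumed to be JSON. Some HTTP clients and proxies drop the body of a GET request, so the explicit method="GET" together with data is deliberate here.

```python
import json
import urllib.request


def build_random_beatmaps_request(base_url, count, filters):
    """Build a GET request that carries the filter list as a JSON body."""
    body = json.dumps(filters).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/random_beatmaps/{count}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",  # without this, urllib would switch to POST because data is set
    )


filters = [
    {"type": "bpm", "compare": ">=", "value": 180},
    {"type": "text", "compare": "~", "value": "pop"},
]
request = build_random_beatmaps_request("http://localhost", 50, filters)

# To send it against a running server (response format assumed to be JSON):
# with urllib.request.urlopen(request) as response:
#     beatmaps = json.load(response)
```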

TODO

  • basic functionality
  • use aiosqlite instead of sqlite3 so that database interactions are async and don't block the event loop
  • don't crash the program when retry limit is reached
  • add filtering options to the API
  • mitigate SQL injection attacks
  • solve needlessly authenticating every time it scrapes
  • solve double import when using python -m to run the application

* README is partially generated by GPT4.

About

License: MIT License
