movie_score_genai_insights

Build a Movie Score prediction and prescription insights with BigQuery and Vertex AI PaLM API.

Dataset For this use case, we will use the movies_data dataset derived from movielens source (https://grouplens.org/datasets/movielens/1m/).

Data to ML

Movies dataset stored in BigQuery
Movie Score / User Rating Prediction Model based on GENRE and RUNTIME with BQML trained
SQL-only prediction of user rating and result analysis
Deploying your model in Vertex AI for a REST endpoint
Storing prediction result in Big Query

STEPS Follow this codelab: https://codelabs.developers.google.com/moviescore-prediction-bqmlsql

Except in step 6, for model creation, use the following query instead:

CREATE OR REPLACE MODEL movies.movies_score_model OPTIONS ( model_type='LOGISTIC_REG', auto_class_weights=TRUE, data_split_method='NO_SPLIT', input_label_cols=['score'] ) AS SELECT name, genre,runtime, score FROM movies.movies_score WHERE data_cat = 'TRAIN';

And in the same step, instead of the PREDICT query in the lab, use the below:

SELECT * FROM ML.PREDICT (MODEL movies.movies_score_model, ( SELECT * FROM movies.movies_rating WHERE data_cat= 'TEST' ) );

To write the predicted result into another BigQuery table, use the following query:

CREATE TABLE movies.predicted_movies_score as ( SELECT * FROM ML.PREDICT (MODEL movies.movies_score_model, ( SELECT * FROM movies.movies_rating WHERE data_cat= 'TEST' ) ) );

Data to Generative AI

Analyze other factors besides genre and runtime that are influencing the predicted movie rating and summarize it with Generative AI using text-bison (latest) model using only sql queries a. The table with the predicted results from the Data to ML step is the input for this b. BigQuery GENERATE_TEXT construct will be used to invoke the PaLM API remotely from Vertex AI c. External Connection will be created to establish the access between BigQuery ML and Vertex services.

STEPS Follow this codelab starting from step 6 (because we already have our dataset ready): https://codelabs.developers.google.com/llm-codesummarizer-sqlonly

Except for step 8: Instead of the query given there, use this:

CREATE OR REPLACE MODEL movies.llm_model REMOTE WITH CONNECTION us-central1.bq_llm_connection OPTIONS (remote_service_type = 'CLOUD_AI_LARGE_LANGUAGE_MODEL_V1');

_SKIP step 9.

Except for step 10: Instead of the query given there, to use the PaLM API to perform LLM on the dataset use the following query:_

SELECT * FROM ML.GENERATE_TEXT( MODEL movies.llm_model, ( SELECT CONCAT('FROM THE FOLLOWING TEXT ABOUT MOVIES, WHAT DO YOU THINK ARE THE FACTORS INFLUENCING A MOVIE SCORE TO BE GREATER THAN 5?: ', movie_data) AS prompt FROM ( SELECT REPLACE(STRING_AGG( CONCAT('A movie named ',name, ' from the country ', country, ' with a censor rating of ',rating, ' and a budget of ', budget, ' produced by ', company, ' with a runtime of about ', runtime, ' and in the genre ', genre, ' starring ', star, ' has had a success score of ', score, '') ), ',','. ') AS movie_data FROM ( SELECT * FROM abis-345004.movies.movies_rating WHERE CAST(SCORE AS INT64) > 5 LIMIT 50) ) AS MOVIES), STRUCT( 0.2 AS temperature, 100 AS max_output_tokens, TRUE AS flatten_json_output));

This is what your result should look like:

Based on the given information, the following factors seem to influence a movie score to be greater than 5:

Genre: All the movies with a success score greater than 5 belong to the crime genre
Censor Rating: All the movies with a success score greater than 5 have a censor rating of R...

AbiramiSukumaran / movie_score_genai_insights

movie_score_genai_insights

About