"Serverless Data Processing with Dataflow - Writing an ETL Pipeline using Apache Beam and Cloud Dataflow (Python)" job fails because of short schema
MrCsabaToth opened this issue
When following the instructions at https://www.cloudskillsboost.google/course_sessions/11591045/labs/433174 (part of `Data Engineer Learning Path > Serverless Data Processing with Dataflow: Develop Pipelines > Beam Concepts Review`), Task 5 "Write to a sink" gives a schema that is too short:
```python
table_schema = {
    "fields": [
        {"name": "name", "type": "STRING"},
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "balance", "type": "FLOAT", "mode": "REQUIRED"}
    ]
}
```
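
For context, this schema gets passed to `beam.io.WriteToBigQuery`. A minimal sketch of that step (not the lab's exact code; the table spec is a placeholder and the `Create` step stands in for the pipeline's parsed log rows):

```python
import apache_beam as beam

# Minimal sketch: the table spec is a placeholder, and Create stands in
# for the lab's upstream transforms that parse log lines into dicts.
with beam.Pipeline() as p:
    (p
     | "CreateRows" >> beam.Create(
           [{"name": "alice", "id": 1, "balance": 10.0}])
     | "WriteToBQ" >> beam.io.WriteToBigQuery(
           "my-project:my_dataset.my_table",  # placeholder table spec
           schema=table_schema,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

The failure surfaces at this step: the rows the lab's pipeline actually produces carry more fields than the three declared above, so the BigQuery write rejects them.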
However, if someone digs deeper into the lab code, they can see

```python
log_fields = ["ip", "user_id", "lat", "lng", "timestamp", "http_request", "http_response", "num_bytes", "user_agent"]
```

and consequently the solution file has the full nine-field schema:
```python
table_schema = {
    "fields": [
        {"name": "ip", "type": "STRING"},
        {"name": "user_id", "type": "STRING"},
        {"name": "lat", "type": "FLOAT"},
        {"name": "lng", "type": "FLOAT"},
        {"name": "timestamp", "type": "STRING"},
        {"name": "http_request", "type": "STRING"},
        {"name": "http_response", "type": "INTEGER"},
        {"name": "num_bytes", "type": "INTEGER"},
        {"name": "user_agent", "type": "STRING"}
    ]
}
```
However, without peeking into the solution, the job fails: the parsed log rows contain fields that the three-field schema from the instructions does not declare, so the BigQuery write errors out. The instructions could be updated for better student success.
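
One possible fix for the instructions would be to derive the schema from `log_fields` so the two cannot drift apart. A minimal sketch (the type mapping below is an assumption based on the solution schema above, not code from the lab):

```python
# Assumed type overrides; every other field defaults to STRING,
# matching the solution schema shown above.
field_types = {
    "lat": "FLOAT",
    "lng": "FLOAT",
    "http_response": "INTEGER",
    "num_bytes": "INTEGER",
}

# Build the BigQuery schema directly from log_fields so the sink
# always matches the fields the pipeline actually emits.
table_schema = {
    "fields": [
        {"name": name, "type": field_types.get(name, "STRING")}
        for name in log_fields
    ]
}
```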