"Serverless Data Processing with Dataflow - Writing an ETL Pipeline using Apache Beam and Cloud Dataflow (Python)" job fails because of short schema
MrCsabaToth opened this issue
When following the instructions at https://www.cloudskillsboost.google/course_sessions/11591045/labs/433174 (part of `Data Engineer Learning Path > Serverless Data Processing with Dataflow: Develop Pipelines > Beam Concepts Review`), Task 5 "Write to a sink" gives a schema that is too short:
```python
table_schema = {
    "fields": [
        {"name": "name", "type": "STRING"},
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "balance", "type": "FLOAT", "mode": "REQUIRED"}
    ]
}
```
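
For context, this schema gets passed to `beam.io.WriteToBigQuery`. A minimal sketch of that step (not the lab's exact code; the table spec is a placeholder and the `Create` step stands in for the pipeline's parsed log rows):

```python
import apache_beam as beam

# Minimal sketch: the table spec is a placeholder, and Create stands in
# for the lab's upstream transforms that parse log lines into dicts.
with beam.Pipeline() as p:
    (p
     | "CreateRows" >> beam.Create(
           [{"name": "alice", "id": 1, "balance": 10.0}])
     | "WriteToBQ" >> beam.io.WriteToBigQuery(
           "my-project:my_dataset.my_table",  # placeholder table spec
           schema=table_schema,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

The failure surfaces at this step: the rows the lab's pipeline actually produces carry more fields than the three declared above, so the BigQuery write rejects them.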
However, if someone digs deeper into the lab code, they can see

```python
log_fields = ["ip", "user_id", "lat", "lng", "timestamp", "http_request", "http_response", "num_bytes", "user_agent"]
```

and consequently the solution file has the full nine-field schema:
```python
table_schema = {
    "fields": [
        {"name": "ip", "type": "STRING"},
        {"name": "user_id", "type": "STRING"},
        {"name": "lat", "type": "FLOAT"},
        {"name": "lng", "type": "FLOAT"},
        {"name": "timestamp", "type": "STRING"},
        {"name": "http_request", "type": "STRING"},
        {"name": "http_response", "type": "INTEGER"},
        {"name": "num_bytes", "type": "INTEGER"},
        {"name": "user_agent", "type": "STRING"}
    ]
}
```
However, without peeking into the solution, the job fails: the parsed log rows contain fields that the three-field schema from the instructions does not declare, so the BigQuery write errors out. The instructions could be updated for better student success.
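
One possible fix for the instructions would be to derive the schema from `log_fields` so the two cannot drift apart. A minimal sketch (the type mapping below is an assumption based on the solution schema above, not code from the lab):

```python
# Assumed type overrides; every other field defaults to STRING,
# matching the solution schema shown above.
field_types = {
    "lat": "FLOAT",
    "lng": "FLOAT",
    "http_response": "INTEGER",
    "num_bytes": "INTEGER",
}

# Build the BigQuery schema directly from log_fields so the sink
# always matches the fields the pipeline actually emits.
table_schema = {
    "fields": [
        {"name": name, "type": field_types.get(name, "STRING")}
        for name in log_fields
    ]
}
```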