1 Column missed in s_web_returns
eisenwang opened this issue · comments
Dear all,
When I'm preparing the refresh data, I created the source table with tpcds_source.sql. In this sql file, there's 1 table named s_web_returns. The statement is --
create table s_web_returns
(
wret_web_site_id char(16) ,
wret_order_id integer not null,
wret_line_number integer not null,
wret_item_id char(16) not null,
wret_return_customer_id char(16) ,
wret_refund_customer_id char(16) ,
wret_return_date char(10) ,
wret_return_time char(10) ,
wret_return_qty integer ,
wret_return_amt numeric(7,2) ,
wret_return_tax numeric(7,2) ,
wret_return_fee numeric(7,2) ,
wret_return_ship_cost numeric(7,2) ,
wret_refunded_cash numeric(7,2) ,
wret_reversed_charge numeric(7,2) ,
wret_account_credit numeric(7,2) ,
wret_reason_id char(16)
);
But while creating the fact view wrv on it, we can see an error -- Column 'wret_web_page_id' not found, and by check for wrv's statement --
CREATE VIEW wrv AS
SELECT
d_date_sk wr_return_date_sk
,t_time_sk wr_return_time_sk
,i_item_sk wr_item_sk
,c1.c_customer_sk wr_refunded_customer_sk
,c1.c_current_cdemo_sk wr_refunded_cdemo_sk
,c1.c_current_hdemo_sk wr_refunded_hdemo_sk
,c1.c_current_addr_sk wr_refunded_addr_sk
,c2.c_customer_sk wr_returning_customer_sk
,c2.c_current_cdemo_sk wr_returning_cdemo_sk
,c2.c_current_hdemo_sk wr_returning_hdemo_sk
,c2.c_current_addr_sk wr_returing_addr_sk
,wp_web_page_sk wr_web_page_sk
,r_reason_sk wr_reason_sk
,wret_order_id wr_order_number
,wret_return_qty wr_return_quantity
,wret_return_amt wr_return_amt
,wret_return_tax wr_return_tax
,wret_return_amt + wret_return_tax AS wr_return_amt_inc_tax
,wret_return_fee wr_fee
,wret_return_ship_cost wr_return_ship_cost
,wret_refunded_cash wr_refunded_cash
,wret_reversed_charge wr_reversed_charge
,wret_account_credit wr_account_credit
,wret_return_amt+wret_return_tax+wret_return_fee
-wret_refunded_cash-wret_reversed_charge-wret_account_credit wr_net_loss
FROM s_web_returns LEFT OUTER JOIN date_dim ON (cast(wret_return_date as date) = d_date)
LEFT OUTER JOIN time_dim ON ((CAST(SUBSTR(wret_return_time,1,2) AS integer)*3600
+CAST(SUBSTR(wret_return_time,4,2) AS integer)*60+CAST(SUBSTR(wret_return_time,7,2) AS integer))=t_time)
LEFT OUTER JOIN item ON (wret_item_id = i_item_id)
LEFT OUTER JOIN customer c1 ON (wret_return_customer_id = c1.c_customer_id)
LEFT OUTER JOIN customer c2 ON (wret_refund_customer_id = c2.c_customer_id)
LEFT OUTER JOIN reason ON (wret_reason_id = r_reason_id)
LEFT OUTER JOIN web_page ON (wret_web_page_id = WP_WEB_PAGE_id)
WHERE i_rec_end_date IS NULL AND wp_rec_end_date IS NULL;
But there's no wret_web_page_id column defined in s_web_returns. So would you please have a look and also modify the refresh data generator to add this column?
Thanks in advance for your help.
Regards
Eisen
And also puzzled on the difference between this refresh data list and the official document list --
In "TPC BENCHMARK ™ DS Standard Specification Version 2.3.0" Table 5-4, we can see -- only s_catalog_returns, s_catalog_sales, s_inventory, s_store_returns, s_purchase_lineitem, s_web_returns, s_web_order_lineitem -- 7 source tables. But in refresh data, there are 21 tables --
asiq160@bigDataIQ:/iq_data/dump/refresh> ls s_*.dat
s_call_center_1.dat s_customer_address_1.dat s_store_1.dat s_web_returns_1.dat
s_catalog_order_1.dat s_inventory_1.dat s_store_returns_1.dat s_web_site_1.dat
s_catalog_order_lineitem_1.dat s_item_1.dat s_warehouse_1.dat s_zip_to_gmt_1.dat
s_catalog_page_1.dat s_promotion_1.dat s_web_order_1.dat
s_catalog_returns_1.dat s_purchase_1.dat s_web_order_lineitem_1.dat
s_customer_1.dat s_purchase_lineitem_1.dat s_web_page_1.dat
And in the refresh data, there's no s_catalog_sales data file either...
Would you please tell me what use for the other source tables and how to fix s_catalog_sales data file loss? Thanks
Regards
Eisen
Looks like the issue is that tpcds_source.sql
has the first column of s_web_returns
as wret_web_site_id
, but the spec has it as wret_web_page_id
. See page 97 Table A-18: Column definition s_web_returns.
Dear Gregrahn,
Thanks a lot. Now the wrv is right. But still no s_catalog_sales refresh data generated... Please kind help. Thanks
Regards
Eisen
It looks like Table 5-4 in the spec has an error in the name s_catalog_sales
-- this should be s_catalog_order
since the view csv
contains the table s_catalog_order
and Table 5-4 is the only occurrence of s_catalog_sales
in the entire spec. Also see Appendix A. noting the cord_*
columns referenced in the csv
view belong to table s_catalog_order
.
Oh... Got it. Thank you very much