COMPLETE DATASET CAN BE FOUND HERE: https://www.kaggle.com/sohier/calcofi

The CalCOFI data set represents the longest (1949-present) and most complete (more than 50,000 sampling stations) time series of oceanographic and larval fish data in the world.

Description: This project focuses on predicting water temperature. The predictors include salinity, oxygen, phosphate, silicate, nitrate and nitrite, chlorophyll, transmissometer readings, PAR, C14 primary productivity, phytoplankton biodiversity, zooplankton biomass and zooplankton biodiversity. The dataset has 74 columns and 864,863 observations. After removing the columns with missing values, n = 864,863 and p = 40. There are 6 categorical predictors and 34 numerical predictors.

CLEANED DATA is included in this repository as a CSV file named DF, with n = 1300 and p = 40.
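The column-removal step described above can be sketched as follows. This is a minimal illustration, not the author's actual pipeline. It uses only the Python standard library so it runs without extra packages. It drops every column that has an empty cell in any row, then splits the remaining non-target columns into categorical and numerical predictors. Several parts are assumptions: the column names (`Sta_ID`, `Depthm`, `T_degC`, `Salnty`, `O2ml_L`, `Chlora`), the choice of `T_degC` as the temperature target, the "parses as float means numerical" rule, and the toy rows themselves, which are made up and are not real CalCOFI values.

```python
import csv
import io
from typing import Dict, List, Tuple

# Toy CSV shaped like a CalCOFI bottle table. Values are made up (not real data);
# column names are assumptions, and T_degC is assumed to be the temperature target.
SAMPLE = """Sta_ID,Depthm,T_degC,Salnty,O2ml_L,Chlora
054.0 056.0,0,10.50,33.44,5.8,
054.0 056.0,10,10.46,33.44,,0.2
054.0 056.0,20,10.46,33.44,5.9,
"""


def drop_missing_columns(rows: List[Dict[str, str]], columns: List[str]) -> List[str]:
    """Keep only the columns that have no empty cell in any row."""
    return [c for c in columns if all(row[c].strip() != "" for row in rows)]


def is_numeric(values: List[str]) -> bool:
    """Treat a column as numerical if every value parses as a float (assumed rule)."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return False
    return True


def split_predictors(
    rows: List[Dict[str, str]], columns: List[str], target: str
) -> Tuple[List[str], List[str]]:
    """Return (categorical, numerical) predictor names, excluding the target column."""
    categorical: List[str] = []
    numerical: List[str] = []
    for c in columns:
        if c == target:
            continue
        bucket = numerical if is_numeric([row[c] for row in rows]) else categorical
        bucket.append(c)
    return categorical, numerical


reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
kept = drop_missing_columns(rows, list(reader.fieldnames))
cat, num = split_predictors(rows, kept, target="T_degC")
print(kept)      # ['Sta_ID', 'Depthm', 'T_degC', 'Salnty']
print(cat, num)  # ['Sta_ID'] ['Depthm', 'Salnty']
```

With the full Kaggle download, the same functions could be applied to a `csv.DictReader` over the bottle table. Loading all 864,863 rows into a list of dicts is memory-heavy, though, so a columnar tool such as pandas (`DataFrame.dropna(axis=1)`) is likely the more practical choice at that scale.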