Job aborted due to stage failure

Question

NamrataRade opened this issue 6 months ago

Error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 150.0 failed 4 times, most recent failure: Lost task 1.3 in stage 150.0 (TID 426) (172.30.123.114 executor 6): java.lang.NullPointerException

Sample code:
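from pyspark.ml.clustering import KMeans

# Fit k-means with 4 clusters on the "features" column produced by scaling + PCA
kmeans = KMeans(k=4, seed=1)
model = kmeans.fit(df_pca.select("features"))

# Assign a cluster to every row of df_pca
df_cluster_out = model.transform(df_pca)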

df_pca:
unique id    features
1            (2,[],[])
2            (2,[],[])
3            [0.4,0.8]
I applied scaling and PCA before running KMeans. Any guidance on what causes this error would be appreciated.
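
For readers unfamiliar with the notation in df_pca: `(2,[],[])` is how Spark prints a sparse vector as (size, indices, values), so rows 1 and 2 appear to be all-zero vectors of length 2, while row 3 is a dense vector. The pure-Python sketch below only illustrates that reading. The names `sparse_to_dense`, `rows`, and `all_zero_ids` are hypothetical helpers for this illustration, not Spark APIs, and the sketch does not establish the cause of the NullPointerException.

```python
def sparse_to_dense(size, indices, values):
    """Expand a (size, indices, values) sparse triple into a dense list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# Rows as printed in df_pca; ids 1 and 2 use the sparse notation (2,[],[]).
rows = {
    1: sparse_to_dense(2, [], []),
    2: sparse_to_dense(2, [], []),
    3: [0.4, 0.8],
}

# Ids whose feature vector is entirely zero (illustrative check only).
all_zero_ids = [row_id for row_id, vec in rows.items() if not any(vec)]

print(rows)
print(all_zero_ids)
```

If this reading is right, rows 1 and 2 carry no information after scaling and PCA, which may be worth checking upstream alongside any rows where the features column is null.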