In this work, we propose a structured pruning method for efficiently training Transformer models for neural machine translation. We conduct experiments on two widely used machine translation datasets, and the results show that pruning early, before convergence, significantly reduces total training time while maintaining comparable translation quality.