In this project, we propose to address the data scarcity problem in a specific NLP task by harnessing existing annotated datasets from related tasks. Our approach trains a multi-head architecture concurrently on the main task and these “supporting” tasks. We evaluated this approach on medical NLP tasks and on three GLUE tasks.
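
The multi-head setup described above can be sketched as a shared encoder with one output head per task; batches from the main and supporting tasks are routed through the same encoder but through task-specific heads. The sketch below is illustrative only, assuming a simple linear encoder and linear heads — the task names, dimensions, and architecture are placeholders, not the actual model used in the project.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiHeadModel:
    """Hypothetical sketch: shared encoder, one classification head per task."""

    def __init__(self, input_dim, hidden_dim, task_num_labels):
        # Encoder parameters are shared across all tasks.
        self.W_enc = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
        # Each task (main or supporting) gets its own head.
        self.heads = {
            task: rng.normal(scale=0.1, size=(hidden_dim, n_labels))
            for task, n_labels in task_num_labels.items()
        }

    def forward(self, x, task):
        h = np.tanh(x @ self.W_enc)   # shared representation
        return h @ self.heads[task]   # task-specific logits

# One main task plus two supporting tasks (names and label counts are made up).
model = MultiHeadModel(
    input_dim=16,
    hidden_dim=8,
    task_num_labels={"main": 3, "support_a": 2, "support_b": 5},
)

# Concurrent training would alternate batches across tasks, updating the
# shared encoder from every task's loss; here we only run a forward pass.
batch = rng.normal(size=(4, 16))
logits_main = model.forward(batch, "main")
logits_support = model.forward(batch, "support_a")
```

Because the encoder is updated by every task while each head sees only its own task, the supporting datasets act as extra supervision for the shared representation without interfering with the main task's output space.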