jaehyeon-kim / emr-remote-dev

EMR Remote Development

Develop and Test Apache Spark Apps for EMR Remotely Using Visual Studio Code

  • When we develop a Spark application for EMR, we can use Docker for local development or notebooks via EMR Studio (or EMR Notebooks). However, the local development option is not viable when the data is large. Also, I am not a fan of notebooks, as they do not let me utilise the features my editor supports, such as syntax highlighting, autocomplete and code formatting. Moreover, it is not possible to organise code into modules or to perform unit testing properly with notebooks. In this post, we will discuss how to set up a remote development environment on an EMR cluster deployed in a private subnet, using a VPN and the VS Code Remote - SSH extension. Typical Spark development examples will be illustrated while the cluster is shared among multiple users. Overall, this offers another effective way of developing Spark apps on EMR, one that improves the developer experience significantly.

Languages

HCL 85.4%, Python 8.8%, Shell 5.8%