levonk / interview-data-engineer

interview-data-engineer

Congratulations, you've made it past the recruiter, the resume screen, and the phone screen. Not many people do! We're very interested in getting to know you a little better.

Background

The goal is to understand how you approach a problem. The exercise is deliberately light on specifics because we want to see how you would design a solution. We already run the process described below in production. The intention is to:

  • Provide clarity on the simpler types of requests that a data engineer may work on
  • Save time on the part of the interviewee
  • Save time on the part of the interviewer
  • Permit the interviewee to use the development environment that they are comfortable with
  • Provide a rapid way to assess the easily quantifiable tactical skills of the interviewee
  • Provide a normalized understanding of the candidate's coding skills

Guidelines

  • If you received this, or any details about it, before it was assigned to you by a full-time employee of the company you're interviewing with, please notify the interviewer before starting. Agents of the company (consultants, third-party recruiters, etc.) are not considered full-time employees.
  • Do not fork this repository if you will be providing an answer.
  • If you think this is going to take a significant amount of your time, don't do this.
  • Do not assume that you should only use tools/services that can run within your IDE. We would prefer that you identify the production-level tools, libraries, and services you would recommend.
  • The team currently uses Eclipse and IntelliJ. Use whatever you want (and document it), but don't make it hard for your teammates to build your project.
  • Submit the work via GitHub.
  • If there are improvements you would make to this Epic, whether to the requirements or to your implementation, feel free to make them or to note them in a README or inline comments.

Epic:

As an interviewer, I need a highly reliable feed of tweets matching a query term, updated as frequently as possible and tagged with parts of speech, so that I have a comparable code sample.

Acceptance Criteria:

  1. The interviewee should select an interesting query term and parameterize it.
  2. The process is highly reliable; comment on or code the reliability techniques.
  3. The process is highly scalable; comment on or code how the process scales.
  4. The data is extremely fresh; comment on how it will be kept fresh.
  5. The code is maintainable
  6. The analyst should get a wide table with part-of-speech tagging
  7. Anonymize the name of the tweeter
  8. Please communicate what services you would choose
  9. Please communicate what references (people, docs, etc.) you used
  10. Please communicate what you used for a development environment
  11. Please communicate how much time it took
  12. Please complete the work using Java
  13. Put the data in S3
  14. Create the table in Hive
  15. This project will be maintained in a team environment; use techniques that are conducive to that.
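
To make criteria 1, 2, and 7 a little more concrete, here is a minimal Java sketch of one possible reading of them. It is not the production process and not an expected answer. Everything named here is an assumption made for illustration: the class name `TweetPipelineSketch`, the `ANON_KEY` environment variable, the default query term, and the choice of HMAC-SHA256 for pseudonymizing the tweeter's name and exponential backoff for retries. Fetching tweets, tagging parts of speech, and writing to S3 and Hive are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Locale;
import java.util.concurrent.Callable;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class; the names are illustrative and not part of the exercise.
class TweetPipelineSketch {

    /**
     * Pseudonymizes a screen name with HMAC-SHA256 and returns 64 lowercase hex characters.
     * The same name and key always give the same value, so rows can still be grouped
     * by author without storing the real name.
     */
    public static String anonymize(String screenName, String secretKey)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        // Normalize case first so "Alice" and "alice" map to the same pseudonym.
        byte[] digest = mac.doFinal(
                screenName.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Runs the call up to maxAttempts times, doubling the wait between failed attempts.
     * Rethrows the last failure once the attempts are used up.
     */
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Criterion 1: the query term is a parameter, with an assumed default.
        String queryTerm = args.length > 0 ? args[0] : "dataengineering";
        // Assumed environment variable; a real deployment would use a secrets store.
        String key = System.getenv().getOrDefault("ANON_KEY", "dev-only-key");
        String pseudonym = withRetry(() -> anonymize("SomeUser", key), 3, 100L);
        System.out.println(queryTerm + "\t" + pseudonym);
    }
}
```

A keyed hash is used instead of a plain hash so that pseudonyms cannot be reversed by hashing a list of known screen names without the key. Whether pseudonymization of this kind satisfies "anonymize" is itself a requirements question worth raising in the README.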

Next steps

Your recruiter will contact you with the next steps. We're also always interested in feedback, so please let your recruiter know if you have any thoughts regarding the process or the question. Thanks for your continued interest in joining us!
