Guide to Awesome Machine Learning Projects
As a content creator and educator, I am constantly looking for awesome projects that I find useful so that I can share them with the broader community. I am not the only one doing this: lots of people share fun projects that they find interesting and useful, and this is how projects go viral and gain visibility. From my observation, a few components make certain machine learning projects stand out from the rest. If your goal is to build a portfolio or to create impactful and unique projects for the community, here are a few areas you can focus on to make your projects compelling and memorable.
Building projects is sometimes the easy part. Crafting strong messaging around them is perhaps the hardest part, given the large number of projects competing for attention these days. One of the first things you should do before starting a machine learning project is to identify what makes it impactful and unique, and what its main purpose really is. This could be a well-written impact statement or simply an explanation of why the project matters. Is the project about educating others on a particular machine learning method or feature? Or is it more specific, such as solving a challenging and unique problem with a new technique? Tell your audience about the purpose of your project. Build that connection, explain your motivation, and craft good messaging around it. You are not selling; you are informing and educating.
I like projects that are usable and quickly accessible. What does this mean? Imagine you have developed a new text classification approach and want others to understand how useful it is. An example notebook with hundreds of lines of code is probably not going to make your project very usable or accessible. Instead, provide not only the notebook but also a complete library that others can easily install on their computers and use to explore your project. Python makes this easy, but other languages work just as well. Make sure to provide instructions on how to use the project or library (more on this later in this guide). In fact, I implore you to be more ambitious and create an online demo to accompany the project; later on, I will talk about visibility and how demos can help. These tips all go hand in hand: the easier you make it for someone to use your project, the sooner they discover how impactful and useful it is. Quick adoption can yield a large return on the effort you invested.
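As a rough illustration of going beyond a notebook, the sketch below wraps a toy text classifier in an importable function plus a command-line entry point. Everything here is hypothetical: the module name `toyclf`, the keyword lists, and the scoring rule are placeholders standing in for a real model, not a recommended classification approach.

```python
"""toyclf: a hypothetical, minimal text-classification package.

Shows notebook logic exposed as an importable function and a command-line
entry point. The keyword lists are placeholders, not a trained model.
"""
import argparse

# Placeholder vocabularies standing in for a real model (hypothetical).
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def classify(text: str) -> str:
    """Return "positive", "negative", or "neutral" for a piece of text."""
    tokens = text.lower().split()
    # Count positive hits minus negative hits (booleans sum as 0/1).
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def main(argv=None) -> int:
    """Command-line entry point: classify each text passed as an argument."""
    parser = argparse.ArgumentParser(description="Classify short texts.")
    parser.add_argument("texts", nargs="+", help="one or more texts to classify")
    args = parser.parse_args(argv)
    for text in args.texts:
        # Tab-separated output is easy to pipe into other tools.
        print(f"{classify(text)}\t{text}")
    return 0
```

Once this code lives in a package, you can register `main` as a console command (for example through the `[project.scripts]` table in `pyproject.toml`) so users can run it right after `pip install`, while Python users simply call `classify("I love this")`, which returns `"positive"` with these placeholder word lists.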
Not only should you aim to make your project usable so that it stands out, but it also has to be highly accessible to be successful. What do I mean by that? One good example is the online demo I mentioned earlier, which makes it easy for others to access your project. But there are other important things to think about. Very often we ignore the fact that not all of our users have the same means or ways of accessing a project. Think about other ways to make your project more accessible: translations, metrics, visualizations, and audio recordings are all worth considering. For instance, some users may not be comfortable reading about your project (perhaps because of a disability or a lack of technical expertise), so you could record a short audio or video clip that clearly explains what your project is about. The more accessible your project is, the more potential it has to be highly impactful and gain the visibility you want.
Nowadays, it is simply not enough to build a useful project that users find interesting to play with for a few minutes. If you want your project to stick, you should focus from the start on a unique problem that it aims to solve. This should already be clear if you have defined your project’s purpose, as discussed earlier. There are so many similar projects out there that it is really hard for yours to stand out. For instance, I cannot tell you how many image classifiers I have come across (potentially thousands). I am always looking for a surprise factor in these projects. If I came across an image classifier that also offered interpretability features, I would be willing to explore it further, since there are not many of those online. Ideally, you want to set your project objectives before starting and do thorough research to identify the key, unique ways it contributes to the community.
One of the main problems with machine learning projects these days is that developers forget to address presentation, which is an easily missed opportunity. You should always be thinking about how you present your project to an audience. In addition to all the tips I have discussed so far, you need to think about how to package and present your projects. For instance, if you are publishing your project on GitHub, which you should definitely do, you can improve its presentation by including a clean, clear, and concise README file. I am not exaggerating when I say that the majority of machine learning projects I come across put little effort into presentation and, in fact, don’t even include a README. That’s bad! It undermines the seriousness and professionalism you are trying to project. I may be going out on a limb here, but most of the successful machine learning projects I have come across have excellent, well-written README files, along with other touches that improve their presentation.
The truth of the matter is that the majority of machine learning projects eventually die. Your goal is to make your projects interesting enough that others start to care about their sustainability. Only the best projects survive, and you never know where yours will take you. With so many open-source enthusiasts out there, there is a good opportunity to attract collaborators who can help keep building and maintaining your project. Make sure you share information about maintenance cycles and planned improvements. Provide guidance on how others can contribute to your project, even if it is just to improve a particular function. Try not to ask for minor contributions like editing your README file; this doesn’t encourage good practice in the community. Ideally, you want to point contributors toward major improvements that are needed, such as speeding up how data is read.
When I think about maintenance, I also think you should not only provide regular updates about your project but also respond to the community’s issues and questions. Typically, when I find a project that was last modified five months ago and has several unanswered open issues, that tells me a lot about its maintenance and likely sustainability. I think twice before sharing a project like this, because it is probably already outdated. If it makes sense, create a free Slack or Discord group where people can reach out and ask questions directly.
Documentation is a huge part of the messaging and packaging of your project. What’s the point of publishing a project if there are no instructions on how to use it? Given all the sections so far, you have probably started to notice a pattern: messaging is huge! It’s not easy, and you have to be clear and concise. People looking for interesting projects spend less than 30 seconds on your project, and if they don’t see neat documentation or something else that hooks them, that is bad news for you and your project. Even if you consider your project a small one, you should think about how you expect others to use it and provide guidance accordingly. For example, if you have built a complete Python library, provide clear and simple examples of how to use it, including how to install it, how to run it, and what inputs and outputs to expect. If you are building an API, you need to clearly explain all of its functionality and behavior. In some cases, you may even need a documentation website, though for most small projects this is probably not necessary. Regardless, you should definitely consider full examples that guide the user from start to finish. In my opinion, notebooks are great, but they are not a good way to document your machine learning projects.
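One lightweight way to document expected inputs and outputs in Python is a docstring whose examples are checked by the standard-library `doctest` module, so the documentation fails loudly when it drifts from the code. The function `normalize_scores` below is a hypothetical helper chosen only to illustrate the format; it is a sketch, not a prescribed documentation style.

```python
"""Documentation sketch: a docstring whose examples double as tests."""
import doctest


def normalize_scores(scores):
    """Convert raw, non-negative class scores into probabilities that sum to 1.

    Input: a dict mapping label -> non-negative score.
    Output: a dict with the same keys, mapping label -> probability.

    >>> normalize_scores({"cat": 3, "dog": 1})
    {'cat': 0.75, 'dog': 0.25}
    >>> normalize_scores({"cat": 0, "dog": 0})
    Traceback (most recent call last):
        ...
    ValueError: scores must not all be zero
    """
    total = sum(scores.values())
    if total == 0:
        # Document the failure mode too, not just the happy path.
        raise ValueError("scores must not all be zero")
    return {label: value / total for label, value in scores.items()}


if __name__ == "__main__":
    # Running the file checks that the documented examples still hold.
    doctest.testmod(verbose=False)
```

Running the file directly, or `python -m doctest yourfile.py -v`, executes the examples in the docstring; a README can then quote the same examples, knowing they are verified.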
Not only do we want our machine learning projects to stand out, but we also want them to be easily accessible and searchable. The great thing about the internet is that there are many easy ways to build visibility for your project. Besides making your projects more presentable, think about ways to improve their searchability and visibility. You can start by sharing the GitHub repo with friends in a group chat or Slack group. Just make sure you have a great README and have already addressed all of the components described in this guide before sharing your project. Write a nice blog post about your project and publish it. Share it on websites like Reddit, Made with ML, Hacker News, and Twitter. The more places you share your project, the more visibility it gets and the more searchable it becomes.
That’s it! I hope you find this guide helpful. I am going to maintain it regularly as I come across more ideas on how to improve your machine learning projects. I also welcome any feedback (just open an issue). Feel free to fork this repo and use this guide as a checklist for your next big machine learning project. I wish you all the best!
How can you contribute to this guide?
- Add more components that, in your experience, help projects stand out
- Add more examples to each section