[DMP 2024]: Clustering large amount of audio

dennyabrain opened this issue 7 months ago · comments

Ticket Contents

Description

Feluda allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. One important modality on Indian social media is audio. The scope of this task is to explore automated techniques suited to grouping similar audio together and visualizing the groups. After consultation with the team, implement an end-to-end workflow that can be used to surface visual or temporal trends in a large collection of audio.

Goals

Review literature with our team, then research and prototype to assess state-of-the-art ML and classical DSP techniques
Optimize the solution for consistent RAM and CPU usage (limit the spikes caused by variables like file size, video length, etc.) since it will need to scale up to millions of videos.
Integrate the solution into Feluda by creating an operator that adheres to the Feluda operator interface
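
One way to keep memory use flat regardless of file size, as the second goal asks, is to read audio in fixed-size chunks instead of loading whole files. The sketch below is only an illustration using Python's standard `wave` module on an in-memory WAV; `iter_chunks`, `make_silent_wav` and the chunk size are hypothetical names and values, not part of Feluda.

```python
import io
import wave


def iter_chunks(wav_file, frames_per_chunk=16000):
    """Yield raw PCM byte chunks of at most `frames_per_chunk` frames."""
    with wave.open(wav_file, "rb") as reader:
        while True:
            chunk = reader.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            yield chunk


def make_silent_wav(num_frames, sample_rate=16000):
    """Build a mono 16-bit WAV of silence in memory (demo fixture only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * num_frames)
    buffer.seek(0)
    return buffer


# 40000 frames are read as chunks of 16000, 16000 and 8000 frames.
chunk_sizes = [len(chunk) for chunk in iter_chunks(make_silent_wav(40000))]
```

Because each chunk is processed and discarded before the next is read, peak memory depends on `frames_per_chunk` rather than on the length of the recording.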

Expected Outcome

Feluda's goal is to provide a simple CLI or scriptable interface for analysing multimodal social media data. In that vein, all the work that you do should be executable and configurable via scripts and config files. The solution should look at Feluda's architecture and its various components to identify the best ways to enable this.
The solution should have a way to configure the data source (a database with file IDs or an S3 bucket with files), specify and implement the data processing pipeline, and configure where the result will be stored. Our current implementation uses S3 and a SQL database as data sources and Elasticsearch for storing results, but additional sources or stores can be added if apt for this project.
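
As a rough illustration of what "configurable via config files" could look like, the sketch below loads a JSON document into a small validated dataclass. Every key, value and name here (`PipelineConfig`, `source_type`, the bucket URI, the operator names) is hypothetical and chosen for the example; Feluda's real configuration format may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    """Hypothetical pipeline description: where data comes from and where results go."""
    source_type: str
    source_uri: str
    operators: list = field(default_factory=list)
    store_type: str = "elasticsearch"

    def __post_init__(self):
        # Reject unknown sources early instead of failing mid-run.
        if self.source_type not in {"s3", "sql"}:
            raise ValueError(f"unsupported source_type: {self.source_type}")


def load_config(text):
    """Parse a JSON string into a validated PipelineConfig."""
    return PipelineConfig(**json.loads(text))


example = """
{
  "source_type": "s3",
  "source_uri": "s3://example-bucket/audio/",
  "operators": ["audio_vec_embedding", "cluster"],
  "store_type": "elasticsearch"
}
"""
config = load_config(example)
```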

Acceptance Criteria

Regular Interactive Demos with the team using a public jupyter notebook pushed to our experiments repository
Working feluda operator with tests that can be run as an independent worker in the cloud to schedule processing jobs over a large dataset
Output Structured data that can be passed onto a UI service (web or mobile) for downstream use cases
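
For the last criterion, one possible shape for the structured output is a JSON document that groups files by cluster and carries the 2D coordinates a UI would plot. This is a sketch under assumptions: the function name and the field names (`cluster_id`, `size`, `items`, `file_id`, `x`, `y`) are invented for illustration and are not an agreed schema.

```python
import json
from collections import defaultdict


def clusters_to_json(file_ids, labels, coords):
    """Group files by cluster label and serialise them with their 2D plot coordinates."""
    grouped = defaultdict(list)
    for file_id, label, (x, y) in zip(file_ids, labels, coords):
        grouped[label].append({"file_id": file_id, "x": x, "y": y})
    payload = {
        "clusters": [
            {"cluster_id": label, "size": len(items), "items": items}
            for label, items in sorted(grouped.items())
        ]
    }
    return json.dumps(payload, indent=2)


report = clusters_to_json(
    ["a.wav", "b.wav", "c.wav"],
    [0, 1, 0],
    [(0.1, 0.2), (5.0, 5.1), (0.3, 0.1)],
)
```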

Implementation Details

One way we have approached this is by using vector embeddings. We have used this to great success to surface visual trends in images. We used a ResNet model to generate vector embeddings and stored them in Elasticsearch. We also used t-SNE to reduce the dimensions of the vector embeddings so they could be displayed in a 2D visualization. It can be viewed here
A detailed report over feluda's usage in a project to analyze images can be read here
The relevant feluda operator can be studied here
The code for tsne is here
A prior study of various ways to get insights out of images has been documented here
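
The embed-then-reduce step described above can be sketched as follows. This is not the linked Feluda code: it assumes NumPy and scikit-learn are installed, and it substitutes random vectors for real model output. The embedding width of 512 and the perplexity value are arbitrary choices for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for model output: 60 items with 512-dimensional embeddings,
# drawn from two loosely separated groups.
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(30, 512)),
    rng.normal(loc=3.0, scale=1.0, size=(30, 512)),
])

# Reduce to 2 dimensions for plotting. Perplexity must be smaller than
# the number of samples.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)
```

Each row of `points_2d` is the plot position of the corresponding embedding. t-SNE preserves local neighbourhoods rather than global distances, so the layout is best read as "which items sit together" and not as a metric map.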

Mockups/Wireframes

This is an interactive visualization of Image clustering done using Feluda.

Doing UI development or integrating with any UI software is not part of this project but it might help to see what sort of downstream applications we use Feluda for.

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Machine Learning, Python

Mentor(s)

@dennyabrain @duggalsu

Category

Data Science, Machine Learning, Research

Madhukesh singh · Answer 1 · Tue Apr 09 2024 03:42:59 GMT+0800 (China Standard Time)

Hi there, @dennyabrain , I'm passionate about machine learning and keen on joining this project.

Here's a bit about myself:
I am Madhukesh Singh, currently studying at the National Institute of Technology, Hamirpur, in my third year.

My experience includes working on image processing, computer vision, and object detection in satellite imagery during my internship as an AI developer at DRDO DYSL.AI.

Is there a preferred method for communicating with the mentors? I'm eager to contact you and explore how I can contribute.

Denny George · Answer 2 · Tue Apr 09 2024 11:44:00 GMT+0800 (China Standard Time)

Hi @MadhukeshSingh we can use this issue to communicate approaches. If you start concretely implementing something, you can make a new issue specific to your approach and we can take the conversation there.

Tahseen23 · Answer 3 · Wed Apr 10 2024 10:14:58 GMT+0800 (China Standard Time)

"Hi there, @dennyabrain! I want to contribute to this project, but I am new to open-source contribution.
So, can you tell me what I have to do in this project and how to contribute?"

manisha sharma · Answer 4 · Wed Apr 10 2024 14:12:49 GMT+0800 (China Standard Time)

Hi there, @dennyabrain, I'm passionate about machine learning and keen on joining this project. I bring a robust skill set encompassing advanced machine learning and natural language processing.
My adaptability, efficiency in information retrieval, and quick learning make me a valuable asset for tasks requiring machine learning, AI-driven insights, data analysis, and language-related applications.
I am equipped to contribute to the team's goal by leveraging cutting-edge AI technology and staying abreast of industry trends.

Here's a bit about myself:
I am Manisha Sharma, currently in my fourth and final year at GD Goenka University, Gurugram, Haryana.

My experience includes working on deep learning, machine learning, artificial neural networks, and artificial crypt analysis during my internship as an AI developer at Sag - DRDO. I am currently a data analyst intern at Interglobe Aviation.

Is there a preferred method for communicating with the mentors? I'm eager to contact you and explore how I can contribute.

poozasingh · Answer 5 · Wed Apr 10 2024 14:35:17 GMT+0800 (China Standard Time)

Hi @dennyabrain,
I'm Pooja Singh, a software developer intern at Verana Networks, passionate about machine learning and eager to join this project. My robust skill set in advanced machine learning and natural language processing, coupled with my adaptability and efficiency in information retrieval, make me a valuable asset.

I have hands-on experience in machine learning, and artificial intelligence from my internship as well as current work. I'm keen to connect with mentors to explore how I can contribute to the team's goals.

What's the preferred method for communication? Looking forward to hearing from you.

sreyash-layek · Answer 6 · Wed Apr 10 2024 15:18:18 GMT+0800 (China Standard Time)

Hello @dennyabrain ,
I'm thrilled to delve into the Feluda project and its objectives. After reviewing the documentation, I noticed that my background aligns well with the project's needs.

A little about myself: My name is Sreyash Layek, and I'm currently in my fifth year at the Indian Institute of Technology, Kharagpur, pursuing a Dual Degree (Integrated B.Tech & M.Tech) with a specialization in Signal Processing and Machine Learning.

Over the past three years, I've dedicated myself to exploring Machine Learning, with a particular focus on Computer Vision and Natural Language Processing tasks. I've spent a year working on Speech Processing and Accent Conversion, achieving results close to the state-of-the-art. Additionally, I've developed models for various applications, including Attention Monitoring, Accident Classification, Audio Classification, Emotion Classification, Recommendation Systems, and more.

I bring to the table over five years of experience in Python and three years in Machine Learning and Deep Learning. I'm eager to learn more about the project and discuss how I can contribute. I'd be interested in understanding your expectations and the specific requirements for this project.

Could we explore this further?

Sbswag · Answer 7 · Thu Apr 11 2024 14:51:48 GMT+0800 (China Standard Time)

Hello @dennyabrain, my name is Surjeet Bijarniya. I am a student at IIT BHU, passionate about machine learning and eager to join this project. I am new to machine learning, though, so please tell me how I can contribute.

KAMERA VAMSHI · Answer 8 · Thu Apr 11 2024 16:36:44 GMT+0800 (China Standard Time)

Hello @dennyabrain! I'm enthusiastic about machine learning and eager to be part of this project.

Allow me to introduce myself:
I'm Kamera Vamshi, currently pursuing the final year of my B.Tech at the National Institute of Technology, Rourkela (NIT Rourkela).

My background involves significant experience in machine learning, Python, and data analysis. I honed these skills during my internship and projects.

Could you please advise on the preferred method for reaching out to mentors? I'm keen to connect and discuss how I can contribute to the project.

Akanshu Aich · Answer 9 · Thu Apr 11 2024 17:16:17 GMT+0800 (China Standard Time)

Hii @dennyabrain ,

I am Akanshu Aich, a third-year BTech student at the International Institute of Information Technology, Bhubaneswar. I am writing to express my interest in contributing to this project as part of DMP 2024. Having thoroughly reviewed the project, I am impressed by its objectives and see its potential for great impact in industry.

With my background in backend development using Django and MERN, along with hands-on practice in machine learning and DevOps tools such as Docker, I believe I can make valuable contributions to the machine learning part. My experience includes several projects, such as a Society-Expenditure Manager using Django, a Real Estate app using MERN, and an Info-Finding Tool using machine learning (LLM), which I believe align well with the goals of your project.

I am particularly interested in fulfilling the requirements of the project and have some ideas on how to approach it effectively. I am committed to adhering to best practices, contributing high-quality code, and actively collaborating with the project maintainers and community.

I am excited about the opportunity to contribute to "Feluda" and help further its mission. I look forward to discussing potential contributions and how I can best support the project.

Please guide me with procedure and with all your knowledge and experience.

manavsolkar · Answer 10 · Thu Apr 11 2024 17:35:56 GMT+0800 (China Standard Time)

Hello @dennyabrain! I'm enthusiastic about machine learning and eager to be part of this project.

Allow me to introduce myself:
I'm Manav Solkar, currently pursuing the second year of my B.Tech at Thakur College of Engineering and Technology (TCET).

I really want to be a part of this and hope that your guidance will help me improve my skill set.

Could you please advise on the preferred method for reaching out to mentors? I'm keen to connect and discuss how I can contribute to the project.

Tatwansh · Answer 11 · Fri Apr 12 2024 01:35:25 GMT+0800 (China Standard Time)

Hey @dennyabrain and @duggalsu,
I am interested in working on this project. I have prior experience working on a project with similar objectives on the QAnon dataset. You can check out my work at the link below.

notebook link: https://www.kaggle.com/code/tatwanshjaiswal/dark-web-language-analysis

I would be happy to receive feedback on how to improve it.

AbhimanyuSamagra · Answer 12 · Fri Apr 12 2024 20:44:45 GMT+0800 (China Standard Time)

Do not ask process related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer instructions listed on Unstop and any further queries can be taken up on our Discord channel titled DMP queries.

Ashutosh · Answer 13 · Sat Apr 13 2024 16:58:48 GMT+0800 (China Standard Time)

Hey @dennyabrain and @duggalsu, I am Ashutosh pursuing B.Tech. in Artificial Intelligence and Data Science from IIT Jodhpur. I am proficient in languages like Python and C++. I have worked on projects related to machine learning and deep learning such as Stock Price Prediction and Voice Controlled Music Recommendation System using Deep Learning.
I am interested to work on this project and apply my skills in the project.

Denny George · Answer 14 · Tue Apr 16 2024 18:47:27 GMT+0800 (China Standard Time)

Hi everyone,

Thank you for expressing interest in this issue. Depending on your interests and skills, you can take ANY ONE of the following approaches :

Look at the problem statement and propose your approach
Remember the main problem statement - Given a large number of audio files, find a way to group identical and similar audio files. This approach would be ideal for anyone who is interested in or studies ML and/or DSP. By thinking about the problem statement, reviewing existing literature on it and proposing your approach here, we would all learn something from it and the mentors should be able to nudge you in the right direction.
Try getting feluda working on your machine
Feluda is a moderately complex piece of software and has many moving parts. Getting it working on your machine can itself be a challenge. We have a guide on it here. If you are a software developer/tinkerer, this might be a good place to start, because once you have Feluda working locally and can see the various existing functionalities, that might give you an idea of how to proceed.
Recreate our code in a Jupyter notebook or Google Colab notebook
We already have some code that takes audio files and converts them into vectors. We also have code that takes these vectors and clusters them. I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working on your notebook we can try out different pretrained models to evaluate performance.

You'll have me or members of our team to guide you if you get stuck on any of these approaches. Taking some concrete steps on any of these 3 approaches would help us know what your interests and skills are and let us give you concrete feedback when you get stuck.

All the best!
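
The first approach above frames the problem as grouping identical and similar audio files. A minimal, dependency-free sketch of that distinction follows: a content hash catches byte-identical files, while cosine similarity over embedding vectors catches near matches. The greedy grouping rule, the 0.9 threshold and the toy 2-D vectors are assumptions made for the example, not Feluda's method.

```python
import hashlib
import math


def content_hash(data: bytes) -> str:
    """Identical bytes give identical digests, so equal hashes mean exact duplicates."""
    return hashlib.sha256(data).hexdigest()


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_similar(embeddings, threshold=0.9):
    """Greedy grouping: an item joins the first group whose first member is similar enough."""
    groups = []
    for name, vec in embeddings.items():
        for group in groups:
            if cosine_similarity(embeddings[group[0]], vec) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
groups = group_similar(toy_embeddings)
```

A byte-level hash will miss the same clip once it has been re-encoded or trimmed, which is the case the embedding comparison is meant to cover.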

Vishakha Sharma · Answer 15 · Fri Apr 19 2024 19:17:09 GMT+0800 (China Standard Time)

Hello @dennyabrain, I really want to contribute to this project. I have good hands-on experience with Python, machine learning, databases, and deep learning. I am a Data Science student and really enthusiastic about working on your project.
Over the past 3 years, I have done a lot of real-time projects, and I have also done many internships to gain hands-on experience.
I want to learn and gain deeper experience by working on this project. Please allow me to work on your project.

Satyam Kumar · Answer 16 · Sat Apr 20 2024 00:59:35 GMT+0800 (China Standard Time)

Hello @dennyabrain,

I'm eager to contribute to your project. With substantial experience in Python, machine learning, databases, and deep learning, I believe I can make valuable contributions. As a data science student, I've spent the past three years working on various real-world projects and completing internships to hone my skills.
I'm enthusiastic about delving deeper into the field and gaining practical experience through involvement in your project. I'm eager to learn and collaborate effectively. Please consider allowing me to be part of your team.

AbhimanyuSamagra · Answer 17 · Tue Apr 23 2024 18:48:00 GMT+0800 (China Standard Time)

Do not ask process related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer instructions listed on Unstop and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project.

Chaithanya kalyan · Answer 18 · Wed Apr 24 2024 00:08:47 GMT+0800 (China Standard Time)

hi @dennyabrain,

I am Chaithanya Kalyan. I am interested in contributing to this project.

I have experience working with time series signals. As part of the PhysioNet 2023 challenge, time domain and frequency domain features were extracted to classify the EEG signals (more details here).

I have a doubt regarding the details of this project and would greatly appreciate the clarification:

Does this clustering algorithm have to be scalable to different datasets (like a general framework that can be extended ) or is it only for a specific dataset?

I think the following approach is worth trying:
Without extracting the traditional audio features, we can train an autoencoder network on a large audio collection to automatically learn a low-level representation of the audio signals, and then cluster based on these latent representations.

I have tried a similar approach on EEG signals before, you can find that notebook here.
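
To make the autoencoder idea concrete, here is a deliberately small sketch, assuming NumPy is available. It is not the linked notebook's code: it trains a linear autoencoder with a 2-D bottleneck by plain gradient descent on synthetic data standing in for audio feature frames. The proposal in this comment would use a nonlinear network on real audio, so the sizes, learning rate and data here are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for audio feature frames: 200 samples with 8 features,
# generated from 2 hidden factors plus a little noise, then standardised.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Encoder and decoder weights of a linear autoencoder with a 2-D bottleneck.
W_enc = 0.1 * rng.normal(size=(8, 2))
W_dec = 0.1 * rng.normal(size=(2, 8))


def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))


initial_loss = reconstruction_loss(X, W_enc, W_dec)
learning_rate = 0.1
for _ in range(500):
    Z = X @ W_enc                              # latent codes
    residual = Z @ W_dec - X                   # reconstruction error
    grad_out = 2.0 * residual / residual.size  # d(loss)/d(reconstruction)
    grad_dec = Z.T @ grad_out
    grad_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc
final_loss = reconstruction_loss(X, W_enc, W_dec)

# These low-dimensional codes are what a clustering step would consume.
latent_codes = X @ W_enc
```

A linear autoencoder can only recover the same subspace that PCA finds, so any advantage over hand-crafted features would have to come from nonlinear layers and training on representative audio.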

I would be happy to hear your feedback.

contact: chay5522kalyan@gmail.com

Denny George · Answer 19 · Wed Apr 24 2024 00:31:53 GMT+0800 (China Standard Time)

Hi @Chaithanya512,

Given that the project's focus is on addressing use cases around online misinformation, the dataset we deal with is usually audio/video found on social media. So it can contain a variety of audio: memes, news clippings, amateur recordings from phones, etc.

Is there a quick way to validate whether the autoencoder network approach would be suitable for this use case? What is your rationale for preferring that over extracting traditional audio features?

Chaithanya kalyan · Answer 20 · Wed Apr 24 2024 01:32:24 GMT+0800 (China Standard Time)

Thank you for the feedback, I am currently working on the code to validate the use of autoencoders.

Compared to traditional, hand-crafted features, autoencoders have the potential to capture a wider range of features. While traditional audio features are valuable, they might miss some subtle patterns in the data that autoencoders can discover.
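
For readers unfamiliar with what "traditional, hand-crafted features" means here, two classic examples are zero-crossing rate and RMS energy. The sketch below computes both in plain Python on synthetic sine waves; it is an illustration of the feature family only, and the thread does not say which features Feluda would use.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


def rms_energy(samples):
    """Root-mean-square amplitude of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine_wave(freq_hz, sample_rate=8000, duration_s=1.0):
    """Generate a unit-amplitude sine tone as a list of floats (demo input only)."""
    count = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


low_tone = sine_wave(100)
high_tone = sine_wave(1000)
low_zcr, high_zcr = zero_crossing_rate(low_tone), zero_crossing_rate(high_tone)
```

The higher-pitched tone crosses zero more often, so `high_zcr` exceeds `low_zcr`, while both tones have nearly the same RMS energy. Features like these are cheap and interpretable but each one captures a single fixed property, which is the limitation the comment above is pointing at.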

I have a follow-up question (might be stupid) for your response, please correct me if I am wrong.

I'm curious: do you think traditional audio features are effective in clustering misinformation and not-misinformation? Do those features vary between misinformation and not-misinformation?

Denny George · Answer 21 · Wed Apr 24 2024 11:00:18 GMT+0800 (China Standard Time)

So we won't be using the clusters to classify something as "misinformation" or "not misinformation". We're hoping to use clustering as a way to find a first level of grouping within a large dataset. So most likely the clusters could be something high level like "memes", "amateur-smartphone", etc. If we are lucky, we could aspire to thematic labels like "politics", "health", etc.

An example of clustering we did on images is here - https://tattle.co.in/articles/covid-whatsapp-public-groups/t-sne/
The clusters we got then were - Screenshots(Social Media), Screenshots(Other), Medical Supplies, Paper Documents, Religious Imagery etc

Chaithanya kalyan · Answer 22 · Wed Apr 24 2024 11:37:50 GMT+0800 (China Standard Time)

Thank you for the clarification. That makes sense now. So, we are using clustering only to find the high-level labels/pseudo-labels. I have found this paper that uses labeled data (only text) to categorize misinformation posters or active citizens on social media. It got me thinking: if we could obtain transcriptions of the audio content (if that is possible), that information could significantly enhance our clustering efforts.

Denny George · Answer 23 · Wed Apr 24 2024 12:20:02 GMT+0800 (China Standard Time)

@Chaithanya512 yes, that would certainly help. In fact, when we do clustering for images, we often try to extract any text out of them as a way to get a richer dataset. You can certainly try transcriptions for audio content. One challenge might be that we are dealing with non-English languages and also low-quality audio.

Preeti Sharma · Answer 24 · Wed Apr 24 2024 17:04:25 GMT+0800 (China Standard Time)

Hey, can I work on this issue? I have worked on speech attenuation in the past, so I am fairly familiar with the problem statement. Kindly let me know.

Ahmed Furkhan · Answer 25 · Thu Apr 25 2024 17:57:23 GMT+0800 (China Standard Time)

Hey !! I Want to work on this

Ankita Mohan · Answer 26 · Sat Apr 27 2024 13:51:45 GMT+0800 (China Standard Time)

Hi there, @dennyabrain,
I am Ankita Mohan, I am a third-year student at Kalinga Institute of Industrial Technology, Odisha. I'm passionate about machine learning and keen on joining this project. Moreover, I have a deep understanding of clustering algorithms as I have done projects in clustering.
I am eager to contribute and to gain your guidance for the same.

Pushkar0730 · Answer 27 · Sat Apr 27 2024 19:56:35 GMT+0800 (China Standard Time)

I would definitely like to work on it ☺️

Denny George · Answer 28 · Sat Apr 27 2024 20:43:51 GMT+0800 (China Standard Time)

Hi all thanks for your enthusiasm. Please let me know if you have any specific ideas on how you would go about the project.

Please refer to this comment for some suggested ways to move forward #82 (comment)

PriyalPB · Answer 29 · Sun Apr 28 2024 17:37:07 GMT+0800 (China Standard Time)

Hi @dennyabrain ! I'm a third year student from Cummins Pune.

I'm thrilled to join your Clustering large amount of audio project and offer my skill set, which includes a strong background in machine learning, deep learning (CNN), NLP, DSP and Python, and seems to fit well with what you're looking for.
I'm excited to explore how my expertise can elevate the project. Furthermore, integrating computer vision with the ML advancements could lead to a seamlessly automated system.
I'm eager to discuss further avenues where I can make meaningful contributions. Could we schedule a meeting to delve into this in more detail?

CodeSage4 · Answer 30 · Tue Apr 30 2024 22:17:56 GMT+0800 (China Standard Time)

My skills in machine learning (computer vision, NLP) and experience with speech processing align well with the Feluda project. I'm a motivated student with 3+ years of Python experience and 2 years in ML/DL. Eager to discuss how I can contribute!

V Dinesh · Answer 31 · Fri May 03 2024 17:04:10 GMT+0800 (China Standard Time)

Hi @dennyabrain, I am V Dinesh, a third-year Mechanical student at Army Institute of Technology, Pune. I'm passionate about machine learning and keen on joining this project. In addition, my expertise in clustering algorithms extends to a profound level, acquired through hands-on experience gained from multiple projects focused specifically on implementing and fine-tuning various clustering techniques. These projects have provided me with a comprehensive understanding of the underlying principles, nuances, and practical applications of clustering algorithms across diverse domains, allowing me to effectively navigate through complex datasets, identify patterns, and extract meaningful insights. I am enthusiastic about contributing my expertise and am eager to receive your guidance in order to further enhance my capabilities in this regard.

pandharkardeep · Answer 32 · Fri May 03 2024 19:07:24 GMT+0800 (China Standard Time)

Hi @dennyabrain. I am Deep Pandharkar, a second-year Data Science Engineering student at DJ Sanghvi College of Engineering, Mumbai. I have some experience in CV as well as NLP. My passion for ML makes me keen to join this project. In addition to that, I have worked extensively with vector embeddings as part of my NLP projects. I also have coding experience in Data Structures and Algorithms. Eager to discuss how I can contribute.

Sufia · Answer 33 · Mon May 06 2024 15:03:16 GMT+0800 (China Standard Time)

I am Sufia, and I graduated with a B.Tech in CSE. I am a data scientist and also a full-stack developer, but I am a fresher: I have only completed 6 months of training in the field and a one-month internship, so I want to do this internship.

Aatman Vaidya · Answer 34 · Tue Jun 18 2024 20:26:39 GMT+0800 (China Standard Time)

Weekly Goals

Week 1

Setup Feluda and run tests for AudioVecEmbedding Operator
Collect a dataset of 150-200 Audio Files
Run Feluda AudioVec Operator on the dataset, reduce dimensions using t-SNE and do a visual plot - This will act as a baseline for us
Try out different Embedding Models

Week 2

Play the audio from the plot in Colab
Try out more embedding models - AST
Do a review of lit for any video-based transformer models that are trained on audio
Try out k-means clustering on Audio Data using Feluda's AudioVec Embedding operator

Week 3

Let's keep trying out more transformer based/ CNN ensemble based models
Evaluate clustering results
Do a review of lit for other clustering algorithms

Week 4

Keep exploring embedding models
Start looking at sampling strategies for audio files.
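
The Week 4 item on sampling strategies does not say which strategies are under consideration. As one hypothetical example, long files could be represented by a few fixed-length windows spread evenly across the recording, which bounds the embedding cost per file. The function name, window length and window count below are assumptions for illustration.

```python
def sample_windows(duration_s, window_s=10.0, max_windows=3):
    """Return (start, end) times of up to `max_windows` evenly spaced windows."""
    if duration_s <= window_s:
        # Short clips are used whole.
        return [(0.0, float(duration_s))]
    count = min(max_windows, int(duration_s // window_s))
    if count == 1:
        return [(0.0, window_s)]
    # Spread window starts so the first begins at 0 and the last ends at the file end.
    step = (duration_s - window_s) / (count - 1)
    return [(i * step, i * step + window_s) for i in range(count)]


windows = sample_windows(60)
```

For a 60-second file this yields windows at 0-10 s, 25-35 s and 50-60 s. Whether evenly spaced windows are representative enough for social media audio is an open question that the evaluation work in the plan above would need to answer.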

poozasingh · Answer 35 · Wed Jun 19 2024 14:32:06 GMT+0800 (China Standard Time)

Hello can you please make a call in 6299 143 824.


Chaithanya kalyan · Answer 36 · Thu Jun 27 2024 06:17:17 GMT+0800 (China Standard Time)

Weekly Learnings and Updates:

Week 1:

Set up Feluda and ran tests for AudioVec Operator.
Collected a dataset of 314 audio files.
Established a baseline evaluation for the performance of AudioVec Operator on the dataset using a visual clustering plot.
Tried out the OpenL3 embedding model.

Colab file: https://colab.research.google.com/drive/1lBrWCyUsuCSTOEUUqDwfc6FzpQWO0ETt?usp=sharing

Week 2:

Reviewed literature on Audio representation & embedding models.
Evaluated pre-trained models like Audio Spectrogram Transformer (AST), VGGish, HuBERT.
Implemented KMeans Clustering on AudioVec Operator embeddings.
Was able to achieve clear and distinctive visual clusters using the AST model after t-SNE mapping.
Updated colab notebook: https://colab.research.google.com/drive/1lBrWCyUsuCSTOEUUqDwfc6FzpQWO0ETt#scrollTo=bbvns2TAeA9Y&line=1&uniqifier=1
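
For reference, the k-means step mentioned in the Week 2 update can be sketched without any dependencies as below. This is not the notebook's code, which is not reproduced here: it is a plain implementation of Lloyd's algorithm run on toy 2-D points, where a real run would cluster the embedding vectors and would more likely use a library implementation.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(toy_points, k=2)
```

Cluster numbering depends on the random initialisation, so downstream code should compare which points share a label rather than rely on specific label values.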