wip - i use this code to learn rust and kotlin
Suppose you take an artist and song combination and scramble the letters. How would you retrieve the artist and song again without knowing that first step?
This application unscrambles that string into the artist and song.
This application builds an android application. The application contains an input field for scrambled data. The scrambled data is ordered and fed to query of MusicBrainz data. The corresponding artist and song are displayed back on the screen.
input
we can type the input by hand. Or use the Google Vision API to decode a photo to text.
database
Using the MusicBrainz database it creates all possible artist - song combinations. All letters of these values are then put in alphabetical order. This becomes the scrambled value.
If we now do the same with the unknown value we are able to look it up directly and derive the value.
Other implementation
Another implementation would use fuzzy search to find nearest neighbours.
application is in Android Kotlin, the PoC is done in rust
The following steps show how to download the musicdb dataset, open it using docker, transform the data and export it to csv. We then have all artist song combinations.
- download and extract database : ( this directory is not in git for (2G) of obvious reasons ... )
VERSION=$(curl -s https://mirrors.dotsrc.org/MusicBrainz/data/fullexport/LATEST)
curl -o mbdump.tar.bz2 -s https://mirrors.dotsrc.org/MusicBrainz/data/fullexport/${VERSION}/mbdump.tar.bz2
mkdir -p ${PWD}/musicdb/input
tar -xvjf mbdump.tar.bz2 -C ${PWD}/musicdb/input
- download schema into
mbdump/input
:curl -o ${PWD}/musicdb/input/CreateTables.sql https://raw.githubusercontent.com/metabrainz/musicbrainz-server/master/admin/sql/CreateTables.sql
and the manualy remove the toc CUBE on file 3272 (and the comma on the line before)
-
enter the
${PWD}/musicdb
directory -
start a postgres container:
docker run -ti -u postgres --name pg -v ${PWD}:/datadump postgres:9.5
-
in another shell enter the container as postgres
docker exec -u postgres -ti pg bash
-
add user
createuser musicbrainz_user
-
create db
createdb -U postgres --owner=musicbrainz_user --encoding=UNICODE importtest
-
start psql
-
alter user with superuser rigts (needed to export csv)
alter role musicbrainz_user with superuser;
-
exit
\q
-
create tables and import tables (See https://wiki.musicbrainz.org/History:Database_Installation)
psql -U musicbrainz_user importtest
and import sql: \i /datadump/CreateTables.sql
exit: \q
import data:
for t in * ; do \
echo `date` $t ; echo "\\copy $t from ./$t" | psql -U musicbrainz_user importtest ; \
done
- enter psql again
psql -U musicbrainz_user importtest
- select the data into temp table
SELECT my.*
INTO tmp
FROM (
select ac.name as artist, r.name as track
from recording r, artist_credit ac
where r.artist_credit = ac.id
) AS my;
- export data to csv:
COPY tmp to '/tmp/tracks.csv' DELIMITER E'\t' QUOTE '"' CSV FORCE QUOTE *;
- login to container as root and move it to the share so its available on the host system
mv /tmp/tracks.csv /datadump/
the resulting file is about 800M
Here use createDb.groovy
script to read csv export, create a scrambled text and add it to an sqlite database.
To run this with test data:
head -1000 ${PWD}/musicdb/tracks.csv > tracks.csv
groovy ./createDb.groovy
The initial dataset contains 23.528.878 records. Or 14.121.143 records when using dropping all between brackets: (...)
However, for the first 10.000 records select count(*) from songs;
yields 10.000
But select count(*) from (select distinct artist,song from songs);
yields 68.000
So I replaced the id
pk with a song,artist pk. And changed the INSERT
into INSERT OR REPLACE
Together with the index on scramble slowed the creation of the database by a lot. However we now have a fast (and smaller) read-only database.
-
sanitize scrambled word.
-
find sanitized word in table
-
return song
https://cloud.google.com/vision/docs/quickstart
https://cloud.google.com/vision/docs/reference/libraries#client-libraries-install-java
upload
Copy test file to google storage. This will be a photo from the phone's camera.
gsutil cp ~/Downloads/DSC_1419.JPG gs://song-unscrambler.appspot.com/test/DSC_1419.JPG
gsutil cp ~/Downloads/DSC_1421.JPG gs://song-unscrambler.appspot.com/test/DSC_1421.JPG
enable api
https://console.cloud.google.com/flows/enableapi?apiid=vision.googleapis.com&redirect=https:%2F%2Fconsole.cloud.google.com&pli=1&authuser=1
query
payload.json:
{
"requests": [
{
"image": {
"source": {
"imageUri": "https://storage.googleapis.com/song-unscrambler.appspot.com/test/DSC_1421.JPG"
}
},
"features": [
{
"type": "TEXT_DETECTION",
"maxResults": 2
}
]
}
]
}
find key here:
https://console.cloud.google.com/apis/credentials?project=song-unscrambler&authuser=1
fire curl command:
API_KEY_VISION=my-key
curl -X POST \
-H "Content-Type: application/json" -d @./scripts/vision-api-request.json \
https://vision.googleapis.com/v1/images:annotate?key=$API_KEY_VISION > scripts/vision-api-response.json
"
project is located here:
https://console.firebase.google.com/u/1/project/song-unscrambler/settings/general/android:com.wijdemans.songunscrambler
download json
google-services.json
can be downloaded
setup gradle
Project-level build.gradle (/build.gradle):
buildscript {
dependencies {
// Add this line
classpath 'com.google.gms:google-services:3.1.0'
}
}
App-level build.gradle (//build.gradle):
// Add to the bottom of the file
apply plugin: 'com.google.gms.google-services'