Call for Papers Tool

Making life easier for finding relevant conference call for papers

Slides from Extraction Summit 2021

Useful links:

Getting up and running

The Jupyter notebook will get the initial Neo4j graph up and running. You will need to insert credentials for the instance of Neo4j you are using. The code is work in progress, to be optimised, and I apologise in advance for the terrible code :)

If you’ve not already done so, I would suggest you get yourself a (free!) instance of Neo4j Sandbox up and running first.

Querying the CfP graph

Here are the queries that we will run against the graph during the session. You will need to have Neo4j Browser up and running for this.

What are the most common tags? Hints at trends

MATCH (e:Event)-->(t:Tag)
RETURN t.value, count(e) AS size ORDER BY size DESC

What CfPs are closing within the next month? This query is casting a string to DateTime - we can import/update this property to a DateTime type. Will do that later

MATCH (e:Event)-->(t:Tag)
WHERE date(e.cfpClosing) < date("2021-11-01")
RETURN e.title, e.cfpClosing, e.description, collect(t.value) AS Tags
    ORDER BY e.cfpClosing

Find some data science-esque conferences. This query starts by providing an array of terms we’d commonly associate with data science, and then we’re checking whether any of the tags linked to events contain those terms. As more than one of these array terms could map to the same event, we use DISTINCT to just bring back a single copy.

WITH ['data science', 'machine learning', 'artificial intelligence', 'ai', 'ml', 'deep learning'] AS p
MATCH (l:Location)<--(e:Event)-->(t:Tag)
WHERE t.value IN p
RETURN DISTINCT e.title, e.description, l.value

Find conferences that have tags of data science and python. Here we’re again using an array, but this time around we want the tags associated with an event to include all of the terms.

WITH ['data science', 'python'] AS p
MATCH (e:Event)
WHERE ALL (i in p WHERE exists((e)--(:Tag {value:i})))
RETURN e.title

Time for similarity - create a graph projection for similarity. Think of a graph projection as an in memory 'view' of our original graph. The algorithm we are using is expecting a monopartite graph, so we are creating a view where the Tag node is being used as a bridge to show which events are connected to each other.

CALL gds.graph.create.cypher("similar",
"MATCH (e:Event) RETURN id(e) AS id",
"MATCH (e1:Event)-->(t:Tag)<--(e2:Event) RETURN id(e1) AS source, id(e2) AS target")

Let’s have a look at those similar events. We are going to run the node similarity algorithm across our graph projection, and then return a stream of event pairs, ordered by which ones are most similar.

CALL gds.nodeSimilarity.stream("similar")
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).title, gds.util.asNode(node2).title, similarity
    ORDER BY similarity desc

Let’s now connect similar events together, so that we can have a look at them. Similar query as above, but now we are going to make a write to our graph. We are going to create a SIMILAR_TO relationship between our Event nodes where they are at least 80% similar as specified by the algorithm. To prevent having relationships in both directions between Event nodes, we’re using the greater than 'trick' to just write in one relationship.

CALL gds.nodeSimilarity.stream("similar")
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2,
	similarity WHERE similarity >= 0.8 AND id(n1)>id(n2)
CREATE (n1)-[:SIMILAR_TO]->(n2)

And let’s look at the result! Here we are using the fact that there is only one relationship between the Event nodes, so we find the 'head node' of the similar group, and then do an unbounded directional traversal to pick up all the other similar events, collect them and then return them. We also show the tags for the 'head node' so we get an idea of the conference theme.

MATCH (e:Event)-[:SIMILAR_TO]->(e2)
WHERE NOT  ()-->(e)
WITH e
MATCH (e)-[:SIMILAR_TO*]->(e1), (e)-->(t:Tag)
RETURN DISTINCT e.title, collect(distinct e1.title), collect(distinct t.value)

About

Making life easier for finding relevant conference call for papers

Apache License 2.0

Languages

Language:Jupyter Notebook 100.0%