twitter / cassovary

Cassovary is a simple big graph processing library for the JVM

Home Page:http://twitter.com/cassovary

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Edge Weights

bmckown opened this issue · comments

Everyone,

I wanted to find out the communal opinion on adding edge weights to cassovary. After having built out quite a few algorithms, it has become clear that these could be improved through the addition of edge weights to the graph structure. This may be as simple as adding a new Node type--maybe something like

/** A map based node that stores edge weights for all connections to the given node
 * @param id The id for the given node
 * @param inboundNodes A map of node id to edge weight for inbound connections between 
 * `this` and source node id
 * @param outboundNodes A map of node id to edge weight for outbound connections 
 * between `this` and target node id 
 */
class EdgeWeightedNode(val id: Int, inboundNodes: Map[Int, Double], outboundNodes: Map[Int, Double]) 
  extends Node {
  ...
}

and a new graph class EdgeWeightedDirectedGraph extends DirectedGraph[EdgeWeightedNode]

Then I could update our algorithms to work with EdgeWeightedDirectedGraphs as well.

I would love to hear some thoughts on this idea and see how we could best implement this. Thanks!

IMHO, edge weights are important for many applications and it is a key
missing feature of Cassovary right now. However, Cassovary is all about
efficiency of storage, so it will be nice to give thought to that. There
are two separate things here:
(1) Abstraction -- such as EdgeWeightedNode in your snippet
(2) Implementation -- take a look at com.twitter.cassovary.graph.labels and
see what you think of managing edge labels as a collection separate from
the edge itself.

On Thu, Feb 26, 2015 at 8:49 AM, Ben McKown notifications@github.com
wrote:

Everyone,

I wanted to find out the communal opinion on adding edge weights to
cassovary. After having built out quite a few algorithms, it has become
clear that these could be improved through the addition of edge weights to
the graph structure. This may be as simple as adding a new Node type--maybe
something like

/** A map based node that stores edge weights for all connections to the given node * @param id The id for the given node * @param inboundNodes A map of node id to edge weight for inbound connections between * this and source node id * @param outboundNodes A map of node id to edge weight for outbound connections * between this and target node id */class EdgeWeightedNode(val id: Int, inboundNodes: Map[Int, Double], outboundNodes: Map[Int, Double])
extends Node {
...
}

and a new graph class EdgeWeightedDirectedGraph extends
DirectedGraph[EdgeWeightedNode]

Then I could update our algorithms to work with EdgeWeightedDirectedGraphs
as well.

I would love to hear some thoughts on this idea and see how we could best
implement this. Thanks!


Reply to this email directly or view it on GitHub
#168.

Also see #36 for some past discussion on this.
CC @szymonm

"IMHO, edge weights are important for many applications and it is a key
missing feature of Cassovary right now" @pankajgupta
+1
I would really like something similar to Labels.

trait WeightedDirectedGraph[+V <: Node]  extends DirectedGraph[V] {
  /**
    * Labels on edges of this graph.
    */
  var edgeLabels: Labels[(Int, Int)] = _

  /**
    * The label of an edge accessed by name. Label can be anything.
    */
  def labelOfEdge[L : TypeTag](edgeId: (Int, Int), labelName: String): Option[L] = _

} 

Yes indeed. As an aside, there is a beginning of this feature in terms of
node labels.

See
https://github.com/twitter/cassovary/tree/master/cassovary-core/src/main/scala/com/twitter/cassovary/graph/labels

Perhaps you can tell us a bit about what kind of edge labels will satisfy
your use case? e.g., you have how many approx. edges and approx. how many
and what kinds of labels per edge?

CC @szymonm
https://github.com/twitter/cassovary/issues?q=is%3Apr+is%3Aopen+author%3Aszymonm
@plofgren

On Wed, Dec 9, 2015 at 7:51 AM, Thomas notifications@github.com wrote:

"IMHO, edge weights are important for many applications and it is a key
missing feature of Cassovary right now" @pankajgupta
https://github.com/pankajgupta
+1
I would really like something similar to Labels.


Reply to this email directly or view it on GitHub
#168 (comment).

I read that labels are suitable for nodes and edges so I used (Int, Int) keys and Long or True labels for my edges.
In this use case, my graph is really dense and I have 7 types of labels.

I'm in the process of making it work but I think that I will a hard time querying labels with partial tuples (all labels with a given source node)

In the end, it would be really useful to create an hash/id for each edge that would be used as key for the labels.

Not sure I fully understand. May be make a class of these 7 labels and
index that by edge? What do you mean by partial tuples?

The best might be to point to your code if you can.

On Wed, Dec 9, 2015 at 10:36 PM, Thomas notifications@github.com wrote:

I read that labels are suitable for nodes and edges so I used (Int, Int)
keys and Long or True labels for my edges.
In this use case, my graph is really dense and I have 7 types of labels.

I'm in the process of making it work but I think that I will a hard time
querying labels with partial tuples (all labels with a given source node)

In the end, it would be really useful to create an hash/id for each edge
that would be used as key for the labels.


Reply to this email directly or view it on GitHub
#168 (comment).

How can I create an index by edge without using a tuple of (src.id, dst.id) ?

By partial tuples, I meant that using (src.id, dst.id) as an index for edge, it's really tempting to try to the sum of incoming weights without using the graph representation itself.