imri / mizo

Super-fast Spark RDD for Titan Graph Database on HBase

How can I get Titan vertices from HBase directly using Apache Spark?

ChaohsinChan opened this issue · comments

I am running Titan 1.0 with an HBase 1.0.3 backend. I want to get the Titan vertices from HBase directly using Apache Spark 1.6.1. Can you give me some advice? Thanks

Hey,

You can run the following code to retrieve the vertices. For example, let's count how many vertices you have in your graph.

import mizo.rdd.MizoBuilder;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class MizoVerticesCounter {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Mizo Vertices Counter")
                .setMaster("local[1]")
                .set("spark.executor.memory", "4g")
                .set("spark.executor.cores", "1")
                .set("spark.rpc.askTimeout", "1000000")
                .set("spark.rpc.frameSize", "1000000")
                .set("spark.network.timeout", "1000000")
                .set("spark.rdd.compress", "true")
                .set("spark.core.connection.ack.wait.timeout", "6000")
                .set("spark.driver.maxResultSize", "100m")
                .set("spark.task.maxFailures", "20")
                .set("spark.shuffle.io.maxRetries", "20");

        SparkContext sc = new SparkContext(conf);

        long count = new MizoBuilder()
                .titanConfigPath("titan-graph.properties")
                .regionDirectoriesPath("hdfs://my-graph/*/e") // HDFS path to your HBase Table
                .parseInEdges(v -> false)
                .verticesRDD(sc)
                .toJavaRDD()
                .count(); // total number of vertices in your graph

        System.out.println("Vertices count is: " + count);
    }
}

Change 'hdfs://my-graph/*/e' to the HDFS path of your HBase Table.
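For reference, HBase keeps each table's data on HDFS under its root directory, with one subdirectory per region and one per column family, so the edgestore family ('e') of a Titan table typically matches a glob like the one below (the namenode, root directory, and table name are placeholders - check hbase.rootdir in your hbase-site.xml):

hdfs://<namenode>:8020/<hbase-rootdir>/data/default/<titan-table>/*/e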

Let me know if you have any further questions.

Thank you for your reply. I have two suggestions.
First, could we get the HDFS path through the HBase interface? That would be more convenient to use; usually we only know the HBase table name and its configuration.
Second, could the project be converted to Maven? Then it could also be developed inside Eclipse. For those who are not familiar with IntelliJ IDEA, it takes a long time to set up the development environment.

Thanks for your suggestions -

Regarding the Table name, I generally prefer not to rely on Hadoop config files, but rather specify paths directly.

Regarding Maven - good advice, I will switch to Maven and reupload soon.

Did you manage to run the code eventually?

I am not very familiar with IDEA, so I have not yet managed to set up a working development environment. Can you give me some advice?

You only have to open the root directory in IntelliJ, then go to MizoEdgesCounter, right-click and debug.

When I import the project into IDEA and choose to create a project from existing sources, it prompts that the project file already exists, and other errors occur when I choose to overwrite it. I do not know why. But if I choose to import a project from an existing model, only Eclipse, Gradle, and Maven are available. So I still did not succeed.

Try using File > Open and choose the project's .iml file.

Thank you for your suggestion; I am left with one last problem:
Module mizo-core: invalid item 'com.google.guava:guava:19.0' in the dependencies list
Module mizo-core: invalid item 'com.thinkaurelius.titan:titan-core:1.0.0' in the dependencies list
How do I introduce these dependencies? HBase and Spark do not have these dependency problems.

These dependencies should come from Maven. I see that the POMs are not included in the repo; I will add them in 12 hours.

OK, thanks. I find that the files titan-graph.properties and log4j.properties are also missing; you can add them as well.

I found an error:
Exception in thread "main" java.lang.IllegalArgumentException: Could not find implementation class: com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.

I suspect that this problem is related to the titan-graph.properties config. Can you show me your config?

storage.backend=hbase
storage.hostname=hlg-3p163-wangyongzhi,hlg-3p190-wangyongzhi,hlg-3p166-wangyongzhi
storage.hbase.table=titandb
storage.hbase.ext.zookeeper.znode.parent=/hbase-unsecure
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true

I wonder whether this configuration is right. I just copied it from the Titan configuration.

Add:
storage.hbase.compat-class = com.thinkaurelius.titan.diskstorage.hbase.HBaseCompat1_0

It does not work. Do I need other dependencies?

Let me build it myself and I will upload it as a complete Maven project. Will update you soon.

OK, thanks.

I have solved all the problems and am now at the last step, but there was an error:

Exception in thread "main" java.lang.ClassCastException: com.thinkaurelius.titan.graphdb.types.VertexLabelVertex cannot be cast to com.thinkaurelius.titan.graphdb.internal.InternalRelationType
at mizo.rdd.MizoRDD.lambda$loadRelationTypes$3(MizoRDD.java:146)
at java.lang.Iterable.forEach(Iterable.java:75)

Would you give me some advice?

public class MizoEdgesCounter {
    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "C:\\F盘\\hadoop-2.6.0.tar\\hadoop-2.6.0\\hadoop-2.6.0");

        SparkConf conf = new SparkConf()
                .setAppName("Mizo Edges Counter")
                .setMaster("local[1]")
                .set("spark.executor.memory", "4g")
                .set("spark.executor.cores", "1")
                .set("spark.rpc.askTimeout", "1000000")
                .set("spark.rpc.frameSize", "1000000")
                .set("spark.network.timeout", "1000000")
                .set("spark.rdd.compress", "true")
                .set("spark.core.connection.ack.wait.timeout", "6000")
                .set("spark.driver.maxResultSize", "100m")
                .set("spark.task.maxFailures", "20")
                .set("spark.shuffle.io.maxRetries", "20");

        SparkContext sc = new SparkContext(conf);

        long count = new MizoBuilder()
                .logConfigPath("C:\\ideapluin\\mizo-master\\mizo-master\\target\\test\\mizo-rdd\\log4j.properties")
                .titanConfigPath("C:\\ideapluin\\mizo-master\\mizo-master\\target\\test\\mizo-rdd\\titan-graph.properties")
                .regionDirectoriesPath("hdfs://hlg-3p163-wangyongzhi:8020/apps/hbase/data/data/default/titandb6/8f68e1d6f9d35a4683e1a4c264cd669f/e")
                .parseInEdges(v -> false)
                .edgesRDD(sc)
                .toJavaRDD()
                .count();

        System.out.println("Edges count is: " + count);
    }
}

I did not modify your code. This error occurred here:
protected static HashMap<Long, InternalRelationType> loadRelationTypes(String titanConfigPath) {
    TitanGraph g = TitanFactory.open(titanConfigPath);
    StandardTitanTx tx = (StandardTitanTx) g.newTransaction();

    HashMap<Long, InternalRelationType> relations = Maps.newHashMap();

    tx.query()
            .has(BaseKey.SchemaCategory, Contain.IN, Lists.newArrayList(TitanSchemaCategory.values()))
            .vertices()
            .forEach(v -> relations.put(v.longId(), new MizoTitanRelationType((InternalRelationType) v)));

    g.close();

    return relations;
}

The problem above was solved, but there was also an error:
java.lang.IllegalArgumentException: Invalid ASCII encoding offset: 625
at com.thinkaurelius.titan.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:105)
at mizo.hbase.MizoTitanHBaseRelationParser.readPropertyValue(MizoTitanHBaseRelationParser.java:179)
at mizo.iterators.MizoBaseRelationsIterator.handleProperty(MizoBaseRelationsIterator.java:87)
at mizo.iterators.MizoBaseRelationsIterator.getEdgeOrNull(MizoBaseRelationsIterator.java:46)

I use the Titan example, the Graph of the Gods; you can see it here: http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html

Fixed the bug - checked using the Graph of the Gods, works :)
Also updated the project to use Maven

Let me know if it works for you.

There was also an error; how can I resolve it? It seems to be a Guava version conflict.

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedTime(Ljava/util/concurrent/TimeUnit;)J
at com.google.common.cache.LocalCache$LoadingValueReference.elapsedNanos(LocalCache.java:3600)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2412)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2373)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
at com.google.common.cache.LocalCache.get(LocalCache.java:3985)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4788)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$6$6.call(StandardTitanTx.java:1244)
at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:268)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$6.execute(StandardTitanTx.java:1258)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$6.execute(StandardTitanTx.java:1126)
at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:198)
at com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:57)
at com.google.common.collect.Iterables$7.iterator(Iterables.java:613)
at java.lang.Iterable.forEach(Iterable.java:74)
at mizo.rdd.MizoRDD.loadRelationTypes(MizoRDD.java:149)
at mizo.rdd.MizoRDD.<init>(MizoRDD.java:71)
at mizo.rdd.MizoBuilder$1.<init>(MizoBuilder.java:53)
at mizo.rdd.MizoBuilder.edgesRDD(MizoBuilder.java:53)
at MizoEdgesCounter.main(MizoEdgesCounter.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

This error is caused by a mismatch between the Guava version that Titan expects and the one brought in by other components.
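If you build with Maven, a common workaround is forcing a single Guava version across the whole build, for example (the version below is an assumption - align it with the Guava version Titan 1.0 itself declares):

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>18.0</version> <!-- assumed; match Titan's own Guava version -->
        </dependency>
    </dependencies>
</dependencyManagement>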

I succeeded in running the code against HBase 1.0.3 -- try checking out the code into a new directory and running it from there, without any modifications. It should work.

When I run it without any modifications, I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: Could not find implementation class: com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:47)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:473)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:407)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1320)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:62)
at mizo.rdd.MizoRDD.loadRelationTypes(MizoRDD.java:141)

Pushed an update for fixing this, try now - working for me

I got the result, but there was an error when the job completed:

27490 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 0 finished: count at MizoEdgesCounter.java:34, took 2.037018 s
Edges count is: 34

27871 [DestroyJavaVM] WARN com.thinkaurelius.titan.graphdb.database.StandardTitanGraph - Unable to remove graph instance uniqueid c0a8adc387204-DE0018-PC1
com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:144)
at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.set(KCVSConfiguration.java:141)
at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.set(KCVSConfiguration.java:118)
at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.remove(KCVSConfiguration.java:159)
at com.thinkaurelius.titan.diskstorage.configuration.ModifiableConfiguration.remove(ModifiableConfiguration.java:42)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.closeInternal(StandardTitanGraph.java:191)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.access$600(StandardTitanGraph.java:78)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph$ShutdownThread.start(StandardTitanGraph.java:803)
at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:102)
at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
at java.lang.Shutdown.runHooks(Shutdown.java:123)
at java.lang.Shutdown.sequence(Shutdown.java:167)
at java.lang.Shutdown.shutdown(Shutdown.java:234)
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent exception while executing backend operation setConfiguration
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
... 13 more
Caused by: java.lang.IllegalArgumentException: Connection is null or closed.
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:310)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:712)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:694)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:532)
at com.thinkaurelius.titan.diskstorage.hbase.HConnection1_0.getTable(HConnection1_0.java:22)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:424)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.mutateMany(HBaseKeyColumnValueStore.java:189)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.mutate(HBaseKeyColumnValueStore.java:88)
at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.mutate(ExpectedValueCheckingStore.java:65)
at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration$2.call(KCVSConfiguration.java:146)
at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration$2.call(KCVSConfiguration.java:141)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:133)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation$1.call(BackendOperation.java:147)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
... 14 more

I will fix it soon. Did you succeed?

Yes! Apart from the error above, I did get the results - it was not easy!

I will soon traverse all the vertex information and check whether it is correct.

Ok keep me updated :)

How can I bulk import data into Titan? Can you give me some advice? I have 100 GB of data. Thanks.

Hey,
Create a new transaction that uses batches (TitanGraph.buildTransaction().enableBatchLoading().checkExternalVertexExistence(false)), then commit() the transaction every X insertions, for example every 50k, as in the sketch below.
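Here is a minimal sketch of that pattern (the vertex label, property name, and data source are hypothetical placeholders - adapt them to your schema):

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.TitanTransaction;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class MizoBulkLoader {
    private static final int BATCH_SIZE = 50_000;

    public static void main(String[] args) {
        TitanGraph graph = TitanFactory.open("titan-graph.properties");

        TitanTransaction tx = graph.buildTransaction()
                .enableBatchLoading()
                .checkExternalVertexExistence(false)
                .start();

        long inserted = 0;
        for (String name : loadRecords()) {    // hypothetical data source
            Vertex v = tx.addVertex("user");   // hypothetical vertex label
            v.property("name", name);

            // commit every BATCH_SIZE insertions, then start a fresh batch transaction
            if (++inserted % BATCH_SIZE == 0) {
                tx.commit();
                tx = graph.buildTransaction()
                        .enableBatchLoading()
                        .checkExternalVertexExistence(false)
                        .start();
            }
        }

        tx.commit();
        graph.close();
    }

    private static Iterable<String> loadRecords() {
        return java.util.Arrays.asList("alice", "bob"); // stand-in data
    }
}

Setting storage.batch-loading=true in the Titan config also helps for large imports.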

Hello imri,
Thank you for the great work on mizo.
I met the same problems described in these Stack Overflow questions:
Q1: http://stackoverflow.com/questions/41121262/reading-a-large-graph-from-titan-on-hbase-into-spark?rq=1
Q2: http://stackoverflow.com/questions/35464538/how-to-process-large-titan-graph-using-spark
Until now, I cannot find a good practice for doing OLAP with Titan on Spark.
Have you tried to use SparkGraphComputer directly for OLAP? Do you have any example code?
In the TitanBlueprintsGraph.java file, the compute method is overridden like this:

@Override
public <C extends GraphComputer> C compute(Class<C> graphComputerClass) throws IllegalArgumentException {
    if (!graphComputerClass.equals(FulgoraGraphComputer.class)) {
        throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
    } else {
        return (C) compute();
    }
}
So I think that when I create a TitanGraph, it does not support SparkGraphComputer. I can only create a HadoopGraph via graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties'); how can that traverse the Titan graph DB? I cannot find where it scans the HBase tables.
Do you have any example code for SparkGraphComputer working with Titan?

Thank you very much.

Hey,

This answer might be helpful.

I have used SparkGraphComputer with Titan, but it malfunctions and is really buggy. In order for this to work, you have to use HadoopGraph (as specified in the answer above), which internally uses an InputFormat to read the graph. Titan's implementation of InputFormat was buggy - first of all, it skips vertices (if you count the number of vertices using the InputFormat, you get a wrong answer). Second, it crashes in some circumstances (for example, an edge that connects a vertex to itself). Third, SparkGraphComputer is really, really slow - I haven't researched why. To sum up - as far as I'm concerned - SparkGraphComputer is bad.
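Just so it's clear what that route looks like, here is a rough sketch of the HadoopGraph approach (the properties file is assumed to configure gremlin.graph as HadoopGraph plus Titan's HBaseInputFormat, and the withComputer call is TinkerPop 3.2-style - older TinkerPop versions expose this differently):

import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class HadoopGraphCounter {
    public static void main(String[] args) throws Exception {
        // assumed properties file pointing Titan's HBaseInputFormat at your table
        Graph graph = GraphFactory.open("conf/hadoop/read-hbase.properties");

        // run the count as an OLAP job on Spark
        Long count = graph.traversal()
                .withComputer(SparkGraphComputer.class)
                .V().count().next();

        System.out.println("Vertices: " + count);
    }
}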

What are you trying to achieve? Tell me more, maybe we can figure it out using Mizo.

Best regards

Thank you very much! I'm so excited that you answered me. (Please ignore my English grammatical errors.)
Now I am trying to use Titan to store some relational data about users: user follow relations and users' second-hand goods for sale. Then I want to do some OLAP analysis for relation recommendations, goods recommendations, user cluster division, and so on.
For example:
Case 1: A follows B, B follows C, and maybe A will be interested in C.
Case 2: I want to find out why and how users follow one another, and whether there are any common features.

Now I have already built my Titan cluster using HBase + Elasticsearch as the backend for the OLTP service, and I am trying to build my OLAP environment based on Titan and Spark, but I found there is no good documentation. And Titan does not even support Spark well.

When I found the mizo project, I thought maybe I could do OLAP on Spark GraphX. I mean, I would just scan my Titan HBase table for all vertices and edges into Spark, and use Spark GraphX to do the analysis. Is this possible?

Thank you again!

So if I understand you correctly, you want to expand from a given vertex through multiple hops. Mizo only allows you to expand from a given vertex to its direct edges.

I haven't used GraphX, but as far as I'm concerned, it should be really easy to integrate Mizo with it, since GraphX only expects an RDD of edges - you can convert Mizo's edges RDD to an RDD of GraphX edges, as sketched below. I'm not sure what you'll be able to achieve using GraphX, but give it a try.
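For instance, the conversion could look roughly like this (a sketch only - the edge accessors outVertexId/inVertexId/label are hypothetical stand-ins for whatever Mizo's edge class actually exposes):

import mizo.rdd.MizoBuilder;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.graphx.Edge;
import org.apache.spark.graphx.Graph;
import org.apache.spark.storage.StorageLevel;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class MizoToGraphX {
    public static void main(String[] args) {
        SparkContext sc = new SparkContext(
                new SparkConf().setAppName("Mizo to GraphX").setMaster("local[1]"));

        // map each Mizo edge to a GraphX Edge (accessors below are hypothetical)
        JavaRDD<Edge<String>> edges = new MizoBuilder()
                .titanConfigPath("titan-graph.properties")
                .regionDirectoriesPath("hdfs://my-graph/*/e")
                .parseInEdges(v -> false)
                .edgesRDD(sc)
                .toJavaRDD()
                .map(e -> new Edge<>(e.outVertexId(), e.inVertexId(), e.label()));

        // GraphX builds the vertex set from the edge endpoints
        ClassTag<String> tag = ClassTag$.MODULE$.apply(String.class);
        Graph<String, String> graph = Graph.fromEdges(edges.rdd(), "",
                StorageLevel.MEMORY_AND_DISK(), StorageLevel.MEMORY_AND_DISK(), tag, tag);

        System.out.println("GraphX edge count: " + graph.edges().count());
    }
}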

If you need any help, let me know.

Thank you, I will give it a try.

Hello imri,
I have started a Spark OLAP task based on Titan & HBase & Gremlin SparkGraphComputer. But as in your experiments, it works very slowly: with 150 vertices in the graph it takes 4 minutes, and with 10 million vertices it takes far too long.
It seems to get stuck reading the RDD from Titan.

My HBase version is 0.94, but I found that mizo depends on the HBase 1.0.2 client, and my production HBase environment does not allow me to read HFiles directly...

I am trying to solve these problems.

PS: I have a question about using Titan: is there any way to create the property key first, commit, and then do the indexing later? When I write properties without creating an index (using Elasticsearch), I get errors.

Hello,
I have successfully run the edges and vertices count test cases using Mizo! Thank you. I am using HBase 0.98, Spark 1.5.1, and Titan's Graph of the Gods.
I still have some questions. The counts do not look right: there are 17 edges, but the mizo edge count result is 32, which is not 17*2.
Then I built a very simple graph with only 3 vertices, and in my test Mizo found a vertex count of 10 - there are 7 unrelated vertices. I think these may be index vertices or internal-use vertices in Titan. This may be related to the 'Multiple Item Data Model' (ref: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.TitanDB.BestPractices.html), because when I scan the table with the HBase shell, the same rowkey appears with multiple values.

  1. In the MizoRDD.java file, when loading relation types, why are the labels configured for vertices ignored? If I need the vertex label info, it is impossible to get:
protected static HashMap<Long, MizoTitanRelationType> loadRelationTypes(String titanConfigPath) {
    ...
            .forEach(v -> {
                if (v instanceof InternalRelationType)
                    relations.put(v.longId(), new MizoTitanRelationType((InternalRelationType) v));
            });
}
  2. When I use HBase v0.98, in MizoRegionFamilyCellsIterator.java the ASC_CELL_COMPARATOR uses CellComparator.compareRows and compareTimestamps, which do not exist in that version, so I changed them to compareStatic, as in the following code.
    private Comparator<Cell> ASC_CELL_COMPARATOR = (left, right) -> {
        int c = CellComparator.compareStatic(left, right);
        if (c != 0) {
            return c;
        } else {
            if (left.getFamilyLength() + left.getQualifierLength() == 0 &&
                    left.getTypeByte() == KeyValue.Type.Minimum.getCode()) {
                return 1;
            } else if (right.getFamilyLength() + right.getQualifierLength() == 0 &&
                    right.getTypeByte() == KeyValue.Type.Minimum.getCode()) {
                return -1;
            } else {
                boolean sameFamilySize = left.getFamilyLength() == right.getFamilyLength();
                if (!sameFamilySize) {
                    return Bytes.compareTo(left.getFamilyArray(), left.getFamilyOffset(), left.getFamilyLength(),
                            right.getFamilyArray(), right.getFamilyOffset(), right.getFamilyLength());
                } else {
                    int diff = CellComparator.compareStatic(left, right);
                    if (diff != 0) {
                        return diff;
                    } else {
                        c = Longs.compare(right.getTimestamp(), left.getTimestamp());
                        if (c != 0) diff = c;
                        //diff = CellComparator.compareTimestamps(right, left); // Different from CellComparator.compare()
                        return diff != 0 ? diff : (255 & right.getTypeByte()) - (255 & left.getTypeByte());
                    }
                }
            }
        }
    };

I do not quite understand this part. Why does it need to create an ascending-sorted cells iterator, and what does a Cell mean - is it a property or an edge within one row?
Any suggested documents for understanding HTable, regions, column families, cells, etc.?
Any suggested documents for understanding the Titan data model?