kevin-xw / Dogee

C++ extension for shared memory distributed programming

Dogee

Dogee is a C++ library for distributed programming on distributed shared memory (DSM) using the shared-memory multithreading model. DSM systems usually expose explicit "get" and "set" APIs for accessing the shared memory. Dogee instead lets developers operate on distributed shared memory much as they operate on local memory in C++, without calling "get" and "set" explicitly. With Dogee, developers can create arrays, shared variables and objects in DSM.
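To illustrate the idea of hiding "get" and "set" behind ordinary C++ syntax, here is a minimal sketch that uses a proxy class with an overloaded assignment and conversion operator. It is not Dogee's implementation: "FakeStore", "SharedInt" and "demo_shared_int" are hypothetical names, and an in-process map stands in for the remote memory servers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a remote key-value store. A real DSM backend would issue
// network requests here (hypothetical, not Dogee's backend interface).
class FakeStore {
public:
    int64_t get(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? 0 : it->second;
    }
    void set(const std::string& key, int64_t value) { data_[key] = value; }
private:
    std::unordered_map<std::string, int64_t> data_;
};

// Proxy that turns ordinary reads and writes into get/set calls.
class SharedInt {
public:
    SharedInt(FakeStore& store, std::string key)
        : store_(store), key_(std::move(key)) {}
    // Reading the variable performs a "get".
    operator int64_t() const { return store_.get(key_); }
    // Assigning to the variable performs a "set".
    SharedInt& operator=(int64_t value) {
        store_.set(key_, value);
        return *this;
    }
private:
    FakeStore& store_;
    std::string key_;
};

// Usage: the caller never writes get/set explicitly.
inline int64_t demo_shared_int() {
    FakeStore store;
    SharedInt counter(store, "counter");
    counter = 41;           // becomes store.set("counter", 41)
    counter = counter + 1;  // get, add locally, then set
    assert(store.get("counter") == 42);
    return counter;         // becomes store.get("counter")
}
```

Each read or write in this sketch still costs one store access; the proxy only changes how the access is spelled.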

Birdee, a sister project, is a new distributed programming language. Dogee can be viewed as Birdee in C++.

Build instructions

Build Dogee on Ubuntu

Make sure your g++ compiler supports C++11 features.

sudo apt-get install libmemcached-dev
make

Now the binary files will be ready in the "bin" directory.

Build Dogee on Windows

Dogee is based on libmemcached. This repository includes the libmemcached library (.lib and .dll) for Windows, in both debug and release builds and for both x86 and x64. You only need to open "Dogee.sln" and build Dogee with Visual Studio 2013 (or newer). Note that the compiler in the original release of VS2013 has some bugs, so you should update VS2013 before compiling Dogee.

Execution instructions

In this section, we take logistic regression (which you can find in the "example" directory) as an example. After you build it, you will get a binary "LogsiticRegression" in the "bin" directory (on Ubuntu). Copy the binary file to all machines in the cluster.

Start the slave node

./LogsiticRegression -s 18080

This command runs the program in slave mode: it listens on port 18080 and waits for a connection from the master. Start one slave process for each entry you plan to list under "Slaves=" in the config file described below.

Write a config file on the master

Create a file "DogeeConfig.txt" on the master node. Its content should look like this:

DogeeConfigVer= 1
MasterPort= 9090
NumSlaves= 2
NumMemServers= 1
DSMBackend= ChunkMemcached
DSMCache= NoCache
Slaves= 
127.0.0.1 18080
127.0.0.1 18090
MemServers=
127.0.0.1 11211
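The layout of this file can be read as "Key= value" lines plus two lists of "host port" entries that follow "Slaves=" and "MemServers=". The sketch below parses that layout under exactly this assumption. It is not Dogee's own config reader: "ClusterConfig", "Endpoint", "parse_config" and "demo_parse" are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Endpoint {
    std::string host;
    int port = 0;
};

struct ClusterConfig {
    std::map<std::string, std::string> scalars;  // e.g. "MasterPort" -> "9090"
    std::vector<Endpoint> slaves;
    std::vector<Endpoint> mem_servers;
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Assumed layout: "Key= value" lines; "Slaves=" and "MemServers=" are
// followed by one "host port" line per entry.
inline ClusterConfig parse_config(std::istream& in) {
    ClusterConfig cfg;
    std::vector<Endpoint>* current = nullptr;  // list being filled, if any
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) continue;
        std::string::size_type eq = line.find('=');
        if (eq != std::string::npos) {
            std::string key = trim(line.substr(0, eq));
            std::string value = trim(line.substr(eq + 1));
            if (key == "Slaves") {
                current = &cfg.slaves;
            } else if (key == "MemServers") {
                current = &cfg.mem_servers;
            } else {
                cfg.scalars[key] = value;
                current = nullptr;
            }
        } else if (current) {
            std::istringstream fields(line);
            Endpoint ep;
            if (fields >> ep.host >> ep.port) current->push_back(ep);
        }
    }
    return cfg;
}

// Usage with a shortened version of the file shown above.
inline ClusterConfig demo_parse() {
    std::istringstream in(
        "NumSlaves= 2\nSlaves= \n127.0.0.1 18080\n127.0.0.1 18090\n"
        "MemServers=\n127.0.0.1 11211\n");
    ClusterConfig cfg = parse_config(in);
    assert(cfg.slaves.size() == 2 && cfg.mem_servers.size() == 1);
    return cfg;
}
```

A stricter reader would also check that the number of parsed entries matches "NumSlaves" and "NumMemServers".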

Run the master node and the whole cluster

Make sure the file "DogeeConfig.txt" is in the current directory. Then run

./LogsiticRegression num_param 22283 num_points 1000 aaa 5896 iter_num 300 thread_num 2 step_size 0.005 test_partition 0.2

Note that the parameters following the command "LogsiticRegression" are user-defined. In this case they are the settings for logistic regression.
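Since the parameters are user-defined, each program decides how to read them. As a sketch, assuming the arguments alternate between a name and a value as in the command above, they could be collected into a map like this. "parse_pairs" and "demo_iter_num" are hypothetical helpers, and the example's actual parsing code may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Collects alternating "name value" tokens from argv into a map.
// argv[0] (the program name) is skipped; a trailing name without a
// value is ignored.
inline std::map<std::string, std::string> parse_pairs(int argc,
                                                      const char* const* argv) {
    std::map<std::string, std::string> params;
    for (int i = 1; i + 1 < argc; i += 2) {
        params[argv[i]] = argv[i + 1];
    }
    return params;
}

// Usage with two of the parameters from the command above.
inline int demo_iter_num() {
    const char* argv[] = {"./LogsiticRegression", "iter_num", "300",
                          "step_size", "0.005"};
    std::map<std::string, std::string> p = parse_pairs(5, argv);
    assert(p.size() == 2);
    return std::stoi(p["iter_num"]);
}
```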

API Manual

Dogee Class and Object Reference

Include the header files "DogeeBase.h" and "DogeeMacro.h" to use these features.

To define a class to be stored in DSM, the class or its base class should extend the Dogee base class "Dogee::DObject". A "reference" in Dogee is the counterpart of a pointer in C++: pointers point to objects in local memory, while references point to objects in shared memory.

To declare member variables in the class, first invoke the macro "DefBegin(BASE_CLASS_NAME);". Then define each member with the macro "Def(VARIABLE_NAME,TYPE)". Define a reference with "DefRef(Type,isVirtual,Name)" or with "Def(VARIABLE_NAME,Dogee::Ref<TYPE>)". Use "self" instead of "this" in the class's member functions. Invoke the macro "DefEnd();" after defining the last member variable of the class. The following example defines a Dogee class:

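class clsa : public DObject
{
	// Open the member list; the argument is the base class.
	DefBegin(DObject);
public:
	Def(i,int);
	Def(arr,Array<float>);
	Def(next,Array<Ref<clsa>>);
	// Virtual reference to a clsb object (clsb is another Dogee class, not shown).
	Def(prv,Ref<clsb, true>);
	// Close the member list; this macro must be in the public section.
	DefEnd();
	// Required constructor: forwards obj_id to the base class and has an empty body.
	clsa(ObjectKey obj_id) : DObject(obj_id)
	{
	}
	// User constructor: obj_id comes first, user parameters follow.
	clsa(ObjectKey obj_id,int a) : DObject(obj_id)
	{
		self->arr = NewArray<float>();
		self->next = NewArray<Ref<clsa>>();
		self->arr[2] = a;
	}
};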

IMPORTANT NOTES:

  • The macro "DefEnd();" should be declared public in a Dogee class.
  • The first parameter of every constructor should always be "ObjectKey obj_id", and the base class constructor BASE_CLASS_NAME(obj_id) should always be called.
  • The constructor CLASS_NAME(ObjectKey obj_id) : BASE_CLASS_NAME(obj_id) should always be declared and have an empty body.

To create an instance of a Dogee class, use Dogee::NewObj<CLASS_NAME>(PARAMETER_LIST), where PARAMETER_LIST is the list of parameters for the class constructor. The first parameter of every constructor is always "ObjectKey obj_id"; it is provided by the Dogee system, so you should omit it from PARAMETER_LIST. For example, to create an object of the class "clsa" defined above:

Ref<clsa> ptr = NewObj<clsa>(32);

The argument "32" is passed to the parameter "a" of the constructor "clsa(ObjectKey obj_id,int a)".
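The way a factory can supply the first constructor argument itself and forward the rest can be sketched in plain C++ with a variadic template. This is only an illustration of the calling convention, not Dogee's NewObj: "allocate_key", "Base", "Point" and "new_obj_sketch" are hypothetical, the key type is assumed to be an integer, and the object lives in local memory rather than in DSM.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

typedef uint32_t ObjectKey;  // assumption: the real key type may differ

// Hypothetical allocator that hands out fresh object ids.
inline ObjectKey allocate_key() {
    static ObjectKey next = 1;
    return next++;
}

struct Base {
    explicit Base(ObjectKey id) : key(id) {}
    ObjectKey key;
};

struct Point : Base {
    // Constructor that takes only the key and has an empty body.
    Point(ObjectKey id) : Base(id), x(0) {}
    // User constructor: the key comes first, user parameters follow.
    Point(ObjectKey id, int x0) : Base(id), x(x0) {}
    int x;
};

// Prepends a system-generated key to the caller's argument list, so the
// caller writes new_obj_sketch<Point>(32) rather than Point(key, 32).
template <class T, class... Args>
T new_obj_sketch(Args&&... args) {
    return T(allocate_key(), std::forward<Args>(args)...);
}

// Usage: the caller passes only the user parameters.
inline int demo_new_obj() {
    Point p = new_obj_sketch<Point>(32);
    assert(p.key != 0);
    return p.x;
}
```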

We then have a reference to an object. A reference is declared by "Dogee::Ref<CLASS_NAME,isVirtual>". There are two types of references in Dogee. The first is the non-virtual reference. A non-virtual reference interprets the referenced object as the "CLASS_NAME" given in the declaration and uses the virtual function table of the class "CLASS_NAME". Non-virtual references may not interpret the object accurately, since the wrong virtual function can be called when the object's actual class differs from "CLASS_NAME". For a class without virtual functions, however, non-virtual references are always accurate. Declare a non-virtual reference by Dogee::Ref<CLASS_NAME,false> or by Dogee::Ref<CLASS_NAME> (references are non-virtual by default). The second is the virtual reference. Virtual references dynamically find the correct virtual function table for the referenced object, so virtual function calls are always interpreted accurately. Declare a virtual reference by Dogee::Ref<CLASS_NAME,true>.
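The difference between the two kinds of reference can be sketched in plain C++. The sketch assumes that a class id is stored with each shared object and that a virtual reference consults it; that mechanism is an assumption made for illustration, not a description of Dogee's internals. All names ("g_class_of", "Shape", "Circle", "open_non_virtual", "open_virtual") are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

typedef int ObjectKey;  // assumption: the real key type may differ

// Stand-in for shared memory: only remembers each object's class id.
static std::map<ObjectKey, int> g_class_of;

enum { kShapeId = 0, kCircleId = 1 };

struct Shape {
    explicit Shape(ObjectKey k) : key(k) {}
    virtual ~Shape() {}
    virtual std::string name() const { return "Shape"; }
    ObjectKey key;
};

struct Circle : Shape {
    explicit Circle(ObjectKey k) : Shape(k) {}
    std::string name() const override { return "Circle"; }
};

// Non-virtual: always interpret the object as the declared class T,
// so T's virtual function table is used.
template <class T>
std::unique_ptr<Shape> open_non_virtual(ObjectKey k) {
    return std::unique_ptr<Shape>(new T(k));
}

// Virtual: look up the stored class id and interpret the object as its
// actual class, so the matching virtual function table is used.
inline std::unique_ptr<Shape> open_virtual(ObjectKey k) {
    switch (g_class_of.at(k)) {
        case kCircleId: return std::unique_ptr<Shape>(new Circle(k));
        default:        return std::unique_ptr<Shape>(new Shape(k));
    }
}

// Usage: object 7 is really a Circle, but is opened through Shape.
inline bool demo_reference_kinds() {
    g_class_of[7] = kCircleId;
    bool static_view = open_non_virtual<Shape>(7)->name() == "Shape";
    bool dynamic_view = open_virtual(7)->name() == "Circle";
    assert(static_view && dynamic_view);
    return static_view && dynamic_view;
}
```

In this sketch the virtual path pays for one extra lookup per access, which suggests why a non-virtual default is a reasonable choice for classes without virtual functions.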

References can be used in the same way as pointers.

ptr->i = 0;
Ref<clsb, true> p2 = Dogee::NewObj<clsb>();
ptr->prv=p2;

License: Apache License 2.0

