fvalle1 / nsbm

nSBM: multi branch topic modeling



multipartite Stochastic Block Modeling

nSBM inherits hSBM from https://github.com/martingerlach/hSBM_Topicmodel and extends it to tripartite networks (aka supervised topic models).

The idea is to run SBM-based topic modeling on document-word networks in which the documents are also annotated with keywords.
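To make the network structure concrete, the sketch below turns two toy count matrices into one weighted edge list in which every edge links a document to either a word or a keyword. It uses only pandas and is not part of nsbm; the helper `to_edges` and the toy data are hypothetical names and values chosen for illustration.

```python
import pandas as pd

# Toy counts, invented for illustration: rows are words (or keywords), columns are documents.
words = pd.DataFrame([[2, 0], [1, 3]], index=["w0", "w1"], columns=["doc0", "doc1"])
keywords = pd.DataFrame([[1, 0], [0, 1]], index=["keyword0", "keyword1"], columns=["doc0", "doc1"])


def to_edges(counts, kind):
    """Turn a count matrix into a weighted edge list, dropping zero counts (hypothetical helper)."""
    edges = (
        counts.rename_axis(index="node", columns="document")
        .stack()
        .rename("weight")
        .reset_index()
    )
    # A zero count means there is no link between that node and that document.
    edges = edges[edges["weight"] > 0].assign(kind=kind)
    return edges[["document", "node", "weight", "kind"]]


# Documents form one partition; words and keywords form the other two.
edges = pd.concat([to_edges(words, "word"), to_edges(keywords, "keyword")], ignore_index=True)
print(edges)
```

Documents never link to each other, and neither do words or keywords, which is what makes the network tripartite.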


Install

With pip

python3 -m pip install . -vv

With conda/mamba

conda install -c conda-forge nsbm

Example

from nsbm import nsbm
import pandas as pd
import numpy as np

# word counts: rows are words, columns are documents
df = pd.DataFrame(
    index = ["w{}".format(w) for w in range(1000)],
    columns = ["doc{}".format(d) for d in range(250)],
    data = np.random.randint(1, 100, 250000).reshape((1000, 250)))

df_key_list = []

## keywords
df_key_list.append(
    pd.DataFrame(
    index = ["keyword{}".format(w) for w in range(100)],
    columns = ["doc{}".format(d) for d in range(250)],
    data = np.random.randint(1, 10, (100, 250)))
)
    
## authors
df_key_list.append(
    pd.DataFrame(
    index = ["author{}".format(w) for w in range(10)],
    columns = ["doc{}".format(d) for d in range(250)],
    data = np.random.randint(1, 5, (10, 250)))
)
    
## other features
df_key_list.append(
    pd.DataFrame(
    index = ["feature{}".format(w) for w in range(25)],
    columns = ["doc{}".format(d) for d in range(250)],
    data = np.random.randint(1, 5, (25, 250)))
)

model = nsbm()
model.make_graph_multiple_df(df, df_key_list)

model.fit(n_init=1, B_min=50, verbose=False)
model.save_data()
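In the example above, every frame in df_key_list is built with the same `doc…` column labels as df. A small check like the one below can confirm this before building the graph. Whether nsbm actually requires identical document columns is an assumption here, not something this README states, and `shared_documents` is a hypothetical helper.

```python
import pandas as pd


def shared_documents(df, df_key_list):
    """Return True if every frame has exactly the same document columns as df (hypothetical helper)."""
    return all(list(key_df.columns) == list(df.columns) for key_df in df_key_list)


# Toy frames, invented for illustration.
docs = ["doc0", "doc1"]
word_counts = pd.DataFrame([[1, 2]], index=["w0"], columns=docs)
good_keys = pd.DataFrame([[1, 1]], index=["keyword0"], columns=docs)
bad_keys = pd.DataFrame([[1, 1]], index=["keyword0"], columns=["doc0", "doc2"])

aligned = shared_documents(word_counts, [good_keys])
misaligned = shared_documents(word_counts, [good_keys, bad_keys])
print(aligned, misaligned)
```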

Run with Docker

docker run -it -u jovyan -v $PWD:/home/jovyan/work -p 8899:8888 docker.pkg.github.com/fvalle1/trisbm/trisbm:latest

If a graph.xml.gz file is found in the current directory, the analysis is run on it.

Tests

python3 tests/run_tests.py

Caveats

Check the following in your data:

  • there should be no zero-degree nodes (every node should have at least one link)
  • there should be no duplicate nodes
  • the make_form_BoW_df function discretises the data
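The first two checks above can be sketched with plain pandas before any graph is built. This is a minimal illustration, not part of nsbm: `check_bow` is a hypothetical helper, and it assumes that a zero count means "no link", so an all-zero row or column corresponds to a zero-degree node. It does not cover discretisation.

```python
import pandas as pd


def check_bow(df):
    """Report zero-degree and duplicate nodes in a count matrix (hypothetical helper)."""
    zero_rows = (df.sum(axis=1) == 0).to_numpy()
    zero_cols = (df.sum(axis=0) == 0).to_numpy()
    return {
        # Rows or columns that sum to zero have no links at all.
        "zero_degree_words": df.index[zero_rows].tolist(),
        "zero_degree_documents": df.columns[zero_cols].tolist(),
        # Labels that appear more than once would create duplicate nodes.
        "duplicate_words": df.index[df.index.duplicated()].unique().tolist(),
        "duplicate_documents": df.columns[df.columns.duplicated()].unique().tolist(),
    }


# Toy matrix, invented for illustration: "w1" and "doc1" are empty, "w0" appears twice.
toy = pd.DataFrame(
    [[1, 0, 0], [0, 0, 0], [2, 0, 1]],
    index=["w0", "w1", "w0"],
    columns=["doc0", "doc1", "doc2"],
)
report = check_bow(toy)
print(report)
```

The same function can be applied to each frame in df_key_list, since keywords, authors and other features are laid out the same way as the word matrix.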

Documentation

Docs

Readthedocs

License

See LICENSE.

This work is in part based on sbmtm

Third party libraries

This package depends on graph-tool


License: GNU General Public License v3.0

