How to use multi-thread in multi-label setting

Question


ZMK112 opened this issue 6 years ago

Hello everyone, I am new to ALiPy, and I'm lucky to have found such a good project.
I couldn't find an example of using multi-threading in the multi-label setting, so I wrote one based on aceThreading_usage.py:

from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
import numpy as np
from alipy import ToolBox
from alipy.data_manipulate import StandardScale
from alipy.query_strategy.multi_label import QueryMultiLabelMMC, LabelRankingModel
from alipy.index import get_Xy_in_multilabel
from alipy.experiment import State

# load data
X, y = load_iris(return_X_y=True)
mlb = OneHotEncoder()
mult_y = mlb.fit_transform(y.reshape((-1, 1)))
mult_y = np.asarray(mult_y.todense())
mult_y[mult_y == 0] = -1

# init alibox
alibox = ToolBox(X=X, y=mult_y, query_type='PartLabels')
alibox.split_AL(test_ratio=0.3, initial_label_rate=0.1, all_class=False)


def target_func(round, train_id, test_id, Lcollection, Ucollection, saver, examples, labels, global_parameters):
    qs = QueryMultiLabelMMC(examples, labels)
    model = LabelRankingModel()

    while len(Ucollection) > 30:
        select_index = qs.select(Lcollection, Ucollection)
        Ucollection.difference_update(select_index)
        Lcollection.update(select_index)

        # update model
        X_tr, y_tr, _ = get_Xy_in_multilabel(Lcollection, X=examples, y=labels, unknown_element=0)
        model.fit(X=X_tr, y=y_tr)
        _, pred = model.predict(examples[test_id, :])

        # calculate micro-F1
        Z = pred
        Y = labels[test_id]
        precision = np.sum(Z & Y) / max(1, np.sum(Z))
        recall = np.sum(Z & Y) / max(1, np.sum(Y))
        micro_f1 = 0 if precision == 0 and recall == 0 else \
                (2 * precision * recall) / (precision + recall)

        # save intermediate results
        st = State(select_index=select_index, performance=micro_f1)
        saver.add_state(st)
        saver.save()


# init acethread
acethread = alibox.get_ace_threading(target_function=target_func)
acethread.start_all_threads()

# get the results: a list of StateIO objects
stateIO_list = acethread.get_results()

# save the state of multi_thread to the saving_path in pkl form
acethread.save()

I always get this error:
Exception: Label_size can not be induced from fully labeled set, label_size must be provided
at the line
acethread = alibox.get_ace_threading(target_function=target_func)
Can anyone help me? I would be very grateful!

Tang · Answer 1 · Fri Mar 22 2019 15:45:22 GMT+0800 (China Standard Time)

Hi, this exception is raised when a MultiLabelIndexCollection object is initialized without the label_size parameter and its value cannot be inferred from the given indexes either.
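As a rough illustration of why the inference can fail, here is a small pure-Python sketch. It is not ALiPy's code; it assumes, as the exception message suggests, that a multi-label index entry is either an (example, label) pair or a bare example index standing for all labels of that example. Expanding a bare index requires the number of labels, so a set made only of bare (fully labeled) indexes gives nothing to infer label_size from. The helper name to_label_pairs is hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple, Union

# An index entry: a bare example index (meaning all labels) or an (example, label) pair.
Index = Union[int, Tuple[int, int]]


def to_label_pairs(indexes: Iterable[Index], label_size: Optional[int] = None) -> Set[Tuple[int, int]]:
    """Hypothetical helper (not ALiPy API): expand indexes into (example, label) pairs."""
    pairs: Set[Tuple[int, int]] = set()
    for idx in indexes:
        if isinstance(idx, tuple):
            pairs.add(idx)  # already a single (example, label) pair
        elif label_size is None:
            # A bare example index says nothing about how many labels exist.
            raise ValueError("label_size cannot be inferred from a fully labeled index; pass it explicitly")
        else:
            # A fully labeled example covers every label 0 .. label_size - 1.
            pairs.update((idx, lab) for lab in range(label_size))
    return pairs


print(sorted(to_label_pairs([(0, 1), 2], label_size=3)))
# [(0, 1), (2, 0), (2, 1), (2, 2)]
```

In the question's setup the label count is simply the number of columns of mult_y, i.e. mult_y.shape[1], which is 3 for the one-hot encoded iris classes.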

This appears to be a bug in the multi-thread class.
You can either update alipy with the code from the dev branch, or create the aceThreading object manually:

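from alipy.index import MultiLabelIndexCollection
from alipy.utils.multi_thread import aceThreading
# keep the indexes returned by split_AL (use this instead of the earlier split_AL call)
train_idx, test_idx, label_idx, unlabel_idx = alibox.split_AL(
    test_ratio=0.3, initial_label_rate=0.1, all_class=False)
# wrap each round's indexes, passing label_size (number of label columns) explicitly
label_idx = [MultiLabelIndexCollection(idx, label_size=mult_y.shape[1]) for idx in label_idx]
unlabel_idx = [MultiLabelIndexCollection(idx, label_size=mult_y.shape[1]) for idx in unlabel_idx]
# refresh_interval, max_thread and saving_path are settings you choose
acethread = aceThreading(examples=X, labels=mult_y,
                         train_idx=train_idx, test_idx=test_idx,
                         label_index=label_idx,
                         unlabel_index=unlabel_idx,
                         refresh_interval=refresh_interval,
                         max_thread=max_thread,
                         saving_path=saving_path,
                         target_func=target_func)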

The label_idx and unlabel_idx arguments should be lists of MultiLabelIndexCollection objects.

Besides, in LabelRankingModel an irrelevant label is coded as -1, not 0, both in the predictions and in your ground truth (because of mult_y[mult_y == 0] = -1), so your micro-F1 calculation may not work as intended.
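A minimal sketch of a micro-F1 computation for labels coded as 1/-1, assuming pred and the ground truth are NumPy arrays of the same shape (as pred and labels[test_id] are in target_func). It first turns both matrices into boolean masks of relevant labels; the original code applies bitwise & and np.sum to the raw integers, where the -1 entries are summed too and 1 & -1 evaluates to 1. The function name micro_f1_score is hypothetical.

```python
import numpy as np


def micro_f1_score(pred, truth):
    """Micro-averaged F1 for label matrices coded as 1 (relevant) / -1 (irrelevant)."""
    pred_pos = np.asarray(pred) == 1   # boolean mask of predicted relevant labels
    true_pos = np.asarray(truth) == 1  # boolean mask of ground-truth relevant labels
    tp = np.sum(pred_pos & true_pos)   # true positives pooled over all examples and labels
    precision = tp / max(1, np.sum(pred_pos))
    recall = tp / max(1, np.sum(true_pos))
    if precision == 0 and recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


# Small example: 3 examples x 3 labels, one missed label and one false alarm.
Y = np.array([[1, -1, -1],
              [-1, 1, -1],
              [-1, -1, 1]])
Z = np.array([[1, -1, -1],
              [-1, 1, 1],
              [-1, -1, -1]])
print(micro_f1_score(Z, Y))  # precision = recall = 2/3
```

Inside target_func, the micro-F1 lines could then be replaced by micro_f1 = micro_f1_score(pred, labels[test_id]). On 0/1 indicator matrices, scikit-learn's f1_score(..., average='micro') computes the same quantity if you prefer a library call.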

ZMK112 · Answer 2 · Sat Mar 23 2019 16:51:06 GMT+0800 (China Standard Time)

Thanks for your patient answer!