How to use multi-thread in multi-label setting
ZMK112 opened this issue
Hello everyone, I am new to ALiPy and feel lucky to have found such a good project.
I couldn't find an example of using multithreading in the multi-label setting, so I wrote one based on aceThreading_usage.py:
```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
import numpy as np
from alipy import ToolBox
from alipy.data_manipulate import StandardScale
from alipy.query_strategy.multi_label import QueryMultiLabelMMC, LabelRankingModel
from alipy.index import get_Xy_in_multilabel
from alipy.experiment import State

# load data
X, y = load_iris(return_X_y=True)
mlb = OneHotEncoder()
mult_y = mlb.fit_transform(y.reshape((-1, 1)))
mult_y = np.asarray(mult_y.todense())
mult_y[mult_y == 0] = -1

# init alibox
alibox = ToolBox(X=X, y=mult_y, query_type='PartLabels')
alibox.split_AL(test_ratio=0.3, initial_label_rate=0.1, all_class=False)


def target_func(round, train_id, test_id, Lcollection, Ucollection, saver, examples, labels, global_parameters):
    qs = QueryMultiLabelMMC(examples, labels)
    model = LabelRankingModel()
    while len(Ucollection) > 30:
        select_index = qs.select(Lcollection, Ucollection)
        Ucollection.difference_update(select_index)
        Lcollection.update(select_index)
        # update model
        X_tr, y_tr, _ = get_Xy_in_multilabel(Lcollection, X=examples, y=labels, unknown_element=0)
        model.fit(X=X_tr, y=y_tr)
        _, pred = model.predict(examples[test_id, :])
        # calculate micro-F1
        Z = pred
        Y = labels[test_id]
        precision = np.sum(Z & Y) / max(1, np.sum(Z))
        recall = np.sum(Z & Y) / max(1, np.sum(Y))
        micro_f1 = 0 if precision == 0 and recall == 0 else \
            (2 * precision * recall) / (precision + recall)
        # save intermediate results
        st = State(select_index=select_index, performance=micro_f1)
        saver.add_state(st)
        saver.save()


# init acethread
acethread = alibox.get_ace_threading(target_function=target_func)
acethread.start_all_threads()
# get the results as a list of StateIO objects
stateIO_list = acethread.get_results()
# save the state of multi_thread to the saving_path in pkl form
acethread.save()
```
I always get this error:
```
Exception: Label_size can not be induced from fully labeled set, label_size must be provided
```
at this line:
```python
acethread = alibox.get_ace_threading(target_function=target_func)
```
Can anyone help me? I would be very grateful!
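For reference, the label preprocessing in the script above (OneHotEncoder followed by `mult_y[mult_y == 0] = -1`) turns each iris class into a row with +1 for the relevant label and -1 for every other label. The sketch below reproduces the same values with plain numpy, assuming integer classes 0..K-1 as in iris; `to_signed_one_hot` is a hypothetical helper, not part of ALiPy or scikit-learn.

```python
import numpy as np


def to_signed_one_hot(y, n_classes):
    """Hypothetical helper: encode integer classes 0..n_classes-1 as a +1/-1 label matrix.

    Row i gets +1 in column y[i] (relevant label) and -1 in every other
    column (irrelevant labels), the same values the OneHotEncoder path produces.
    """
    y = np.asarray(y)
    mult_y = -np.ones((len(y), n_classes), dtype=int)  # start with all labels irrelevant
    mult_y[np.arange(len(y)), y] = 1                   # mark the single relevant label per row
    return mult_y


signed = to_signed_one_hot([0, 2, 1], n_classes=3)
```

Each row contains exactly one +1, since every iris example belongs to exactly one class.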
Hi, this exception is raised when a `MultiLabelIndexCollection` object is initialized without the `label_size` parameter and its value cannot be inferred from the data.
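To illustrate why a fully labeled set is a problem, here is a hypothetical sketch (not ALiPy's actual implementation) of inferring a label count from index entries. It assumes a simplified representation in which an int means "all labels of this example are known" and an `(example_index, label_index)` tuple means a single known label. Only the tuples carry label indices, so a set made up entirely of whole examples gives nothing to infer from. `infer_label_size` is a hypothetical name.

```python
def infer_label_size(entries, label_size=None):
    """Hypothetical illustration, not ALiPy code.

    entries: ints (every label of that example is known) or
             (example_index, label_index) tuples (one known label).
    """
    if label_size is not None:
        return label_size  # an explicit value always wins
    label_indices = [e[1] for e in entries if isinstance(e, tuple)]
    if not label_indices:
        # Only whole-example entries: nothing says how many labels exist.
        raise ValueError("label_size cannot be inferred from a fully labeled set; "
                         "label_size must be provided")
    # Best guess: the largest label index seen, plus one.
    return max(label_indices) + 1
```

Even when inference succeeds, the largest label index seen is only a lower bound on the real number of labels, which is one reason to pass `label_size` explicitly (for the script in the question that would be `mult_y.shape[1]`).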
This is likely a bug in the multi-thread class.
You can either update alipy with the code from the dev branch or create the `acethread` object manually:
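```python
from alipy.utils.multi_thread import aceThreading

# train_idx, test_idx, label_idx, unlabel_idx, refresh_interval, max_thread,
# saving_path and target_function stand for your own split and settings
acethread = aceThreading(examples=X, labels=mult_y,
                         train_idx=train_idx, test_idx=test_idx,
                         label_index=label_idx,
                         unlabel_index=unlabel_idx,
                         refresh_interval=refresh_interval,
                         max_thread=max_thread,
                         saving_path=saving_path,
                         target_func=target_function)
```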
The `label_idx` and `unlabel_idx` arguments should be lists of `MultiLabelIndexCollection` objects.
Also, in `LabelRankingModel` an irrelevant label is marked with -1, not 0, both in the predictions and in the ground truth (`mult_y[mult_y == 0] = -1`), so your micro-F1 calculation may not work as intended.
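One way to adapt the metric, assuming (as described above) that both `pred` and `labels[test_id]` use +1 for relevant and -1 for irrelevant labels, is to compare against +1 explicitly instead of applying `&` and `np.sum` to the raw values. A minimal sketch; `micro_f1` is a hypothetical helper name:

```python
import numpy as np


def micro_f1(pred, truth):
    """Micro-averaged F1 for label matrices using +1 (relevant) / -1 (irrelevant)."""
    pred_pos = np.asarray(pred) == 1   # entries predicted relevant
    true_pos = np.asarray(truth) == 1  # entries relevant in the ground truth
    tp = np.sum(pred_pos & true_pos)   # relevant in both
    precision = tp / max(1, np.sum(pred_pos))
    recall = tp / max(1, np.sum(true_pos))
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with the original formula a perfect prediction `Z = Y = np.array([[1, -1, -1]])` gives `np.sum(Z & Y) == -1`, so precision, recall and micro-F1 all come out as -1.0, whereas `micro_f1(Z, Y)` returns 1.0. Inside `target_func`, a call such as `score = micro_f1(pred, labels[test_id])` could replace the precision/recall lines.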
Thanks for your patient answer!