Fast & Enhanced implementation of Local Context Focus.
Built on LC-ABSA / LCF-ABSA / LCF-BERT and LCF-ATEPC.
Provides tutorials on training and using ATE and APC models.
PyTorch implementation (CPU and CUDA supported).
This is a research-oriented code repository for aspect-based sentiment analysis (ABSA). Some repositories do not provide an inference script, and their code can be redundant or hard to use, so I built PyABSA to make training and inference easier. PyABSA currently contains ATEPC and APC models. Besides providing SOTA models for both ATEPC and APC, some of the source code in PyABSA is reusable. In other words, you can develop your own model on top of PyABSA, e.g., by using its efficient local context focus implementation. Please feel free to share your thoughts and help me build an easy-to-use toolkit that reduces the cost of building models and reproducing results in ABSA tasks.
If you are looking for the original code of the LCF-related papers, please go to LC-ABSA / LCF-ABSA or LCF-ATEPC.
Please star this repository to stay notified of new features and tutorials in PyABSA. To use PyABSA, install the latest version from pip or from source:
pip install -U pyabsa
Then try our tutorials and have fun!
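To confirm that the install succeeded, the installed version can be read from the package metadata. This is a small sketch using only the Python standard library; `installed_version` is a helper name introduced here for illustration and is not part of PyABSA.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pyabsa")
    print("pyabsa version:", found if found else "not installed")
```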
- SLIDE-LCF-BERT * (Faster & Performs Better than LCF/LCFS-BERT)
- SLIDE-LCFS-BERT * (Faster & Performs Better than LCF/LCFS-BERT)
- LCF-BERT (Reimplemented & Enhanced)
- LCFS-BERT (Reimplemented & Enhanced)
- FAST-LCF-BERT (Faster with a slight performance loss)
- FAST-LCFS-BERT (Faster with a slight performance loss)
- LCF-BERT-LARGE (Dual BERT)
- LCFS-BERT-LARGE (Dual BERT)
- BERT-BASE
- BERT-SPC
- LCA-Net
- Models marked with * are copyright reserved; a detailed introduction will follow once our paper is published.
Models | Laptop14 (acc) | Rest14 (acc) | Rest15 (acc) | Rest16 (acc) |
---|---|---|---|---|
SLIDE-LCFS-BERT (CDW) | 81.35 | 88.04 | 85.93 | 92.52 |
SLIDE-LCFS-BERT (CDM) | 82.13 | 87.5 | 85.37 | 92.36 |
SLIDE-LCF-BERT (CDW) | - | - | - | - |
SLIDE-LCF-BERT (CDM) | - | - | - | - |
Each result is the best performance among three random seeds. The results may change as this repo is updated. We are building a leaderboard; you can help by reporting the performance of other models.
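The CDM and CDW labels in the table refer to the two local context focus variants described in the LCF paper cited at the end of this README: context dynamic masking, which zeroes out the features of tokens far from the aspect, and context dynamic weighting, which down-weights them instead. The toy sketch below illustrates the idea on per-token weights only. It assumes a simplified distance (number of tokens to the nearest aspect token) and a linear decay; the function names and the threshold value are illustrative, and PyABSA's actual implementation may differ.

```python
def token_distances(seq_len, aspect_start, aspect_len):
    """Distance of each token to the nearest aspect token (0 inside the aspect)."""
    aspect_end = aspect_start + aspect_len - 1
    distances = []
    for i in range(seq_len):
        if i < aspect_start:
            distances.append(aspect_start - i)
        elif i > aspect_end:
            distances.append(i - aspect_end)
        else:
            distances.append(0)
    return distances


def cdm_weights(seq_len, aspect_start, aspect_len, threshold):
    """Masking variant: keep tokens within `threshold`, zero out the rest."""
    return [1.0 if d <= threshold else 0.0
            for d in token_distances(seq_len, aspect_start, aspect_len)]


def cdw_weights(seq_len, aspect_start, aspect_len, threshold):
    """Weighting variant: tokens beyond `threshold` decay linearly with distance."""
    return [1.0 if d <= threshold else 1.0 - (d - threshold) / seq_len
            for d in token_distances(seq_len, aspect_start, aspect_len)]


if __name__ == "__main__":
    # 8 tokens, the aspect occupies positions 3 and 4, threshold of 1 token.
    print(cdm_weights(8, 3, 2, 1))
    print(cdw_weights(8, 3, 2, 1))
```

In an LCF-style model, weights like these would multiply the token feature vectors to form the local context representation.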
PyABSA checks for the latest available checkpoints and loads the latest checkpoint from Google Drive. To view the available checkpoints, use the following code, then load a checkpoint by name:
from pyabsa import update_checkpoints
checkpoint_map = update_checkpoints()
Sentence with predicted labels:
关(O) 键(O) 的(O) 时(O) 候(O) 需(O) 要(O) 表(O) 现(O) 持(O) 续(O) 影(O) 像(O) 的(O) 短(B-ASP) 片(I-ASP) 功(I-ASP) 能(I-ASP) 还(O) 是(O) 很(O) 有(O) 用(O) 的(O)
{'aspect': '短 片 功 能', 'position': '14,15,16,17', 'sentiment': '1'}
Sentence with predicted labels:
相(O) 比(O) 较(O) 原(O) 系(O) 列(O) 锐(B-ASP) 度(I-ASP) 高(O) 了(O) 不(O) 少(O) 这(O) 一(O) 点(O) 好(O) 与(O) 不(O) 好(O) 大(O) 家(O) 有(O) 争(O) 议(O)
{'aspect': '锐 度', 'position': '6,7', 'sentiment': '0'}
Sentence with predicted labels:
It(O) was(O) pleasantly(O) uncrowded(O) ,(O) the(O) service(B-ASP) was(O) delightful(O) ,(O) the(O) garden(B-ASP) adorable(O) ,(O) the(O) food(B-ASP) -LRB-(O) from(O) appetizers(B-ASP) to(O) entrees(B-ASP) -RRB-(O) was(O) delectable(O) .(O)
{'aspect': 'service', 'position': '7', 'sentiment': 'Positive'}
{'aspect': 'garden', 'position': '12', 'sentiment': 'Positive'}
{'aspect': 'food', 'position': '16', 'sentiment': 'Positive'}
{'aspect': 'appetizers', 'position': '19', 'sentiment': 'Positive'}
{'aspect': 'entrees', 'position': '21', 'sentiment': 'Positive'}
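The labeled sentences above follow a `token(TAG)` layout with `O`, `B-ASP` and `I-ASP` tags. As a rough sketch of how such a line could be decoded back into aspect terms, the helpers below parse the tagged tokens and group each `B-ASP` with the `I-ASP` tags that follow it. `parse_tagged` and `extract_aspects` are names made up for this example, and the indices they return are 0-based token offsets that may not follow the same convention as the `position` field printed by PyABSA.

```python
import re

# One whitespace-separated item such as "service(B-ASP)" or "-LRB-(O)".
TAGGED = re.compile(r"^(?P<token>.+)\((?P<tag>O|B-ASP|I-ASP)\)$")


def parse_tagged(line):
    """Split a labeled sentence into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        match = TAGGED.match(item)
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        pairs.append((match.group("token"), match.group("tag")))
    return pairs


def extract_aspects(pairs):
    """Group B-ASP/I-ASP runs into (aspect_text, token_indices) tuples."""
    spans, current = [], []
    for index, (token, tag) in enumerate(pairs):
        if tag == "B-ASP":
            if current:
                spans.append(current)
            current = [(index, token)]
        elif tag == "I-ASP" and current:
            current.append((index, token))
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return [(" ".join(tok for _, tok in span), [i for i, _ in span]) for span in spans]


if __name__ == "__main__":
    line = "the(O) service(B-ASP) was(O) delightful(O) ,(O) the(O) garden(B-ASP) adorable(O)"
    print(extract_aspects(parse_tagged(line)))
```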
See the ATE examples directory for detailed usage.
from pyabsa import train_atepc, atepc_config_handler
from pyabsa import ABSADatasets
from pyabsa import ATEPCModelList
atepc_param_dict_chinese = atepc_config_handler.get_atepc_param_dict_chinese()
atepc_param_dict_chinese['model'] = ATEPCModelList.LCF_ATEPC
atepc_param_dict_chinese['log_step'] = 20
atepc_param_dict_chinese['evaluate_begin'] = 5
save_path = 'state_dict'
chinese_sets = ABSADatasets.Chinese
aspect_extractor = train_atepc(parameter_dict=atepc_param_dict_chinese,  # set parameter_dict=None to use the default model
                               dataset_path=chinese_sets,  # train and test sets are detected automatically
                               model_path_to_save=save_path,  # set model_path_to_save=None to skip saving the model
                               auto_evaluate=True,  # evaluate the model during training if a test set is available
                               auto_device=True  # automatically choose CUDA or CPU
                               )
from pyabsa import load_aspect_extractor
from pyabsa import ATEPCTrainedModelManager
# English glosses of the two Chinese examples:
# 1. "Compared with the original series, the sharpness is much higher; whether that is good or bad is debated."
# 2. "This phone is really thin, but the color is not very nice; overall I am very satisfied."
examples = ['相比较原系列锐度高了不少这一点好与不好大家有争议',
            '这款手机的大小真的很薄,但是颜色不太好看, 总体上我很满意啦。'
            ]
model_path = ATEPCTrainedModelManager.get_checkpoint(checkpoint_name='Chinese')
sentiment_map = {0: 'Bad', 1: 'Good', -999: ''}
aspect_extractor = load_aspect_extractor(trained_model_path=model_path,
                                         sentiment_map=sentiment_map,  # optional
                                         auto_device=False  # False loads the model on CPU
                                         )
atepc_result = aspect_extractor.extract_aspect(examples=examples,  # only lists are supported for now
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # predict the sentiment of the extracted aspect terms
                                               )
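The printed results shown earlier are dictionaries with string fields, such as `{'aspect': '锐 度', 'position': '6,7', 'sentiment': '0'}`. The sketch below shows one way to post-process a dictionary of that shape: it turns `position` into a list of integers and translates a numeric `sentiment` through a `sentiment_map`. It assumes plain dictionaries in exactly that form; the structure of the object returned by `extract_aspect` is not documented here, and `normalize_result` is a hypothetical helper, not a PyABSA function.

```python
def normalize_result(result, sentiment_map):
    """Return a copy with `position` as a list of ints and `sentiment` as a label."""
    positions = [int(p) for p in result["position"].split(",") if p.strip()]
    sentiment = result["sentiment"]
    # Numeric strings such as '0' are looked up in the map; labels such as
    # 'Positive' are passed through unchanged.
    if sentiment.lstrip("-").isdigit():
        sentiment = sentiment_map.get(int(sentiment), sentiment)
    return {"aspect": result["aspect"], "position": positions, "sentiment": sentiment}


if __name__ == "__main__":
    sentiment_map = {0: 'Bad', 1: 'Good', -999: ''}
    raw = {'aspect': '锐 度', 'position': '6,7', 'sentiment': '0'}
    print(normalize_result(raw, sentiment_map))
```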
See the APC examples directory for detailed usage.
love selena gomez !!!! she rock !!!!!!!!!!!!!!!! and she 's cool she 's my idol
selena gomez --> Positive Real: Positive (Correct)
thehils Heard great things about the ipad for speech/communication . Educational discounts are problem best bet . Maybe Thanksgiving ?
ipad --> Neutral Real: Neutral (Correct)
Jamie fox , Eddie Murphy , and barack obama because they all are exciting , cute , and inspirational to lots of people including me !!!
barack obama --> Positive Real: Neutral (Wrong)
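Prediction lines in the output above have the form `aspect --> Predicted Real: Gold (Correct|Wrong)`. If such output is saved as text, accuracy can be recomputed from it. The following is a minimal sketch that assumes exactly this line layout; `parse_predictions` and `accuracy` are illustrative names rather than PyABSA functions.

```python
import re

# Matches e.g. "ipad --> Neutral Real: Neutral (Correct)".
PREDICTION = re.compile(
    r"^(?P<aspect>.+?)\s+-->\s+(?P<pred>\S+)\s+Real:\s+(?P<real>\S+)\s+\((?:Correct|Wrong)\)\s*$"
)


def parse_predictions(lines):
    """Collect (aspect, predicted, real) from prediction lines; skip other lines."""
    records = []
    for line in lines:
        match = PREDICTION.match(line.strip())
        if match:
            records.append((match.group("aspect"), match.group("pred"), match.group("real")))
    return records


def accuracy(records):
    """Fraction of records whose predicted label equals the real label."""
    if not records:
        return 0.0
    return sum(1 for _, pred, real in records if pred == real) / len(records)


if __name__ == "__main__":
    output = [
        "love selena gomez !!!! she rock !!!!!!!!!!!!!!!! and she 's cool she 's my idol",
        "selena gomez --> Positive Real: Positive (Correct)",
        "ipad --> Neutral Real: Neutral (Correct)",
        "barack obama --> Positive Real: Neutral (Wrong)",
    ]
    records = parse_predictions(output)
    print(records, accuracy(records))
```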
from pyabsa import train_apc, apc_config_handler
from pyabsa import APCModelList
from pyabsa import ABSADatasets
apc_param_dict_english = apc_config_handler.get_apc_param_dict_english()
apc_param_dict_english['model'] = APCModelList.SLIDE_LCF_BERT
apc_param_dict_english['evaluate_begin'] = 2 # to reduce evaluation times and save resources
apc_param_dict_english['similarity_threshold'] = 1
apc_param_dict_english['max_seq_len'] = 80
apc_param_dict_english['dropout'] = 0.5
apc_param_dict_english['log_step'] = 5
apc_param_dict_english['l2reg'] = 0.0001
apc_param_dict_english['dynamic_truncate'] = True
apc_param_dict_english['srd_alignment'] = True
Check the parameter introduction to learn how to set these parameters.
save_path = 'state_dict'
laptop14 = ABSADatasets.Laptop14  # the integrated dataset is used here; you can use your own dataset instead
sent_classifier = train_apc(parameter_dict=apc_param_dict_english,  # omit this parameter to use the default settings
                            dataset_path=laptop14,  # datasets are detected recursively in this path
                            model_path_to_save=save_path,  # omit this parameter to skip saving the model
                            auto_evaluate=True,  # evaluate the model if a test set is available
                            auto_device=True  # automatically use CUDA if available; False always uses CPU
                            )
from pyabsa import load_sentiment_classifier
from pyabsa import ABSADatasets
from pyabsa.models import APCTrainedModelManager
# If needed, customize the mapping from sentiment index to sentiment label as below; the -999 entry is required padding, e.g.,
sentiment_map = {0: 'Negative', 1: 'Neutral', 2: 'Positive', -999: ''}
# Some pre-trained models are provided in case you lack the resources to train one;
# you can also train your own model and specify its path for inference instead
model_path = APCTrainedModelManager.get_checkpoint(checkpoint_name='English')
sent_classifier = load_sentiment_classifier(trained_model_path=model_path,
                                            auto_device=True,  # use CUDA if available
                                            sentiment_map=sentiment_map  # maps polarity index to label name
                                            )
text = 'everything is always cooked to perfection , the [ASP]service[ASP] is excellent , the [ASP]decor[ASP] cool and understated . !sent! 1 1'
# Note: reference sentiments such as '!sent! 1 1' are optional
sent_classifier.infer(text, print_result=True)
# Batch inference returns the results; set save_result=True to save them if necessary
inference_sets = ABSADatasets.semeval
results = sent_classifier.batch_infer(target_file=inference_sets,
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=True,  # some data are broken, so ignore them
                                      )
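The `text` passed to `infer` above wraps each aspect in `[ASP]` markers and optionally ends with `!sent!` followed by one reference sentiment per aspect. A small helper can build such strings from tokens. This is a sketch only: `mark_aspects` is a hypothetical name, and wrapping a multi-token aspect in a single pair of markers is an assumption, since the example above only shows single-token aspects.

```python
def mark_aspects(tokens, spans, reference=None):
    """Build an inference string with [ASP] markers around each aspect span.

    `spans` holds (start, end) token ranges with `end` exclusive.
    `reference` is an optional list of sentiment ids, one per span.
    """
    starts = {start for start, _ in spans}
    ends = {end - 1 for _, end in spans}
    marked = []
    for index, token in enumerate(tokens):
        if index in starts:
            token = "[ASP]" + token
        if index in ends:
            token = token + "[ASP]"
        marked.append(token)
    text = " ".join(marked)
    if reference is not None:
        if len(reference) != len(spans):
            raise ValueError("need exactly one reference sentiment per aspect")
        text += " !sent! " + " ".join(str(r) for r in reference)
    return text


if __name__ == "__main__":
    tokens = "the service is excellent , the decor cool".split()
    print(mark_aspects(tokens, [(1, 2), (6, 7)], reference=[1, 1]))
```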
Use this function to search for the optimal value of a parameter, e.g., learning_rate.
from pyabsa.research.parameter_search.search_param_for_apc import apc_param_search
from pyabsa import ABSADatasets
from pyabsa.config.apc_config import apc_config_handler
apc_param_dict_english = apc_config_handler.get_apc_param_dict_english()
apc_param_dict_english['log_step'] = 10
apc_param_dict_english['evaluate_begin'] = 2
param_to_search = ['l2reg', [1e-5, 5e-5, 1e-4, 5e-4, 1e-3]]
apc_param_search(parameter_dict=apc_param_dict_english,
                 dataset_path=ABSADatasets.Laptop14,
                 search_param=param_to_search,
                 auto_evaluate=True,
                 auto_device=True)
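Conceptually, a search like the one above tries each candidate value in turn and keeps the one with the best score. The sketch below shows that loop in isolation. It is not the code behind `apc_param_search`: `search_param` and `fake_evaluate` are made-up names, and `fake_evaluate` is a stand-in for "train a model and return its test accuracy".

```python
import copy


def search_param(base_params, name, candidates, evaluate):
    """Try each candidate for `name`; return (best_value, best_score, all_scores)."""
    scores = {}
    for value in candidates:
        params = copy.deepcopy(base_params)  # leave the base config untouched
        params[name] = value
        scores[value] = evaluate(params)
    best_value = max(scores, key=scores.get)
    return best_value, scores[best_value], scores


def fake_evaluate(params):
    # Hypothetical scoring function that peaks at l2reg = 1e-4.
    return -abs(params["l2reg"] - 1e-4)


if __name__ == "__main__":
    base = {"l2reg": 0.0, "log_step": 10}
    best, score, _ = search_param(base, "l2reg", [1e-5, 5e-5, 1e-4, 5e-4, 1e-3], fake_evaluate)
    print("best l2reg:", best)
```

Copying the parameter dictionary for every trial keeps the runs independent, so one trial's settings cannot leak into the next.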
- Laptop14
- Restaurant14
- Restaurant15
- Restaurant16
- Phone
- Car
- Camera
- Notebook
- Multilingual (the combination of the above datasets)
You do not need to download the datasets manually; they are downloaded automatically.
This work builds on LC-ABSA/LCF-ABSA and LCF-ATEPC, and on other impressive works such as PyTorch-ABSA and LCFS-BERT. Feel free to help us optimize the code or add new features!
Questions, comments, and suggestions are welcome, as is help improving the repository. Thank you!
- Add more BERT / GloVe based models
- Add more APIs
- Optimize code and add comments
We hope you can help us improve this work, e.g., by providing new datasets. If you develop a model using PyABSA, we highly recommend releasing it in PyABSA via a pull request, as open-source projects make your work much more valuable! We will help with this when we have free time.
The copyright of contributed resources belongs to the contributors. We hope you can help; thank you very much!
If PyABSA is helpful, please star this repo and consider citing whichever of our papers relates to your work:
- paper of LCF-ATEPC:
@article{yang2021multi,
title={A multi-task learning model for chinese-oriented aspect polarity classification and aspect term extraction},
author={Yang, Heng and Zeng, Biqing and Yang, JianHao and Song, Youwei and Xu, Ruyang},
journal={Neurocomputing},
volume={419},
pages={344--356},
year={2021},
publisher={Elsevier}
}
- paper of LCF-BERT:
@article{zeng2019lcf,
title={LCF: A Local Context Focus Mechanism for Aspect-Based Sentiment Classification},
author={Zeng, Biqing and Yang, Heng and Xu, Ruyang and Zhou, Wu and Han, Xuli},
journal={Applied Sciences},
volume={9},
number={16},
pages={3389},
year={2019},
publisher={Multidisciplinary Digital Publishing Institute}
}
- paper of LCA-Net:
@misc{yang2020enhancing,
title={Enhancing Fine-grained Sentiment Classification Exploiting Local Context Embedding},
author={Heng Yang and Biqing Zeng},
year={2020},
eprint={2010.00767},
archivePrefix={arXiv},
primaryClass={cs.CL}
}