
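Sjim f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt		2 years ago
.idea	ab4c35e370 update status	2 years ago
classify_model	f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt	2 years ago
classify_service	f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt	2 years ago
pic	33c9fe56ac Add readme	2 years ago
word2index	f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt	2 years ago
.DS_Store	a76ba5490d rm ltp	2 years ago
.gitignore	f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt	2 years ago
flask.log	33c9fe56ac Add readme	2 years ago
main.py	33c9fe56ac Add readme	2 years ago
mt_clerk_test.sql	f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt	2 years ago
readme.md	c80bd1a576 Update 'readme.md'	2 years ago
requirements.txt	f82c9f837c Add experiment on selecting lexicon words by ranking word-vector distances; update requirements.txt	2 years ago
test_simbert.py	33c9fe56ac Add readme	2 years ago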

Exam Question Classification

This repository contains the experimental code for the paper Test Case Classification via Few-Shot Learning. It is intended for academic use only.

Method: In this paper, we propose a test case classification approach based on few-shot learning and test case augmentation to address the limitations mentioned above. The approach generates new test cases with a large pre-trained masked language model and extracts embedding representations by training word embedding models. A BiLSTM-based classifier then performs test case classification by extracting in-depth features. In addition, we apply an attention mechanism that uses lexicon matching to assign high weights to words that indicate the test case category.
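The idea of steering attention toward category words can be illustrated with a minimal pure-Python sketch. It assumes, purely for illustration, that each token already has a raw attention score and a feature vector (standing in for BiLSTM outputs), and that lexicon matching is implemented by adding a fixed bonus (the hypothetical `lexicon_bonus`) to the score of any token found in the lexicon before the softmax. The paper's actual weighting scheme may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lexicon_attention(tokens, scores, features, lexicon_words, lexicon_bonus=2.0):
    """Attention pooling in which lexicon words receive a boosted score.

    tokens:        words of one sentence
    scores:        raw attention score per token (e.g. from a learned layer)
    features:      one feature vector per token (e.g. BiLSTM outputs)
    lexicon_words: set of words tied to a category (hypothetical input)
    """
    boosted = [s + (lexicon_bonus if t in lexicon_words else 0.0)
               for t, s in zip(tokens, scores)]
    weights = softmax(boosted)
    dim = len(features[0])
    # Weighted sum of the token features gives one sentence vector.
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, pooled


tokens = ["the", "layout", "overlaps"]
scores = [0.0, 0.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, pooled = lexicon_attention(tokens, scores, features, lexicon_words={"layout"})
print([round(w, 3) for w in weights], pooled)
```

With equal raw scores, the lexicon word "layout" ends up with most of the attention mass, so the pooled vector leans toward its features.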

Project Structure

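classify_model dataset classification models
classify_service classification functionality
- chinese_roformer-sim-char SimBERTv2 model
- chinese_simbert SimBERT model
- ltp_data HIT Language Technology Platform (LTP) models
- word_list_data training and test sets
- splited_data development, training, and test sets
- bilstm_attention.py main script for BiLSTM training
- contrast_experiment.py outputs the results of the classic (baseline) classification models
- data_processor.py data processing utilities
word2index word vector data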
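Word-to-index data is typically used to turn tokenized sentences into fixed-length integer sequences for an embedding layer. The sketch below builds a frequency-ranked vocabulary capped at `vocab_size` (compare the global of the same name in bilstm_attention.py) and pads or truncates to a maximum length. The reserved `PAD`/`UNK` ids, the helper names `build_vocab` and `encode`, and right-padding are assumptions for illustration, not the format of the repository's word2index files.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids (assumed convention)


def build_vocab(tokenized_sentences, vocab_size):
    """Map the most frequent words to ids 2..vocab_size-1."""
    counts = Counter(w for sent in tokenized_sentences for w in sent)
    most_common = counts.most_common(vocab_size - 2)  # leave room for PAD/UNK
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(tokens, word2index, max_len):
    """Convert words to ids, truncating or right-padding to max_len."""
    ids = [word2index.get(w, UNK) for w in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


corpus = [["login", "page", "crash"], ["page", "layout", "broken"]]
vocab = build_vocab(corpus, vocab_size=5000)
print(encode(["page", "crash", "slow"], vocab, max_len=6))
```

Out-of-vocabulary words such as "slow" map to `UNK`, and short sentences are filled with `PAD` up to `max_len`.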

Model Training

Parameter Tuning

Edit the global variables in bilstm_attention.py:

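```python
import torch

vocab_size = 5000  # vocabulary size
embedding_size = 64  # word vector dimension
num_classes = 6  # 6 classes (TODO)
sentence_max_len = 64  # maximum length of a single sentence
hidden_size = 16  # LSTM hidden-state size

num_layers = 1  # one LSTM layer
num_directions = 2  # bidirectional LSTM
lr = 1e-3
batch_size = 16  # batch size
epochs = 50

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

app_names = ["航天中认自主可控众包测试练习赛"]
# 航天 (aerospace): random_state 6
# 趣享: random_state 13
# optionally add: "决赛自主可控众测web自主可控运维管理系统"
bug_type = ["不正常退出", "功能不完整", "用户体验", "页面布局缺陷", "性能", "安全"]
# i.e. abnormal exit, incomplete functionality, user experience,
# page layout defect, performance, security
lexicon = {0: [], 1: [], 2: [], 3: [], 4: [], 5: []}  # word list per class
word_with_attention = {}
n = 5  # select the top-n predictions with the highest confidence
m = 3  # select the top-m words with the highest attention weight

t1 = 3
t2 = 8
threshold_confidence = 0.9
```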
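The variables `n`, `m`, `threshold_confidence`, and `lexicon` suggest a step in which the most confident predictions contribute their highest-attention words to each class's lexicon. The sketch below is one hypothetical reading of that step in plain Python. The input format (`(predicted_class, confidence, {word: attention_weight})` tuples) and the helper name `expand_lexicon` are assumptions, not the repository's code; `t1` and `t2` are left out because their role is not documented here.

```python
def expand_lexicon(predictions, lexicon, n=5, m=3, threshold_confidence=0.9):
    """Add top-attention words from confident predictions to each class lexicon.

    predictions: list of (predicted_class, confidence, {word: attention_weight})
    lexicon:     dict mapping class index -> list of words (updated in place)
    """
    for cls in lexicon:
        # Predictions for this class that clear the threshold, most confident first.
        confident = sorted(
            (p for p in predictions if p[0] == cls and p[1] >= threshold_confidence),
            key=lambda p: p[1],
            reverse=True,
        )[:n]
        for _, _, word_weights in confident:
            # The m words with the highest attention weight in this sample.
            top_words = sorted(word_weights, key=word_weights.get, reverse=True)[:m]
            for word in top_words:
                if word not in lexicon[cls]:
                    lexicon[cls].append(word)
    return lexicon


lexicon = {0: [], 1: [], 2: [], 3: [], 4: [], 5: []}
predictions = [
    (0, 0.97, {"app": 0.1, "crashes": 0.7, "exit": 0.2}),
    (0, 0.55, {"maybe": 0.9}),  # below the threshold: ignored
    (3, 0.93, {"button": 0.3, "overlaps": 0.6, "text": 0.1}),
]
expand_lexicon(predictions, lexicon, m=2)
print(lexicon[0], lexicon[3])
```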

Run model training:
```
python bilstm_attention.py
```

Comparison Experiments

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
```
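These classic classifiers expect one fixed-length feature vector per sample, whereas the BiLSTM consumes a word sequence. A common way to bridge the two, shown here as a sketch, is to average a sentence's word vectors. Whether contrast_experiment.py builds its features exactly this way is an assumption, and the toy `word_vectors` table is made up.

```python
def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known words; return zeros if none are known."""
    known = [word_vectors[w] for w in tokens if w in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


word_vectors = {"page": [1.0, 0.0], "layout": [0.0, 1.0]}  # toy 2-d vectors
X = [sentence_vector(s, word_vectors, dim=2)
     for s in (["page", "layout"], ["page"], ["unknown"])]
print(X)
```

The rows of `X`, together with the class labels, can then be passed to each classifier's `fit(X, y)` method.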

Hyperparameters of the baseline classifiers

k-nearest neighbors classifier

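```python
from sklearn.neighbors import KNeighborsClassifier

knn_classifier = KNeighborsClassifier()  # all defaults

# scikit-learn's KNeighborsClassifier constructor defaults, for reference:
# KNeighborsClassifier(n_neighbors=5, *, weights="uniform", algorithm="auto",
#                      leaf_size=30, p=2, metric="minkowski",
#                      metric_params=None, n_jobs=None)
```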
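With these defaults (`n_neighbors=5`, `weights="uniform"`, and the Minkowski metric with `p=2`, i.e. Euclidean distance), prediction is a majority vote among the five nearest training points. Below is a brute-force pure-Python sketch of that rule for illustration only. With `algorithm="auto"`, scikit-learn may use a tree index instead, and its tie-breaking may differ from this sketch.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X_train, y_train, x, n_neighbors=5, p=2):
    """Uniform-weight majority vote among the n_neighbors closest points."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: minkowski(pair[0], x, p))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
y_train = [0, 0, 0, 1, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]))
```

For the query point `[0.5, 0.5]`, the five nearest neighbors are the three class-0 points plus two class-1 points, so class 0 wins the vote 3 to 2.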

SVM classifier

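```python
from sklearn import svm

# Setting used in the experiments: C=2, RBF kernel, gamma=10, one-vs-rest.
svm_classifier = svm.SVC(C=2, kernel="rbf", gamma=10, decision_function_shape="ovr")

# scikit-learn's SVC constructor defaults, for reference:
# SVC(*, C=1.0, kernel="rbf", degree=3, gamma="scale", coef0=0.0,
#     shrinking=True, probability=False, tol=1e-3, cache_size=200,
#     class_weight=None, verbose=False, max_iter=-1,
#     decision_function_shape="ovr", break_ties=False, random_state=None)
```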
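In scikit-learn, the RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2). The default `gamma="scale"` uses 1 / (n_features * X.var()), while the experiments fix `gamma=10`, which makes similarity decay very quickly with distance. A small pure-Python check of that formula:

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


for gamma in (0.1, 10):
    # Two points whose squared distance is 0.1.
    print(gamma, rbf_kernel([0.0, 0.0], [0.3, 0.1], gamma))
```

At `gamma=0.1` the two points are still almost identical to the kernel (about 0.99). At `gamma=10` their similarity drops to about exp(-1), roughly 0.37, so each support vector influences only a small neighborhood.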

Naive Bayes classifier

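```python
from sklearn.naive_bayes import GaussianNB

muNB_classifier = GaussianNB()  # all defaults

# scikit-learn's GaussianNB constructor defaults, for reference:
# GaussianNB(*, priors=None, var_smoothing=1e-9)
```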
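With `priors=None`, class priors are estimated from class frequencies. `var_smoothing` adds a fraction of the largest per-feature variance of the training data to every class variance, which keeps the Gaussian densities numerically stable. The following minimal pure-Python sketch follows that recipe. It is an illustration, not scikit-learn's implementation, and it assumes the training features are not all constant.

```python
import math


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Per-class priors, feature means, and smoothed feature variances."""
    n_features = len(X[0])
    # Epsilon = var_smoothing * largest per-feature variance over all of X.
    epsilon = var_smoothing * max(variance([row[j] for row in X]) for j in range(n_features))
    model = {}
    for cls in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == cls]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [variance(c) + epsilon for c in cols]
        prior = len(rows) / len(X)  # priors=None: use class frequencies
        model[cls] = (prior, means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for xi, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        return score
    return max(model, key=lambda cls: log_posterior(model[cls]))


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.5, 6.5], [4.5, 5.5]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [5.2, 6.1]))
```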

BPNN (back-propagation neural network) classifier

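```python
from sklearn.neural_network import MLPClassifier

# Setting used in the experiments: L-BFGS solver, two hidden layers of 10 units.
bpnn_classifier = MLPClassifier(solver="lbfgs", random_state=0, hidden_layer_sizes=[10, 10])

# scikit-learn's MLPClassifier constructor defaults, for reference:
# MLPClassifier(hidden_layer_sizes=(100,), activation="relu", *, solver="adam",
#               alpha=0.0001, batch_size="auto", learning_rate="constant",
#               learning_rate_init=0.001, power_t=0.5, max_iter=200,
#               shuffle=True, random_state=None, tol=1e-4, verbose=False,
#               warm_start=False, momentum=0.9, nesterovs_momentum=True,
#               early_stopping=False, validation_fraction=0.1, beta_1=0.9,
#               beta_2=0.999, epsilon=1e-8, n_iter_no_change=10, max_fun=15000)
```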
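To get a sense of how small the `hidden_layer_sizes=[10, 10]` network is, you can count its weights and biases layer by layer. The sketch assumes, hypothetically, 64-dimensional input features (matching `embedding_size`, e.g. averaged word vectors) and the six bug categories. For more than two classes, scikit-learn's MLPClassifier uses one softmax output unit per class.

```python
def mlp_param_count(n_inputs, hidden_layer_sizes, n_outputs):
    """Count weights plus biases in a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


print(mlp_param_count(64, [10, 10], 6))  # 650 + 110 + 66 parameters
```

A model this small suits the `lbfgs` solver, which the scikit-learn documentation notes can converge faster and perform better than the default `adam` on small datasets.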
