# Exam Question Classification

> This repository contains the experiment code for the paper *Test Case Classification via Few-Shot Learning*. It is intended for academic use only.
>
> **Method:** In this paper, we propose a test case classification approach based on few-shot learning and test case augmentation to address the limitations mentioned above. The proposed approach generates new test cases with a large pre-trained masked language model and extracts embedding representations by training word embedding models. A BiLSTM-based classifier is then designed to perform test case classification by extracting in-depth features. In addition, we apply an attention mechanism that uses lexicon matching to assign high weights to words that represent the test case category.

### Dependency Versions

```
bert4keras==0.11.4
Flask==1.0.2
nlpcda==2.5.8
numpy==1.15.1
pyltp==0.4.0
pymysql==1.0.2
scikit_learn==1.2.1
torch==1.13.0
xlrd==1.1.0
```

These versions match requirements.txt.

### Project Structure

- classify_model: dataset classification model
- classify_service: classification functionality
- chinese_roformer-sim-char: **SimBERTv2 model**
- chinese_simbert: SimBERT model
- ltp_data: models for the HIT Language Technology Platform (LTP)
- word_list_data: training and test sets
- splited_data: development, training, and test sets
- bilstm_attention.py: main BiLSTM training script
- contrast_experiment.py: outputs the results of the classic classification models
- data_processor.py: data processing utilities
- word2index: word vector data

### Model Training

1. Adjust parameters by editing the global variables in bilstm_attention.py

```python
import torch

vocab_size = 5000  # vocabulary size
embedding_size = 64  # word embedding dimension
num_classes = 6  # 6 classes (todo)
sentence_max_len = 64  # maximum length of a single sentence
hidden_size = 16
num_layers = 1  # single LSTM layer
num_directions = 2  # bidirectional LSTM
lr = 1e-3
batch_size = 16  # batch size
epochs = 50
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
app_names = ["航天中认自主可控众包测试练习赛"]  # 航天: random_state 6; 趣享: 13
# ,"决赛自主可控众测web自主可控运维管理系统"
# Labels: abnormal exit, incomplete functionality, user experience,
# page layout defect, performance, security
bug_type = ["不正常退出", "功能不完整", "用户体验", "页面布局缺陷", "性能", "安全"]
lexicon = {0: [], 1: [], 2: [], 3: [], 4: [], 5: []}
word_with_attention = {}
n = 5  # select the top n samples with the highest confidence
m = 3  # select the top m words with the highest attention weights
t1 = 3
t2 = 8
threshold_confidence = 0.9
```

2. Run model training

```
python bilstm_attention.py
```

### Comparison Experiments

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
```

##### Hyperparameters of the Comparison Classifiers

1. k-nearest neighbors classifier

```python
knn_classifier = KNeighborsClassifier()

# scikit-learn constructor defaults, for reference:
# def __init__(
#     self,
#     n_neighbors=5,
#     *,
#     weights="uniform",
#     algorithm="auto",
#     leaf_size=30,
#     p=2,
#     metric="minkowski",
#     metric_params=None,
#     n_jobs=None,
# )
```

2. SVM classifier

```python
svm_classifier = svm.SVC(C=2, kernel='rbf', gamma=10, decision_function_shape='ovr')

# scikit-learn constructor defaults, for reference:
# def __init__(
#     self,
#     *,
#     C=1.0,
#     kernel="rbf",
#     degree=3,
#     coef0=0.0,
#     shrinking=True,
#     probability=False,
#     tol=1e-3,
#     cache_size=200,
#     class_weight=None,
#     verbose=False,
#     max_iter=-1,
#     decision_function_shape="ovr",
#     break_ties=False,
#     random_state=None,
# )
```

3. Naive Bayes classifier

```python
muNB_classifier = GaussianNB()

# scikit-learn constructor defaults, for reference:
# def __init__(self, *, priors=None, var_smoothing=1e-9)
```

4. BPNN (backpropagation neural network) classifier

```python
bpnn_classifier = MLPClassifier(solver='lbfgs', random_state=0, hidden_layer_sizes=[10, 10])

# scikit-learn constructor defaults, for reference:
# def __init__(
#     self,
#     activation="relu",
#     *,
#     alpha=0.0001,
#     batch_size="auto",
#     learning_rate="constant",
#     learning_rate_init=0.001,
#     power_t=0.5,
#     max_iter=200,
#     shuffle=True,
#     tol=1e-4,
#     verbose=False,
#     warm_start=False,
#     momentum=0.9,
#     nesterovs_momentum=True,
#     early_stopping=False,
#     validation_fraction=0.1,
#     beta_1=0.9,
#     beta_2=0.999,
#     epsilon=1e-8,
#     n_iter_no_change=10,
#     max_fun=15000,
# )
```
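
The four comparison classifiers can be fitted and scored on a shared train/test split. The following is a minimal sketch, not the contents of contrast_experiment.py: the `evaluate_baselines` helper is hypothetical, and the synthetic 6-class data from `make_classification` is an assumption that stands in for the repository's real feature vectors, whose loading code is not shown in this README. The classifier settings are the ones listed above.

```python
import warnings

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier


def evaluate_baselines(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit each baseline and return {name: test accuracy}."""
    classifiers = {
        "knn": KNeighborsClassifier(),
        "svm": svm.SVC(C=2, kernel="rbf", gamma=10, decision_function_shape="ovr"),
        "naive_bayes": GaussianNB(),
        "bpnn": MLPClassifier(solver="lbfgs", random_state=0, hidden_layer_sizes=[10, 10]),
    }
    results = {}
    for name, clf in classifiers.items():
        with warnings.catch_warnings():
            # lbfgs may stop at max_iter on small data; silence that warning only.
            warnings.simplefilter("ignore", category=ConvergenceWarning)
            clf.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, clf.predict(X_test))
    return results


# Assumption: synthetic features replace the real test case embeddings.
X, y = make_classification(
    n_samples=600, n_features=64, n_informative=16, n_classes=6, random_state=6
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=6, stratify=y
)

results = evaluate_baselines(X_train, y_train, X_test, y_test)
for name, acc in results.items():
    print(f"{name}: accuracy={acc:.3f}")
```

Accuracies on synthetic data say nothing about the paper's results; the sketch only shows how the four classifiers share one evaluation loop. To use it with real data, pass your own feature matrices and integer labels to `evaluate_baselines`.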