# Exam Question Classification

> This repository contains the experiment code for the paper *Test Case Classification via Few-Shot Learning*. It is intended for academic use only.
>
> **Method:** In this paper, we propose a test case classification approach based on few-shot learning and test case augmentation. The approach generates new test cases with a large pre-trained masked language model and obtains embedding representations by training word embedding models. A BiLSTM-based classifier then performs test case classification by extracting deep features. In addition, an attention mechanism assigns high weights to words that indicate the test case category, identified by lexicon matching.

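To make the classifier described above more concrete, here is a minimal PyTorch sketch of a BiLSTM with attention pooling. It is not the code from `bilstm_attention.py`: the class name `BiLSTMAttention`, the layer names, and the additive scoring function are assumptions made for illustration, and the attention here is learned from the LSTM outputs rather than driven by lexicon matching as in the paper. The default sizes are borrowed from the global variables listed in the training section.

```python
import torch
import torch.nn as nn


class BiLSTMAttention(nn.Module):
    """Hypothetical BiLSTM classifier with learned attention pooling."""

    def __init__(self, vocab_size=5000, embedding_size=64, hidden_size=16,
                 num_layers=1, num_classes=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size, padding_idx=0)
        self.lstm = nn.LSTM(embedding_size, hidden_size, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        # One scalar score per time step, computed from both directions.
        self.attn_score = nn.Linear(2 * hidden_size, 1)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)              # (batch, seq_len, emb)
        outputs, _ = self.lstm(embedded)                  # (batch, seq_len, 2*hidden)
        scores = self.attn_score(torch.tanh(outputs)).squeeze(-1)
        weights = torch.softmax(scores, dim=1)            # (batch, seq_len)
        # Attention-weighted sum of the LSTM outputs.
        context = torch.bmm(weights.unsqueeze(1), outputs).squeeze(1)
        logits = self.fc(context)                         # (batch, num_classes)
        return logits, weights


torch.manual_seed(0)
model = BiLSTMAttention()
batch = torch.randint(1, 5000, (16, 64))  # 16 random "sentences" of 64 tokens
logits, weights = model(batch)
print(logits.shape, weights.shape)
```

The model returns the attention weights alongside the logits so that the most heavily weighted words can be inspected per sample. For brevity the sketch does not mask padding positions, which a real training script would likely need to do.
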
### Dependency Versions

```
bert4keras==0.11.4
Flask==1.0.2
nlpcda==2.5.8
numpy==1.15.1
pyltp==0.4.0
pymysql==1.0.2
scikit_learn==1.2.1
torch==1.13.0
xlrd==1.1.0
```

These match `requirements.txt`.

### Project Structure

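![Framework](<./pic/Framework (1).png>)

- classify_model: dataset classification models
- classify_service: classification service
  - chinese_roformer-sim-char: **SimBERTv2 model**
  - chinese_simbert: SimBERT model
  - ltp_data: models for the HIT Language Technology Platform (LTP)
  - word_list_data: training and test sets
  - splited_data: development, training, and test sets
  - bilstm_attention.py: main BiLSTM training script
  - contrast_experiment.py: prints results for the classical classification models
  - data_processor.py: data-processing utilities
- word2index: word vector data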

### Model Training

1. Adjust the parameters

   Edit the global variables in `bilstm_attention.py`:

   ```python
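   import torch

   vocab_size = 5000  # vocabulary size
   embedding_size = 64  # word embedding dimension
   num_classes = 6  # 6-way classification (TODO)
   sentence_max_len = 64  # maximum length of a single sentence
   hidden_size = 16  # LSTM hidden state size

   num_layers = 1  # single LSTM layer
   num_directions = 2  # bidirectional LSTM
   lr = 1e-3  # learning rate
   batch_size = 16  # number of samples per batch
   epochs = 50

   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

   app_names = ["航天中认自主可控众包测试练习赛"]
   # random_state per app: 航天 -> 6, 趣享 -> 13
   # Commented-out additional entry: "决赛自主可控众测web自主可控运维管理系统"
   # Bug categories, in order: abnormal exit, incomplete functionality,
   # user experience, page layout defect, performance, security
   bug_type = ["不正常退出", "功能不完整", "用户体验", "页面布局缺陷", "性能", "安全"]
   lexicon = {0: [], 1: [], 2: [], 3: [], 4: [], 5: []}
   word_with_attention = {}
   n = 5  # select the n samples with the highest confidence
   m = 3  # select the m words with the highest attention weight

   t1 = 3
   t2 = 8
   threshold_confidence = 0.9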
   ```

2. Run model training

```
python bilstm_attention.py
```
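
The comments on `n`, `m`, and `threshold_confidence` suggest that confident predictions and their highest-attention words feed the per-class `lexicon`. The pure-Python sketch below shows one way these three settings could interact. It is an assumption about the workflow, not the logic in `bilstm_attention.py`: the function `update_lexicon`, the sample dictionary layout, and the use of `threshold_confidence` as a filter are all hypothetical, and `t1` and `t2` are left out because their role is not documented here.

```python
n = 5                        # keep the n most confident predictions
m = 3                        # keep the m highest-attention words per sample
threshold_confidence = 0.9   # assumed minimum confidence for a sample to count


def update_lexicon(samples, lexicon):
    """Add top-attention words of confident samples to a per-class lexicon.

    Each sample is a hypothetical dict with keys:
      "words"     - list of tokens
      "attention" - one attention weight per token
      "probs"     - one predicted probability per class
    """
    scored = []
    for sample in samples:
        confidence = max(sample["probs"])
        label = sample["probs"].index(confidence)
        if confidence >= threshold_confidence:
            scored.append((confidence, label, sample))

    # Most confident first, then keep only the first n samples.
    scored.sort(key=lambda item: item[0], reverse=True)
    for _, label, sample in scored[:n]:
        ranked = sorted(zip(sample["attention"], sample["words"]), reverse=True)
        for _, word in ranked[:m]:
            if word not in lexicon[label]:
                lexicon[label].append(word)
    return lexicon


lexicon = {i: [] for i in range(6)}
samples = [
    {"words": ["app", "crashes", "on", "login"],
     "attention": [0.1, 0.6, 0.05, 0.25],
     "probs": [0.95, 0.01, 0.01, 0.01, 0.01, 0.01]},
    {"words": ["page", "loads", "slowly"],
     "attention": [0.2, 0.3, 0.5],
     "probs": [0.3, 0.2, 0.1, 0.1, 0.2, 0.1]},  # below the threshold, skipped
]
update_lexicon(samples, lexicon)
print(lexicon[0])  # ['crashes', 'login', 'app']
```

Only the first sample passes the confidence filter, so its three highest-attention words are added to class 0 and every other class stays empty.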

### Comparison Experiments

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
```

##### Hyperparameters of the Baseline Classifiers

1. k-nearest neighbors classifier

```python
knn_classifier = KNeighborsClassifier()
# All constructor parameters are left at their scikit-learn defaults:
#     n_neighbors=5,
#     weights="uniform",
#     algorithm="auto",
#     leaf_size=30,
#     p=2,
#     metric="minkowski",
#     metric_params=None,
#     n_jobs=None,
```

2. SVM classifier

```python
svm_classifier = svm.SVC(C=2, kernel='rbf', gamma=10, decision_function_shape='ovr')
# Other constructor parameters and their scikit-learn defaults
# (the arguments passed above take precedence):
#     C=1.0,
#     kernel="rbf",
#     degree=3,
#     coef0=0.0,
#     shrinking=True,
#     probability=False,
#     tol=1e-3,
#     cache_size=200,
#     class_weight=None,
#     verbose=False,
#     max_iter=-1,
#     decision_function_shape="ovr",
#     break_ties=False,
#     random_state=None,
```

3. Naive Bayes classifier

```python
muNB_classifier = GaussianNB()
# Constructor parameters are left at their scikit-learn defaults:
#     priors=None, var_smoothing=1e-9
```

4. BPNN (back-propagation neural network) classifier

```python
bpnn_classifier = MLPClassifier(solver='lbfgs', random_state=0, hidden_layer_sizes=[10, 10])
# Remaining constructor parameters keep their scikit-learn defaults:
#     activation="relu",
#     alpha=0.0001,
#     batch_size="auto",
#     learning_rate="constant",
#     learning_rate_init=0.001,
#     power_t=0.5,
#     max_iter=200,
#     shuffle=True,
#     tol=1e-4,
#     verbose=False,
#     warm_start=False,
#     momentum=0.9,
#     nesterovs_momentum=True,
#     early_stopping=False,
#     validation_fraction=0.1,
#     beta_1=0.9,
#     beta_2=0.999,
#     epsilon=1e-8,
#     n_iter_no_change=10,
#     max_fun=15000,
```
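
As a rough illustration of how the four baselines configured above could be compared on a common split, the sketch below fits each of them on synthetic data. The data from `make_classification`, the 80/20 split, the `random_state` values, and the `classifiers` / `accuracies` names are assumptions for demonstration only. They do not reproduce `contrast_experiment.py` or the features used in the paper, so the printed accuracies carry no experimental meaning.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic 6-class data standing in for the real sentence features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=6, stratify=y)

# Same constructor arguments as listed in this section.
classifiers = {
    "knn": KNeighborsClassifier(),
    "svm": svm.SVC(C=2, kernel="rbf", gamma=10, decision_function_shape="ovr"),
    "naive_bayes": GaussianNB(),
    "bpnn": MLPClassifier(solver="lbfgs", random_state=0,
                          hidden_layer_sizes=[10, 10]),
}

accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```

Keeping the classifiers in one dictionary means every baseline sees exactly the same training and test split, so differences in the scores come from the models rather than from the data partition.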