Semi-supervised Classification Based Mixed Sampling for Imbalanced Data
https://www.degruyter.com/document/doi/10.1515/phys-2019-0103/html
... Imbalanced data classification is the problem in which the class distribution of the training data is not balanced: one class has far fewer samples than the other.
... methods for imbalanced data classification can be divided into two levels: the algorithm level and the data processing level.
... At the algorithm level, imbalanced data classification methods mainly include ensemble methods, cost-sensitive learning, and so on.
... At the data processing level, the methods include over sampling and under sampling, which rebalance the imbalanced data set to obtain a balanced data distribution.
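A minimal sketch of random over sampling and random under sampling, to make the data-level idea concrete. The function names, the toy data, and the choice of purely random duplication and removal are assumptions for illustration; the excerpt does not say which sampling mechanism the paper uses.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of every class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


# Toy imbalanced set (hypothetical): 8 majority samples (label 0), 2 minority samples (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Over sampling keeps all the original information but repeats minority samples, while under sampling discards majority samples; which trade-off is acceptable depends on the data set.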
... supervised learning often needs a large amount of labeled samples, and it may take a lot of manpower and material resources to obtain labeled samples.
... which combines semi-supervised learning, over sampling, under sampling and ensemble learning.
... Semi-supervised learning is divided into semi-supervised clustering and classification.
... Semi-supervised classification mainly includes disagreement-based method, generative method, discriminative method and graph-based method.
Disagreement-based semi-supervised classification exploits unlabeled data by using multiple classifiers: during learning, the unlabeled data serves as a platform for interaction between the classifiers. The original disagreement-based algorithm (co-training) was developed by Blum and Mitchell in 1998. They assumed that the data set has two sufficiently redundant views that meet the following conditions: first, each attribute set is sufficient to describe the problem; second, each attribute set is conditionally independent of the other given the class label.
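The two-view idea can be sketched roughly as follows. This is a simplified co-training loop, not the exact procedure of Blum and Mitchell or of the paper: the nearest-centroid classifier, the confidence measure, the toy data, and all names are assumptions chosen to keep the example short. It also assumes at least two classes are present in the initial labeled set.

```python
def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestCentroid:
    """Tiny stand-in classifier: predicts the class with the closest centroid."""

    def fit(self, X, y):
        self.centroids = {
            c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)
        }
        return self

    def predict_with_confidence(self, x):
        ranked = sorted((dist2(x, m), c) for c, m in self.centroids.items())
        # Confidence: how much closer the best centroid is than the runner-up.
        return ranked[0][1], ranked[1][0] - ranked[0][0]


def co_train(view1, view2, labels, rounds=10):
    """labels[i] is a class name, or None for an unlabeled sample."""
    labels = list(labels)
    for _ in range(rounds):
        unlabeled = [i for i, t in enumerate(labels) if t is None]
        if not unlabeled:
            break
        known = [i for i, t in enumerate(labels) if t is not None]
        picks = {}
        for view in (view1, view2):
            clf = NearestCentroid().fit(
                [view[i] for i in known], [labels[i] for i in known]
            )
            guesses = {i: clf.predict_with_confidence(view[i]) for i in unlabeled}
            # Each view contributes the one sample it is most confident about.
            best = max(guesses, key=lambda i: guesses[i][1])
            picks.setdefault(best, guesses[best][0])
        for i, t in picks.items():
            labels[i] = t  # the shared labeled pool grows for both views
    return labels


# Two hypothetical views of the same 8 samples; only samples 0 and 4 are labeled.
view1 = [[0.0], [0.2], [0.1], [0.3], [1.0], [0.9], [1.1], [0.8]]
view2 = [[5.0], [5.1], [4.9], [5.2], [9.0], [9.1], [8.8], [9.2]]
labels = ["neg", None, None, None, "pos", None, None, None]

result = co_train(view1, view2, labels)
print(result)
```

Because each view labels samples for a pool that the other view also trains on, a sample that is ambiguous in one view can still be labeled through the other, which is where the redundancy assumption matters.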
The generative method assumes that the samples and their class labels are generated by a set of probability distributions with a certain structural relationship. Both the labeled samples L and the unlabeled samples U are generated from these distributions.
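One common way to realize this assumption is a mixture model fitted with EM, where labeled samples keep fixed class memberships and unlabeled samples receive soft memberships. The sketch below assumes two classes with one-dimensional Gaussian distributions; the model choice, the function names, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, iterations=50):
    """labeled: list of (x, cls) with cls in {0, 1}; unlabeled: list of x.

    Returns ([(prior, mean, var) per class], soft memberships of the unlabeled samples).
    """
    # Labeled samples keep hard memberships; unlabeled ones start undecided.
    fixed = [[1.0 if c == k else 0.0 for k in (0, 1)] for _, c in labeled]
    soft = [[0.5, 0.5] for _ in unlabeled]
    xs = [x for x, _ in labeled] + list(unlabeled)
    params = []
    for _ in range(iterations):
        resp = fixed + soft
        # M-step: re-estimate prior, mean and variance of each class from L and U.
        params = []
        for k in (0, 1):
            weight = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / weight
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / weight
            params.append((weight / len(xs), mean, max(var, 1e-3)))  # variance floor
        # E-step: update memberships of the unlabeled samples only.
        soft = []
        for x in unlabeled:
            joint = [p * gaussian_pdf(x, m, v) for p, m, v in params]
            total = sum(joint)
            soft.append([j / total for j in joint])
    return params, soft


# Hypothetical data: three labeled samples (L) and six unlabeled samples (U).
L = [(0.0, 0), (0.4, 0), (4.0, 1)]
U = [0.2, -0.1, 0.3, 3.8, 4.2, 4.1]

params, soft = semi_supervised_em(L, U)
hard = [0 if r[0] >= r[1] else 1 for r in soft]
print(hard)
```

The unlabeled samples influence the estimated means and variances, which is how U contributes to the classifier even though it carries no labels.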
The discriminative method uses a maximum-margin algorithm to learn a decision boundary from both the labeled and the unlabeled samples. The goal is to make the classification hyperplane pass through a low-density region of the data while maximizing the distance between the hyperplane and the nearest samples.
Graph-based learning has been a very active direction of semi-supervised learning in recent years. The essence of the graph-based approach is label propagation.
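Label propagation can be illustrated with a small sketch: each unlabeled node repeatedly takes the weighted average of its neighbors' class scores, while labeled nodes stay clamped to their known class. The function name, the iteration count, and the toy chain graph are assumptions for illustration only.

```python
def propagate_labels(weights, seeds, n_classes, iterations=50):
    """weights: symmetric adjacency matrix; seeds: {node index: known class index}."""
    n = len(weights)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in seeds.items():
        scores[i][c] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds:
                updated.append(scores[i][:])  # labeled nodes stay clamped
                continue
            total = sum(weights[i])
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                if total else 0.0
                for c in range(n_classes)
            ])
        scores = updated
    # Each node takes the class with the highest propagated score.
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Hypothetical chain graph 0-1-2-3-4-5 with a weak edge (0.1) between nodes 2 and 3.
W = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
seeds = {0: 0, 5: 1}  # only the two end nodes are labeled

predicted = propagate_labels(W, seeds, n_classes=2)
print(predicted)
```

The weak edge acts as a low-similarity boundary, so each label spreads mainly within its own tightly connected group of nodes.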