Computer Engineering and Applications
2017, 53(5): 147
A Nonlinear Dimensionality Reduction Framework Based on Pairwise Constraints
YIN Xuesong (尹学松), JIANG Rongrong (蒋融融), JIANG Lifei (江立飞), SHI Jianhua (施建华)
Department of Computer Science & Technology, Zhejiang Radio & TV University, Hangzhou 310030, China
YIN Xuesong, JIANG Rongrong, JIANG Lifei, et al. General framework for constrained dimensionality reduction.
Computer Engineering and Applications, 2017, 53(5): 147-153.
Abstract: Semi-supervised dimensionality reduction refers to finding the optimal low-dimensional structure of the original high-dimensional data by jointly using side information and a large number of unlabeled instances. It is regarded as an effective way to understand high-dimensional data such as gene sequences, text data and face images. This paper develops a general framework for semi-supervised dimensionality reduction with pairwise constraints (SSPC). SSPC first learns a discriminant adjacency matrix from the pairwise constraints and the nearest neighbors of the data. It then learns a projection that embeds the data from the original space into a low-dimensional space such that intra-cluster instances move closer together while inter-cluster instances move as far apart as possible. The proposed method can not only find a linear subspace that is optimal for discrimination, but also discover the nonlinear structure of the manifold. Experimental results on various real data sets demonstrate that SSPC is superior to established dimensionality reduction approaches.
Key words: dimensionality reduction; side information; pairwise constraints; prior membership degree; adjacency matrix
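摘要 (Abstract): Semi-supervised dimensionality reduction uses side information together with a large number of unlabeled samples to find an optimal low-dimensional discriminant space in the high-dimensional data space, which facilitates subsequent classification or clustering. It is regarded as an effective way to understand high-dimensional data such as gene sequences, text and face images. A general framework for semi-supervised dimensionality reduction based on pairwise constraints (SSPC) is proposed. The method first learns a discriminant adjacency matrix from the pairwise constraints and the intrinsic geometric structure of the unlabeled samples. It then applies the learned projection to map the data from the original high-dimensional space into a low-dimensional space, so that samples within a cluster become more compact while samples in different clusters become as far apart as possible. The proposed algorithm can not only find an optimal linear discriminant subspace but also reveal the nonlinear structure of manifold data. Experimental results on several real data sets show that the new method outperforms current mainstream dimensionality reduction algorithms based on pairwise constraints.
关键词 (Key words): dimensionality reduction; side information; pairwise constraints; prior membership degree; adjacency matrix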
Document code: A    CLC number: TP311    doi: 10.3778/j.issn.1002-8331.1508-0090
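The two steps named in the abstract (build a discriminant adjacency matrix from pairwise constraints and nearest neighbors, then learn a projection) can be illustrated with a minimal sketch. This page does not give SSPC's actual formulas, so everything specific here is an assumption made for illustration: the function names, the signed weights (`+1` for k-nearest-neighbor pairs, `+alpha` for must-link pairs, `-beta` for cannot-link pairs), and the linear objective, which minimises the weighted sum of squared projected distances under an orthonormality constraint. It is a generic graph-Laplacian projection, not the authors' method, and it covers only the linear case.

```python
import numpy as np

def constrained_adjacency(X, must_link, cannot_link, k=3, alpha=1.0, beta=1.0):
    """Signed adjacency matrix (hypothetical weighting, not from the paper):
    +1 for k-nearest-neighbor pairs, +alpha for must-link, -beta for cannot-link."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    S = np.zeros((n, n))
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    for i in range(n):
        S[i, nn[i]] = 1.0
    S = np.maximum(S, S.T)  # make the neighbor graph symmetric
    # Pairwise constraints override the neighborhood weights.
    for i, j in must_link:
        S[i, j] = S[j, i] = alpha
    for i, j in cannot_link:
        S[i, j] = S[j, i] = -beta
    return S

def fit_projection(X, S, n_components=1):
    """Return W (d x n_components) minimising sum_ij S_ij ||W^T x_i - W^T x_j||^2
    subject to W^T W = I (assumed objective, not the paper's)."""
    X = np.asarray(X, dtype=float)
    L = np.diag(S.sum(axis=1)) - S       # Laplacian of the signed graph
    M = X.T @ L @ X
    M = (M + M.T) / 2.0                  # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, :n_components]     # directions with the smallest objective
```

With this weighting, directions along which cannot-link pairs differ receive a negative score, so the smallest eigenvectors pull must-link and neighboring points together while pushing cannot-link points apart. A low-dimensional embedding would then be `X @ W`.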
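In many practical data mining applications, one often faces the following situation: the data to be processed contain a huge number of unlabeled samples and only a few labeled samples. In this case, supervised dimensionality reduction methods tend to fail because there are not enough labeled samples, while unsupervised methods cannot deliver high performance and ignore the label information of the labeled samples. Recently, more and more researchers have turned their attention to semi-supervised dimensionality reduction methods. Semi-supervised dimensionality reduction learns from both unlabeled and labeled samples to find a faithful low-dimensional representation of high-dimensional data. It therefore requires fewer class labels than supervised methods and achieves higher performance than unsupervised methods. This approach has been applied in computer vision, statistical learning, pattern recognition and other fields. In fact, in many machine learning algorithms, using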
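Foundation items: Zhejiang Provincial Public Welfare Technology Research and Application Project (No.2013C33087); Academic Climbing Project for Young and Middle-aged Discipline Leaders of Zhejiang Provincial Universities (No.pd2013446)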
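About the authors: YIN Xuesong (born 1975), male, professor, research interests: data mining and soft computing theory; JIANG Rongrong (born 1979), female, corresponding author, associate professor, research interests: pattern recognition and machine learning, E-mail: jiangrr@zjtvu.edu.cn; JIANG Lifei (born 1979), male, engineer, research interests: big data processing and machine learning; SHI Jianhua (born 1954), male, research interests: pattern recognition and image processing.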
Received: 2015-08-10    Revised: 2016-02-29    Article ID: 1002-8331(2017)05-0147-07
CNKI online-first publication: 2016-03-25, http://www.cnki.net/kcms/detail/11.2127.TP.20160325.1701.014.html