Abstract: Starting from the conventional 40-dimensional feature vector of protein sequences, the 40-dimensional vector was decomposed into 20-, 4-, and 16-dimensional feature vectors according to the classification and physicochemical properties of amino acids. Using hemagglutinin (HA) sequences from 33 H1N1 influenza viruses and correlation analysis, the pairwise correlations among the 33 HA sequences were computed with each of the three sub-vectors, and the correlations between the different feature vectors of each H1N1 HA sequence were compared. The results showed a high correlation between every pair of protein sequences. The results obtained with the 4- and 16-dimensional vectors were significantly correlated with each other, whereas the 20-dimensional vector showed a low correlation with the others. The 33 H1N1 HA sequences were then classified using each feature vector, and the classifications based on the 40-dimensional and 16-dimensional feature vectors were highly consistent. Therefore, the existing 40-dimensional feature vector of protein sequences could be replaced by the 16-dimensional feature vector without compromising the characterization of viral sequence features, which would greatly reduce the computational complexity.
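The abstract does not spell out how the three sub-vectors are built. As a minimal sketch, assuming the 20-dimensional vector is the amino acid composition, the 4-dimensional vector is the composition of four residue classes, and the 16-dimensional vector is the frequency of adjacent class pairs (so that 20 + 4 + 16 = 40), the correlation between two sequences under each sub-vector could be computed as below. The class grouping, the helper names (`composition_20`, `class_composition_4`, `class_pairs_16`, `feature_40`, `pearson`), and the use of the Pearson coefficient are all assumptions for illustration, not the paper's stated method.

```python
import math
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (fixes the order of the 20-dim vector).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: a common textbook 4-class grouping by side-chain polarity and charge.
# The paper's exact classification scheme is not given in the abstract.
CLASSES = {
    "nonpolar": "GAVLIPFMW",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
CLASS_NAMES = list(CLASSES)
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def composition_20(seq):
    """20-dim vector: relative frequency of each amino acid."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def class_composition_4(seq):
    """4-dim vector: relative frequency of each (assumed) residue class."""
    counts = Counter(CLASS_OF[aa] for aa in seq)  # KeyError on non-standard residues
    return [counts[c] / len(seq) for c in CLASS_NAMES]


def class_pairs_16(seq):
    """16-dim vector: relative frequency of adjacent class pairs (assumed definition)."""
    labels = [CLASS_OF[aa] for aa in seq]
    total = len(labels) - 1
    if total < 1:
        raise ValueError("need at least two residues to form a pair")
    pairs = Counter(zip(labels, labels[1:]))
    return [pairs[p] / total for p in product(CLASS_NAMES, repeat=2)]


def feature_40(seq):
    """Concatenate the three sub-vectors into one 40-dim vector."""
    return composition_20(seq) + class_composition_4(seq) + class_pairs_16(seq)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Two short made-up peptide strings, NOT real HA sequences.
    seq_a = "MKAILVVLLYTFATANA"
    seq_b = "MKDESKRHLLVGTNQAA"
    for name, fn in [("20-dim", composition_20), ("4-dim", class_composition_4),
                     ("16-dim", class_pairs_16), ("40-dim", feature_40)]:
        print(name, round(pearson(fn(seq_a), fn(seq_b)), 3))
```

With real data the inputs would be the HA sequences rather than toy strings, and repeating the calculation over all sequence pairs gives one correlation matrix per feature vector, which is the kind of object the comparison in the abstract operates on.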
Cite this article:
LI Wei-Wei, LI Yang, TANG Xu-Qing. Comparative Analysis of H1N1 Influenza Virus Hemagglutinin Sequences by Different Feature Descriptions. Life Science Research, 2016, 20(2): 119-124.