2022-06-23

[Discipline Development Report] Bioinformatics


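1. Introduction

Bioinformatics is a frontier discipline at the intersection of information and systems science and the life sciences. It is an important part of the automation family of disciplines and includes computational biology, systems biology, and synthetic biology. The field draws on many disciplines: theories, methods, and technologies from information, control, and systems science play an important role in it, and it in turn extends the objects of study of control science and engineering from mechanical, electronic, physical, and chemical systems to living systems whose basic components are molecules and cells. This report reviews, from an information-science perspective, the main advances made in China in bioinformatics from 2013 to 2017.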

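2. Current research and major recent results

In today's era of big data, the life sciences lead most disciplines in data output, and omics data, centered on genomics and proteomics, are growing far faster than data in many other fields. As a major trend in bioinformatics, data volumes are rising quickly and data types keep multiplying, which poses many new challenges for bioinformatics methods. Omics technologies are revealing biological mechanisms at more and more levels, and systems biology is moving toward quantitative understanding and modeling of biological regulation. Meanwhile, deeper understanding of biological systems, together with continued breakthroughs in synthetic biology and gene editing, has driven substantial progress in the theory and technology of synthetic gene circuits and systems. China has made considerable progress in all of these areas in recent years.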

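(1) High-throughput sequencing data processing and analysis

Next-generation sequencing has developed rapidly and can now complete genome and transcriptome sequencing at higher throughput and lower cost, which poses a major challenge for data processing and analysis. Classical bioinformatics problems such as sequence alignment and assembly remain a focus of current research. Assembly of sequencing reads sits at the very start of the data-processing pipeline. Central South University proposed a scaffolding method based on an optimized scaffold graph [1] and a de novo assembly method based on read distributions and insert size [2]; Shandong University proposed an algorithm that improves the assembly of repetitive regions [3]. For transcriptome assembly, Shandong University developed an assembler with a lower false-positive rate [4], and the University of Hong Kong developed an assembler that copes better with uneven gene expression levels [5]. For metagenomic sequencing data, the University of Hong Kong developed a fast assembly algorithm that runs on a single compute node [6], and the Computer Network Information Center of the Chinese Academy of Sciences and Yunnan University developed a metagenomic read alignment system [13]. On the classical problem of sequence alignment, Sun Yat-sen University, the Chinese University of Hong Kong, and Shenzhen University each developed faster and more accurate alignment tools [7,8,12]; Harbin Institute of Technology designed a series of alignment methods for noisy long reads, multiple reference genomes, and split alignments [9-11]; Hong Kong Baptist University used GPUs to accelerate sequence alignment [14]; and Tianjin University and Harbin Institute of Technology developed a faster parallel multiple sequence alignment method [15].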

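Differential expression analysis and clustering are two important steps in the downstream analysis of sequencing data. Because sequencing data are noisy, sequencing coverage varies widely, and class sizes can be severely imbalanced, performing these analyses effectively and accurately is a basic and important problem. Fudan University, Sun Yat-sen University, Tsinghua University, Hunan Normal University, Xi'an Jiaotong University, and others developed several new differential expression methods that address high noise, sample sizes that are too small or too large, inaccurate estimation of the false-positive rate, and non-independence between samples [16-21]. Southwestern University of Finance and Economics, Beijing Forestry University, Tsinghua University, Tongji University, and others used a variety of models and algorithms to propose clustering methods for different application scenarios, covering both conventional RNA sequencing and the newer single-cell RNA sequencing data [22-26]. Algorithmic research on these active topics has brought China's work in these areas to the international forefront.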

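(2) Multi-omics data integration and sequencing-based genetic analysis

Next-generation sequencing allows complex biological processes such as cancer to be dissected at multiple levels, including DNA, RNA, and protein, and effective integration of multi-omics data is a key problem that urgently needs solving. Over the past five years, researchers at the Chinese Academy of Sciences (CAS) integrated genomic, epigenetic, and transcriptomic features for tumor prognosis analysis and obtained key molecular signatures associated with prognosis [27]. Central South University, Tsinghua University, CAS, Peking University, and others used integrative data analysis to study drug-disease relationships and to predict drug response [28-31]. CAS and others integrated different data types to identify tumor biomarkers [32,33]. East China Normal University and others integrated multi-source data to identify cooperativity in transcription [34]. Harbin Medical University, Harbin Institute of Technology, and others studied the relationships among multiple layers of molecular information and identified core regulatory modules in tumor initiation and progression [35-37]. Tsinghua University proposed a fast method for joint dimension reduction and visualization of multi-omics data, providing an effective tool for tumor classification and clustering [38].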

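Chinese researchers have proposed a number of bioinformatics methods related to genome-wide association studies (GWAS). Tsinghua University proposed a method that detects disease-causing genes with a random walk algorithm on disease-gene interaction networks [39]; the University of Hong Kong proposed a method that integrates GWAS with three-dimensional genome interaction information to detect variants with regulatory function [40]; Harbin Medical University built a database linking SNPs to non-coding regions [41]; Fudan University built a model that explains disease phenotypes through long-range regulation [42]; and Tongji University proposed a method that integrates epigenetic and genetic data for regulatory pathway enrichment analysis [43]. Alongside this methodological work, China has also made several new discoveries in the epigenetic regulation of complex traits.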
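The random-walk idea behind network-based gene prioritization can be illustrated with a small sketch. This is a generic random walk with restart on a single toy network, not the method of the cited paper [39]; the gene names, the restart probability of 0.3, and the function name `random_walk_with_restart` are assumptions made only for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-12, max_iter=10_000):
    """Score every node by its steady-state visiting probability.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    seeds: known disease genes; each must be a key of `adjacency`.
    restart: probability of jumping back to a seed at every step (assumed value).
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart term: a fixed fraction of the walkers returns to the seeds.
        nxt = {n: restart * start[n] for n in nodes}
        # Walk term: the rest spread evenly over each node's neighbours.
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue  # an isolated node passes nothing on
            share = (1.0 - restart) * prob[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy network: A is the only known disease gene.
toy_network = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"A"},
}
scores = random_walk_with_restart(toy_network, seeds={"A"})
candidates = sorted((g for g in toy_network if g != "A"), key=scores.get, reverse=True)
print(candidates)
```

Candidates closer to the seed gene in the network receive higher scores, which is the intuition that network-based prioritization methods build on; real methods typically combine several networks and weight the edges.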

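(3) Gene network analysis

Genes do not function in isolation. Through transcriptional regulation, protein-protein interactions, and other mechanisms, they form molecular networks and function as systems. In recent years Chinese researchers have made notable contributions to the study of biomolecular networks, including constructing gene regulatory networks, miRNA regulatory networks, and gene co-expression networks; identifying network modules and biomarkers; and analyzing the molecular network mechanisms of cancer.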

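For gene regulatory network construction, the Shanghai Institutes for Biological Sciences of the Chinese Academy of Sciences proposed conditional mutual inclusive information, a measure built on mutual information, and used expression profiles to construct transcriptional regulatory networks [44]; Nanjing University of Chinese Medicine and Hong Kong Baptist University proposed a new tool based on Bayesian networks and Gibbs sampling that integrates two types of information to infer regulatory networks [45]; Beijing Forestry University used mutual information to construct dynamic regulatory networks under different environmental stimuli [46]; the University of Hong Kong proposed an expression-profile analysis method based on gapped alignment that effectively constructs splicing-isoform-specific gene networks [47]; and National Tsing Hua University in Hsinchu, Taiwan, China inferred local causal networks from eQTLs and then built a global regulatory network with a random-field ranking method [48]. For miRNA regulation, Tongji University and others inferred miRNA-gene-signaling pathway networks in cancer using gene expression profiles alone [49]; Southeast University and Nanjing Normal University built transcription factor-miRNA-gene networks from expression correlations to understand breast cancer at the systems level [50]; and Harbin Institute of Technology proposed a pseudo-3D clustering method for identifying modules in bilayer networks of miRNAs and genes [51]. For gene co-expression networks, Peking University proposed a hypothesis test based on the Frobenius norm to build co-expression networks from RNA-seq data [52]; Fudan University used canonical correlation analysis to build co-expression networks from genomic elements such as exons [53]; and Southern Medical University took a different route, building gene networks by text mining of genes that frequently co-occur [54].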
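Several of the network-construction methods above start from mutual information between expression profiles. The sketch below shows only that baseline idea: plain pairwise mutual information with equal-width binning and a fixed threshold. It is not the conditional measure of [44] or the dynamic method of [46]; the bin count, the threshold, the gene names, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Map expression values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant profile carries no information
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Mutual information (in bits) between two discretized profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


def build_network(expression, threshold, bins=3):
    """Return edges (gene1, gene2, MI) whose mutual information reaches the threshold."""
    genes = sorted(expression)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            mi = mutual_information(expression[g], expression[h], bins)
            if mi >= threshold:
                edges.append((g, h, mi))
    return edges


# Hypothetical expression profiles across six samples.
expression = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],  # varies together with g1
    "g3": [5, 5, 5, 5, 5, 5],    # constant, unrelated to the others
}
print(build_network(expression, threshold=1.0))
```

Pairwise mutual information cannot separate direct from indirect regulation, which is one motivation for conditional measures such as the one cited above.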

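For identifying network modules and biomarkers, Nanjing University treated each signaling pathway as a whole when constructing networks and used multiple network centrality measures to identify important modules and find key pathways [55]; the Shanghai Institutes for Biological Sciences proposed swapping the roles of nodes and edges in a network and identifying tightly connected groups as "edge biomarkers" [56], and also proposed identifying dynamic network biomarkers from the expression profile of a single patient [57]; Harbin Medical University integrated multi-omics data with protein interaction networks to identify altered genes, their downstream genes, and key modules that drive cancer [58-60]; Shanghai Normal University, in collaboration with MIT, built a dual-layer network from cell-line expression similarity and drug-response similarity to predict the drug response of tumor cells [61]; and the Academy of Mathematics and Systems Science of the Chinese Academy of Sciences developed a partial least squares method with a network-based sparse penalty to analyze joint patterns of gene expression and drug response [62].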

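(4) Proteome bioinformatics

Proteins and proteomes are another major object of study in bioinformatics. Breakthroughs in structural biology, especially the development of techniques such as cryo-electron microscopy, have opened up new opportunities for protein research. In addition, protein-related images and molecular imaging data are an important data source, and matching intelligent algorithms such as deep learning continue to emerge.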

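For structure prediction, Xiamen University proposed a method that combines sequence and structure features [63], and Ocean University of China proposed a method that fuses secondary structure with pseudo amino acid composition information [64]. For protein-protein interaction prediction, Hong Kong Polytechnic University and Tongji University extracted multi-scale local features from sequences [65], and the Xinjiang Technical Institute of Physics and Chemistry of the Chinese Academy of Sciences and China University of Mining and Technology proposed methods that fuse evolutionary information with physicochemical properties [66-67]. Inner Mongolia Agricultural University and Inner Mongolia University proposed a secondary structure prediction method that uses long-range residue contact information [68], and Northeastern University proposed using representations of predicted secondary structure to identify protein structural classes [69].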

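The growing body of protein structure and function annotation supports the development of data-driven intelligent algorithms. For protein subcellular localization prediction, Shanghai Jiao Tong University developed a series of new models that account for correlations between organelles and for proteins localized to multiple sites [70-72]. To address the difficult problem of determining how much of a multi-label protein is distributed in each organelle and how it relocates, Hunan University performed quantitative analysis of protein fluorescence images and proposed a variable-weighted support vector machine, VW-SVM, which effectively solves the quantitative protein classification problem [73,74].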

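Problems such as structure prediction face an enormous search space. Jilin University, Nanjing University of Aeronautics and Astronautics, Dalian University, and others developed new strategies using approximate algorithms such as particle swarm optimization and genetic algorithms [75-80]; National Chung Cheng University and Feng Chia University in Taiwan proposed a multi-objective optimization method that combines an energy model with backbone angle preferences to predict spatial structure [81]; and Shanghai Jiao Tong University proposed a hierarchical particle swarm optimizer for minimizing the non-convex energy function of molecular structures [82]. Beyond changes to the algorithms themselves, Nanjing University of Science and Technology proposed machine-learning-based data cleaning and filtering strategies for protein interaction prediction [83], the Hong Kong University of Science and Technology proposed a MapReduce-based parallel SVM for interaction prediction [84], and Tsinghua University proposed a cloud-based distributed algorithm for protein design [85].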

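Deep-learning-based methods have also begun to appear in bioinformatics. Sichuan University proposed a deep recurrent encoder-decoder network for secondary structure prediction [86]; the University of Hong Kong combined deep convolutional and recurrent networks for secondary structure prediction [87]; Southwest University proposed deep restricted Boltzmann machines (RBMs) for protein function prediction [88]; Tsinghua University proposed a deep learning framework for modeling the structural features of RNA-binding protein targets [89], a deep learning method for particle picking in cryo-electron microscopy images [90], and a deep learning framework that improves long-range residue contact prediction [91]; and Nanjing University of Aeronautics and Astronautics and Shanghai Jiao Tong University proposed a deep feature extraction method for images used in protein subcellular localization [92]. For problems where supervised learning is not applicable, Peking University published a cryo-electron microscopy image clustering algorithm based on iteratively constrained K-means [93]. Given that proteins often have multiple functions, South China University of Technology and Nanjing University modeled protein function prediction as a multi-instance, multi-label learning problem [94-98], and Fudan University modeled it as an active learning problem [99]. In addition, Shenzhen University, City University of Hong Kong, and Nanjing University of Science and Technology used ensemble learning and extreme learning machines for protein-protein interaction prediction [100-102].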

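(5) Epigenetic bioinformatics

Epigenetics refers to stably inherited traits that cannot be explained by changes in DNA sequence. DNA methylation is an important component of it and an important topic in bioinformatics research. Tsinghua University used a separate-and-compare strategy to carry out the first systematic study of the strand-specific distribution of non-CpG methylation in pluripotent cells [103]; Harbin Medical University systematically identified and annotated DNA methylation marks, revealing hypomethylation in the regulation of cell identity genes [104], and proposed a new annotation strategy to dissect the methylation patterns of long non-coding RNAs in cancer [105]; Northeast Normal University and Harbin University of Science and Technology proposed a new integrative statistical algorithm for finding DNA methylation differences across cells, tissues, and individuals [106]; the Beijing Institute of Genomics of the Chinese Academy of Sciences developed software that analyzes the distribution of DNA methylation patterns to quantify epigenetic heterogeneity [107]; and Shanghai Normal University and Fudan University proposed DNA methylation analysis methods that address tumor purity in cancer studies [108,109].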

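Three-dimensional chromatin structure is an important epigenetic factor, and several lines of work by Chinese researchers in this area keep pace with the leading international efforts. For ChIA-PET data, Tsinghua University developed a method for identifying chromatin interactions [110], an integrated ChIA-PET data processing pipeline [111], and a method that uses a hierarchical Dirichlet process to find co-factor complexes from ChIA-PET data [112]. Peking University and Huazhong Agricultural University developed new methods that use Hi-C data to study dynamic chromatin accessibility and topologically associating domains, respectively [113,114]. Tsinghua University used nucleosome eviction and the binding of multiple co-factors to predict long-range interactions associated with the estrogen receptor [115] and used a Bayesian framework to model three-dimensional spatial structure from 3C chromatin capture data [116]. Fudan University combined Hi-C data with phylogenetic relationships to predict the target genes of distal regulatory elements in the human genome [117]. The Beijing Institute of Radiation Medicine integrated three-dimensional data, proximal signals, and linkage disequilibrium to build a database of the genes on which human non-coding SNPs act [118], and used three-dimensional data and chromatin states to identify the multiple functions of CTCF [119].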

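Chromatin accessibility and histone modifications are important epigenetic features. Tsinghua University used deep learning to predict chromatin accessibility [120]; the Academy of Mathematics and Systems Science of the Chinese Academy of Sciences developed a method for analyzing differences in chromatin modification that can be used to discover cell-type-specific regulatory elements [121]; and Tongji University integrated a large collection of human and mouse accessibility data and H3K27ac data to model cis-regulation [122,123]. Super-enhancers have a broad open chromatin state and can regulate the expression of several nearby genes at once. Tsinghua University built dbSUPER, a database of human and mouse super-enhancers [124], which has been integrated into the human gene database GeneCards. Harbin Medical University collected super-enhancers from multiple species and annotated their potential to regulate the expression of cell identity genes [125]; Tongji University built a database of the relationships between chromatin regulators and histone modifications in human and mouse [126]; and Huazhong University of Science and Technology collected more than 580 experimentally validated histone regulators and built a database of the readers, writers, and erasers of histone acetylation and methylation [127].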

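(6) Synthetic biology: design and implementation of artificial gene circuits

Synthetic biology builds on bioinformatics and systems biology to design, synthesize, and regulate biomolecular systems, and it has great application potential in medicine, chemical engineering, the environment, and other areas. Starting from a control and systems perspective, Chinese researchers have developed new synthetic gene circuit components and construction methods, made important progress in the design and implementation of mammalian synthetic gene circuits, and demonstrated prospects for medical applications.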

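Constructing synthetic gene circuits is the foundation of synthetic biology. Over the past five years, Tsinghua University, in collaboration with MIT, used a family of orthogonal TALER components to build a well-performing bistable switch and showed that the switch can accurately distinguish cell lines on the basis of miRNA expression [128]; Tsinghua University designed AND gates by splitting the dCas9 protein, extending the range of applications of the dCas9 system [129]; and Peking University used reverse engineering to enumerate and simulate three-node regulatory network topologies and, on that basis, designed a switch that senses bacterial quorum-sensing compounds [130].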

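Building synthetic gene circuits has long been relatively complex and time-consuming. Tsinghua University developed a new DNA assembly technique that enables efficient construction of plasmids expressing multiple miRNAs [131]. Peking University and the Institute of Microbiology of the Chinese Academy of Sciences combined system simulation, parameter measurement, and feedforward control to propose a method for building gene circuits from mutually insulated transcriptional regulatory elements, improving the precision of synthetic gene circuits [132]. Tsinghua University developed a tool based on flux balance analysis that improves the efficiency of metabolic engineering design [133] and, together with Stanford University, developed a CRISPR sequence design tool that meets multi-species, multi-purpose design needs [134].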

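Synthetic biology takes a "build to understand" approach, using synthetic gene circuits to learn about regulatory mechanisms in living systems. Tsinghua University built a quantitative differential-equation model of the relationship between miRNAs and ceRNAs, validated and refined the model with a synthetic gene circuit that artificially expresses miRNA and ceRNA, revealed how molecule numbers, the number of binding sites, binding strength, and molecular degradation rates affect the regulation, and identified an asymmetric regulatory property of this process, which it applied to siRNA design [135]. The Shenzhen Institutes of Advanced Technology of the Chinese Academy of Sciences, in collaboration with Harvard University, used synthetic biology techniques to engineer Escherichia coli strains and achieved quantitative control of cell diameter or length during growth while keeping the growth rate unchanged, revealing the mechanism that coordinates DNA replication with cell size [136].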
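To make the idea of a differential-equation model of miRNA and ceRNA competition concrete, here is a minimal sketch of a generic titration model integrated with forward Euler steps. It is not the model of [135]: the three-species structure, the assumption that each binding event removes both the miRNA and its target, and every parameter value are hypothetical choices made for illustration.

```python
def simulate(k_cerna, k_mirna=1.0, k_target=0.5, decay=0.1, binding=1.0,
             dt=0.01, steps=50_000):
    """Integrate a toy miRNA/target/ceRNA model to (near) steady state.

    m: free miRNA, a: free target mRNA, b: free ceRNA (arbitrary units).
    All rate constants are hypothetical, not fitted to any data.
    """
    m = a = b = 0.0
    for _ in range(steps):
        pair_a = binding * m * a   # miRNA pairing with the target
        pair_b = binding * m * b   # miRNA pairing with the competing ceRNA
        dm = k_mirna - decay * m - pair_a - pair_b
        da = k_target - decay * a - pair_a
        db = k_cerna - decay * b - pair_b
        m += dt * dm
        a += dt * da
        b += dt * db
    return m, a, b


# Compare the free target level without and with ceRNA expression.
_, target_without_cerna, _ = simulate(k_cerna=0.0)
_, target_with_cerna, _ = simulate(k_cerna=2.0)
print(target_without_cerna, target_with_cerna)
```

In this toy setting, expressing the ceRNA soaks up free miRNA and so raises the level of the free target, which is the competition effect that quantitative ceRNA models aim to capture. Dependence on binding-site number and the asymmetry reported in [135] would require a richer model than this sketch.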

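Synthetic biology has broad application prospects. Shenzhen Second People's Hospital used the CRISPR/Cas9 system to design an AND gate that senses signals specific to bladder cancer cells, achieving specific identification and control of bladder cancer cells in vitro [137]. East China Normal University combined a synthetic gene circuit with a smartphone: a glucose meter sends the blood glucose signal to the smartphone, and when blood glucose is too high the phone switches on a subcutaneous far-red light source that drives a light-sensitive gene circuit to produce insulin or glucagon-like peptide-1, keeping blood glucose at a reasonable level and closing the loop with the body's own physiological regulation [138].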

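(7) Systems biology and network pharmacology of traditional Chinese medicine

Studying traditional Chinese medicine (TCM) with the viewpoints and methods of information and systems science is a distinctive feature of bioinformatics research in China, and China has made several advances in TCM systems biology and network pharmacology over the past five years. Tsinghua University built a computational platform based on network pharmacology analysis to explore the network regulation mechanisms of TCM and to identify the active ingredients and synergistic ingredient combinations of a given herbal formula, and provided an analysis of the network regulation mechanisms of formulas such as the Liu-wei-di-huang pill [139,140]. Using network methods to analyze complex biological systems, the same group assessed the associations between disease genes and TCM targets, proposed an effect-switch model of how TCM intervenes in disease molecular networks, and built a statistical model that infers drug-target interactions from chemogenomic features [141]. Peking University used molecular docking to build an interaction network between TCM molecules and influenza virus proteins [142]. Tongji University proposed an analysis framework for target networks and explored the mechanisms of action of TCM ingredients against Alzheimer's disease [143]. Zhejiang University established a method that integrates disease molecular networks, omics data, and TCM ingredient information to identify the active ingredients of TCM [144]. Research on TCM systems biology and network pharmacology is an original new direction developed by Chinese researchers at the intersection of information science and medicine, and it has broad prospects.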

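3. Comparison with international work and development trends

Bioinformatics is developing rapidly, its scope keeps expanding, and its content keeps deepening. Bioinformatics research in China has advanced considerably in recent years. In methods for processing and analyzing omics data, integration of multiple data types, and quantitative systems biology and synthetic biology, it has joined the world's advanced ranks, and in TCM systems biology and network pharmacology it stands apart. We nevertheless see clearly that most key experimental and measurement technologies in the life sciences were developed by international peers, that the new data these technologies produce are what drive bioinformatics and the life sciences as a whole, and that many core new technologies in the information field also originate abroad. Overall, bioinformatics in China still lags behind the most advanced international level, although the gap has narrowed markedly over the past five years. We are also pleased to see that Chinese scientists have begun to produce internationally leading results in genomics and synthetic biology technologies such as single-cell RNA sequencing and synthetic gene circuits. It can be expected that, as related disciplines develop, China's research and applications in bioinformatics will enter the international forefront across the board.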

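4. Summary and outlook

Looking back over five years of bioinformatics in China, many welcome changes are apparent. As an interdisciplinary field, bioinformatics has received close attention in both the information and life science communities, and Chinese bioinformaticians are producing more and more results. At the same time, bioinformatics is both a fast-moving technical field and a basic discipline with a long road ahead, and many major problems require long and hard research. Breadth in scientific research does not necessarily bring depth, and a major breakthrough on some scientific question that bioinformatics helped enable as a key technology does not necessarily mean that bioinformatics itself has made a major breakthrough. We need to keep a cool head, take the exploration of the essence of bioinformatics as our goal, and work persistently and deeply over the long term, striving for continued breakthroughs in understanding life as a complex information system and advancing the development and maturation of the discipline itself.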

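5. Closing remarks

This report is based on a survey of representative papers published in international journals by Chinese researchers over the past five years. We hope it reflects the overall picture of bioinformatics in China, but given the limits of our expertise and of the material available to us, as well as limits of time and space, omissions are inevitable, and the reference list is far from complete. We welcome criticism and corrections from our colleagues.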

References

[1] Luo J,Wang J,Zhang Z,et al. BOSS:a novel scaffolding algorithm based on an optimized scaffold graph[J]. Bioinformatics,2016,33(2):169.

[2] Luo J,Wang J,Zhang Z,et al. EPGA:de novo assembly using the distributions of reads and insert size.[J]. Bioinformatics,2015,31(6):825-833.

[3] Shi W,Ji P,Zhao F. The combination of direct and paired link graphs can boost repetitive genome assembly[J].Nucleic Acids Research,2017,45(6):e43.

[4] Zheng C,Li G,Liu J,et al. Bridger:a new framework for de novo transcriptome assembly using RNA-seq data[J]. Genome Biology,2015,16(1):30.

[5] Peng Y,Leung H C,Yiu S M,et al. IDBA-tran:a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels[J]. Bioinformatics,2013,29(13):i326.

[6] Li D,Liu C M,Luo R,et al. MEGAHIT:an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph[J].Bioinformatics,2015,31(10):1674.

[7] Chen Y,Ye W,Zhang Y,et al. High speed BLASTN:an accelerated MegaBLAST search tool[J].Nucleic Acids Research,2015,43(16):7762-7768.

[8] Leung A K,Kwok T P,Wan R,et al. OMBlast:Alignment Tool for Optical Mapping Using a Seed-and-extend Approach.[J].Bioinformatics,2016:311-319.

[9] Liu B,Guan D,Teng M,et al. rHAT:fast alignment of noisy long reads with regional hashing[J]. Bioinformatics,2016,32(11):1625-1631.

[10] Liu B,Guo H,Brudno M,et al. deBGA:read alignment with de Bruijn graph-based seed and extension[J]. Bioinformatics,2016,32(21):3224-3232.

[11] Liu B,Gao Y,Wang Y. LAMSA:fast split read alignment with long approximate matches.[J].Bioinformatics, 2016,33(2):btw594.

[12] Zhu Z,Li L,Zhang Y,et al. CompMap:a reference-based compression program to speed up read mapping to related reference sequences[J].Bioinformatics,2015,31(3):426-428.

[13] Zhou W,Li R,Yuan S,et al. MetaSpark:a spark-based distributed processing tool to recruit metagenomic reads to reference genomes[J].Bioinformatics,2017,33(7):1090.

[14] Zhao K,Chu X. G-BLASTN:accelerating nucleotide alignment by graphics processors[J].Bioinformatics, 2014,30(10):1384-1391.

[15] Zou Q,Hu Q,Guo M,et al. HAlign:Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy.[J].Bioinformatics,2015,31(15):2475-2481.

[16] Shen Q,Hu J,Jiang N,et al. contamDE:Differential expression analysis of RNA-seq data for contaminated tumor samples.[J].Bioinformatics,2016,32(5).

[17] Gu X. Statistical Detection of Differentially Expressed Genes based on RNA-seq:from Biological to Phylogenetic Replicates[J].Briefings in Bioinformatics,2015,17(2).

[18] Wang H,Sun Q,Zhao W,et al. Individual-level analysis of differential expression of genes and pathways for personalized medicine.[J].Bioinformatics,2015,31(1):62.

[19] Yang E W,Jiang T. SDEAP:A Splice Graph Based Differential Transcript Expression Analysis Tool for Population Data[J]. Bioinformatics,2016,32(23):3593.

[20] Tan Y D,Xu H. A general method for accurate estimation of false discovery rates in identification of differentially expressed genes[J].Bioinformatics,2014,30(14):2018.

[21] Sun S,Hood M,Scott L,et al. Differential expression analysis for RNAseq using Poisson mixed models.[J]. Nucleic Acids Research,2016.

[22] Si Y,Liu P,Li P,et al. Model-based clustering for RNA-seq data[J].Bioinformatics,2013,30(2): 197-205.

[23] Wang N,Wang Y,Hao H,et al. A bi-Poisson model for clustering gene expression profiles by RNA-seq[J]. Briefings in Bioinformatics,2013,15(4):534-541.

[24] Ye M,Wang Z,Wang Y,et al. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq[J].Briefings in Bioinformatics,2015,16(2):205-215.

[25] Jiang L,Dong Y,Chen N,et al. DACE:A Scalable DP-means Algorithm for Clustering Extremely Large Sequence Data[J].Bioinformatics,2016.

[26] Jiang L,Chen H,Pinello L,et al. GiniClust:detecting rare cell types from single-cell gene expression data with Gini index[J].Genome Biology,2016,17(1):1-13.

[27] Zhang,W.,et al. Integrating genomic,epigenomic,and transcriptomic features reveals modular signatures underlying poor prognosis in ovarian cancer. Cell reports,2013,4(3):542-553.

[28] Liang,X.,et al. LRSSL:predict and interpret drug-disease associations based on data integration using sparse subspace learning. Bioinformatics,2017,33(8):1187-1196.

[29] Ding,Z.,S. Zu,J. Gu. Evaluating the molecule-based prediction of clinical drug responses in cancer.Bioinformatics,2016,32(19):2891-2895.

[30] Chen,J.,S. Zhang,Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data. Bioinformatics,2016,32(11):1724-1732.

[31] Liu,Z.,et al. Similarity-based prediction for Anatomical Therapeutic Chemical classification of drugs by integrating multiple data sources. Bioinformatics,2015,31(11):1788-1795.

[32] Zou,M.,et al. NCC-AUC:an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data. Bioinformatics,2015,31(20):3330-3338.

[33] Ren,X.,et al. ellipsoidFN:a tool for identifying a heterogeneous set of cancer biomarkers based on gene expressions. Nucleic acids research,2012,41(4):e53.

[34] Hu,P.,et al. Integrating multiple resources to identify specific transcriptional cooperativity with a Bayesian approach. Bioinformatics,2013,30(6):823-830.

[35] Ping,Y.,et al. Identifying core gene modules in glioblastoma based on multilayer factor-mediated dysfunctional regulatory networks through integrating multi-dimensional genomic data. Nucleic acids research,2015,43(4): 1997-2007.

[36] Li,Y.,et al. Comprehensive analysis of the functional microRNA-mRNA regulatory network identifies miRNA signatures associated with glioma malignant progression. Nucleic acids research,2013,41(22):e203.

[37] Xu,Y.,et al. Identify bilayer modules via pseudo-3D clustering:applications to miRNA-gene bilayer networks. Nucleic acids research,2016,44(20):e152.

[38] Wu,D.,et al. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation:application to cancer molecular classification. BMC genomics,2015,16(1):1022.

[39] Jiang,R. Walking on multiple disease-gene networks to prioritize candidate genes. Journal of molecular cell biology 7,2015:214-230.

[40] Liu,J. et al. LLR:A latent low-rank approach to colocalizing genetic risk variants in multiple GWAS.Bioinformatics,2017.

[41] Ning,S. et al. LincSNP:a database of linking disease-associated SNPs to human large intergenic non-coding RNAs. BMC bioinformatics 15,2014:152.

[42] Chen,J. & Tian,W. Explaining the disease phenotype of intergenic SNP through predicted long range regulation. Nucleic acids research 44,2016:8641-8654.

[43] Sun,H. et al. iPEAP:integrating multiple omics and genetic data for pathway enrichment analysis. Bioinformatics 30,2013:737-739.

[44] Zhang,Xiujun,et al.“Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks.”Nucleic acids research 43.5(2014):e31.

[45] Guan,Daogang,et al.“CMGRN:a web server for constructing multilevel gene regulatory networks using ChIP-seq and gene expression data.”Bioinformatics 30.8(2014):1190-1192.

[46] Wang,Jianxin,et al.“Reconstructing regulatory networks from the dynamic plasticity of gene expression by mutual information.”Nucleic acids research 41.8(2013):e97.

[47] Yalamanchili,Hari Krishna,et al.“SpliceNet:recovering splicing isoform-specific differential gene networks from RNA-Seq data of normal and diseased samples.”Nucleic acids research 42.15(2014):e121.

[48] Peng,Chien-Hua,et al.“Causal inference of gene regulation with subnetwork assembly from genetical genomics data.”Nucleic acids research 42.5(2013):2803-2819.

[49] Zhao,Xing-Ming,et al.“Identifying cancer-related microRNAs based on gene expression data.”Bioinformatics 31.8(2014):1226-1234.

[50] Qin,Sheng,Fei Ma,Liming Chen.“Gene regulatory networks by transcription factors and microRNAs in breast cancer.”Bioinformatics 31.1(2014):76-83.

[51] Xu,Yungang,et al.“Identify bilayer modules via pseudo-3D clustering:applications to miRNA-gene bilayer networks.”Nucleic acids research 44.20(2016):e152.

[52] Wang,Zengmiao,et al.“VCNet:vector-based gene co-expression network construction and its application to RNA-seq data.”Bioinformatics(2017):btx131.

[53] Hong,Shengjun,et al.“Canonical correlation analysis for RNA-seq co-expression networks.”Nucleic acids research 41.8(2013):e95.

[54] Wang,Jia-Hong,et al.“GenCLiP 2.0:a web server for functional clustering of genes and construction of molecular networks based on free terms.”Bioinformatics 30.17(2014):2534-2536.

[55] Gu,Zuguang,and Jin Wang.“CePa:an R package for finding significant pathways weighted by multiple network centralities.”Bioinformatics 29.5(2013):658-660.

[56] Yu,Xiangtian,Guojun Li,Luonan Chen.“Prediction and early diagnosis of complex diseases by edge-network.” Bioinformatics 30.6(2013):852-859.

[57] Liu,Xiaoping,et al.“Quantifying critical states of complex diseases using single-sample dynamic network biomarkers.”PLoS computational biology 13.7(2017):e1005633.

[58] Ping,Yanyan,et al.“Identifying core gene modules in glioblastoma based on multilayer factor-mediated dysfunctional regulatory networks through integrating multi-dimensional genomic data.”Nucleic acids research 43.4(2015):1997-2007.

[59] Zhang,Hongyi,et al.“Cooperative genomic alteration network reveals molecular classification across 12 major cancer types.”Nucleic acids research 45.2(2016):567-582.

[60] Xu,Juan,et al.“The mRNA related ceRNA-ceRNA landscape and significance across 20 major cancer types.” Nucleic acids research 43.17(2015):8169-8182.

[61] Zhang,Naiqian,et al.“Predicting anticancer drug responses using a dual-layer integrated cell line-drug network model.”PLoS computational biology 11.9(2015):e1004498.

[62] Chen,Jinyu,and Shihua Zhang.“Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data.”Bioinformatics 32.11(2016):1724-1732.

[63] Wei L,Liao M,Gao X,et al. An improved protein structural classes prediction method by incorporating both sequence and structure information[J].IEEE transactions on nanobioscience,2015,14(4):339-349.

[64] Kong L,Zhang L,Lv J. Accurate prediction of protein structural classes by incorporating predicted secondary structure information into the general form of Chou’s pseudo amino acid composition[J].Journal of theoretical biology,2014,344:12-18.

[65] You Z H,Chan K C C,Hu P. Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest[J].PLoS One,2015,10(5): e0125811.

[66] Li Z W,You Z H,Chen X,et al. Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier[J].Oncotarget, 2017,8(14):23638.

[67] Li Z W,You Z H,Chen X,et al. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics[J].International journal of molecular sciences, 2016,17(9):1396.

[68] Feng Y,Luo L. Using long-range contact number information for protein secondary structure prediction[J]. International Journal of Biomathematics,2014,7(05):1450052.

[69] Zhang L,Kong L,Han X,et al. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure[J].Journal of theoretical biology,2016,400:1-10.

[70] Xu Y-Y,Yang F,Zhang Y,et al. An image-based multi-label human protein subcellular localization predictor(iLocator)reveals protein mislocalizations in cancer tissues. Bioinformatics,2013,29(16):2032-2040.

[71] Xu Y-Y,Yang F,Zhang Y,et al. Bioimaging-based detection of mislocalized proteins in human cancers by semi-supervised learning. Bioinformatics,2015,31(7):1111-1119.

[72] Xu Y-Y,Yang F,Shen H-B. Incorporating organelle correlations into semi-supervised learning for protein subcellular localization prediction. Bioinformatics,2016,32(14):2184-2192.

[73] Yang Q,Zou H Y,Zhang Y,et al. Multiplex protein pattern unmixing using a non-linear variable-weighted support vector machine as optimized by a particle swarm optimization algorithm[J].Talanta,2016,147:609-614.

[74] Yang Q,Tang L,Yu R. Efficient pattern unmixing of multiplex proteins based on variable weighting of texture descriptors[J].Analytical Methods,2016,8(46):8188-8195.

[75] Gao P,Wang S,Lv J,et al. A database assisted protein structure prediction method via a swarm intelligence algorithm[J].RSC Advances,2017,7(63):39869-39876.

[76] Zhou C,Hou C,Wei X,et al. Improved hybrid optimization algorithm for 3D protein structure prediction[J]. Journal of molecular modeling,2014,20(7):2289.

[77] Li Y,Zhou C,Zheng X. Artificial bee colony algorithm for the protein structure prediction based on the Toy model[J]. Fundamenta Informaticae,2015,136(3):241-252.

[78] Wei X,Zheng X,Zhang Q,et al. Improved Niche Genetic Algorithm for Protein Structure Prediction[C]//Bio-Inspired Computing-Theories and Applications. Springer,Berlin,Heidelberg,2015:475-492.

[79] Zhou C,Hu T,Zhou S. Protein structure prediction based on improved multiple populations and GA-PSO[M]// Bio-Inspired Computing-Theories and Applications. Springer,Berlin,Heidelberg,2014:644-647.

[80] Li Y,Zhou C,Zheng X. The Application of Artificial Bee Colony Algorithm in Protein Structure Prediction[M]// Bio-Inspired Computing-Theories and Applications. Springer,Berlin,Heidelberg,2014:255-258.

[81] Tsay J J,Su S C,Yu C S. A multi-objective approach for protein structure prediction based on an energy model and backbone angle preferences[J].International journal of molecular sciences,2015,16(7):15136-15149.

[82] Ngaam J. Cheung and Hong-Bin Shen,Hierarchical particle swarm optimizer for minimizing the non-convex potential energy of molecular structure,Journal of Molecular Graphics and Modelling,2014,54:114-122.

[83] Liu G H,Shen H B,Yu D J. Prediction of protein-protein interaction sites with machine-learning-based data-cleaning and post-filtering procedures[J].The Journal of membrane biology,2016,249(1-2):141-153.

[84] You Z H,Yu J Z,Zhu L,et al. A MapReduce based parallel SVM for large-scale predicting protein-protein interactions[J].Neurocomputing,2014,145:37-43.

[85] Yuchao Pan,Yuxi Dong,Jingtian Zhou,et al. cOSPREY:A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design. Journal of Computational Biology. 2016.

[86] Wang Y,Mao H,Yi Z. Protein secondary structure prediction by using deep learning method[J].Knowledge-Based Systems,2017,118:115-123.

[87] Li Z,Yu Y. Protein secondary structure prediction using cascaded convolutional and recurrent neural networks[J]. arXiv preprint arXiv:1604.07176,2016.

[88] Zou X,Wang G,Yu G. Protein Function Prediction Using Deep Restricted Boltzmann Machines[J].BioMed Research International,2017,2017.

[89] Zhang S,Zhou J,Hu H,et al. A deep learning framework for modeling structural features of RNA-binding protein targets[J].Nucleic acids research,2015,44(4):e32.

[90] Feng Wang,Huichao Gong,Gaochao Liu,et al. DeepPicker:A deep learning approach for fully automated particle picking in cryo-EM . Journal of Structural Biology. Volume 195,Issue 3. 2016.

[91] Dapeng Xiong,Jianyang Zeng,Haipeng Gong. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy. Bioinformatics. 2017.

[92] Shao W,Ding Y,Shen H B,et al. Deep model-based feature extraction for predicting protein subcellular localizations from bio-images[J].Frontiers of Computer Science,2017,11(2):243-252.

[93] Xu Y,Wu J,Yin C C,et al. Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm[J].PloS one,2016,11(12):e0167765.

[94] Han C,Chen J,Wu Q,et al. Sparse Markov chain-based semi-supervised multi-instance multi-label method for protein function prediction[J].Journal of bioinformatics and computational biology,2015,13(05):1543001.

[95] Xu Y,Min H,Song H,et al. Multi-instance multi-label distance metric learning for genome-wide protein function prediction[J].Computational biology and chemistry,2016,63:30-40.

[96] Xu Y,Min H,Wu Q,et al. Multi-Instance Metric Transfer Learning for Genome-Wide Protein Function Prediction[J]. Scientific Reports,2017,7.

[97] Zou X,Wang G,Yu G. Protein Function Prediction Using Deep Restricted Boltzmann Machines[J].BioMed Research International,2017,2017.

[98] Wu J S,Huang S J,Zhou Z H. Genome-wide protein function prediction through multi-instance multi-label learning[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2014,11(5):891-902.

[99] Xiong W,Xie L,Zhou S,et al. Active learning for protein function prediction in protein-protein interaction networks[J].Neurocomputing,2014,145:44-52.

[100] You Z H,Lei Y K,Zhu L,et al. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis[J].BMC bioinformatics,2013,14(8): S10.

[101] Wang D D,Wang R,Yan H. Fast prediction of protein-protein interaction sites based on extreme learning machines[J].Neurocomputing,2014,128:258-266.

[102] Wei Z S,Han K,Yang J Y,et al. Protein-protein interaction sites prediction by ensembling SVM and sample-weighted random forests[J].Neurocomputing,2016,193:201-212.

[103] Guo,et al.“Characterizing the strand-specific distribution of non-CpG methylation in human pluripotent cells.”Nucleic acids research 42.5:2013,3009-3016.

[104] Liu,et al.“Systematic identification and annotation of human methylation marks based on bisulfite sequencing methylomes reveals distinct roles of cell type-specific hypomethylation in the regulation of cell identity genes.”Nucleic acids research 44.1:2016,75-94.

[105] Zhi,et al.“A novel reannotation strategy for dissecting DNA methylation patterns of human long intergenic non-coding RNAs in cancers.”Nucleic acids research 42.13:2014,8258-8270.

[106] Zhang,et al.“Functional DNA methylation differences between tissues,cell types,and across individuals discovered using the M&M algorithm.”Genome research 23.9:2013,1522-1540.

[107] He,et al.“DMEAS:DNA methylation entropy analysis software.”Bioinformatics 29.16:2013,2044-2045.

[108] Zheng,et al.“MethylPurify:tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes.”Genome biology 15.7:2014,419.

[109] Zheng,et al.“Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies.”Genome biology 18.1:2017,17.

[110] He,et al.“MICC:an R package for identifying chromatin interactions from ChIA-PET data.”Bioinformatics 31.23:2015,3832-3834.

[111] Li,et al.“ChIA-PET2:a versatile and flexible pipeline for ChIA-PET data analysis.”Nucleic acids research 45.1:2017,e4.

[112] Djekidel et al.“3CPET:finding co-factor complexes from ChIA-PET data using a hierarchical Dirichlet process.”Genome biology 16.1:2015,288.

[113] Wang,et al.“Dynamic chromatin accessibility modeled by Markov process of randomly-moving molecules in the 3D genome.”Nucleic acids research 45.10:2017,e85.

[114] Wang,et al.“HiTAD:detecting the structural and functional hierarchies of topologically associating domains from chromatin interactions.”Nucleic Acids Research.

[115] He,et al.“Nucleosome eviction and multiple co-factor binding predict estrogen-receptor-alpha-associated long-range interactions.”Nucleic acids research 42.11:2014,6935-6944.

[116] Wang,et al.“Inferential modeling of 3D chromatin structure.”Nucleic acids research 43.8:2015,e54.

[117] Lu,et al.“Combining Hi-C data with phylogenetic correlation to predict the target genes of distal regulatory elements in human genome.”Nucleic acids research 41.22:2013,10391-10402.

[118] Lu,et al.“3DSNP:a database for linking human noncoding SNPs to their three-dimensional interacting genes.”Nucleic acids research 45.D1:2017,D643-D649.

[119] Lu, et al.“Defining the multivalent functions of CTCF from chromatin state and three-dimensional chromatin interactions.”Nucleic acids research 44.13:2016,6200-6212.

[120] Min,et al.“Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding.”Bioinformatics 33:2017,i92-i101.

[121] Chen,et al.“Discovery of cell-type specific regulatory elements in the human genome using differential chromatin modification analysis.”Nucleic acids research 41.20:2013,9230-9242.

[122] Mei,et al.“Cistrome Data Browser:a data portal for ChIP-Seq and chromatin accessibility data in human and mouse.”Nucleic acids research 45.D1:2017,D658-D662.

[123] Wang,et al.“Modeling cis-regulation with a compendium of genome-wide histone H3K27ac profiles.”Genome research 26.10:2016,1417-1429.

[124] Khan and Zhang.“dbSUPER:a database of super-enhancers in mouse and human genome.”Nucleic acids research 44.D1:2016,D164-D171.

[125] Wei,et al.“SEA:a super-enhancer archive.”Nucleic acids research 44.D1:2016,D172-D179.

[126] Wang,et al.“CR Cistrome:a ChIP-Seq database for chromatin regulators and histone modification linkages in human and mouse.”Nucleic acids research 42.D1:2013,D450-D458.

[127] Xu,et al.“WERAM:a database of writers,erasers and readers of histone acetylation and methylation in eukaryotes.”Nucleic acids research 45.D1:2017,D264-D270.

[128] Li,Yinqing,et al.“Modular construction of mammalian gene circuits using TALE transcriptional repressors.”Nature chemical biology 11.3(2015):207-213.

[129] Ma,Dacheng,Shuguang Peng,et al.“Integration and exchange of split dCas9 domains for transcriptional controls in mammalian cells.”Nature communications 7(2016).

[130] Zeng,Weiqian,et al.“Rational design of an ultrasensitive quorum-sensing switch.”ACS Synthetic Biology(2017).

[131] Wang,Tingting,et al.“Construction and characterization of a synthetic microRNA cluster for multiplex RNA interference in mammalian cells.”ACS synthetic biology 5.11(2015):1193-1200.

[132] Zong,Yeqing,et al.“Insulated transcriptional elements enable precise design of genetic circuits.”Nature Communications 8(2017).

[133] Liu,Honglei,Yanda Li,et al.“OP-synthetic:identification of optimal genetic manipulations for the overproduction of native and non-native metabolites.”Quantitative Biology 2.3(2014):100-109.

[134] Liu,Honglei,et al.“CRISPR-ERA:a comprehensive design tool for CRISPR-mediated gene editing, repression and activation.”Bioinformatics 31.22(2015):3676-3678.

[135] Yuan,Ye,et al.“Model-guided quantitative analysis of microRNA-mediated regulation on competing endogenous RNAs using a synthetic gene circuit.”Proceedings of the National Academy of Sciences 112.10(2015):3158-3163.

[136] Zheng,Hai,et al.“Interrogating the Escherichia coli cell cycle by cell dimension perturbations.”Proceedings of the National Academy of Sciences 113.52(2016):15000-15005.

[137] Liu,Yuchen,et al.“Synthesizing AND gate genetic circuits based on CRISPR-Cas9 for identification of bladder cancer cells.”Nature communications5(2014):5393.

[138] Shao,Jiawei,et al.“Smartphone-controlled optogenetically engineered cells enable semiautomatic glucose homeostasis in diabetic mice.”Science translational medicine 9.387(2017):eaal2298.

[139] Zhang B,et al,An integrative platform of TCM network pharmacology and its application on an herbal formula, Qing-Luo-Yin.Evidence-based Complementary and Alternative Medicine,2013,456747.

[140] Liang X,et al,A novel network pharmacology approach to analyse traditional herbal formulae:the Liu-wei-di-huang Pill as a case study. Molecular BioSystems,2014,10(5):1014-1022.

[141] Zu S,et al,Global optimization-based inference of chemogenomic features from drug-target interactions. Bioinformatics,2015,31(15):2523-2529.

[142] Gu S,et al,Understanding molecular mechanisms of traditional Chinese medicine for the treatment of influenza viruses infection by computational approaches. Molecular BioSystems,2013,9(11):2696-2700.

[143] Sun Y,et al,Towards a bioinformatics analysis of anti-Alzheimer’s herbal medicines from a target network perspective. Briefings in Bioinformatics,2013,14(3):327-343.

[144] Wang L,Li Z,Shao Q,et al. Dissecting active ingredients of Chinese medicine by content-weighted ingredient-target network. Molecular Biosystems,2014,10(7):1905-1911.


Source: Chinese Association of Automation

Author: 萌小白