Explainer | DisGeNET: what makes this public database of disease-associated genes useful?
卖萌控的博客
2023-12-3 萌小白


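DisGeNET is a database of disease-associated genes. It integrates human gene-disease associations (GDAs) and variant-disease associations (VDAs) from a range of repositories, covering Mendelian, complex, and environmental diseases. Its main techniques are the mapping of gene and disease vocabularies and the DisGeNET association type ontology.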






Website: https://www.disgenet.org/



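The current version, DisGeNET v4.0, contains 17,381 genes and 15,093 diseases, disorders, and phenotypes, linked by 429,036 associations in total. It also contains 72,870 variant-disease associations (VDAs) between 46,589 SNPs and 6,356 diseases and phenotypes.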



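DisGeNET includes a gene-disease scoring model, Search and Browse functions, and a Cytoscape plugin. The data can also be downloaded for local use as an SQLite database or as an RDF (Resource Description Framework) database.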



1. Data sources



Curated Data:



UNIPROT, CTD, CLINVAR, ORPHANET, GWAS CATALOG



Predicted Data:



CTD, MGD, RGD



Literature Data:



GAD, LHGDN, BeFree Data



Variant Data:



dbSNP, ExAC, 1000 Genomes Project, Ensembl



2. Statistics for each source database









3. The DisGeNET scoring system






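Factors considered: the type and number of sources reporting the association (kind of database, model organism) and the number of supporting publications, among others.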



The score ranges from 0 to 1.



Distribution of the DisGeNET score according to the number of sources reporting the association






4. Disease Specificity Index (DSI)






$$\mathrm{DSI} = \frac{\log_2(N_d / N_T)}{\log_2(1 / N_T)}$$

where:

- $N_d$ is the number of diseases associated with the gene

- $N_T$ is the total number of diseases in DisGeNET (13,674)



The DSI ranges from 0 to 1.



DSI = 0 implies that the gene is associated only with phenotypes.



Example: TNF, associated with more than 1,500 diseases, has a DSI of 0.247, while IDH3A is associated with a single disease and has a DSI of 1.
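As a rough illustration, the index can be computed in a few lines of Python. This is a minimal sketch that assumes the DSI formula from the DisGeNET documentation, DSI = log2(Nd / NT) / log2(1 / NT). The function name is made up for this example, and returning 0 when a gene has no associated diseases is an assumption based on the "DSI = 0" remark above.

```python
import math

N_TOTAL_DISEASES = 13_674  # NT as quoted in the text above


def disease_specificity_index(n_diseases: int, n_total: int = N_TOTAL_DISEASES) -> float:
    """DSI = log2(Nd / NT) / log2(1 / NT) (assumed formula, see lead-in)."""
    if n_total < 2:
        raise ValueError("n_total must be at least 2")
    if not 0 <= n_diseases <= n_total:
        raise ValueError("n_diseases must be between 0 and n_total")
    if n_diseases == 0:
        # Assumption: a gene linked only to phenotypes gets DSI = 0.
        return 0.0
    return math.log2(n_diseases / n_total) / math.log2(1 / n_total)


# A gene with a single disease (like IDH3A in the text) is maximally specific.
print(disease_specificity_index(1))  # 1.0

# The index falls as the number of associated diseases grows.
for n in (10, 100, 1_000):
    print(n, round(disease_specificity_index(n), 3))
```

The logarithm compresses the scale, so going from 1 to 10 diseases lowers the index far more than going from 1,000 to 1,010.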



5. Disease Pleiotropy Index (DPI)






$$\mathrm{DPI} = \frac{N_{dc}}{N_{TC}}$$

where:

- $N_{dc}$ is the number of distinct MeSH disease classes among the diseases associated with the gene

- $N_{TC}$ is the total number of MeSH disease classes in DisGeNET (27)



The DPI ranges from 0 to 1.



DPI = 0 implies that the gene is associated only with phenotypes, or that the associated diseases do not map to any MeSH class.



Example: the gene KCNE2 is associated with 38 diseases and 10 phenotypes. Of the 38 diseases, 36 have a MeSH disease class, and these 36 diseases fall into 10 different MeSH classes. The DPI for KCNE2 is therefore 10/27 ≈ 0.37. By contrast, the gene APOE, associated with more than 700 diseases across different disease classes, has a DPI of 1.
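The KCNE2 arithmetic can be checked with a short Python sketch. It assumes DPI is the plain ratio Ndc / NTC, which is what the stated 0 to 1 range implies. The function names and the placeholder class labels are illustrative only.

```python
N_TOTAL_CLASSES = 27  # NTC as quoted in the text above


def count_disease_classes(classes_per_disease):
    """Count distinct MeSH classes over all diseases of a gene.

    classes_per_disease holds one set of class labels per disease;
    diseases without a MeSH class contribute an empty set.
    """
    return len(set().union(*classes_per_disease))


def disease_pleiotropy_index(n_classes: int, n_total_classes: int = N_TOTAL_CLASSES) -> float:
    """DPI = Ndc / NTC (assumed to be a plain ratio, see lead-in)."""
    if n_total_classes <= 0:
        raise ValueError("n_total_classes must be positive")
    if not 0 <= n_classes <= n_total_classes:
        raise ValueError("n_classes must be between 0 and n_total_classes")
    return n_classes / n_total_classes


# KCNE2 from the text: its diseases cover 10 of the 27 MeSH classes.
print(round(disease_pleiotropy_index(10), 2))  # 0.37

# Placeholder labels: three diseases, one without any MeSH class.
example = [{"class_a", "class_b"}, {"class_b"}, set()]
print(count_disease_classes(example))  # 2
```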



6. Vocabulary mapping



Diseases:



The current release of DisGeNET uses the Unified Medical Language System® (UMLS®) vocabulary for diseases. The source repositories of gene-disease associations use different disease vocabularies: OMIM® terms for diseases from UniProt, CTD™, and MGD; MeSH terms for CTD™, LHGDN, and RGD; and UMLS® Concept Unique Identifiers (CUIs) for CLINVAR. Orphanet identifiers are mapped using Orphanet cross-references. Disease names from GAD and the GWAS Catalog are normalized using the UMLS® Metathesaurus®. We also used the UMLS® Metathesaurus® concept structure to map MIM and MeSH terms to UMLS® CUIs.



Genes:



For human genes, HGNC symbols (used for some entries in GAD) and UniProt accession numbers (used by UniProt) are converted to NCBI Entrez Gene identifiers using an in-house dictionary that cross-references HGNC, UniProt, and NCBI Gene information. To map mouse and rat genes, we used the files
ftp://ftp.informatics.jax.org/pub/reports/HOM_MouseHumanSequence.rpt
and ftp://rgd.mcw.edu/pub/data_release/RGD_ORTHOLOGS.txt, which carry orthology information from MGD and RGD respectively, to map mouse and rat Entrez Gene identifiers to human Entrez Gene identifiers. We discarded relationships for which no human ortholog of the mouse or rat gene could be found.
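The ortholog step described above amounts to a dictionary lookup that drops anything without a human counterpart. The sketch below shows that idea only. The identifiers are placeholders, the function name is hypothetical, and it does not parse the actual MGD or RGD files.

```python
def map_to_human(associations, orthologs):
    """Translate (model_gene_id, disease_id) pairs to human gene identifiers.

    orthologs maps a mouse or rat gene identifier to a human one.
    Pairs whose gene has no human ortholog are discarded.
    """
    mapped = []
    for gene_id, disease_id in associations:
        human_id = orthologs.get(gene_id)
        if human_id is None:
            continue  # no human ortholog found: drop the relationship
        mapped.append((human_id, disease_id))
    return mapped


# Placeholder identifiers, not real Entrez Gene IDs or CUIs.
orthologs = {"mouse:1": "human:101", "rat:7": "human:205"}
associations = [
    ("mouse:1", "disease:A"),
    ("mouse:2", "disease:B"),  # no ortholog, so it is dropped
    ("rat:7", "disease:A"),
]
print(map_to_human(associations, orthologs))
```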



7. The DisGeNET Association Type Ontology









8. Data attributes



Diseases



·the disease name, provided by the UMLS® Metathesaurus®



·the UMLS® semantic types



·the MeSH class: We classify the diseases according to the MeSH hierarchy, using 23 upper-level concepts of the MeSH tree branch C (Diseases) plus three concepts of the F branch (Psychiatry and Psychology): "Behavior and Behavior Mechanisms", "Psychological Phenomena and Processes", and "Mental Disorders".



·The top-level concepts from the Human Disease Ontology.



·The DisGeNET disease type: disease, phenotype, and group.



Entries with the following UMLS® semantic types are classified as disease:



- Disease or Syndrome



- Neoplastic Process



- Acquired Abnormality



- Anatomical Abnormality



- Congenital Abnormality



- Mental or Behavioral Dysfunction



Entries with the following UMLS® semantic types are classified as phenotype:



- Pathologic Function



- Sign or Symptom



- Finding



- Laboratory or Test Result



- Individual Behavior



- Clinical Attribute



- Organism Attribute



- Organism Function



- Organ or Tissue Function



- Cell or Molecular Dysfunction



These classifications were manually checked. In addition, disease entries referring to disease groups, such as "Cardiovascular Diseases", "Autoimmune Diseases", "Neurodegenerative Diseases", and "Lung Neoplasms", were classified as disease group.



We removed terms that other sources treat as diseases but that are not strictly diseases, such as terms belonging to the following UMLS® semantic types:



- Gene or Genome



- Genetic Function



- Immunologic Factor



- Injury or Poisoning



These attributes are shown in the different views of the browser, and they are all shown in the Disease Tab.
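The semantic-type rules in this subsection can be written as a small lookup. The sketch below assumes that the first list above defines the "disease" type and the second the "phenotype" type. The precedence for entries carrying several semantic types, and the function name, are choices made for this example. The "group" type is assigned manually in DisGeNET, so it is not covered here.

```python
DISEASE_TYPES = frozenset({
    "Disease or Syndrome", "Neoplastic Process", "Acquired Abnormality",
    "Anatomical Abnormality", "Congenital Abnormality",
    "Mental or Behavioral Dysfunction",
})
PHENOTYPE_TYPES = frozenset({
    "Pathologic Function", "Sign or Symptom", "Finding",
    "Laboratory or Test Result", "Individual Behavior", "Clinical Attribute",
    "Organism Attribute", "Organism Function", "Organ or Tissue Function",
    "Cell or Molecular Dysfunction",
})
EXCLUDED_TYPES = frozenset({
    "Gene or Genome", "Genetic Function", "Immunologic Factor",
    "Injury or Poisoning",
})


def classify_entry(semantic_types):
    """Return "disease", "phenotype", or None (removed or unknown type).

    Assumption for this sketch: a disease semantic type takes precedence
    over a phenotype one when an entry carries both.
    """
    types = set(semantic_types)
    if types & DISEASE_TYPES:
        return "disease"
    if types & PHENOTYPE_TYPES:
        return "phenotype"
    return None  # excluded types and anything unrecognized are dropped


print(classify_entry(["Neoplastic Process"]))  # disease
print(classify_entry(["Sign or Symptom"]))     # phenotype
print(classify_entry(["Gene or Genome"]))      # None
```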



Genes



·the official gene symbol, from the NCBI



·the NCBI Official Full Name



·the UniProt accession



·the top level Panther protein class.



·the top level Reactome pathways.



·the Disease Specificity Index (DSI)



·the Disease Pleiotropy Index (DPI)



Variants



·The position in the chromosome



·The reference and alternative alleles



·The class of the variant: SNP, deletion, insertion, indel, somatic SNV, substitution, sequence alteration, and tandem repeat



·The allelic frequency according to the 1000 Genomes Project



·The allelic frequency according to the Exome Aggregation Consortium



·The most severe consequence type according to the VEP



·Links to dbSNP



·Links to ClinVar



·Links to Ensembl



Gene-disease associations



·the DisGeNET score



·the DisGeNET Gene-Disease Association Type



·the publication(s) that report the gene-disease association, with the PubMed identifier



·a representative sentence from the publication describing the
association between the gene and the disease (If a representative
sentence is not found, we provide the title of the paper)



·the original source reporting the Gene-Disease Association



·For some sources, we provide the variant(s) linked to the gene-disease association



9. Cytoscape plugin



DisGeNET Cytoscape App



10. Downloads for local use



Tab-delimited files are available for download (curated and BeFree gene-disease associations, and publications);



an RDF Linked Dataset is provided;



for querying large amounts of data at once, Python, Perl, and R scripts are also supported;



Mapping files are also provided:



1. UniProt Downloads DisGeNET genes -> UniProt entries



2. UMLS CUI -> MeSH Identifier
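As an illustration of working with the tab-delimited downloads locally, the sketch below reads gene-disease rows and keeps those at or above a score threshold. The column names (geneSymbol, diseaseName, score) and the sample rows are assumptions made for this example, so check the header of the file you actually download before reusing it.

```python
import csv
import io

# Placeholder rows standing in for a downloaded file; the column names
# are assumed for this sketch, not taken from a real DisGeNET release.
SAMPLE_TSV = (
    "geneSymbol\tdiseaseName\tscore\n"
    "GENE_A\tDisease X\t0.80\n"
    "GENE_B\tDisease Y\t0.10\n"
    "GENE_C\tDisease X\t0.45\n"
)


def load_associations(handle, min_score=0.0):
    """Read tab-delimited rows and keep those with score >= min_score.

    Returns a list of dicts sorted by descending score.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        score = float(row["score"])
        if score >= min_score:
            rows.append({
                "gene": row["geneSymbol"],
                "disease": row["diseaseName"],
                "score": score,
            })
    return sorted(rows, key=lambda r: r["score"], reverse=True)


for assoc in load_associations(io.StringIO(SAMPLE_TSV), min_score=0.3):
    print(assoc["gene"], assoc["disease"], assoc["score"])
```

For a real file, replace the `io.StringIO` wrapper with `open(path, newline="", encoding="utf-8")`.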


Copyright notice: this article is reposted from the WeChat official account “启帆医学BioSCI” and is shared for informational purposes only.