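Year of publication: | 2016 |
---|---|
Type: | Thesis |
Source: | Kongju National University Graduate School: Department of Industrial Systems Engineering (Master's) |
Related link: | http://www.riss.kr/link?id=T13985586 |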
A Study on Statistical Methods for Genome-Wide Association Analysis
Parallel title: A Study on the Statistical Methods for the Genome-Wide Association Analysis
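Author: 윤병국
Physical description: iii, 63 leaves : illustrations ; 26 cm
General note: Advisor: 이창용
Bibliography: leaves 26-29
Dissertation note: Thesis (Master's) -- Kongju National University Graduate School: Department of Industrial Systems Engineering, February 2016
Place of publication: Chungcheongnam-do
Language: Korean
Year published: 2016
Holding institution: Kongju National University Library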
Abstract
Humans have improved breeds by exploiting the traits that heredity passes down to the next generation. The genetic information passed to the next generation is stored in deoxyribonucleic acid (DNA), a helical molecule inside cells. Because the bases that make up DNA are arranged in a linear order, that order is called a base sequence, and all genetic information is stored in base sequences. The analytical methodology for finding associations between base sequences and phenotypes is called a genome-wide association study (GWAS). This study investigates GWAS for phenotypes with quantitative traits, an area of active research in recent years.
GWAS takes as input the output of a genome re-analysis process, which identifies genetic variants called single nucleotide polymorphisms (SNPs) from the data produced by a DNA sequencer. The analysis then proceeds in order: population structure in the input data is accounted for with principal component analysis (PCA), and the candidate variants that influence the phenotype are identified through statistical tests on the genotypes and then visualized.
The Genome Association and Prediction Integrated Tool (GAPIT) is widely used in research to analyze phenotypes with quantitative traits. However, because it is written in R, it runs relatively slowly, and it contains some internal errors. In addition, its T and F tests are parametric, so their results are hard to trust when the genome data do not follow the t or F distribution, respectively. This study developed a program that improves on GAPIT by using C instead of R and by applying Kendall's τ test, a non-parametric test, instead of parametric tests such as the T and F tests. The results show that the program developed in this study ran approximately 4 to 17 times faster than GAPIT, depending on the number of SNPs.
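Table of Contents
I. Introduction
II. Identification of Genetic Variants
1. DNA Genome Re-analysis Process
2. Genome Data Formats
III. Genome-Wide Association Analysis
1. Overview
2. Population Structure Analysis Using Principal Component Analysis
3. Kendall τ Correlation Coefficient and Hypothesis Testing
4. False Discovery Rate
IV. Results of Genome-Wide Association Analysis
1. Data Description
2. Manhattan Plot
3. Quantile-Quantile Plot
4. Analysis Performance Comparison
V. Conclusion
References
ABSTRACT
Appendix 1
Appendix 2
Appendix 3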