
JNCI Monographs 1999 1999(26):1-16;
© 1999 by Oxford University Press

Journal of the National Cancer Institute Monographs, No. 26, 1-16, 1999
© 1999 Oxford University Press


I. GENE DISCOVERY PANEL

Discovery of Cancer Susceptibility Genes: Study Designs, Analytic Approaches, and Trends in Technology

Daniel J. Schaid, Ken Buetow, Daniel E. Weeks, Ellen Wijsman, Sun-Wei Guo, Jurg Ott, Carol Dahl

Affiliations of authors: D. J. Schaid, Departments of Health Sciences Research and Medical Genetics, Mayo Clinic/Mayo Foundation, Rochester, MN; K. Buetow, C. Dahl, National Cancer Institute, National Institutes of Health, Bethesda, MD; D. E. Weeks, Department of Human Genetics, University of Pittsburgh, PA; E. Wijsman, Medical Genetics and Biostatistics, University of Washington, Seattle; S.-W. Guo, Division of Epidemiology, University of Minnesota, Minneapolis; J. Ott, Rockefeller University, New York, NY.

Correspondence to: Daniel J. Schaid, Ph.D., Harwick 7, Mayo Clinic, 200 First St. SW, Rochester, MN 55905 (e-mail: schaid@mayo.edu).


    ABSTRACT
Determining the genetic causes of cancers has immense public health benefits, ranging from prevention to earlier detection and treatment of disease. Although a number of cancer susceptibility genes have been successfully identified, design and analytic issues remain that challenge the current paradigm of gene discovery. Some examples are the definition and measurement of cancer phenotype, the use of intermediate end points, the choice of sample (e.g., affected relative pairs versus large extended pedigrees), the choice of analytic method [e.g., parametric logarithm of the odds (LOD) score method versus model-free methods], and the influence of gene-environment interaction on linkage analysis. Furthermore, association methods, based on either the traditional case-control study design or family-based controls, are popular choices to evaluate candidate genes or screen for linkage disequilibrium. Finally, the study design and analytic methods for gene discovery are determined to some extent by what genomic technology is feasible within the laboratory. Many of the main issues related to gene discovery, as well as trends in genomic technology that will impact on gene discovery, are discussed from the perspective of their strengths and weaknesses, pointing to areas in need of further work.



    INTRODUCTION
Determining the genetic causes of cancers has immense public health benefits, ranging from prevention to earlier detection and treatment of disease. Numerous studies have demonstrated that both somatic genetic changes and hereditary factors are involved in the etiology of many cancers. Although somatic genetic changes may appear to be more frequently related to cancers than hereditary factors, the primary focus of this paper is on the discovery of hereditary factors—those genes that increase the susceptibility to cancer. Genetic mapping consists of two different yet interrelated problems: to estimate the genetic effects and to localize the gene(s). The biggest challenge today for gene discovery is to find the locations of the genes with little or no knowledge at all of how many genes are involved, how they interact with each other or with environmental factors, and what the genotype-phenotype relationship is. Although the contribution of a single gene may account for only a portion of susceptibility to the disorder, the common occurrence of the disorder warrants a better understanding of its causes. Never before have the available genetic tools, and those on the near horizon, offered as much promise to help in this determination as they offer today. The promises are many, but so too are the complexities of common cancers. For example, some complexities of heritable disorders are incomplete penetrance (<100% risk of disease among carriers), phenocopies (disease among noncarriers), locus and allelic heterogeneity, oligogenic and polygenic inheritance, gene-environment interactions, gene-gene interactions, mitochondrial inheritance, parental imprinting (differential gene expression depending on the sex of the transmitting parent), and anticipation (a trait that progressively increases in severity in successive generations, often via expansion of trinucleotide repeats). 
The hallmarks of single gene disorders are a very high recurrence risk compared with the general population, with the risk to relatives decreasing by a factor of one half with each degree of relationship, clustering of disease within families that tends to follow predictable mendelian transmission patterns and rare mutant disease-causing allele(s). In contrast, many common diseases with a complex genetic basis do not demonstrate the simple mendelian expectations when multiple genes interact to increase the disease risk, and the recurrence risk to relatives decreases with each degree of relationship more rapidly than a factor of one half (1). Of course, there are also many complex traits that do not demonstrate mendelian expectations that still maintain a decline in recurrence risk of about one half with each degree of relationship. These complexities emphasize the need for efficient study designs and analytic methods for gene discovery.

The goals of this paper are the following: 1) to present the main design and analytic approaches that are currently available for the discovery of disease susceptibility genes; 2) to highlight some of the complex, and perhaps debatable, issues regarding gene discovery; 3) to offer directions for future methodologic research; and 4) to discuss recent genetic technological advances that will influence the future directions of gene discovery.


    DEFINITION AND MEASUREMENT OF CANCER PHENOTYPE
Some of the clinical clues that there may be an inherited predisposition to cancer are 1) the occurrence of cancer at an unusually young age, relative to the typical age for the type of cancer; 2) multifocal development of cancer in a single organ or bilateral development of cancer in paired organs; 3) development of more than one primary tumor of any type in a single person; 4) family history of cancer of the same type in close relative(s); 5) high rate of cancer in a family; and 6) the co-occurrence of congenital anomalies or birth defects in a person with cancer or within a cancer-prone family (2).

The definition and measurement of the cancer phenotype are critical for gene discovery because misclassification of phenotype status can lead to loss in power for familial aggregation studies as well as for linkage and association studies. The level of certainty of cancer diagnosis will likely differ for different types of studies. For example, it may be most feasible to collect family history of disease by a single family member (informant) or perhaps supplement this information with a second family informant, when performing large studies of the aggregation of disease within families or performing complex segregation analyses. The accuracy of the primary site of cancer based on family history has been reported to be high for first-degree relatives (85% accurate) and lower for second-degree relatives (67% accurate) (3,4), suggesting that more effort is needed to validate cancers among more distant relatives. For example, a single informant within each nuclear family of a pedigree may be a reasonable strategy. Furthermore, the accuracy tends to vary according to the site of the primary cancer, with lower accuracy for cancers in female pelvic organs (e.g., ovary, cervix, and uterus), and in sites where metastasis is common (e.g., liver and bone). These inaccuracies will bias the estimates of genetic parameters if not accounted for. A method to account for measurement errors is to randomly sample a validation sample from the total of all subjects, validate the cancer phenotype status of all subjects in the validation sample, and use the validation sample to estimate the probability of the true disease status, given the disease status reported by family history. This probability distribution can then be used to correct those phenotypes that were not validated, by including this correction probability distribution in the statistical analysis. Similar methods have been proposed to account for measurement error of covariate data (5). 
Although this approach is straightforward to implement in the design of a study, current genetic analysis software does not implement this type of validation information. For linkage and association studies of either candidate genes or genome-wide scans, it is generally accepted that it is necessary to confirm, by medical records (preferably pathology reports), the cancer diagnoses for all subjects included in the analyses. Given the efforts required to collect blood specimens and perform the genotyping, and the consequences of misclassification on gene discovery, this viewpoint is warranted.
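
The validation-sample correction described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: disease status is binary, the validation sample is a simple random sample, and the function names and all counts below are hypothetical rather than taken from any study or software package.

```python
from collections import Counter


def estimate_true_given_reported(validation_pairs):
    """Estimate P(truly affected | reported status) from a validation sample.

    validation_pairs: iterable of (reported, true) tuples, each coded 0 or 1.
    Returns a dict mapping reported status -> estimated probability.
    """
    counts = Counter(validation_pairs)
    probs = {}
    for reported in (0, 1):
        n_total = counts[(reported, 0)] + counts[(reported, 1)]
        if n_total == 0:
            raise ValueError(f"no validated subjects with reported={reported}")
        probs[reported] = counts[(reported, 1)] / n_total
    return probs


def expected_number_affected(reported_statuses, probs):
    """Expected count of truly affected relatives among unvalidated reports."""
    return sum(probs[r] for r in reported_statuses)


# Hypothetical validation sample: (reported by informant, confirmed by records).
validation = [(1, 1)] * 17 + [(1, 0)] * 3 + [(0, 1)] * 2 + [(0, 0)] * 78
probs = estimate_true_given_reported(validation)

# Hypothetical unvalidated relatives: two reported affected, three unaffected.
unvalidated = [1, 1, 0, 0, 0]
expected = expected_number_affected(unvalidated, probs)
print(probs, expected)
```

In a full analysis, these estimated probabilities would enter the likelihood as weights on the unobserved true phenotype rather than being summarized as an expected count.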

Because genes may be related to multiple outcomes, such as multiple cancer types (i.e., pleiotropy), it is not always clear what the best definition of cancer "phenotype" should be. Most current linkage studies focus on a single cancer type, such as breast cancer or prostate cancer. Although this may be adequate for rare genes of high penetrance that mainly affect a single cancer site, this approach can sacrifice power for common genes with low penetrance that affect multiple cancer sites. Further work, along the lines of characterizing the relationship between genotype and phenotype, is needed to better understand the best methods to discover these types of susceptibility genes. Further clues in the discovery of cancer genes can be gathered by looking for traits that could suggest chromosomal rearrangements, such as mental retardation or birth defects.


    INTERMEDIATE END POINTS
Other traits that may be intermediate end points for cancer may be worth considering for use in gene discovery studies. The rationale for considering intermediate end points is that, because they lie closer to the responsible genetic components that eventually lead to the disease of interest, they may be mapped more efficiently. Of course, a potential flaw with this approach would occur when the intermediate end points are actually more remotely determined by the genetic components responsible for the trait of interest, leading to a loss in efficiency. Because quantitative traits may be useful for linkage analyses and are often more powerful than binary traits derived from quantitative traits (6), it is tempting to use intermediate traits as surrogates for cancer diagnoses. Some examples of such traits are breast density (a risk factor for breast cancer); counts of nevi (related to malignant melanoma); polyps and precancerous pathologic features, such as colonic aberrant crypt foci (related to colon cancer); and prostate-specific antigen (related to prostate cancer). Statistical methods to use surrogate and auxiliary end points have been developed for clinical trials in which the true end point of interest is the time to failure. In the context of gene discovery, age at cancer onset can be substituted for "time to failure," so that many of the same principles for surrogate end points can be applied to designs and analyses for the discovery of genes that increase the risk for early age at onset of cancer. According to the principle laid out by Prentice (7), a surrogate end point is a trait for which a test of the null hypothesis of no relationship with a genetic marker or a candidate gene is also a valid test of the corresponding null hypothesis based on the true end point. The principal criterion for a surrogate trait is that it fully explains the relationship between the genetic marker and the cancer phenotype. 
Statistically, this requires that the distribution of the true end point does not depend on the genetic marker, given the value of the surrogate trait. In other words, the relationship between the true end point and the surrogate trait is the same for all marker genotypes. To emphasize this by example, consider a gene (G) related to colon cancer (C), and the use of colonic polyps (P) as a surrogate trait. If the gene causes polyps, and then polyps lead to an increased risk of colon cancer, then P would be a valid surrogate if the risk of colon cancer does not depend on G, once P is measured. This can be depicted from a mechanistic viewpoint by arrows leading from cause to effect, as illustrated in Fig. 1.



Fig. 1. Example of a causal path from a genetic cause (G), to the intermediate trait polyps (P), to the end point colon cancer (C).

 
In contrast, P would not be a completely valid surrogate if the risk for colon cancer depends on knowing both P and G, with an example illustrated by the causal pathway in Fig. 2.



Fig. 2. Example of a mechanism by which the gene (G) influences the risk of polyps (P), yet, even after adjustment for the influence of polyps on the risk of colon cancer (C), the gene still has an influence on the risk of colon cancer.

 
Although Prentice's operational criterion for a valid surrogate end point is strict, and the second example above may still offer some information on the association of the genetic marker with cancer phenotype when using polyps as a surrogate, the criterion does offer guidance when evaluating a potential surrogate trait. So, before using a trait as a surrogate for cancer phenotype, there should be evidence that the surrogate is heritable and that it is appropriately related to the outcome of interest. To appropriately use a surrogate trait, it is best to obtain a validation sample on a subset of subjects, so that the validation subset can be used to statistically relate the surrogate outcome with the cancer phenotype (8).
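
The distinction between the causal paths in Fig. 1 and Fig. 2 can be made concrete with a small numerical sketch. The Python below assumes binary traits and uses made-up probabilities; the function names and the carrier/noncarrier coding are illustrative only. It checks whether the cancer risk, given polyp status, is the same for every genotype, which is the conditional-independence condition stated above.

```python
def marginal_cancer_risk(p_polyp_given_g, p_cancer_given_pg):
    """P(C=1 | G=g) = sum over p of P(P=p | g) * P(C=1 | P=p, G=g)."""
    risk = {}
    for g, p_polyp in p_polyp_given_g.items():
        risk[g] = (p_polyp * p_cancer_given_pg[(1, g)]
                   + (1.0 - p_polyp) * p_cancer_given_pg[(0, g)])
    return risk


def satisfies_prentice(p_cancer_given_pg, genotypes, tol=1e-12):
    """True if P(C=1 | P, G) does not vary with G at any level of P."""
    for polyp in (0, 1):
        risks = [p_cancer_given_pg[(polyp, g)] for g in genotypes]
        if max(risks) - min(risks) > tol:
            return False
    return True


genotypes = ("noncarrier", "carrier")
# Hypothetical probability of polyps by genotype (same in both scenarios).
p_polyp_given_g = {"noncarrier": 0.10, "carrier": 0.60}

# Fig. 1 style: G -> P -> C; cancer risk depends on polyps only.
fig1 = {(0, "noncarrier"): 0.01, (0, "carrier"): 0.01,
        (1, "noncarrier"): 0.20, (1, "carrier"): 0.20}
# Fig. 2 style: G also affects C directly, even after accounting for P.
fig2 = {(0, "noncarrier"): 0.01, (0, "carrier"): 0.05,
        (1, "noncarrier"): 0.20, (1, "carrier"): 0.40}

risk_fig1 = marginal_cancer_risk(p_polyp_given_g, fig1)
print(satisfies_prentice(fig1, genotypes), satisfies_prentice(fig2, genotypes))
print(risk_fig1)
```

In both scenarios the gene is associated with cancer marginally; only in the first does the surrogate carry all of that association.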


    SAMPLING CONSIDERATIONS
The power of a linkage study to detect a susceptibility gene depends on the frequency of informative families and the ability to use informative markers. A basic principle to consider when planning a study is that recombinant meioses can be identified only among subjects who are doubly heterozygous for both the disease locus and the marker locus. Because the underlying genotype is only indirectly observed through the phenotype, ambiguity of whether the underlying genotype is doubly heterozygous and ambiguity with regard to the linkage phase of the disease and marker loci both diminish the amount of linkage information. When penetrance is low, when phenocopies are frequent, or when multiple interacting genes are involved, the relationship between phenotype and genotype becomes more ambiguous, leading to increased sample size requirements. Sample size requirements can be reduced by increasing the informativity of the markers, by using markers that are in linkage disequilibrium with the disease locus, and by extending pedigrees to include informative relatives. An informative relative is one who helps to determine whether a person in the pedigree is a double heterozygote as well as the phase of the double heterozygote. This determination suggests that including grandparents who have discordant phenotypes should increase the amount of linkage information (9). Another way to increase linkage information is to restrict analyses to clinical subsets that demonstrate mendelian inheritance. For some cancers, such as breast and colon cancers, sampling pedigrees that demonstrate mendelian inheritance patterns has led to successful gene discovery (e.g., BRCA1 and BRCA2 for breast cancer).

Although it is difficult to give general sampling guidance without considering the specifics of the potential genetic mechanisms underlying a trait, some guiding principles have been provided by Risch (10) and Weeks (11). For a mendelian autosomal recessive disease, nuclear families with unaffected parents and many affected offspring are informative (because both parents are heterozygous for the susceptibility gene), as are inbred families (12). For a rare mendelian autosomal dominant disease, extended multigenerational pedigrees with many affected subjects are especially informative.

Some of the advantages of large pedigrees over nuclear families are that there can be increased power to detect linkage (as outlined above), and genetic homogeneity is likely to be increased. However, large pedigrees have limitations that nuclear families or sib pairs can overcome: 1) specific mutations in rare large pedigrees are probably different from those that cause the more common form of the disease; 2) for common diseases, large pedigrees do not necessarily guarantee genetic homogeneity, and phenocopies are more likely to occur, which can dilute the power for linkage; 3) rare large, dense pedigrees for common diseases may represent a chance combination of multiple genetic and/or environmental factors, which can diminish the chance of detecting a single gene by linkage methods; 4) multigeneration pedigrees are likely to bias the detectable susceptibility gene toward a dominant mode of inheritance; and 5) errors in the phenotype of the disease or marker in a few key people in a large pedigree can have profound influences on the linkage results (13,14). When the population-attributable risk of a susceptibility gene is assessed, a large number of families is required to reliably estimate the fraction of families that show linkage of disease to the marker, which is feasible when sampling nuclear families.

In contrast, under oligogenic epistatic models, which may be plausible for some complex traits (1), affected sib pairs (ASPs) are more likely to be informative for linkage than are large pedigrees with multiple affected members (10). This point can be understood as follows: If the genetic mechanism involves several interacting genes of small to modest effects, and large extended pedigrees with many affected subjects are sampled, then parents who have many affected children can have an increased probability of being homozygous for the disease locus, especially if the parents are also affected, causing the parents to be uninformative for linkage. Also, if multiple susceptibility loci are involved, and susceptibility alleles at these loci are fairly common, then a pedigree with many affected subjects has an increased chance of having multiple susceptibility alleles segregating, diminishing the power to detect any one of the loci. Without strong evidence that mendelian inheritance occurs for the disease of interest, a popular option is to sample ASPs, or other affected relative pairs, and use model-free methods of analysis that do not require specification of the penetrance and mode of inheritance. A popular sampling consideration for ASPs is to sample families enriched for a large number of affected sibs and to sample according to the parents' affection status, with preference for unaffected parents. However, the power of these sampling criteria depends on the underlying genetic mechanism; for some genetic models, particularly two-locus epistatic models, sampling ASPs from sparsely affected sibships can offer greater power than sampling from densely affected sibships. That is, multiple affected sibs increase the probability that additional susceptibility alleles are segregating in the family, causing a reduction in the average power of each sib pair. 
An alternative sampling scheme is to simply sample all ASPs, regardless of the affection status of first-degree relatives, and recognize that the potential loss in power of this strategy, which may be only modest for some genetic models, can be compensated by increasing the sample size. These points, and further heuristics for ASP sampling, are discussed by McCarthy et al. (15). It is noteworthy that the ASP design for linkage remains a controversial issue. Results from the Genetics Analysis Workshop 10 indicate that, for traits with low epistatic and dominance variance, ASP methods can be very inefficient relative to sampling large pedigrees, even in the absence of a well-defined genetic model (16). A subtle point for multilocus diseases is that ASPs can be particularly poor for searching for additional loci, conditional on the first locus being known (17). Further discussion regarding the relative merits of ASPs versus pedigree data is given by Schork and Xu (18).

Both simulation (19) and analytic (9,20,21) methods can be used to determine the power of parametric logarithm of the odds (LOD) scores to detect linkage for a specified genetic mechanism, and methods to determine power for ASP methods have been published for both binary affection status (15,22,23) and quantitative traits (24,25).


    ANALYTIC METHODS
Powerful methods of genetic resolution depend on linkage or association of traits, or a combination of both, with genetic markers or candidate genes. A gene may be measured directly at the DNA level or at the level of the gene product as well as indirectly by DNA markers (such as simple sequence repeats). It is convenient to refer to any of these measured phenotypes as a genetic marker, which is defined as a genetically determined trait for which the relationship between genotype and phenotype is known. Linkage analysis of complex diseases has proven to be an effective method to identify genes related to disease subtypes that are primarily monogenic, such as BRCA1 and BRCA2 mutations that cause breast cancer. Linkage has also been useful for diseases that are likely caused by multiple genes, such as insulin-dependent diabetes. Although linkage of insulin-dependent diabetes has been reported with up to 18 different chromosome regions (26-28), recent reports (29,30) have suggested that far fewer regions may be involved.

Other disease genes will likely be identified with the use of linkage analyses. However, recent advances in molecular genetics and statistical techniques have led to even more powerful tools, using genome-wide association studies (31,32). The potential advantage of association studies over linkage is that any association of a marker with a trait, because of either a pleiotropic effect of the marker (e.g., a candidate gene) or tight linkage with disequilibrium between the marker and trait loci, is much easier to detect than linkage without disequilibrium (32-37). This advantage has historically motivated many association studies of complex diseases with genetic markers and candidate genes, despite the biggest potential pitfall—the choice of control group. Using diseased cases and unrelated control subjects is prone to spurious (i.e., nongenetic) associations because of population stratification, admixture, and migration. For more than 40 years, scientists have recognized that relatives of diseased cases can serve as unbiased control subjects that avoid the potential spurious associations given by unrelated control subjects (38), but only recently have scientists begun to explore the statistical methods needed to design and analyze such family-based association studies. Below we briefly review the main features of linkage methods and association methods, pointing out their strengths and weaknesses to consider where further work is needed.
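
The way population stratification can create a spurious association with unrelated control subjects can be shown with a small worked example. The Python sketch below uses entirely hypothetical counts for two subpopulations that differ in both carrier frequency and disease frequency. Within each subpopulation the marker is unrelated to disease (odds ratio of 1), yet pooling the two produces a strong apparent association.

```python
def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds ratio for carrying the marker allele, cases versus controls."""
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


# Hypothetical counts: (carrier cases, noncarrier cases,
#                       carrier controls, noncarrier controls).
# Subpopulation A: allele common (80% carriers) and disease common.
subpop_a = (320, 80, 80, 20)
# Subpopulation B: allele rare (20% carriers) and disease rare.
subpop_b = (20, 80, 80, 320)

or_a = odds_ratio(*subpop_a)
or_b = odds_ratio(*subpop_b)

# Ignoring subpopulation membership: add the tables cell by cell.
pooled = tuple(a + b for a, b in zip(subpop_a, subpop_b))
or_pooled = odds_ratio(*pooled)

print(or_a, or_b, or_pooled)
```

Family-based controls avoid this problem because cases and their relatives are drawn from the same subpopulation by construction.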


    LINKAGE METHODS
Linkage analysis is one of the basic tools used to evaluate the coinheritance within families of genetic markers with a trait and has proven to be highly successful at mapping monogenic traits as well as some diseases that are likely caused by multiple genes. The classic LOD score method (39), based on likelihood principles for an assumed genetic model, has been the most widely used method for linkage analysis and the most successful in terms of the number of genes subsequently cloned. These successes have been primarily for simple mendelian disorders that are inherently much easier to genetically map than are complex disorders. The theoretical foundation of the LOD score method is well understood (40) and implemented with sophisticated algorithms (41-46) in widely available computer software. The basic feature of the LOD score method is an assumed genetic model for both the trait phenotype and the marker phenotype. The genetic model for the trait phenotype requires specification of autosomal or X-linked trait loci, the number of loci, the number of alleles at each locus and their frequencies, the joint genotype probabilities, and the genotype-specific penetrances, which can vary with a known risk factor (e.g., liability classes). Similar genetic parameters are required for the marker phenotype, but these parameters are most often easier to specify. The difficulty in using the LOD score method for complex traits is specification of the genetic model for the trait phenotype. Although errors in the model can lead to inconsistent parameter estimates and lack of power, incorrect models do not lead to an increased false-positive rate unless the parameters for both the trait and the marker loci are wrong (47,48).
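
As a minimal illustration of the likelihood ratio behind the LOD score, the Python sketch below treats the simplest textbook case: phase-known, fully informative meioses in which recombinants can be counted directly. Real pedigree likelihoods, with penetrances, allele frequencies, and unknown phase, are far more involved and are handled by the specialized software cited above; the function names here are illustrative only.

```python
import math


def lod_score(theta, n_recombinant, n_meioses):
    """LOD(theta) = log10[ L(theta) / L(0.5) ] for directly counted meioses.

    L(theta) is proportional to theta^k * (1 - theta)^(n - k), where k is the
    number of recombinant meioses out of n informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = n_recombinant, n_meioses
    if not 0 <= k <= n:
        raise ValueError("need 0 <= n_recombinant <= n_meioses")
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        return float("-inf") if k > 0 else n * math.log10(2.0)
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(n_recombinant, n_meioses):
    """Maximum likelihood estimate of theta (capped at 0.5) and its LOD."""
    theta_hat = min(n_recombinant / n_meioses, 0.5)
    return theta_hat, lod_score(theta_hat, n_recombinant, n_meioses)


# Hypothetical data: 1 recombinant among 10 informative meioses.
theta_hat, lod = max_lod(1, 10)
print(theta_hat, round(lod, 3))
```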

Because of the difficulty in specifying the correct genetic model, allele-sharing methods that do not require this specification ("model-free" methods) have become popular for studying complex traits (1,22,49-52). Sharing of alleles between relatives can be considered for either the number of alleles shared identical by descent (IBD) or identical by state (IBS). Alleles shared IBS may not be of the same ancestral origin, and so this information is confounded between linkage and the population frequency of the marker alleles. Because linkage requires information on inheritance, IBD information offers greater power than does IBS information. Allele-sharing methods are often based on affected relatives, usually ASPs, which simplify data collection. This approach offers a big advantage for complex traits, because large sample sizes are often needed to detect the modest effects of susceptibility genes. Note, however, that collection of unaffected family members can substantially increase the power of ASP methods by increasing the IBD inheritance information (50). 
There has been immense growth in the number of statistical methods for ASP designs, ranging from use of complete family data to infer genotypes to extract IBD information (53); to consideration of the optimal weights used to combine sibships with varying numbers of affected members (54); to inclusion of different types of affected-relative pairs (55); to improved maximum likelihood IBD sharing estimation and model-free maximum LOD scores (MLS) (22); to interval mapping and exclusion of chromosomal regions based on recurrence risk ratios for complex diseases (56); to improved likelihood methods by restriction to genetically plausible IBD sharing ("possible-triangle" constraints) (57); to two-locus disease models that simultaneously evaluate two marker loci, leading to improved power to detect oligogenic disease mechanisms (58-60); to complete multipoint extraction of IBD information for small pedigrees (46) and improved nonparametric scoring functions (61). The GENEHUNTER software (Whitehead Institute for Biomedical Research, Boston, MA) (46) incorporates these latter two features, but the proposed linkage test tends to be conservative when the IBD information is incomplete, which often occurs. However, a recent extension to GENEHUNTER, based on inclusion of only one additional parameter for the scoring function (40), can dramatically improve the accuracy of the linkage test (62). A recent comparison of 23 different statistics for model-free linkage analysis of nuclear family data indicated that the ASP "mean" test (50), which compares the observed number of alleles shared IBD with its null expected value, performs well in terms of type-I error and power for a variety of genetic effects; extension of this test to include unaffected sibs increases power (50). Of interest, this mean test has been shown to be the uniformly most powerful test under a recessive mode of inheritance and is locally optimal otherwise (63). 
Extension of the idea that model-free score tests are optimal for a genetic model (40) allows a general framework in which powerful statistical tests can be developed.
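
The ASP "mean" test mentioned above can be sketched as follows, assuming that the number of alleles shared IBD is known without ambiguity for every sib pair and that pairs are independent. Under the null hypothesis of no linkage, a sib pair shares 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, and 1/4, so the count has mean 1 and variance 1/2. The normal approximation, the one-sided P value, and the function name are choices made for this sketch, not a description of any particular software.

```python
import math


def asp_mean_test(ibd_counts):
    """One-sided mean test for excess IBD sharing among affected sib pairs.

    ibd_counts: list with one entry per sib pair, each 0, 1, or 2.
    Returns (z statistic, one-sided P value from the normal approximation).
    """
    n_pairs = len(ibd_counts)
    if n_pairs == 0:
        raise ValueError("at least one sib pair is required")
    if any(count not in (0, 1, 2) for count in ibd_counts):
        raise ValueError("IBD counts must be 0, 1, or 2")
    observed = sum(ibd_counts)
    expected = 1.0 * n_pairs          # null mean of 1 allele per pair
    variance = 0.5 * n_pairs          # null variance of 1/2 per pair
    z = (observed - expected) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return z, p_value


# Hypothetical sample of 100 pairs: 15 share 0, 50 share 1, 35 share 2.
pairs = [0] * 15 + [1] * 50 + [2] * 35
z, p_value = asp_mean_test(pairs)
print(round(z, 3), p_value)
```

When IBD status is only partly known, the counts would be replaced by their expected values given the marker data, which is where the multipoint methods described above come in.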

The above methods are generally based on IBD information. Alternatively, the affected-pedigree-member (APM) method of linkage analysis is a nonparametric statistic that measures the similarity of marker alleles shared IBS among affected relatives (49). This approach is computationally efficient (64) and has been extended to include unaffected relatives (65), X-linkage (66), multipoint analyses (67), and linkage heterogeneity (68). The APM method has been advocated as an alternative to IBD methods because only affected members are required, which is convenient for situations in which it is difficult to determine IBD information, such as for late-onset disease (69). But there is a sacrifice in power compared with methods based on IBD status (70-72). One of the greatest drawbacks of the APM method is that it is not robust to misspecified marker allele frequencies, which can inflate its false-positive rate (73). However, a permutational extension of the APM method, called the SimIBD statistic (72), uses IBD information when it is available and IBS when it is not. The SimIBD statistic can be markedly less sensitive to misspecification of marker allele frequencies than APM, because the statistic is computed conditionally on the marker genotypes of the nonaffected relatives.
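
To see why IBS-based statistics are sensitive to marker allele frequencies, consider the chance that two alleles drawn at random from the population match by state, which is the sum of the squared allele frequencies. The Python sketch below is a simplified illustration and not the APM statistic itself; the function name and the frequencies are hypothetical. It shows how strongly this background match rate depends on the assumed frequencies.

```python
def ibs_match_probability(allele_freqs):
    """Probability that two randomly drawn alleles are identical by state."""
    if any(p < 0 for p in allele_freqs):
        raise ValueError("allele frequencies must be nonnegative")
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in allele_freqs)


# Hypothetical markers.
two_equal = ibs_match_probability([0.5, 0.5])
ten_equal = ibs_match_probability([0.1] * 10)
one_common = ibs_match_probability([0.9, 0.1])

print(two_equal, ten_equal, one_common)
```

If an allele that is actually common is assumed to be rare, sharing of that allele among affected relatives looks more surprising than it should, which is one way a misspecified frequency can produce false evidence for linkage.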

Another nonparametric linkage method, the weighted rank pairwise correlation (WRPC) statistic (74-76), is also based on IBS information. But, unlike the APM method, the WRPC statistic allows for not only binary traits but also quantitative and age-dependent traits as well as inclusion of nongenetic covariates.

Finding genes in the presence of strong environmental risk factors may be difficult, so control (at least in the statistical sense) of environmental risk factors may be crucial. Although parametric linkage analyses can allow for environmental strata, by allowing the penetrance of the susceptibility gene to vary across environmental strata (i.e., liability classes), it is difficult to implement liability classes in practice because of unknown penetrance values. Further statistical work is needed to appropriately account for measured environmental factors. Because some environmental risk factors may have large measurement errors (such as diet surveys), it may be worthwhile to obtain a validation sample on a subset of subjects so that the validation subset can be used to account for the measurement errors.

When considering covariates, researchers should also consider the best methods of design or analysis to account for gene-environment interaction when searching for genes. Finding genes that interact with environmental factors will be a great challenge. Virtually all complex traits are known, or suspected, to be influenced by various environmental risk factors, and many are believed to be influenced by both genetic and environmental factors and by their interactions. However, very little is known about how genetic and environmental factors interact and about how this interaction affects current mapping strategies. In fact, there is no precise definition of gene-environment interaction, and the distinction between statistical interactions and biologic interactions is not widely appreciated.

Researchers now know that, in the presence of gene-environment interaction, the widely used measure of familial aggregation, the recurrence risk ratio, can be a poor measure of the strength of the underlying genetic component, simply because the genetic and environmental factors are confounded. Other measures, such as heritability and twin concordance rates, suffer from the same problem. In addition, a single-locus model with environmental factors can display surprisingly complex behaviors that cannot be explained under a pure single-locus model without environmental factors. Moreover, at least for sib-pair designs, the presence of gene-environment interaction can markedly affect the statistical power to detect either linkage or association.

Despite the many advances in the development of nonparametric statistics for linkage, debate still exists regarding use of the classic LOD score method versus model-free methods for complex traits (77-80). By model free, we mean that the analytic method does not require specification of the frequency of the susceptibility allele or the mode of inheritance and penetrance. Technically, even the model-free methods require some basic genetic assumptions, such as mendelian transmission (as opposed to meiotic drive) and no interference. However, one should be aware that the power of model-free methods depends on the underlying genetic mechanisms, and so the power of model-free methods can be improved if the genetic basis of the disease being studied matches the assumptions of the model (40). Analytic results indicate that the LOD score method is robust to misspecification of either the genetic parameters related to the trait, or those related to the markers, but not misspecification of both sets of parameters (47). If ascertainment is through either the marker or trait phenotypes but not both, then the LOD score does not depend on the mode of ascertainment. For complex diseases, the genetic parameters related to the trait are rarely known, so it is critical to have accurate marker allele frequencies to avoid an inflated chance of a false-positive finding. But correct allele frequencies are not sufficient if the analyzed sample represents an admixed or stratified population, because information on the population structure must be included to fully describe the distribution of the marker genotypes. Unfortunately, because this population information is rarely known, it is often ignored in linkage analyses. Ignoring population stratification in linkage analyses can lead to an inflated false-positive rate, as demonstrated for ASP analyses (56).

When the genetic parameters are misspecified, it is not surprising that there is some loss in power of the LOD score method (35). The amount of power loss depends critically on the mode of inheritance and, to a lesser extent, on the allele frequencies, on the penetrance values, and on the size of the pedigree (40). Simulations suggest that, if two loci epistatically influence disease, accounting for the second locus can increase power (81). If, however, there is locus heterogeneity, and a linked locus accounts for less than 25% of the diseased families, then the power of both the LOD score admixture method, which allows for a fraction of unlinked families (82,83), and the ASP method (84) is so weak that the sample size requirements are prohibitive.
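
The admixture approach mentioned above can be sketched as follows. The sketch assumes the standard heterogeneity LOD form, in which each family contributes log10[α·10^LOD + (1 − α)] at a fixed recombination fraction, where α is the fraction of linked families. The function names and the grid search over α are illustrative choices and are not taken from references (82,83).

```python
import math

def hlod(family_lods, alpha):
    """Admixture (heterogeneity) LOD at one fixed recombination fraction.

    family_lods: per-family LOD scores evaluated at that recombination fraction.
    alpha: assumed fraction of linked families, between 0 and 1.
    """
    return sum(math.log10(alpha * 10 ** lod + (1 - alpha)) for lod in family_lods)

def max_hlod(family_lods, steps=100):
    """Maximize the admixture LOD over a simple grid of alpha values."""
    grid = [i / steps for i in range(steps + 1)]
    best_alpha = max(grid, key=lambda a: hlod(family_lods, a))
    return best_alpha, hlod(family_lods, best_alpha)
```

When only a small fraction of families is linked, the maximized value stays close to zero unless the sample is very large, which is consistent with the prohibitive sample sizes noted above.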


    ARTIFICIAL NEURAL NETWORKS AND LINKAGE
 
For complex traits, it is generally believed that multiple, possibly interacting loci confer susceptibility rather than a single-disease locus. Therefore, pattern recognition techniques have been developed with the aim to identify sets of marker loci that jointly show deviations from random allele sharing. Some of the markers may act only as modifiers for others. For example, allele sharing may be increased at one marker only when allele sharing is elevated at another marker. If there are interactions among disease loci, pattern recognition techniques may be able to recognize these interactions and, therefore, be more powerful than conventional ASP methods.

Pattern recognition techniques are often used to predict an outcome (e.g., disease diagnosis) on the basis of observed predictor variables. This prediction is achieved by monitoring observations with known predictor variables and outcomes. Linear discriminant analysis may then be applied to determine the set of predictor variables that best predict outcome. For allele-sharing data in ASPs, a particular type of pattern recognition technique called artificial neural network (ANN) can be employed, which is able to carry out nonlinear discriminant analysis (85). The network architecture is a feed-forward ANN with three neuronal layers (input, hidden, and output).

ANNs were originally developed as simple models for the way nerve cells transmit impulses in the brain. In ANNs, neurons are modeled as nodes that are arranged in layers. A given node potentially receives impulses from all nodes in the layer preceding it and potentially sends an impulse ("fires") to each node in the layer following it. Whether a node fires is determined by the sum of all impulses received and some threshold function. On the basis of multiple observations at the input nodes and predetermined values of the output nodes, ANNs "learn" how best to connect input and output nodes via pathways through hidden nodes. The "strengths" of connections among pairs of nodes in adjacent layers are called weights, and iteratively estimating these weights is called training a network. A common training (learning) algorithm is the back-propagation (a simple downhill) method. Initially, weights have some assumed starting values, which are modified and optimized in the course of learning. These weights are analogous to the coefficients associated with predictor variables in discriminant or multiple regression analysis. ANNs may be seen to carry out simple calculations in a highly parallel manner, which enables them to do tasks for which other methods are unsatisfactory.

Allele-sharing observations have a very simple structure: For each parent, at any marker, an ASP shows sharing (x = 1) or no sharing (x = -1). Thus, for all markers studied on the genome, observations form an array (a matrix) of x values with m rows (markers) and n columns (parents). The ANN has m input nodes, a hidden layer with a smaller number of nodes, and an output layer with two nodes, O1 and O2. The network is "trained" (estimates the weights) with 1) observed allele-sharing data and 2) data randomly generated on the computer as input data. The rationale is that ASP data contain "signal" (disease loci) and "noise" (random allele sharing at markers unlinked with disease loci), while randomly generated data contain only "noise." Correspondingly, training of the ASP data is done with output nodes set to O1 = O2 = 1, while for randomly generated data they are set to O1 = 0 and O2 = 1. Then one differentiates between weights to the two output nodes, O1 - O2 (signal + noise - noise = signal) and computes a compound contribution value, Ci, i = 1 . . . m, for each marker locus.
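
The data layout and the three-layer architecture described above can be sketched in a few lines. This is only an illustration under stated assumptions: each parent (column) is treated as one input pattern of m sharing indicators, the "noise" data are drawn with sharing probability 1/2, the hidden and output nodes use a logistic function, and the weights are supplied by the caller rather than trained by back-propagation. The compound contribution value Ci is not computed because its formula is not given here. All function names are hypothetical.

```python
import math
import random

def make_training_set(observed, seed=0):
    """Pair observed allele-sharing data with computer-generated random data.

    observed: list of m rows (markers), each a list of n values, one per parent
              (+1 = sharing, -1 = no sharing).
    Returns (patterns, targets); every pattern holds m inputs, one per marker.
    """
    rng = random.Random(seed)
    m, n = len(observed), len(observed[0])
    real = [[observed[i][j] for i in range(m)] for j in range(n)]
    noise = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(n)]
    patterns = real + noise
    # Targets follow the text: (O1, O2) = (1, 1) for ASP data, (0, 1) for random data.
    targets = [(1.0, 1.0)] * n + [(0.0, 1.0)] * n
    return patterns, targets

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(pattern, w_hidden, w_out):
    """Feed one pattern through the input -> hidden -> output layers.

    w_hidden: one weight list (length m) per hidden node.
    w_out: two weight lists (one per output node), each of hidden-layer length.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, pattern))) for ws in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_out]
```

Training would then adjust `w_hidden` and `w_out` so that `forward` reproduces the targets as closely as possible.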

Application of these methods to published allele-sharing data for a genome screen of diabetes genes (27) showed that the ANN recognizes all markers found by conventional methods. In addition, it points to markers not implicated in the original study. In particular, one of the genes (IDDM4) implicated in diabetes was seen in the original genome screen only after sib pairs were subdivided into two groups—those that showed allele sharing at human leukocyte antigen (HLA) for both parents and those that did not, which indicates an interaction between IDDM4 and a diabetes gene (IDDM1) at the HLA region. In contrast, the neural network recognizes IDDM4 without subdividing data.

This neural network approach to gene mapping appears promising for various complex traits, such as cancer. Data types other than ASPs, in which recognizing patterns among observations is important, may also be suitable for analysis by neural nets. For example, many cancer tumors are homozygous for a marker, while the individual is heterozygous. This loss of heterozygosity (LOH) is a well-known method for identifying cancer genes. In a genomic survey for LOH, neural nets may be able to pick out patterns of interactions among cancer-causing genes that are difficult to recognize otherwise.


    ASSOCIATION METHODS
 
Association studies have often been an essential step after linkage analyses have indicated linked chromosomal regions because the resolution of linkage studies is, at best, approximately 1-2 centimorgans for simple mendelian diseases and feasible sample sizes (86) and is likely to be less refined for complex traits. In contrast, the mapping resolution based on linkage disequilibrium (LD) can be as fine as 50-75 kilobases (87). For simple mendelian diseases, there have been numerous statistical methods proposed for LD fine-scale mapping (88-95). However, for complex diseases, family-based association studies may be more desirable than strictly linkage-based methods because they can have greater power, as shown for ASPs (31). With improved genetic marker maps and maps of expressed sequence tags (96,97), it will be feasible to perform genome-wide association studies (31). Besides family-based studies, other study designs for associations are case-control, cohort, cross-sectional, and admixed populations (98,99). But family-based designs are least prone to biased associations caused by the genetic structure of the population.

Although the causes of association between genetic markers and traits can be difficult to determine by association studies, it is important to design studies and analytic methods that distinguish between genetic and nongenetic causes of association. Two genetic causes of association are 1) the pleiotropic effects of the genetic marker (as in a candidate gene study) and 2) LD of the trait and marker loci. Note that LD occurs when haplotype combinations of alleles at different loci do not occur randomly. This LD can occur when most diseased subjects inherit from a common ancestor a segment of a chromosome containing the disease allele and marker allele (founder effect), and LD is most easily detected in a genetically homogeneous population. In this case, the main factor influencing LD is recombination between the two loci. If the recombination fraction is small, as expected for dense marker maps, the amount of LD will remain large even after many generations, making association studies based on LD a potentially powerful strategy for complex traits.

Other causes of association between a trait and marker allele(s) that are not due to linkage, yet which can mislead interpretations, are joint selection for both marker and trait locus alleles, small population variation (i.e., random genetic drift), and the structure of the population, which may include inbreeding, admixture, or stratification of different ethnic groups. If a population is composed of a recent admixture of different ethnic groups that have different frequencies of marker alleles, then any trait more frequent in an ethnic group will be positively associated with any marker allele that is more frequent in that group, even if these loci are not linked. However, because disequilibrium as a result of admixture, selection, or drift between unlinked loci decays very rapidly over generations, but slowly for linked loci (100), mapping genes via association remains appealing, especially when it is possible to control for nongenetic associations by using family-based association studies (101).
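
The contrast between linked and unlinked loci can be made concrete with the standard population-genetic result that, under random mating and in the absence of selection, drift, and new admixture, disequilibrium decays by a factor of (1 − θ) per generation, where θ is the recombination fraction. The short sketch below assumes that idealized model; the function name is hypothetical.

```python
def ld_after(d0, theta, generations):
    """Expected disequilibrium after a number of generations of random mating.

    d0: initial disequilibrium coefficient.
    theta: recombination fraction between the two loci (0.5 for unlinked loci).
    """
    return d0 * (1.0 - theta) ** generations

# Unlinked loci lose half of the remaining disequilibrium every generation,
# whereas tightly linked loci (theta = 0.01) retain most of it after 20 generations.
unlinked = ld_after(0.2, 0.5, 20)
linked = ld_after(0.2, 0.01, 20)
```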

Although association studies have become a popular and effective method of fine-scale gene localization for simple diseases, the complexities of common diseases will challenge association methods. Diallelic markers and multiallelic markers have been compared in terms of sample sizes required to detect LD in a case-control study for diseases with mendelian inheritance, and these results have been extrapolated to discuss the feasibility of LD screening in more complex situations. Results showed that multiallelic markers always have more power to detect LD than diallelic markers (under otherwise equivalent conditions) and that the ratio of the number of diallelic to multiallelic markers needed for equivalent power increases with mutation age and complexity of mode of inheritance. Increasing complexity has particularly negative effects on power to detect LD, especially in the context of a diallelic genome screen. Such complexity can be introduced by dominance, allelic or locus heterogeneity, or presence of sporadic cases. Equivalent power to that achieved by a multiallelic screen can theoretically be achieved by using a more dense diallelic screen, but mapping panels of the necessary resolution are not currently available and may be difficult to achieve. Genome screening with the use of LD testing may therefore only be feasible for young (<20 generations), rare, monophyletic mendelian diseases, such as may be found in rapidly growing genetic isolates (102). Similar conclusions were drawn for transmission disequilibrium testing with the use of diseased cases and their parents (103).

Because of the difficulty in defining an appropriate control group for association studies in heterogeneous populations, Falk and Rubinstein (104) proposed to measure the genetic marker on both the diseased cases and their parents (which we refer to as parental control subjects) to compare the frequencies of those alleles that were transmitted from parents to children versus those that were not transmitted. For this type of analysis, only heterozygous parents are used (homozygous parents are not informative), and the genotype of each parent can be considered a matched pair of alleles, one transmitted and the other not; the McNemar statistic for matched pairs, which does not require independence of parental alleles, leads to a valid statistical test. This is also called the transmission/disequilibrium test, or TDT (105,106). An alternative approach, which uses all parental alleles, is to ignore the matching in order to compare the frequency of an allele among the 2n transmitted alleles versus the 2n nontransmitted alleles, where n is the number of cases [also called the haplotype-based haplotype relative risk statistic, HHRR (107)]. However, this method requires independence of parental alleles in the population, which is not true for a stratified population (105).
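
For a diallelic marker, the McNemar form of the TDT reduces to a simple calculation on counts from heterozygous parents. The sketch below assumes b is the number of heterozygous parents who transmitted the allele of interest and c is the number who transmitted the other allele; it uses the large-sample chi-square approximation with 1 degree of freedom. The function name is hypothetical.

```python
import math

def tdt_statistic(b, c):
    """McNemar-type transmission/disequilibrium statistic for one allele.

    b: heterozygous parents transmitting the allele of interest.
    c: heterozygous parents transmitting the other allele.
    Returns (chi_square, p_value), with p_value from a 1-df chi-square.
    """
    chi_square = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

For example, 60 transmissions versus 40 nontransmissions gives a chi-square of 4.0. Homozygous parents do not enter the calculation, in keeping with the description above.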

The statistical properties of various statistics and strategies for TDT-type analyses for binary phenotypes have been investigated (107-122) with considerations of statistics for multiple marker alleles (106,123-127), exact tests (128), designs for admixed populations (103,129), design issues for incomplete LD (130), unbiased tests for association in the presence of linkage when using multiple affected sibs (131), likelihood methods useful to distinguish positively and negatively associated alleles (132), modeling gene-environment interaction (133), and analyzing marker haplotypes with two loci (134). Also, a general framework based on a conditional likelihood method has been presented (111,135,136), which has allowed development of powerful omnibus score statistics (137). This framework has also been used to compute maximum likelihood estimates of allelic effects and to assess interactions between marker genotypes and environmental covariates (135,138-140).

Because data on parents can be difficult to obtain, especially for late-onset diseases, sib-control subjects offer a valid alternative (141-148), although sib-control subjects have less power to assess transmission disequilibrium than do parental marker genotype data (147,148). It is critical to recognize that designs using parents or sibs as controls are sensitive only to associations caused by both linkage and LD and hence avoid the biases due to nongenetic causes of association.

Complications of associations caused by LD are that different mutations causing the same trait can arise on chromosomes that bear different marker alleles, or mutations at the marker locus can occur. These factors can cause the associated marker allele to differ across different populations and can decrease LD in a population mixed with different mutations. To study this effect in detail requires studies of haplotypes composed of multiple marker loci to determine if particular haplotypes are associated with disease. The statistical methods used to evaluate associations with marker alleles can also be used to evaluate associations with haplotypes created from multiple marker loci. Haplotypes can be inferred either by family studies or statistically by using measures of population LD to predict the most likely haplotypes (149) or a combination of both (150).

With the availability of many genetic markers, one of the most challenging statistical issues is the choice of the level of statistical significance to maximize power yet minimize the chance of false positives. When testing many markers, the prior probability that any one is associated with disease is often so low that one needs to be very conservative to avoid false-positive associations. To account for multiple comparisons with many alleles (or many haplotypes), the Bonferroni correction is often used, although the power of this method is often weak. Two-stage designs may prove to be useful (111) as well as P-value plots (151) and empirical Bayes shrinkage estimates (152,153).
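
The Bonferroni correction itself is a one-line calculation: the overall significance level is divided by the number of tests. The sketch below illustrates it; the marker names and P values in the usage note are invented for illustration only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant_markers(p_values, alpha=0.05):
    """Return the markers whose P value passes the Bonferroni-corrected threshold.

    p_values: dict mapping a marker name to its P value.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p <= threshold]
```

With three hypothetical markers and P values of 0.0001, 0.02, and 0.04, only the first passes the corrected threshold of 0.05/3. With thousands of markers the threshold becomes very small, which is the source of the weak power noted above.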

Association studies using more extensive pedigree data offer yet another useful design because members of the same pedigree are likely to have the same genetic etiology, and extended pedigrees give more information about the genetic mechanisms underlying the trait than does a sample of unrelated persons. An important feature of pedigree data is that the statistical dependence of members in the same pedigree needs to be incorporated into analyses to obtain unbiased association parameters (154) and accurate estimates of the variances of these parameter estimates. For a large number of independent pedigrees, the robust method of generalized estimating equations has been used (155). However, for one or a few pedigrees, the asymptotic results for generalized estimating equations are not likely to hold, so it is necessary to use a statistical model that incorporates familial correlations. For continuous traits, methods based on the multivariate normal distribution with covariance matrices determined by the genetic relationships among pedigree members can be used (156,157). Note, however, that this latter method and methods based on generalized estimating equations measure the marginal association of a genetic marker with a trait, not the association due to transmission/disequilibrium. Methods based on marginal associations are sensitive to population stratification and other nongenetic associations.

Alternative methods based on full likelihoods, which combine association, segregation, and linkage, have been proposed for nuclear families (158,159); LD was incorporated by introducing coupling frequencies, that is, the probabilities for a gamete to have the disease allele, given that it carries a particular marker allele. One advantage of these likelihood models is that ill-defined ascertainment methods can be accounted for by conditioning on the affection status of everyone in the pedigree. A similar method, called the MOD score, has been proposed for linkage analyses; it maximizes the LOD score over the genetic parameters (e.g., allele frequencies, penetrance, or recombination fraction). This method is a valid way to correct for complex and ill-defined ascertainment criteria (34,35,160,161) and is equivalent to maximizing the conditional probability of the marker genotypes, given the disease status, although there is some loss in efficiency. However, the model-based methods can be restrictive because the correct genetic model must be specified, and there is a large number of parameters to be estimated (e.g., allele frequencies, coupling frequencies, etc.). To avoid the long computer processing time required to fit these likelihood models and to avoid convergence problems when estimating many parameters, the Marker Association Segregation Chi-squares (MASC) method has been proposed (162), which is based on estimating the genetic parameters of coupling frequencies and genotype-specific penetrances by minimizing a sum of independent Pearson chi-squares. It is worthwhile to note that 1) the likelihood methods are not sensitive to the absolute penetrance values but rather to their relative values (162); 2) for diseases with low penetrance, the marker information from unaffected sibs is minimal [in fact, recent results regarding inclusion of unaffected sibs in TDT-like tests for complex disease indicate that this can do more harm than good, because unaffected sibs can add excessive variation to the marker genotype distribution (163)]; and 3) phenotype information from parents may be of importance (164). Also, it has been shown that extension of the MASC method to account for two candidate genes involved in complex diseases can lead to increased power, instead of studying each separately (111), much like what has been observed in genetic linkage analyses (81).


    POOLED ANALYSES
 
The detection of susceptibility genes for complex traits often requires large sample sizes that require collaboration of multiple research organizations. Also, multiple replication studies for linkage analyses of complex traits often produce different results. For these reasons, it is reasonable to predict that the discovery of genes underlying complex traits will often require pooling of data from multiple sources. Complications involved in performing these types of meta-analyses involve enforcing consistent diagnostic criteria, allowing for varying levels of diagnosis confirmation, and accounting for ethnic backgrounds and marker allele frequencies that vary across the different studies. Another factor that can affect pooled analyses is a difference in ascertainment criteria, which can cause the amount of linkage information to dramatically differ across different studies (15). A popular meta-analytic method is to assume that the main parameters of interest are random variables that vary across the different studies. For the Haseman and Elston sib-pair linkage method, the regression parameter (for the regression of the squared trait difference on the estimated proportion of alleles IBD) can be assumed to depend on a random effect, so that the combined linkage result over all studies is a weighted average, giving more stable results than any single study can produce (165). Another parameter of interest for parametric LOD score analyses is the fraction of linked families (the α parameter for the admixture test for linkage). However, this parameter should be interpreted with caution. First, the precision of the estimated α is typically very poor, resulting in a very wide confidence interval for α. Second, the estimated value of α can be biased when the assumed penetrance model is wrong. This is because the recombination fraction is overestimated when the penetrance model is wrong, and, because α and the recombination fraction are highly correlated, bias in the recombination fraction causes bias in α. Robust methods for estimating α are needed for pooled analyses.
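
The random-effects weighted average described above can be sketched with a generic moment-based estimator of the between-study variance (the DerSimonian-Laird approach). This is a common choice and is offered here only as an assumption; it is not necessarily the estimator used in reference (165). The inputs are the per-study estimates of the regression parameter and their sampling variances, and the function name is hypothetical.

```python
def random_effects_pool(estimates, variances):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model.

    estimates: per-study estimates of the parameter (e.g., a regression slope).
    variances: the corresponding within-study sampling variances.
    Returns (pooled_estimate, pooled_variance, tau2), where tau2 is the
    estimated between-study variance.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # heterogeneity
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2
```

When the studies disagree more than their sampling variances would predict, tau2 becomes positive, the weights become more equal, and the pooled variance increases.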


    TECHNOLOGY TRENDS
 
The methodologic approaches applied to gene discovery are dictated, in part, by what is technically feasible within the laboratory. This constraint is currently in dramatic flux. Developing technologies may permit the realization of the promise of the routine application of genomic approaches to genetic analysis.

The genomic approach assesses the contribution of any gene or gene product, from the complete catalogue of genes and gene products, in defining a particular biologic state. Implicit in this approach is prior knowledge of the complete genome, or at least the genes that compose the genome. For this approach to be feasible, substantial information infrastructure and cost-effective comprehensive laboratory technologies are required.

Key to the success of these technologies are advances that have enhanced capabilities through increased miniaturization, parallelism, and automation. Miniaturization is required to reduce the laboratory reagent costs and sample volumes required to perform the large number of assays for comprehensive analysis. Miniaturization also offers formats that enable new applications. Parallelism is required to reduce the time necessary to complete the characterizations. For example, it is conservatively estimated that the human genome may contain 70 000 genes. To characterize one individual's genome in a year requires one to test 192 genes each day! Automation is required to address the large number of tests that must be performed, reduce costs, and improve quality control and uniformity of outcomes. On the basis of cost alone, it would not be feasible to perform large-scale genetic analysis without additional automation.
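
The throughput figure quoted above follows from simple arithmetic, shown here as a short check. The gene count is the estimate given in the text, and one test per gene is assumed.

```python
import math

genes = 70_000      # estimated number of human genes, as stated in the text
days_per_year = 365

# Round up, because a partial test still has to be run on some day.
genes_per_day = math.ceil(genes / days_per_year)
```

The result is 192 genes per day for a single individual, before any replication or quality-control assays are counted.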

Technologies that are being proposed for genome-wide DNA analysis include serial sequencing, hybridization, and enzymatic approaches. In particular, hybridization array approaches have been developed in many formats to enable a wide range of genome-wide DNA analyses (Table 1) (166). Different technologies use very different types of capture sequences. Strategies for sequence capture include oligonucleotides (of varying length for different approaches) (167), complementary DNAs (cDNAs) (either full length or fragments) (168), and genomic DNA fragments. The method by which arrays are generated represents one of the more significant differences between technologies. The method associated with DNA "chips" uses the same photolithography technology used to make computer chips (the origin of the term "DNA chip") to build oligonucleotide detection sequences on a detection array (169). An alternative strategy for building oligonucleotide sequences on glass slides uses technology similar to that used in ink jet printers. Other technologies simply "spot" oligos, cDNAs, or genomic DNA fragments at high density. Technologies also differ according to how sequences are discriminated and the capability to resolve single-base sequence variation. Different strategies include hybridization specificity, enzymatic specificity, and capture of solution-phase products. Fig. 3 shows an example of the microarray technology. In this example, hybridization to a cDNA microarray compares messenger RNA (mRNA) from one lymphoma specimen to a control mRNA pool from several lymphoma cell lines. Red spots are genes more highly expressed in the lymphoma specimen, and green spots are genes that are underexpressed in the lymphoma. The intensity of the spots can be used to indicate the magnitude of expression.


Table 1. Transcript expression analysis approaches and technology providers*

 


Fig. 3. Sample microarray image that represents the comparison of messenger RNA (mRNA) from one lymphoma specimen to a control mRNA pool from several lymphoma cell lines. Red spots are genes more highly expressed in the lymphoma specimen, and green spots are genes that are underexpressed in the lymphoma. [Image provided courtesy of A. Alizadeh and L. M. Staudt (Metabolism Branch, National Cancer Institute) and M. Eisen and P. Brown (Stanford University School of Medicine)].

 
The developing portfolio of genome-wide analysis technologies has many potential applications. Hybridization array technology has provided many examples of proof of concept experiments in genome-wide analysis. At the DNA level, arrays have been used for mutation detection within genes of interest, gene amplification or deletion detection, resequencing, and genotyping (170-176). The array technology also has been demonstrated to be of utility in determining gene expression profiles for different cell types as well as in analysis of protein interactions and protein-DNA interactions. A similar breadth of applications is evolving for strategies using serial sequence and enzymatic approaches.

It is worth noting that few, if any, of these promising technologies have moved from proof-of-concept experiments conducted by the developer to routine application in other laboratories. Thus, the potential of comprehensive molecular analysis tools largely remains a promise. There are several reasons for this. First, much of the technology is insufficiently robust. In its current form, it is not suitable for day-to-day production application. The results, even in the hands of the developer, vary from experiment to experiment. Work is still required to "harden" the technology so that consistent results are routinely obtained. Another major obstacle to deployment of this new technology is its cost. Most of the approaches require substantial instrumentation and costly consumables. The cost of instrumentation for this technology is beyond the means of most academic investigators. Many of the reagents used in these assays are unique and must be custom generated. Therefore, the cost of development and manufacture must be recovered solely from the consumer, such as the research community (see below). A corollary of the cost obstacle is accessibility. Cost represents only one barrier to accessibility. Additional barriers include intellectual property issues, concern over market opportunities beyond the research community, and technology transfer logistics.

The rate at which this new technology is production "hardened" is determined only in small part by the research community. The research community represents a very small market for many manufacturers, and only recently have systems been developed that primarily target the basic and clinical research communities. To be cost-effective, instruments and reagents need to be produced on a very large scale and sold at prices that recoup development costs. Therefore, market forces beyond the needs of basic research drive the development and marketing of technology. Much of this effort currently targets higher-end commercial users that require genomic information for the development of new therapeutics. Other potentially large markets include diagnostics, early detection (screening), and prevention; these areas remain underserved while awaiting technology hardening, reductions in cost, demonstration of utility, and an expanded base of information with respect to the function of expression products in the genome.

One of the greatest obstacles facing these new technologies is the conversion of data to information. Each of the new approaches promises to overwhelm the current information infrastructure. The challenge is not simply managing the tsunami of data that will be generated by these approaches. New analytic strategies will be required to interpret the data, and new models will need to be generated to allow investigators to manipulate them. The collection of interrelated challenges is illustrated in Fig. 4. With improved information analysis and modeling strategies, the promise exists for in silico modeling of biologic systems that can, at a minimum, generate hypotheses for testing as well as potentially predict the behavior of biologic systems.



Fig. 4. Interrelationship of informatics needs associated with high-throughput genetic analysis. Only through the seamless integration of databases, models, and analysis tools will it be possible to extract information from the data generated by such large-scale analysis infrastructures.

 

    THE CHANGING GENE DISCOVERY PARADIGM
The introduction of genome-wide genetic analysis technologies has the capacity to dramatically alter the gene discovery paradigm. Gene discovery is currently dominated by the positional cloning paradigm. Its success has driven the generation of biologic reagents and analytic approaches. Genetic analysis is applied first to circumscribe the physical region of the genome that must be examined. Further genetic analysis is then used once genes within the region have been identified.

However, one can argue that current technologies, such as DNA chips and single nucleotide polymorphisms (SNPs), only make genotyping faster, cheaper, and more reliable; they do not seem to have fundamentally changed our basic concept of gene mapping, i.e., localization of genes based on meiotic events. It is likely that the use of genome-wide analysis technology will alter this paradigm in the future: an alternative to positional cloning starts with the genome's complement of genes (Fig. 5).



Fig. 5. Illustration of how the analytic gene discovery paradigms may change with the introduction of whole-genome technologies. The upper panel shows the steps in the current positional cloning paradigm. The lower panel indicates how discovery might change with the capacity to analyze all of the genes within the genome.

 
There is much enthusiasm for genome-wide association studies (177). In the simplest version of these studies, linkage is simply replaced by association as the metric for entering the physical mapping domain. Given that the physical regions that will be identified in these studies are smaller, the follow-up gene studies are likely to be much more manageable. It has been suggested that such studies can be performed with marker densities ranging from 3000 to 30 000 loci, depending on the evolutionary history of the population being studied. Developing microarray technology shows promise for permitting the routine characterization of this number of loci. However, it is worth noting that such high-resolution maps do not yet exist, and, even if they did, the hardened technologies for their routine characterization are not yet available.
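To give a rough sense of scale for the marker densities quoted above, the following sketch converts a marker count into an average spacing. It assumes a haploid genome of roughly 3 billion base pairs and evenly spaced markers; both are simplifying assumptions introduced here for illustration and are not figures taken from the studies cited.

```python
# Rough average marker spacing for a genome-wide association scan.
# ASSUMPTIONS (not from the article): a haploid genome of about 3.0e9 base
# pairs, and markers spread evenly along it.
GENOME_BP = 3.0e9


def mean_spacing_kb(n_markers: int, genome_bp: float = GENOME_BP) -> float:
    """Return the average distance between adjacent markers, in kilobases."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return genome_bp / n_markers / 1_000


if __name__ == "__main__":
    # The two ends of the density range mentioned in the text.
    for n in (3_000, 30_000):
        print(f"{n:>6} markers: about {mean_spacing_kb(n):,.0f} kb between markers")
```

Under these assumptions, the lower density corresponds to spacing on the order of a megabase and the higher density to spacing on the order of 100 kb; real maps would be uneven, so these are averages only.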

An alternative approach starts with the genome's complement of genes. Assume that each gene has multiple polymorphic "tags" in or near it. These tags could then be tested for association with disease status using a variety of study designs (see above). Those genes showing association are then candidates for more detailed functional and genetic analysis. The polymorphic marker density for this approach is within the same order of magnitude required for anonymous genome-wide disequilibrium mapping. Again, neither the reagents nor the technology exists today to permit this type of investigation. Of interest, the entire complement of genes within the human genome has yet to be definitively determined. Efforts such as the National Cancer Institute's Cancer Genome Anatomy Project (178) should address this critical limitation in the very near term.

It is possible to explore the potential of this approach with today's technology and resources. One approach is to use genetic analysis to assess the role of genes in well-described pathways in the development of disease. This approach merges gene mapping and "traditional" candidate locus studies by including as candidates all the members of a pathway. For complex traits, it is commonly not possible to construct a definitive set of candidate genes for study. This is because which portions of given pathways and which gene family members operate in different tissues following different exogenous or endogenous exposures is commonly unknown. When cost-effective genome technologies are available, it will be possible to routinely use genetic analysis to answer these questions.

With the use of existing technology, it has been possible to explore this approach for aflatoxin B1 (AFB1) exposure and primary hepatocellular carcinoma (HCC) (179). AFB1 is activated by the cytochrome P450 systems and detoxified by the glutathione S-transferase and epoxide hydrolase gene families. It is not a priori clear which members of these families should be most important in their respective roles. By tagging individual members with polymorphic variants and examining them in a case-control study (180,181), it is possible to begin to assess which members may be important in modulating HCC risk through this pathway, as illustrated in Fig. 6.



Fig. 6. Results of evaluating multiple genes in the aflatoxin B1 (AFB1) detoxification pathways. Multiple members of the glutathione S-transferase (GST) family and epoxide hydrolase families show allelic association with hepatocellular carcinoma (HCC). "Yes" in association column indicates a significant association; "ns" means no significant association observed. Poly = polymorphic and SG = glutathione conjugate.
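The pathway-based strategy can be sketched as a loop over tagged family members, with each tag tested for allelic association in cases versus controls. The sketch below uses a standard 1-degree-of-freedom Pearson chi-square on a 2 x 2 table of allele counts; the member names and all counts are hypothetical placeholders invented for illustration, not data from the studies cited (179-181), and the test shown is not necessarily the one those studies used.

```python
from math import erfc, sqrt


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, chi-square, p-value) for a 2x2 allele-count table.

    Rows are cases and controls; columns are the tagged and reference alleles.
    The chi-square is the uncorrected Pearson statistic with 1 degree of freedom.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case, row_ctrl = case_alt + case_ref, ctrl_alt + ctrl_ref
    col_alt, col_ref = case_alt + ctrl_alt, case_ref + ctrl_ref
    if min(row_case, row_ctrl, col_alt, col_ref) == 0:
        raise ValueError("table has an empty margin")
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * cross ** 2 / (row_case * row_ctrl * col_alt * col_ref)
    denom = case_ref * ctrl_alt
    odds_ratio = (case_alt * ctrl_ref) / denom if denom else float("inf")
    # Upper tail of a chi-square with 1 df, via the normal distribution.
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# HYPOTHETICAL allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
# The names are placeholders, not real gene symbols or real results.
pathway_tags = {
    "family_member_A": (60, 140, 40, 160),
    "family_member_B": (50, 150, 50, 150),
}

if __name__ == "__main__":
    for member, counts in pathway_tags.items():
        odds_ratio, chi2, p = allelic_association(*counts)
        print(f"{member}: OR={odds_ratio:.2f} chi2={chi2:.2f} p={p:.3f}")
```

Testing many family members at once raises the chance of false-positive associations, so in practice the per-tag p-values would need to be interpreted with the number of tags examined in mind.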

 

    GENETIC ANALYSIS OPPORTUNITIES NOVEL TO CANCER
The genetic analysis of cancer has available to it a resource not accessible to other complex traits—the tumor. Cancer is increasingly recognized as a disease of genes. It, therefore, stands to reason that information related to the inherited genetic basis of cancer may be obtained from genetic analysis of tumors. The tumor suppressor gene paradigm represents a historic example of the utility of joining somatic and germline genetic analysis (182). Analytic methods through which somatic allele loss associated with tumor suppressor gene etiologies of cancer can be incorporated into linkage analysis have been previously described (183,184).

Allele loss to localize tumor suppressor genes represents only one possibility for the use of somatic information in gene discovery. Comprehensive analysis of tumors has the potential of identifying molecular heterogeneity that may not be easily detectable by histopathology. This heterogeneity could be important in describing different etiologic pathways, both at a genetic level and at an environmental level. Different somatic patterns may provide clues to what exposures were important to cause a cancer and, therefore, clues into what genes may have mediated the exposures' effects.

Routine comprehensive somatic genetic characterization awaits the same reagent and technology development limiting the analyses described above. However, the potential utility of such information can be assessed in a limited way using today's technology and reagents. A collection of 32 paired HCC and normal tissue samples was examined with 391 genome-wide simple tandem repeat polymorphism markers. Allele loss patterns were observed to be complex. To assess whether underlying patterns existed within the data, evolutionary tree-building algorithms (PHYLIP v3.5, provided courtesy of J. Felsenstein, http://evolution.genetics.washington.edu/phylip.html) were used to examine the data. Markers informative in more than 10 tumors were analyzed by MIX and CONSENSE. CONSENSE identified a tree consistent at the first set of nodes (found in 100% of replicates). The branches of the consensus tree (Fig. 7) had different chromosome-specific loss patterns, different genome-wide allele loss rates, and different candidate locus risk allele distributions. This finding suggests that inclusion of such information may be important in gene discovery exercises.



Fig. 7. Tree constructed using allele loss data showing clusters of related tumors. Loci indicated at branch points indicate genomic locations showing allele loss in all tumors within that cluster. Resampling analysis indicates that 100% of replicates show this pattern of loss. HCC = hepatocellular carcinoma.
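The data preparation step implied by this analysis can be sketched as follows: allele-loss calls are coded as binary characters, markers informative in more than 10 tumors are retained, and the result is written in the PHYLIP discrete-character layout (a header with the number of taxa and characters, then one line per taxon with a 10-character name). The coding scheme (1 = loss, 0 = retained, ? = uninformative), the function name, and the toy data are assumptions made here for illustration; the article does not describe how its input files were built.

```python
def phylip_discrete_infile(loss, min_informative=11):
    """Build PHYLIP discrete-character input text from allele-loss calls.

    `loss` maps a tumor name to a list of per-marker calls:
    1 = allele loss, 0 = retained, None = uninformative marker.
    A marker is kept if at least `min_informative` tumors are informative
    (the default of 11 corresponds to "more than 10 tumors").
    """
    names = list(loss)
    if not names:
        raise ValueError("no tumors supplied")
    n_markers = len(loss[names[0]])
    if any(len(loss[t]) != n_markers for t in names):
        raise ValueError("all tumors must have the same number of markers")
    keep = [
        j for j in range(n_markers)
        if sum(loss[t][j] is not None for t in names) >= min_informative
    ]
    if not keep:
        raise ValueError("no marker passes the informativeness filter")
    symbol = {1: "1", 0: "0", None: "?"}
    lines = [f"{len(names):5d}{len(keep):5d}"]
    for t in names:
        # Names occupy exactly 10 columns; longer names are truncated here,
        # so callers should make sure truncated names stay unique.
        lines.append(t[:10].ljust(10) + "".join(symbol[loss[t][j]] for j in keep))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # TOY data with three tumors and three markers (hypothetical values).
    toy = {
        "T1": [1, 0, None],
        "T2": [0, None, None],
        "T3": [1, 1, 1],
    }
    print(phylip_discrete_infile(toy, min_informative=2), end="")
```

The resulting text would be saved as the input file for a parsimony program such as MIX, and the trees from resampled data sets would then be summarized with CONSENSE; both are run separately from this sketch.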

 

    SUMMARY
In summary, we have reviewed a broad range of design and analytic issues related to the discovery of cancer susceptibility genes. Some critical points considered were the following: the definition of phenotype and the efforts required to verify diagnoses; sampling issues regarding ASPs, nuclear families, and large extended pedigrees; analytic methods for gene discovery, including linkage strategies and association methods; and new genetic technology for genomic analyses. To fully exploit genetic information and to improve the efficiency of gene discovery, it will be necessary to have adequate samples available, such as those provided by well-defined family-based registries, as well as continued development of improved study designs, genomic technology, and analytic methods.


    REFERENCES

1 Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 1990;46:222-8.

2 Lindor NM, Greene MH, the Mayo Familial Cancer Program. The concise handbook of family cancer syndromes. J Natl Cancer Inst 1998;90:1039-71.

3 Airewele G, Adatto P, Cunningham J, Mastromarino C, Spencer C, Sharp M. Family history of cancer in patients with glioma: a validation study of accuracy. J Natl Cancer Inst 1998;90:543-4.

4 Love RR, Evans AM, Josten DM. The accuracy of patient reports of a family history of cancer. J Chron Dis 1985;38:289-93.

5 Pepe M, Fleming TR. A nonparametric method for dealing with mismeasured covariate data. J Am Stat Assoc 1991;86:108-13.

6 Wijsman EM, Amos CI. Genetic analysis of simulated oligogenic traits in nuclear and extended pedigrees: summary of GAW10 contributions. Genet Epidemiol 1997;14:719-35.

7 Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med 1989;8:431-40.

8 Pepe MS. Inference using surrogate outcome data and a validation sample. Biometrika 1992;79:355-65.

9 Ginsburg EK, Axenovich TI. On planning of samples for linkage analysis: two ways of a sample size reduction. Genet Epidemiol 1996;13:343-54.

10 Risch N. Mapping genes for psychiatric disorders. In: Gershon ES, Cloninger CR, editors. Genetic approaches to mental disorders. Washington (DC): American Psychiatric Press; 1994. p. 47-61.

11 Weeks DE. Pedigree selection and information content. Curr Protocols Hum Genet 1994;1:1-21.

12 Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 1987;236:1567-70.

13 Pauls DL. Behavioural disorders: lessons in linkage. Nat Genet 1993;3:4-5.

14 Greenberg DA. There is more than one way to collect data for linkage analysis: what a study of epilepsy can tell us about linkage strategy for psychiatric disease. Arch Gen Psychiatry 1992;49:745-50.

15 McCarthy MI, Kruglyak L, Lander ES. Sib-pair collection strategies for complex diseases. Genet Epidemiol 1998;15:317-40.

16 Goldin LR, Bailey-Wilson JE, Borecki IB, Falk CT, Goldstein AM, Suarez BK, et al. Genetic analysis workshop 10: detection of genes for complex traits. Genet Epidemiol 1997;14:549-1152.

17 Goldgar DE, Easton DF. Optimal strategies for mapping complex diseases in the presence of multiple loci. Am J Hum Genet 1997;60:1222-32.

18 Schork NJ, Xu X. Sib pairs versus pedigrees: what are the advantages? Diab Rev 1997;5:116-22.

19 Ploughman LM, Boehnke M. Estimating the power of a proposed linkage study for a complex trait. Am J Hum Genet 1989;44:543-51.

20 Ginsburg EK, Axenovich TI, Goodman DW. On estimation of linkage test power. Genet Epidemiol 1996;13:355-65.

21 Ginsburg EK, Axenovich TI. Sample size required for predefined linkage decision quality. Genet Epidemiol 1997;14:479-91.

22 Risch N. Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet 1990;46:229-41.

23 Todorov AA, Borecki IB, Rao DC. Linkage analysis of complex traits using affected sibpairs: effects of single-locus approximations on estimates of the required sample size. Genet Epidemiol 1997;14:389-401.

24 Risch N, Zhang H. Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science 1995;268:1584-9.

25 Zhao H, Zhang H, Rotter JI. Cost-effective sib-pair designs in the mapping of quantitative-trait loci. Am J Hum Genet 1997;60:1211-21.

26 Field LL, Tobias R, Magnus T. A locus on chromosome 15q26 (IDDM3) produces susceptibility to insulin-dependent diabetes mellitus. Nat Genet 1994;8:189-94.

27 Davies JL, Kawaguchi Y, Bennett ST, Copeman JB, Cordell HJ, Pritchard LE, et al. A genome-wide search for human type 1 diabetes susceptibility genes. Nature 1994;371:130-6.

28 Hashimoto L, Habita C, Beressi JP, Delepine M, Besse C, Cambon-Thomsen A, et al. Genetic mapping of a susceptibility locus for insulin-dependent diabetes mellitus on chromosome 11q. Nature 1994;371:161-4.

29 Concannon P, Gogolin-Ewens KJ, Hinds DA, Walpelhorst B, Morrison VA, Stirling B, et al. A second-generation screen of the human genome for susceptibility to insulin-dependent diabetes mellitus. Nat Genet 1998;19:292-6.

30 Mein CA, Esposito L, Dunn MG, Johnson GC, Timms AE, Goy JV, et al. A search for type 1 diabetes susceptibility genes in families from the United Kingdom. Nat Genet 1998;19:297-300.

31 Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516-7.

32 Lander ES. The new genomics: global views of biology. Science 1996;274:536-9.

33 Elston RC. The genetic dissection of multifactorial traits. Clin Exp Allergy 1995;25:103-6.

34 Clerget-Darpoux F, Bonaiti-Pellie C. Strategies based on marker information for the study of human diseases. Ann Hum Genet 1992;56:145-53.

35 Clerget-Darpoux F, Bonaiti-Pellie C, Hochez J. Effects of misspecifying genetic parameters in lod score analysis. Biometrics 1986;42:393-9.

36 Lander ES, Schork NJ. Genetic dissection of complex traits. Science 1994;265:2037-48.

37 Clerget-Darpoux F. Bias of the estimated recombination fraction and lod score due to an association between a disease gene and a marker gene. Ann Hum Genet 1982;46:363-72.

38 Manuila A. Blood groups and disease—hard facts and delusions. JAMA 1958;167:2047-53.[Web of Science]

39 Morton NE. Sequential tests for the detection of linkage. Am J Hum Genet 1955;7:277-318.

40 Whittemore AS. Genome scanning for linkage: an overview. Am J Hum Genet 1996;59:704-16.

41 Ott J. Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage studies. Am J Hum Genet 1974;26:588-97.

42 Lathrop GM, Lalouel JM. Easy calculation of lod scores and genetic risks on small computers. Am J Hum Genet 1984;36:460-5.

43 Cottingham RW Jr, Idury RM, Schaffer AA. Faster sequential genetic linkage computations. Am J Hum Genet 1993;53:252-63.

44 Schaffer AA, Gupta SK, Shriram K, Cottingham RW Jr. Avoiding recomputation in linkage analysis. Hum Hered 1994;44:225-37.

45 O'Connell JR, Weeks DE. The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance. Nat Genet 1995;11:402-8.

46 Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 1996;58:1347-63.

47 Amos CI, Williamson JA. Robustness of the maximum-likelihood (LOD) method for detecting linkage. Am J Hum Genet 1993;52:213-4.

48 Williamson JA, Amos CI. Guess LOD approach: sufficient conditions for robustness. Genet Epidemiol 1995;12:163-76.

49 Weeks DE, Lange K. The affected-pedigree-member method of linkage analysis. Am J Hum Genet 1988;42:315-26.

50 Davis S, Weeks DE. Comparison of nonparametric statistics for detection of linkage in nuclear families: single-marker evaluation. Am J Hum Genet 1997;61:1431-44.

51 Weeks DE, Lathrop GM. Polygenic disease: methods for mapping complex disease traits. Trends Genet 1995;11:513-9.

52 Whittemore AS, Tu IP. Simple, robust linkage tests for affected sib pairs. Am J Hum Genet 1998;62:1228-42.

53 Sandkuyl LA. Analysis of affected sib pairs using information from extended families. Prog Clin Biol Res 1989;329:117-22.

54 Sham PC, Zhao JH, Curtis D. Optimal weighting scheme for affected sib-pair analysis of sibship data. Ann Hum Genet 1997;61:61-9.

55 Curtis D, Sham PC. Using risk calculation to implement an extended relative pair analysis. Ann Hum Genet 1994;58:151-62.

56 Hauser ER, Boehnke M, Guo SW, Risch N. Affected-sib-pair interval mapping and exclusion for complex genetic traits: sampling considerations. Genet Epidemiol 1996;13:117-37.

57 Holmans P. Asymptotic properties of affected-sib-pair linkage analysis. Am J Hum Genet 1993;52:362-74.

58 Knapp M, Seuchter SA, Baur MP. Two-locus disease models with two marker loci: the power of affected-sib-pair tests. Am J Hum Genet 1994;55:1030-41.

59 Cordell HJ, Todd JA, Bennett ST, Kawaguchi Y, Farrall M. Two-locus maximum lod score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type 1 diabetes. Am J Hum Genet 1995;57:920-34.

60 Farrall M. Affected sibpair linkage tests for multiple linked susceptibility genes. Genet Epidemiol 1997;14:103-15.

61 Whittemore AS, Halpern J. A class of tests for linkage using affected pedigree members. Biometrics 1994;50:118-27.

62 Kong A, Cox NJ. Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 1997;61:1179-88.

63 Knapp M, Seuchter SA, Baur MP. Linkage analysis in nuclear families 1: optimality criteria for affected sib-pair tests. Hum Hered 1994;44:37-43.

64 Schroeder M, Brown DL, Weeks DE. Improved programs for the affected-pedigree-member method of linkage analysis. Genet Epidemiol 1994;11:69-74.

65 Ward PJ. Some developments on the affected-pedigree method of linkage analysis. Am J Hum Genet 1993;52:1200-15.

66 Weeks DE, Valappil TI, Schroeder M, Brown DL. An X-linked version of the affected-pedigree-member method of linkage analysis. Hum Hered 1995;45:25-33.

67 Weeks DE, Lange K. A multilocus extension of the affected-pedigree-member method of linkage analysis. Am J Hum Genet 1992;50:859-68.

68 Matise TC, Weeks DE. Detecting heterogeneity with the affected-pedigree-member (APM) method. Genet Epidemiol 1993;10:401-6.

69 Weeks DE, Harby LD. The affected-pedigree-member method: power to detect linkage. Hum Hered 1995;45:13-24.

70 Bishop DT, Williamson JA. The power of identity-by-state methods in linkage analysis. Am J Hum Genet 1990;46:254-65.

71 Goldin LR, Weeks DE. Two-locus models of disease: comparison of likelihood and nonparametric linkage methods. Am J Hum Genet 1993;53:908-15.

72 Davis S, Schroeder M, Goldin LR, Weeks D. Nonparametric simulation-based statistics for detecting linkage in general pedigrees. Am J Hum Genet 1996;58:867-80.

73 Babron MC, Martinez M, Bonaiti-Pellie C, Clerget-Darpoux F. Linkage detection by the affected-pedigree-member method: what is really tested? Genet Epidemiol 1993;10:389-94.

74 Commenges D, Olson J, Wijsman E. The weighted rank pairwise correlation statistic for linkage analysis: simulation study and application to Alzheimer's disease. Genet Epidemiol 1994;11:201-12.

75 Commenges D. Robust genetic linkage analysis based on a score test of homogeneity: the weighted pairwise correlation statistic. Genet Epidemiol 1994;11:189-200.

76 Commenges D, Laurent A. Improving the robustness of the weighted pairwise correlation test for linkage analysis. Genet Epidemiol 1996;13:559-73.

77 Greenberg DA, Hodge SE, Vieland VJ, Spence MA. Affecteds-only linkage methods are not a panacea. Am J Hum Genet 1996;58:892-5.

78 Farrall M. LOD wars: the affected-sib-pair paradigm strikes back. Am J Hum Genet 1997;60:735-7.

79 Kruglyak L. Nonparametric linkage tests are model free. Am J Hum Genet 1997;61:254-5.

80 Greenberg DA, Hodge SE, Vieland VJ, Spence MA. Power, model of inheritance, and Type I error in lod scores and affecteds-only methods: reply to Kruglyak. Am J Hum Genet 1998;62:202-4.

81 Schork NJ, Boehnke M, Terwilliger JD, Ott J. Two-trait-locus linkage analysis: a powerful strategy for mapping complex genetic traits. Am J Hum Genet 1993;53:1127-36.

82 Martinez M, Goldin LR. Power of the linkage test for a heterogeneous disorder due to two independent inherited causes: a simulation study. Genet Epidemiol 1990;7:219-30.

83 Levinson DF. Power to detect linkage with heterogeneity in samples of small nuclear families. Am J Med Genet 1993;48:94-102.

84 Goldin LR, Gershon ES. Power of the affected-sib-pair method for heterogeneous disorders. Genet Epidemiol 1988;5:35-42.

85 Bishop CM. Neural networks for pattern recognition. Oxford (U.K.): Clarendon Press; 1995.

86 Boehnke M. Limits of resolution of genetic linkage studies: implications for the positional cloning of human disease genes. Am J Hum Genet 1994;55:379-90.

87 Jorde LB. Invited editorial: linkage disequilibrium as a gene-mapping tool. Am J Hum Genet 1995;56:11-4.

88 Devlin B, Risch N, Roeder K. Disequilibrium mapping: composite likelihood for pairwise disequilibrium. Genomics 1996;36:1-16.

89 Rannala B, Slatkin M. Likelihood analysis of disequilibrium mapping, and related problems. Am J Hum Genet 1998;62:459-73.

90 Lazzeroni LC. Linkage disequilibrium and gene mapping: an empirical least-squares approach. Am J Hum Genet 1998;62:159-70.

91 Guo SW. Linkage disequilibrium measures for fine-scale mapping: a comparison. Hum Hered 1997;47:301-14.

92 Xiong M, Guo SW. Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am J Hum Genet 1997;60:1513-31.

93 Terwilliger JD. A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am J Hum Genet 1995;56:777-87.

94 Hastbacka J, delaChapelle A, Kaitila I, Sistonen P, Weaver A, Lander E. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat Genet 1992;2:204-11.

95 Kaplan NL, Hill WG, Weir BS. Likelihood methods for locating disease genes in nonequilibrium populations. Am J Hum Genet 1995;56:18-32.

96 Boguski MS. The turning point in genome research. Trends Biochem Sci 1995;20:295-6.

97 Walter MA, Spillett DJ, Thomas P, Weissenbach J, Goodfellow PN. A method for constructing radiation hybrid maps of whole genomes. Nat Genet 1994;7:22-8.

98 Houwen RH, Baharloo S, Blankenship K, Raeymakers P, Juyn F, Sandkuijl LA, et al. Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis. Nat Genet 1994;8:380-6.

99 Stephens JC, Briscoe D, O'Brien SJ. Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am J Hum Genet 1994;55:809-24.

100 Chakraborty R, Weiss KM. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci U S A 1988;85:9119-23.

101 Kaplan NL, Martin ER, Morris RW, Weir BS. Marker selection for the transmission/disequilibrium test, in recently admixed populations. Am J Hum Genet 1998;62:703-12.

102 Chapman NH, Wijsman EM. Genome screens using linkage disequilibrium tests: optimal marker characteristics and feasibility. Am J Hum Genet 1998;63:1872-85.

103 Xiong M, Guo SW. The power of linkage detection by the transmission/disequilibrium tests. Hum Hered 1998;48:295-312.

104 Falk CT, Rubinstein P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet 1987;51:227-33.

105 Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506-16.

106 Spielman RS, Ewens WJ. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 1996;59:983-9.

107 Terwilliger JD, Ott J. A haplotype-based 'Haplotype Relative Risk' approach to detecting allelic associations. Hum Hered 1992;42:337-46.

108 Knapp M, Seuchter SA, Baur MP. The haplotype-relative-risk (HRR) method for analysis of association in nuclear families. Am J Hum Genet 1993;52:1085-93.

109 Ott J. Statistical properties of the haplotype relative risk. Genet Epidemiol 1989;6:127-30.

110 Parsian A, Todd RD, Devor EJ, O'Malley KL, Suarez BK, Reich T, et al. Alcoholism and alleles of the human D2 dopamine receptor locus. Studies of association and linkage. Arch Gen Psychiatry 1991;48:655-63.

111 Schaid DJ, Sommer SS. Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet 1993;53:1114-26.

112 Schaid DJ, Sommer SS. Comparison of statistics for candidate-gene association studies using cases and parents. Am J Hum Genet 1994;55:402-9.

113 Jin K, Speed TP, Klitz W, Thomson G. Testing for segregation distortion in the HLA complex. Biometrics 1994;50:1189-98.

114 Ewens WJ, Spielman RS. The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet 1995;57:455-64.

115 Thomson G. Analysis of complex human genetic traits: an ordered-notation method and new tests for mode of inheritance. Am J Hum Genet 1995;57:474-86.

116 Thomson G. Mapping disease genes: family-based association studies. Am J Hum Genet 1995;57:487-98.[Web of Science][Medline]

117 Flanders WD, Khoury MJ. Analysis of case-parental control studies: method for the study of associations between genetic markers. Am J Epidemiol 1996;144:696-703.[Abstract/Free Full Text]

118 Khoury MJ. Case-parental control method in the search for disease-susceptibility genes. Am J Hum Genet 1994;55:410-5.[Web of Science][Medline]

119 Tai JJ, Song WH. Linkage disequilibrium and linkage information from one-child families. Hum Hered 1991;41:316-23.[CrossRef][Web of Science][Medline]

120 Harley JB, Moser KL, Neas BR. Logistic transmission modeling of simulated data. Genet Epidemiol 1995;12:607-12.[CrossRef][Web of Science][Medline]

121 Clerget-Darpoux F, Babron MC, Bickeboller H. Comparing the power of linkage detection by the transmission disequilibrium test and the identity-by-descent test. Genet Epidemiol 1995;12:583-8.[CrossRef][Web of Science][Medline]

122 Morris AP, Curnow RN, Whittaker JC. Randomization tests of disease-marker associations. Ann Hum Genet 1997;61:49-60.[Web of Science][Medline]

123 Rice JP, Neuman RJ, Hoshaw SL, Daw EW, Gu C. TDT with covariates and genomic screens with mod scores: their behavior on simulated data. Genet Epidemiol 1995;12:659-64.[CrossRef][Web of Science][Medline]

124 Sham PC, Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 1995;59:323-36.[Web of Science][Medline]

125 Bickeboller H, Clerget-Darpoux F. Statistical properties of the allelic and genotypic transmission/disequilibrium test for multiallelic markers. Genet Epidemiol 1995;12:865-70.[CrossRef][Web of Science][Medline]

126 Kaplan NL, Martin ER, Weir BS. Power studies for the transmission/disequilibrium tests with multiple alleles. Am J Hum Genet 1997;60:691-702.[Web of Science][Medline]

127 Sham P. Transmission/disequilibrium tests for multiallelic loci. Am J Hum Genet 1997;61:774-8.[Web of Science][Medline]

128 Cleves MA, Olson JM, Jacobs KB. Exact transmission-disequilibrium tests with multiallelic markers. Genet Epidemiol 1997;14:337-47.[CrossRef][Web of Science][Medline]

129 McKeigue PM. Mapping genes underlying ethnic differences in disease risk by linkage disequilibrium in recently admixed populations. Am J Hum Genet 1997;60:188-96.[Web of Science][Medline]

130 Camp NJ. Genomewide transmission/disequilibrium testing—consideration of the genotypic relative risk at disease loci. Am J Hum Genet 1997;61:1424-30.[CrossRef][Web of Science][Medline]

131 Martin ER, Kaplan NL, Weir BS. Tests for linkage and association in nuclear families. Am J Hum Genet 1997;61:439-48.[CrossRef][Web of Science][Medline]

132 Morris AP, Whittaker JC, Curnow RN. A likelihood ratio test for detecting patterns of disease-marker association. Ann Hum Genet 1997;61:335-50.[CrossRef][Web of Science][Medline]

133 Maestri NE, Beaty TH, Hetmanski J, Smith EA, McIntosh I, Wyszynski DF, et al. Application of transmission disequilibrium tests to nonsyndromic oral clefts: including candidate genes and environmental exposures in the models. Am J Med Genet 1997;73:337-44.[CrossRef][Web of Science][Medline]

134 Wilson SR. On extending the transmission/disequilibrium test (TDT). Ann Hum Genet 1997;61:151-61.[CrossRef][Web of Science][Medline]

135 Self SG, Longton G, Kopecky KJ, Liang KY. On estimating HLA/disease association with application to a study of aplastic anemia. Biometrics 1991;47:53-61.[CrossRef][Web of Science][Medline]

136 Knapp M, Wassmer G, Baur MP. The relative efficiency of the Hardy-Weinberg equilibrium-likelihood and the conditional on parental genotype-likelihood methods for candidate-gene association studies. Am J Hum Genet 1995;57:1476-85.[Web of Science][Medline]

137 Schaid DJ. General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 1996;13:423-49.[CrossRef][Web of Science][Medline]

138 Langholz B, Tuomilehto-Wolf E, Thomas D, Pitkaniemi J, Tuomilehto J, DiMe Study Group. Variation in HLA-associated risks of childhood insulin dependent diabetes in the Finnish population: I. Allele effects at A, B, and DR loci. Genet Epidemiol 1995;12:441-53.[CrossRef][Web of Science][Medline]

139 Schaid DJ. Relative-risk regression models using cases and their parents. Genet Epidemiol 1995;12:813-8.[CrossRef][Web of Science][Medline]

140 Witte JS, Gauderman WJ, Thomas DC. Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol 1999;149:693-705.[Abstract/Free Full Text]

141 Goldstein AM, Hodge SE, Haile RW. Selection bias in case-control studies using relatives as the controls. Int J Epidemiol 1989;18:985-9.[Abstract/Free Full Text]

142 Andrieu N, Goldstein AM. Use of relatives of cases as controls to identify risk factors when an interaction between environmental and genetic factors exists. Int J Epidemiol 1996;25:649-55.[Abstract/Free Full Text]

143 Curtis D. Use of siblings as controls in case-control association studies. Ann Hum Genet 1997;61:319-33.[CrossRef][Web of Science][Medline]

144 Langefeld CD, Pericak-Vance MA, Saunders AM, Boehnke M. Family-based tests for association using discordant sib pairs. Am J Hum Genet 1997;61:A1643.

145 Monks SA, Martin ER, Weir BS, Kaplan NL. A sibship test of linkage in the absence of parental information. Am J Hum Genet 1997;61:A1669.

146 Ewens WJ, Spielman RS. The sib-TDT (S-TDT): a TDT (transmission/disequilibrium test) without parents. Am J Hum Genet 1997;61:A1600.

147 Spielman RS, Ewens WJ. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am J Hum Genet 1998;62:450-8.[CrossRef][Web of Science][Medline]

148 Schaid DJ, Rowland CR. Parents, sibs, and unrelated controls to detect associations of genetic markers with disease. Am J Hum Genet 1998;63:1492-1506.[CrossRef][Web of Science][Medline]

149 Long JC, Williams RC, Urbanek M. An E-M algorithm and testing strategy for multiple-locus haplotypes. Am J Hum Genet 1995;56:799-810.[Web of Science][Medline]

150 Excoffier L, Slatkin M. Incorporating genotypes of relatives into a test of linkage disequilibrium. Am J Hum Genet 1998;62:171-80.[CrossRef][Web of Science][Medline]

151 Schweder T, Spjotvoll E. Plots of P-values to evaluate many tests simultaneously. Biometrika 1982;69:493-502.[Abstract/Free Full Text]

152 Thomas D, Pitkaniemi J, Langholz B, Tuomilehto-Wolf E, Tuomilehto J. Variation of HLA-associated risks of childhood insulin-dependent diabetes in the Finnish population: II. Haplotype effects. Genet Epidemiol 1995;12:455-66.[CrossRef][Web of Science][Medline]

153 Thomas D, Langholz B, Clayton D, Pitkaniemi J, Tuomilehto-Wolf E, Tuomilehto J. Empirical Bayes methods for testing associations with large numbers of candidate genes in the presence of environmental risk factors, with applications to HLA associations in IDDM. Ann Med 1992;24:387-92.[Web of Science][Medline]

154 Liang KY. Extended Mantel-Haenszel estimating procedure for multivariate logistic regression models. Biometrics 1987;43:289-99.[CrossRef][Web of Science][Medline]

155 Tregouet DA, Ducimetiere P, Tiret L. Testing association between candidate-gene markers and phenotype in related individuals, by use of estimating equations. Am J Hum Genet 1997;61:189-99.[Web of Science][Medline]

156 Boerwinkle E, Chakraborty R, Sing CF. The use of measured genotype information in the analysis of quantitative phenotypes in man. I. Models and analytical methods. Ann Hum Genet 1986;50:181-94.[Web of Science][Medline]

157 George VT, Elston RC. Testing the association between polymorphic markers and quantitative traits in pedigrees. Genet Epidemiol 1987;4:193-201.[CrossRef][Web of Science][Medline]

158 Risch N. Segregation analysis incorporating linkage markers. 1. Single-locus models with an application to Type I diabetes. Am J Hum Genet 1984;36:363-86.[Web of Science][Medline]

159 Maclean CJ, Morton NE, Yee S. Combined analysis of genetic segregation and linkage under an oligogenic model. Comp Biomed Res 1984;17:471-80.[CrossRef][Web of Science][Medline]

160 Elston RC. Man bites dog? The validity of maximizing lod scores to determine mode of inheritance. Am J Med Genet 1989;34:487-8.[CrossRef][Web of Science][Medline]

161 Hodge SE, Elston RC. Lods, wrods, and mods: the interpretation of lod scores calculated under different models. Genet Epidemiol 1994;11:329-42.[CrossRef][Web of Science][Medline]

162 Clerget-Darpoux F, Babron MC, Prum B, Lathrop GM, Deschamps I, Hors J. A new method to test genetic models in HLA associated diseases: the MASC method. Ann Hum Genet 1988;52:247-58.[Web of Science][Medline]

163 Boehnke M, Langefeld CD. A transmission/disequilibrium test that uses both affected and unaffected offspring. Am J Hum Genet 1997;61:A15464.

164 Ewens WJ, Spielman RS. Statistical properties of maximum likelihood estimators for genetic parameters of HLA-linked diseases. Am J Hum Genet 1985;37:1172-91.[Web of Science][Medline]

165 Li Z, Rao DC. Random effects model for meta-analysis of multiple quantitative sibpair linkage studies. Genet Epidemiol 1996;13:377-83.[CrossRef][Web of Science][Medline]

166 Marshall A, Hodgson J. DNA chips: an array of possibilities. Nat Biotech 1998;16:27-31.[CrossRef][Web of Science][Medline]

167 Fodor SPA, Rava RP, Huang XC, Pease AC, Holmes CP, Adams CL. Multiplexed biochemical assays with biological chips. Nature 1993;364:555-6.[CrossRef][Medline]

168 Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995;270:467-70.[Abstract/Free Full Text]

169 Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci U S A 1994;91:5022-6.[Abstract/Free Full Text]

170 Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotech 1996;14:1675-80.[CrossRef][Web of Science][Medline]

171 DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996;14:457-60.[CrossRef][Web of Science][Medline]

172 Hacia JG, Brody LC, Chee MS, Fodor SP, Collins FS. Detection of heterozygous mutations in BRCA1 using high density oligonucleotide arrays and two-colour fluorescence analysis. Nat Genet 1996;14:441-7.[CrossRef][Web of Science][Medline]

173 Shoemaker DD, Lashkari DA, Morris D, Mittmann M, Davis RW. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nat Genet 1996;14:450-6.[CrossRef][Web of Science][Medline]

174 Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, et al. Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci U S A 1997;94:13057-62.[Abstract/Free Full Text]

175 Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 1998;280:1077-82.[Abstract/Free Full Text]

176 Drmanac S, Kita D, Labat K, Hauser B, Schmidt C, Burczak JD, et al. Accurate sequencing by hybridization for DNA diagnostics and individual genomics. Nat Biotech 1998;16:54-8.[Web of Science][Medline]

177 Collins FS, Guyer MS, Chakravarti A. Variations on a theme: cataloging human DNA sequence variation. Science 1997;278:1580-1.[Free Full Text]

178 Strausberg RL, Dahl CA, Klausner RD. New opportunities for uncovering the molecular basis of cancer. Nat Genet 1997;16(suppl):415-516.[CrossRef]

179 McGlynn KA, Rosvold EA, Lustbader ED, Hu Y, Clapper ML, Zhou T, et al. Susceptibility to hepatocellular carcinoma is associated with genetic variation in the enzymatic detoxification of aflatoxin B1. Proc Natl Acad Sci U S A 1995;92:2384-7.[Abstract/Free Full Text]

180 Murray JC, Buetow KH, Donovan M, Hornung S, Motulsky AG, Disteche C, et al. Linkage disequilibrium of plasminogen polymorphisms and assignment of the gene to human chromosome 6q26-6q27. Am J Hum Genet 1987;40:338-50.[Web of Science][Medline]

181 Ardinger HH, Buetow KH, Bell GI, Bardach J, VanDemark DR, Murray JC. Association of genetic variation of the transforming growth factor-alpha gene with cleft lip and palate. Am J Hum Genet 1989;45:348-53.[Web of Science][Medline]

182 Knudson AG Jr. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 1971;68:820-3.[Abstract/Free Full Text]

183 Rebbeck TR, Lustbader ED, Buetow KH. Somatic allele loss in genetic linkage analysis of cancer. Genet Epidemiol 1994;11:419-29.[CrossRef][Web of Science][Medline]

184 Lustbader ED, Rebbeck TR, Buetow KH. Using loss of heterozygosity data in affected pedigree member linkage tests. Genet Epidemiol 1995;12:339-50.[CrossRef][Web of Science][Medline]

