Sample size considerations

The statistical analysis will depend on the study design, sub-cohort and biobank involved in each aim.

Aim I
The statistical power for analyses of different cancer sites is straightforward to estimate given the standard significance level (5%), the standard power (80%) and an assumed dichotomous distribution of exposure. The estimated number of incident cases by site, shown below, is based on a simple projection of cases over the next 3+3 years.
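
As an illustration of the kind of calculation referred to above, the following minimal sketch uses one common normal-approximation formula for comparing a dichotomous exposure between cases and an equal number of non-cases. The formula choice, the function name cases_needed and the exposure prevalence of 30% are assumptions made for illustration only; they are not taken from the protocol, and the relative risk is approximated by the odds ratio.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(p0, rr, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of non-cases)
    needed to detect a relative risk `rr`, treated here as an odds ratio,
    for a dichotomous exposure with prevalence `p0` among non-cases.

    Two-sided test at level `alpha`; normal approximation for the
    difference between two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Exposure prevalence among cases implied by the odds ratio.
    p1 = rr * p0 / (1 + p0 * (rr - 1))
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Hypothetical exposure prevalence of 30%; not a figure from the protocol.
    for rr in (2, 3):
        print(f"RR = {rr}: about {cases_needed(0.30, rr)} cases")
```

The required number of cases falls as the relative risk to be detected rises, and it rises when a stricter significance level is chosen.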

Aim II
The number of cases and controls necessary to detect a relative risk of 2 or 3 in the analysis of gene expression in PBCs is given below. To obtain the necessary number of about 100-130 prospective cases annually, the cohort should contain 40 000-50 000 women, given a breast cancer incidence rate of 260/100 000 per year.
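
The relation between cohort size, incidence rate and expected annual cases can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the function names are illustrative.

```python
from math import ceil


def expected_annual_cases(cohort_size, incidence_per_100k):
    """Expected number of incident cases per year in a cohort."""
    return cohort_size * incidence_per_100k / 100_000


def cohort_size_needed(target_cases, incidence_per_100k):
    """Smallest cohort expected to yield `target_cases` cases per year."""
    return ceil(target_cases * 100_000 / incidence_per_100k)


if __name__ == "__main__":
    incidence = 260  # breast cancer cases per 100 000 women per year
    for size in (40_000, 50_000):
        cases = expected_annual_cases(size, incidence)
        print(f"{size} women: about {cases:.0f} cases per year")
    for target in (100, 130):
        size = cohort_size_needed(target, incidence)
        print(f"{target} cases per year: about {size} women")
```

A cohort of 40 000-50 000 women corresponds to roughly 104-130 expected cases per year at this incidence rate, in line with the range quoted above.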

Aim III
The analysis of single nucleotide polymorphisms (SNPs) will mainly be an integrated part of the EPIC collaboration. Owing to the study size for both breast and prostate cancer, funding has been granted by the National Cancer Institute through the Consortium of cohorts for gene-environment analysis.

Aim IV
The calculation of sample size will be similar to that for Aim II. Apart from the prospective analysis of collected blood samples, we will have complete (100%) follow-up of incident cases through the national cancer registry.
Sampling of breast tumor tissue: In 2000, 52 Norwegian hospitals performed breast cancer operations (Norwegian Inpatient Register, unpublished data), of which 30 performed 20 or more operations per year. Excluding the smaller hospitals (fewer than 20 operations per year) could reduce our total number of biopsies by around 10%. We expect that 5% of the women approached will refuse to give a blood sample and a specimen of fresh tumor tissue. About 2% of the samples are expected to be lost due to late transport, too little blood etc. In addition, failure to identify the women in the short interval between diagnosis and operation will inevitably cause some loss, as will other practical and administrative problems. Taken together, this implies an effective sample size of around 75% of the eligible cases. The annual incidence rate of breast cancer in 2000 in the relevant age group (50-64 years) averaged 260/100 000 per year (The Cancer Registry of Norway 2002). Based on these assumptions, we expect to include 75-100 breast cancer cases per year, which is sufficient to detect a RR of 2. The effective sample size will also depend on the variation in gene expression.
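
The attrition reasoning above can be laid out as a short calculation. In this sketch the three itemised losses (10%, 5% and 2%) are combined multiplicatively, which is an assumption; the protocol states only the overall retention of about 75%, and the gap between the two figures corresponds to the unquantified practical and administrative losses. The cohort sizes of 40 000-50 000 women are taken from Aim II.

```python
def retention_after(losses):
    """Fraction of cases retained after applying each loss in sequence."""
    retained = 1.0
    for loss in losses:
        retained *= 1.0 - loss
    return retained


def effective_cases(cohort_size, incidence_per_100k, retention):
    """Expected number of cases per year that actually yield usable samples."""
    return cohort_size * incidence_per_100k / 100_000 * retention


if __name__ == "__main__":
    # Itemised losses from the text: small hospitals, refusals, lost samples.
    itemised = retention_after([0.10, 0.05, 0.02])
    overall = 0.75  # overall retention stated in the text
    print(f"Retention after itemised losses: {itemised:.1%}")
    print(f"Share left for unquantified losses: {overall / itemised:.1%}")
    for size in (40_000, 50_000):
        cases = effective_cases(size, 260, overall)
        print(f"{size} women: about {cases:.1f} usable cases per year")
```

With 75% retention, the expected yield is roughly 78-98 cases per year, consistent with the 75-100 cases quoted above.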