ORIGINAL ARTICLE

Serum Proteomic Profiling of Lung Cancer in High-Risk Groups and Determination of Clinical Outcomes

William Jacot, MD, PhD,*§ Ludovic Lhermitte, MD,†§ Nadège Dossat, PhD,‡§ Jean-Louis Pujol, MD, PhD,*§ Nicolas Molinari, PhD,‡§ Jean-Pierre Daurès, MD, PhD,‡§ Thierry Maudelonde, MD, PhD,†§‖ Alain Mangé, PhD,†§‖¶ and Jérôme Solassol, MD, PhD†§‖

Hypothesis: Lung cancer remains the leading cause of cancer-related mortality worldwide. Currently known serum markers do not efficiently diagnose lung cancer at an early stage.

Methods: In the present study, we developed a serum proteomic fingerprinting approach coupled with a three-step classification method to address two important clinical questions: (i) to determine whether or not proteomic profiling differs between lung cancer and benign lung diseases in a population of smokers and (ii) to assess the prognostic impact of this profiling in lung cancer. Proteomic spectra were obtained from the serum samples of 170 patients, either with pathologically confirmed lung cancer or smokers with benign chronic lung disease.

Results: Among the 228 protein peaks detected in the whole population, 88 differed significantly between lung cancer patients and benign lung disease patients, with area under the curve diagnostic values ranging from 0.63 to 0.84. Multiprotein classifiers based on differentially expressed peaks allowed the classification of lung cancer and benign disease with an area under the curve ranging from 0.991 to 0.994. Using a cross-validation methodology, diagnostic accuracy was 93.1% (sensitivity 94.3%, specificity 85.9%), and more than 90% of the stage I/II lung cancers were correctly classified. Finally, in the prognosis part of the study, a 4628 Da protein was found to be significantly and independently associated with prognosis in advanced stage non-small cell lung cancer patients (p = 0.0005).

Conclusions: The potential markers that we identified through proteomic fingerprinting could accurately classify lung cancers in a high-risk population and predict survival in a non-small cell lung cancer population.

Departments of *Thoracic Oncology, †Cellular Biology, CHU Montpellier, Hôpital Arnaud de Villeneuve; ‡Department of Biostatistic, IURC, Epidemiology and Clinical Research; §University of Montpellier I; ‖INSERM, U540; ¶Department of Clinical Oncoproteomic, Val d’Aurelle Cancer Institute, Montpellier, France.
Disclosure: The authors declare no conflicts of interest.
W. J. and L. L. contributed equally to this study. A. M. and J. S. should both be considered senior authors of this study.
Address for correspondence: Jean Louis Pujol, MD, PhD, Hôpital Arnaud de Villeneuve, 371 Avenue du Doyen Giraud, 34295 Montpellier Cedex 5, France. E-mail: [email protected]
Copyright © 2008 by the International Association for the Study of Lung Cancer. ISSN: 1556-0864/08/0308-0840


Key Words: Lung cancer, SELDI, Serum, Proteomics.
(J Thorac Oncol. 2008;3:840–850)

Lung cancer is a worldwide clinical problem and a major cause of cancer deaths. More than 150,000 new cases are diagnosed in Europe every year and overall mortality is increasing, with an overall 5-year survival rate of 6 to 16%.1 This poor prognosis is primarily related to the propensity of lung cancer for early spread and to its late diagnosis. Approximately 87% of lung cancers are smoking related, due to active or secondhand tobacco smoking.2 The identification of early stages of lung cancer and efficient screening of this high-risk population of smoking patients would enable earlier treatment, which seems to clearly improve outcome, with a 5-year survival rate reaching 60 to 70% for stage I surgically treated patients3,4 and up to 88% in a screening population.5 However, despite many intensive studies, no efficient screening or early diagnosis method has been validated so far. The older screening trials using chest radiography or sputum cytologic analysis have thus far been unable to decrease lung cancer mortality in the high-risk population of smoking individuals.2,6 The survival impact of low-dose spiral computerized tomography (CT) screening is currently being evaluated in randomized clinical trials.5,7,8 This limitation has spurred continued interest in identifying reliable tumor markers that can be detected in serum. Tumor markers such as Carcinoembryonic Antigen, Neuron Specific Enolase (NSE), and Cyfra 21-1 are mostly used as complementary tools to establish the initial diagnosis,9 to assess the efficacy of treatment, and to monitor evidence of relapse. Their low diagnostic value prevents their use in the screening setting. A noninvasive technique based on reliable diagnostic markers would be useful in this setting.
Recent advances in proteomic instrumentation, such as surface-enhanced laser desorption ionization/time-of-flight mass spectrometry (SELDI/TOF-MS) technology, and computational methodologies offer a unique chance to rapidly identify these new candidate markers.10 Using SELDI/TOF-MS, we determined serum proteomic fingerprints from lung cancer patients and from control subjects at high risk of developing this cancer in order to address two important clinical questions: (i) to determine whether or not proteomic profiling differs between lung

Journal of Thoracic Oncology • Volume 3, Number 8, August 2008


cancer and benign lung diseases in a population of smokers and (ii) to assess the prognostic impact of this profiling in lung cancer. In the first part of the study (discriminating value analysis), we analyzed spectra using a three-step algorithm and then assessed whether our pattern of markers could be used to discriminate lung cancer and benign lung disease. In the prognostic part of the study, we analyzed spectra of a homogeneous subpopulation of advanced non-small cell lung cancer (NSCLC) patients in search of new prognostic tumor markers.

PATIENTS AND METHODS

Study Population

Sera from patients with pathologically confirmed lung cancer and from patients with benign tobacco-induced or tobacco-associated chronic lung disease were collected at the Thoracic Oncology Unit of the University Hospital (Montpellier, France) with institutional approval. Diagnosis of lung cancer was based on histologic analysis of fiberoptic lung biopsies or surgical resections, and cancers were classified as small cell lung cancers or NSCLC (including squamous cell carcinomas, adenocarcinomas, and large cell carcinomas). Sera from the benign disease patients were collected during the same accrual period. Patients in the benign chronic lung disease group were affected by diseases resulting from (chronic obstructive pulmonary disease [COPD] with or without infectious complications, n = 17) or worsened by (postinfectious bronchiectasis, silicosis, severe asthma, n = 6) tobacco consumption, without previous or concomitant diagnosis of malignant disease. Whole blood was collected during fasting. Serum was separated by centrifugation for 20 minutes at 4000 rpm, aliquoted, and frozen at −80°C.

Proteomic Analysis

To optimize the peak resolution and the number of protein peaks detected, an anion-exchange fractionation procedure was performed as described previously.11 Briefly, 20 µl of each serum sample was denatured in 9 M urea, 2% 3-[(3-cholamidopropyl)-dimethylammonio]-1-propane-sulfonate hydrate (CHAPS), and 50 mM Tris-HCl, pH 9, and then separated on a 96-well filter plate containing Q HyperD F resin (BioSepra Corp., Fremont, CA) into six different pH gradient elution fractions, referred to as F1 (pH 9), F2 (pH 7), F3 (pH 5), F4 (pH 4), F5 (pH 3), and F6 (organic elution). Each fraction was randomly applied to a CM10 array surface. F2 was not subjected to analysis because of the low number of peaks detected in preliminary experiments. Sinapinic acid solution was applied twice to each spot as an energy-absorbing matrix. Arrays were read on a PBS II ProteinChip reader. Reproducibility was estimated using a human control serum (Ciphergen Biosystems), which was randomly spotted onto each chip to measure the variability of fractionation, on-chip spotting, and data acquisition. Spectra were background subtracted and the peak intensities were normalized in each fraction to the total ion current of m/z between 2.5 and 50 kDa. Peak detection was performed using the ProteinChip and Biomarker Wizard software (Ciphergen Biosystems). Automatic peak detection was performed in the range of 2.5 to 50 kDa with the following settings: (i) signal-to-noise ratio at four for the first pass and two for the second pass, (ii) minimal peak threshold at 15% of all spectra, (iii) cluster mass window at 0.5% of mass. The resulting file containing absolute intensity and m/z ratio values was exported into Microsoft Excel for subsequent analysis.
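The total ion current normalization mentioned above can be illustrated with a minimal Python sketch. It is not the ProteinChip software's implementation: the function names and the choice to rescale every spectrum to the mean total ion current of the batch are assumptions made for illustration; only the 2.5 to 50 kDa window comes from the text.

```python
def total_ion_current(mz, intensity, lo=2500.0, hi=50000.0):
    """Sum of intensities whose m/z lies inside the [lo, hi] window (in Da)."""
    return sum(i for m, i in zip(mz, intensity) if lo <= m <= hi)


def normalize_spectra(spectra, lo=2500.0, hi=50000.0):
    """Rescale each spectrum so that its windowed TIC equals the batch mean.

    `spectra` is a list of (mz_values, intensities) pairs. Rescaling to the
    batch mean is an illustrative assumption, not the vendor's algorithm.
    """
    tics = [total_ion_current(mz, inten, lo, hi) for mz, inten in spectra]
    if any(t <= 0 for t in tics):
        raise ValueError("a spectrum has no signal in the normalization window")
    target = sum(tics) / len(tics)
    return [
        (mz, [i * target / tic for i in inten])
        for (mz, inten), tic in zip(spectra, tics)
    ]
```

After this step, two spectra acquired with different overall signal levels have the same summed intensity inside the window, so peak intensities can be compared across samples.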

Statistical Data Analysis

The most discriminating markers between lung cancer and benign diseases were selected in three steps: (1) As peaks were defined by their intensities, the assumption of equality of intensities in the two groups was tested by a two-sided Wilcoxon test. Bonferroni's correction was not used because of the large number of tested peaks. Hence, a significance threshold of p = 0.01 was chosen from the histogram of the 228 p values, where it corresponded to a break point; (2) Both forward and stepwise procedures of logistic regression were used for final peak selection; (3) The ability of the selected markers to differentiate the two groups was tested using a cross-validation class prediction analysis. Briefly, a random selection of training and test samples was built with 80% and 20% of the total population, respectively. A class prediction model was defined from the training samples using different supervised classification methods and was then used to predict the class of the test samples. Moreover, the robustness of the model selection was confirmed by using training sample sizes of 40, 60, and 80% of the total sample size (n = 170). The process was repeated 1000 times to improve the consistency, robustness, and validity of the model.

For survival data analysis, the significant proteins were selected with the same method. Survival was calculated from the date of histologic diagnosis to the time of death, with at least 1 year of follow-up. In this study, a poor prognosis group was defined as a population with an overall survival shorter than 12 months. This time to event is the current median survival for patients with advanced NSCLC treated with conventional first line chemotherapy. Patients alive at the time of follow-up were censored at this date. Kaplan–Meier survival curves were compared by the log-rank test with a cutoff value of p < 0.0001.

We estimated the performance value of our classifier together with classic clinical and biologic features using the threshold defined in previous publications.3,12–15 The cutoff point was p < 0.05 for univariate analysis (log-rank test). Variables with a univariate p value < 0.1 were considered for entry into a stratified Cox proportional hazard model. The cutoff point was p < 0.05 for the Cox model results.
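The repeated random-split validation described in step (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a nearest-centroid rule stands in for the supervised classifiers that were actually compared, the function names are hypothetical, and the metrics are pooled over all repeats rather than averaged per split.

```python
import random


def fit_centroids(X, y):
    """Per-class mean of each feature (a stand-in for LDA and the other models)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=sq_dist)


def repeated_split_validation(X, y, train_frac=0.8, n_repeats=1000, seed=0):
    """Repeat a random train/test split; labels are 1 (cancer) or 0 (benign)."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_train = round(train_frac * len(indices))
    tp = tn = fp = fn = 0
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        if len({y[i] for i in train}) < 2:
            continue  # skip splits whose training set holds a single class
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test:
            predicted = predict(model, X[i])
            if y[i] == 1:
                tp += predicted == 1
                fn += predicted != 1
            else:
                tn += predicted == 0
                fp += predicted != 0
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
    }
```

Calling `repeated_split_validation(X, y, train_frac=0.4)` or `train_frac=0.6` mirrors the robustness check with smaller training sets.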

Immunoassay Quantification

The two tumor markers, Cyfra 21-1 and NSE, were measured in the 170 sera using a Kryptor immunoassay system (Brahms) with cutoff values of 3.6 and 12.5 ng/ml, respectively.

RESULTS

Patients’ Characteristics

The clinicopathologic characteristics of patients and samples are given in Table 1. These included 147 lung cancer


Jacot et al.

TABLE 1. Patients’ Characteristics

Variables                                 Patients
Chronic lung diseases                     23
  Age
    Median (range)                        70 (53–84)
  Sex
    Male                                  17 (74%)
    Female                                6 (26%)
  Smoking history
    Never                                 0
    Former                                14 (61%)
    Active                                9 (39%)
  Lifetime pack-years
    Median (range)                        40 (2–100)
  Diagnosis
    COPD                                  14 (61%)
    COPD + infectious complications       3 (13%)
    Severe asthma                         3 (13%)
    Bronchiectasis                        2 (9%)
    Silicosis                             1 (4%)
Lung cancer                               147
  Age
    Median (range)                        62 (32–83)
  Sex
    Male                                  115 (78%)
    Female                                32 (22%)
  Smoking history
    Never                                 14 (10%)
    Former                                58 (39%)
    Active                                75 (51%)
  Lifetime pack-years
    Median (range)                        39 (0–150)
  Histology
    Small cell carcinoma                  20 (14%)
    Squamous cell carcinoma               40 (27%)
    Adenocarcinoma                        55 (37%)
    Large cell carcinoma                  32 (22%)
  Tumor status
    1–2                                   45 (31%)
    3–4                                   102 (69%)
  Nodal status
    0–1                                   57 (39%)
    2–3                                   90 (61%)
  Metastatic status
    0                                     63 (43%)
    1                                     84 (57%)
  Stage grouping (Mountain)
    Ia                                    2 (1%)
    Ib                                    3 (2%)
    IIa                                   3 (2%)
    IIb                                   11 (7%)
    IIIa                                  17 (12%)
    IIIb                                  27 (18%)
    IV                                    84 (57%)

and 23 benign disease samples. Of the 147 lung cancer samples assayed, most came from patients with advanced lung cancer (87%), and 13% came from patients with early-stage disease.

Lung Cancer Versus Benign Disease Serum Proteomic Fingerprinting

To increase the success rate of marker discovery, we first fractionated the serum samples into six fractions based on the isoelectric points (pI) of the proteins. We then profiled each fraction on CM10 arrays by SELDI/TOF-MS. To evaluate the variability of SELDI, we first tested the experimental reproducibility using a human serum control. The coefficients of variation of 15 reliable peaks were calculated from peak intensity values derived from 40 different runs. The average coefficients of variation were 10% and 17% for intra- and inter-assay variability, respectively, which is consistent with previously reported studies.16–19

A total set of 228 protein peak clusters ranging from 2 to 80 kDa was obtained from the 170 serum samples. According to our first statistical criterion, we selected 88 of the 228 protein peaks that were significantly differentially expressed between lung cancer and benign lung disease population samples (p < 0.01). Among these peaks, 67 were overexpressed and 21 were underexpressed in lung cancer compared with benign samples. The diagnostic value of each of these 88 differentially expressed peaks, expressed as the area under the curve (AUC), ranged from 0.63 to 0.84 (Table 2). A representative example of three discriminating protein peaks is shown in Figure 1.

To identify a multiprotein signature that can distinguish the lung cancer and benign populations, we used a three-step statistical process. After peak selection, forward and stepwise logistic regression were used to construct two multiprotein classifiers composed of seven and eight protein peaks, respectively. Interestingly, six protein peaks were common to the two models, confirming the robustness of the model selection. Both composite indices yielded higher AUC values (0.991 and 0.994) than the individual protein peaks (ranging from 0.67 to 0.833).
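As an illustration of how a single peak's AUC and fold change can be obtained, here is a minimal Python sketch. It rests on assumptions: the AUC is computed through its rank interpretation (the probability that a cancer sample has a higher intensity than a benign sample, ties counting one half), and the fold change is taken as the ratio of mean intensities, cancer over benign. The function names are hypothetical, and for an underexpressed peak this definition gives a value below 0.5, so reporting `max(a, 1 - a)` would be a further assumption.

```python
def auc(cases, controls):
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def fold_change(cases, controls):
    """Ratio of mean peak intensity in cases to mean intensity in controls."""
    mean_cases = sum(cases) / len(cases)
    mean_controls = sum(controls) / len(controls)
    return mean_cases / mean_controls


# Toy intensities (hypothetical values, not taken from the study data).
print(auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
print(fold_change([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```

Applied to the two tabulated intensities of the 5916.65 Da peak (32.027 in lung cancer, 11.901 in chronic lung disease), the ratio rounds to the fold change of 2.7 listed in Table 2.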
In the same way, the AUC values from our multiprotein index were higher than those of the two classic lung cancer markers, NSE and Cyfra 21-1, or their combinatory index (Figure 2). We next applied our classifiers to cross-validation test samples to estimate their accuracy. Table 3 indicates the mean performance of our classifiers on 1000 randomly generated 80:20 sets of samples using different class-prediction models. Linear discriminant analysis (LDA) associated with the forward multiprotein classifier gave the best performance for discriminating lung cancer and benign samples, achieving an overall classification accuracy of 93.1%. Considering the estimated prevalence of lung cancer in high-risk populations of 3.2%, we estimated the positive predictive value and the negative predictive value in a high-risk population at 18.1% and 99.8%, respectively. Interestingly, the mean performance of the LDA forward multiprotein classifier remained stable when the training set was reduced from 80% to 40%, with a mean specificity ranging from 85.6 to 86.2% and a sensitivity ranging from 93.2 to 94.3% (Table 4). Finally, to estimate the misclassification rate of early stage lung cancer in our population (stage I–II, n = 19), we applied the different class-prediction models to cross-validation test


TABLE 2. Differentially Expressed Markers

                                              Peak Intensity
m/z       Fraction  p           AUC    Chronic Lung Disease  Lung Cancer  Fold Change
5916.65   F1        0.00000008  0.822  11.901                32.027       2.7
6122.65   F1        0.0000001   0.812  2.092                 8.086        3.9
3203.06   F1        0.0000001   0.822  3.293                 9.960        3.0
3274.03   F1        0.0000001   0.842  4.523                 13.923       3.1
26562.70  F3        0.0000003   0.822  0.105                 0.191        1.8
7172.00   F6        0.0000004   0.812  29.932                12.748       0.4
6099.92   F1        0.0000008   0.792  2.557                 7.680        3.0
77993.59  F5        0.000004    0.802  0.304                 0.424        1.4
39845.94  F3        0.000005    0.782  0.541                 0.894        1.7
14787.64  F6        0.000005    0.761  4.235                 6.466        1.5
66755.68  F5        0.000007    0.782  7.264                 10.822       1.5
4823.63   F1        0.000007    0.792  12.964                8.032        0.6
60854.55  F5        0.00001     0.761  0.148                 0.237        1.6
56265.52  F6        0.00002     0.761  3.870                 7.044        1.8
33419.79  F5        0.00002     0.782  4.709                 6.401        1.4
9293.28   F1        0.00002     0.761  7.848                 14.516       1.8
51405.19  F5        0.00002     0.751  0.441                 0.582        1.3
12880.15  F6        0.00002     0.751  17.104                26.425       1.5
53762.91  F3        0.00002     0.751  0.052                 0.079        1.5
42221.83  F6        0.00002     0.761  0.595                 1.004        1.7
14159.08  F6        0.00002     0.751  19.752                28.870       1.5
45459.28  F6        0.00003     0.751  1.499                 2.371        1.6
44606.65  F5        0.00005     0.681  0.426                 0.586        1.4
14059.26  F6        0.00005     0.751  19.752                28.870       1.5
41148.13  F6        0.00006     0.761  0.424                 0.742        1.8
7474.29   F1        0.00007     0.741  4.979                 9.011        1.8
79502.96  F3        0.0001      0.731  0.929                 1.422        1.5
6443.14   F6        0.0001      0.721  85.467                58.230       0.7
17270.63  F6        0.0001      0.741  15.773                23.151       1.5
58792.16  F4        0.0002      0.721  0.150                 0.204        1.4
28292.37  F6        0.0002      0.721  31.761                43.976       1.4
6643.25   F6        0.0002      0.721  97.164                62.184       0.6
9725.59   F6        0.0002      0.721  43.027                32.935       0.8
9436.27   F5        0.0002      0.711  24.897                14.428       0.6
40295.53  F4        0.0002      0.721  0.191                 0.284        1.5
48867.54  F3        0.0003      0.731  0.137                 0.175        1.3
9294.82   F5        0.0003      0.741  4.901                 6.479        1.3
14502.12  F5        0.0003      0.701  6.667                 8.838        1.3
9728.00   F5        0.0003      0.711  8.772                 5.791        0.7
28108.87  F6        0.0004      0.701  41.366                55.348       1.3
7657.77   F1        0.0004      0.741  3.171                 4.845        1.5
34682.28  F5        0.0004      0.721  3.286                 4.467        1.4
9436.99   F6        0.0005      0.691  58.600                51.184       0.9
12622.71  F6        0.0007      0.721  7.202                 11.239       1.6
3325.15   F6        0.0007      0.691  26.495                14.322       0.5
17258.20  F5        0.0007      0.731  1.871                 2.894        1.5
28948.57  F6        0.0007      0.691  14.994                19.807       1.3
13088.56  F6        0.0007      0.731  5.331                 7.438        1.4
9653.85   F6        0.0008      0.721  43.027                32.935       0.8
8825.22   F6        0.0008      0.691  74.777                55.742       0.7
8930.98   F6        0.0009      0.711  62.084                48.242       0.8
4414.32   F6        0.0010      0.691  22.192                14.098       0.6
10275.04  F1        0.001       0.711  4.264                 8.368        2.0
17399.18  F5        0.001       0.711  2.535                 3.837        1.5
6678.33   F5        0.001       0.671  23.701                27.568       1.2
25697.08  F5        0.002       0.691  0.343                 0.490        1.4
22758.13  F6        0.002       0.691  0.927                 1.323        1.4
8149.31   F5        0.002       0.701  0.883                 2.195        2.5
17385.43  F5        0.002       0.711  2.535                 3.837        1.5
6464.33   F5        0.002       0.701  8.099                 11.334       1.4
17596.60  F6        0.002       0.691  9.084                 12.029       1.3
9143.29   F6        0.002       0.671  30.374                23.160       0.8
59019.37  F5        0.002       0.701  0.186                 0.246        1.3
17157.64  F6        0.002       0.681  5.366                 9.056        1.7
22270.43  F3        0.002       0.681  0.645                 0.798        1.2
26214.75  F6        0.002       0.681  0.865                 1.291        1.5
6631.31   F5        0.003       0.701  58.236                45.857       0.8
36938.75  F6        0.003       0.681  1.003                 1.381        1.4
6432.63   F5        0.003       0.681  30.440                22.909       0.8
44678.99  F4        0.003       0.711  0.423                 0.492        1.2
13891.22  F6        0.003       0.681  4.850                 7.090        1.5
7770.31   F4        0.003       0.701  6.800                 3.457        0.5
4720.56   F6        0.003       0.661  26.113                19.651       0.8
14163.93  F4        0.005       0.660  6.626                 8.803        1.3
28123.05  F4        0.005       0.671  3.853                 5.686        1.5
52072.00  F6        0.005       0.661  0.426                 0.492        1.2
6850.63   F6        0.006       0.681  33.476                24.106       0.7
34754.05  F6        0.006       0.681  1.815                 2.355        1.3
23227.41  F5        0.007       0.681  0.735                 0.840        1.1
21098.20  F4        0.008       0.661  0.286                 0.364        1.3
4476.78   F6        0.008       0.661  20.076                14.882       0.7
8931.48   F5        0.008       0.651  10.044                7.227        0.7
13883.37  F4        0.008       0.651  10.375                13.960       1.3
6946.05   F4        0.009       0.661  9.121                 11.858       1.3
3327.54   F1        0.009       0.631  6.180                 9.397        1.5
28318.09  F4        0.009       0.661  2.581                 3.657        1.4
60794.67  F3        0.009       0.651  0.275                 0.393        1.4
21784.52  F6        0.009       0.651  1.133                 1.474        1.3

samples. Using our classifiers, 89.5 to 100% of the early stage lung cancer samples were correctly classified (data not shown).

NSCLC Prognosis

We next searched for a correlation between a multiprotein signature and the clinical outcome of patients. We first selected a homogeneous population of 87 stage III–IV NSCLC patients receiving conventional first line chemotherapy, with exhaustive clinical and biologic pretherapeutic prognostic data and at least 1 year of follow-up. Using our three-step statistical test, we constructed a predictive survival classifier composed of only one protein peak at 4628 Da (fraction 5), which can predict by itself the clinical outcome of patients (Figure 3). The robustness and performance of the predictive value of this potential marker were verified with our cross-validation (Table 3). The protein peak at 4628 Da divided patients into a group with poor prognosis (overall


survival shorter than 12 months, n = 33) and one with good prognosis (overall survival longer than 12 months, n = 54) with a highly significant difference (p < 0.0001 according to the log-rank test, Figure 4) and a peak intensity threshold of 32.25. Finally, we estimated the performance value of this new prognostic factor together with classic clinical and biologic features in this population (Table 5). In multivariate analysis, four variables were independent determinants of a poor outcome: stage grouping, age, performance status, and the 4628 Da marker (Table 6). Interestingly, the 4628 Da marker showed the strongest statistically significant association with clinical outcome (hazard ratio = 3.45 [95% confidence interval 1.22–6.14], p = 0.0005). It is worth noting that this marker remained the only biologic variable statistically associated with a poor outcome, in contrast to the classic NSCLC tumor markers.
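The survival comparison above relies on Kaplan–Meier curves for two groups defined by a peak intensity threshold. A minimal Python sketch of both steps follows; it is illustrative only. The function names and toy follow-up data are hypothetical, and because this section does not state which side of the 32.25 threshold corresponds to poor prognosis, the split simply returns the two groups without labeling them.

```python
def split_by_threshold(intensities, threshold=32.25):
    """Return (indices above threshold, indices at or below threshold)."""
    above = [i for i, v in enumerate(intensities) if v > threshold]
    below = [i for i, v in enumerate(intensities) if v <= threshold]
    return above, below


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `times` are follow-up durations; `events` are 1 for death and 0 for a
    censored observation. Returns [(t, S(t))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


# Toy follow-up in months (hypothetical values, not study data).
print(kaplan_meier([2, 4, 4, 6, 10], [1, 1, 0, 1, 0]))
```

Running `kaplan_meier` separately on each group returned by `split_by_threshold` gives the two curves that a log-rank test would then compare.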


FIGURE 1. Representative SELDI-TOF spectra obtained from chronic lung disease and lung cancer serum samples. (A) Overlay of protein mass spectra. Protein mass spectra obtained from sera of patients with lung cancer (red) and noncancer individuals (blue) are superimposed. Differential variations in intensity indicate potential markers. (B) Representative spectra from 3 patients with chronic lung disease (upper panel) and 3 patients with lung cancer (lower panel); the depicted peaks were identified by statistical analysis as optimally discriminatory. Frames indicate the positions of 3 peaks at 5916, 6100 and 6122 Da, respectively, overexpressed in fraction 1 of lung cancer patients (right panel) and one peak at 7172 Da underexpressed in fraction 6 of lung cancer patients.

DISCUSSION

One possible way to improve the early diagnosis of lung cancer is to screen simultaneously for multiple markers. Using SELDI-TOF, we sought to prospectively classify lung cancer patients and smoking individuals with benign lung disease on the basis of the protein mass spectra of their sera, and found that simple profiles can predict, with statistical significance, both the cancer status and the survival of patients in our population. Proteomic fingerprinting by SELDI-TOF has been successfully applied to identify early detection markers in multiple cancers.16,19–25 However, few reports have shown that serum proteomic fingerprints obtained by SELDI-TOF or other proteomic approaches can distinguish lung cancer patients from control subjects.26–37 Indeed, these studies have been limited by the inclusion of only some histologic forms of lung cancer and by the use of healthy subjects as a control population, which may induce selection bias. In contrast, and for the first time to our knowledge, our study was based on an unselected population of lung cancer patients, including patients suffering from large cell carcinoma, a population excluded from previous studies. Moreover, since active or secondhand tobacco smoking is responsible for approximately 87% of all

FIGURE 2. ROC curves for the lung cancer multiprotein index and both classic lung cancer markers (NSE and Cyfra 21-1). ROC curves for the forward (A) and the stepwise (B) multiprotein indices, as well as for the composite index of NSE and Cyfra 21-1, are plotted. The areas under the ROC curves are 0.991 (A), 0.994 (B) and 0.828 (C) and are compared with the individual markers.

lung cancer, we selected our control population to include patients at risk for lung cancer, such as patients suffering from tobacco-induced or tobacco-worsened chronic lung diseases. One major advantage of both the pathologic and control populations used here is that they allowed for analysis of the most realistic situation for a screening setting that could be proposed in clinical practice.


TABLE 3. Diagnostic and Prognostic Performance of Classifiers

                                     Tested Population                                 High-Risk Population*
Model                                Accuracy  Sensitivity  Specificity  PPV   NPV     PPV   NPV
Diagnostic index
  Forward multiprotein index
    Support vector machine           92.8      95.1         79.8         96.5  73.6    13.5  99.8
    Linear discriminant analysis     93.1      94.3         85.9         97.5  72.4    18.1  99.8
    Quadratic discriminant analysis  91.7      94.1         78.0         96.1  69.4    12.4  99.7
    Neural networks                  94.2      97.7         74.0         95.6  84.8    11.0  99.9
    Classification tree              82.3      91.6         28.6         88.1  36.9    4.1   99.0
    Boosting tree                    81.4      98.1         16.8         82.0  70.0    3.8   99.6
  Stepwise multiprotein index
    Support vector machine           93.5      96.4         76.3         95.9  78.6    11.8  99.8
    Linear discriminant analysis     93.6      95.3         83.7         97.1  75.4    16.2  99.8
    Quadratic discriminant analysis  90.9      94.6         69.8         94.8  69.0    9.4   99.7
    Neural networks                  95.7      98.8         77.8         96.3  91.9    12.8  99.9
    Classification tree              82.3      91.6         28.3         88.1  36.8    4.1   99.0
    Boosting tree                    86.5      95.9         38.1         88.9  64.5    4.9   99.6
Prognostic index
    Support vector machine           73.4      71.0         76.6         79.1  67.7    -     -
    Linear discriminant analysis     74.8      82.0         65.2         74.7  74.8    -     -
    Quadratic discriminant analysis  69.6      86.0         49.3         67.9  73.7    -     -
    Neural networks                  72.7      73.0         72.7         76.9  68.0    -     -
    Classification tree              68.8      68.0         70.1         73.9  63.5    -     -
    Boosting tree                    72.5      75.0         69.0         75.2  69.1    -     -

All values are percentages.
*Positive Predictive Value (PPV) and Negative Predictive Value (NPV) were estimated for the tested population and for a high-risk population (with a prevalence of 3.2%, the figure used in the text and the one that reproduces the tabulated high-risk PPV and NPV values).
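The high-risk PPV and NPV columns follow from sensitivity, specificity, and prevalence through Bayes' rule. The short Python sketch below works through that arithmetic for the LDA forward classifier, using the 3.2% prevalence quoted in the text; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as fractions for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# LDA, forward multiprotein index: sensitivity 94.3%, specificity 85.9%.
ppv, npv = predictive_values(0.943, 0.859, 0.032)
print(f"PPV = {100 * ppv:.2f}%, NPV = {100 * npv:.2f}%")
```

With these inputs the sketch gives a PPV of about 18.11% and an NPV of about 99.78%, matching the extrapolated values reported for the high-risk population. The low PPV despite high sensitivity reflects the low prevalence: most positive results in such a population come from the much larger group of individuals without cancer.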

The most important steps in proteomic studies are the differential and statistical analysis of all the data from spectra. To analyze our data, we searched for a combination of methods that could reliably detect lung cancer in a high-risk population. We developed a three-step strategy to extract combinations of markers and to evaluate the robustness of these classifiers. Our multivariate analysis first identified combinations of seven and eight protein peaks with AUCs of 0.991 and 0.994, respectively. Then, since there is no “gold-standard” method for classification of mass spectrometry data, we were interested in testing the validity and robustness of these multiprotein classifiers by using different classification methods. We compared the performance of these various supervised classification methods by the standard leave-and-out cross-validation method. Interestingly, we observed that once the most discriminating markers were selected, the results for sensitivity, specificity, and accuracy could be radically different from one classification method to another. From our data, LDA, associated with the forward multiprotein classifier, gave the best performance for discriminating lung cancer and benign samples, achieving an overall classification accuracy of 93.1%, a sensitivity of 94.3%, and a specificity of 85.9%. Using the most common decision tree classification method, previous studies comparing healthy subjects and lung cancer patients showed sensitivities and specificities similar to those reported here.32,33 However, using our data, methods based on decision tree classification failed to correctly classify individuals from our control group, with specificities no higher than about 38%. These results can probably be largely explained by the fact that we chose high-risk individuals as a control group, whereas the classification methods based on a decision tree used thresholds. It would be interesting to ensure a medium-term follow-up of this control population to rule out the possibility of subclinical and radiologic cancer development, which should allow the reallocation of some of these individuals into the correct group. Finally, and importantly, the choice of a classification method will depend on the data, the preselection method used, and the problem that has to be solved. Therefore, we believe that LDA remained the most robust method, allowing for accurate detection of lung cancer in a high-risk population.

One of the principal aims of research in lung cancer is to find efficient screening or early diagnosis methods, which would enable earlier treatment by curative tumor resection and improve the outcome of patients. Recently, screening trials using low-dose spiral CT have encouraged a resurgence of interest in the prevention of lung cancer.5,7,8,38 However, a recent review reported a positive CT examination rate from 5.1 to 51%, whereas lung cancer prevalence in the high-risk population screened was ≤3.2%.39 The high resolution of low-dose spiral CT thus seems to detect very small lesions, leading to overdiagnosis and thus to invasive procedures for benign lesions. In this context, and to reduce the unnecessary


TABLE 4. Robustness of Cross-Validation

Model                                Training Sample Size (%)  Accuracy  Sensitivity  Specificity
Forward multiprotein index
  Support vector machine             80                        92.8      95.1         79.8
                                     60                        92.2      94.6         77.5
                                     40                        91.4      94.6         71.2
  Linear discriminant analysis       80                        93.1      94.3         85.9
                                     60                        92.8      93.8         86.2
                                     40                        92.3      93.2         85.6
  Quadratic discriminant analysis    80                        91.7      94.1         78.0
                                     60                        91.6      94.9         71.4
                                     40                        91.3      99.2         45.6
  Neural networks                    80                        94.2      97.7         74.0
                                     60                        94.2      97.5         73.2
                                     40                        92.8      97.5         65.5
  Classification tree                80                        82.3      91.6         28.6
                                     60                        82.7      90.9         30.0
                                     40                        82.3      91.2         26.5
  Boosting tree                      80                        81.4      98.1         16.8
                                     60                        85.4      98.2         22.4
                                     40                        85.6      98.7         22.0
Stepwise multiprotein index
  Support vector machine             80                        93.5      96.4         76.3
                                     60                        93.7      96.4         73.8
                                     40                        93.4      97.0         66.8
  Linear discriminant analysis       80                        93.6      95.3         83.7
                                     60                        93.4      94.5         86.0
                                     40                        92.3      93.4         84.8
  Quadratic discriminant analysis    80                        90.9      94.6         69.8
                                     60                        91.0      95.7         60.9
                                     40                        91.4      97.4         30.3
  Neural networks                    80                        95.7      98.8         77.8
                                     60                        97.2      98.4         86.4
                                     40                        95.7      97.9         78.4
  Classification tree                80                        82.3      91.6         28.3
                                     60                        83.2      91.6         29.4
                                     40                        82.9      92.1         25.2
  Boosting tree                      80                        86.5      95.9         38.1
                                     60                        93.7      96.0         55.9
                                     40                        94.3      95.9         58.1

use of such procedures, a screen based upon our results could be employed before a spiral CT scan as a two-step screening policy. Considering the sensitivity and specificity of our proteomic profiles and the expected prevalence of lung cancer in the high-risk population, one can expect an extrapolated PPV of 18.11% and an NPV of 99.78%. The present study included a low proportion of patients suffering from early stage NSCLC, which precludes any firm conclusion regarding the putative ability to differentiate this population subset from patients suffering from nonmalignant pulmonary diseases and other malignancies. The next step of our research program

FIGURE 3. Representative view of the predictor of clinical outcome. (A) Overlay of protein mass spectra. Protein mass spectra obtained from sera of patients with good prognoses (red) and poor prognoses (blue) are superimposed. (B) Representative spectra from 3 patients with good prognoses (upper panel) and 3 patients with poor prognoses (lower panel). Frames indicate the position of the 4628 Da prognostic protein peak.

will take into account this important methodological point by focusing on early stage NSCLC. Because of the high NPV, the test-negative population could be spared invasive or irradiating procedures. Patients with suspected lung cancer should be offered a thoracic CT scan to proceed further in the diagnostic procedure.

A considerable amount of both clinical and preclinical research has focused on the role of prognostic factors in the multidisciplinary care of patients affected by NSCLC. Up to now, the most widely accepted prognostic determinants of NSCLC are disease stage and performance status.3,40 The prognostic impact of classic tumor markers has been evaluated, but their use needs to be improved.13–15 In the present study, the prognostic utility of the tumor marker identified seems to be superior to that of classic tumor markers. In the Cox model, the 4628 Da protein peak was found to be the only statistically significant independent biologic prognostic variable. Interestingly, classic clinical


FIGURE 4. Kaplan–Meier survival curves from groups with good and poor prognosis according to the serum protein-based classification. Circles represent censored patients. p < 0.0001 according to Log-rank test.

variables associated with poor prognosis in a NSCLC population, namely stage grouping, age, and performance status, were also retained in the Cox model, arguing for the conventional characteristics of the studied population. A better definition of prognostic determinants affecting a population of NSCLC patients may be important in both clinical trials and routine practice,41–44 either to allow a better stratification of patients in clinical trials or to consider different treatments depending on the overall prognosis of a patient. Further identification of this 4628 Da protein, and determination of whether it is a therapeutic or surrogate biomarker, will be of importance. Validation, in an independent population, of a more discriminating tumor marker such as the 4628 Da protein peak could then improve clinical management of this population. The majority of newly diagnosed NSCLC patients present with advanced-stage disease, and survival improvements are slow despite the recent introduction of targeted therapies in this setting. Therefore, one can hypothesize that improving tools able to define outcome will also help in designing randomized therapeutic studies with accurate stratification variables.

There are some statistical limitations in this study that need to be addressed. First, the relatively low specificity obtained with our data could be explained by the strong imbalance in the size of the two sample groups or by the choice of individuals with a high risk of cancer as the control group. Second, although we did not have enough samples to perform an independent validation test, we used a rigorous and well-accepted cross-validation method to estimate the classification accuracy of our multiprotein classifiers. Even with a training set of 40% instead of 80%, as described, all the selection methods were stable and the LDA remained the most robust method. However, validation studies in independent populations are mandatory to confirm the accuracy of the protein peak combinations. This further validation study will need a larger cohort of patients suffering from nonmalignant pulmonary diseases to ascertain the reliability of our proteomic profile. Nevertheless, our data strongly illustrate the strength of serum profiling studies with screening/diagnosis and prognosis objectives.

In conclusion, we found that serum SELDI protein profiling can distinguish lung cancer patients from patients affected by benign lung diseases with higher sensitivity and specificity than any other molecular markers identified to date. A validation study of the discriminating value of this set of proteins remains mandatory. Although the incidence of lung cancer does not yet justify a screening policy for the general population, the multiprotein index developed in this study may have clinical importance in a high-risk population. We also found a specific serum protein that can be used to identify patients with good or poor prognosis. Altogether, these results are promising in terms of identification of new predictive and early diagnosis markers of lung cancer. The identification and development of specific tests against selected proteins will be needed to enable routine testing of the performance of these markers in comparison with other classic tumor markers.
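As a hedged illustration of how the extrapolated predictive values quoted in the Discussion can be derived, the following Python sketch applies Bayes' rule to a sensitivity, a specificity, and a prevalence. The function name `predictive_values` is hypothetical, and the inputs are an assumption about which figures were used: 94.3% sensitivity and 85.9% specificity (linear discriminant analysis, forward multiprotein index, 80% training set in Table 4) and the 3.2% upper bound on prevalence reported for screened high-risk populations. The text does not state these inputs explicitly, although they do reproduce the reported PPV of 18.11% and NPV of 99.78%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' rule.

    All three arguments are proportions between 0 and 1.
    """
    # Expected fractions of the screened population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(cancer | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no cancer | negative test)
    return ppv, npv


# Assumed inputs (see the paragraph above): LDA cross-validation figures
# and a 3.2% prevalence in the high-risk population.
ppv, npv = predictive_values(sensitivity=0.943, specificity=0.859, prevalence=0.032)
print(f"PPV = {ppv:.2%}, NPV = {npv:.2%}")
```

At such a low prevalence the NPV remains high, whereas the PPV changes markedly with the assumed prevalence, so the PPV figure is best read as an extrapolation rather than a measured value.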


TABLE 5. Univariate Analysis of Clinical and Biological Parameters

Variable                     Median Survival (months [95% CI])   p (Log-rank)

Age (year)                                                       0.09
  ≤60                        10.7 [7.1–16]
  >60                        18.9 [13.6–NR]
Gender                                                           0.31
  Male                       16.3 [9.6–NR]
  Female                     14 [9.9–18.9]
ECOG performance status                                          <0.0001
  <2                         19.5 [13–NR]
  ≥2                         3.9 [2.5–10.8]
Stage grouping                                                   0.002
  III                        NR [17.7–NR]
  IV                         11 [7–14.8]
Histology                                                        0.08
  Adenocarcinoma             16.3 [12.9–NR]
  Nonadenocarcinoma          19.5 [17.7–NR]
Cyfra 21-1                                                       0.0002
  <3.6 ng/ml                 NR [16–NR]
  ≥3.6 ng/ml                 9.8 [5.4–13]
NSE                                                              0.25
  ≤12.5 ng/ml                17.7 [13–NR]
  >12.5 ng/ml                10.8 [7.1–NR]
Blood leukocyte                                                  0.0002
  ≤10^10/l                   19.5 [16–NR]
  >10^10/l                   7.7 [3.8–12.9]
Blood lymphocyte                                                 0.85
  <10^9/l                    13.6 [4.1–NR]
  ≥10^9/l                    13.6 [4.1–NR]
Alkaline phosphatase                                             0.02
  Normal                     17.7 [13–NR]
  Increased                  10 [3.5–12.9]
LDH                                                              0.02
  Normal                     16 [11.8–NR]
  Increased                  6.5 [3.2–17.7]
Serum albumin                                                    0.01
  <35 g/l                    4.3 [2–16.3]
  ≥35 g/l                    16 [11.2–NR]
4628 Da peak intensity                                           <0.0001
  ≤32.25 u                   NR [16.3–NR]
  >32.25 u                   6 [3.8–10.3]

Numbers in brackets in column 2 show the 95% confidence interval (95% CI). NR, not reached.

TABLE 6. Cox Proportional Hazards Multivariate Analyses

Variable                     Hazard Ratio [95% CI]   p

4628 Da protein peak         3.45 [1.22–6.14]        0.0005
Stage grouping               2.74 [1.72–6.93]        0.01
Age                          2.19 [1.16–4.15]        0.02
ECOG performance status      2.14 [1.06–4.30]        0.03

ACKNOWLEDGMENTS

This work was supported in part by grants from the Comité de l’Hérault de la Ligue National Contre le Cancer. The authors thank Pr G. Favre and Dr C. Allal for the use of the PBSIIc ProteinChip reader.

REFERENCES

1. Janssen-Heijnen ML, Coebergh JW. The changing epidemiology of lung cancer in Europe. Lung Cancer 2003;41:245–258.
2. Humphrey LL, Teutsch S, Johnson M. Lung cancer screening with sputum cytologic examination, chest radiography, and computed tomography: an update for the U.S. Preventive Services Task Force. Ann Intern Med 2004;140:740–753.
3. Mountain CF. Revisions in the International system for staging lung cancer. Chest 1997;111:1710–1717.
4. Muley T, Dienemann H, Ebert W. CYFRA 21-1 and CEA are independent prognostic factors in 153 operated stage I NSCLC patients. Anticancer Res 2004;24:1953–1956.
5. Henschke CI, Yankelevitz DF, Libby DM, et al. Survival of patients with stage I lung cancer detected on CT screening. N Engl J Med 2006;355:1763–1771.
6. Marcus PM, Bergstralh EJ, Fagerstrom RM, et al. Lung cancer mortality in the Mayo Lung Project: impact of extended follow-up. J Natl Cancer Inst 2000;92:1308–1316.
7. Henschke CI, McCauley DI, Yankelevitz DF, et al. Early Lung Cancer Action Project: overall design and findings from baseline screening. Lancet 1999;354:99–105.
8. Bach PB, Jett JR, Pastorino U, et al. Computed tomography screening and lung cancer outcomes. JAMA 2007;297:953–961.
9. Bates J, Rutherford R, Divilly M, et al. Clinical value of CYFRA 21.1, carcinoembryonic antigen, neurone-specific enolase, tissue polypeptide specific antigen and tissue polypeptide antigen in the diagnosis of lung cancer. Eur Respir J 1997;10:2535–2538.
10. Solassol J, Jacot W, Lhermitte L, et al. Clinical proteomics and mass spectrometry profiling for cancer detection. Expert Rev Proteomics 2006;3:311–320.
11. Solassol J, Marin P, Demettre E, et al. Proteomic detection of prostate-specific antigen using a serum fractionation procedure: potential implication for new low-abundance cancer biomarkers detection. Anal Biochem 2005;338:26–31.
12. Zubrod C, Schneiderman M, Frei EJ. Appraisal of methods for the study of chemotherapy of cancer in man: comparative therapeutic trial of nitrogen mustard and triethylene thiophosphoramide. J Chronic Dis 1960;11:7–33.
13. Pujol JL, Grenier J, Daures JP, et al. Serum fragment of cytokeratin subunit 19 measured by CYFRA 21-1 immunoradiometric assay as a marker of lung cancer. Cancer Res 1993;53:61–66.
14. Pujol JL, Boher JM, Grenier J, et al. Cyfra 21-1, neuron specific enolase and prognosis of non-small cell lung cancer: prospective study in 621 patients. Lung Cancer 2001;31:221–231.
15. Jorgensen LG, Osterlind K, Genolla J, et al. Serum neuron-specific enolase (S-NSE) and the prognosis in small-cell lung cancer (SCLC): a combined multivariable analysis on data from nine centres. Br J Cancer 1996;74:463–467.
16. Petricoin EF, Ardekani AM, Hitt BA, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572–577.
17. Rai AJ, Stemmer PM, Zhang Z, et al. Analysis of Human Proteome Organization Plasma Proteome Project (HUPO PPP) reference specimens using surface enhanced laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution correlation of spectra and identification of biomarkers. Proteomics 2005;5:3467–3474.
18. Semmes OJ, Feng Z, Adam BL, et al. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005;51:102–112.
19. Zhang Z, Bast RC Jr, Yu Y, et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004;64:5882–5890.
20. Yu Y, Chen S, Wang LS, et al. Prediction of pancreatic cancer by serum biomarkers using surface-enhanced laser desorption/ionization-based decision tree classification. Oncology 2005;68:79–86.
21. Kozak KR, Amneus MW, Pusey SM, et al. Identification of biomarkers for ovarian cancer using strong anion-exchange ProteinChips: potential use in diagnosis and prognosis. Proc Natl Acad Sci USA 2003;100:12343–12348.
22. Adam BL, Qu Y, Davis JW, et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002;62:3609–3614.
23. Koopmann J, Zhang Z, White N, et al. Serum diagnosis of pancreatic adenocarcinoma using surface-enhanced laser desorption and ionization mass spectrometry. Clin Cancer Res 2004;10:860–868.
24. Li J, Zhang Z, Rosenzweig J, et al. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 2002;48:1296–1304.
25. Mathelin C, Cromer A, Wendling C, et al. Serum biomarkers for detection of breast cancers: a prospective study. Breast Cancer Res Treat 2005:1–8.
26. Ali IU, Xiao Z, Malone W, et al. Plasma proteomic profiling: search for lung cancer diagnostic and early detection markers. Oncol Rep 2006;15:1367–1372.
27. Fujii K, Nakano T, Kanazawa M, et al. Clinical-scale high-throughput human plasma proteome analysis: lung adenocarcinoma. Proteomics 2005;5:1150–1159.
28. Li Y, Dang TA, Shen J, et al. Identification of a plasma proteomic signature to distinguish pediatric osteosarcoma from benign osteochondroma. Proteomics 2006;6:3426–3435.
29. Maciel CM, Junqueira M, Paschoal ME, et al. Differential proteomic serum pattern of low molecular weight proteins expressed by adenocarcinoma lung cancer patients. J Exp Ther Oncol 2005;5:31–38.
30. Okano T, Kondo T, Kakisaka T, et al. Plasma proteomics of lung cancer by a linkage of multi-dimensional liquid chromatography and two-dimensional difference gel electrophoresis. Proteomics 2006;6:3938–3948.
31. Zhukov TA, Johanson RA, Cantor AB, et al. Discovery of distinct protein profiles specific for lung tumors and pre-malignant lung lesions by SELDI mass spectrometry. Lung Cancer 2003;40:267–279.
32. Xiao X, Liu D, Tang Y, et al. Development of proteomic patterns for detecting lung cancer. Dis Markers 2003;19:33–39.
33. Yang SY, Xiao XY, Zhang WG, et al. Application of serum SELDI proteomic patterns in diagnosis of lung cancer. BMC Cancer 2005;5:83.
34. Sidransky D, Irizarry R, Califano JA, et al. Serum protein MALDI profiling to distinguish upper aerodigestive tract cancer patients from control subjects. J Natl Cancer Inst 2003;95:1711–1717.
35. Howard BA, Wang MZ, Campa MJ, et al. Identification and validation of a potential lung cancer serum biomarker detected by matrix-assisted laser desorption/ionization-time of flight spectra analysis. Proteomics 2003;3:1720–1724.
36. Wang MZ, Howard B, Campa MJ, et al. Analysis of human serum proteins by liquid phase isoelectric focusing and matrix-assisted laser desorption/ionization-mass spectrometry. Proteomics 2003;3:1661–1666.
37. Yildiz PB, Shyr Y, Rahman JS, et al. Diagnostic accuracy of MALDI mass spectrometric analysis of unfractionated serum in lung cancer. J Thorac Oncol 2007;2:893–901.
38. Kaneko M, Kusumoto M, Kobayashi T, et al. Computed tomography screening for lung carcinoma in Japan. Cancer 2000;89:2485–2488.
39. Black C, Bagust A, Boland A, et al. The clinical effectiveness and cost-effectiveness of computed tomography screening for lung cancer: systematic reviews. Health Technol Assess 2006;10:iii–iv, ix–x, 1–90.
40. Brechot JM, Chevret S, Charpentier MC, et al. Blood vessel and lymphatic vessel invasion in resected nonsmall cell lung carcinoma. Correlation with TNM stage and disease free and overall survival. Cancer 1996;78:2111–2118.
41. Charloux A, Hedelin G, Dietemann A, et al. Prognostic value of histology in patients with non-small cell lung cancer. Lung Cancer 1997;17:123–134.
42. Komaki R, Pajak TF, Byhardt RW, et al. Analysis of early and late deaths on RTOG non-small cell carcinoma of the lung trials: comparison with CALGB 8433. Lung Cancer 1993;10:189–197.
43. Merrill RM, Henson DE, Barnes M. Conditional survival among patients with carcinoma of the lung. Chest 1999;116:697–703.
44. Paesmans M, Sculier JP, Libert P, et al. Response to chemotherapy has predictive value for further survival of patients with advanced non-small cell lung cancer: 10 years experience of the European Lung Cancer Working Party. Eur J Cancer 1997;33:2326–2332.
