The American Journal of Surgery 192 (2006) 372–378

Surgical education

The surgical efficiency score: a feasible, reliable, and valid method of skills assessment

Vivek Datta, M.D., F.R.C.S.*, Simon Bann, M.D., F.R.C.S., Mirren Mandalia, M.R.C.S., Ara Darzi, K.B.E., M.D., F.R.C.S., F.R.C.S.I., F.A.C.S., F.R.C.P.S.G., F.Med.Sci.

Department of Surgical Oncology and Technology, Imperial College, 83 Sumatra Road, West Hampstead, London NW6 1PT, United Kingdom

Manuscript received December 23, 2005; revised manuscript June 4, 2006

Abstract

Background: Technical skills assessments are being increasingly used in surgical residency programs, with the objectivity and validity of several techniques well established. However, many of these methods are labor and time intensive, limiting their feasibility. This study aims to compare more efficient techniques of skills appraisals with an established gold standard.

Methods: Thirty surgeons completed 2 previously validated laboratory-based surgical models: small bowel anastomosis and vein patch insertion. Gold standard evaluation was the Objective Structured Assessment of Technical Skills (OSATS) method. “Efficient” techniques used were (1) quality of final product (FP); (2) snapshot assessment (SS), in which task performance was edited to a 2-minute sound bite and scored with OSATS; and (3) the surgical efficiency score (SES), a combination of final product quality and hand-motion analysis. All human observer evaluations used retrospective video analysis with 3 trained observers. Nonparametric tests were used to analyze the results.

Results: With respect to small bowel anastomosis, correlations with OSATS were as follows: FP 0.341 (P = .07), SS 0.577 (P < .001), and SES 0.842 (P < .001). For vein patch insertion, the correlations were as follows: FP 0.545 (P = .001), SS 0.609 (P < .001), and SES 0.700 (P < .001). Interobserver concordance was high for both models with respect to FP (Cronbach’s alpha 0.80 for small bowel anastomosis and 0.84 for vein patch insertion). With respect to SS, interobserver reliability was high for vein patch insertion (Cronbach’s alpha 0.80) but only moderate for small bowel anastomosis (0.59).

Conclusions: The surgical efficiency score and snapshot assessments both show significant correlations with the traditional OSATS appraisals and suggest that skills assessment can be made more feasible. Correlations were closer with the former and interobserver concordance more variable with the latter, suggesting the surgical efficiency score as the most reliable of the methods evaluated.

© 2006 Excerpta Medica Inc. All rights reserved.

Keywords: Objective assessment; Surgery; OSATS; Motion analysis; Technical skill; Simulation

The need for objective measures of surgical ability has become increasingly important in recent years. Several techniques have been shown to be reliable and valid evaluators of skill and are now used in residency training programs to monitor performance and progress [1–3]. Both human observer and computer-based assessments have been used [4–6]. The Objective Structured Assessment of Technical Skills (OSATS) technique can be regarded as one of the “gold standards” of technical skills assessment [6–10]. However, one of the main drawbacks of this technique, or of any observer-based method, is the extensive commitment of both personnel and time needed to carry out such an appraisal, which is compounded by the use of multiple examiners [1,2,7]. For example, any particular OSATS multiple-bench test examination uses between 30 and 100 trained examiners to assess candidates [7,10]. In addition, live appraisal introduces the possibility of bias toward a candidate by an assessor. Retrospective analysis of video-recorded performance has been used to try to address these problems, both by reducing the number of observers needed and by blinding assessors to candidates’ identity, with very good interexaminer correlations being shown [2,11]. However, even this process is still very time consuming and labor intensive. The problem arises as to how to engineer down the process of skills assessment to a more feasible level.

* Corresponding author. Tel.: +44-207-886-1310; fax: +44-207-886-1810. E-mail address: [email protected]

0002-9610/06/$ – see front matter © 2006 Excerpta Medica Inc. All rights reserved. doi:10.1016/j.amjsurg.2006.06.001

V. Datta et al. / The American Journal of Surgery 192 (2006) 372–378

Hand-motion analysis is a truly objective measure of skill that has been extensively validated and shown to correlate closely with other (human) evaluators of ability [12–15]. The system does not measure the quality of performance but has been shown to correlate closely with systems that do. It has been proposed that surgical technical proficiency = task efficiency × task quality. Used in isolation, motion analysis is an efficient and effective measure of task efficiency, but it gives no measure of the quality of the finished article, even though a strong relationship with OSATS has been shown [14]. Therefore, a combination of movement efficiency with analysis of the quality of the final product may give a very accurate and effective assessment of technical proficiency. A second possible method is, rather than evaluating the whole task from start to finish, to analyze only the key or technically more difficult parts of a procedure, together with the overall completed product. These elements can easily be isolated from video recordings of procedures and reduce the time needed to perform an OSATS assessment. Finally, does measuring the quality of the final product alone give an efficient, easy-to-measure gauge of skill that does not necessitate complex assessments of methodology? These techniques are possible ways of performing technical skills assessment more efficiently, thus making the process more feasible. The aim of this study is to determine which of these methods best reflects actual performance as measured by one of the current “gold standard” analyses.

Methods

Two laboratory-based tasks were used, both using synthetic simulations of surgery. Both models have been shown to be valid discriminators of skill in previous studies [14,16]. The first was a small bowel anastomosis (Limbs & Things, Bristol, UK), using an interrupted, single-layer seromuscular technique (Fig. 1). The model has a luminal diameter of 20 mm, and 3-0 polyglycan (Polysorb, Autosuture, USA) suture material was used for all anastomoses. The second task was more technically challenging: a vein patch was inserted into an artery (both Annexart Angelsey, UK) at depth with restricted access by using a continuous suturing method. Double-ended 5-0 polypropylene (Surgipro Autosuture, USA) was used as the suture material. Both the artery and vein patch were prepared beforehand. A precut 30-mm longitudinal arteriotomy was made in the artery (10-mm diameter), and the patch was shaped according to a predesigned template 40 mm in length. Both of these standardized techniques are taught in the surgical training course in the United Kingdom, and all subjects had previous experience of these [2]. Synthetic, rather than cadaveric or animal, models were used to allow exact replication of conditions for each subject and therefore ensure validity when making comparisons. Each participant was given 1 attempt at performing both tasks. Previous studies with these models of surgery have shown that performance does not significantly improve with repeated trials [14,15].

Fig. 1. Small bowel anastomosis showing video recording set up and ICSAD hand motion tracking.

Participants

Thirty general surgeons and trainees were recruited for the study. These were as follows: 8 basic surgical trainees (equivalent to postgraduate year 1-2), 8 junior specialist trainees in the first 3 years of higher surgical training (equivalent to postgraduate year 3-5), 7 senior specialist trainees in the last 2 years of subspecialty training (equivalent to fellows), and 7 consultant general surgeons (equivalent to attending surgeons). All participants had at least some experience of the surgical methodology. Each candidate performed both tasks in a single session, working individually in the laboratory, with data collection taking 3 weeks.

Assessment of performance

OSATS

This consists of 2 components, a structured step-by-step checklist and an 8-parameter global rating (each scored 1-5, maximum score 40). Previous studies have revealed the global rating score to be the more accurate discriminator of performance, especially in more experienced candidates, and therefore checklist assessments were not used in this study [2,7–10,14]. The global rating parameters are shown in Fig. 2. Candidates were videotaped performing each procedure, with only their hands and arms in frame, so that assessors were blinded to their identity. Three trained examiners, all of whom were equivalent to attending surgeons and were experienced in OSATS assessments, performed their evaluations independently of each other.


GLOBAL RATING EVALUATION OF PERFORMANCE
Please circle the number corresponding to the candidate’s performance regardless of their level of training.
Each parameter is scored from 1 to 5; descriptors are given for scores 1, 3, and 5.

Respect for tissue
  1  Frequently used unnecessary force on tissue or caused damage by inappropriate instrument use
  3  Careful handling of tissue but occasionally caused inadvertent damage
  5  Consistently handled appropriately with minimal damage to tissue

Time and motion
  1  Many unnecessary moves
  3  Efficient time & motion but some unnecessary moves
  5  Clear economy of movement and maximum efficiency

Instrument handling
  1  Repeatedly makes awkward or tentative moves with instruments through inappropriate use
  3  Competent use of instruments but occasionally appeared stiff or awkward
  5  Fluid movements with instruments and no stiffness or awkwardness

Suture handling
  1  Awkward and unsure with repeated entanglement, poor knot tying and inability to maintain tension
  3  Careful and slow with majority of knots placed correctly with appropriate tension
  5  Excellent suture control with correct placement of knots and correct tension

Flow of operation
  1  Frequently stopped operating and seemed unsure of next move
  3  Demonstrated some forward planning and reasonable progression of procedure
  5  Obviously planned operation with efficiency from one move to another

Knowledge of procedure
  1  Insufficient knowledge. Looked unsure and hesitant
  3  Knew all important steps of operation
  5  Demonstrated familiarity with all steps of operation

Overall performance
  1  Very poor
  3  Competent
  5  Clearly superior

Quality of final product
  1  Very poor
  3  Competent
  5  Clearly superior

Fig. 2. The global rating scale from the OSATS [9].

Snapshot assessment

This refers to an OSATS evaluation of selected parts of each procedure. The video recordings of performance were edited down to 2 minutes or so per task, focusing on the key and technically most difficult parts of each procedure. For the small bowel model, the 3 interrupted sutures at each corner of the anastomosis were recorded. Clinically, this is the point at which the majority of anastomotic leaks occur. For the vein patch insertion task, the 3 patch/artery bites at the apex of the graft were recorded, which is the most technically difficult part of the procedure. The final six-throw knot was also included in the “snapshot.” In addition, the final completed task for both models was recorded. As before, 3 examiners used an 8-parameter global rating evaluation to assess performance.

Final product score

The quality of final product, from the global rating evaluations, was scored 1 to 5. Once again, examiners used retrospective, video-captured data and were blinded to candidates’ identity.

Motion analysis

Motion analysis was performed by using the Imperial College Surgical Assessment Device (ICSAD). This is a combination of a commercially available electromagnetic tracking system (Isotrak II; Polhemus Inc, Chester, VT) and a bespoke computer software program. The latter has been developed within Imperial College by our department in conjunction with the Department of Computing. A single 10-mm electromagnetic tracker is attached to the dorsum of each hand, at the midshaft point of the third metacarpal. Surgical latex gloves are then worn to help secure the trackers, ensure no unwanted slippage, and mimic real life conditions. The system collects the x,y,z Cartesian coordinate information from each tracker at a resolution of 1 mm and frequency of 20 Hz. These raw 3-dimensional positional data collated from the tracking system are transferred to computer and converted into 3 scores of dexterity by the custom software. These are (1) the number of movements made by each hand (an individual movement being defined as a change in velocity), (2) the path length of hand/instrument travel, and (3) the time taken to complete the procedure. The software program includes a Gaussian filter, which eliminates background noise, ensuring that only purposeful actions are recorded. The system has been validated and used extensively in objective measures of open and laparoscopic surgery [11–16].

Surgical efficiency score

This quotient was determined by combining the task efficiency with the quality of the final product. For both tasks, ICSAD was used to calculate the total number of hand movements made; data collection was performed as before. In addition, each completed procedure was scored 1 to 5 for the quality of the final product (5 indicating superior performance) by 3 expert examiners, who appraised it retrospectively from video-recorded images. The surgical technical proficiency rating was calculated as follows:

\[
\text{Surgical efficiency score} = \frac{\text{End product quality score}}{\text{No. of hand movements}} \times 1000
\]

Previous studies have shown that the number of hand movements needed to complete the task decreases with increasing experience and expertise. Therefore, an inverse quotient is used. The factor of 1,000 was included in the equation to make the final ratings easier to interpret.

Statistical methods

Because the data did not exhibit a normal distribution, nonparametric tests were used for all analyses. Spearman nonparametric coefficients were used to determine correlations between datasets. Cronbach’s alpha was used to determine interobserver reliability for the global ratings. A P < .05 is usually deemed significant. However, multiple determinant analyses were made (3 in total), and significance was therefore adjusted accordingly, with an unadjusted P value < .01 deemed to be significant. All reported P values are unadjusted for multiple comparisons. Diagrammatic summaries of the results are presented by using scatter plots.
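The surgical efficiency score and the two statistics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the example numbers are hypothetical, tied values are given their mean rank, and Cronbach's alpha is computed in its standard form with each examiner treated as an item.

```python
from math import sqrt
from statistics import mean, variance


def surgical_efficiency_score(final_product_score, hand_movements):
    """End product quality score / no. of hand movements x 1000."""
    if hand_movements <= 0:
        raise ValueError("hand_movements must be positive")
    return final_product_score / hand_movements * 1000


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def cronbach_alpha(ratings_by_examiner):
    """Cronbach's alpha; one list of candidate scores per examiner."""
    k = len(ratings_by_examiner)
    totals = [sum(scores) for scores in zip(*ratings_by_examiner)]
    item_variance = sum(variance(r) for r in ratings_by_examiner)
    return k / (k - 1) * (1 - item_variance / variance(totals))


if __name__ == "__main__":
    # Hypothetical data for 5 candidates (not taken from the study).
    final_product = [2.0, 3.0, 3.0, 4.0, 5.0]
    movements = [2000, 1800, 1500, 1250, 1000]
    global_rating = [18, 22, 27, 31, 36]

    ses = [surgical_efficiency_score(q, m)
           for q, m in zip(final_product, movements)]
    print("SES per candidate:", [round(s, 2) for s in ses])
    print("Spearman rho, SES vs global rating:",
          round(spearman_rho(ses, global_rating), 3))
```

Because the number of hand movements sits in the denominator, a candidate who reaches the same final product score with fewer movements receives a higher score, which is the inverse relationship described above.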

Results

Small bowel anastomosis

The results, including correlations, are summarized in Table 1. There was a significant relationship between the number of hand movements made and the OSATS global rating evaluation score. Both the surgical efficiency score (Fig. 3) and the snapshot evaluation (Fig. 4) showed a significant relationship with the global rating, the former quotient showing the tightest relationship of all the parameters measured. In contrast, evaluation of the quality of the final product alone, with no measure of efficiency, revealed no statistically significant correlation with the OSATS global rating. Interobserver reliability between the 3 examiners was high for the traditional global rating evaluation (Cronbach’s alpha 0.84) but only moderate for the snapshot totals (Cronbach’s alpha 0.59). In addition, the mean for the traditional OSATS global evaluation was higher and the range of performance greater than that of the snapshot evaluation scores. Interexaminer reliability for the analysis of the quality of the final product was good (Cronbach’s alpha 0.80).

Table 1
Results and correlations between methods of performance assessment with OSATS global rating evaluation score for small bowel anastomosis model

Measure                       Mean/standard deviation   Correlation with average global rating score
Global rating score           27.1/5.0
No. of hand movements made    1,564/452                 0.794   P < .001*
Final product score           3.1/0.5                   0.341   P = .07
Surgical efficiency score     2.18/0.79                 0.842   P < .001*
Snapshot global rating        25.3/3.3                  0.577   P < .001*

* Denotes a statistically significant result.

Fig. 3. Surgical efficiency score (vertical axis) versus global rating score (horizontal axis) for small bowel anastomosis model. The line of best fit is shown. R² = 0.65

Table 2
Results and correlations between methods of performance assessment with OSATS global rating evaluation score for vein patch insertion model

Measure                       Mean/standard deviation   Correlation with average global rating score
Global rating score           25.2/5.9
No. of hand movements made    1,166/305                 0.579   P = .001
Final product score           2.6/0.6                   0.545   P = .001*
Surgical efficiency score     2.47/0.88                 0.700   P < .001*
Snapshot global rating        23.8/4.6                  0.609   P < .001*

* Denotes a statistically significant result.

Vein patch insertion

The results and correlations are summarized in Table 2. Again, there was a significant relationship between the number of hand movements made and the OSATS global rating evaluation score. Once more, the surgical efficiency score (Fig. 5) and the snapshot evaluation score (Fig. 6) showed a significant relationship with the global rating, the former quotient showing the tightest relationship of all the parameters measured. In addition, evaluation of the quality of the final product alone also correlated with the OSATS global rating, in contrast to the findings seen in the small bowel anastomosis model. Interobserver reliability between the 3 examiners was high for both the traditional global rating evaluation (Cronbach’s alpha 0.86) and the snapshot totals (Cronbach’s alpha 0.80). Again, the mean for the traditional OSATS global evaluation was higher and the range of performance greater than that of the snapshot evaluation scores. Interexaminer reliability for the analysis of the quality of the final product was high (Cronbach’s alpha 0.84). With respect to both models, snapshot total scores were lower than the equivalent traditionally appraised global ratings and the spread of performance was less.

Fig. 4. Snapshot rating (vertical axis) versus global rating score (horizontal axis) for small bowel anastomosis model. The line of best fit is shown. R² = 0.33

Fig. 5. Surgical efficiency score (vertical axis) versus global rating score (horizontal axis) for vein patch insertion model. The line of best fit is shown. R² = 0.49

Fig. 6. Snapshot rating (vertical axis) versus global rating score (horizontal axis) for vein patch insertion model. The line of best fit is shown. R² = 0.37

Discussion

Although reliability and validity must be shown before any appraisal system can be used with confidence, the issue of feasibility must be addressed before widespread adoption of any assessment process can take place. The OSATS method has been used as a reference point, with justification, as its validity and reliability have been well established. However, the issue of feasibility is also important, and here local differences give different perspectives on what constitutes it. For example, the University of Toronto, when using the OSATS system, assessed up to 50 subjects by using over 100 trained examiners performing evaluations live [7,10]. Clearly, the availability of senior surgeons there is far in excess of what is possible and practical in the United Kingdom. Even with video recording and retrospective analysis with 3 trained observers, the process is still time and labor intensive, and the feasibility of performing wide-scale assessment is diminished.

In this study, several methods of engineering down the process of technical skills assessment have been compared with the recognized “gold standard” OSATS method. Once again, hand movement analysis has shown a significant relationship with the OSATS global assessment in both inanimate models. Motion analysis using ICSAD is a time-efficient technique that does not require expert examiners to perform the assessment. However, work from the field of biomechanics has suggested that motion-analysis systems are not designed to replace or make obsolete the biomechanical examination and “trained human eye”; rather, they act to complement these skills [17]. The surgical efficiency score works along these lines, with hand-movement analysis complementing the expert human interpretation of performance. This factor showed very strong relationships with OSATS global evaluations in both models, with closer correlations than any of the other assessment methods. The technique is also time and labor efficient, with the amount of time needed to assess the quality of performance minimal compared with the traditional OSATS evaluation process. The good interexaminer reliability of the end-product evaluation score adds further confidence to this parameter.

Measuring the quality of the final end product produced variable results. In the case of the small bowel anastomosis, the final product rating did not predict the global rating score. With respect to the vein patch insertion model, end-product evaluation did show a significant relationship, although the correlation was not as tight as those of the other 2 parameters used in this study. This echoes the results achieved in a previous study, in which, overall, these measures gave only modest correlation with true performance [18].

The results from the snapshot evaluation were varied. There was a significant correlation between the traditional and snapshot OSATS global score in both models, but again the relationship was not as tight as that achieved by the surgical efficiency score. Closer inspection of the results revealed that, overall, the snapshot total scores were lower than the equivalent traditionally appraised global ratings, and the spread of performance was less. This suggests that it is difficult to differentiate good performers by using the snapshot technique. It can be argued that isolating virtuoso performance is not necessary for a widespread assessment process; the emphasis should be on highlighting deficient performance, possibly initially as part of a screening process, before more detailed scrutiny is performed. However, it is apparent from the scatter plots of both tasks (Figs. 4 and 6) that there were several trials that were given good snapshot ratings but actually fared quite badly with the traditional global assessment. Several reasons might explain this. For example, with respect to the vein patch model, several candidates repeatedly entangled and knotted the polypropylene suture at a point not recorded in the snapshot video segment and were correspondingly marked down on the full global assessment. In addition, in this model, suturing backhand rather than forehand caused several subjects problems that, again, were not apparent from observing suturing around the toe of the patch. More detailed snapshot segments might allow these situations to be observed; however, the longer the segment, the less powerful the argument for using this method over the traditional OSATS evaluation. In addition, interobserver concordance was more variable with snapshot appraisals, somewhat reducing the confidence in this methodology. Finally, although the time spent by examiners reviewing procedures was greatly reduced, the process of information capture and editing was still significant, therefore reducing the feasibility of this technique.

It is possible that the high correlation coefficients may have been exaggerated by the sample size and range of individual performance, which may be a limitation of this study. In addition, only 1 trial of each task per candidate was used, and it has been suggested that allowing each participant several attempts at the tasks may give a better average of performance and so increase the confidence in the methodology. However, it has previously been shown that, with these procedures, there is no significant change in performance with repeated trials [14,15].


Conclusion

The surgical efficiency score, combining economy of motion, as measured by hand motion analysis/ICSAD, and an end-product quality evaluation score, shows the closest relationship of all with OSATS global evaluation of the same task. It may therefore be possible to construct a technical skills appraisal by using this methodology. The speed/quality and snapshot ratings have also shown significant correlations, but these associations are not as tight as those achieved by the surgical efficiency score.

References

[1] Bruce N. Evaluation of procedural skills of internal medicine residents. Acad Med 1989;64:213–6.
[2] Mackay S, Datta V, Chang A, et al. Multiple Objective Measures of Skill (MOMS): a new approach to the assessment of technical ability in surgical trainees. Ann Surg 2003;238:291–300.
[3] Baldwin PJ, Paisley AM, Brown SP. Consultant surgeons’ opinion of the skills required of basic surgical trainees. Br J Surg 1999;86:1078–82.
[4] Batra EK, Taylor PT, Franz DA, et al. A portable tensiometer for assessing surgeon’s knot tying technique. Gynecol Oncol 1993;48:114–8.
[5] Francis NK, Hanna GB, Cuschieri A. Reliability of the Advanced Dundee Endoscopic Psychomotor Tester for bimanual tasks. Arch Surg 2001;136:40–3.
[6] Reznick R, Regehr G, MacRae H, et al. Testing technical skill via an innovative “bench station” examination. Am J Surg 1997;173:226–30.
[7] Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–8.
[8] Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med 1998;73:993–7.
[9] Anastakis DJ, Regehr G, Reznick RK, et al. Assessment of technical skills transfer from the bench training model to the human model. Am J Surg 1999;177:167–70.
[10] MacRae H, Regehr G, Leadbetter W, Reznick RK. A comprehensive examination for senior surgical residents. Am J Surg 2000;179:190–3.
[11] Datta VK, Bann SD, Beard J, et al. Comparison of bench test evaluations of surgical skill with live operating performance assessments. J Am Coll Surg 2004;199:603–6.
[12] Taffinder NJ, McManus IC, Gul Y, et al. Effect of sleep deprivation on surgeons’ dexterity on laparoscopy simulator. Lancet 1998;352:1191.
[13] Taffinder N, Smith S, Mair J, et al. Can a computer measure surgical precision? Reliability, validity and feasibility of the ICSAD. Surgical Endoscopy 1999;13(suppl 1):81.
[14] Datta V, Chang A, Mackay S, Darzi A. The relationship between motion analysis and surgical technical assessments. Am J Surg 2002;184:70–3.
[15] Datta V, Mandalia M, Mackay S, et al. Relationship between skill and outcome in the laboratory-based model. Surgery 2002;131:318–23.
[16] Datta V, Mackay S, Mandalia M, Darzi A. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg 2001;193:479–85.
[17] Blake RL, Ferguson HJ. The motion analysis system for dynamic gait analysis. Clin Podiatr Med Surg 1993;10:501–27.
[18] Szalay D, MacRae H, Regehr G, Reznick R. Using operative outcome to assess technical skill. Am J Surg 2000;180:234–7.