EDUCATIONAL ADVANCE
Learning Curves in Emergency Ultrasound
Education
David J. Blehar, MD, Bruce Barton, PhD, and Romolo J. Gaspari, MD, PhD
Abstract
Objectives: Proficiency in the use of bedside ultrasound (US) has become standard in emergency
medicine residency training. While milestones have been established for this training, supporting data
for minimum standard experience are lacking. The objective of this study was to characterize US
learning curves to identify performance plateaus for both image acquisition and interpretation, as well as
compare performance characteristics of learners to those of expert sonographers.
Methods: A retrospective review of an US database was conducted at a single academic institution. Each
examination was scored for agreement between the learner and expert reviewer interpretation and given
a score for image quality. A locally weighted scatterplot smoothing method was used to generate a
model of predicted performance for each individual examination type. Performance characteristics for
expert sonographers at the site were also tracked and used in addition to performance plateaus as
benchmarks for learning curve analysis.
Results: A total of 52,408 US examinations performed between May 2007 and January 2013 were
included for analysis. Performance plateaus occurred at different points for different US protocols, from
18 examinations for soft tissue image quality to 90 examinations for right upper quadrant image
interpretation. For the majority of examination types, a range of 50 to 75 examinations resulted in both
excellent interpretation (sensitivity > 84% and specificity > 90%) and good image quality (90% of the
image quality benchmark of expert sonographers).
Conclusions: Educational performance benchmarks occur at variable points for image interpretation and
image quality for different examination types. These data should be considered when developing training
standards for US education as well as experience requirements for US credentialing.
ACADEMIC EMERGENCY MEDICINE 2015;22:574–582 © 2015 by the Society for Academic Emergency
Medicine
The use of clinician-performed ultrasound (US)
examination has increased dramatically over the
past several decades. Initially adopted by relatively few physicians, it has become part of the standard
practice of emergency medicine (EM) in academic and
community settings alike and is now considered a requisite skill for graduating EM residents. The American
College of Emergency Physicians (ACEP) instituted US
training guidelines based on expert consensus in 2001
and again in 2008.1 In 2008 the Council of Emergency
Medicine Residency Directors (CORD) introduced minimum training guidelines for clinician-performed US. In
2012 the Accreditation Council for Graduate Medical
Education (ACGME) designated US as one of the milestone competencies for graduating EM residents.2
The initial ACEP guidelines focused on the number of
US examinations that were required to be performed by
a physician prior to being considered competent. In the
most recent ACEP guidelines, it is recommended that
25 to 50 examinations be performed for each of the core
applications. While the ACEP guidelines note that other
metrics may be used to determine competency, the
absolute number of US examinations performed
remains the most common (and easiest to obtain) metric. Likewise, while the ACGME milestone competencies
focus on a variety of metrics, they do include a cutoff of
150 total examinations as minimum experience to complete residency training. Choosing a specific number as
a benchmark for competency remains a theme for
national guidelines in EM and other specialties.
From the Department of Emergency Medicine (DJB, RJG) and the Department of Quantitative Health Sciences (BB), University of
Massachusetts Medical School, Worcester, MA.
Received July 29, 2014; revision received November 7, 2014; accepted November 10, 2014.
Presented at the Society for Academic Emergency Medicine, Dallas, TX, May 2014.
The authors have no relevant financial information or potential conflicts to disclose.
Supervising Editor: John H. Burton, MD.
Address for correspondence and reprints: David J. Blehar, MD; e-mail: [email protected].
A related article appears on page 597.
doi: 10.1111/acem.12653
The recommended 25 to 50 examination cutoff was
chosen based on expert consensus, as there has been
very little published literature regarding the learning
curves for US. Previous studies on learning curves for
clinician-performed US have focused primarily on interpretation metrics without consideration for image
acquisition skill.3 A consensus conference on how to
evaluate for US competency identified a number of elements, including image optimization and image interpretation, but little data exist on how these metrics
change as individuals gain experience.4
The objective of this study was to characterize learning
curves for novice physician sonographers to identify
experience levels where educational performance plateaus occur. We additionally sought to identify experience
levels requisite for training sonographers to approach the
performance standards of expert physician sonographers
for image acquisition and interpretation. This study uses a
large educational database from a single site to calculate
learning curves for each of the core EM US applications.
We compare learner metrics to expert metrics to introduce a discussion on learning curves and competency for
each individual US application.
METHODS
Study Design
This study was a retrospective review of an educational
database from a single EM residency training program
over a 5-year period, from May 2007 to January 2013.
The study was reviewed by the local institutional review
board and was determined to be exempt from informed
consent requirements.
Study Setting and Population
Ultrasound examinations in the database are a combination of those performed in the course of clinical care as
well as those performed by learning sonographers
solely for educational purposes across four separate
emergency departments (EDs) staffed by a single academic EM physician group. The four sites ranged in
character from a small community ED with annual volume of 20,000 patients, to a large urban Level I trauma
center with annual volume over 100,000 patients. The
US machines used during the study period varied by
location and over time, with acquisition of newer equipment as older machines were removed from service.
Equipment used included a range of Sonosite machines
(Titan and MicroMaxx, Sonosite, Bothell, WA) and Zonare machines (Zone Ultra, ZS3, Zonare Medical Systems, Inc., Mountain View, CA). Still and video images
were captured in DICOM video file format stored to
DVD storage or USB drive (Zonare) or recorded as continuous video clips to DVD recorder (Sonosite). Changes
in equipment did not have a significant effect on image
quality or accuracy metrics. There was no significant
difference in image quality (p = 0.62, Student’s t-test)
between the first and last year of our study data. Similarly, sensitivity (0.86 vs. 0.88) and specificity (0.96
vs. 0.97) did not differ between the first and last year
in the study.
Learning sonographers in the database included not
only training residents, but also attending physician
staff without prior US training or experience. All sonographers, regardless of whether they were residents or
faculty, underwent similar education. Prior to inclusion
in the database, they underwent a 1-day educational
session focusing on basic principles of the core US
applications. Residents participated in 1 dedicated
month in each of their 3 years of training that included
a focus on US, during which time they attended weekly
educational sessions. These periodic educational sessions focused on core US applications including limited
US of the aorta, chest wall, endovaginal uterus, focused
assessment with sonography in trauma (FAST), right
upper quadrant (RUQ), lower-extremity duplex, renal,
soft tissue, and limited cardiac echo. Each educational
session consisted of a 1-hour lecture, 60 to 90 minutes
of hands-on education, and 60 to 90 minutes of image
review. Faculty learners attended identical educational
sessions over a similar 3-year time period. All learners,
including both residents and faculty, attended between
eight and 12 of these complete sessions. Additional
hands-on sessions were available for learners. An additional US elective was available to third-year residents
as either a 2- or a 4-week rotation. All learners were
able to perform and record US during clinical shifts and
received automated timely feedback on their images
using the system described in the next section.
Study Protocol
Digital video of every US examination performed in the
ED was recorded at the time of performance and
uploaded into an electronic database for review. All pertinent patient information, the indication for imaging,
and the initial image interpretation of the diagnostic
study by the sonographer performing the examination
were recorded. Examinations performed solely for the
purposes of the educational experience of the learning
sonographer were logged as such by the sonographer
and designated as educational US in the database. All
other examination indications were designated as clinical US. All images were reviewed in a standardized
fashion by one of five physicians expert in bedside US.
The US experience of the expert reviewers ranged from
5 years and over 4,000 US examinations to 15 years and
over 20,000 US performed and/or reviewed. Image
reviews were performed unblinded to the initial interpretation, the identity of the patient, or the performing
sonographer.
Required images and image elements differed for the
different US protocols performed in this study. These
requirements did not change regardless of the experience level or type of learner (resident or faculty). Some
protocols required a specific number of images, such as
four images for the FAST exam and two images for soft
tissue examination, while other protocols could require
a variable number of images. For example, chest wall
US required at least one image of each hemithorax, but
multiple views of each hemithorax could be submitted if
needed. Lower-extremity duplex could include between
six and 12 (or more) paired images of compressed and
uncompressed veins in the lower leg.
The image review focused on image interpretation,
image acquisition skill, and resultant image quality.
Image interpretation consisted of a predefined
structured interpretation focusing on a primary finding
specific to each individual US application. Interpretation
was dichotomous, either positive or negative for the primary finding for that application. A listing of the core
US applications and the primary findings are included
in Table 1. Each examination was scored for agreement
between the initial interpretation (learner) and final
review (expert). During the initial study period (first
3 years of the database), image review was performed
twice weekly by one of three expert physician reviewers. During the final 2 years of the study period, image
review was performed on a daily basis Monday through
Friday by one of five reviewers.
This image review performed by the expert reviewer
also focused on metrics related to image acquisition.
These metrics included image quality ratings for the
images as a whole on a predetermined ordinal scale
from 1 to 8, with 8 representing perfect image quality
and 1 representing poor image quality. Any score of 4
or less was predefined to indicate that the images were
of sufficiently poor quality to adversely affect the interpretation. All data were recorded in the electronic database and sent back to the initial sonographer via e-mail
for feedback at the time of the review.
Data were downloaded from the electronic database
for all users in the system. US examinations performed
by any individual with experience that preceded inclusion in the electronic tracking system were excluded
from data analysis. In addition, all US of noncore applications were excluded from analysis (e.g., musculoskeletal, US-guided nerve blocks, ocular US). For analysis of
interpretative skill, those US that did not include an
interpretation by the initial sonographer were excluded.
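As a concrete sketch of how these exclusion criteria might be applied to the exported data (the column and application names are hypothetical, not the authors' actual database schema or SAS code):

```python
import pandas as pd

CORE_APPS = {"aorta", "cardiac", "chest wall", "endovaginal uterine", "FAST",
             "lower-extremity duplex", "renal", "right upper quadrant", "soft tissue"}

def apply_exclusions(db: pd.DataFrame) -> pd.DataFrame:
    """Drop examinations by sonographers with pre-database experience
    and examinations of noncore applications."""
    return db[(~db["prior_experience"]) & (db["application"].isin(CORE_APPS))]

# For analysis of interpretive skill, also drop exams lacking an initial learner read:
# interp = apply_exclusions(db).dropna(subset=["learner_interpretation"])
```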
Table 1
Definition of Ultrasound Terms and Their Findings

Term | Finding | Short Definition of Positive Finding
Aorta | Abdominal aortic aneurysm | Diameter of aorta measuring ≥3 cm.
Cardiac | Pericardial effusion | Anechoic fluid visualized in pericardial space.
Chest wall | Pneumothorax | Lack of visualized sliding of interface between chest wall and lung tissue.
Endovaginal uterine | Intrauterine pregnancy | Visualized fetus and/or yolk sac in endometrial cavity.
FAST | Free fluid | Anechoic fluid visualized in peritoneal or pericardial space.
Lower-extremity duplex | Deep vein thrombosis | Inability to compress deep veins of lower extremity.
Renal | Hydronephrosis | Visualized branched anechoic center to kidney.
Right upper quadrant | Gallstones | Mobile hyperechoic structures in lumen of gallbladder.
Soft tissue | Abscess | Disruption of superficial soft tissue with focal collection of anechoic fluid.

FAST = focused assessment with sonography in trauma.
Image quality scores were converted to a dichotomous quality metric: poor quality limiting interpretation (score of 4 or below) or good quality (score of 5 or greater).
Data Analysis
The data were not normally distributed (Kolmogorov-Smirnov goodness-of-fit test for uniform distributions)
and are presented as medians with interquartile ranges
(IQR). Learner interpretation was analyzed using sensitivity and specificity analysis with the findings from
expert review as the criterion standard. All examinations were included for analysis, regardless of image
quality.
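As an illustration of this analysis (a minimal sketch, not the authors' SAS code; the column names are assumptions), learner interpretation for one application could be scored against the expert read as follows:

```python
import pandas as pd

def sensitivity_specificity(exams: pd.DataFrame) -> tuple[float, float]:
    """Score dichotomous learner reads against expert review (criterion standard).
    Assumes boolean columns 'learner_positive' and 'expert_positive'."""
    tp = (exams["learner_positive"] & exams["expert_positive"]).sum()
    tn = (~exams["learner_positive"] & ~exams["expert_positive"]).sum()
    fn = (~exams["learner_positive"] & exams["expert_positive"]).sum()
    fp = (exams["learner_positive"] & ~exams["expert_positive"]).sum()
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Per-application metrics, all examinations included regardless of image quality:
# results = {app: sensitivity_specificity(g) for app, g in exams.groupby("application")}
```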
Learner image quality was assessed by comparing
their image quality to the image quality of the expert
reviewers’ independent imaging performed during the
study. To describe expert reviewer benchmarks related
to image quality, the US images obtained and interpreted by the expert reviewers were analyzed separately. Each image was interpreted and reviewed in a
fashion identical to those included in the study database. Data on the average image quality for each US
application were calculated.
Internal validity of the rating scale was analyzed by
comparing the agreement of the five expert reviewers.
All reviewers independently and blindly reviewed and
rated 100 randomly selected US images from the database for image quality. Agreement between reviewers
was analyzed using a kappa analysis on those image
interpretations.
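The article does not specify which kappa variant was used for the five reviewers; one plausible sketch is an average of pairwise Cohen's kappa values over the dichotomized ratings of the 100 images (function and variable names are illustrative):

```python
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

def mean_pairwise_kappa(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_images, n_reviewers) of dichotomized quality scores.
    Returns the mean Cohen's kappa over all reviewer pairs."""
    n_reviewers = ratings.shape[1]
    kappas = [cohen_kappa_score(ratings[:, i], ratings[:, j])
              for i, j in combinations(range(n_reviewers), 2)]
    return float(np.mean(kappas))
```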
A locally weighted scatterplot smoothing method
(loess) was used to generate a model of predicted performance (with 95% CI) for each individual examination
type. Performance curves were analyzed to determine
plateau points where experiential benefit diminished. In
addition, performance curves were analyzed with a reference to the performance of the expert reviewers. SAS
(version 9.3) software was used for data analyses. Not
all patients will have perfect imaging, so data for image
quality was normalized to the image quality for images
obtained by the experts. In the displayed data, 100%
references the average image quality of the experts for
that US protocol and 0 represents the lowest possible
score for the scale.
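The authors fit their curves in SAS; a rough equivalent of the smoothing and normalization steps in Python (the lowess fraction and variable names are assumptions, not taken from the article) might look like:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def image_quality_curve(experience: np.ndarray, quality: np.ndarray,
                        expert_mean: float, frac: float = 0.3) -> np.ndarray:
    """Loess-smoothed image quality versus cumulative experience, normalized so that
    100% equals the experts' mean quality and 0 the lowest possible score (1)."""
    scale_min = 1.0
    normalized = (quality - scale_min) / (expert_mean - scale_min) * 100.0
    # Returns an (n, 2) array of sorted experience values and smoothed performance.
    return lowess(normalized, experience, frac=frac)
```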
While a plateau point within a curve can be estimated
visually, for the purposes of this study the plateau was
mathematically defined as the point where there was a
change in slope for the interval immediately preceding
a specific point of US experience (x axis) compared to
the slope immediately following that point. Slopes were
calculated as the change in percent performance (y axis)
over the number of US performed (x axis). For the purpose of this study, only slope changes greater than 25%
were considered eligible as a plateau point. For curves
with multiple changes in slope, the point of the greatest
slope change was used as the plateau point for this
study.
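A sketch of this plateau rule, assuming the smoothed curve has been evaluated on a grid of experience levels (the comparison window is an assumption; the article specifies only the intervals immediately preceding and following a point):

```python
import numpy as np

def plateau_point(n_exams: np.ndarray, performance: np.ndarray,
                  window: int = 10, min_drop: float = 0.25):
    """Return the experience level where the slope of the learning curve falls by
    more than 25% relative to the preceding interval; if several points qualify,
    return the one with the greatest slope change. Returns None if none qualify."""
    best_x, best_drop = None, min_drop
    for i in range(window, len(n_exams) - window):
        before = (performance[i] - performance[i - window]) / (n_exams[i] - n_exams[i - window])
        after = (performance[i + window] - performance[i]) / (n_exams[i + window] - n_exams[i])
        if before <= 0:
            continue  # only consider points where the curve was still improving
        drop = (before - after) / before  # fractional decrease in slope
        if drop > best_drop:
            best_x, best_drop = n_exams[i], drop
    return best_x
```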
RESULTS
A total of 191 EPs performed a total of 89,052 US examinations. After excluding experienced physicians (those
who began US imaging prior to starting this study) and
noncore applications, 101 EPs and 52,408 examinations
were included in the data set for analysis. US examinations in the database included those performed for educational purposes (33%), as well as clinical purposes
(66%). There were nine image applications included in
the final data set, representing the core US applications
as defined by ACEP. The number of US examinations in
each application ranged from 12,963 (FAST) to 1,253
(endovaginal uterus). Overall, 12.5% of imaging was
positive for pathology, but the percentage of positive
findings varied by US application, with a range of 4.5%
(chest wall) to 56% (soft tissue; see Table 2).
The median image rating (rated from 1 to 8) for
images obtained by expert sonographers provided a
benchmark for the image ratings of learners. The median image quality score for each of the US applications
for those images obtained by expert physicians was as
follows: aorta, 7 (IQR = 5 to 8); cardiac, 6 (IQR = 5 to 8);
chest wall, 8 (IQR = 7 to 8); endovaginal uterus, 8 (IQR =
7 to 8); FAST, 7 (IQR = 6 to 8); lower-extremity duplex, 7
(IQR = 6 to 8); renal, 7 (IQR = 6 to 8); RUQ, 7 (IQR = 6 to
8); and soft tissue, 8 (IQR = 7 to 8).
The overall image quality and agreement with final
interpretation varied by US application for US acquired
by learners. Agreement between reviewers related to
rating the images on the image rating scale was good
(κ = 0.81). The median image rating for all US in the
database was 6 on the 8-point ordinal scale. The US
applications with the worst image quality were the cardiac and aorta examinations, with median scores of 6
(IQR = 5 to 8), while the chest wall and renal examinations had the best image ratings, with medians of
7 (IQR = 7 to 8) for chest wall and 7 (IQR = 6 to 8) for
renal. The overall agreement between the initial sonographer interpretation and the expert review was 95.9%,
but there was variability by US application.
The learning curves for image interpretation differed
by US application with regard to initial agreement level,
slope of learning curve, and overall shape of the learning curve. Most of the learning curves demonstrate a
slow steady improvement in agreement as experience
increases until a point of plateau, where further experience is associated with little or no improvement. All
image interpretation learning curves are displayed in
Figure 1.
Interpretation performance plateaus differed for the
different imaging protocols (Table 3). Some US protocols had plateau points that occurred at relatively lower
experience levels. Interpretation plateaus for soft tissue
(for abscess) and cardiac (for pericardial effusion)
occurred at 27 and 30 US, respectively. Other examinations such as FAST, chest wall, and aorta required more
experience (57, 60, and 66 US, respectively). Renal (78
US) and RUQ (90 US) required the most experience to
reach interpretation plateaus. Plateau points for endovaginal uterus and lower-extremity duplex were not definable.
Another way to analyze the changes in interpretation
performance for learners over time is to compare sensitivity and specificity of learners as they accumulate
experience. For most protocols, the sensitivity and specificity improved over time (Figures 2 and 3). Soft tissue
and endovaginal uterus were the easiest to learn to
interpret, with sensitivities in the mid-90s. The FAST
exam was the hardest protocol to learn to interpret, as
it demonstrated the lowest sensitivity, with a peak
around 80%. All protocols had excellent specificity, with
soft tissue and renal the only protocols below a specificity of 96%. Performing 50 US is a common goal for
many learners based on current guidelines, and this
would produce a sensitivity and specificity greater than
84 and 90%, respectively, for all US protocols with the
exception of the FAST exam, where performing 50 US
produces sensitivity of 80% and specificity of 96%.
The learning curves for image acquisition also differed by application (see Figure 4). Performance plateaus were identified for five of nine examination types
and occurred earliest for soft tissue (18) and latest for
aorta (84; see Table 3). Of the four learning curves that did not display plateau points, three demonstrated no qualifying change in slope and one had insufficient data. Image quality was easiest
to obtain for chest wall US, where performing three US
resulted in 95% of the image quality obtained by
experts. Image quality was hardest to learn for cardiac,
endovaginal uterus, and lower-extremity duplex, where
learners never surpassed 90% of the image quality of
Table 2
Characteristics of Ultrasounds Included in the Database

Examination Type | Number of US Examinations | Number of Sonographers | Positive Findings, n (%) | Image Rating, Mean (95% CI) | Interpretation Agreement (%)
Aorta | 6,183 | 99 | 354 (5.3) | 5.8 (5.75–5.85) | 98.7
Cardiac | 5,689 | 100 | 505 (8.9) | 5.4 (5.35–5.45) | 95.6
Chest wall | 6,713 | 97 | 304 (4.5) | 7.0 (6.97–7.03) | 98.6
Endovaginal uterine | 1,253 | 89 | 308 (24.6) | 5.8 (5.7–5.9) | 90.7
FAST | 12,963 | 99 | 1,290 (10.0) | 6.2 (6.17–6.23) | 95.7
Lower-extremity duplex | 2,871 | 98 | 174 (6.1) | 5.8 (5.73–5.87) | 96.9
Renal | 6,173 | 99 | 802 (13.0) | 6.3 (6.26–6.34) | 93.2
Right upper quadrant | 8,118 | 100 | 1,432 (17.6) | 6.0 (5.96–6.04) | 95.5
Soft tissue | 2,445 | 97 | 1,377 (56.3) | 6.8 (6.74–6.86) | 92.2
Total | 52,408 | 101 | 6,546 (12.5) | 6.1 (6.09–6.11) | 95.9

FAST = focused assessment with sonography for trauma.
Figure 1. Image interpretation learning curves. Predicted percentage agreement with expert review is plotted as a function of
examination experience with surrounding 95% CI.
the experts. Both FAST and RUQ demonstrated long
learning curves, where it took 183 and 96 US, respectively, to reach 90% of the image quality of an expert.
As mentioned previously, performing 50 US is a common goal for many learners based on current guidelines, and this would produce different skill levels for
the different US protocols. Performing 50 US resulted in
an image quality relative to expert as follows: aorta
(73%), cardiac (68%), chest wall (93%), endovaginal
uterus (>76%), FAST (84%), lower-extremity duplex
(51%), renal (87%), RUQ (79%), and soft tissue (94%).
DISCUSSION
It is logical to expect that individuals learning US will
demonstrate an increase in skill level over time. The
learning curves for interpretation in this article demonstrate a gradual improvement over time, but the degree
of improvement was relatively shallow. This may be
related to the fact that the agreement was relatively
high for most US applications, even for the first few US
in the learning experience. For some of the examination
types, this initial high performance level may in part
explain the absence of a discernable performance plateau (e.g., FAST exam image quality). A few of the
learning curves (FAST, cardiac) demonstrate a small
decrease over the later stages of experience. It is
unclear what is responsible for this decrease in interpretation performance. One possibility is that more
experienced learners interpreted trace fluid as negative,
confusing the clinical significance with the actual finding. Another possibility is that patient selection for
Table 3
Plateau Points for Interpretation and Image Acquisition Metrics

Examination | Interpretation | Image Acquisition
Aorta | 66 | 84
Cardiac | 30 | 27
Chest wall | 60 | 39
Endovaginal uterine | None | None
FAST | 57 | None
Lower-extremity duplex | None | None
Renal | 78 | 75
Right upper quadrant | 90 | None
Soft tissue | 27 | 18

Data represent the number of ultrasounds performed prior to reaching a plateau in the learning curve of the indicated metric. Plateau is defined as a decrease in slope of the learning curve >25%. The curve for endovaginal uterus did not have enough data points to calculate a plateau point. All other curves without a plateau point did not demonstrate sufficient changes in slope to calculate a plateau point.
Figure 2. Sensitivity of ultrasound by examination type compared to interpretation by expert reviewer.
Figure 3. Specificity of ultrasound by examination type compared to interpretation by expert reviewer.
learning sonographers with more relative experience
includes difficult patients who are avoided by more novice sonographers. It is also possible that not all individuals will demonstrate increases in skill over time. A
study examining cardiologists found no association
between experience and proficiency.5 One final possibility relates to the fact that FAST and cardiac examinations demonstrated proportionally more educational US
during the earlier learning phases. These would be
more likely to be obtained during nonclinical times
when the sonographer has more time to obtain and
interpret images, thus artificially enhancing performance.
Even including the initial US experience, our overall
interpretation agreement rate of 95.9% compares favorably to published rates for other specialties. Discrepancy rates of US interpretation between radiology
residents and faculty range from 0.2% to 4.0%.6–9 A
renal US study in 2012 found good agreement between
two experienced radiologists (κ = 0.82), similar to the
agreement in our study.10 Studies comparing cardiology
fellows using portable US to cardiology faculty found
comparable agreement to our study (κ = 0.66 to
0.89).11,12 Echocardiography studies involving internists
or EPs demonstrate similar agreement (κ = 0.79).13,14 A
study of novice surgeons learning RUQ US demonstrated less agreement (κ = 0.40), significantly lower
than the agreement in this article (κ = 0.87).15 The most
likely explanation for this difference is that their training
was significantly less than the training of the learners in
this article.
Similar to image interpretation, it is logical to expect
that imaging technique will improve as experience
increases. However, not all of the learning curves for
image quality improved over time. Three of the US
applications (aorta, lower extremity duplex, and endovaginal uterus) demonstrated decreases in image quality
over the initial experience before increasing back to
baseline. We speculate that this decrease relates to
patient selection, as new learners choose patients who
are easier to image (thinner, fasting patients) and more
experienced sonographers recorded examinations from
more challenging patients based on clinical necessity. It
is also possible that the decrease in image quality is
independent of the sonographer, and some of the
patients at different times had higher percentages of
characteristics that degrade US imaging (for example,
obesity, recent food intake, increased intestinal gas or
pain tolerance, and ability to cooperate with emergent
imaging).
There are few published data related to image acquisition for US. One study focusing on acquisition of
images for the FAST exam found that US technique
improved even after 75 US examinations.16 The learning
curve in this article demonstrates similar findings, in
that the image quality for the FAST exam increased
even after 200 US. A study of internists found that even
after 35 cardiac US, their image quality was not equal
to experienced sonographers.17 Our learning curve
demonstrates that learners do not reach the quality
equivalent of expert sonographers even after 120 US. A
study exploring learning curves for US-guided interventions found that students needed between 37 and 109
US-guided procedures to gain competency.18
Many national groups recommend or require a certain number of US for training, in effect using the number of US as a surrogate for length of experience and
by extension competency. In most of these cases, the
specific numbers are chosen by expert consensus and
are not based on any data. In 2008 ACEP recommended
that physicians perform 25 to 50 US for each application
except US-guided procedures, which required 10 US
Figure 4. Image quality learning curve. Image quality is expressed as percentage of examinations at a given experience level that
are predicted to be of high quality (rating scale of 5 or greater) with surrounding 95% CI.
each.1 The American Registry for Diagnostic Medical
Sonography requires that physicians perform 800
US prior to taking a test to certify competency
(www.ardms.org). The American College of Cardiology
and the American Heart Association published recommendations in 2003 that learners perform 150 cardiac
echoes prior to independently performing echocardiography.19 The American College of Radiology requires
that physicians document at least 500 US to certify
competency (www.acr.org).
Translating the performance curves into a metric for
competency is more complicated, as the definition of
competency with regard to US imaging is unclear. Performance plateaus found in our study offer a guide for
US education to understand at what point additional
experience offers minimal improvement in image acquisition or interpretation. We have additionally assessed
the performance of the expert reviewers as a yardstick
to serve as a comparison for the individuals who are
learning US. Although the ideal rating for image quality
on our scale would be 8.0, there are many contributors
to decreased image quality, including patient factors
(oral intake, body habitus, painful condition, urgency of
imaging) and sonographer factors (skill level, attention
to detail), to name a few. Similarly, the ideal agreement
would be 100%, but even expert reviewers sometimes
disagree with each other. Some statisticians define a
kappa of 0.80 or greater as “nearly perfect.”20
Some of the newer guidelines on proficiency in US
have moved away from requiring a specific number of
examinations. The guidelines for proficiency in critical
care US describe educational goals without a reference
to number of US performed.21 CORD has instituted
“milestones,” where US competency is assessed through
simulation, direct observation, and examinations.2 This
trend makes logical sense, as not every individual learns
at the same speed, and periodic assessments will allow
a greater certainty that an individual is competent.
From a group perspective, the data presented in this
article provide some understanding of how a given
experience level translates into a predicted level of performance.
LIMITATIONS
This study was conducted at a single center with a common educational process for all learning sonographers
supported by rapid interpretative and technical feedback.
This potentially limits applicability in other centers with
different educational programs. Specific educational
styles can affect both individual skill acquisition rates
and practice patterns of US utilization. However, it is
unclear how unevenly distributed imaging, or periods of
inactivity while learning US, affect learning.
It is possible that skill decay is seen following periods of
inactivity, but our study does not address this issue.
In this study our learners sometimes benefited from
more experienced individuals (residents and staff) present during image acquisition and interpretation. Our
database does not specifically track such interactions,
and therefore the effect of this interaction cannot be
quantified in the analysis of our data. Although it is possible that interactions with more experienced staff during image acquisition could influence learning curves,
this situation exists in many EDs, and it should not limit
the generalizability of our findings.
The image rating scale used in this study has not
been previously validated and is potentially unique to
our center. Simplifying the scale to a dichotomous variable of good versus poor image quality not only assists
in statistical analysis, but also converts the scale to one
we believe is easily applied to other educational programs.
Our primary outcome measure relates to performance
of expert sonographers at our center. While all five
experts in this study have extensive training and experience (as detailed under Methods), there is no standard
to define expertise in emergency US, and this creates
another limitation to the generalizability of the results of
this study to other centers.
CONCLUSIONS
Educational performance curves vary by ultrasound
application, not only for image acquisition skill but also
for image interpretation. While not providing an
absolute cutoff for requisite examination experience, our
results would suggest that for the majority of ultrasound
examination types, a minimum of 50 examinations, as
suggested by the current American College of Emergency Physicians guidelines, will result in a reasonable
performance level compared to expert sonographers.
References
1. Emergency ultrasound guidelines. Ann Emerg Med
2009;53:550–70.
2. Lewiss RE, Pearl M, Nomura JT, et al. CORD-AEUS:
consensus document for the emergency ultrasound
milestone project. Acad Emerg Med 2013;20:740–5.
3. Gaspari RJ, Dickman E, Blehar D. Learning curve of
bedside ultrasound of the gallbladder. J Emerg Med
2009;37:51–6.
4. Tolsgaard MG, Todsen T, Sorensen JL, et al. International multispecialty consensus on how to evaluate ultrasound competence: a Delphi consensus
survey. PLoS One 2013;8:e57687.
5. Nair P, Siu SC, Sloggett CE, Biclar L, Sidhu RS,
Yu EH. The assessment of technical and interpretative proficiency in echocardiography. J Am Soc
Echocardiogr 2006;19:924–31.
6. Ruma J, Klein KA, Chong S, et al. Cross-sectional
examination interpretation discrepancies between
on-call diagnostic radiology residents and subspecialty faculty radiologists: analysis by imaging
modality and subspecialty. J Am Coll Radiol
2011;8:409–14.
7. Ruutiainen AT, Durand DJ, Scanlon MH, Itri JN.
Increased error rates in preliminary reports issued
by radiology residents working more than 10 consecutive hours overnight. Acad Radiol 2013;20:
305–11.
8. Ruchman RB, Jaeger J, Wiggins EF 3rd, et al. Preliminary radiology resident interpretations versus
final attending radiologist interpretations and the
impact on patient care in a community hospital. Am
J Roentgenol 2007;189:523–6.
9. Ruutiainen AT, Scanlon MH, Itri JN. Identifying
benchmarks for discrepancy rates in preliminary
interpretations provided by radiology trainees at
an academic institution. J Am Coll Radiol 2011;8:
644–8.
10. Rud O, Moersler J, Peter J, et al. Prospective evaluation of interobserver variability of the hydronephrosis index and the renal resistive index as
sonographic examination methods for the evaluation of acute hydronephrosis. BJU Int 2012;110:
E350–6.
11. Giusca S, Jurcut R, Ticulescu R, et al. Accuracy of
handheld echocardiography for bedside diagnostic
evaluation in a tertiary cardiology center: comparison with standard echocardiography. Echocardiography 2011;28:136–41.
12. Borges AC, Knebel F, Walde T, Sanad W, Baumann
G. Diagnostic accuracy of new handheld echocardiography with Doppler and harmonic imaging properties. J Am Soc Echocardiogr 2004;17:234–8.
13. Bustam A, Noor Azhar M, Singh Veriah R, Arumugam K, Loch A. Performance of emergency physicians in point-of-care echocardiography following
limited training. Emerg Med J 2014;31:369–73.
14. Vignon P, Mucke F, Bellec F, et al. Basic critical
care echocardiography: validation of a curriculum
dedicated to noncardiologist residents. Crit Care
Med 2011;39:636–42.
15. Eiberg JP, Grantcharov TP, Eriksen JR, et al. Ultrasound of the acute abdomen performed by surgeons in training. Minerva Chir 2008;63:17–22.
16. Jang T, Kryder G, Sineff S, Naunheim R, Aubin C,
Kaji AH. The technical errors of physicians learning
to perform focused assessment with sonography in
trauma. Acad Emerg Med 2012;19:98–101.
17. Martin LD, Howell EE, Ziegelstein RC, Martire C,
Shapiro EP, Hellmann DB. Hospitalist performance
of cardiac hand-carried ultrasound after focused
training. Am J Med 2007;120:1000–4.
18. de Oliveira Filho GR, Helayel PE, da Conceicao DB,
Garzel IS, Pavei P, Ceccon MS. Learning curves and
mathematical models for interventional ultrasound
basic skills. Anesth Analg 2008;106:568–73.
19. Quinones MA, Douglas PS, Foster E, et al. ACC/
AHA clinical competence statement on echocardiography: a report of the American College of Cardiology/American Heart Association/American College
of Physicians-American Society of Internal Medicine
Task Force on clinical competence. J Am Soc Echocardiogr 2003;16:379–402.
20. Viera AJ, Garrett JM. Understanding interobserver
agreement: the kappa statistic. Fam Med 2005;
37:360–3.
21. Mayo PH, Beaulieu Y, Doelken P, et al. American
College of Chest Physicians/La Société de Réanimation de Langue Française statement on competence
in critical care ultrasonography. Chest 2009;135:
1050–60.