Statistical Validation Process of the Imaging Informatics Professional Certification Examination

Authors:
Mark Raymond, PhD, The American Registry of Radiologic Technologists; Paul G. Nagy, PhD

Background:
The American Board of Imaging Informatics (ABII) is a non-profit organization that has created an examination for imaging informatics professionals. ABII was founded by the Society for Imaging Informatics in Medicine (SIIM) and the American Registry of Radiologic Technologists (ARRT). Developing the test took several years and relied on dozens of subject matter experts active in the field, who volunteered their time to submit and vet questions for the examination. This paper discusses the process used to generate and validate the examination.
History
Job Analysis Questionnaire. In 2005, the SIIM Certification Committee directed the administration of a job analysis survey as an initial step toward the development of the Certified Imaging Informatics Professional (CIIP) program. The survey consisted of 127 competency statements organized under three major domains: behavioral science, business, and technical. These domains were further subdivided into 15 subdomains, or roles, with each role consisting of 6 to 15 competency statements. The competency statements represented a combination of job responsibilities and knowledge, skills, and abilities (KSAs). Each competency was thought to be required for successful job performance. The purpose of the survey was to verify this assumption and to prioritize the competencies. Exhibit 1 (optional) summarizes the structure and content of the questionnaire. The data supported the importance of most competencies, with mean ratings on the 7-point scale ranging from 4.2 to 6.4. A complete description of survey procedures and results can be found in the Journal of Digital Imaging, 2005, 18(4), 251-259.
Test Specifications. The 127 competencies served as the basis for the test specifications. Test specifications describe (1) the content to be covered by an examination, (2) the emphasis allocated to each topic, and (3) other important features of the test. ABII uses the test specifications to ensure that different forms of an exam are comparable over time. The specifications also serve as a useful guide for examinees, those who provide training, and employers who wish to know the specific competencies of those who pass the test.
The SIIM Certification Committee and ARRT staff worked on the test specifications, or test content outline (TCO), over a period of several months in 2006. An early draft of the TCO, accompanied by a questionnaire, was posted on the SIIM website and distributed at the SIIM 2006 Annual Meeting. The questionnaire was completed by 226 members. Comments were generally positive, but it was evident that the outline needed more specificity about the work performed. The Certification Committee considered alternative frameworks and ultimately settled on an outline consisting of 10 major domains, with three to six specific job responsibilities listed under each major domain. Within each specific responsibility, the outline further identified the KSAs required to carry out that responsibility effectively. The 10 major domains and their associated weights appear in Exhibit 2. In October 2006, the Certification Committee approved the TCO and determined that the exam would consist of 150 items (130 scored plus 20 unscored pilot items). A complete copy of the TCO is available at abii.org.

Evaluation:
Examination Development
Test-Item Writing. Launching and maintaining a certification examination requires the availability of hundreds of high-quality test items addressing each area of the TCO. Representatives from the Certification Committee and the general SIIM membership attended the first item-writing workshop, held in July 2006 at ARRT offices in St. Paul, MN. The goals of the workshop were to obtain input on the TCO, which was still in draft form; to learn basic principles of test-item writing; and to begin building a pool of test items. A second workshop, held in February 2007, was attended by six individuals from the SIIM membership. Approximately 400 questions were submitted during, and subsequent to, those first two workshops. As described below, items are accepted only after extensive review by subject matter experts.
Item Banking. Newly written test items are submitted using secure web-based software written specifically for item writing and review. Once submitted, items undergo two levels of review. The first level is completed by a review panel consisting of item writers and other volunteer reviewers. Items are rated for relevance, technical accuracy, and overall quality. The second level of review is completed by members of the Examination Committee, who review the items in light of the comments and ratings provided by the initial reviewers. Items at the second level of review are accepted as is, accepted with revision, or rejected.
Once accepted, items migrate into a permanent database called an item bank. The item bank contains hundreds of fields of information organized into various tables that facilitate each item’s use on multiple test forms over time. In addition to the question itself, the item bank stores information such as the item’s author, reference, date of acceptance, edit history, classification under the TCO, accompanying graphics, usage history, numerous statistical indices, and other data. To ensure an ample supply of test items for new forms of the exam, and to make certain that future exams keep pace with advances in technology and changes in job responsibilities, new item writers are appointed and additional workshops are held periodically.
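The two-level review and the item-bank record described above can be pictured with a small data structure. This is only an illustrative sketch: the class name, field names, and status labels are assumptions chosen to mirror the prose, not the schema of the actual ABII/ARRT item bank, which holds many more fields.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Optional


class ReviewStatus(Enum):
    """Hypothetical labels for the review stages named in the text."""
    SUBMITTED = "submitted"
    PANEL_REVIEWED = "panel reviewed"            # first-level review complete
    ACCEPTED = "accepted as is"
    ACCEPTED_WITH_REVISION = "accepted with revision"
    REJECTED = "rejected"


@dataclass
class ItemRecord:
    """One test item; the fields are a small, assumed subset of an item bank."""
    item_id: int
    stem: str
    options: Dict[str, str]
    key: str
    author: str
    reference: str
    tco_domain: str
    status: ReviewStatus = ReviewStatus.SUBMITTED
    date_accepted: Optional[date] = None
    usage_history: List[str] = field(default_factory=list)

    def accept(self, on: date, revised: bool = False) -> None:
        """Record a second-level acceptance decision."""
        if self.status is not ReviewStatus.PANEL_REVIEWED:
            raise ValueError("item must complete first-level review first")
        self.status = (ReviewStatus.ACCEPTED_WITH_REVISION if revised
                       else ReviewStatus.ACCEPTED)
        self.date_accepted = on
```

Keeping the review status on the record makes it straightforward to tally accepted items per TCO domain, which is the kind of tally used to tell item writers where more questions are needed.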
Assembly of Pilot Test Form. The Certification Committee convened for two meetings early in 2007 to begin development of the test form to be used for the pilot study. The first meeting was held in January and was devoted primarily to reviewing, revising, and classifying newly submitted test items. At the conclusion of this meeting, a tally of the item bank was completed, and item writers were given feedback on the content areas in need of test questions. During a second, three-day meeting held in April 2007, the Committee continued its review and revision of new questions. It was also necessary for the Committee to write several questions to obtain coverage of all content areas.
The pilot-study exam form was edited and finalized by psychometric staff at ARRT. It was administered to 100 qualified participants as a paper-and-pencil test on June 9, 2007, at the SIIM 2007 Annual Meeting in Providence, RI. Examinees were invited to comment on individual questions for relevance and accuracy, and asked to complete a short survey regarding the exam as a whole.
Statistical Analysis and Standard Setting
Statistical Analysis. Immediately following the pilot test, responses to each question were subjected to an item analysis to verify the accuracy of the scoring key and to statistically validate each question. An additional purpose of the item analysis was to help determine which 130 items (of the 150) would be used for scoring purposes. Examinee comments were also evaluated as part of this decision-making process.
A statistical item analysis evaluates responses to each question as a function of each examinee’s total test score. In general, individuals with high scores are expected to have a higher probability of answering an item correctly than examinees with low scores. When a significant portion of high-scoring test takers answer an item incorrectly, the item is scrutinized for accuracy and clarity of wording. To illustrate the utility of an item analysis, Exhibit 3 presents the output for a single test item with questionable statistics. Although most individuals answered this sample item correctly, many high scorers answered it incorrectly, as indicated by the negative discrimination index, the negative biserial correlation, and the relatively high proportion of high-scoring examinees (24%) choosing option A. Statistics such as these do not always mean the item is flawed, but they work well for screening purposes. In the case of the item in Exhibit 3, the item was worded so that option A also appeared correct. Flawed items can be replaced with a pilot item, given multiple correct answers, or removed from scoring. The Certification Committee might then decide to revise such items or simply discard them from the item bank.
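The screening statistics described above can be sketched in a few lines. The following is a minimal illustration, not the ARRT analysis software: the function name is hypothetical, the 27% size of the low and high groups is an assumed convention (the source does not state the split used), and the correlation computed is the point-biserial, a close relative of the biserial correlation reported in Exhibit 3 rather than the same statistic.

```python
from statistics import fmean, pstdev


def item_analysis(correct, totals, group_fraction=0.27):
    """Return (proportion correct, discrimination index, point-biserial).

    correct: list of 0/1 flags, one per examinee, for a single item.
    totals:  list of total test scores, in the same examinee order.
    group_fraction: share of examinees placed in each of the low and high
                    groups (assumed value; not taken from the source).
    """
    n = len(correct)
    p = fmean(correct)

    # Rank examinees by total score, then compare the two extreme groups.
    order = sorted(range(n), key=lambda i: totals[i])
    k = max(1, round(n * group_fraction))
    p_low = fmean(correct[i] for i in order[:k])
    p_high = fmean(correct[i] for i in order[-k:])
    disc_index = p_high - p_low

    # Point-biserial: Pearson correlation of the 0/1 flag with total score.
    sd = pstdev(totals)
    if sd == 0 or p in (0.0, 1.0):
        return p, disc_index, 0.0
    mean_right = fmean(t for c, t in zip(correct, totals) if c)
    mean_wrong = fmean(t for c, t in zip(correct, totals) if not c)
    r_pb = (mean_right - mean_wrong) / sd * (p * (1 - p)) ** 0.5
    return p, disc_index, r_pb
```

A negative discrimination index or correlation flags an item for review. In Exhibit 3, for example, the discrimination index of -.21 is the proportion of high scorers choosing the keyed option (.59) minus the proportion of low scorers choosing it (.80).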
Other examination statistics are also evaluated to verify that the test is functioning as intended, including the mean, standard deviation, range, frequency distribution, and other graphical displays. Three very important values are: an index of reliability, such as coefficient alpha; the standard error of measurement; and an index of decision consistency. All values provide information about the consistency or dependability of test scores.
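Two of the consistency indices named above have simple closed forms. The sketch below computes coefficient alpha from an examinee-by-item score matrix and the standard error of measurement from a score standard deviation and a reliability estimate. The function names and the input layout are assumptions made for illustration, and the index of decision consistency is not covered.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for an examinee-by-item score matrix.

    scores: list of rows, one per examinee; each row holds that examinee's
            item scores (0/1 for multiple-choice items). Needs at least two
            items and some variation in total scores.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as score_sd."""
    return score_sd * (1 - reliability) ** 0.5
```

Higher alpha means the items rank examinees more consistently. The standard error of measurement expresses the same information on the score scale, which is useful for judging how much weight to give scores that fall close to the passing point.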

Discussion:
Passing Score. Passing scores for most certification examinations are established using criterion-referenced, standard-setting procedures. Examinees pass or fail based entirely on whether their level of proficiency, as measured by the examination, meets or exceeds some absolute criterion, or cut-off score. It is conceivable, but not likely, that everyone could pass or that everyone could fail. This stands in sharp contrast to a norm-referenced test, where some predetermined percentage of examinees will pass or fail regardless of their proficiency.
A criterion-referenced, standard-setting procedure known as the Angoff method was used to set the passing score for the CIIP examination. The standard-setting study was conducted about 2 weeks after administration of the pilot exam. Eight individuals participated. Three were members of the Certification Committee. The other five had served as item writers and reviewers. Participants were selected on the basis of their expertise in the field and represented a variety of work settings.
The Angoff method requires that participants inspect each item on the exam and estimate the proportion of minimally qualified test takers who would answer the item correctly. This estimated proportion is called an item rating. The sum of a participant's item ratings is that participant's expected score for a minimally qualified examinee. The objective of the Angoff method is to approximate the empirical outcome that would be obtained if it were possible to actually give the exam to a group of individuals who had been determined, a priori, to be minimally qualified. Because multiple participants make their ratings independently, it is reasonable to average their expected scores to derive a passing standard, which represents, in some sense, a consensus on the expected performance of minimally qualified candidates.
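The arithmetic of the Angoff procedure described above is short enough to show directly. This is a minimal sketch with a hypothetical function name and made-up ratings. It covers only the summing and averaging steps, not the training, discussion, or review rounds that a standard-setting study may also include.

```python
from statistics import fmean


def angoff_cut_score(ratings):
    """Return (raw passing score, per-participant expected scores).

    ratings: one list per participant; each entry is that participant's
             estimate (0.0 to 1.0) of the proportion of minimally qualified
             examinees who would answer the item correctly.
    """
    # Each participant's expected score for a minimally qualified examinee.
    expected_scores = [sum(participant) for participant in ratings]
    # Averaging across participants gives the recommended passing standard.
    return fmean(expected_scores), expected_scores


# Made-up example: three participants rating a three-item test.
cut, per_participant = angoff_cut_score([
    [0.50, 0.75, 1.00],
    [0.50, 0.50, 0.50],
    [1.00, 0.75, 0.50],
])
```

In the CIIP study the same calculation would run over eight participants and the scored items. The spread of the per-participant expected scores gives a rough sense of how closely the participants agreed.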
Scores were converted via linear transformation so that the passing score would correspond to a scaled score of 75 and the maximum scaled score would be 99. Of the 103 examinees who took the pilot exam, 99 passed. The high pass rate was attributed to the fact that a majority of the individuals who signed up for the pilot study were highly motivated and very proficient early adopters. Scores from multiple exams will be presented to show the maturation process.
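A linear transformation is fixed by two points, so the scaling described above can be sketched as follows. It assumes that the maximum scaled score of 99 corresponds to the maximum raw score, and it uses a raw passing score of 90 out of 130 scored items purely as a made-up illustration. The source does not report the raw cut score, and the function name is hypothetical.

```python
def make_scaler(raw_cut, raw_max, scaled_cut=75.0, scaled_max=99.0):
    """Build a raw-to-scaled conversion through two fixed points.

    (raw_cut, scaled_cut) and (raw_max, scaled_max) define the line.
    """
    slope = (scaled_max - scaled_cut) / (raw_max - raw_cut)

    def to_scaled(raw):
        return scaled_cut + slope * (raw - raw_cut)

    return to_scaled


# Hypothetical raw cut score of 90 on a form with 130 scored items.
to_scaled = make_scaler(raw_cut=90, raw_max=130)
scaled_at_cut = to_scaled(90)    # the passing point on the reporting scale
scaled_at_110 = to_scaled(110)   # halfway between the cut and the maximum
```

Because the passing point always maps to 75, an examinee passes exactly when the scaled score is at least 75, whatever the raw cut score of a given form happens to be.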
Subsequent Examination Forms
Form Assembly and Scoring. Four additional forms of the CIIP examination have been administered during the two years since the pilot study. Throughout this time frame, item writers continued to produce new items, and the Certification (Examination) Committee continued meeting to review items and assemble new test forms. To guarantee that the new test forms are comparable in terms of test content, all forms have been constructed in strict accordance with the TCO. To ensure that the scores on different exam forms are comparable, a process known as statistical equating is used. Equating requires that each new exam form have about 20% to 25% of its items in common with previous forms. Statistical models can then be used to detect and correct for differences in test form difficulty. So, for example, if test B is harder than test A, test B would have a lower passing point on the raw score scale. However, scores from both forms would be placed on the same scale for reporting purposes: a scale where the passing point is set at 75.
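The source does not say which equating model is used, so the sketch below illustrates the idea with one simple common-item approach, chained linear equating. A form A raw score is first linked to the common (anchor) items using the group that took form A, and the anchor score is then linked to form B using the group that took form B. The function names and data are hypothetical, and operational equating involves additional checks that are omitted here.

```python
from statistics import fmean, pstdev


def linear_link(from_scores, to_scores):
    """Linear function that matches the mean and SD of two score sets.

    Both lists must come from the same group of examinees.
    """
    mu_from, sd_from = fmean(from_scores), pstdev(from_scores)
    mu_to, sd_to = fmean(to_scores), pstdev(to_scores)
    return lambda x: mu_to + (sd_to / sd_from) * (x - mu_from)


def chained_linear_equate(a_total, a_anchor, b_anchor, b_total):
    """Map a form A raw score onto the form B raw-score scale.

    a_total, a_anchor: total and anchor scores of the group that took form A.
    b_anchor, b_total: anchor and total scores of the group that took form B.
    """
    a_to_anchor = linear_link(a_total, a_anchor)
    anchor_to_b = linear_link(b_anchor, b_total)
    return lambda x: anchor_to_b(a_to_anchor(x))


# Made-up data: both groups do equally well on the anchor items, but the
# form B group scores 5 points lower in total, so form B is harder.
equate = chained_linear_equate(
    a_total=[90, 100, 110], a_anchor=[16, 20, 24],
    b_anchor=[16, 20, 24], b_total=[85, 95, 105],
)
b_cut = equate(90)  # form B raw score equivalent to a form A cut of 90
```

With these numbers the form B passing point comes out 5 raw points below the form A passing point, which matches the direction described in the text. Both raw cut scores would still be reported as a scaled score of 75.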
Exam Results. Exhibit 4 summarizes results for the first four test administrations. The slight decrease in mean scores and pass rates, and the increase in variability, suggest that the initial pilot study group probably was not representative of the entire population of imaging informatics personnel (i.e., they were more proficient). However, even though the pass rate has dropped, the current pass rates are still reasonably high and indicate that examinees are generally quite proficient.

Exhibit 1. Structure and content of the job analysis questionnaire

| Domain | Role | Sample Competency |
|---|---|---|
| Behavioral | Training | Developing user training programs |
| | Workflow Engineering | Workflow analysis |
| | Reading Environment | RIS-PACS dictation integration |
| | Customer Relations Management | Overcoming psychological barriers |
| Business | PACS Readiness | Understanding the CIO perspective |
| | Strategic Visions | Building strategic and operational committees |
| | Economics of PACS | Total cost of ownership |
| | Vendor Selection | Vendor support |
| | Project Management | Performance milestones |
| | Sustaining PACS | Recruiting PACS professionals |
| Technical | Technology Overview | Work stations and displays |
| | Systems Management | Recoverability policies |
| | Troubleshooting | Network administration |
| | Modalities | Integration with PACS via DICOM/IHE |
| | Security | Understanding HIPAA security and auditing |

Exhibit 2. Major domains of the test content outline and their weights

| Performance Domain | % of Exam | Performance Domain | % of Exam |
|---|---|---|---|
| Procurement | 5 | Image Management | 20 |
| Project Management | 5 | Information Technology | 15 |
| Operations | 10 | Systems Management | 10 |
| Communications | 10 | Clinical Engineering | 10 |
| Training and Education | 5 | Medical Informatics | 10 |

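Exhibit 3. Item analysis output for a sample item with questionable statistics (* marks the keyed option)

| Item | Prop Correct | Disc Index | Biserial Corr | Option | Proportion Choosing: Total Group | Proportion Choosing: Low Scorers | Proportion Choosing: High Scorers | Option Biserial Corr |
|---|---|---|---|---|---|---|---|---|
| 93 (Key=D) | .68 | -.21 | -.17 | A | .15 | .10 | .24 | .20 |
| | | | | B | .09 | .03 | .06 | .06 |
| | | | | C | .08 | .07 | .12 | .03 |
| | | | | D* | .68 | .80 | .59 | -.17 |
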
Exhibit 4. Summary of results for the first four test administrations

| Exam Date | N | Min Score | Max Score | Mean | Std Dev | % Pass |
|---|---|---|---|---|---|---|
| June 2007 | 103 | 59 | 97 | 85.7 | 6.4 | 96.1 |
| Sept 2007 | 96 | 56 | 97 | 83.1 | 7.6 | 87.5 |
| March 2008 | 103 | 64 | 98 | 83.0 | 7.7 | 84.5 |
| Sept 2008 | 120 | 61 | 94 | 80.8 | 7.0 | 80.8 |