CFA 2010
 
Scientific Abstracts
Statistical Validation Process of the Imaging Informatics Professional Certification Examination
 
Authors:
Mark Raymond, PhD, The American Registry of Radiologic Technologists; Paul G. Nagy, PhD
 
Background:

The American Board of Imaging Informatics (ABII) is a non-profit organization that has created an examination for imaging informatics professionals. ABII was founded by the Society for Imaging Informatics in Medicine (SIIM) and the American Registry of Radiologic Technologists (ARRT). Generating the test took several years and relied on dozens of subject matter experts active in the field, who volunteered their time to submit and vet questions for the examination. This paper discusses the process used to generate and validate the examination.

 

History

 

Job Analysis Questionnaire. In 2005, the SIIM Certification Committee directed the administration of a job analysis survey as an initial step toward the development of the Certified Imaging Informatics Professional (CIIP) program. The survey consisted of 127 competency statements organized under three major domains: behavioral science, business, and technical. These domains were further subdivided into 15 subdomains or roles, with each role consisting of 6 to 15 competency statements. The competency statements represented a combination of job responsibilities and knowledge, skills, and abilities (KSAs). Each competency was thought to be required for successful job performance. The purpose of the survey was to verify this assumption and to prioritize the competencies. Exhibit 1 summarizes the structure and content of the questionnaire. The data supported the importance of most competencies, with mean ratings on the 7-point scale ranging from 4.2 to 6.4. A complete description of survey procedures and results can be found in the Journal of Digital Imaging, 2005, 18(4), 251-259.

 

Test Specifications. The 127 competencies served as the basis for the test specifications. Test specifications describe: (1) the content to be covered by an examination, (2) the emphasis allocated to each topic, and (3) other important features of the test. ABII uses the test specifications to ensure that different forms of an exam are comparable over time. The specifications also serve as a useful guide for examinees, those who provide training, and employers who wish to know the specific competencies of those who pass the test.

 

The SIIM Certification Committee and ARRT staff worked on the test specifications, or test content outline (TCO), in 2006 over a period of several months. An early draft of the TCO, accompanied by a questionnaire, was posted on the SIIM website and distributed at the SIIM 2006 Annual Meeting. The questionnaire was completed by 226 members. Comments were generally positive, but it was evident that the outline needed to describe the work more specifically. The Certification Committee considered alternative frameworks and ultimately settled on an outline consisting of 10 major domains, with three to six specific job responsibilities listed under each major domain. Within each specific responsibility, the outline further identified the KSAs required to carry out that responsibility effectively. The 10 major domains, and their associated weights, appear in Exhibit 2. In October 2006, the Certification Committee approved the TCO and determined that the exam would consist of 150 items (130 scored plus 20 unscored pilot items). A complete copy of the TCO is available at abii.org.

 
Evaluation:

Examination Development

 

Test-Item Writing. Launching and maintaining a certification examination requires the availability of hundreds of high-quality test items addressing each area of the TCO. Representatives from the Certification Committee and general SIIM membership attended the first item-writing workshop, held in July 2006 at ARRT offices in St. Paul, MN. The goals of the workshop were to obtain input on the TCO, which was still in draft form, to learn basic principles of test-item writing, and to begin building up a pool of test items. A second workshop, held in February 2007, was attended by six individuals from the SIIM membership. Approximately 400 questions were submitted during, and subsequent to, those first two workshops. As described below, items are accepted only after extensive review by subject matter experts.

 

Item Banking. Newly written test items are submitted using secure web-based software written specifically for item writing and review. Once submitted, items undergo two levels of review. The first level is completed by a review panel consisting of item writers and other volunteer reviewers. Items are rated for relevance, technical accuracy, and overall quality. The second level of review is completed by members of the Examination Committee, who review the items in light of comments and ratings provided by the initial reviewers. Items at the second level of review are either accepted as is, accepted with revision, or rejected.

 

Once accepted, items migrate into a permanent database called an item bank. The item bank contains hundreds of fields of information organized into various tables that facilitate the item’s use on multiple test forms over time. In addition to the question itself, the item bank stores information such as the item’s author, reference, date of acceptance, edit history, classification under the TCO, accompanying graphics, usage history, numerous statistical indices, and other data. To ensure an ample supply of test items for new forms of the exam, and to make certain that the future exams keep pace with advances in technology and changes in job responsibilities, new item writers are appointed and additional workshops are periodically held.

 

Assembly of Pilot Test Form. The Certification Committee convened for two meetings early in 2007 to begin development of the test form to be used for the pilot study. The first meeting was held in January and was devoted primarily to reviewing, revising, and classifying newly submitted test items. At the conclusion of this meeting, a tally of the item bank was completed, and item writers were provided with feedback regarding content areas in need of test questions. During a second, three-day meeting held in April 2007, the Committee continued its review and revision of new questions. It was also necessary for the Committee to write several questions to obtain coverage of all content areas.

 

The pilot-study exam form was edited and finalized by psychometric staff at ARRT. It was administered to 100 qualified participants as a paper-and-pencil test on June 9, 2007, at the SIIM 2007 Annual Meeting in Providence, RI. Examinees were invited to comment on individual questions for relevance and accuracy, and asked to complete a short survey regarding the exam as a whole.

 

Statistical Analysis and Standard Setting

 

Statistical Analysis. Immediately following the pilot test, responses to each question were subjected to an item analysis to verify the accuracy of the scoring key and to statistically validate each question. An additional purpose of the item analysis was to help determine which 130 items (of the 150) would be used for scoring purposes. Examinee comments were also evaluated as part of this decision-making process.

A statistical item analysis evaluates responses to each question as a function of each examinee’s total test score. In general, individuals with high scores are expected to have a higher probability of answering an item correctly than examinees with low scores. When a significant portion of high-scoring test takers answer an item incorrectly, the item is scrutinized for accuracy and clarity of wording. To illustrate the utility of an item analysis, Exhibit 3 presents the output for a single test item with questionable statistics. Although most individuals correctly answered this sample item, many high scorers answered it incorrectly, as indicated by the negative discrimination index, the negative biserial correlation, and the relatively high proportion of high-scoring examinees (24%) choosing option A. Statistics such as these do not always mean the item is flawed, but they work well for screening purposes. In the case of the item in Exhibit 3, the item was worded so that option A also appeared correct. Flawed items can be replaced with a pilot item, given multiple correct answers, or removed from scoring. The Certification Committee might then decide to revise such items or simply discard them from the item bank.
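
The screening statistics described above can be sketched in a few lines of Python. This is an illustrative sketch only, not ARRT's scoring software: the function name, the 27% rule for forming low and high scoring groups, and the use of the point-biserial correlation (a close relative of the biserial correlation reported in Exhibit 3) are all assumptions. The discrimination index is taken as the proportion correct among high scorers minus the proportion correct among low scorers, which agrees with the keyed option in Exhibit 3 (.59 - .80 = -.21).

```python
from statistics import mean, pstdev


def item_analysis(item_correct, total_scores, group_fraction=0.27):
    """Classical statistics for one right/wrong item (hypothetical helper).

    item_correct   -- 1 if the examinee answered the item correctly, else 0
    total_scores   -- each examinee's total test score, in the same order
    group_fraction -- share of examinees placed in the low and high groups
                      (0.27 is an assumed convention, not from the source)
    """
    n = len(item_correct)
    prop_correct = mean(item_correct)

    # Rank examinees by total score, then take the bottom and top groups.
    order = sorted(range(n), key=lambda i: total_scores[i])
    k = max(1, round(n * group_fraction))
    low = [item_correct[i] for i in order[:k]]
    high = [item_correct[i] for i in order[-k:]]
    disc_index = mean(high) - mean(low)

    # Point-biserial: Pearson correlation between the 0/1 item score
    # and the total score. Undefined if either has zero variance.
    sd_item = pstdev(item_correct)
    sd_total = pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        point_biserial = None
    else:
        products = [x * y for x, y in zip(item_correct, total_scores)]
        cov = mean(products) - prop_correct * mean(total_scores)
        point_biserial = cov / (sd_item * sd_total)

    return prop_correct, disc_index, point_biserial
```

A negative discrimination index or a negative correlation for the keyed answer would flag the item for review, as with item 93 in Exhibit 3.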

 

Other examination statistics are also evaluated to verify that the test is functioning as intended, including the mean, standard deviation, range, and frequency distribution of scores, along with graphical displays. Three very important values are: an index of reliability, such as coefficient alpha; the standard error of measurement; and an index of decision consistency. All three values provide information about the consistency or dependability of test scores.
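
As a rough illustration of two of these values, the sketch below computes coefficient alpha and the standard error of measurement from a matrix of scored responses. It uses the standard textbook formulas (alpha = k/(k-1) x (1 - sum of item variances / variance of total scores); SEM = SD of total scores x sqrt(1 - alpha)). The function names and data layout are assumptions for illustration, and the decision consistency index is not covered here.

```python
from math import sqrt
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha for a list of examinee rows of item scores."""
    k = len(responses[0])  # number of items
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(responses):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in responses]
    alpha = coefficient_alpha(responses)
    return sqrt(pvariance(totals)) * sqrt(1 - alpha)


# Hypothetical data: four examinees, three items scored 0/1.
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(coefficient_alpha(demo), standard_error_of_measurement(demo))
```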

 
Discussion:

Passing Score. Passing scores for most certification examinations are established using criterion-referenced, standard-setting procedures. Examinees pass or fail based entirely on whether their level of proficiency, as measured by the examination, meets or exceeds some absolute criterion, or cut-off score. It is conceivable, but not likely, that everyone could pass or that everyone could fail. This stands in sharp contrast to a norm-referenced test, where some predetermined percentage of examinees will pass or fail regardless of their proficiency.
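
The difference between the two approaches can be shown with a small Python sketch. The function names, scores, and pass fraction below are hypothetical and serve only to illustrate the contrast described above.

```python
def criterion_referenced(scores, cut_score):
    """Pass everyone who meets or exceeds a fixed cut-off score."""
    return [s >= cut_score for s in scores]


def norm_referenced(scores, pass_fraction):
    """Pass a predetermined share of examinees, whatever their scores.

    Examinees tied at the threshold all pass, so slightly more than the
    requested share can pass when scores are tied.
    """
    n_pass = round(len(scores) * pass_fraction)
    if n_pass == 0:
        return [False] * len(scores)
    threshold = sorted(scores, reverse=True)[n_pass - 1]
    return [s >= threshold for s in scores]


strong_group = [90, 85, 80, 76]  # hypothetical scaled scores
print(criterion_referenced(strong_group, 75))  # everyone meets the cut-off
print(norm_referenced(strong_group, 0.5))      # only the top half passes
```

With a fixed cut-off, an unusually strong group can all pass and an unusually weak group can all fail. With a fixed pass fraction, the same share passes in both cases.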

 

A criterion-referenced, standard-setting procedure known as the Angoff method was used to set the passing score for the CIIP examination. The standard-setting study was conducted about 2 weeks after administration of the pilot exam. Eight individuals participated. Three were members of the Certification Committee. The other five had served as item writers and reviewers. Participants were selected on the basis of their expertise in the field and represented a variety of work settings.

 

The Angoff method requires that participants inspect each item on the exam and estimate the proportion of minimally qualified test takers who would get the item correct. This estimated proportion is called an item rating. The sum of the item ratings for a participant is an expected score for a minimally qualified examinee. The objective of the Angoff method is to approximate the empirical outcome that would be obtained if it were possible to actually give the exam to a group of individuals who had been determined, a priori, to be minimally qualified. Because multiple participants make their ratings independently, it is reasonable to average their expected scores to derive a passing standard, which represents, in some sense, a consensus on the expected performance of minimally qualified candidates.
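
The arithmetic of the Angoff method, as described above, is simple enough to sketch directly. The ratings below are invented for illustration (two participants, three items), and the function name is hypothetical. The actual study involved eight participants rating the items on the pilot exam.

```python
from statistics import mean


def angoff_passing_score(ratings_by_participant):
    """Average of each participant's expected score for a minimally
    qualified examinee.

    ratings_by_participant -- one list per participant, holding that
    participant's item ratings (estimated proportions between 0 and 1).
    """
    # Summing one participant's item ratings gives that participant's
    # expected raw score for a minimally qualified examinee.
    expected_scores = [sum(ratings) for ratings in ratings_by_participant]
    # Averaging across participants gives the recommended raw passing score.
    return mean(expected_scores), expected_scores


# Hypothetical ratings: two participants, three items each.
ratings = [
    [0.50, 0.75, 1.00],
    [0.50, 0.50, 0.50],
]
passing_score, per_participant = angoff_passing_score(ratings)
print(per_participant, passing_score)
```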

 

Scores were converted via linear transformation so that the passing scaled score would correspond to 75 and the maximum scaled score would be 99. Of the 103 examinees who took the pilot exam, 99 passed. The high pass rate was attributed to the fact that a majority of the individuals who signed up for the pilot study were highly motivated and very proficient early adopters. Scores from multiple exams will be presented to show the maturation process.
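
A linear transformation of this kind is fixed by two points. The sketch below assumes that the raw passing score maps to 75 and that a perfect raw score on the 130 scored items maps to 99. The raw passing score of 91 used in the example is purely hypothetical, since the abstract does not report the actual raw cut-off, and any rounding rule applied to reported scores is likewise not stated.

```python
def to_scaled(raw, raw_cut, raw_max=130, scaled_pass=75, scaled_max=99):
    """Map a raw score to the reporting scale with a straight line through
    (raw_cut, scaled_pass) and (raw_max, scaled_max)."""
    slope_num = scaled_max - scaled_pass
    slope_den = raw_max - raw_cut
    return scaled_pass + (raw - raw_cut) * slope_num / slope_den


HYPOTHETICAL_RAW_CUT = 91  # assumed for illustration only

for raw in (80, 91, 110, 130):
    print(raw, round(to_scaled(raw, HYPOTHETICAL_RAW_CUT), 1))
```

Under this mapping, any raw score at or above the raw cut-off yields a scaled score of at least 75, so the pass/fail decision is unchanged by the rescaling.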

 

Subsequent Examination Forms

 

Form Assembly and Scoring. Four additional forms of the CIIP examination have been administered during the two years since the pilot study. Throughout this time frame, item writers continued to produce new items, and the Certification (Examination) Committee continued meeting to review items and assemble new test forms. To ensure that the new test forms are comparable in terms of test content, all forms have been constructed in strict accordance with the TCO. To ensure that the scores on different exam forms are comparable, a process known as statistical equating is used. Equating requires that each new exam form have about 20% to 25% of its items in common with previous forms. Statistical models can then be used to detect and correct for differences in test form difficulty. So, for example, if test B is harder than test A, test B would have a lower passing point on the raw score scale. However, scores from both forms would be placed on the same reporting scale, where the passing point is set at 75.
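
The abstract does not say which equating model is used. As a deliberately simplified illustration of the idea, the sketch below applies chained mean equating through the common (anchor) items: performance on the shared items separates differences in examinee ability from differences in form difficulty. Operational programs often use linear, equipercentile, or item response theory methods instead. All names and numbers here are hypothetical.

```python
from statistics import mean


def equivalent_cut(old_cut, old_total, old_anchor, new_total, new_anchor):
    """Raw passing score on the new form that corresponds to old_cut on
    the old form, using chained mean equating (a simplifying assumption).

    old_total / old_anchor -- total and common-item scores of the group
                              that took the old form
    new_total / new_anchor -- the same for the group that took the new form
    """
    # Old form -> anchor scale, using the old-form group.
    on_anchor = old_cut - mean(old_total) + mean(old_anchor)
    # Anchor scale -> new form, using the new-form group.
    return on_anchor - mean(new_anchor) + mean(new_total)


# Hypothetical data: both groups do equally well on the common items,
# but the new-form group scores lower overall, so the new form is harder.
old_total, old_anchor = [98, 102], [19, 21]
new_total, new_anchor = [93, 97], [19, 21]
print(equivalent_cut(91, old_total, old_anchor, new_total, new_anchor))
```

In this made-up case the harder new form receives a lower raw passing point than the old form, which is the behavior described in the text for tests A and B.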

 

Exam Results. Exhibit 4 summarizes results for the first four test administrations. The slight decrease in mean scores and pass rates, and the increase in variability, suggest that the initial pilot study group probably was not representative of the entire population of imaging informatics personnel (i.e., they were more proficient). However, even though the pass rate has dropped, the current pass rates are still reasonably high and indicate that examinees are generally quite proficient.

 
Exhibit 1. Structure and content of the job analysis questionnaire

Domain      Role                            Sample Competency
Behavioral  Training                        Developing user training programs
            Workflow Engineering            Workflow analysis
            Reading Environment             RIS-PACS dictation integration
            Customer Relations Management   Overcoming psychological barriers

Business    PACS Readiness                  Understanding the CIO perspective
            Strategic Visions               Building strategic and operational committees
            Economics of PACS               Total cost of ownership
            Vendor Selection                Vendor support
            Project Management              Performance milestones
            Sustaining PACS                 Recruiting PACS professionals

Technical   Technology Overview             Workstations and displays
            Systems Management              Recoverability policies
            Troubleshooting                 Network administration
            Modalities                      Integration with PACS via DICOM/IHE
            Security                        Understanding HIPAA security and auditing
 
 
Exhibit 2. Major domains of the test content outline and their weights

Performance Domain       % of Exam    Performance Domain       % of Exam
Procurement                  5        Image Management            20
Project Management           5        Information Technology      15
Operations                  10        Systems Management          10
Communications              10        Clinical Engineering        10
Training and Education       5        Medical Informatics         10
 
 
         
Exhibit 3. Item analysis output for a single test item with questionable statistics

                                                         Proportion Choosing
Item    Prop Correct  Disc Index  Biserial Corr  Option  Total Group  Low Scorers  High Scorers  Biserial Corr
93      .68           -.21        -.17           A       .15          .10          .24            .20
Key=D                                            B       .09          .03          .06            .06
                                                 C       .08          .07          .12            .03
                                                 D*      .68          .80          .59           -.17
 
 
Exhibit 4. Results for the first four test administrations

Exam Date     N     Min Score  Max Score  Mean   Std Dev  % Pass
June 2007     103   59         97         85.7   6.4      96.1
Sept 2007     96    56         97         83.1   7.6      87.5
March 2008    103   64         98         83.0   7.7      84.5
Sept 2008     120   61         94         80.8   7.0      80.8