September Update: Our Standard Setting Process


Thanks to the help of librarians from throughout southern California, we took a big step forward with test modules 1 and 2 this summer.  Because TATIL is a criterion-referenced test (rather than a norm-referenced test like SAILS), we rely on the expertise of librarians and other educators to set performance standards so that we can report more than a raw score when students take the test.  By setting standards, we can make and test claims about what students’ scores indicate about their exposure to and mastery of information literacy.  This standard setting process is iterative and will continue throughout the life of the test.  By completing the first step in that ongoing effort, we now have two module result reports that provide constructive feedback to students and educators.

Standard setting plays an important role in enhancing the quality of the test.  For more detailed information about the standard setting method we used, I recommend these slides from the Oregon Department of Education.  The essence of this approach is that we used students’ responses from the first round of field testing to calculate the difficulty of each test item.  The test items were then printed out in order of difficulty.  Expert panelists went through these ordered item sets, using their knowledge of student learning to identify points in the continuum of items where the knowledge or ability required to answer correctly seemed to cross a threshold.  These thresholds mark the boundaries between beginning, intermediate, and expert performance.  We then used the difficulty levels of the items at the thresholds to calculate the cut scores.
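
To make the ordering step concrete, here is a minimal sketch in Python, assuming classical proportion-correct difficulty and toy response data.  The item names and responses are hypothetical, and the actual TATIL analysis may use a different difficulty model, so treat this as an illustration rather than our exact procedure:

```python
# Illustrative sketch of the ordering step in a bookmark-style standard
# setting, assuming classical proportion-correct difficulty.  All item
# names and response data below are hypothetical.

def order_items_by_difficulty(responses):
    """responses: dict mapping item id -> list of 0/1 scores from
    field testing.  Returns (item, difficulty) pairs, easiest first."""
    difficulty = {
        item: sum(scores) / len(scores)  # proportion answering correctly
        for item, scores in responses.items()
    }
    # A higher proportion correct means an easier item, so sorting in
    # descending order yields the easiest-to-hardest booklet ordering.
    return sorted(difficulty.items(), key=lambda pair: -pair[1])

# Hypothetical field-test responses (1 = correct, 0 = incorrect)
responses = {
    "item_a": [1, 1, 1, 1, 0],
    "item_b": [1, 1, 1, 0, 0],
    "item_c": [1, 1, 0, 0, 0],
    "item_d": [1, 1, 1, 1, 1],
    "item_e": [1, 0, 0, 0, 0],
}

for item, p in order_items_by_difficulty(responses):
    print(f"{item}: {p:.2f} proportion correct")
```

This easiest-to-hardest ordering is what panelists page through when they look for the points where the required knowledge or ability crosses a threshold.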

We are extremely grateful to our expert panelists for the depth and authenticity of their engagement throughout the two days of standard setting.  We made a strong team because of our differences.  The panel included two library deans; librarians trained in instructional design; community college and university librarians; librarians who work predominantly with first-generation students, English learners, and returning students; and librarians who usually work with upper division students, science majors, and special collections.  Panelists also contributed insights from their own linguistic, ethnic, racial, and class backgrounds, which informed our conversations about item thresholds, performance level descriptors, and test items.

The performance level descriptors are the narrative explanations of the skills, abilities, and knowledge that are associated with each level of students’ development in their progression toward expertise in information literacy.  We developed our descriptors throughout the process of refining outcomes and performance indicators.  You can see a draft version of our descriptors here.  During the standard setting, panelists used the descriptors to guide their decisions about which items were on the threshold between two levels.  After the standard setting, panelists used what they had learned about students’ performance to revise the descriptors so that they were complete and unambiguous.  I then used the revised descriptors to create our test report narratives, which describe students’ performance and suggest strategies students can use to strengthen their information literacy in the future.

The items we used to calculate the cut scores were identified over the course of multiple rounds of bookmarking, during which our panelists made individual judgments about which items represented a transition at the threshold between lower and higher levels of performance.  Between rounds we discussed the descriptors and patterns in the range of items the panelists selected.  With each round, the panelists’ selections came into closer alignment, and we then used the most difficult selected items to calculate the cut scores.  This means that every item any of our panelists placed within a level is included in the calculation of that level’s cut score.
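
As a rough illustration of that inclusive rule, here is a short sketch in the same style as the earlier one.  The bookmark values, the panel size, and the proportion-correct scale are assumptions for the example, not our actual data:

```python
# Sketch of the inclusive aggregation rule described above, assuming
# each panelist's final bookmark is recorded as the proportion-correct
# difficulty of the hardest item they placed within the level.  The
# values are hypothetical.

def level_cut_difficulty(panelist_bookmarks):
    """Return the difficulty bounding the level: the most difficult
    item any panelist included.  On a proportion-correct scale, harder
    items have LOWER values, so that is the minimum."""
    return min(panelist_bookmarks)

# Hypothetical final-round bookmarks from five panelists for the
# threshold between beginning and intermediate performance:
final_round = [0.62, 0.58, 0.60, 0.55, 0.58]

print(f"Cut anchored at item difficulty {level_cut_difficulty(final_round):.2f}")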

I was moved by the care the panelists took throughout the standard setting process and by the respect they showed for students’ information literacy at each level of development.  I am looking forward to reconvening expert panelists this winter to revisit the cut scores for modules 1 and 2 and to perform the bookmarking that will result in standard setting for modules 3 and 4.
