Monday, December 14, 2009

Assessment and Evaluating Student Work

It's ironic, given its centrality, how little that is sensible and defensible has been said about the relation between grading and assessment.  To my mind, it's a lost opportunity to offer constructive criticism of grading in general as well as a failure on the part of the assessment industry to demonstrate and convey clear thinking and to develop useful tools for teachers.

And so here is part one of working through a relationship between grading and assessing.

My students always want to know "how much does this count for" and so my syllabus always says something like

Exam 1          30%
Problem Sets    20%
Final Essay     40%
Participation   10%

When I grade any particular item -- assignment or paper -- I make use (at least implicitly) of a similar decomposition of the grade.  If it is an essay I may be evaluating the quality of the writing, the use of evidence, the structure of the argument, the use of sources, and so on.  If it is an exam, the questions can usually be separated into a finite number of groups, each "testing" a particular skill or understanding of a particular concept (but see fn 1 below).  Let's imagine a class in which five skills or concepts, J, K, L, M, and N, make up the content.  And let's imagine my graded activities from above can be described this way:

Exam 1:        25% J, 25% K, 25% L, 25% critical thinking
Problem Sets:  20% J, 20% K, 20% L, 20% M, 20% N
Final Essay:   20% writing, 30% J-N, 30% argument, 20% scholarly conventions
Participation: 33% J-N; 33% staying up with material in the course; 33% poise, verbal skills, etc.
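
To keep myself honest about this bookkeeping, I find it helps to write the decomposition down explicitly. Here is a minimal sketch in Python -- the labels and weights are just the illustrative ones above, not an actual gradebook:

    # Syllabus-level weights for each graded item.
    course_weights = {
        "Exam 1": 0.30,
        "Problem Sets": 0.20,
        "Final Essay": 0.40,
        "Participation": 0.10,
    }

    # How each graded item decomposes into skills, concepts, and other outcomes.
    # "J-N" stands for the lumped use of the five course concepts.
    item_decomposition = {
        "Exam 1":        {"J": 0.25, "K": 0.25, "L": 0.25, "Critical Thinking": 0.25},
        "Problem Sets":  {"J": 0.20, "K": 0.20, "L": 0.20, "M": 0.20, "N": 0.20},
        "Final Essay":   {"Writing": 0.20, "J-N": 0.30, "Argument": 0.30,
                          "Scholarly Conventions": 0.20},
        "Participation": {"J-N": 1/3, "Keeping Up": 1/3, "Verbal Skills": 1/3},
    }

    # Sanity check: each item's internal weights should sum to (roughly) 1.
    for item, parts in item_decomposition.items():
        assert abs(sum(parts.values()) - 1.0) < 1e-9, item

Writing it down this way also makes the sampling problem in fn 1 visible: anything that never appears as a key is never graded, and anything that appears only under a lumped heading like "J-N" is measured only in the aggregate.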



Now let's look at how all of the things I've graded fit together.  In the table below, the rows represent skills or learning outcomes that I want students to demonstrate.  The columns show me the evaluative tools I've used and which of these each one included.

Substance                Exam 1   Problem Sets   Final Essay   Participation
Concept J                  +          +              +              +
Concept K                  +          +              +              +
Concept L                  +          +              +              +
Concept M                  -          +              +              +
Concept N                  -          +              +              +
Writing                    -          -              +              -
Argument                   -          -              +              -
Scholarly Conventions      -          -              +              -
Critical Thinking          +          -              +              -
Keeping Up                 -          -              -              +
Verbal Skills              -          -              -              +
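
A table like this does not have to be drawn by hand. Given the same mapping of instruments to outcomes, a few lines of code can recover the coverage pattern and, more usefully, flag any outcome that no instrument touches. A sketch (the coverage sets simply restate the table above):

    # Which outcomes each evaluative tool actually touches (restating the table above).
    covers = {
        "Exam 1":        {"J", "K", "L", "Critical Thinking"},
        "Problem Sets":  {"J", "K", "L", "M", "N"},
        "Final Essay":   {"J", "K", "L", "M", "N", "Writing", "Argument",
                          "Scholarly Conventions", "Critical Thinking"},
        "Participation": {"J", "K", "L", "M", "N", "Keeping Up", "Verbal Skills"},
    }

    outcomes = ["J", "K", "L", "M", "N", "Writing", "Argument",
                "Scholarly Conventions", "Critical Thinking",
                "Keeping Up", "Verbal Skills"]

    # For each outcome, list the instruments that measure it; flag orphans loudly.
    for outcome in outcomes:
        items = [item for item in covers if outcome in covers[item]]
        print(f"{outcome:22s} {', '.join(items) if items else 'NOT MEASURED ANYWHERE'}")

With real courses the payoff is the orphan check: it is surprisingly easy to have a stated outcome that no graded work ever touches.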

Next let's suppose that I graded each of these items on an A-F scale and that I've made some attempt to put on paper how I "operationalize" the grades "excellent," "good," "satisfactory," etc. I might, for example, have let students know that I consider an excellent use of concepts in the final essay to be when
Essay employs 3 or more of the main concepts from the course in a manner that's appropriate to the subject at hand and that demonstrates a strong understanding of what they mean and how they can be useful.
And finally, let's assume that my program goals include concepts K and M and the school as a whole includes writing and critical thinking as goals.

I can simply take scores on concept K from all four evaluations and M from the last three, then take the writing score from the final essay and the critical thinking scores from the first exam and the final essay, and, voilà, I've got my assessment.
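
In other words, the "assessment" is nothing more than a pivot: regroup the component scores I already assigned while grading by outcome instead of by instrument. A minimal sketch, with invented letter grades mapped to a four-point scale purely for illustration:

    # Component scores recorded while grading (letter grades, per instrument).
    # These particular grades are invented for illustration only.
    component_grades = {
        "Exam 1":        {"K": "B", "Critical Thinking": "A"},
        "Problem Sets":  {"K": "B", "M": "C"},
        "Final Essay":   {"K": "A", "M": "B", "Writing": "B", "Critical Thinking": "B"},
        "Participation": {"K": "A", "M": "A"},
    }

    GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

    def outcome_scores(outcome):
        """All scores for one outcome, regrouped across instruments."""
        return {item: grades[outcome]
                for item, grades in component_grades.items()
                if outcome in grades}

    def outcome_average(outcome):
        """Unweighted mean on a 4-point scale (one of many possible roll-ups)."""
        scores = [GRADE_POINTS[g] for g in outcome_scores(outcome).values()]
        return sum(scores) / len(scores)

    # Program goals: K and M; institutional goals: Writing and Critical Thinking.
    for goal in ["K", "M", "Writing", "Critical Thinking"]:
        print(goal, outcome_scores(goal), round(outcome_average(goal), 2))

The only new work is deciding how to roll the regrouped scores up (here, an unweighted mean); everything else was already done in the course of grading.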

fn 1  Two things require mention: 1) not every skill/concept that we expect to be learned is measured on every exam/exercise -- exams are samples; 2) many "items" will depend on more than one skill or concept.  More on these issues later.

Wednesday, November 25, 2009

Responding to Student Writing

In a piece called "ABOUT RESPONDING TO STUDENT WRITING," Peter Elbow writes:
The fact is there is no best way to respond to student writing. The right comment is the one that will help this student on this topic on this draft at this point in the semester -- given her character and experience. My best chance for figuring out what is going on for any particular student at any given point depends on figuring out what was going on for her as she was writing.
I found this document on our school's website under "Teaching Resources" on the Provost's Office page. It sounds like pretty good advice.

It also sounds different from the advice I find on another page of the institution's website. On that page I find a "rubric" for assessing student learning in essays. It gives me six categories (overall impression, argument, evidence, counter evidence, sources, citations) and wordy descriptions of different levels of achievement in each. It's pretty unclear from the document how it is intended to be used, but basically, it's a grading scale.

Here's my question: which of these approaches to responding to student writing does my boss want me to use?

Friday, November 6, 2009

Spellings' Flawed Metaphor

An interesting article and even more interesting responses in Inside Higher Ed from a few years back: "The Flawed Metaphor of the Spellings Summit"

Monday, November 2, 2009

Six Things to Beware of, Grasshopper...

What talk there is about innovation and change in higher education these days shows up mostly in the general orbit of assessment. Having recently listened to or read a lot of material from assessment experts, I jotted down a few cautions.

Beware the tyranny of software. I like software. I write computer programs. But, at the risk of sounding Asimovian, software has to serve education, not the other way round. For the last ten years we have repeatedly adjusted the way we educate to the needs of the software ("Banner won't let you do that..."). Banner and its ilk are just hammers. No right-thinking carpenter changes the way she builds a house because of hammer limitations.

Beware the fetishization of uniformity. Healthy ecosystems, organizations, relationships, families, and individuals entertain a healthy dialectical tension between sameness and difference, uniformity and irregularity, standardization and improvisation. Whether you are trying to be a virus that can outwit immune systems (or an immune system that can shut down a virus), or building a firm that can weather economic ups and downs, or running a college that produces excellent graduates from a stunning variation of inputs, the key is to cultivate order and chaos simultaneously.

Beware projection and other forms of X-o-centrism. We are all subject to our own version of Saul Steinberg's classic "New Yorker's view of the World." What works for me, or in one course, or in my department, or in our division, or in one school I know about, is surely good for you. Whether it's an analogy or metaphor, an algorithm, a social form, a paradigm, or a homeomorphism, context and local history matter.

Beware foolish numberers and their arrogant misquotations of Lord Kelvin -- if you can't measure it, it does not exist -- and mindless adherence to quantification as an end in itself.

Beware saviors, those who in the face of skepticism and critique fancy themselves the new Galileo or who too readily imagine they are members of a new Vienna Secession or Salon des Refusés.

Beware assurances that complex things can be done with little effort or in far less time than you think. Most things that are easy and simple and beneficial have already been done. Things like the valid measurement of educational outcomes are not simple. Getting it right takes time and effort.

Most of all, beware a movement that cannot apply its own techniques to itself.

When the Blind Meet the Lost

Assessment has landed where it has because the "movement" is driven, far beyond our individual institutions' halls, by a political agenda and by small minds who have seized on an entrepreneurial opportunity and attached themselves to it.  In a democratic society, that political agenda deserves a free and open debate.  Unfortunately, many of the individuals who have attached themselves to it are either unaware of its terms or incapable of (or afraid of) engaging in such a debate.  Unfortunately for those who are behind the movement, many of their foot soldiers are an embarrassment, and either they themselves are not competent enough to realize this or they are too ideologically blinded to care.

Perhaps the most telling characteristic of the "assessment movement" is its failure to live up to its own standards: there is no culture of accountability and measurement and assessment in the assessment community. It would not be the first movement (in education or elsewhere) to suffer from this shortcoming.

Friday, October 30, 2009

The Fetishization of Rubrics I

The one thing you see over and over and over in the assessment literature is the "rubric."  Never mind, for now, the history of the concept -- that's an interesting story but it's for another time.

For now, just a quick note.  A rubric is basically a two-dimensional structure, a table, a matrix.  The rows represent categories of observable or measurable phenomena (such as, for grading an essay, "statement of topic," "grammar," "argument," and "conclusion") and the columns represent levels of achievement (e.g., "elementary," "intermediate," "advanced").  The cells of the table then contain, for each row, a description of what would constitute each level of performance -- what counts, say, as "advanced" grammar.

A rubric is, we could say, just a series of scales that use the same values with something like the operationalization of each value specified.
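
If that sounds abstract, it isn't: a rubric is just a nested lookup table, criteria by levels, with prose operationalizations in the cells. A toy sketch in Python (the criteria and cell text here are invented, not any particular published rubric):

    LEVELS = ["elementary", "intermediate", "advanced"]

    # A rubric is a table: criteria down the side, levels across the top,
    # and an operationalization (a prose description) in each cell.
    essay_rubric = {
        "statement of topic": {
            "elementary":   "Topic is implied but never stated.",
            "intermediate": "Topic is stated but not sharply delimited.",
            "advanced":     "Topic is stated clearly and delimited.",
        },
        "grammar": {
            "elementary":   "Frequent errors interfere with reading.",
            "intermediate": "Occasional errors, meaning still clear.",
            "advanced":     "Essentially error-free prose.",
        },
    }

    # Scoring with a rubric is just choosing one column per row.
    scores = {"statement of topic": "advanced", "grammar": "intermediate"}

    for criterion, level in scores.items():
        print(f"{criterion}: {level} -- {essay_rubric[criterion][level]}")

Seen this way, the claim that a rubric is "just a series of scales" with operationalizations attached is literal, not rhetorical.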

Rubrics are, in other words, nothing new.  Why then, our first question must be, do assessment fanatics act as if rubrics are new, something they have discovered and delivered to higher education?

I would submit that the answer is ignorance and naivete.  They just don't know.

A second question is why their rubrics are so often so unsophisticated.  Most rubrics you find on assessment websites, for example, suggest no appreciation for something as elementary as the difference between ordinal, interval, and ratio measurements.  Take this one, which is a meta-rubric (an assessment rubric for rating efforts at assessment using rubrics). (Source: WASC)

Criterion                Initial   Emerging   Developed   Highly Developed
Comprehensive List         ...        ...        ...           ...
Assessable Outcomes        ...        ...        ...           ...
Alignment                  ...        ...        ...           ...
Assessment Planning        ...        ...        ...           ...
The Student Experience     ...        ...        ...           ...

Looks orderly enough, eh? Let's examine what's in one of the boxes. Here's the text for "Assessment Planning" at the "Developed" level:
The program has a reasonable, multi-year assessment plan that identifies when each outcome will be assessed. The plan may explicitly include analysis and implementation of improvements.
It looks like we need another rubric because we've got lots going on here:
  1. What makes a "reasonable, multi-year plan"?
  2. Mainly what we need here are dates: when will each outcome be assessed?
  3. How should the assessor rate the "may-ness" of analysis and implementation? Apparently these do not make the plan better or worse since they may or may not be present.
Our next analytical step might be to look at what varies between the different levels of "Assessment Planning," but first let's ask what conceptual model lies behind this approach.  It's very much that of developmental studies, especially psychology.   The columns are, thus, stages of development.  In psychology or child development the columns have some integrity: they correspond to natural stages a person goes through.  Dimensions may be independent in terms of measurement but highly correlated (typically with chronology), and so "stages" emerge naturally from the data.

In the case of assessment, though, these are a priori categories made up by small minds who like to put things in boxes.  And the analogy they are making when they make them up is very much to child development.  An assessment rubric is a grown-up version of the kindergarten report card.

Friday, August 28, 2009

Why Do We Need a Faculty Assessment Committee?

Any time we create a committee we should stop and ask why.  The baseline for answering that question should be the world (or institution) without the committee.  How was it?  How would it be?

When I think about that in this case, here's what I come up with.  With no practicing assessment committee (it was appointed but never met last year):
  1. Faculty have felt little opportunity for real input into assessment
  2. The process has in fact, over the years, been dominated by non-faculty and non-academics.
  3. Many faculty members are unimpressed with the process. Substantive missteps have been frequent.  Faculty members' assessments of assessment range from feeling insulted by the unprofessional and intellectually demeaning tone with which assessment has frequently been conveyed, to serious criticism of the validity of the methods used, to real concern about how such criticism is consistently ignored or dismissed.  And much in between.
[I suspect that from the "other side" it looks like this
  1. Faculty have been slow to adopt a culture and practice of assessment
  2. Our job is to get the institution to comply with WASC enough to get us re-certified]
So how to make the world different WITH an assessment committee?  If I were an administrator, I'd think that the committee could help me to bring the faculty along.   I could co-opt them as fellow champions of assessment as currently practiced and they'd be vanguards of the movement.

Uh, I don't think so.  The problem with assessment is not lack of faculty buy-in.  Let's repeat that: THE PROBLEM WITH ASSESSMENT IS NOT FACULTY BUY-IN.  The problem with assessment is (are):
  • its methods are methodologically dubious
  • its logic model (observation>analysis>change) is vague, rarely made explicit, and more wishful thinking than realistic
  • it dishonestly or naively hides its political values behind a veil of "objective measurement"
  • it is dominated by self-serving educational entrepreneurs who live off, not for, assessment
  • it is evangelized in the absence of hard thinking about institutional inputs and outputs, the very things it purports to be sensitive to
  • it enters the academy as a fait accompli, more based on conviction and belief than theory, analysis, and argument, and exempts itself from the critical examination and culture of evidence that it champions
So, what can an assessment committee do if even some of the above is in fact the case?  Mainly, I think, hold assessment accountable to normal standards of intellectual integrity and professionalism.  If we do that, I predict, there would be changes in how assessment is implemented, changes that would allow the process to capitalize on its virtues and avoid some of its vices.  And in reaction to THAT you would get more buy-in.  The giant flaw in how it's been handled so far is that assessment is blind to closing its own loop.  When faculty don't fall in line, it's not necessarily because they are resistant to change, unwilling to give up their comfortable sinecures, or too arrogant to think about students.  Sometimes it's because they have looked at something and, smart people that they are, found it wanting.

It may even be that the resistance to change and feedback, the comfortable sinecures, and the arrogance that deflects all criticism may lie in the assessment industry itself.  The rest is, as they say, projection.

Sunday, August 23, 2009

Let's Take It Seriously

Let's take assessment and accountability seriously AS AN INSTITUTION. There is a tendency to equate assessment with measuring what professors do to/with students. The buzz word is "accountability" and there's this unspoken assumption that the locus of lack of accountability in higher education is the faculty. I think that assumption is wrong.

We should broaden the concept of assessment to the whole institution. Course instructors get feedback on an almost daily basis -- students do or don't show up for class; instructors face 20 to 100 faces projecting boredom or engagement several times per week; students write papers and exams that speak volumes about whether they are learning anything; advisees tell faculty about how good their colleagues are. By contrast, the rest of the institution has little, if any, opportunity for feedback. It's important: one substandard administrative act can affect the entire faculty, so even small things can have a big negative effect on learning outcomes.

In the name of accountability throughout the institution I propose something simple, but concrete: every form or memo should have a "feedback button" on it. Clicking on this button will allow "users" anonymously to offer suggestions or criticism. These should be recorded in a blog format -- that is, they accumulate and are open to view. At the end of each year, the accountable officer would be required in her or his annual report to tally these comments and respond to them, indicating what was learned, what changes have been made or why changes were not made.

The important component of this is that the comments are PUBLIC so that constituents can see what others are saying. Each "user" can see whether her ideas are commonly held or idiosyncratic and the community can know what kind of feedback an office is receiving and judge its responsiveness accordingly.

Why anonymous? This is feedback, not evaluation. This information cannot be used to penalize or injure anyone. The office has the opportunity to respond either immediately or in an annual report. Crank comments will be weeded out by sheer numbers and by users who will contradict them. In the other direction, it is clear that honest feedback can be compromised by concerns about retribution, formal or informal; that, too, supports the idea that comments should be (at least optionally) anonymous.
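
None of this requires exotic technology. The data model is about as simple as data models get; here is a minimal sketch of what such a public, anonymous feedback log might look like (the office name and fields are invented for illustration):

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class FeedbackLog:
        """A public, anonymous feedback log for one office or form."""
        office: str
        comments: list = field(default_factory=list)  # no author field, on purpose

        def add(self, text: str):
            # Anonymous by design: we record what was said and when, not who said it.
            self.comments.append({"date": date.today().isoformat(), "text": text})

        def annual_tally(self, year: int):
            """What the accountable officer would summarize in an annual report."""
            this_year = [c for c in self.comments if c["date"].startswith(str(year))]
            return {"office": self.office, "count": len(this_year), "comments": this_year}

    log = FeedbackLog("Registrar: add/drop form")
    log.add("The deadline on the form does not match the catalog.")
    print(log.annual_tally(date.today().year))

The accountability comes from the log being public and from the annual-tally obligation, not from any clever technology.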

We should note that we already do all of this in principle -- many offices around campus have some version of a "suggestion box." What is missing is (1) systematic and consistent implementation so that users get accustomed to the process of providing feedback, and (2) a protocol for using the feedback to enrich the community knowledge pool and to build it into an actual accountability structure.

The last paragraph makes the connection to a sociology of information. Information asymmetries (as when the recipient knows what the aggregate opinion is, but the "public" does not) and the atomization of polities (this is what happens when opinion collection is done in a way that minimizes interactions among the opinion holders -- cf. Walmart not wanting employees to discuss working conditions -- preventing the formation of open, collective knowledge*) are a genuine obstacle to organizational improvement. Many, many private organizations have learned this; it's not entirely surprising that colleges and universities are the last to get on board.

* as opposed, say, to things that might be called "open secrets"

Friday, August 21, 2009

The Bannerization of Assessment

I met recently, along with Andy, Alice, and Kiem, with two reps from the Blackboard company. They were here to tell us about a Blackboard add-on product called, I think, "the assessment module." Herewith, some observations.

The product incorporates some of the functionality that most of us have seen recently in the CARP software. It allows folks at different levels of the instructional process -- from instructors up to deans and assessment staff -- to input, collate, tally, analyze, query, and report on all manner of information related to assessment. It has the advantage of using the same overall interface and design logic that we are familiar with from our use of Blackboard for classes and it's "flexible" and can be integrated with BANNER.

The mere fact of investing in the software would probably send a positive signal to WASC that we are an institution that is taking assessment seriously. It would also greatly simplify the work of the office of institutional research by organizing assessment data in one place and one format. Nobody at the meeting was prepared to give actual numbers but it seems logical that it could save lots and lots of hours of work in that office (and probably in other offices that have to prepare materials for WASC).

Much of the labor saving derives from the fact that the system assumes that instructors will use it to collect and assess at least some of the work students do in their courses. At a minimum, the system allows students to submit papers, essays, etc. in electronic form, and then the assessment group can process these using rubrics we've developed so as to arrive at some measure of student achievement in our programs. In most cases we'd rise above that minimum: instructors would simply use the system itself to do the grading and feedback on papers and exams, and so this information would be "automatically" recorded and tallied up for use in assessment. This would make faculty life easier because we would not have to submit separate assessment information. Ideally, most, if not all, of the work we assign for evaluation in courses would be associated with a rubric stored in the system. Students would submit their work electronically; we would open the work and the rubric in side-by-side windows, rate the work on each measure, and add comments. The student would receive the feedback in electronic form, and the aggregate results for the class would be automatically recorded and forwarded "up the chain" to department heads, the office of assessment, etc., as appropriate.

MY TAKE-AWAY

After several hours listening to the Blackboard reps (sales and technical folks) here are a few observations:
  1. These folks do not understand how education happens in a liberal arts college.

    1. what is valuable for students
    2. how departments work
    3. how decisions get made
    4. what value professors actually add to the mix

    Instead, the software is designed to resonate with an auditor's fantasy of higher education as might be manifest in the CFO of a large, for-profit, online university.

  2. Feature after feature of the software is perfect for online correspondence courses as offered by, say, University of Phoenix.

  3. While the company representatives repeatedly touted the system's "flexibility," in fact it imposes dozens upon dozens of assumptions about teaching and learning on the process without any self-consciousness. The whole thing derives from a particular view of academic assessment (itself a refugee from peer review), and its purveyors appeared to have zero sense that its epistemological status was different from, say, the law of gravity.

  4. Totally absent from their pitch was any sense at all that there was an educational problem that this product could help you solve.

    1. What it does address is the fact that institutions like Mills have been told "you must do something" and this is clearly a something and spending a lot of money on it would be a great demonstration of institutional commitment.

    2. The company appears to have done zero assessment of the temporal impact of the processes the software would require. "Eventually, instructors would get really good at entering this stuff and so the time involved would drop over time..." "The information could be viewed and sliced in many different ways..." (by whom?) Is there a net gain in productivity? No idea. Is there a net positive for student learning? No idea. Will more parents want to pay our tuition because we use this system? No idea. What should instructors stop doing to make time to use this system? No idea.

  5. What they are selling is "a license and consulting." In order to figure out how to use the software and adapt it (remember, it's very flexible) you have to hire them as consultants. Remember too, that these consultants, as far as I can tell, have very little fundamental appreciation for how a liberal arts college works. Either they will mislead us because they don't understand us or we will pay for them to learn something about how a college like Mills works.

  6. It is telling that, as potential customers, we were hard-pressed to come up with things we want to do that this product would make easier (usually at these demos users' imaginations get going and they start saying "hey, could I use it to do X?"); instead, we sat there realizing that the software would make us do things.
SOFTWARE DESIGNED TO CONNECT THINGS UP

Two aspects of the software are key (from a software design point of view) -- "the hierarchy" and "links."

A core concept in the software design is "the hierarchy," by which they seemed to mean the managerial hierarchy that oversees the delivery of education. At the bottom of this structure are instructors and their students. Instructors implement courses, which sit at the next level above them -- overseen "by a department chair or dean" who might then be under another dean. Above this we have "the assessment operation" -- as the discussion went on it seemed that this means some combination of Institutional Research and the Assessment Committee. Then above this you might have other levels of college administration, and above this outside mandaters such as WASC. The genius of the software is that each institution can build in the hierarchy that is appropriate to itself. The data in the system -- the descriptions of goals and standards and such -- are carefully protected so that only the appropriate people at the appropriate level of the hierarchy can see or change them.

The other part of the system is the links. It allows you to build a rubric for, say, reading lab assignments, and for each item in the rubric to be linked back to course learning objectives, which are in turn linked back to program goals, and these back to institutional goals or to requirements set forth by external agencies. This means that when you evaluate 25 lab reports, the system automatically gets information on how well the institution is doing in its effort to inculcate a culture of experimentation AND it also automatically gets information about the fact that the institution is monitoring whether or not such learning is occurring. And all this simply by clicking on a radio button in a web-based report evaluation rubric!
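
As best I could reconstruct it from the demo, the data model behind the "links" is just a chain of pointers from rubric items up to institutional goals. A hedged sketch of that structure -- my guess, not Blackboard's actual schema:

    # Each rubric item points "up" to a course objective, which points to a
    # program goal, which points to an institutional goal or external mandate.
    # All identifiers here are hypothetical.
    links = {
        "rubric:lab-report:experimental-design": "objective:chem101:design-experiments",
        "objective:chem101:design-experiments": "goal:chemistry:culture-of-experimentation",
        "goal:chemistry:culture-of-experimentation": "goal:institution:inquiry",
    }

    def chain(node):
        """Walk the links from a rubric item up to the top of the hierarchy."""
        path = [node]
        while node in links:
            node = links[node]
            path.append(node)
        return path

    # Scoring one lab report on one rubric item "automatically" becomes
    # evidence at every level above it.
    score = ("rubric:lab-report:experimental-design", 3)  # e.g., 3 on a 4-point scale
    for level in chain(score[0]):
        print(f"{level} <- evidence: {score[1]}")

Even in this toy version the opacity worry is visible: change one link in the middle of the chain and everything below it silently reports to a different goal.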

All of these things are, of course, changeable. In theory. In practice, the system allows for the creation of extremely high levels of opaque complexity. To ensure system integrity, new procedures will need to be invented so that faculty who want to make a change can confer with departmental colleagues and get the department head to make a request to the office of assessment, and then maybe the gen ed committee or the EPC has to get involved, etc. Or, even more likely, once stuff is in the system it just stays there until it causes a major problem.

The designers of the system seem totally oriented toward (1) the capacity to output what an entity like WASC wants and (2) changing the way instructors teach via a logic of "it's easier to join than fight" and "why duplicate your efforts?"

DISHONESTY AND ANTI-INTELLECTUALISM

A system like this is championed for its flexibility, but that flexibility exists only relative to how rigid it could be. Neither its designers nor its purveyors struck me as having even a hint of a nuanced view of what education is, how it happens, and how real educational organizations work. That's too bad because these are not mysterious topics -- a lot of people DO know a lot about them. What the talk of "flexibility" represents is marketing-speak. A common complaint about course management software and student systems software is that it is inflexible and "doesn't fit how we have kept records before," and so the folks who write it add more options to mix and match the pieces (the presenters seemed to want to impress us by the fact that on a particular screen we could have two tabs or four: "you can set it up so it's exactly right for your process!"). But that's not really flexibility, that's customization. System software is, pretty much by definition, not flexible. More to the point, system software almost never adapts to an organization; organizations adapt to system software. We've seen this plenty over the years when we're told "Banner can't do that" or "we need this change because of Banner."

A second moment of dishonesty happens because the designers and sales force have clearly bought into the ideology of the professor/instructor as problem. They have talked for so long to assessment aficionados and heads of assessment who get blowback from faculty that they "know" that individual professors don't like this stuff and that part of the challenge of their job is just to soft-pedal around that. They are not selling this stuff to instructors. They are selling it to the instructors' managers or ueber-managers, folks who themselves have uncritically bought into the idea of there being a crisis of accountability in higher education. The intellectual dishonesty lies in the fact that these folks are neither willing nor able to actually have a critical conversation about any of this. They simply think of people who do not swallow it hook, line, and sinker as "unsaved."

AN IMPORTANT ASIDE

Sociologically, what's interesting is that this is an example of the "for-profit" side of education rubbing up against the not-for-profit side. Blackboard and its competitors, as well as the folks who are on the hustings about assessment, are entrepreneurs. They don't live for assessment, they live off assessment. And we know that that's an arrangement that makes intellectually honest discussions hard to come by.

Thursday, August 6, 2009

It IS an Industry...

I recently got a copy of an email sent out by McGraw-Hill's "Assessment Research Project." In part it said:
As a quick reminder, we are conducting this nation-wide study to learn about assessment practices that are actually being used by professors of Introductory Sociology. We are seeking a copy of your syllabus, mid-term and final exams. If you do not assess cumulatively, please submit your mid-year and end-of-year exams along with your syllabus.
The recipient was assured that the material would be kept confidential and not published in any of their teaching materials (interesting that they even need to say this) and was asked to "please be sure to let me know if you would like to receive a Certificate of Participation and/or an honorary mention in our research"! Maybe you could include that in your tenure file.

There's money in them thar hills, folks, and even though the accreditation agencies, think tanks, washed-up academics who've become assessment entrepreneurs, and standardized testing organizations have a head start (since they are the ones who get to set the agenda), textbook publishers are starting to realize that if we are up against the wall we'll love to order textbooks that come with "integrated assessment plans" of some kind.

Onward and upward!

See also...

PRESS RELEASE: McGraw-Hill Education Forms New Assessment and Reporting Unit to Meet Growing Global Demand: Combines CTB/McGraw-Hill, The Grow Network/McGraw-Hill and McGraw-Hill Digital Learning

You can earn over $100k and live in Monterey, CA: Director, International Research and Development

Wednesday, August 5, 2009

Validity and Such

An AACU blogpost referred me to the National Institute for Learning Outcomes Assessment website, which referred me to an ETS website about the Measure of Academic Proficiency and Progress (MAPP), where I would be able to read an article titled "Validity of the Measure of Academic Proficiency and Progress."

And here's the upshot of that article: The MAPP is basically the same as the test it replaced and research on that test showed
...that the higher scores of juniors and seniors could be explained almost entirely by their completion of more of the core curriculum, and that completion of advanced courses beyond the core curriculum had relatively little impact on Academic Profile scores. An earlier study (ETS, 1990) showed that Academic Profile scores increased as grade point average, class level and amount of core curriculum completed increased.
In other words, the test is a good measure of whether students took more GenEd courses. And we suppose that in GenEd courses students are acquiring GenEd skills. And so these tests are measures of the GenEd skills we want students to learn.

A tad circular? What exactly is the information value added by this test?

Introduction

This is intended to be a critical discussion of assessment in higher education.