tag:blogger.com,1999:blog-53843175584232267432024-03-09T18:46:12.611-08:00Assessing AssessingA level headed, but critical, discussion of assessment in higher education by people who "deliver" higher education -- professors.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.comBlogger26125tag:blogger.com,1999:blog-5384317558423226743.post-53103334740350606512012-07-24T09:35:00.003-07:002012-07-24T10:15:02.314-07:00Meanwhile, On Our Other Flank...<br />
<div class="MsoNormal">
Lots of online commentary and reposting of an OpEd ("<a href="http://www.nytimes.com/2012/07/20/opinion/the-trouble-with-online-education.html?smid=pl-share">The Trouble with Online Education</a>") by M Edmundson of UVA in last week's New York Times. For an interesting response, see N Jurgenson on the <a href="http://thesocietypages.org/cyborgology/2012/07/20/online-education-is-real-education/">Cyborgology blog</a>.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Personally, I’m pretty sure that we (non-elite liberal arts
colleges) ARE sunk. Not so much BY online education as by our
response to it. As far as I can
tell, almost all the reaction from liberal arts quarters is just that:
reaction. And mostly reactionary (that’s how I’d characterize Edmundson’s piece, even if I agree with him that a great face-to-face class can be a great thing – all one can say is “of course it is”). Our usual approach, trying to preach the demons away, does not have a good track record.<br />
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
But the big mistake made by little institutions that DO take a
look is to think that the way to go is to “get into” offering online
courses. Terrible idea, in part due to
the simple problem of scale. I pray that we at my own institution don’t even waste time on it. If
we venture down the fool’s path of looking for (yet) another cash cow, we’ll be
a footnote in the history books in no time at all.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Which is not to say online instruction and instructional
resources are a bad thing. <i>Au contraire!</i> What we should be thinking about is how we can
use these tools to increase productivity and effectiveness and repertoire (and
thereby lower the cost of what we do well so that we can attract more enrollment). <a href="https://www.coursera.org/">Coursera</a>
and <a href="https://www.edx.org/">EdX</a> (and <a href="https://p2pu.org/en/">P2PU</a>
et al.) are amazing ventures generating all manner of raw material for the bricolage of creative minds; liberal arts colleges should be thinking like innovators
– learning all about them with an eye to how to use them to our benefit –
not like ostriches, fundamentalists, Alamoistas, or Masadaists.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Alas, every indication I’ve seen over the last few years in our approaches to budgets, revenue, innovation, technology, and enrollment is very discouraging in this regard. Circumstances force something like 90% of our attention to be on today and we tend to squander the last 10% on yesterday; tomorrow is our neglected child.</div>
<div class="MsoNormal">
<br /></div>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com9tag:blogger.com,1999:blog-5384317558423226743.post-4813619090954301362012-06-21T15:36:00.002-07:002012-06-21T15:37:15.711-07:00The Path from "Running a University Like a Business" Leads Where?<br />
<div class="MsoNormal">
<a href="http://www.blogger.com/blogger.g?blogID=5384317558423226743" name="_GoBack"></a>A blog post with some insightful
comments about UVA situation (<a href="http://bit.ly/PD6WNv">http://bit.ly/PD6WNv</a>) notes that the folks who
pressured the president to resign favor something they call "strategic
dynamism." The blogger notes that
it's pretty much all about dynamism and not much about strategy. That is, it's about action, not planning.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This had me thinking about Karl Mannheim's ideal types of
conservative thinking (ranging from bureaucratic conservatism to fascism). Mannheim characterizes the latter as “active
and irrational,” noting that at “the very heart of its theory and its practice
lies the apotheosis of direct action, the belief in the decisive deed, in the
significance attributed to the initiative of a leading elite. The essence of politics is to recognize and
grapple with the demands of the hour” (<i>Ideology and Utopia</i> 1936: 134, <a href="http://bit.ly/PD6NJT">http://bit.ly/PD6NJT</a>).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Indeed, it's a real danger in higher education when
administrators and governing boards think that "running a college like a
business" means adopting a deference to swashbuckling, command-and-control-oriented, decisive simplifiers.
The very fact that they cannot see how many different ways there are to
be "like a business" is what makes them most dangerous.<o:p></o:p></div>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-85761150163162188562012-06-19T10:16:00.002-07:002012-06-19T10:24:54.649-07:00Know (and be smarter than) Your Enemy<br />
This post is not specifically about assessment, but it relates to the larger conversation of which assessment is but one component: the future of American higher education. Thanks to a <a href="http://bit.ly/LiO6Lq">tweet from Cedar Reiner</a> for turning me on to it.<br />
<br />
You've possibly already seen this D Levy opinion piece in the Washington Post from March (or certainly other examples of the genre). It's an example of what “they” are saying and reading (spoiler: it's the standard “we pay them 100 grand and they only work 15 hours a week” tirade): "<a href="http://www.washingtonpost.com/opinions/do-college-professors-work-hard-enough/2012/02/15/gIQAn058VS_print.html">Do college professors work hard enough?</a>"<br />
<br />
It's a tired bit of rhetoric, to be sure, but sung over and over like church hymns, it comes to define reality for a certain set. That needs to be countered by smart talk widely repeated; smirking won't do. Here’s one reasoned rebuttal by Swarthmore's Tim Burke that casts the problem in terms of the larger arc of private capture of value through de-professionalization: "<a href="http://blogs.swarthmore.edu/burke/2012/03/27/the-last-enclosures/">The Last Enclosures</a>."<br />
<br />
The real challenge here is that most representatives of “the other side” (e.g., administrators, trustees, legislators) have not actually thought things through carefully but have bought into a well-crafted rhetoric and catchy simplifications, while “our side” takes a fundamentally conservative approach (same as it ever was) and puts its finger in its ears and goes “la la la la I cannot hear you….” Higher education has a broken economic model, but too many of us are content to just demonize those with really bad ideas about how to fix it. I agree with most of Burke's critique, but I think we need to move beyond critique. There is a romantic valor in identifying the corruption in the current wave of education reform, but it won't be stopped by mere resistance. Bad new ideas need to be defeated by good new ideas (as can be found in some of Burke's <a href="http://blogs.swarthmore.edu/burke/category/pictures-from-an-institution/">other posts</a>).<br />
<br />Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-58013277311971908162012-04-07T14:03:00.000-07:002012-04-07T14:03:03.540-07:00What If Administrator Pay Were Tied to Student Learning OutcomesThe recent negotiation in Chicago ("<a href="http://www.insidehighered.com/news/2012/04/05/faculty-bonus-pay-linked-student-success-city-colleges-chicago">Performance Pay for College Faculty</a>") of a tie between student performance and college instructor pay brought this accolade from an administrator: it gets faculty "to take a financial stake in student success."<br />
<br />
It got me wondering why we don't hear more about directly tying administrator pay to student success. If we did, I'll bet the students would have a lot more success. At least, that's what the data released to the public (and Board of Trustees) would show. There'd be far less of a crisis in higher education.<br />
<br />
Thought experiment. What would happen if we were to tie administrator pay to student success -- much the way corporate CEOs have their pay packages designed -- especially administrators of large multi-campus systems?<br />
<br />
Prediction 1. The immediate response to the very proposal would be "oh, no, you can't do that because we do not have the same kind of authority to hire and fire and reward and punish that a corporate CEO has." But think about this...<br />
<ol><li>Private sector management has a lot less flexibility than those looking in from the outside think. Almost all of the organizational impediments to simple, rational management are endemic to all organizations.</li>
<li>Leadership is not primarily about picking the members of your team. It's about what you manage to get the team you have to accomplish.</li>
<li>Educational administrators do not start the job ignorant of how these educational institutions work. It is tremendously disingenuous to say "if only I had a different set of tools." People who do not think they can manage with the tools available and within the culture as it exists should not take these jobs in the first place.</li>
<li>This, it turns out, is what some people mean when they say that schools should be run like a business. The first impulse of unsuccessful leaders is to blame the led. The second one is to engage in organizational sector envy: "if I had the tools they have over in X industry...." What this ignores is the obvious evidence that others DO succeed in your industry with your tools. And plenty of leaders "over there" fail too. It is not the tools' fault.</li>
</ol><div>Prediction 2. Learning would be redefined in terms of things produced by inputs administrators had more control over. And resources would flow in that direction too. <br />
<br />
Prediction 3. Administrators would get panicky when they looked at the rubrics in the assessment plans they exhort faculty to participate in and that are included in reports they have signed off on for accreditation agencies. They'd suddenly start hearing the critics who raise questions about methodologies. They would start to demand that smart ideas should drive the process and that computer systems should accommodate good ideas rather than being a reason for implementing bad ones.<br />
<br />
Prediction 4. In some cases it would motivate individuals to start really thinking "will this promote real learning for students" each time they make a decision. And they'd look carefully at all that assessment data they've had the faculty produce and mutter, "damned if I know."<br />
<br />
Prediction 5. Someone will argue that the question is moot because administrators are already held responsible for institutional learning outcomes. Someone else will say "<i>Plus ça change, plus c'est la même chose</i>."</div>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com4tag:blogger.com,1999:blog-5384317558423226743.post-90936672896502339552012-04-07T13:41:00.000-07:002012-04-07T13:41:32.765-07:00Better Teaching Through a Financial Stake in the OutcomeIn an <i>Inside Higher Ed</i> article this week ("<a href="http://www.insidehighered.com/news/2012/04/05/faculty-bonus-pay-linked-student-success-city-colleges-chicago">Performance Pay for College Faculty</a>") K Basu and P Fain describe how the new contract signed between City Colleges of Chicago and a union representing 459 adult education instructors links pay raises to student outcomes.<br />
<br />
Administrators lauded the move in part because it gets faculty "to take a financial stake in student success." The details of the plan are not clear from the article, but the basic framework is to use student testing to determine annual bonus pay for groups of instructors working in various areas. That is, in this particular plan it does not sound like the incentive pay is at the level of individual instructors.<br />
<br />
Still, should the rest of higher education be paying attention? Adult education at CCC is, after all, a markedly different beast than full-time liberal arts institutions or 4-year state schools or research universities. One reason we should is that it's precisely the tendency to elide institutional differences that is one of the hallmarks of the style of thought endemic among some higher education "reformers." Those who think it's a good idea for adult education institutions are likely to champion it elsewhere.<br />
<br />
But most germane for the subject of this blog is the question of what data would inform such pay-for-performance decisions when they are proposed for other parts of American higher education. Likely it will be something that grows out of what we now know as learning assessment. I ask the reader: given what you have seen of assessment of learning outcomes in your college, how do you feel about having decisions about your paycheck based upon it?<br />
<br />
But, your opinion aside, there are several fundamental questions here. One is whether you become a more effective teacher by having a financial stake in the outcome. The industry where this incentive logic has been most extensively deployed is probably the financial services industry, especially investment banking. How has that worked for society? It would be easy to cook up scary stories of how this could distort the education process, but that's not even necessary to debunk the idea. The amounts at play in the teacher pay realm are so small that one can barely imagine even a nudge effect on how people approach their work.<br />
<br />
But what about the data? Consider the prospect of assessment as we know it as input to ANY decision process, let alone personnel decisions. Anyone who has spent any time at all looking at how assessment is implemented knows that the error bars on any datum emerging from it dwarf the underlying measurement. The conceptual framework is thrown together on the basis of a dubious theoretical model of teaching and learning and a forced collaboration between instructors and assessment professionals. The process sacrifices methodological rigor in the name of pragmatism, a culture of presentation (vis-a-vis accreditation agencies), and the tail of design limitations of software systems that wags the dog of pedagogy and common sense. At every step of the process information is lost and distorted. But it seems that the more Byzantine that process is, the more its champions think they have scientific fact as a product.<br />
<br />
It could well be that the arrangement agreed to in Chicago will lead to instructors talking to one another about teaching, coordinating their classroom practices, and all sorts of other things that might improve the achievements of their students. But it will likely be a rather indirect effect via the social organization of teachers (if I understood the article, the good thing about the Chicago plan is that it rewards entire categories of instructors for the aggregate improvement). To sell it at the level of individual incentive is silly and misleading. And, if we think more broadly about higher education, the notion that you can take the kinds of discrimination you get from extremely fuzzy data and multiply it by tiny amounts of money to produce positive change at the level of the individual instructor is probably best called bad management 101.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-87633490710328989022012-04-07T13:13:00.003-07:002012-04-07T13:24:37.276-07:00College Presidents and Bonuses<div style="text-align: center;"><span style="color: orange; font-family: 'Trebuchet MS', sans-serif; font-size: x-small;"><b>A never posted draft dusted off and posted now.</b></span></div><div style="text-align: center;"><span style="color: #274e13; font-family: 'Trebuchet MS', sans-serif; font-size: x-small;"><b><br />
</b></span></div>While doing some research on a completely unrelated topic I came across a few articles on the question of whether college presidents should get performance bonuses. One is from <a href="http://www.insidehighered.com/news/2007/03/19/usnews">Inside Higher Ed</a> and asks whether there should be bonuses for improved US News ratings (basically no, but with some dissenting voices) and <a href="http://www.agb.org/trusteeship/2008/julyaugust/should-presidents-receive-performance-bonuses">another in Trusteeship</a>, published by the Association for Governing Boards (also, mostly no). <br />
<br />
This is actually relevant to assessment since in the most general terms, assessment is about institutional accomplishment of stated mission goals.<br />
<br />
So, let's look at how obviously corrupt it is to give college presidents performance bonuses.<br />
<br />
Why does one give performance bonuses? To motivate behavior that rewards the institution. But that is just the president's job. To counteract self-interest? The most basic values governing a position like college president already rule this out. Anyone who needs their self-interest balanced is corrupt to start with (in fact, the opposite is the problem for governing boards: they must be vigilant in ensuring that administrators do not use their position to feather their own career nest at the expense of furthering the institution's mission).<br />
<br />
Or maybe a governing board would want to show its appreciation for a job well done. But who is actually doing the job? Any president worth her/his stuff knows the success of an institution of higher education depends on hundreds of individuals toiling away in vocational dedication to a task. A president who accepted a financial bonus based on increased enrollments, higher selectivity, student learning outcomes, or student success would be cynical in the extreme.<br />
<br />
This model, ported uncritically and uncreatively from the private sector, assumes far less ambiguous goals and measurements of outcomes than exist in higher education. And it assumes far more command, control, and executive power than exists in the private sector. And most of all, it probably inflates the influence of presidents on important outcomes. <br />
<br />
Supporters likely only want to go half-way. It is unlikely that they would be willing to propose cuts when performance lags or when bad decisions reduce the performance of subordinates.<br />
<br />
There could be a satisfying, if tragic, irony when such bonuses are offered: if it becomes widely known, the resulting demoralization of faculty and staff might undo the results that won the bonus. But alas, the lag time in such things (and the fact that boards that give performance bonuses rarely administer failure penalties) would probably mean the chief executive could take the money and run (which would, perhaps, fulfill the goal of making higher education more like business).<br />
<br />
The only ethical and rational thing for a board motivated to give performance bonuses to actually do is to reward the entire institution (and probably in a creatively progressive manner, not with an across the board percentage of salary).<br />
<br />
<b>See also</b><br />
<br />
<ul><li>Dane, Roger and Allen E. Koenig. 1981. "<a href="http://www.eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ254015&ERICExtSearch_SearchType_0=no&accno=EJ254015">A Good President Rates a Cash Bonus</a>." <i>AGB Reports</i>, 23(5): 30-32, September-October 1981.</li>
<li>Hall, Holly. 2010. "<a href="http://philanthropy.com/blogs/prospecting/college-president-receives-controversial-fund-raising-bonus/20704">College President Receives Controversial Fund-Raising Bonus</a>." <i>Chronicle of Philanthropy</i>, January 21, 2010.</li>
<li>López-Rivera, Marisa. 2007. "<a href="http://chronicle.com/weekly/v54/i12/12b00801.htm">Presidential Bonuses, Often Secret, Are Wide Open at Some Public Universities</a>." <i>Chronicle of Higher Education</i>, November 16, 2007.</li>
<li>NACUA/ACE/NACUBO. 2009. "<a href="http://www.nacua.org/documents/FederalTaxGuideForCollegeUniversityPresidents.pdf">A Federal Income Tax Guide for College & University Presidents</a>."</li>
<li>Stripling, Jack and Andrea Fuller. 2011. "<a href="http://chronicle.com/article/Presidents-Defend-Their-Pay-as/126971/">Presidents Defend Their Pay as Public Colleges Slash Budgets</a>." <i>Chronicle of Higher Education</i>, April 3, 2011.</li>
</ul>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-4776336514291501062012-02-05T16:29:00.000-08:002012-02-05T16:29:15.199-08:00White House College Scorecard SuggestionsThe White House asked for some feedback on their proposed "scorecard" for higher education cost and value which is intended "to make it easier for students and their families to identify and choose high-quality, affordable colleges that provide good value." Below are their questions and my (quick, off the top of my head, answering-an-online-survey level of analysis) responses.<br />
<br />
<br />
<b><span style="color: lime;">What information is absolutely critical in helping students and their families choose a college:</span></b><br />
<br />
You shouldn't be asking this question here. It's a researchable, empirical question. First, on what basis DO people decide? Then, to what degree do they have the appropriate information to do so?<br />
<br />
As someone who studies things like this, I don't think the info as it is presented here will provide much added value or better decisions. In terms of presenting information, probably better to summarize in simpler terms: "On metric one, college X is above/at/below average for its sector." But then don't just stop there -- we also need a global comparison because people don't get how the sectors vary.<br />
<br />
Note that costs are in fact a distribution, and presenting the average after grants still leaves a family very much in the dark if they have no way to know where they're likely to fall on that distribution. <br />
<br />
Graduation rates do not suffer from this problem. <br />
<br />
Percent of loan repayment is too crude to be useful here. It's useful for a banker who may want to finance loans for a student at a given school, but it's very unclear how this number helps a student or family shopping for a college.<br />
<br />
Average loan amount is useful. <br />
<br />
As important as earning potential is, it's a really stupid number here. Just do a tiny bit of due diligence and you'll see screamingly wide variations across majors, careers, and even within majors. Lawyers, for example, have a certain average starting salary, to be sure, but a really big range of variation. Frankly I think putting a single number or even a distribution of incomes next to the name of a school would be nothing but phony quantification. Either that or have a really big footnote explaining the statistical significance of differences in means.<br />
<br />
<b><span style="color: lime;">What other information would be helpful:</span></b><br />
<br />
<br />
Rather than average loan amount and discount rate, what would be useful would be ratios. Tell me (1) the list-price cost of attendance is X; (2) the distribution of discounts is ...; and (3) the range of debt at graduation is ...<br />
<br />
Interesting that you don't really have any room for general comments on doing this at all. You are going to end up diverting an incredible amount of resources toward a project that will in all likelihood produce at best some only moderately useful numbers with huge error bars on them. You will feed into the illusion that choice produces improvement (can you cite any actual evidence?). And you will do absolutely nothing that actually lowers or controls costs, increases graduation rates or lowers indebtedness. In short, not a drop of innovation here. Lots of window dressing, but very little that deserves the name policy. <br />
<br />
I'm left wondering why this administration is so confident that "better than the alternative" will continue to be a reason people like me support you.<br />
<br />
<b><span style="color: lime;">Does the scorecard cause you to think about things you might not have otherwise considered when choosing a college:</span></b><br />
<br />
Not in the slightest. It makes me think that whoever made it up has never actually been through the process. It reads more like it is informed by a need to respond to conservative activists who are trying to make hay about higher education. As an Obama supporter and contributor I have to admit it's really a little bit embarrassing to read this as part of the administration's policy proposal. If you can't do better than this, I wonder how bad it would really be to have a Republican in the WH as well as in control of Congress. <br />
<br />
<br />
<b><span style="color: lime;">How should this version be modified for 2-year colleges:</span></b><br />
<br />
Look, it's pretty obvious that there are two issues with two-year colleges: (1) to what degree do they lead to successful and timely completion of a four-year degree, OR (2) to what degree do they yield serious, usable job training.<br />
<br />
So, a start would be to provide the rate at which students who seek admission to a four-year school actually graduate from one. But it's really easy to get garbage data on this if you don't set up the categories and the tracking really smartly. <br />
<br />
On the job side, again, there are going to be really serious data quality problems that will likely as not make the information worthless (mostly because you are going to see massive variation from program to program WITHIN schools). That said, let's start with a simple "how many people are working in a full-time non-temporary job in or related to the field of their AA degree within X years?"<br />
<br />
<b><span style="color: lime;">How should comparison groups for colleges be made? What are important things to consider in grouping institutions together that serve similar students:</span></b><br />
<br />
Catch-22 here. You are asking people to choose -- if you separate it out too well, the really important thing gets lost: we want people to better understand what the different "rungs" represent. One of the big crimes in higher education is that crappy institutions with minimal value added get to promise people a college degree. And if you only compare within groups each one gets to, in a sense, set the standards. What you need is a tool that more clearly lets people see the payoff differences between the tiers (to the degree there are some).<br />
<br />
The most important thing that you'll probably leave out is the effect of what students bring to college on college outcomes. There is huge naivete in the college assessment world that the college's output has only to do with what the college did. Gigantic effects of origins are still at work in higher education. Just be sure your new tool doesn't simply do more to perpetuate the myth.<br />
<br />
<b><span style="color: lime;">What search and comparison features would you like the online tool to have:</span></b><br />
<br />
Something that shows schools in context and behind that groups in context (where does this school sit within its group and where does its group sit in the larger picture).<br />
<br />
<b><span style="color: lime;">What should we call this tool? Would a different name better explain the service being provided:</span></b><br />
<br />
One name would be "Republican Higher Education Policy as Adopted by Obama Administration."Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com1tag:blogger.com,1999:blog-5384317558423226743.post-78436446590472803752011-09-25T12:20:00.000-07:002011-09-27T18:31:39.626-07:00Rubrics, Disenchantment, and Analysis IThere is a tendency, in certain precincts in, and around, higher education, to fethishize <a href="http://en.wikipedia.org/wiki/Rubric_(academic)">rubrics</a>. One gets the impression at conferences and from consultants that arranging something in rows and columns with a few numbers around the edges will call forth the spirit of rational measurement, science even, to descend upon the task at hand. That said, one <i>can </i>acknowledge the heuristic value of rubrics without succumbing to a belief in their magic. Indeed, the critical examination of almost any of the higher education rubrics in current circulation will quickly disenchant, but one need not abandon all hope: if assessment is "here to stay," as some say, it need not be the intellectual train wreck its regional and national champions sometimes seem inclined to produce.<br />
<br />
Consider this single item from a rubric used to assess a general education goal in gender:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSoYEpIYce8AWVvkiP9WMXBXQbV8YocBH2ZztXyoARzZKgxRUJrdIijCTMTSh-gc6KtdZtDQ2jRwEMdJ1P5acvrHGyGlXbapIKYqN90C9csVRi3ZGxzjWVeSo8_nxwb3VB5960meCTk8M/s1600/gender-rubrik-original.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSoYEpIYce8AWVvkiP9WMXBXQbV8YocBH2ZztXyoARzZKgxRUJrdIijCTMTSh-gc6KtdZtDQ2jRwEMdJ1P5acvrHGyGlXbapIKYqN90C9csVRi3ZGxzjWVeSo8_nxwb3VB5960meCTk8M/s640/gender-rubrik-original.gif" width="640" /></a></div><br />
As is typical of rubric cell content, each of these is "multi-barrelled" -- that is, the description in each cell is asking more than one question at a time. It's not unlike a survey in which respondents are asked, "Are you conservative and in favor of ending welfare?" It's a methodological no-no, and, in general, it defeats the very idea of dis-aggregation (i.e., "what makes up an A?") that a rubric is meant to provide.<br />
<br />
In addition, rubrics, when they are presented like this, are notoriously hard to read. That's not just an aesthetic issue -- failure to communicate effectively leads to misuse of the rubric (measurement error) and reduces the likelihood of effective constructive critique. <br />
<br />
Here is the same information presented in a manner that's more methodologically sound and more intellectually legible:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhojKYAEOuzo9kU0yMjfTH6WhP4LOdT-UnBfRzAzw_cS1ICbm0ZHZr_F-yxpFTNVX3aDjQq7oA6VckmRSz8ln9wnAjnMo5ZgBRTaP6Y9sxXbr0b4kgdS1m5E7rRhdgD6ydvMbhepf02ELI/s1600/gender-rubrik-redrawn.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhojKYAEOuzo9kU0yMjfTH6WhP4LOdT-UnBfRzAzw_cS1ICbm0ZHZr_F-yxpFTNVX3aDjQq7oA6VckmRSz8ln9wnAjnMo5ZgBRTaP6Y9sxXbr0b4kgdS1m5E7rRhdgD6ydvMbhepf02ELI/s640/gender-rubrik-redrawn.gif" width="640" /></a></div><br />
At the risk of getting ahead of ourselves, there IS a serious problem when these rank ordered categories are used as scores that can be added up and averaged, but we'll save that for another discussion. Too, there is the issue of <a href="http://en.wikipedia.org/wiki/Operationalization">operationalization</a> -- what does "deep" mean, after all, and how do you distinguish it from not so deep? But this too is for another day.<br />
<br />
Let's, for the sake of argument, assume that each of these judgments can be made reliably by competent judges. All told, 4 separate judgments are to be made and each has 3 values. If these knowledges and skills are, in fact, independent (if not, a whole different <a href="http://www.wisegeek.com/what-does-it-mean-to-open-a-can-of-worms.htm">can of worms</a>), then there are 3 x 3 x 3 x 3 = 81 combinations of ratings possible. Each of these 81 possible assessments is eventually mapped onto 1 of 4 ratings. Four combinations are specified, but the other 77 possibilities are not:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDrkf9cbrls3imkxVXDmHEcT9lLZXE5jxT7M1Kc5WINFj3ueQ1p2oPHu2KHXF9aM_AIEW7jI3d4HB47rj93E_V3LZ39iYrzMS8om3FvYHtQ3so5LJUj_zi2wLApMiZRUa9dwQYOmwOT48/s1600/gender-rubrik-combinations.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="208" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDrkf9cbrls3imkxVXDmHEcT9lLZXE5jxT7M1Kc5WINFj3ueQ1p2oPHu2KHXF9aM_AIEW7jI3d4HB47rj93E_V3LZ39iYrzMS8om3FvYHtQ3so5LJUj_zi2wLApMiZRUa9dwQYOmwOT48/s640/gender-rubrik-combinations.gif" width="640" /></a></div><br />
Now let us make a (probably invalid) assumption: that each of THESE scores is worth 1, 2, or 3 "points" and then let's calculate the distance between each of the four scores. We use the standard Euclidean distance, d(x, y) = sqrt((x<sub>1</sub>-y<sub>1</sub>)<sup>2</sup> + (x<sub>2</sub>-y<sub>2</sub>)<sup>2</sup> + (x<sub>3</sub>-y<sub>3</sub>)<sup>2</sup> + (x<sub>4</sub>-y<sub>4</sub>)<sup>2</sup>), with the categories being: Mastery = (3, 3, 3, 3), Practiced = (2, 2, 2, 3), Introduced = (2, 2, 2, 2), Benchmark = (1, 1, 1, 1). <br />
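<br />
A minimal Python sketch of that computation, for anyone who wants to check the arithmetic. The category vectors and the equal-interval "points" are exactly the assumptions under scrutiny here, not features of any actual assessment system:<br />
<pre>
from itertools import combinations, product
from math import sqrt

# The four named rubric categories, treated as points in 4-dimensional space.
categories = {
    "Mastery":    (3, 3, 3, 3),
    "Practiced":  (2, 2, 2, 3),
    "Introduced": (2, 2, 2, 2),
    "Benchmark":  (1, 1, 1, 1),
}

def distance(a, b):
    # standard Euclidean distance between two score vectors
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three possible values on each of four judgments: 3**4 = 81 combinations,
# of which only the four above are named by the rubric.
print(len(list(product((1, 2, 3), repeat=4))))   # 81

for a, b in combinations(categories, 2):
    print(a, "to", b, round(distance(categories[a], categories[b]), 2))
# Mastery-Practiced 1.73, Mastery-Introduced 2.0, Mastery-Benchmark 4.0,
# Practiced-Introduced 1.0, Practiced-Benchmark 2.65, Introduced-Benchmark 2.0
</pre>
<br />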
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI9uW5noYbFn7XtIFX_JLf_sKg93dvfkIs1e4syecm25t7vojBE4Pcm4USFRHOKswzORPeaHSO3ZIpUpnaghsaxIcXYdgg7737NVjhuhEIckGjgwEEovehdETs9h1kvaPMUmCue3PJ4pU/s1600/distance-matrix.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="116" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI9uW5noYbFn7XtIFX_JLf_sKg93dvfkIs1e4syecm25t7vojBE4Pcm4USFRHOKswzORPeaHSO3ZIpUpnaghsaxIcXYdgg7737NVjhuhEIckGjgwEEovehdETs9h1kvaPMUmCue3PJ4pU/s320/distance-matrix.gif" width="320" /></a></div><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOkKq8hErsyDqfRoxr_tXV2oOFhzQMZNzXUEYeV0zD_wZO8muxlOfK8xKD_0VRrH4Y30NcO2bTe9PrE2XNhHgiWV_U_wE0p2NfAnvSJcVW8MsAb6kzheaDuJCc-2EHu9npBTGrTQeOZ2Q/s1600/ratings-diagram.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOkKq8hErsyDqfRoxr_tXV2oOFhzQMZNzXUEYeV0zD_wZO8muxlOfK8xKD_0VRrH4Y30NcO2bTe9PrE2XNhHgiWV_U_wE0p2NfAnvSJcVW8MsAb6kzheaDuJCc-2EHu9npBTGrTQeOZ2Q/s320/ratings-diagram.gif" width="204" /></a></div><br />
So, how do these categories spread out along the dimension we are measuring here? Mastery, Introduced, and Benchmark are nicely spaced, 2 units apart (and M to B at 4 units). But then we try to fit P in. It's 1.7 units from Mastery and about 2.6 from Benchmark, but it's also 1 unit from Introduced. To represent these distances we have to locate it off to the side. <br />
<br />
This little exercise suggests that this line of the rubric is measuring two dimensions. <br />
<br />
This should provoke us into thinking about what dimensions of learning are being mixed together in this measurement operation. <br />
<br />
It is conventional in this sort of exercise to try to characterize the dimensions in which the items are spread out. Looking back at how we defined the categories we speculate that one dimension might have to do with skill (analysis) and the other knowledge. But Mastery and Practiced were on the same level on analysis. What do we do?<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZLIUi9mf5N60F7DjI_BpiNp91c8L_Hxc3rhQkYzdSJ66ddPGqTaXmM-E1bzBF-RD2Tzw_lwWQ9dS3U1qIJabH6VgGSm4RajRSuiOHwCSt-PaQJhEeQqu52xslqAOFeLac61V1wqnDupg/s1600/ratings-rotated.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="236" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZLIUi9mf5N60F7DjI_BpiNp91c8L_Hxc3rhQkYzdSJ66ddPGqTaXmM-E1bzBF-RD2Tzw_lwWQ9dS3U1qIJabH6VgGSm4RajRSuiOHwCSt-PaQJhEeQqu52xslqAOFeLac61V1wqnDupg/s320/ratings-rotated.gif" width="320" /></a></div><br />
<br />
It turns out that the orientation of a diagram like this is arbitrary -- all it is showing us is relative distance. And so we can rotate it like this to show how our assessment categories for this goal relate to one another.<br />
<br />
Now you may ask what was the point of this exercise? First, if the point of assessment is to get teachers to think about teaching and learning, and to do so in a manner that applies the same sort of critical thinking skills that we think are important for students to acquire, then a careful critique of our assessment methods is absolutely necessary. <br />
<br />
Second, this little bit of quick and dirty analysis of a single rubric might actually help people design better rubrics AND to assess the quality of existing rubrics (there's lots more to worry about on these issues, but that's for another time). Maybe, for example, we might conceptualize "introduce" to include knowledge but not skill or vice versa? Maybe we'd think about whether the skill (analysis) is something that should cross GE categories and be expressed in common language. And so on.<br />
<br />
Third, this is a first step toward showing why it makes very little sense to take the scores produced by using rubrics like this and then adding them up and averaging them out in order to assess learning. That will be the focus of a subsequent post.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-30930063673218521642011-08-14T13:21:00.000-07:002011-08-14T13:21:52.720-07:00What Will "Assessment 2.0" Look Like? A ProposalThe most serious flaw in assessment as now practiced is the premise that it is something that teachers are not interested in, do not want to do, have not been doing, etc. A word that comes up a lot in connection with assessment is "accountability," but most folks who use the word don't take the time to be explicit about just who is supposed to be accountable to whom for what. When someone does get beyond just parroting the word, the most common interpretation seems to be "we need to hold teachers accountable."<br />
<br />
We have some news for those who have discovered assessment. Teachers -- lecturers, instructors, professors -- have long been interested in what works and what doesn't in the classroom. Those who would appoint themselves guardians of learning have a nasty habit of trotting out stereotypes of the worst professor ever and, in a classic example of question begging, concluding that such figures dominate the academy and represent a threat to the future of higher education.<br />
<br />
But rather than argue about that, here's a proposal for what the next stage in assessment might look like.<br />
<br />
Given that most professors and most departments are actually interested in student learning and in how to maximize it -- this is, after all, the vocation these folks have chosen -- the resources that have been pumped into assessment projects should be put at the service of the faculty. Throughout Assessment 1.0 the dominant pattern is for an office of assessment to be in the driver's seat, more or less dictating to faculty (generally relaying what had been dictated to them by accreditation agencies) how and when assessment would happen. Many faculty found the methods wanting and the tasks tedious and pointless, but most went along -- at some institutions more willingly and at some less. The interaction between faculty and assessment offices generally came down to the latter making work for the former without the former seeing much in the way of benefits.<br />
<br />
That's unfortunate because there are lots of potential benefits for us as instructors. But to realize them, we need to turn the tables. The basic premise of Assessment 2.0 should be (1) that it be faculty driven and (2) that assessment offices work for the faculty, rather than the other way round. Assessment offices should think of themselves as a support service for the academic program rather than a support service for a regulatory body that oversees the academic program from the outside. The main job of assessment offices should be to make a part of the job that faculty do, as professionals practicing their craft, easier. A part of what professionals do is self monitor and mutually monitor outcomes. As faculty, we need to think about what information will help us to make micro-, meso-, and macro-adjustments in our practice that will improve the outcomes we are collectively trying to achieve.<br />
<br />
And the services of our assessment offices should be available to us to obtain it. We need to put the focus back on this side of the operation and shift away from the idea that the primary motivation behind assessment is to prove something to outsiders. Even the rhetoric from the accreditation agencies, if you slow the tape down and listen, resonates with this: they demand evidence that assessment is happening, that program adjustments happen in response to it, and so on. Where they are wrong is in their ignorant insistence that such things were not already happening. <br />
<br />
The assessment industry did not invent assessment -- they simply codified it and figured out how to make a living off of doing it instead of being involved directly in educating.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-54368330009538067222011-08-11T11:09:00.000-07:002011-08-11T11:09:30.860-07:00Too Bad Higher Education "Experts" and Vendors aren't Graded<br />
<div class="MsoPlainText">I was inspired by a TeachSoc post from Kathe Lowney today to have a look at two articles in the <a href="http://chronicle.com/">Chronicle of Higher Education</a> on computer essay grading.</div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">The articles are "<a href="http://chronicle.com/article/To-Justify-Every-A-Some/128528">Professors Cede Grading Power to Outsiders—Even Computers</a>" and <a href="http://chronicle.com/article/Can-Software-Make-the-Grade-/128505/">"Can Software Make the Grade?"</a><br />
<br />
<b>My Review: </b>A typical Chronicle hack job to my mind. Articles like this remind me of the National Enquirer. The author makes little attempt to critically assess comments from his sources and gives little weight to contrary information (failing to infer, for example, anything from the reported fact that in six years of marketing, almost no one has bought into the computer grading product mentioned). He jumps on the grade inflation bandwagon instead of offering an analytic take on it. In typical COHE fashion he sets up false dichotomies and debates between advocates and defenders as if there is a big divide down the middle of higher education. In effect, articles like this are just product placement -- hopefully without kickbacks -- and "if someone says it then it's a usable quote" journalism. As with many COHE articles, it reflects journalism that's more in touch with the higher education industry than with higher education. It's mediocre work such as this that makes me let my subscription lapse every year or so. It's interesting how COHE seems to have no qualms at all about trashing educators and educational institutions but only ever so rarely do they seem to take an even gentle critical look at education vendors.</div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">On the accompanying "compare yourself to the computer" article : I think I'd fire a TA who graded like that -- the words "capitalism" and "rationality" showing up constitute "concepts related to him" and an answer on Marx where "expelled for advocating revolution" = "significance for social science"? I scored them 4 and 2 and that was generous. I'd be mighty disappointed if I were the makers of that software and this is how my product placement in COHE turned out -- would anyone buy it based on this portrayal?! <o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">See also <a href="http://gradeinflation.com/tcr2010grading.pdf">http://gradeinflation.com/tcr2010grading.pdf</a></div><div class="MsoPlainText"><o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText"><br />
</div><br />
<div class="MsoPlainText"><br />
</div>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-66784048857130793532011-03-22T16:58:00.000-07:002011-03-22T16:59:13.986-07:00The Rubrikization of Higher EducationThe rubricization of education has always rubbed me the wrong way but I’ve never been able to put my finger on concrete flaws beyond the obvious. This past January I attended the AAC&U conference in San Francisco. A few more problems became clear.<br />
<br />
There are three obvious methodological/measurement problems that have long stood out:<br />
<br />
<span style="font-size: large;"><b>1. </b></span>Almost every rubric I have ever seen has exhibited gads of multi-dimensionality in the different skills/items/categories/rows. Another way to say this is that the rows typically posed double or multi-barreled questions to the evaluator. Or, even if the construct named in the row was simple, the description of the different scale levels would be multi-dimensional. Example:<br />
<br />
<table border="1"><tbody>
<tr> <td>Category</td> <td>Advanced (4)</td> <td>Competent (3)</td> <td>Developing (2)</td> <td>Underdeveloped (1)</td> </tr>
<tr> <td>Structure</td> <td>Sections fit together in logical sequence; claims, evidence, analysis, conclusions distinguished; logic of argument telescoped and reviewed</td> <td></td> <td></td> <td></td> </tr>
</tbody></table><br />
One argument that this is not a problem is that all the things listed here typically go together and that they are all indicators of the same underlying skill. Maybe. But it seems to be a stretch that all these skills nicely fall into a simple four level linear scale.<br />
<br />
<b><span style="font-size: large;">2. </span></b>The second problem here is just that four point scale. What evidence is there to support the idea that “Advanced” level structure is two times as much structure (or as much skill) as “Developing”? This does not matter much when we are simply looking at these four levels, but the first thing that that folks with just a little quantitative skill do is come up with average ratings for a group of students on a skill rating like this.<br />
<br />
Let us be clear: computing the average of a scale that has not been shown to have the arithmetic properties of what we call an interval scale PRODUCES MEANINGLESS RESULTS.<br />
<br />
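To make the point concrete, here is a small, entirely made-up illustration in Python: the same ordinal ratings for two hypothetical sections of students, averaged under two different but equally defensible numeric codings of the labels. If the labels really carried interval meaning, the ranking of the two sections could not depend on the coding chosen. It does.<br />
<pre>
# Hypothetical ratings for two sections of students on one rubric row.
labels_1 = ["Advanced", "Advanced", "Underdeveloped",
            "Underdeveloped", "Underdeveloped", "Underdeveloped"]
labels_2 = ["Competent", "Competent", "Competent",
            "Developing", "Developing", "Developing"]

# Two monotone codings of the same four ordered labels.
coding_a = {"Underdeveloped": 1, "Developing": 2, "Competent": 3, "Advanced": 4}
coding_b = {"Underdeveloped": 1, "Developing": 2, "Competent": 3, "Advanced": 10}

def average(labels, coding):
    return sum(coding[x] for x in labels) / len(labels)

for name, coding in (("equal spacing", coding_a), ("stretched top", coding_b)):
    print(name,
          "section 1:", round(average(labels_1, coding), 2),
          "section 2:", round(average(labels_2, coding), 2))
# equal spacing -> section 1: 2.0, section 2: 2.5  (section 2 looks better)
# stretched top -> section 1: 4.0, section 2: 2.5  (section 1 looks better)
</pre>
Nothing in the rubric tells us which coding is the right one, so nothing about the comparison of averages is interpretable.<br />
<br />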
<span style="font-size: large;"><b>3. </b></span>The third problem with rubriks like this is that the items (rows) are not necessarily exhaustive or mutually exclusive. In other words, they do not always include all the components of learning that might be (or should be) happening and the individual items often tap into the same underlying skill. The former is a substantive problem to be solved by better conversations about the goals of education. The latter, though, lead to bad data. Suppose three items X, Y, and Z are listed in a rubric and that the elaborate operationalizations of the different levels of these involve underlying skills a, b, c, d, and e.<br />
<br />
<table border="1"><tbody>
<tr> <td>Category</td> <td>Advanced (4)</td> <td>Competent (3)</td> <td>Developing (2)</td> <td>Underdeveloped (1)</td> </tr>
<tr> <td>X</td> <td>Blah blah blah {a} blah blah blah {c}</td> <td></td> <td></td> <td></td> </tr>
<tr> <td>Y</td> <td>Blah blah blah {b} blah blah blah {c}</td> <td></td> <td></td> <td></td> </tr>
<tr> <td>Z</td> <td>Blah blah blah {d} blah blah blah {a} blah blah blah {e} blah blah blah {c}</td> <td></td> <td></td> <td></td> </tr>
</tbody></table><br />
Where we’ve put in curly brackets the underlying skill that the description “blah blah blah” refers to. In this rubrik, skill a gets counted twice and skill c three times, while b, d, and e each appear only once. When data is aggregated, success on a, b, or c will easily mask lack of progress on d or e.<br />
<br />
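A toy calculation, with made-up numbers, shows how the masking works. Suppose a student is strong on a, b, and c (a 4 on the scale used above) but has made no progress at all on d and e (a 1):<br />
<pre>
# Hypothetical underlying skill levels: strong on a, b, c; no progress on d, e.
skills = {"a": 4, "b": 4, "c": 4, "d": 1, "e": 1}

# Which underlying skills each rubric row actually taps (per the table above).
rows = {
    "X": ["a", "c"],
    "Y": ["b", "c"],
    "Z": ["d", "a", "e", "c"],
}

def mean(values):
    return sum(values) / len(values)

row_scores = {name: mean([skills[s] for s in taps]) for name, taps in rows.items()}
print(row_scores)                                  # X: 4.0, Y: 4.0, Z: 2.5
print(round(mean(list(row_scores.values())), 2))   # rubric aggregate: 3.5
print(round(mean(list(skills.values())), 2))       # each skill counted once: 2.8
</pre>
The aggregated rubric score looks like a solid 3.5 out of 4 even though the student has made no progress on two of the five underlying skills; counting each skill once drops the picture to 2.8.<br />
<br />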
<span style="font-size: large;"><b>4. </b></span>But here is the most serious problem of rubricization. It completely drives out of the teaching and learning process any response to individual variations in understanding. The role of the teacher as offering constructive criticism about the wide range of variability in learning is driven out in favor of a set of categories.<br />
<br />
One great irony in this is that so many of the champions of this approach to educational reform are the very folks who preach about variability of learning styles.<br />
<br />
Another is the high level of concern about students who “fall between the cracks.” Here we are developing a system with explicitly designed cracks between which they can fall.<br />
<br />
Yet another is that a mantra of the rubrik crowd is “evidence based” and “data driven” decisions. And yet the very devices that lie at the heart of the enterprise are custom-built to degrade information and result in misleading data.<br />
<br />
The fundamental absence of critical thinking in the rubrik/assessment literature – and total lack of interest in critical discourse about these techniques – is the final irony.<br />
<br />
One can conclude that what we have here is a bunch of middle-brow thinkers designing a system that will maximize the production of people like themselves and guarantee their own employment in higher education industry. If only there were some evidence that this is what the world will need in the 21st century.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-17872469450644320292010-12-04T15:16:00.000-08:002010-12-04T15:30:35.258-08:00Coming Soon to a Classroom Near You?Some rambling thoughts on a fascinating set of articles about measuring teaching.<br />
<br />
Today's NYT carried two stories -- one on page 1 -- about new techniques being used to evaluate K-12 teachers. The news in the stories concerns two things: the existence of a very large program for measuring educational effectiveness in schools and the central role of video-taping teachers teaching in that program.<br />
<br />
Local readers' radar might ponder the resonance between programs like this and higher education assessment and higher education "learning and teaching centers" and the individuals and organizations who live off, rather than for, education.<br />
<br />
The first story ("<a href="http://www.nytimes.com/2010/12/04/education/04teacher.html">Teacher Ratings Get New Look, Pushed by a Rich Watcher</a>") highlights Bill Gates' (via the Gates Foundation) interest in a gigantic project measuring the "value added" by teachers through multi-mode assessment. Among other tools : videos of instruction that are scored by experts.<br />
<br />
"Interesting" is the fact that one of the movers and shakers in the project is none other than <a href="http://www.ets.org/k12/assessment_development">Educational Testing Services</a>. And so this represents yet another opportunity for that organization to live off, rather than for, education in the U.S. Other contractors are mentioned in the story too -- as has been true of the assessment movement more generally, a big part of the driving force seems to be entrepreneurs who, after persuading you that you need to do something are more than happy to sell you the equipment needed to collect the data and then expertise to evaluate it.<br />
<br />
The second article, "<a href="http://www.nytimes.com/2010/12/04/education/04teacherside.html">Video Eye Aimed at Teachers in 7 School Systems</a>," describes some 3,000 teachers who are a part of the first phase of this search for new methods to evaluate teachers. Each will have several hours of teaching video-taped and the tapes will be assessed by experts using a number of carefully validated protocols. <br />
<br />
The first article, describing the scope of the project, notes that the rating of 24,000 video-taped lessons will come to something like 64,000 hours of video watching. On a full-time basis that represents 32 person years of work. At 180 days/year, that's about 44 years of teaching. The article suggests the costs to a school district will be about $1.5 million up front and then $800,000 per year.<br />
<br />
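The back-of-the-envelope arithmetic, with my own (assumed) conversion factors made explicit, runs roughly like this:<br />
<pre>
# Rough check of the figures reported in the article. The 2,000-hour work
# year and the 8-hour teaching day are my assumptions, not the article's.
hours_of_video = 64000

hours_per_work_year = 40 * 50        # assumed full-time year of video watching
hours_per_school_year = 180 * 8      # assumed 180 teaching days of 8 hours each

print(hours_of_video / hours_per_work_year)    # 32.0 person-years of watching
print(hours_of_video / hours_per_school_year)  # about 44 school-years of teaching on tape
</pre>
<br />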
I wonder if anyone has assessed the value of the information produced. <br />
<br />
In the middle of the report there is a line about how this is a step forward because rather than having the principal observe once or twice during the year, outside experts (using scientific protocols) can observe up to a half dozen times. This suggests an interesting phenomenon: in the name of standardization and objectivity, we deskill and depersonalize (among other things).<br />
<br />
In one paper on <a href="http://en.wikipedia.org/wiki/Value-added_modeling">value added modeling</a> (VAM), by an ETS staff person (Braun 2004, 17), one finds this argument: (1) quantitative evaluation of teaching is here to stay; (2) evaluation of gains is preferable to just measuring year-end performance; (3) we have to think what would get used if not this; (4) therefore, use VAM even if it has real limitations. Another, by a Michigan State University economist, concludes (about VAM):<br />
<blockquote>We are looking at the educational system through a poor quality lens. The real world is probably more orderly than it appears from the analyses of noisy data (Reckase 2004, 7).</blockquote><br />
<b>Resources</b><br />
<br />
Amrein-Beardsley, Audrey. 2008. "<a href="http://www.aera.net/uploadedFiles/Publications/Journals/Educational_Researcher/3702/03EDR08_065-075.pdf">Methodological Concerns About the Education Value-Added Assessment System.</a>" <i>Educational Researcher, Vol. 37,</i> No. 2, pp. 65–75<br />
<br />
Braun, Henry. 2004. "<a href="http://www.cgp.upenn.edu/pdf/Braun%20-%20VA%20Modeling%20What%20Does%20Due%20Diligence%20Req.pdf">Value-Added Modeling: What Does Due Diligence Require?</a>"<br />
<br />
Rand Corporation. 2007. "<a href="http://www.rand.org/pubs/research_briefs/RB9050/index1.html">The Promise and Peril of Using Value-Added Modeling to Measure Teacher Effectiveness</a>" <br />
<br />
Reckase, Mark D. 2004. "<a href="http://www7.nationalacademies.org/bota/Measurement%20Issues%20paper%20-%20Reckase.pdf">Measurement Issues Associated with Value-added Methods</a>"<br />
<br />
Wikipedia. "<a href="http://en.wikipedia.org/wiki/Value-added_modeling">Value Added Modeling</a>"Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com1tag:blogger.com,1999:blog-5384317558423226743.post-72444132576770570562010-09-06T10:38:00.000-07:002010-09-06T10:51:15.281-07:00Closing the Loop in Practice: Does Assessment Get Assessment?At a liberal arts college with which I am familiar, the administration recently distributed "syllabus guidelines" with 34 items for inclusion on course syllabi. Faculty leaders balked and asked for clarification: which of the 34 items were mandates (and from whom on what authority) and which were someone's "good idea"? The response was that guidelines are merely guidelines and most of the content were indeed good ideas. Most were.<br />
<br />
A subsequent examination of a sample of syllabi revealed that most syllabi did not contain all 34. More specifically, there was not universal inclusion of several that, apparently, are important for accreditation purposes. <br />
<br />
The semester has begun. The syllabi are printed. The administration disseminated the guidelines -- their obligation is fulfilled. If faculty choose not to comply, that's their decision. Overall, the situation is alarming because the school could appear to be non-compliant to its accreditors. And it's the faculty's fault. And folks are wondering how to fix it.<br />
<br />
THIS <b style="color: magenta;">COULD BE TURNED INTO SOMETHING POSITIVE</b>, a shining example of assessment, closing the loop, and evidence-based change.<br />
<br />
But first, WAIT A MINUTE! Do faculty get to say "We told them what to do; if they can't comply and don't learn, it's not our fault"? Of course not. If students aren't learning, faculty are doing something wrong. Lack of learning = feedback, and feedback must lead to change.<br />
<br />
Here we have a case of an institution ignoring unambiguous feedback. The feedback is simple: the promulgation of a list of 34 things one should do on a syllabus does not produce the uniform inclusion of the small handful of actually really important things to include on a syllabus. That's it; that's what the evidence tells you. It doesn't tell you faculty are bad; it tells you that this method of changing what syllabi look like was ineffective.<br />
<br />
Never mind that any good teacher knows that you cannot motivate change with a list of 34 fixes.<br />
<br />
The correct response? <b> <span style="color: magenta;">Close the loop</span></b>: <i>listen</i>, <i>learn</i>, <i>change </i>the way syllabus guidelines are handled.<br />
<br />
The unfortunate thing here is that folks who know (faculty) brought this immediately to the attention of the folks in charge. Faculty noted that the list was too long, its provenance ambiguous, its authority unclear, its applicability variable, its tone insulting. A solution was suggested. All this was met with, basically, a brush-off -- they're just guidelines, not requirements, what's the big deal?<br />
<br />
And, it turns out, that is precisely how faculty understood them. No need for alarm. Some adjusted their syllabi to some of the suggestions in the guidelines. But apparently, the faculty didn't all implement a few of the guidelines that<b> <u>really do matter</u></b> (to someone). Arrrrrrrgh.<br />
<br />
And now for a little forward-looking fantasy of what the outcome of this situation COULD be.<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"><i>Since administrations and the assessment industry are apparently NOT really ready to adopt the underlying premise of assessment -- pay attention to feedback and change accordingly -- the faculty will. <br />
<br />
From now on, only the faculty will disseminate syllabi guidelines. They will very clearly distinguish between legally mandated content, accreditation relevant functionality, college-specific custom and standards, and good pedagogical practice in general. They will invite all parties who become aware of syllabi-related mandates (or new good ideas) to communicate them to the faculty's educational policy committee for consideration for inclusion in their next semester's guidelines.<br />
<br />
Those guidelines will explicitly articulate general goals (exactly which ones is to be determined) -- for example, that syllabi are to be <b style="color: magenta;">interesting documents</b> that are <b style="color: magenta;">useful to students</b> and that permit colleagues to get a sense of what a course is about and at what level it is being taught -- along with suggestions of particular features, boilerplate, and examples that might be useful, and fully explained required items. They will include an array of sample syllabi that demonstrate a variety of forms that meet these standards. And all suggestions will be referenced where possible, and each requirement will be documented with the authority under which it is an obligation.<br />
<br />
For assessment purposes the faculty will adapt* any externally supplied "rubrics" to their own intellectually and pedagogically defensible standards and practices and encourage our colleagues to make use of these college-specific tools in developing their syllabi.</i></div><div style="font-family: "Courier New",Courier,monospace;"><i></i></div><div style="font-family: "Courier New",Courier,monospace;"><i></i></div><div style="font-family: "Courier New",Courier,monospace;"><i></i></div><div style="font-family: "Courier New",Courier,monospace;"><i><br />
Educators really committed to the stated goals of assessment would see in this affair an opportunity for an achievement they could boast about. Those committed to one-directional, top-down, assessor-centered, non-interactive, deaf-to-feedback approaches would see in it only faculty reluctance to get with the program.</i></div><div style="font-family: "Courier New",Courier,monospace;"><i><br />
One lesson learned here is that institutional processes need adjustment. The faculty and administrative time, the emotional energy, and the added frustration and mistrust that this little thing has engendered were a phenomenal waste of precious institutional resources. Alas, accountability for THIS is unlikely ever to be reckoned.</i></div><div style="font-family: "Courier New",Courier,monospace;"><i></i><br />
<a name='more'></a><div style="font-family: Times,"Times New Roman",serif;"><i> * For the assessment sticklers who think twiddling with a rubric undermines its comparability with external standards: worry not! The validity of these things is so much in doubt and the scaling so arbitrary that the improved fit to institutionally unique values and practices will far outweigh any disadvantages caused by departure from mindless standardization. </i></div></div>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-31034232979753674182010-08-09T21:36:00.000-07:002010-08-09T21:36:59.033-07:00How Academic Assessment Gets it BackwardsIn a <a href="http://www.nytimes.com/2010/08/09/opinion/l09radiation.html">letter</a> to the NYT about an article on <a href="http://www.nytimes.com/2010/08/01/health/01radiation.html">radiation overdoses</a>, George Lantos writes:<br />
<br />
<blockquote>My stroke neurologists and I have decided that if treatment does not yet depend on the results, these tests should not be done outside the context of a clinical trial, no matter how beautiful and informative the images are. At our center, we have therefore not jumped on the bandwagon of routine CT perfusion tests in the setting of acute stroke, possibly sparing our patients the complications mentioned. </blockquote><br />
This raises an important, if nearly banal, point: if you don't have an action decision that depends on a piece of information, don't spend resources (or run risks) to obtain the information.<br />
<br />
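The same point can be made in decision-theoretic terms: information is worth something only if it could change which action you take. The sketch below is my own toy illustration -- the payoffs and probabilities are invented and have nothing to do with the letter or with actual stroke care -- computing the expected value of a perfectly informative test for a two-action decision.<br />
<pre>
# Expected value of information for a two-action decision.
# Payoffs and probabilities are invented purely for illustration.

p_state = {"condition_present": 0.3, "condition_absent": 0.7}

payoff = {
    ("treat",      "condition_present"): 10,
    ("treat",      "condition_absent"):   1,
    ("dont_treat", "condition_present"): -8,
    ("dont_treat", "condition_absent"):   0,
}
actions = ["treat", "dont_treat"]

def expected(action):
    return sum(p_state[s] * payoff[(action, s)] for s in p_state)

# Acting on current beliefs, without the test.
value_without_test = max(expected(a) for a in actions)

# With a perfectly informative test: learn the state, then act.
value_with_test = sum(p_state[s] * max(payoff[(a, s)] for a in actions)
                      for s in p_state)

# With these payoffs "treat" wins in either state, so the test changes
# nothing: its value is 0.0, and any cost or risk of running it is pure loss.
print(value_with_test - value_without_test)
</pre>
Swap in payoffs under which the better action depends on the state and the difference turns positive; that is the only circumstance in which gathering the information earns its keep.<br />
<br />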
Consider, for a moment, the trend toward "assessment" in contemporary higher education. A phenomenal amount of energy (and grief) is invested to produce information that is (1) of dubious validity and (2) does not, in general, have a well articulated relationship to decisions.<br />
<br />
Now the folks who work in the assessment industry are all about "evidence-based change," but they naively expect that they can, a priori, figure out what information will be useful for this purpose.<br />
<br />
They fetishize the idea of "closing the loop" -- bringing assessment information to bear on curriculum decisions and practices -- but they confuse the means and the ends. The logic becomes: to show that we are really doing assessment, we have to find a decision that can be hung on the information that has already been collected.<br />
<br />
A much better approach to improving higher education (and one that would demonstrate an appreciation of basic critical thinking skills) would be to START by identifying opportunities for making decisions about how things are done and THEN figuring out what information would allow us to make the right decision. Such an approach would involve actually understanding both the educational process and the way educational organizations work. My impression is that it is precisely a lack of understanding of and interest in these things on the part of the assessment crowd that leads them to get the whole thing backwards.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-43140226502718000952009-12-14T10:33:00.000-08:002009-12-14T10:34:58.670-08:00Assessment and Evaluating Student WorkIt's ironic, given its centrality, how little that's sensible and defensible has been said about the relation between grading and assessment. To my mind, it's a lost opportunity to offer constructive criticism of grading in general as well as a failure on the part of the assessment industry to demonstrate and convey clear thinking and to develop useful tools for teachers.<br />
<br />
And so here is part one of working through a relationship between grading and assessing.<br />
<br />
My students always want to know "how much does this count for" and so my syllabus always says something like<br />
<br />
<table><tbody>
<tr> <td style="font-family: Verdana,sans-serif; width: 100px;"><span style="font-size: x-small;">Exam 1</span><br />
</td><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">30%</span><br />
</td></tr>
<tr><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">Problem Sets</span><br />
</td><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">20%</span><br />
</td></tr>
<tr><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">Final Essay</span><br />
</td><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">40%</span><br />
</td></tr>
<tr><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">Participation</span><br />
</td><td style="font-family: Verdana,sans-serif;"><span style="font-size: x-small;">10%</span><br />
</td></tr>
</tbody></table><br />
When I grade any particular item -- assignment or paper -- I make use (at least implicitly) of a similar decomposition of the grade. If it is an essay I may be evaluating the quality of the writing, the use of evidence, the structure of the argument, the use of sources, and so on. If it is an exam, the questions can usually be separated into a finite number of groups, each "testing" a particular skill or understanding of a particular concept (but see fn 1 below). Let's imagine a class in which five skills or concepts, J,K,L,M, and N, make up the content. And let's imagine my graded activities from above can be described this way<br />
<br />
<table><tbody>
<tr valign="top"> <td style="font-family: "Trebuchet MS",sans-serif; width: 100px;"><span style="font-size: x-small;">Exam 1</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">25%J</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">25%K</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">25%L</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">25% critical <br />
thinking </span><br />
</td></tr>
<tr valign="top"><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">Problem Sets</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20%J</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20%K</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20%L</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20%M</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20%N</span><br />
</td></tr>
<tr valign="top"><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">Final Essay</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20% Writing</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">30% J-N <br />
<br />
</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">30% Argument</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">20% Scholarly <br />
conventions</span><br />
</td></tr>
<tr valign="top"><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">Participation</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">33% J-N <br />
<br />
</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">33% Staying up with <br />
material in course</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;"><span style="font-size: x-small;">33% Poise, <br />
verbal skills, etc.</span><br />
</td></tr>
</tbody></table><br />
<br />
Now let's look at how all of the things I've graded fit together. In the table below, the rows represent skills or learning outcomes that I want students to demonstrate. The columns are the evaluative tools I've used, and the plus and minus signs show which of those skills each tool actually covered.<br />
<br />
<table border="1"><tbody>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><b><span style="font-size: x-small;">Substance</span></b><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><b><span style="font-size: x-small;">Exam 1</span></b><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><b><span style="font-size: x-small;">Problem Sets</span></b><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><b><span style="font-size: x-small;">Final Essay</span></b><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><b><span style="font-size: x-small;">Participation</span></b><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Concept J</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="25%"><i><span style="font-size: x-small;">Concept K</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="25%"><i><span style="font-size: x-small;">Concept L</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Concept M</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Concept N</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Writing</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Argument</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Scholarly Conventions</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Critical Thinking </span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Keeping Up</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td -="" style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
<tr align="center"><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><i><span style="font-size: x-small;">Verbal Skills</span></i><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">-</span><br />
</td><td style="font-family: "Trebuchet MS",sans-serif;" width="20%"><span style="font-size: small;">+</span><br />
</td></tr>
</tbody></table><br />
Next let's suppose that I graded each of these items on an A-F scale and that I've made some attempt to put on paper how I "operationalize" the grades "excellent," "good," "satisfactory," etc. I might, for example, have let students know that I consider an excellent use of concepts in the final essay to be when<br />
<blockquote>Essay employs 3 or more of the main concepts from the course in a manner that's appropriate to the subject at hand and that demonstrates a strong understanding of what they mean and how they can be useful.<br />
</blockquote>And finally, let's assume that my program goals include concepts K and M and the school as a whole includes writing and critical thinking as goals. <br />
<br />
I can simply take the scores on concept K from all four evaluations and on concept M from the last three, then take the writing score from the final essay and the critical thinking scores from the first exam and the final essay and, voilà, I've got my assessment.<br />
<br />
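To make the mechanics of that last step concrete, here is a minimal sketch of it. The outcome-to-instrument mapping follows the matrix above; the letter-to-number scale, the instrument labels, and this student's grades are invented for illustration, and a real gradebook would carry one such record per student.<br />
<pre>
# Pulling program-assessment numbers out of ordinary grading records.
# The outcome/instrument mapping follows the matrix above; the grade
# scale and the sample grades are invented.

grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# outcome: the instruments on which it was graded
measured_on = {
    "K":                 ["exam1", "problem_sets", "final_essay", "participation"],
    "M":                 ["problem_sets", "final_essay", "participation"],
    "writing":           ["final_essay"],
    "critical_thinking": ["exam1", "final_essay"],
}

# one student's per-outcome letter grades on each instrument
student = {
    ("exam1", "K"): "B",            ("exam1", "critical_thinking"): "A",
    ("problem_sets", "K"): "A",     ("problem_sets", "M"): "B",
    ("final_essay", "K"): "B",      ("final_essay", "M"): "B",
    ("final_essay", "writing"): "A",
    ("final_essay", "critical_thinking"): "B",
    ("participation", "K"): "A",    ("participation", "M"): "A",
}

def outcome_score(outcome):
    points = [grade_points[student[(inst, outcome)]]
              for inst in measured_on[outcome]]
    return sum(points) / len(points)

# program goals: K and M; institutional goals: writing and critical thinking
for outcome in ["K", "M", "writing", "critical_thinking"]:
    print(outcome, round(outcome_score(outcome), 2))
</pre>
Averaged over a whole class or program, those per-outcome numbers are exactly what the program and the institution say they want -- nothing new has to be collected, only recorded in a form that can be summed.<br />
<br />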
<i><b>fn 1</b> Two things require mention: 1) not every skill/concept that we expect to be learned is measured on every exam/exercise -- exams are samples; 2) many "items" will depend on more than one skill or concept. More on these issues later. </i>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-6390485613534725532009-11-25T15:38:00.000-08:002009-11-25T15:38:58.444-08:00Responding to Student WritingIn a piece called "<a href="https://intranet.mills.edu/provost/about_responding_to_student_writing.pdf">ABOUT RESPONDING TO STUDENT WRITING</a>," Peter Elbow writes:<br />
<blockquote>The fact is there is no best way to respond to student writing. The right comment is the one that will help <b>this</b> student on <b>this</b> topic on <b>this</b> draft at <b>this</b> point in the semester -- given her character and experience. My best chance for figuring out what is going on for any particular student at any given point depends on figuring out what was going on for her as she was writing.<br />
</blockquote>I found this document on our school's website under "Teaching Resources" on the Provost's Office page. It sounds like pretty good advice.<br />
<br />
It also sounds different from the advice I find on another page of the institution's website. On that page I find a <a href="https://intranet.mills.edu/institutional_research/Essay%20and%20Research%20Paper%20Rubric-Carnegie%20Mellon.doc">"rubric" for assessing student learning in essays</a>. It gives me six categories (overall impression, argument, evidence, counter evidence, sources, citations) and wordy descriptions of different levels of achievement in each. It's pretty unclear from the document how it is intended to be used, but basically, it's a grading scale.<br />
<br />
Here's my question: which approach to teaching students to write does my boss want me to use?Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-71597102329565780582009-11-06T13:11:00.001-08:002009-11-06T13:11:09.371-08:00Spellings' Flawed MetaphorAn interesting article and even more interesting responses in <span style="font-style:italic;">Inside Higher Ed</span> from a few years back:<a href="http://www.insidehighered.com/views/2007/04/05/chambliss"> "The Flawed Metaphor of the Spellings Summit"</a>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-18765410034983620712009-11-02T09:39:00.001-08:002009-11-02T09:39:52.312-08:00Six Things to Beware of, Grasshopper...Such talk as there is about innovation and change in higher education these days shows up mostly in the general orbit of assessment. Having recently listened to or read a lot of material from assessment experts, I jotted down a few cautions.<br />
<br />
<span style="font-weight:bold;">Beware the tyranny of software.</span> I like software. I write computer programs. But, at the risk of sounding Asimovian, software has to serve education, not the other way round. For the last ten years we have repeatedly adjusted the way we educate to the needs of the software ("Banner won't let you do that..."). Banner and its ilk are just hammers. No right-thinking carpenter changes the way she builds a house because of hammer limitations.<br />
<br />
<span style="font-weight:bold;">Beware the fetishization of uniformity.</span> Healthy ecosystems, organizations, relationships, families, and individuals entertain a healthy dialectical tension between sameness and difference, uniformity and irregularity, standardization and improvisation. Whether you are trying to be a virus that can outwit immune systems (or an immune system that can shut down a virus), or building a firm that can weather economic ups and downs, or running a college that produces excellent graduates from a stunning variation of inputs, the key is to cultivate order and chaos simultaneously.<br />
<br />
<span style="font-weight:bold;">Beware projection</span> and other forms of X-o-centrism. We are all subject to our own version of Saul Steinberg's classic "New Yorker's view of the World." What works for me, or in one course, or in my department, or in our division, or in one school I know about, is surely good for you. Whether it's a analogy or metaphor, an algorithm, a social form, or a paradigm, or a homeomorphism, context and local history matter.<br />
<br />
<span style="font-weight:bold;">Beware foolish numberers</span> and their arrogant misquotations of Lord Kelvin -- if you can't measure it, it does not exist -- and mindless adherence to quantification as an end in itself.<br />
<br />
<span style="font-weight:bold;">Beware saviors,</span> those who in the face of skepticism and critique fancy themselves the new Galileo or who too readily imagine they are members of a new Vienna Secession or Salon des Refusés. <br />
<br />
<span style="font-weight:bold;">Beware assurances that complex things can be done with little effort</span> or in far less time than you think. Most things that are easy and simple and beneficial have already been done. Things like the valid measurement of educational outcomes are not simple. Getting it right takes time and effort. <br />
<br />
Most of all, beware a movement that cannot apply its own techniques to itself.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-63832542196778375972009-11-02T09:38:00.001-08:002009-11-02T09:44:11.919-08:00When the Blind Meet the LostAssessment has landed where it has because the "movement" is driven, far beyond our individual institutions' halls, by a political agenda and small minds who have seized on an entrepreneurial opportunity and attached themselves to it. In a democratic society, that political agenda deserves a free and open debate. Unfortunately, many of the individuals who have attached themselves to it are either unaware of its terms or incapable of (or afraid of) engaging in such a debate. Unfortunately for those who are behind the movement, many of their foot soldiers are an embarrassment, and they themselves are either not competent enough to realize this or too ideologically blinded to care.<br />
<br />
Perhaps the most telling characteristic of the "assessment movement" is its failure to live up to its own standards: there is no culture of accountability and measurement and assessment in the assessment community. It would not be the first movement (in education or elsewhere) to suffer from this shortcoming.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-13162267533677887652009-10-30T23:57:00.000-07:002009-10-31T00:21:48.168-07:00The Fetishization of Rubrics IThe one thing you see over and over and over in the assessment literature is the "rubric." Never mind, for now, the history of the concept -- that's an interesting story but it's for another time.<br />
<br />
For now, just a quick note. A rubric is basically a two-dimensional structure, a table, a matrix. The rows represent categories of observable or measurable phenomena (such as, for grading an essay, "statement of topic," "grammar," "argument," and "conclusion") and the columns represent levels of achievement (e.g., "elementary," "intermediate," "advanced"). Each cell of the table then contains a description of, say, the kind of "grammar" that would constitute that level of performance.<br />
<br />
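In data terms this is about as simple as structures get. Here is a sketch using the row and column labels from the essay example just mentioned; the one cell description that is filled in is invented for illustration.<br />
<pre>
# A rubric as a plain two-dimensional structure: rows are the things
# being observed, columns are the levels, and each cell holds a verbal
# description. The one description filled in here is invented.

criteria = ["statement of topic", "grammar", "argument", "conclusion"]
levels = ["elementary", "intermediate", "advanced"]

rubric = {(c, lev): "" for c in criteria for lev in levels}
rubric[("grammar", "advanced")] = (
    "Sentences are consistently well formed; errors are rare and minor."
)

# "Scoring" on a criterion is a lookup plus an ordinal position: find the
# cell whose description best matches the work, then record the column.
judged_level = "advanced"
print(rubric[("grammar", judged_level)])
print(levels.index(judged_level) + 1)   # 3 -- an ordinal score, nothing more
</pre>
Note that the only number the structure yields is an ordinal position along each row -- worth keeping in mind when such scores start getting averaged.<br />
<br />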
A rubric is, we could say, just a series of scales that use the same values with something like the operationalization of each value specified.<br />
<br />
Rubrics are, in other words, nothing new. Why then, our first question must be, do assessment fanatics act as if rubrics are new, something they have discovered and delivered to higher education?<br />
<br />
I would submit that the answer is ignorance and naivete. They just don't know. <br />
<br />
A second question is why their rubrics are so often so unsophisticated. Most rubrics you find on assessment websites, for example, suggest no appreciation for something as elementary as the difference between ordinal, interval, and ratio measurements. Take this one, which is a meta-rubric (an assessment rubric for rating efforts at assessment using rubrics). (<a href="http://www.wascsenior.org/findit/files/forms/Program_Learning_Outcome_Rubric__080430_.pdf">Source: WASC</a>)<br />
<br />
<table border="1" cellpadding="2" cellspacing="2" style="text-align: left; width: 100%;"><tbody>
<tr><td style="text-align: center; width: 20%;">Criterion<br />
</td><td style="text-align: center; width: 20%;"><br />
Initial<br />
</td><td style="text-align: center; width: 20%;"><br />
Emerging<br />
</td><td style="text-align: center; width: 20%;"><br />
Developed<br />
</td><td style="text-align: center; width: 20%;"><br />
Highly Developed<br />
</td> </tr>
<tr><td style="text-align: center; width: 20%;">Comprehensive List<br />
</td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align: center; width: 20%;">Assessable Outcomes<br />
</td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align: center; width: 20%;">Alignment<br />
</td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align: center; width: 20%;">Assessment Planning<br />
</td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align: center; width: 20%;">The Student Experience<br />
</td><td></td><td></td><td></td><td></td></tr>
</tbody></table><br />
Looks orderly enough, eh? Let's examine what's in one of the boxes. Here's the text for "Assessment Planning" at the "Developed" level:<br />
<blockquote>The program has a reasonable, multi-year assessment plan that identifies when each outcome will be assessed. The plan may explicitly include analysis and implementation of improvements.<br />
</blockquote>It looks like we need another rubric because we've got lots going on here:<br />
<ol><li>What makes a "reasonable, multi-year plan"?</li>
<li>Mainly what we need here are dates: when will each outcome be assessed?</li>
<li>How should the assessor rate the "may-ness" of analysis and implementation? Apparently these do not make the plan better or worse since they may or may not be present.</li>
</ol>Our next analytical step might be to look at what varies between the different levels of "Assessment Planning," but first let's ask what conceptual model lies behind this approach. It's very much that of developmental studies, especially psychology. The columns are, thus, stages of development. In psychology or child development the columns have some integrity because they track the natural stages a person goes through. Dimensions may be independent in terms of measurement but highly correlated (typically with chronology), and so "stages" emerge naturally from the data.<br />
<br />
In the case of assessment, though, these are a priori categories made up by small minds who like to put things in boxes. And the analogy they are making when they make them up is very much to child development. An assessment rubric is a grown-up version of a kindergarten report card.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-37263564351555627592009-08-28T18:09:00.000-07:002009-08-28T18:18:40.448-07:00Why Do We Need a Faculty Assessment Committee?Any time we create a committee we should stop and ask why. The baseline for answering that question should be the world (or institution) without the committee. How was it? How would it be?<br />
<br />
When I think about that in this case, here's what I come up with. With no practicing assessment committee (it was appointed but never met last year):<br />
<ol><li>Faculty have felt they had little opportunity for real input into assessment.<br />
</li>
<li>The process has in fact, over the years, been dominated by non-faculty and non-academics.</li>
<li>Many faculty members are unimpressed with the process. Substantive missteps have been frequent. Faculty members' assessments of assessment range from feeling insulted by the unprofessional and intellectually demeaning tone with which assessment has frequently been conveyed, to serious criticism of the validity of the methods used and real concern about how consistently that criticism is ignored or dismissed. And much in between.<br />
</li>
</ol>[I suspect that from the "other side" it looks like this<br />
<ol><li>Faculty have been slow to adopt a culture and practice of assessment</li>
<li>Our job is to get the institution to comply with WASC enough to get us re-certified]</li>
</ol>So how to make the world different WITH an assessment committee? If I were an administrator, I'd think that the committee could help me to bring the faculty along. I could co-opt them as fellow champions of assessment as currently practiced and they'd be vanguards of the movement.<br />
<br />
Uh, I don't think so. The problem with assessment is not lack of faculty buy-in. Let's repeat that: THE PROBLEM WITH ASSESSMENT IS NOT LACK OF FACULTY BUY-IN. The problem with assessment is (are):<br />
<ul><li>its methods are of dubious validity</li>
<li>its logic model (observation>analysis>change) is vague, rarely made explicit, and more wishful thinking than realistic</li>
<li>it dishonestly or naively hides its political values behind a veil of "objective measurement"</li>
<li>it is dominated by self-serving educational entrepreneurs who live off, not for, assessment</li>
<li>it is evangelized in the absence of hard thinking about institutional inputs and outputs, the very things it purports to be sensitive to</li>
<li>it enters the academy as a <i>fait accompli,</i> more based on conviction and belief than theory, analysis, and argument, and exempts itself from the critical examination and culture of evidence that it champions </li>
</ul>So, what can an assessment committee do if even some of the above is in fact the case? Mainly, I think, hold assessment accountable to normal standards of intellectual integrity and professionalism. If we do that, I predict, there will be <u>changes in how assessment is implemented</u>, changes that would allow the process to capitalize on its virtues and avoid some of its vices. And in reaction to THAT you would get more buy-in. The giant flaw in how it's been handled so far is that assessment is blind to closing its own loop. When faculty don't fall in line, it's not necessarily because they are resistant to change, unwilling to give up their comfortable sinecures, or too arrogant to think about students. Sometimes it's because they have looked at something, and, smart people that they are, found it wanting.<br />
<br />
It may even be that the resistance to change and feedback, the comfortable sinecures, and the arrogance that deflects all criticism may lie in the assessment industry itself. The rest is, as they say, projection.Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com2tag:blogger.com,1999:blog-5384317558423226743.post-77033169979594035542009-08-23T07:48:00.000-07:002009-08-23T08:59:58.452-07:00Let's Take It SeriouslyLet's take assessment and accountability seriously AS AN INSTITUTION. There is a tendency to equate assessment with measuring what professors do to/with students. The buzz word is "accountability" and there's this unspoken assumption that the locus of lack of accountability in higher education is the faculty. I think that assumption is wrong.<br /><br />We should broaden the concept of assessment to the whole institution. Course instructors get feedback on an almost daily basis -- students do or don't show up for class; instructors face 20 to 100 faces projecting boredom or engagement several times per week; students write papers and exams that speak volumes about whether they are learning anything; advisees tell faculty about how good their colleagues are. By contrast, the rest of the institution has little, if any, opportunity for feedback. It's important: one substandard administrative act can affect the entire faculty, so even small things can have a big negative effect on learning outcomes.<br /><br />In the name of accountability throughout the institution I propose something simple, but concrete: every form or memo should have a "feedback button" on it. Clicking on this button will allow "users" anonymously to offer suggestions or criticism. These should be recorded in a blog format -- that is, they accumulate and are open to view. At the end of each year, the accountable officer would be required in her or his annual report to tally these comments and respond to them, indicating what was learned, what changes have been made or why changes were not made.<br /><br />The important component of this is that the comments are PUBLIC so that constituents can see what others are saying. Each "user" can see whether her ideas are commonly held or idiosyncratic and the community can know what kind of feedback an office is receiving and judge its responsiveness accordingly.<br /><br />Why anonymous? This is feedback, not evaluation. This information cannot be used to penalize or injure anyone. The office has opportunity to respond either immediately or in an annual report. Crank comments will be weeded out by sheer numbers and users who will contradict them. In the other direction, it is clear that honest feedback can be compromised by concerns about retribution, formal or informal. Further analysis along these lines would further support the idea that comments should be (at least optionally) anonymous.<br /><br />We should note that we already do all of this in principle -- many offices around campus have some version of a "suggestion box." What is missing is (1) systematic and consistent implementation so that users get accustomed to the process of providing feedback, and (2) a protocol for using the feedback to enrich the community knowledge pool and to build it into an actual accountability structure.<br /><br />The last paragraph makes the connection to a sociology of information. 
Information asymmetries (as when the recipient knows what the aggregate opinion is, but the "public" does not) and the atomization of polities (this is what happens when opinion collection is done in a way that minimizes interactions among the opinion holders -- cf. Walmart not wanting employees to discuss working conditions -- preventing the formation of open, collective knowledge*) are a genuine obstacle to organizational improvement. Many, many private organizations have learned this; it's not entirely surprising that colleges and universities are the last to get on board.<br /><br />* as opposed, say, to things that might be called "open secrets"Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-85114907153062225862009-08-21T22:55:00.000-07:002009-08-21T23:09:24.459-07:00The Bannerization of Assessment<div style="text-align: justify;">I met recently, along with Andy, Alice, and Kiem, with two reps from the Blackboard company. They were here to tell us about a Blackboard add-on product called, I think, "the assessment module." Herewith, some observations.<br /></div><br /><div style="text-align: justify;">The product incorporates some of the functionality that most of us have seen recently in the CARP software. It allows folks at different levels of the instructional process -- from instructors up to deans and assessment staff -- to input, collate, tally, analyze, query, and report on all manner of information related to assessment. It has the advantage of using the same overall interface and design logic that we are familiar with from our use of Blackboard for classes and it's "flexible" and can be integrated with BANNER.<br /></div><br /><div style="text-align: justify;">The mere fact of investing in the software would probably send a positive signal to WASC that we are an institution that is taking assessment seriously. It would also greatly simplify the work of the office of institutional research by organizing assessment data in one place and one format. Nobody at the meeting was prepared to give actual numbers but it seems logical that it could save lots and lots of hours of work in that office (and probably in other offices that have to prepare materials for WASC).<br /><br />Much of the labor saving derives from the fact that the system assumes that instructors will use it to collect and assess at least some of the work students do in their courses. At a minimum, the system allows students to submit papers, essays, etc. in electronic form and then the assessment group can process these using rubrics we've developed so as to arrive at some measure of student achievement in our programs. In most cases we'd rise above that minimum: instructors would simply use the system itself to do the grading and feedback on papers and exams and so this information would be "automatically" recorded and tallied up for use in assessment. This would make faculty life easier because we would not have to submit separate assessment information. Ideally, most, if not all, of the work that we assign for evaluation in courses would be associated with a rubric that would be in the system and then students could submit work electronically and we could have open in side-by-side windows the student's work and the rubric and we rate the work on each measure, add comments, etc. 
and then the student receives the feedback in electronic form and the aggregate results for the class are automatically recorded and forwarded "up the chain" to department heads, the office of assessment, etc. as appropriate.<br /><br />MY TAKE-AWAY<br /><br />After several hours listening to the Blackboard reps (sales and technical folks) here are a few observations:<br /></div><ol><li>These folks do not understand how education happens in a liberal arts college.<br /><br /><ol><li>what is valuable for students</li><li>how departments work</li><li>how decisions get made</li><li>what value added professors actually bring to the mix</li></ol> <br /><div style="text-align: justify;">Instead the software is designed to resonate with an auditor's fantasy of higher education as might be manifest in the cfo of a large, for-profit, online university.<br /><br /></div></li> <li style="text-align: justify;">Feature after feature of the software is perfect for online correspondence courses as offered by, say, University of Phoenix.</li><br /><li style="text-align: justify;">While the company representatives repeatedly touted the system's "flexibility," in fact, it imposes dozens upon dozens of assumptions about teaching and learning on the process without any self-consciousness. The whole thing derives from a particular view of academic assessment (itself a refugee from peer review) and purveyors appeared to have zero sense that its epistemological status was different from, say, the law of gravity.</li><br /><li style="text-align: justify;">Totally absent from their pitch was any sense at all that there was an <span style="font-style: italic; font-weight: bold;">educational </span>problem that this product could help you solve.</li><br /> <ol style="text-align: justify;"><li>What it does address is the fact that institutions like Mills have been told "you must do something" and this is clearly a something and spending a lot of money on it would be a great demonstration of institutional commitment.<br /><br /></li><li>The company appears to have done zero assessment of the temporal impact of the processes the software would require. "Eventually, instructors would get really good at entering this stuff and so the time involved would drop over time..." "The information could be viewed and sliced in many different ways..." (by whom?) Is there a net gain in productivity? No idea. Is there a net positive for student learning? No idea. Will more parents want to pay our tuition because we use this system? No idea. What should instructors stop doing to make time to use this system? No idea.</li></ol><br /><li style="text-align: justify;">What they are selling is "a license and consulting." In order to figure out how to use the software and adapt it (remember, it's very flexible) you have to hire them as consultants. Remember too, that these consultants, as far as I can tell, have very little fundamental appreciation for how a liberal arts college works. 
Either they will mislead us because they don't understand us or we will pay for them to learn something about how a college like Mills works.</li><br /><li style="text-align: justify;">The fact that, as potential customers, we were hard-pressed to come up with things that we want to do that this product would make it easier for us to do (usually at these things users' imaginations get going and they start saying "hey, could I use it to do X?") and instead we sat there realizing that the software would make us do things is telling.</li></ol><span style="font-weight: bold;">SOFTWARE DESIGNED TO CONNECT THINGS UP</span><br /><br /><div style="text-align: justify;">Two aspects of the software are key (from a software design point of view) -- "the hierarchy" and "links."<br /><br />A core concept in the software design is "the hierarchy," by which they seemed to mean the managerial hierarchy that oversees the delivery of education. At the bottom of this structure are instructors and their students. Instructors implement courses which are at the next level above them -- overseen "by a department chair or dean" who might then be under another dean. Above this we have "the assessment operation" -- as the discussion went on it seemed that this means some combination of Institutional Research and the Assessment Committee. Then above this you might have other levels of college administration and then above this outside mandaters such as WASC. The genius of the software is that each institution can build in the hierarchy that is appropriate to itself. The data in the system, the descriptions of goals and standards and such, are carefully protected so that only the appropriate people at the appropriate level of the hierarchy can see, change, etc.<br /><br />The other part of the system is the links. It allows you to build a rubric for, say, reading lab assignments and for each item in the rubric to be linked back to course learning objectives, which are in turn linked back to program goals, and these back to institutional goals or to requirements set forth by external agencies. This means that when you evaluate 25 lab reports, the system automatically gets information on how well the institution is doing in its effort to inculcate a culture of experimentation AND it also automatically gets information about the fact that the institution is monitoring whether or not such learning is occurring. And all this simply by clicking on a radio button in a web-based report evaluation rubric!<br /><br />All of these things are, of course, changeable. In theory. In practice, the system allows for the creation of extremely high levels of opaque complexity. To ensure system integrity, new procedures will need to be invented so that faculty who want to make a change can confer with departmental colleagues and get the department head to make a request to the office of assessment, and then maybe the gen ed committee or the EPC has to get involved, etc. Or, even more likely, once stuff is in the system it just stays there until it causes a major problem.<br />The designers of the system seem totally oriented toward (1) the capacity to output what an entity like WASC wants and (2) changing the way instructors teach via a logic of "it's easier to join than fight" and "why duplicate your efforts?"<br /><br /><span style="font-weight: bold;">DISHONESTY AND ANTI-INTELLECTUALISM</span><br /><br />A system like this is championed for its flexibility, but that flexibility exists only relative to how rigid it could be. 
Neither its designers nor its purveyors struck me as having even a hint of a nuanced view of what education is and how it happens and how real educational organizations work. That's too bad because these are not mysterious topics -- a lot of people DO know a lot about them. What the talk of "flexibility" represents is marketing-speak. A common complaint about course management software and student systems software is that it is inflexible and "doesn't fit how we have kept records before" and so the folks who write it add more options to mix and match the pieces (the presenters seemed to want to impress us by the fact that on a particular screen we could have two tabs or four: "you can set it up so it's exactly right for your process!"). But that's not really flexibility, that's customization. System software is, pretty much by definition, not flexible. It's especially true that system software almost never adapts to an organization; organizations adapt to system software. We've seen this plenty over the years when we're told "Banner can't do that" or "we need this change because of Banner."<br /><br />A second moment of dishonesty happens because the designers and sales force have clearly bought into the ideology of the professor/instructor as problem. They have talked for so long to assessment aficionados and heads of assessment who get blowback from faculty that they "know" that individual professors don't like this stuff and that part of the challenge of their job is just to soft-pedal around that. They are not selling this stuff to instructors. They are selling it to the instructors' managers or ueber-managers, folks who themselves have uncritically bought into the idea of there being a crisis of accountability in higher education. The intellectual dishonesty lies in the fact that these folks are neither willing nor able to actually have a critical conversation about any of this. They simply think of people who do not swallow it hook, line, and sinker as "unsaved."<br /><br /><span style="font-weight: bold;">AN IMPORTANT ASIDE</span><br /><br />Sociologically, what's interesting is that this is an example of the "for-profit" side of education rubbing up against the not-for-profit side. Blackboard and its competitors, as well as the folks who are on the hustings about assessment, are entrepreneurs. They don't live for assessment, they live off assessment. And we know that that's an arrangement that makes intellectually honest discussions hard to come by.</div>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com1tag:blogger.com,1999:blog-5384317558423226743.post-40647975175725830702009-08-06T12:13:00.000-07:002009-08-06T12:33:01.465-07:00It IS an Industry...Recently got a copy of an email sent out by McGraw-Hill's "Assessment Research Project." In part it said:<span><blockquote>As a quick reminder, we are conducting this nation-wide study to learn about assessment practices that are actually being used by professors of Introductory Sociology. We are seeking a copy of your syllabus, mid-term and final exams. If you do not assess cumulatively, please submit your mid-year and end-of-year exams along with your syllabus.</blockquote></span>The recipient was assured that the material would be kept confidential and not published in any of their teaching materials (interesting that they even need to say this) and was asked to "please be sure to let me know if you would like to receive a Certificate of Participation and/or an honorary mention in our research"! 
Maybe you could include that in your tenure file.<br /><br />There's money in them thar hills, folks, and even though the accreditation agencies, think tanks, washed-up academics who've become assessment entrepreneurs, and standardized testing organizations have a head start (since they are the ones who get to set the agenda), textbook publishers are starting to realize that if we are up against the wall we'll love to order textbooks that come with "integrated assessment plans" of some kind.<br /><br />Onward and upward!<br /><h3>See also...<br /></h3>PRESS RELEASE: <span style="font-size:85%;"><a href="http://investor.mcgraw-hill.com/phoenix.zhtml?c=96562&p=irol-newsArticle&ID=649072&highlight=">McGraw-Hill Education Forms New Assessment and Reporting Unit to Meet Growing Global Demand: Combines CTB/McGraw-Hill, The Grow Network/McGraw-Hill and McGraw-Hill Digital Learning</a><br /><br />You can earn over $100k and live in Monterey, CA: <a href="http://ops-jobs.theladders.com/job/jobboard?pl=jse-O1&cr=1863114">Director, International Research and Development</a></span>Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0tag:blogger.com,1999:blog-5384317558423226743.post-82040536742192847722009-08-05T09:05:00.000-07:002009-08-05T09:15:42.366-07:00Validity and SuchAn <a href="http://blog.aacu.org/index.php/2009/07/27/making-learning-assessment-count/#more-191">AACU blogpost</a> referred me to the <a href="http://www.learningoutcomeassessment.org/" target="_blank">National Institute for Learning Outcomes Assessment</a> website, which referred me to an ETS website about the Measure of Academic Proficiency and Progress (MAPP), where I would be able to read an article titled "<em><a href="http://www.ets.org/Media/Tests/MAPP/pdf/5018_mapp.pdf" target="_blank">Validity of the Measure of Academic Proficiency and Progress</a></em>."<br /><br />And here's the upshot of that article: The MAPP is basically the same as the test it replaced, and research on that test showed<blockquote>...that the higher scores of juniors and seniors could be explained almost entirely by their completion of more of the core curriculum, and that completion of advanced courses beyond the core curriculum had relatively little impact on Academic Profile scores. An earlier study (ETS, 1990) showed that Academic Profile scores increased as grade point average, class level and amount of core curriculum completed increased.</blockquote>In other words, the test is a good measure of whether students took more GenEd courses. And we suppose that in GenEd courses students are acquiring GenEd skills. And so these tests are measures of the GenEd skills we want students to learn.<br /><br />A tad circular? What exactly is the information value added by this test?Dan Ryanhttp://www.blogger.com/profile/12380226325325300201noreply@blogger.com0