
Direct Manipulation vs. Interface Agents, Ben Shneiderman and Pattie Maes, ACM Interactions, December 1997, pp. 42-61
The article presents excerpts from two separate debates between Shneiderman and Maes at the IUI and CHI conferences in 1997. Prepared statements alternate with responses to specific attendee questions. The chosen format is both misleading and suboptimal for printed reproduction. First, the exchange as presented in the article never actually happened in this form. The reader cannot assume a continuous narrative, since the responses come from two different events and may be interspersed and out of order. Second, transcriptions of verbal exchanges are structurally very different from written arguments. Spoken language relies on frequent redundancy to help listeners remember significant points, whereas a reader can simply skip back a paragraph or two if she does not remember something. Because the article transcribes speech, some basic positions of the two debaters are repeated more often than a written argument needs.
Shneiderman argues that we should build comprehensible, predictable user interfaces that afford direct manipulation of all relevant parameters. This combination, he says, will make users accept responsibility for the actions they take using the system. Maes argues that computing systems have become so complex that the user does not want to be concerned with all the details, and that intelligent software can act as a representative of the user and proactively perform tasks without direct UI interaction. The two positions are not completely at odds with each other: both Shneiderman and Maes agree that direct, predictable interfaces are important and that hiding complexity under the hood can also be beneficial.
Shneiderman does not think that "human-to-human interaction is a good model for the design of user interfaces" (56). My own research background is in simulating embodied virtual agents. In that field, the richness of human-to-human communication is often given as motivation for building more realistic anthropomorphic agents. However, a visual likeness to humans comes with a whole list of expectations regarding the agent's reasoning and behavior. I am not sure that we are making substantial progress in these areas. Interacting with an agent is still significantly worse than having a teleconference with a real human - and we don't even like to engage in that latter form of interaction (cf. Hollan and Stornetta's "Beyond Being There").
Maes states on p. 50 that "More and more the World Wide Web and our browser is becoming the one and only interface." I disagree - otherwise we could reduce the work of HCI researchers to tweaking JavaScript.
Horvitz et al. outline their work on sensing and processing user attention at Microsoft Research. They identify the important role attention plays both in human cognition and in social patterns of communication. In their research system, data from multiple input channels such as accelerometers, touch sensors, and gaze trackers are fed into Bayesian belief networks and hidden Markov models (HMMs) to reason about the locus and object of user attention over time. Given such an internal representation of user attention, their "Notification Platform" chooses when and how to display different system messages so as to minimize disruption.
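To make the reasoning chain above more concrete, here is a minimal sketch of the general idea, not the authors' system. It swaps their Bayesian belief networks and HMMs for a much simpler naive Bayes estimate over three hypothetical binary sensor readings, and then compares an assumed cost of interrupting with an assumed cost of delaying a message. Every name, probability, and cost in it is an illustrative assumption of mine.

```python
# Hypothetical sketch: all sensors, probabilities, and costs below are
# made-up illustrative values, not figures from Horvitz et al.

PRIOR_BUSY = 0.5  # assumed prior probability that the user is focused on a task

# Assumed likelihoods P(sensor reading is True | user state), one per sensor.
LIKELIHOODS = {
    "gaze_on_document": {"busy": 0.9, "idle": 0.3},
    "typing": {"busy": 0.8, "idle": 0.1},
    "device_moving": {"busy": 0.2, "idle": 0.6},
}


def posterior_busy(observations):
    """Return P(busy | observations), treating sensors as independent given the state."""
    p_busy = PRIOR_BUSY
    p_idle = 1.0 - PRIOR_BUSY
    for sensor, value in observations.items():
        like = LIKELIHOODS[sensor]
        # Multiply in P(reading | state) for a True or a False reading.
        p_busy *= like["busy"] if value else 1.0 - like["busy"]
        p_idle *= like["idle"] if value else 1.0 - like["idle"]
    return p_busy / (p_busy + p_idle)


def should_notify_now(observations, interruption_cost, delay_cost):
    """Notify now only if the expected cost of interrupting is below the cost of waiting."""
    expected_interruption_cost = posterior_busy(observations) * interruption_cost
    return expected_interruption_cost < delay_cost


if __name__ == "__main__":
    focused = {"gaze_on_document": True, "typing": True, "device_moving": False}
    away = {"gaze_on_document": False, "typing": False, "device_moving": True}
    print(should_notify_now(focused, interruption_cost=10, delay_cost=2))
    print(should_notify_now(away, interruption_cost=10, delay_cost=2))
```

With these assumed numbers, a low-urgency message is held back while the user appears focused and shown once the readings suggest otherwise; raising `delay_cost` for an urgent message tips the decision toward interrupting.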
Providing feedback -- showing the outcome of the system's attention calculation to the user -- seems important to render the decisions of the system transparent and understandable. Two example visualizations were given, a color- and intensity-changing "lens" and an animated anthropomorphic agent, but a more systematic approach seems necessary.
On the last page, the authors claim that "robust solutions to the speech-target problem promise to significantly influence the overall sociology of human-computer interaction [...]." I disagree - the problems that prevent large-scale adoption of speech interaction are not purely technical in nature. As Shneiderman put it in the previous reading, "natural language interaction [...] has not been a success." Speech in general is not an efficient input/output channel - it requires a lot of cognitive processing and also faces cultural resistance (people feel uneasy being seen talking to their computer).
Posted by Bjoern Hartmann at November 1, 2004 12:54 AM