November 22, 2004

HCI Comments XV

User Learning and Performance with Marking Menus, Gordon Kurtenbach and William Buxton, CHI 1994: ACM Conference on Human Factors in Computing Systems, pp. 258-64

The advantages of radial and marking menus over linear selection mechanisms in terms of efficiency and speed are described sufficiently in the paper. I want to focus on some issues left undiscussed.

The "press-and-wait" mechanism used to invoke radial menus serves as a context switch operation: it is clear to both user and application that the following pen stroke will be interpreted as a menu selection command. But what about marking menus where no clear context switch exists? It appears possible, if not likely, that menu strokes can be mistaken for strokes meant to be registered directly in the application window (e.g., as drawing), and vice versa. What disambiguation strategies exist? How likely are such errors of stroke interpretation? The authors deliberately exploit the dual purpose of marks to combine object selection and command specification in a few select cases in the ConEd software. But would this overloading work as well in a drawing or design application? Are overloadings application-specific, so that the user will have to relearn them for every application, or do general principles exist?
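To make the interpretation question concrete: a marking menu maps the direction of a stroke to one of its items. The sketch below assumes an eight-item menu and coordinates with y pointing up. The item names and the "straightness" test are hypothetical, one conceivable disambiguation heuristic rather than anything proposed in the paper.

```python
import math

# Hypothetical eight-item menu, listed counter-clockwise starting at "east".
ITEMS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def stroke_to_item(points, items=ITEMS):
    """Map a stroke, given as (x, y) points with y pointing up, to a menu item
    by quantizing the direction from its first point to its last point."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
    sector = 2 * math.pi / len(items)
    # Shift by half a sector so each item is centered on its compass direction.
    index = int((angle + sector / 2) // sector) % len(items)
    return items[index]


def looks_like_mark(points, min_straightness=0.9):
    """Hypothetical heuristic: treat a stroke as a menu mark only if it is
    nearly straight, i.e. its endpoint distance is close to its path length."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if path == 0:
        return False
    return math.dist(points[0], points[-1]) / path >= min_straightness


print(stroke_to_item([(0, 0), (-5, -5)]))          # SW
print(looks_like_mark([(0, 0), (5, 5), (10, 0)]))  # False: a zigzag, more likely drawing
```

Screen coordinates usually grow downward, so a real implementation would negate the y difference. Note that straightness alone cannot separate a deliberately drawn straight line from a mark, which is exactly the ambiguity raised above.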

Beyond a heuristic of "naturally" matching selection direction to intended action, the paper offers no guidelines for arranging menu items in a circle. The principle of command distinctness (Norman) suggests placing similar actions far away from each other.

A quote: "If possible, once a function in a menu is invoked, it is replaced by the corresponding inverse function." This seems half-baked and counter-intuitive. It implicitly assumes that a normal workflow alternates between issuing do and undo commands. But what happens if the user wants to mark all "laugh" occurrences in sequence? The appropriate menu item will have disappeared after the first mark and won't return until something is "unlaughed". This practice goes against the authors' own recommendation that "marking menus are not appropriate when the list of items changes dynamically." Users will have to remember a set of binary modes to decide which action will be invoked by drawing a stroke in a particular direction.

In the discussion of the case study, the highly unequal amounts of time users A and B worked with the system -- user A crammed all testing into one week while user B took about a month -- complicate interpretation of the results. The different time periods may have had as much influence on learning to use marks as the different user experience levels.

The explanation for not including selection error in the study is not convincing. While not trivial, it is certainly possible to track error frequency at least as an approximation - for example by counting undo operations.
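A minimal sketch of such an approximation, assuming a hypothetical event log in which menu selections are recorded as strings like "select:laugh" and undo operations as "undo". The estimate is crude in both directions: users also undo correct selections, and some errors are never undone.

```python
def estimate_error_rate(events, window=1):
    """Approximate the selection error rate as the fraction of menu selections
    followed by an "undo" within the next `window` events (hypothetical log format)."""
    selections = 0
    errors = 0
    for i, event in enumerate(events):
        if event.startswith("select:"):
            selections += 1
            if "undo" in events[i + 1 : i + 1 + window]:
                errors += 1
    return errors / selections if selections else 0.0


log = ["select:laugh", "undo", "select:laugh", "select:cut", "draw"]
print(estimate_error_rate(log))  # one of three selections was undone
```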

Passive Real-World Interface Props for Neurosurgical Visualization, Ken Hinckley, Randy Pausch, John C. Goble, Neal F. Kassell, CHI 1994: ACM Conference on Human Factors in Computing Systems, pp. 452-8

Dealing with 3D manipulation on a 2D display is indeed a frustrating task. Using 2D input devices like mice exacerbates the problem. 6DOF rotation+translation pucks exist (see Logitech's Magellan line), but manipulation still passes through a layer of indirection: movement of the control device has to be mapped into movement of the objects in virtual 3D space. The frame of reference here is the virtual world, so manipulation of the physical-world controller can be counterintuitive. Hinckley et al.'s approach is appealing in that the frame of reference is moved into the real world which the user already knows how to interact with. The computer merely listens/watches to replicate that interaction in its virtual representation.

The authors stress the need for a system that has a low re-learning time. This longitudinal aspect of system knowledge is seldom addressed (exception: the previous paper).

I felt that the problem of "clutch" mechanisms was not discussed in sufficient depth in the paper. A neurosurgeon can easily manipulate 3D views, but she cannot efficiently work with the data beyond this 3D exploration. She would have to switch back and forth between other input devices, e.g., mouse and keyboard, and the two-handed props. The described binary clutches make such a switch possible, but the effort to perform it is still substantial (find free space on the desk, put the objects down, make sure they stay in place, locate the other input devices, etc.). Moreover, it is impossible to seamlessly resume the 3D manipulation once the props have been set down. Efficient task/device switching may well be a general issue with bimanual interfaces - a given task may be better supported in a bimanual system, but changing from that particular task to other interaction with the computer may now be harder, since both of the user's most versatile output devices - her hands - are already occupied.

The Design of a GUI Paradigm based on Tablets, Two-hands, and Transparency, Gordon Kurtenbach, George Fitzmaurice, Thomas Baudel, and Bill Buxton, CHI 1997: ACM Conference on Human Factors in Computing Systems, pp. 35-42

The authors make a strong case for the usefulness of their two-handed manipulation technique by retrofitting a production-level application with their system. Successful WIMP applications have often undergone many years of incremental development and offer very comprehensive toolsets (e.g., Photoshop) unlikely to be matched by an experimental application. Research findings from an artificial "toy" system don't automatically scale to these more complex work environments. That the authors tested their GUI paradigms in both situations greatly increases my confidence in the reported results.

Notably, the permanent on-screen presence of the toolglass was eliminated in the StudioPaint implementation. In the physical world, artists do not keep all their tools piled on top of the central work area - it would create too much physical clutter. While I watched the video, the toolglass appeared to me as visual clutter during interactions other than tool picking. Instead of selectively hiding the toolglass, hiding it by default and displaying it only on demand, for example through a non-dominant-hand movement or a button click, would seem to better support the goal of maximizing screen real estate for the artwork itself.

Using color picking as the example to demonstrate click-through functionality is a bad choice in my opinion. When a color swatch is made semi-transparent, the perceived color is a mixture of the swatch itself and the underlying background color - while coarse hue distinctions are possible (green vs. red), saturation and value cannot be picked precisely through a transparent overlay.
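Under standard "over" alpha compositing, each channel of the perceived color is C = alpha * S + (1 - alpha) * B, where S is the swatch color, B the background color, and alpha the swatch opacity. Assuming the toolglass blends this way (an assumption, the paper's exact blending is not restated here), the sketch shows one half-transparent swatch with made-up values producing very different perceived colors over white and over black.

```python
def over(swatch, background, alpha):
    """'Over' compositing of one RGB pixel: each channel becomes
    alpha * swatch + (1 - alpha) * background (channels in 0..255)."""
    return tuple(round(alpha * s + (1 - alpha) * b) for s, b in zip(swatch, background))


swatch = (200, 40, 40)  # hypothetical reddish swatch
print(over(swatch, (255, 255, 255), 0.5))  # (228, 148, 148) over white
print(over(swatch, (0, 0, 0), 0.5))        # (100, 20, 20) over black
```

Both results come from the same swatch, so an artist judging saturation and value through the overlay is really judging a mixture that depends on whatever lies underneath.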

Posted by Bjoern Hartmann at 06:47 AM | Comments (0)

November 18, 2004

HCI Comments XIV

Haptic Techniques for Media Control, Scott S. Snibbe, Karon E. MacLean, Rob Shaw, Jayne Roderick, William L. Verplank, Mark Scheef, UIST 2001: ACM Symposium on User Interface Software and Technology, pp. 199-208

This paper has an incredible density of ideas. No fewer than six unique haptic devices and eleven different application scenarios are introduced. What has happened to these prototypes? An ACM search revealed just four subsequent citations. Further research on the prototypes would most likely be fruitful - the surface of possible applications has only been scratched here. Contacting either Scott Snibbe or Bill Verplank may be worthwhile.

I stumbled over the references to the demise of Interval Research and embarked on some research about the think tank. According to the Online Archive of California, most of Interval's documents are now held in Stanford's libraries: [http://www.oac.cdlib.org/findaid/ark:/13030/tf1s2001tx] It appears, though, that the records are not accessible to the Stanford community - permission of Interval's legal team is required. In 1999, Wired ran a long article about Interval [http://www.wired.com/wired/archive/7.12/interval.html], and shortly thereafter Salon.com published a post-mortem: [http://www.salon.com/tech/log/2000/04/22/interval/].


Embodied User Interfaces for Really Direct Manipulation, Kenneth P. Fishkin, Anuj Gujar, Beverly L. Harrison, Thomas P. Moran, Roy Want, Communications of the ACM, March 2003, pp. 75-80

In their introduction, the authors argue that as GUIs have become dominant, the physical form factor of the computer - the box - has become anonymous and invisible. But why has the box become invisible? Why have we focused much more on developing GUIs instead of paying attention to form factors? The principal reason is precisely that GUIs are intangible, and as such their creation, modification, and augmentation are not constrained by the laborious and expensive manufacturing processes that govern creation in the physical world. Standardized commodity hardware can also exploit economies of scale better than specialized devices.

"The devices are metaphorically related to similar noncomputational artifacts." Do we always need a noncomputational metaphor? It is surely helpful in explaining uses and modes of operation to non-experts, but it also constrains the user's imagination of what can be done with the device.

I fundamentally disagree with the authors' discussion of the Palm handheld at the top of page 76. I have been using various Palm PDAs for more than 5 years, and the Palm never *was* my calendar. Instead, for me it is a bidirectional window to look at and modify my calendar, which itself is an intangible collection of data shared between and accessible to a multitude of devices. When at home or in my office, I always create and check entries on my desktop or notebook computer - both data input and output bandwidths are much higher than on the PDA. I only use the PDA on the road, when other methods of access are unavailable. My cellphone has another copy of the schedule to remind me of upcoming meetings. So let's not confuse the data and the device. Also, while it is true that the Palm was originally conceived as an electronic form of a paper personal organizer, much of its value to users today comes from its functioning as a general-purpose platform with an open SDK. Many applications extending its functionality into completely different realms are available from 3rd parties. This suggests that it may be more promising to develop devices whose form factors are tied not to specific applications but to specific real-world constraints (size, weight, modes of input/output), and to let a large community of developers figure out what tasks such a device can support. Specialization can come later with add-ons to a general-purpose architecture.

Finally some notes on the presented techniques of page turning, scrolling, and tilting:

Page turning: a particularly bad example. At least half of the work here was done inside the GUI and not on the physical interface. I suspect that adding any two buttons on the right and left sides, inside or outside the touchscreen, would have led to a similar result.

Scrolling: years ago, Sony developed a better analogy to turning a Rolodex wheel with the scroll wheel built into its cell phones and PDAs.

Tilting: the authors sidestep the issue of individual differences - settings for "neutral tilt" were averaged across users. A better approach would have been to let each user set an individually comfortable value (calibration).
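Per-user calibration need not be elaborate. A minimal sketch, assuming a hypothetical tilt sensor that reports angles in degrees: the neutral angle is the mean of a few readings taken while the user holds the device comfortably at rest, and a small dead zone around it is ignored. The class name and the 3-degree dead zone are illustrative choices, not values from the paper.

```python
from statistics import mean


class TiltCalibration:
    """Hypothetical per-user calibration: the neutral angle is the mean of
    readings taken while this user holds the device comfortably at rest."""

    def __init__(self, rest_samples, dead_zone=3.0):
        self.neutral = mean(rest_samples)  # degrees
        self.dead_zone = dead_zone         # degrees ignored around neutral

    def relative_tilt(self, angle):
        """Tilt relative to this user's neutral angle; small deviations count as 0."""
        delta = angle - self.neutral
        return 0.0 if abs(delta) <= self.dead_zone else delta


cal = TiltCalibration([28, 30, 32])
print(cal.relative_tilt(31), cal.relative_tilt(40))
```

The averaged setting the authors used corresponds to sharing one "neutral" value across all users; calibrating per user only changes where that value comes from.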

Posted by Bjoern Hartmann at 06:46 AM | Comments (0)

November 17, 2004

HCI Comments XIII

Iterative Design of Seamless Collaboration Media, Hiroshi Ishii, Minoru Kobayashi, Kazuho Arita, Communications of the ACM, August 1994, pp. 83-9

This paper stands out in its presentation of a long-range view on the development of one particular application concept -- a distributed collaborative drawing space. The reader is taken through a history or archaeology of various prototypes, with discussions of how analysis of earlier stages informed subsequent implementations. Tradeoffs emerge as unavoidable choices central to any design: a feature that enables one function (e.g., tilting the display down to facilitate writing on the surface) will hinder another one that was well supported before (e.g., looking "at" your collaboration partner vs. looking down upon him on the display).
Towards the end of the article, the authors suggest a paradigm shift from HCI to HHI - computer-mediated human-to-human communication. In 2002, eight years after the publication of this paper, Andy van Dam raised the same point in his CRA "Grand Challenge" Statement (http://www.cra.org/Activities/grand.challenges/vandam.pdf). Apparently we have not made substantial progress in this area in the meantime.


Interacting with Paper on the DigitalDesk, Pierre Wellner, Communications of the ACM, July 1993, pp. 87-96

While the idea of turning the desktop metaphor on its head sounds appealing at first, I am not sure whether this particular implementation really enhances a paper-based workflow. Much attention is paid to getting paper-based information into the computer system. However, if the computational task generates a significant volume of output, it is unclear how this result should be integrated back into the real world. The user can write down a number or a word, but what about a page, or ten? Should the system be connected to a printer? What if such a printed document needs further revision? Should it be scanned and printed again? The proposed camera-based input solution can also be improved upon: most of the discussed implementation issues resulted from the choice of overhead projection, which is incidental to the operation of the system. The previous paper introduced a better back-projection method in ClearBoard that circumvented most of the described problems. A positive point is the mention of handedness as a factor that should be taken into account in any interface that directly engages the user's hands.
At the end of the article, Wellner poses the following fundamental question: "Do we think of ourselves as working primarily in the computer but with access to physical world functionality, or do we think of ourselves as working primarily in the physical world but with access to computer functionality?" I believe the question is misguided in the presented context. While paper is a physical object, its primary function is not as an object per se, but as a storage medium for intangible information (origami is the exception to the rule). Manipulating paper-based information is thus, in my mind, a bad example of "working in the physical world".


Multiple-Computer User Interfaces: "Beyond the Desktop" Direct Manipulation Environments, Jun Rekimoto, In Extended Abstracts of CHI 2000: ACM Conference on Human Factors in Computing Systems, pp. 6-7

This short article introduces (or at least describes) the concept of MCUIs - multiple-computer user interfaces. Interaction and synchronization techniques abound for sharing and exchanging data between different applications/processes/windows on one particular device - but most if not all of these techniques break down when users try to bridge the divide between different devices (of similar or different kinds). The problem is certainly important. Ken Hinckley is addressing the same issue in his TabletPC research. The term "direct manipulation" appears to be overloaded with different meanings by different subgroups of the HCI community (cf. Shneiderman, for whom the term is attached to GUI data mining).

Posted by Bjoern Hartmann at 09:14 AM | Comments (0)

November 12, 2004

Researching Interval Research

While reading Snibbe et al.'s article "Haptic Techniques for Media Control" (UIST 2001, to be reviewed here shortly), I stumbled over a few references to the demise of the writers' former employer, Interval Research. The company was a secretive think tank focused on exploratory research, located on Page Mill Road, right behind Stanford's campus. Paul Allen started the lab with $100 million in 1992, then pronounced it a failure and closed it down in 2000.

Wired wrote about a "shift in focus" from basic research to cable product development at Interval in 1999: http://www.wired.com/wired/archive/7.12/interval.html

According to the Online Archive of California, most of Interval's documents are now held in Stanford's libraries: http://www.oac.cdlib.org/findaid/ark:/13030/tf1s2001tx

Salon.com published a short post-mortem: http://www.salon.com/tech/log/2000/04/22/interval/

Gavin Miller was a researcher at Interval, as was Scott Snibbe.

Off on a tangent, Bill Gaver's work on auditory interfaces is worth looking into. Reference: 'Auditory Interfaces' (1997), in Handbook of Human-Computer Interaction (2nd ed.), eds. M. G. Helander, T. K. Landauer, and P. Prabhu, Elsevier. Link: http://www.rca.ac.uk/pages/research/dr_william_gaver_609.html

Posted by Bjoern Hartmann at 07:37 PM | Comments (0)

HCI Galore in today's NYT

Two long articles - one on usability issues in a computer system installed in police patrol cars, one on pen-based computing - were published in the New York Times. Links:

Trying to Make the Pen as Mighty as the Keyboard
By AARON RICADELA
http://nytimes.com/2004/11/11/technology/circuits/11next.html

Wanted by the Police: A Good Interface
By KATIE HAFNER
http://nytimes.com/2004/11/11/technology/circuits/11cops.html

Posted by Bjoern Hartmann at 04:44 AM | Comments (0)

November 08, 2004

HCI Comments XI & XII

Multimodal Interfaces, Sharon Oviatt, In The Human-Computer Interaction Handbook, Lawrence Erlbaum, 2002, 22 pp.

This article was a let-down for me. Given its publication in the Handbook of HCI, I expected a more balanced, more comprehensive review of multimodal interaction tools. For Oviatt, the term "interface" seems to stand exclusively for "input device". The option of multimodal output is not even mentioned until the closing statement. Furthermore, the article exhibits a strong slant towards architectures that incorporate speech recognition as one of their input channels. I am not sure whether this is because of a lack of research in other modalities or rather a result of the author's own bias. The discussion diverges at times into minute details of speech systems that felt misplaced in a survey article. One of the overarching metaphors in the article is that multimodal interfaces afford human-like sensory perception to computers. Human perception, though, is inextricably intertwined with our memory and attention systems. These two important building blocks were left out completely.
While the inserted summary tables help to give the reader an overview of frequently used terms in the article, I think at least one pair of definitions is questionable: the distinction between active and passive input modes. Human gesturing behavior is often quite deliberate and intentional and serves the explicit purpose of communicating certain aspects of the speaker's utterance - especially in deixis. As such it should be classified as "active" (cf. McNeill's "Hand and Mind", which Oviatt cites multiple times).
"High fidelity simulation testing" seems to be a fancy label for Wizard of Oz testing. An important limitation to the applicability of this methodology is the relationship between the response time the wizard needs to select feedback and the expected/acceptable latency of the application for the user/subject. Human reaction time is a good match for speech interfaces, but may not be for other modalities. On page 12, evidence is presented that multimodal input is complementary rather than redundant. This weakens the previously stated claim that disambiguation is easier in multimodal interfaces. Multiple channels do provide more information, but if this information is about different aspects of user intention, inference across channels is far from trivial.
The silver lining here is the frequent reference to work in cognitive science that can (and must) inform future development in multimodal interfaces.

Interaction Techniques for ambiguity resolution in recognition-based interfaces, Jennifer Mankoff, Scott E. Hudson, Gregory D. Abowd, UIST 2000: ACM Symposium on User Interface Software and Technology, pp. 11-20

The authors describe OOPS, a system that encapsulates mediation strategies for recognition-based input devices. More than the practical value of the tool itself, the contribution of the article is its framework of terminology for thinking about ambiguous and error-prone input. The discussion of particular function calls in the OOPS framework is a level-of-detail mismatch with the rest of the paper (too detailed). The proposed mediation strategies appear to be context-insensitive - they work only on a given atomic level (e.g., a word) without taking into account the larger structure in which the atom appears (e.g., a sentence). The presented solution for dealing with occlusion may introduce more problems than it solves - in a dense interface or document, moving elements is likely to cause new occlusions. If, on the other hand, the procedure is recursive, it could lead to a lengthy cascade of GUI reorganization that potentially changes the entire visual appearance of the interface for the duration of the dialog display.


Computer Vision for Interactive Computer Graphics, William T. Freeman, Yasunari Miyake, Ken-ichi Tanaka, David B. Anderson, Paul A. Beardsley, Chris N. Dodge, Michal Roth, Craig D. Weissman, William S. Yerazunis, Hiroshi Kage, Kazuo Kyuma, IEEE Computer Graphics and Applications, May 1998, pp. 42-53

The article presents simple, FAST computer vision algorithms for interactive UIs. Computer vision for HCI has different requirements from traditional application areas: results need to be available quickly, but the kind of information sought is often limited (e.g., no complete 3D reconstruction of a scene). Additionally, since a human is in the loop, feedback can be used to allow iteration/adaptation of software and user behavior.
The balance in presentation between mathematical methods and concrete application examples is quite effective as an "appetizer" - some links to textbooks for further exploration would have been useful. Here are two: an accessible introductory text on computer vision methods is "Machine Vision" by Jain, Kasturi, and Schunck (McGraw-Hill 1995). A more in-depth treatment of current research problems, especially in 3D reconstruction, can be found in "Computer Vision: A Modern Approach" by Forsyth and Ponce (Prentice Hall 2002). The latter text requires a well-equipped mental math toolbox.
CV-based surgery is used as an early motivating example, which was quite scary for me. I'd rather entrust my health to physical manipulation based interfaces such as those developed by Ken Salisbury.
Just an idea: one could use image pyramids to compute multi-resolution classifiers from coarse to fine. Whenever the real-time system requires a response, one can return the last completed resolution calculation as the current "best guess". Has anyone done this yet?
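The idea is essentially an "anytime" computation. A minimal sketch under stated assumptions: a grayscale image stored as a list of lists whose dimensions are divisible by 2^(levels - 1), a pyramid built by 2x2 averaging, and a hypothetical stand-in "classifier" that merely locates the brightest pixel. The coarsest level always completes; finer levels run only while the time budget lasts.

```python
import time


def downsample(image):
    """Halve resolution by averaging each 2x2 block (dimensions must be even)."""
    return [
        [(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


def build_pyramid(image, levels):
    """pyramid[0] is full resolution, pyramid[-1] the coarsest level."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def brightest_pixel(image):
    """Stand-in 'classifier' (hypothetical): position of the brightest pixel."""
    cells = ((r, c) for r in range(len(image)) for c in range(len(image[0])))
    return max(cells, key=lambda rc: image[rc[0]][rc[1]])


def anytime_locate(image, levels=3, budget_s=0.01, classify=brightest_pixel):
    """Classify from coarse to fine. The coarsest level always completes; after
    that, refinement stops once the time budget is used up, and the last
    completed result is returned in full-resolution coordinates."""
    deadline = time.perf_counter() + budget_s
    pyramid = build_pyramid(image, levels)
    best = None
    for level in range(levels - 1, -1, -1):
        r, c = classify(pyramid[level])
        scale = 2 ** level
        best = (r * scale, c * scale, level)  # (row, col, pyramid level used)
        if time.perf_counter() >= deadline:
            break
    return best


image = [[0.0] * 4 for _ in range(4)]
image[3][2] = 1.0
print(anytime_locate(image, budget_s=0.0))   # (0, 0, 2): coarse guess only
print(anytime_locate(image, budget_s=10.0))  # (3, 2, 0): full resolution
```

Two simplifications: the deadline is only checked between levels, so a level that has started always finishes, and each level re-runs the classifier over the whole image. A more efficient coarse-to-fine scheme would restrict the finer search to the neighborhood found at the coarser level.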


A Design Tool for Camera-based Interaction, Jerry Alan Fails and Dan R. Olsen, CHI 2003: ACM Conference on Human Factors in Computing Systems, pp. 449-56

Crayon is a computer vision tool for building color-based classifiers for object tracking applications. It demonstrates the kind of productivity gains that are possible when HCI principles of iterative design and rapid prototyping are injected into a previously purely technical domain.
I do not buy the authors' argument that most ML algorithms are completely impractical for real-time interaction. Their conclusion rests solely on their particular choice of performing per-pixel classification with a large feature vector. Building more knowledge about potentially useful features into the system a priori -- instead of learning appropriate filters/kernels on the fly for every image -- could dramatically change the performance of other methods. Crayon uses R, G, B, H, S, V values per pixel as the fundamental features - note that these are six features for only three independent dimensions. When expanded over the image regions, a LOT of redundant information is stored in each classifier.
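The redundancy is easy to demonstrate with Python's standard colorsys module: H, S, and V are computed from R, G, and B alone, and the conversion can be inverted. The feature tuple below merely imitates the six per-pixel values described for Crayon; the function name is hypothetical and the sample pixel is made up.

```python
import colorsys


def crayon_style_features(r, g, b):
    """Six per-pixel features (R, G, B, H, S, V) in the style described for
    Crayon; channels are floats in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (r, g, b, h, s, v)


features = crayon_style_features(0.8, 0.4, 0.2)
# H, S and V are derived from R, G and B alone, and the mapping can be
# inverted, so the last three features carry no information the first three lack.
recovered = colorsys.hsv_to_rgb(*features[3:])
print(features)
print(recovered)  # approximately (0.8, 0.4, 0.2)
```

Redundant information is not quite the same as useless features for learning, though: a decision tree splits on one feature at a time, so a hue threshold can separate classes that no single RGB threshold does. The point above concerns the size of the stored feature vectors.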
Decision trees frequently suffer from overfitting - which becomes an issue if training data sets don't accurately reflect testing situations. It seems that the Crayon approach would be most accurate if the end-user builds her own classifier in her actual application setting, instead of the UI designer building a classifier in a potentially very different lab environment. I would have liked to see a running example of how the classification step fits into a complete vision-based application.
The general painting metaphor seems to be quite similar to Adobe Photoshop's "Extract" function. Maybe a rough boundary-painting approach followed by region filling would be more successful than only looking at the pixels underneath the user's crayon trace.
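A sketch of the region-filling step, assuming the user's rough outline has already been rasterized into a grid of booleans. A 4-connected flood fill from a seed point inside the outline marks the enclosed cells, whose pixels could then serve as training examples for the object class. This is a hypothetical illustration of the suggestion, not a description of how Photoshop's Extract works.

```python
from collections import deque


def fill_region(boundary, seed):
    """4-connected flood fill: mark every cell reachable from `seed` without
    crossing a cell the user painted as boundary (True in `boundary`)."""
    rows, cols = len(boundary), len(boundary[0])
    inside = [[False] * cols for _ in range(rows)]
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # walked off the image
        if boundary[r][c] or inside[r][c]:
            continue  # hit the painted outline or already visited
        inside[r][c] = True
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return inside


# A rough 5x5 outline with a 3x3 hole in the middle.
outline = [[r in (0, 4) or c in (0, 4) for c in range(5)] for r in range(5)]
region = fill_region(outline, (2, 2))
print(sum(row.count(True) for row in region))  # 9 interior cells
```

A gap in the painted outline lets the fill leak into the background, so a rough-boundary interface would need some way to close small gaps or let the user patch them.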

Posted by Bjoern Hartmann at 01:19 AM | Comments (0)

November 01, 2004

HCI Comments X

SpeechActs: A Spoken-Language Framework, Paul Martin, Frederick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich, IEEE Computer, July 1996, pp. 33-40

Note: I am not very excited about speech technology in general, which somewhat tinges my answers to this reading. I am not convinced that the translation problem from natural language (speech) to some form of machine interpretable format is in fact one of the major challenges in HCI. Speech is an effective means for communication between a small number of co-located humans in the absence of other communication channels. That does not make it an automatic good candidate for HCI (cf. my stance on imitating human-to-human interaction methods in HCI in general from the last reading response).

SpeechActs' chosen scenario is information delivery for business travelers. The presented example fails to convince me of the usefulness of such a system. Granted, the article was written in 1996 - but where are business travelers today faced with a situation that allows for phone calls but not internet access? In hotel rooms, phones have data ports. In residences, you can unplug a phone and plug in your modem cord. Business-oriented cell phones can be used as modems through their infrared or Bluetooth ports. That leaves pay phones - which are getting harder to find by the day. In addition, some of the tasks performed by SpeechActs do not lend themselves to auditory presentation. The user has to keep her mental model of the SpeechActs application in mind so she knows what she _can_ ask, and at the same time keep track of all the information presented so she knows what she _wants_ to ask. Imagine how long synchronizing schedules will take if there are more than two meeting participants and each participant already has a packed schedule.

The authors' primary goal was to build a speech application toolkit for software developers who do not have expertise in speech or natural language. Constructing the unified grammar seems to be quite a daunting task for such linguistically "naive" developers. On the positive side, the authors were careful to construct a future-proof software system by stressing independence from particular recognizer/TTS implementations and by supporting multiple applications to service voice requests. They also acknowledge that some of the challenges for speech systems are related not to technical implementation but rather to human expectations. Prior work was not surveyed in enough detail to judge the specific contributions that SpeechActs made.



The Audio Notebook, Lisa Stifelman, Barry Arons, Chris Schmandt, CHI 2001: ACM Conference on Human Factors in Computing Systems, pp. 182-9

Here a more promising application area for voice technology is demonstrated: many situations exist where capturing an original audio stream is quite important because it comes from an authoritative source and will only be produced by that source once. Reviewing that original recording in random-access fashion is complicated and frustrating with the technology the target audience (students and reporters) currently uses: tape recorders. The abstract problem the paper addresses is automatic semantic segmentation of time-based media.
The authors provide an intriguing solution by augmenting a familiar interface that most members of the target audience already use - the paper notepad. A range of uses is supported to allow different interaction styles with the audio notebook: users can continue previous note taking activity without having to adjust at all - or change what and how information is written down to further simplify review later on.

The designers chose to employ the audio notebook both as the input device during note taking and as the output device during review. The requirements of these two processes can be quite different, so we should not assume that a single interface will present an optimal solution. My personal preference would be for a central storage server that unites information from multiple input devices. This way one could recall the recording and hand-written notes, but also additional documents like lecture slides and PDF articles, from one device connected to the central information server. The audio scrollbar is a low-bandwidth interface - low in information content and resolution. Also, phrase-snapping and segmentation make the audio scrollbar display non-linear, which complicates user predictions of how far in time the audio will jump when selecting a different LED. A graphical representation of the audio stream with additional segmentation mechanisms would provide for richer interaction. Allowing other applications to access the recorded voice data on a central storage server would also enable post-processing to improve the fidelity of the audio signal.
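The non-linearity can be illustrated with a small model. Assume, hypothetically, that each LED first maps linearly to a time position and that the position is then snapped back to the start of the enclosing phrase; the device's actual snapping rule may differ, and the phrase boundaries below are made up.

```python
from bisect import bisect_right


def led_to_time(led, num_leds, duration, phrase_starts):
    """Map an LED on a hypothetical audio scrollbar to a playback time (seconds):
    first linearly, then snapped back to the start of the enclosing phrase.
    `phrase_starts` must be sorted and begin with 0."""
    linear = duration * led / num_leds
    i = bisect_right(phrase_starts, linear) - 1
    return phrase_starts[max(i, 0)]


starts = [0, 8, 35, 41, 90]  # made-up phrase boundaries in a 100 s recording
print([led_to_time(led, 10, 100, starts) for led in range(10)])
# [0, 8, 8, 8, 35, 41, 41, 41, 41, 90]
```

Equal steps along the LEDs produce jumps of 8, 0, 0, 27, 6, 0, 0, 0, and 49 seconds, which is exactly why a user cannot predict from LED distance alone how far playback will move.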

The authors' solution to the problem of incorrect segmentation by the topic suggestion algorithm is not very satisfying. If the audio will be reviewed multiple times, some direct user intervention to correct segmentation can be valuable. This would once again be a relatively easy task if other software could access the audio recordings. For music, tools like Steinberg's Recycle or the FruityLoops BeatSlicer perform semi-automatic, user-correctable segmentation.

The long duration of the field test resulted in very rich usage data. A discussion of shortcomings and directions for future work was lacking.

(It would be interesting to port this work to a Tablet PC - here we already have all the required hardware, save for a decent microphone, and additional processing capabilities. Has anyone done this?)



A Confederation of Tools for Capturing and Accessing Collaborative Activity, Scott Minneman, Steve Harrison, Bill Janssen, Gordon Kurtenbach, Thomas Moran, Ian Smith, Bill van Melle, MM 1995: ACM International Conference on Multimedia, pp. 523-34

Coral, a suite of tools to deal with time-based media, has three foci: capturing interaction unobtrusively, indexing the recordings, and accessing the recordings. A particular, narrow application domain - supporting casual group interaction - is picked to ground the research in real-world requirements. The authors present a useful taxonomy of indices (segmentation marks): intentional annotations, side-effect indices, derived indices, and post hoc indices. Furthermore, capture and access situations are distinguished clearly, with completely different hardware supporting each stage (in contrast to the audio notebook). The loose collection of tools comprising Coral is based on shared communication protocols and interfaces, which makes it easily extensible. Building access tools is mentioned as the area where most work remains to be done. Bits and pieces of UIs that deal with time-based media exist in hardware and software used in audio and video editing (time lines, jog shuttles, interval-based selection, multi-tracking). Uniting them in a common time-based framework would be worthwhile.
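The taxonomy of indices suggests a very small common data model: whatever its kind, every index is a timestamped marker into a recording, so streams produced by separate tools can be merged into one timeline. The sketch is hypothetical (it does not reproduce Coral's actual protocols), and the example labels and times are made up.

```python
import heapq
from dataclasses import dataclass, field
from enum import Enum


class IndexKind(Enum):
    INTENTIONAL = "intentional annotation"
    SIDE_EFFECT = "side-effect index"
    DERIVED = "derived index"
    POST_HOC = "post hoc index"


@dataclass(frozen=True, order=True)
class Index:
    time_s: float                           # position in the recording; the only sort key
    kind: IndexKind = field(compare=False)
    label: str = field(compare=False)


def merge_timeline(*streams):
    """Merge index streams, each already sorted by time, into one timeline."""
    return list(heapq.merge(*streams))


notes = [Index(12.0, IndexKind.INTENTIONAL, "action item"),
         Index(300.5, IndexKind.INTENTIONAL, "decision")]
slides = [Index(0.0, IndexKind.SIDE_EFFECT, "slide 1"),
          Index(95.0, IndexKind.SIDE_EFFECT, "slide 2")]
speakers = [Index(40.2, IndexKind.DERIVED, "speaker change")]

for idx in merge_timeline(notes, slides, speakers):
    print(f"{idx.time_s:7.1f}s  {idx.kind.value:24}  {idx.label}")
```

Post hoc indices added during review would simply be another sorted stream; an access tool could then filter the merged timeline by kind.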

Posted by Bjoern Hartmann at 04:03 AM

HCI Comments IX

Direct Manipulation vs. Interface Agents, Ben Shneiderman and Pattie Maes, ACM Interactions, December 1997, pp. 42-61

The article presents excerpts from two separate debates between Shneiderman and Maes at the IUI and CHI conferences in 1997. Prepared statements and responses to specific attendee questions alternate. The chosen format is both misleading and poorly suited to print. First, the exchange as presented in the article never actually happened in this form. The reader cannot assume a continuous narrative, since responses came from two different events and may be interspersed and out of order. Second, transcriptions of verbal exchanges are structurally very different from written arguments. Spoken language relies on frequent redundancy to help listeners remember significant points, whereas in written communication the reader can simply skip back a paragraph or two if she does not remember something. In print, this redundancy becomes tedious: some basic positions of the two debaters are repeated again and again.

Shneiderman argues that we should build comprehensible, predictable user interfaces that afford direct manipulation of all relevant parameters. This combination, he says, will lead users to accept responsibility for the actions they take using the system. Maes argues that computing systems have become so complex that users do not want to be concerned with all the details, and that intelligent software can act as the user's representative and proactively perform tasks without direct UI interaction. The two positions are not completely at odds: both Shneiderman and Maes agree that direct, predictable interfaces are important and that hiding complexity under the hood can also be beneficial.

Shneiderman does not think that "human-to-human interaction is a good model for the design of user interfaces." (56) My own research background is in simulating embodied virtual agents. In that field, the richness of human-to-human communication is often given as motivation for building more realistic anthropomorphic agents. However, a humanlike appearance comes with a whole list of expectations about the agent's reasoning and behavior, and I am not sure we are making substantial progress in these areas. Interacting with an agent is still significantly worse than having a teleconference with a real human, and we don't even like to engage in the latter form of interaction (cf. Hollan and Stornetta, "Beyond Being There").

Maes states on pg. 50 that "More and more the World Wide Web and our browser is becoming the one and only interface." I disagree; otherwise we could reduce the work of HCI researchers to tweaking JavaScript.



Models of Attention in Computing and Communications: From Principles to Applications, Eric Horvitz, Carl Kadie, Tim Paek, David Hovel, Communications of the ACM, March 2003, pp. 52-8

Horvitz et al. outline their work in the sensing and processing of user attention at Microsoft Research. They identify the important role attention plays both in human cognition and in social patterns of communication. In their research system, data from multiple input channels such as accelerometers, touch sensors, and gaze trackers are fed into Bayesian belief networks and HMMs to reason about the locus and object of user attention over time. Given such an internal representation of user attention, their "Notification Platform" makes choices about when and how to display different system messages to minimize disruption.
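To make the "when to display" decision concrete, here is a toy sketch of a decision-theoretic notification gate. It is not the authors' system: they use Bayesian belief networks and HMMs, whereas this sketch swaps in a naive Bayes posterior over two made-up attention states (`focused`, `idle`). The sensor names, likelihoods, priors and interruption costs are all invented for illustration. A message is delivered immediately only if its assumed value exceeds the expected cost of interrupting the user.

```python
# Toy decision-theoretic notification gate (all names and numbers are hypothetical).

# Hypothetical cost (arbitrary units) of interrupting the user in each attention state.
INTERRUPTION_COST = {"focused": 10.0, "idle": 1.0}

# Hypothetical likelihoods P(sensor fires | state) for binary sensors.
LIKELIHOODS = {
    "typing": {"focused": 0.8, "idle": 0.1},
    "gaze_on_screen": {"focused": 0.9, "idle": 0.4},
}

# Hypothetical prior over attention states.
PRIOR = {"focused": 0.5, "idle": 0.5}


def posterior(evidence):
    """Naive Bayes posterior over attention states.

    evidence maps sensor name -> bool (whether the sensor fired).
    """
    scores = {}
    for state, p in PRIOR.items():
        for sensor, fired in evidence.items():
            p_fire = LIKELIHOODS[sensor][state]
            p *= p_fire if fired else (1.0 - p_fire)
        scores[state] = p
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}


def should_notify_now(message_value, evidence):
    """Deliver now iff the message's value exceeds the expected interruption cost."""
    post = posterior(evidence)
    expected_cost = sum(post[s] * INTERRUPTION_COST[s] for s in post)
    return message_value > expected_cost


if __name__ == "__main__":
    busy = {"typing": True, "gaze_on_screen": True}
    away = {"typing": False, "gaze_on_screen": False}
    print(should_notify_now(5.0, busy), should_notify_now(5.0, away))
```

With these made-up numbers, a message worth 5 units is held back while the user is typing and looking at the screen (expected cost about 9.5) but delivered when neither sensor fires (expected cost about 1.3). A real system would also reason about deferring a message and about how to display it, not just whether to show it now.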

Providing feedback, i.e. showing the outcome of the system's attention calculation to the user, seems important to make the system's decisions transparent and understandable. Two example visualizations are given (a "lens" that changes color and intensity, and an animated anthropomorphic agent), but a more systematic approach seems necessary.

On the last page, the authors claim that "robust solutions to the speech-target problem promise to significantly influence the overall sociology of human-computer interaction [...]." I disagree: the problems that prevent large-scale adoption of speech interaction are not purely technical in nature. As Shneiderman put it in the previous reading, "natural language interaction [...] has not been a success." Speech in general is not an efficient input/output channel: it requires a lot of cognitive processing and also faces cultural resistance (people feel uneasy being seen talking to their computer).

Posted by Bjoern Hartmann at 12:54 AM