Fred Brooks is well known among computer scientists for (at least) two reasons: his work at IBM managing the development of the System/360 hardware and software, and his book describing that experience, The Mythical Man-Month. (That book introduced Brooks's law: "Adding manpower to a late software project makes it later.") Brooks is a professor at UNC Chapel Hill where his research is focused on human-computer interaction and scientific visualization, especially in augmented and virtual environments.
Brooks has credibility with computer scientists, and this will likely be of importance for the impact of "The Design of Design," a collection of essays about design process in computer science (his work) and architecture (of buildings - his hobby). For readers familiar with the design research literature, the book may hold relatively few new insights and will re-present many common themes. But its target reader is not the design specialist; it's the software engineer and software team manager. With this reader in mind, the book succeeds as a cogent argument for the importance of paying attention to design process and getting that process right. Having this argument presented by a respected engineer, manager, and researcher in his no-nonsense style, and illustrated with numerous examples of important, complex projects, may carry the necessary weight for those skeptical of the fuzzy nature of design. This is a great book to hand to incoming CS graduate students and engineering colleagues.
To sum up the main arguments of the book: First, Brooks makes the case for iterative design. The "rational model" of design as a sequence of well-delineated stages (as found in Royce's Waterfall model and Simon's science of design) comes naturally to engineers; it mischaracterizes what actually happens in practice; it is thus harmful to real-world projects. Our ideas are incomplete and inconsistent before we attempt to realize them - thus requirements cannot be captured in the abstract. The role of design is to uncover requirements and to generate alternative strategies to meet those requirements. The project definition necessarily changes as we work on it; the process model must take this into account. Brooks points to Boehm's spiral and other models of iterative development with regular end-user involvement.
Second, the overall goal of design is to achieve conceptual unity across all aspects of a project. This is hard; design by committee never achieves it. Open Source design can achieve conceptual integrity - but it tends to do so only in domains where the builders are also the users (e.g., Linux). Brooks is unsure if Open Source works well as a methodology when designing for others, because of the incentive structures of the Open Source community (he strongly recommends reading Raymond though).
How can one achieve conceptual integrity in design teams, while leveraging the benefits of collaboration? Conceptual integrity requires having only one, at most two, main system architects and one, at most two, primary UI designers. However, more bodies are advantageous for uncovering needs and for design reviews. (For brainstorming, "more minds mean more ideas. [...] the ideas are not necessarily better.") CSCW research is misguided if it envisions equal-participation collaboration as it ignores synchronization costs. While many CSCW design tools have been proposed, few have succeeded in practice (with revision control and "track changes" the exceptions); thus, Brooks cautions against assigning PhD dissertation topics in collaborative design tools. I agree that naive approaches to collaboration often fail; but I disagree that careful, grounded research by graduate students cannot rise to the challenge.
Chapters in the section on "design perspectives" examine rationalism versus empiricism as fundamental stances toward design; the role of limited resources and constraints; "style" in engineering design, and the role of exemplars. Brooks points out that the prevalence of the rational view (careful planning and reasoning alone can yield the right design) is unique to computer science; all design disciplines that deal with the vagaries of the physical world necessarily rely on testing and iteration because of the limitations of their formal methods. However, taking an empirical stance to design does not relieve the designer of the responsibility of careful thought and planning. Constraints are welcome, because it is easier for a designer to exercise restraint when designing for a narrow purpose than a general one.
What defines style in design? Parsimony (economy of expression) is often an ingredient, but it alone is insufficient. For example, a minimal instruction set might be Turing-complete, but it does not support any one concrete task well. Beyond concision, then, "structural clarity" is required: it "demands that the basic structural concept of the design be plainly evident and, if not logically straightforward, easily explained." Style in engineering design is "a set of different repeated microdecisions, each made the same way whenever it arises, even though the context may be different." Style is hard (and voluminous) to explicitly specify; this is one of the reasons that coherent style in group design is rarely achieved. While documentation of style is not trivial, Brooks argues strongly for doing so. This raises the question: how might documentation of a design style be facilitated?
The chapter on the role of exemplars in design was of special interest to me as example-centric development has been a research focus of mine (see d.mix, HelpMeOut) and of colleagues. Referring to exemplars in design has important benefits (they "provide safe models for new designs, implicit checklists of design tasks, warnings of potential mistakes, and launching pads for radical new designs.") The use of exemplars in software design lags behind other disciplines, which Brooks laments. Novices and experts may draw upon different sets of exemplars: novices tend to use ones encountered in their immediate experience, while experts refer to a much larger set of historical precedents. Our scholarly literature often does a poor job at explaining the rationale for design decisions in software systems - thus papers are of limited use as design exemplars. However, conveying rationale is what really matters. (For research, this suggests that merely showing related exemplars without proper explanation of why they are shown may not be sufficient.) The systematic collection and cataloging of exemplars should be encouraged (but doesn't count as research in technical disciplines). Beyond collection, exemplars should be critiqued and compared to others. But does reliance on examples lead to laziness and restrict originality? Brooks answers that one should not design by merely copying and adapting exemplars, but by deeply understanding their rationale and transferring the approach, rather than the surface structure. He concedes that "the world is full of lazy Bauhaus architecture and mediocre ranch-type homes" but argues that the fallacy of their architects was a too near-sighted use of exemplars.
The later sections of the book were less central to Brooks' argument, and to my interests. A historical note describes how the profession of design is characterized by its divorce from making, as well as from using, the designed artifact. This means that designers have to work hard to understand the perspective of both the implementer and the final users, as neither comes naturally. A chapter on the attempted capture of design rationale during the process of design (of a house) concludes that such a project is rather complicated, and that current tools are not much help. The book concludes with a set of case studies which answer his own call to capture and describe design rationale through collections of exemplars. However, they are also very specific to the design of houses, organizations, books. To me then their value is in demonstrating the possible format of rationale descriptions.
One surprising aspect of the book is the repeated use of Biblical references in the prose. These tend to detract from Brooks' otherwise sound arguments for those of us of alternative persuasions.
My dissertation focused on prototyping tools for interaction designers so I have been keeping an eye out for relevant design process books. Two such books have recently been published: Todd Zaki Warfel's "Prototyping - A Practitioner's Guide" makes the case for prototyping in UI design and contains tutorials on constructing prototypes in various software tools. Fred Brooks' "The Design of Design" is a collection of musings on the design process, drawing examples from computer systems (Brooks' work) and architecture (of his own house). These are some first impressions after (partially) reading Warfel's book on my daily BART commute. Notes on Brooks will follow later.
Warfel addresses his book at fellow interaction designers that want to know how, when, and why to prototype during the design and development of user interfaces. The first part of the book surveys the conceptual landscape; the second part describes six small prototyping projects, each conducted in a different tool: on paper, and in PowerPoint, Visio, Fireworks, Axure RP Pro, and HTML+Javascript. This structure would have made the book a good candidate text for the UI Prototyping Design Clinics I co-taught this Spring at Berkeley; I will suggest the book to students in future semesters.
There are many things to like beyond the nuts and bolts description of how to use various tools: Warfel systematically describes which tool fits which purpose; he shows survey data on which tools designers use today; and he adds multiple case studies that give concrete examples of how prototypes were constructed and what functions they served.
At times though, the tone was too conversational, obscuring rather than highlighting insights. More importantly, the conceptual argument about the value of prototypes seems to mostly derive from the author's intuition and experience. Often, his arguments ring true. However, much has already been said and written in the HCI, computer science and design communities about different kinds of prototypes and the roles they serve in the design process. A discussion that takes this prior work into account or at least points to it for further reading would have been much appreciated (I have a partial list). Finally, many of the surveyed tools focus on the same small area of the design space of prototyping tools: creating static screens for desktop or browser-based UIs and hyperlinking them in some fashion. Continuous interactions, gesture/multitouch input, and other non-desktop UIs are only mentioned in passing. But that bias might be an artifact of today's tools - Brad Myers' survey of interaction designers from VL/HCC08 points out that designers need better tools to prototype rich interactive behaviors. Some tools to do so are already available: Flash Catalyst was recently released; and research projects such as K-Sketch demonstrate that we researchers can contribute better tools to rapidly create more dynamic prototypes as well.
I will present two new papers at CHI2010 in April - final preprints and some videos are available now, below.
Hartmann, Björn, MacDougall, D., Brandt, J., and Klemmer, S.R. What Would Other Programmers Do? Suggesting Solutions to Error Messages. Proceedings of CHI 2010: ACM Conference on Human Factors in Computing Systems. Atlanta, GA, 2010.
Hartmann, Björn, Follmer, S., Ricciardi, A., Cardenas, T., and Klemmer, S.R. d.note: Revising User Interfaces Through Change Tracking, Annotations, and Alternatives. Proceedings of CHI 2010: ACM Conference on Human Factors in Computing Systems. Atlanta, GA, 2010.
Reading Ch12 of Hofstadter's Metamagical Themas which Scott dropped off in my office this morning.
Hofstadter claims that "Making variations on a theme is really the crux of creativity." (235)
Think of a concept as a machine with knobs on it. Use the knobs to interpolate and extrapolate variations from the original concept. Example 1: Rubik's cube might have a knob of dimensionality that happens to be set at 3 in the original. Example 2: John Gould dreamed of turning the listener of music into a conductor through a parametric multitrack playback interface.
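To make the knob metaphor concrete for myself, here is a minimal Python sketch. The `CubePuzzle` class, its two knobs, and the `turn_knob` helper are my own hypothetical illustration, not anything taken from Hofstadter's text. It only covers the easy case where the set of knobs is already known and fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CubePuzzle:
    """A hypothetical 'concept' described entirely by its knob settings."""
    dimensions: int = 3      # the original Rubik's cube is 3-dimensional
    cells_per_edge: int = 3  # and has 3 cells along each edge

    def cell_count(self) -> int:
        # A d-dimensional block with n cells per edge has n**d cells.
        return self.cells_per_edge ** self.dimensions


def turn_knob(concept: CubePuzzle, **settings: int) -> CubePuzzle:
    """Return a variation of `concept` with some knobs set to new values."""
    return replace(concept, **settings)


original = CubePuzzle()
pocket = turn_knob(original, cells_per_edge=2)  # a smaller variation
hypercube = turn_knob(original, dimensions=4)   # extrapolating past the original

for variant in (original, pocket, hypercube):
    print(variant, variant.cell_count())
```

The original stays untouched while each variation is a new point in the same small parameter space. What the sketch cannot do is invent a knob that was not declared up front.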
But where does the set of knobs come from? What can get varied? Is there a fixed, even finite set? Hofstadter argues that extrapolation needs creative analogy and that changing context and perspective produces new sets of knobs. This shifting of perspective, "nondeliberate yet nonaccidental slippage," is at the core of creative thought. Hofstadter also labels this activity as producing "subjunctive variations on a theme" - I believe this is where Aran Lunzer took his notion of "subjunctive interfaces."
The key quote on reframing: "Context contributes an unexpected quality to the knobs that are perceived on a given concept. The knobs are not displayed in a nice, neat little control panel, forevermore unchangeable. Instead, changing the context is like taking a tour around the concept, and as you get to see it from various angles, more and more of its knobs are revealed." (239) Or: "[Good knobs come from] seeing one thing as something else." (251)
Example 3: Don Knuth's Metafont, which allows typeface designers to create parametric letter definitions. This parameterization is hierarchical: there are typeface-level controls as well as letter-level controls. But H~ argues (convincingly) that all these systems only ever explore sub-spaces. One reason: different styles of letters we all agree on as instances of the letter "A" have very different underlying structure - varying continuous dimensions will never result in such fundamentally different approaches.
"One of Knuth's main theses is that with computers, we now are in the position of being able to describe not just a thing in itself, but how that thing would vary." (240) An open question though is how accessible this parameterization process is for creators/designers: describing a parametric space of possible designs is a very different activity from producing a point solution within that space. An avenue for future research is pointed out: given a set of examples, automatically derive the structure of the design space within which they are embedded. That sounds hard, but interesting. "If we wish to enlist computers as our partners in this venture of inventing variations on a theme [...] we have to give them the ability to spot knobs themselves, not just to accept knobs that we humans have spotted." In his words, computers should help us explore the "implicosphere" (implicit counterfactual sphere or sphere of implications) of a concept.
H~ mentions "One Book Five Ways" - a book containing, side-by-side, five different edited versions of a single manuscript. This reminded me of Raymond Queneau's "Exercices de style" which is made up of 99 variations of a common story.
Are variations fundamentally different from new themes? No, but they tweak less obvious, hidden knobs.
P.S.: Hofstadter needs an editor.
During some down time in New York last week I finally got to read Clay Shirky's "Here Comes Everybody." Shirky's strong suit is balancing concrete stories with the principles behind those stories, distilled in concise nuggets of insight. A bit of social science and economic theory gets added to the mix, resulting in an engaging survey of the promises and limits of collaborative sharing and production with online social tools.
I found the early chapters (2-4) and the conclusion most valuable, with the highest aha-moments-per-page ratio. Here is my summary of the major points.
The second chapter, "Sharing Anchors Community," describes how sharing by individuals, aggregated through social tools, opens up new areas of value that are not served by traditional institutions. Coase, in "The Nature of the Firm," (1937) shows how hierarchical organizations can be more desirable than open labor markets. In an open market of individuals, transaction costs (negotiated agreements) rise sharply with the number of parties involved (squared? Flipside of Metcalfe's law?). Institutions use central control to lower the number of transactions. In return, they introduce managerial overhead. This overhead cost, required to simply maintain the institution itself, limits what kind of activities institutions can and will engage in. Activities below this "Coasean floor" (45) are valuable to someone, but their value is less than the cost of doing business for an institution. These activities are now viable because social tools reduce the cost of coordinating group action.
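On my "squared?" aside: one way to check the intuition is to assume that every pair of parties has to negotiate its own agreement. That assumption is mine, not something I can attribute to Shirky or Coase. Under it, the number of agreements among n parties is n(n-1)/2, which grows roughly with the square of n. The function name below is made up for this sketch.

```python
def pairwise_agreements(parties: int) -> int:
    """Number of distinct pairs among `parties` people: n * (n - 1) / 2."""
    if parties < 0:
        raise ValueError("parties must be non-negative")
    return parties * (parties - 1) // 2


for n in (2, 5, 10, 100):
    print(n, pairwise_agreements(n))
```

Going from 10 to 100 parties multiplies the head count by ten but the number of pairwise agreements by 110 (45 versus 4950). These are the same pairwise links that Metcalfe's law counts as value, here counted as cost.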
Chapter three tackles the complicated relationship between professionals and amateurs. A profession is a community that has its own world view and values: "a professional pays as much or more attention to the judgment of her peers as to the judgment of her customers". That focus on group-maintenance makes professions prone to missing changes in the core assumptions underlying the formation of their field. The prime example of course is publishing - where previously the hard part was to deliver content to the consumer, production and distribution are essentially free now. As amateurs, with the help of online platforms, can deliver the same value as professionals (see iStockPhoto), boundaries are blurred. But "mass professionalization is an oxymoron" (66) - professions are built on the concept of exclusive group membership and on shared beliefs and practices within the group. So blogging is not a new form of publishing - it's an alternative to publishing.
Chapter four: In ye olden days, the structure of communication -- broadcasting (one-way, public) vs conversational (bi-directional, private) -- was bound up in the communicative medium (TV, phone). We can now mix and match attributes at will. As a result, many public pieces of information are in fact not "content" meant for public consumption - they are part of conversations. "Bloggers with a dozen readers don't have a small audience: they don't have an audience at all, they just have friends". But not all blog posts are about conversations: if you become famous, human cognitive capabilities limit you to one-way broadcasting again - there's only so much information one can digest or respond to.
Relevance is always relative to the concerns of a particular community (Wenger's communities of practice are mentioned). Community members thus filter collaboratively and AFTER information has been put online, an inversion of institutional practice.
The main argument of the conclusion is that any successful social tool needs three ingredients to work - promise, tool, and bargain. Failure in any one spells failure for the entire project. For Shirky, it starts with a promise of benefit to the user: what's the value of contributing to a new service? The promise has to be inspirational, concrete and achievable. The key is to convince users that others will see value in the tool as well. Tools have to be tailored to their job. The two most important questions are: Will the group be large or small? And is it short-lived or long-lived? Small groups tend to lead to convergent thinking, large groups tend to have divergent beliefs. The "bargain" is not about a good price you're getting, but rather the negotiated set of community rules that a user of a social tool agrees to. Here Shirky mentions Alan Fiske's basic modes of social collaboration, specifically equal participation, as an example of a bargain struck by a Flickr photo community. Finally, Shirky argues that all social tools have social dilemmas that come with them and some form of governance is required. Users take their bargains very seriously. When tool providers change the terms of the bargain unilaterally, backlash often ensues.
Some Choice Quotes:
"There is no such thing as a generically good tool; there are only tools good for particular purposes." (265)
"The most profound effects of social tools lag their invention by years, because it isn't until they have a critical mass of adopters, adopters who take these tools for granted, that their real effects begin to appear." (270)
"The spread of cheap and widely available creative tools is sad for people in the advertising business in the same way that moveable type was sad for scribes - the loss from this change is real but limited and is accompanied by a generally beneficial social change." (209)
Because of my qualifying exams, I didn't get to read much during the first half of the year. Here are the books that left an impression on me from the second half:
English: for some reason or other, these are mostly older books that I discovered in bookstores around the country or through recommendations from friends.
* D. Price - "How to make a journal of your life" (10 Speed Press)
* Gay and Laney Salisbury - "The Cruelest Miles" (Norton)
* Geoff Dyer - "Yoga for People Who Can't Be Bothered To Do It" (Vintage)
* David Brooks - "Bobos in Paradise" (Simon and Schuster) (recommended by Jaime - funny)
* Bill Buxton - "Sketching User Experiences" (Morgan Kaufmann)
* Bill Moggridge - "Designing Interactions" (MIT Press)
German:
* Heinrich Steinfest - "Die Feine Nase der Lilli Steinbeck" (Piper)
* Martin Suter - "Der Teufel von Mailand"
* Jakob Hein - "Herr Jensen steigt aus" (Piper)
* Christoph Hein - "In seiner frühen Kindheit ein Garten" (Suhrkamp)
You can also take a look at the full list.

I recently interviewed Don Norman (author of Emotional Design, Design of Everyday Things, ...) for Ambidextrous, Wendy Ju's design magazine. Here is a pdf of the print version. The full transcript will appear on the Ambidextrous site.
I found a great used book store in the Sunset in SF this weekend: Black Oak Books [http://www.blackoakbooks.com/]. Picked up a copy of Ishmael Reed's Terrible Twos there and devoured it the next day. After a quarter of immersing myself in cs textbooks and conference papers I'm looking forward to a more balanced reading diet with more novels this summer.
I've been toying around with the idea of relating Deleuze&Guattari's Mille Plateaux to Ubicomp in particular and HCI design philosophy in general. Anne Galloway at Carleton University apparently had this idea of relating critical theory and cultural studies to ubicomp a few years ago and devoted her PhD thesis to the topic. Add that to the to-read-list...
User Learning and Performance with Marking Menus, Gordon Kurtenbach and William Buxton, CHI 1994: ACM Conference on Human Factors in Computing Systems, pp. 258-64
The advantages of radial and marking menus over linear selection mechanisms in terms of efficiency and speed are described sufficiently in the paper. I want to focus on some issues left undiscussed.
The "press-and-wait" mechanism used to invoke radial menus serves as a context switch operation - it is clear to user and application that the following pen stroke will be interpreted as a menu selection command. But what about marking menus where no clear context switch exists? It appears possible, if not likely, that menu strokes can be confused as strokes meant to be directly registered (e.g. as drawing) in the application window and vice versa. What kinds of disambiguation strategies exist? How likely are such errors of stroke interpretation? The authors specifically use the dual purpose of marks to their advantage to combine object selection and command specification in a few select cases in the ConEd software. But would this overloading work as well in a drawing or design application? Are overloadings application specific so that the user will have to relearn them for every application or do general principles exist?
Besides giving a heuristic of "natural" matching of selection direction to intended action, guidelines are missing for how to arrange menu items in a circle. The principle of command distinctness (Norman) suggests placing similar actions far away from each other.
A quote: "If possible, once a function in a menu is invoked, it is replaced by the corresponding inverse function." This seems half-baked and counter-intuitive. An implicit assumption is made here that a normal workflow alternates between issuing do and undo commands. But what happens if the user wants to sequentially mark all "laugh" occurrences? The appropriate menu item will have disappeared after the first mark and won't return until something is "unlaughed". This practice goes against the authors' own recommendation that "marking menus are not appropriate when the list of items changes dynamically." Users will have to remember a set of binary modes to decide which action will be invoked by drawing a stroke in a particular direction.
In the discussion of the case study, the highly unequal amounts of time users A and B worked with the system -- user A crammed all testing into one week while user B took about a month -- complicate interpretation of the results. The different time periods may have had as much of an influence on learning marking as the different user experience levels.
The explanation for not including selection error in the study is not convincing. While not trivial, it is certainly possible to track error frequency at least as an approximation - for example by counting undo operations.
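To illustrate the undo-counting idea, here is a rough Python sketch. The event names and the flat log format are hypothetical and of my own making; the paper does not describe its logging. I treat every undo as a proxy for one erroneous selection. Since undos also capture changes of mind, this is at best an approximation.

```python
from collections import Counter


def estimate_error_rate(events):
    """Approximate selection-error rate as undos per menu selection.

    `events` is a hypothetical interaction log: a sequence of strings
    such as "select", "undo", or "draw".
    """
    counts = Counter(events)
    selections = counts["select"]
    if selections == 0:
        return 0.0  # no selections logged, so nothing to estimate
    return counts["undo"] / selections


log = ["select", "select", "undo", "select", "draw", "select", "undo"]
print(estimate_error_rate(log))  # 2 undos over 4 selections
```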
Passive Real-World Interface Props for Neurosurgical Visualization, Ken Hinckley, Randy Pausch, John C. Goble, Neal F. Kassell, CHI 1994: ACM Conference on Human Factors in Computing Systems, pp. 452-8
Dealing with 3D manipulation on a 2D display is indeed a frustrating task. Using 2D input devices like mice exacerbates the problem. 6DOF rotation+translation pucks exist (see Logitech's Magellan line), but manipulation still passes through a layer of indirection: movement of the control device has to be mapped into movement of the objects in virtual 3D space. The frame of reference here is the virtual world, so manipulation of the physical-world controller can be counterintuitive. Hinckley et al.'s approach is appealing in that the frame of reference is moved into the real world which the user already knows how to interact with. The computer merely listens/watches to replicate that interaction in its virtual representation.
The authors stress the need for a system that has a low re-learning time. This longitudinal aspect of system knowledge is seldom addressed (exception: the previous paper).
I felt that the problem of "clutch" mechanisms was not discussed in sufficient depth in the paper. A neurosurgeon can easily manipulate 3D views, but she cannot efficiently work with the data beyond this 3D exploration. She would have to switch back and forth between some other input devices, e.g., mouse/keyboard, and the bimodal props. The described binary clutches afford the possibility of performing a switch, but the effort to do so is still substantial (find free space on desk, put objects down, make sure they stay in place, locate other input devices, etc.) Moreover, it is impossible to seamlessly resume the 3D manipulation once props were set down. Efficient task/device switching may well be a general issue with bimodal interfaces - a given task may be better supported in a bimodal system, but changing from that particular task to other interaction with the computer system may now be harder since both of the user's most versatile output devices - her hands - are already occupied.
The Design of a GUI Paradigm based on Tablets, Two-hands, and Transparency, Gordon Kurtenbach, George Fitzmaurice, Thomas Baudel, and Bill Buxton, CHI 1997: ACM Conference on Human Factors in Computing Systems, pp. 35-42
The authors make a strong case for the usefulness of their bimodal manipulation technique by retrofitting a production-level application with their system. Successful WIMP applications have often already undergone many years of incremental development and offer very comprehensive toolsets (e.g., Photoshop) unlikely to be matched by an experimental application. Research findings from an artificial "toy" system don't automatically scale to these more complex work environments. That the authors tested their GUI paradigms in both situations much increases my confidence in the reported results.
Notably, the permanent on-screen presence of the toolglass was eliminated in the Studiopaint implementation. In the physical world, artists do not keep all tools piled right on top of the central work area - it would create too much physical clutter. While I was watching the video, the toolglass did appear to me as visual clutter during non-tool-picking interactions. Instead of selectively hiding the toolglass, hiding it by default and only selectively displaying it, for example through a non-dominant hand move or button click, would seem to better support the goal of maximizing screen real estate for the artwork itself.
Using color picking as the example to demonstrate click-through functionality is a bad choice in my opinion. When a color swatch is made semi-transparent, the perceived color is a mixture of the swatch itself and the underlying background color - while general hue assignments are possible (green vs. red), saturation and value cannot be picked precisely in a transparent overlay.
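The mixing effect is easy to show numerically. The sketch below assumes plain "over" alpha compositing on 8-bit RGB values; I do not know how the toolglass in the paper actually renders its transparency, so treat the blend formula and the `composite` helper as my assumptions.

```python
def composite(swatch, background, alpha):
    """Blend an RGB swatch over a background with opacity `alpha` (0..1)."""
    return tuple(
        round(alpha * s + (1 - alpha) * b)
        for s, b in zip(swatch, background)
    )


red = (255, 0, 0)
print(composite(red, (255, 255, 255), 0.5))  # half-transparent red over white
print(composite(red, (0, 0, 0), 0.5))        # half-transparent red over black
```

The same swatch comes out as a pale pink, (255, 128, 128), over white and as a dark red, (128, 0, 0), over black. The hue is still recognizably red in both cases, but saturation and value depend on whatever happens to lie under the overlay.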
Haptic Techniques for Media Control, Scott S. Snibbe, Karon E. MacLean, Rob Shaw, Jayne Roderick, William L. Verplank, Mark Scheef, UIST 2001: ACM Symposium on User Interface Software and Technology, pp. 199-208
This paper has an incredible density of ideas. No fewer than six unique haptic devices and eleven different application scenarios are introduced. What has happened to these prototypes? An ACM search revealed just four subsequent citations. Further research on the prototypes would most likely be fruitful - the surface of possible applications has only been scratched here. Contacting either Scott Snibbe or Bill Verplank may be worthwhile.
I stumbled over the references to the demise of Interval Research and embarked on some research about the think tank. According to the Online Archive of California, most of Interval's documents are now held in Stanford's libraries: [http://www.oac.cdlib.org/findaid/ark:/13030/tf1s2001tx] It appears though that the records are not accessible to the Stanford community - permission of Interval's legal team is required. In 1999, Wired ran a long article about Interval [http://www.wired.com/wired/archive/7.12/interval.html] and shortly thereafter Salon.com published a post-mortem: [http://www.salon.com/tech/log/2000/04/22/interval/]
Embodied User Interfaces for Really Direct Manipulation, Kenneth P. Fishkin, Anuj Gujar, Beverly L. Harrison, Thomas P. Moran, Roy Want, Communications of the ACM, March 2003, pp. 75-80
In their introduction, the authors argue that as GUIs have become dominant, the physical form factor of the computer - the box - has become anonymous and invisible. But why has the box become invisible? Why have we focused much more on developing GUIs instead of paying attention to form factors? The principal reason is precisely that GUIs are intangible and as such their creation, modification, augmentation is not controlled by the laborious and expensive manufacturing processes that control creation in the physical world. Standardized commodity hardware can also exploit economies of scale better than specialized devices.
"The devices are metaphorically related to similar noncomputational artifacts." Do we always need a noncomputational metaphor? Such a metaphor is surely helpful in explaining uses and modes of operation to non-experts, but it also constrains the user's imagination of what can be done with the device.
I fundamentally disagree with the authors' discussion of the Palm hand held on the top of page 76 - I have been using various Palm PDAs for more than 5 years and the Palm never *was* my calendar. Instead, for me it is a bidirectional window to look at and modify my calendar, which itself is an intangible collection of data shared between and accessible by a multitude of devices. When at home or in my office I always create and check entries on my desktop or notebook computer - both data input and output bandwidths are much higher than on the PDA. I only use the PDA on the road, when other methods of access are unavailable. My cellphone has another copy of the schedule to remind me of upcoming meetings. So let's not confuse the data and the device. Also, while it is true that the Palm was originally conceived as an electronic form of a paper personal organizer, much of its value to users today comes from its functioning as a general purpose platform with an open SDK. Many applications extending its functionality into completely different realms are available from 3rd parties. This suggests that it may be more promising to develop devices with form factors not tied to specific applications but to specific real world constraints (size, weight, modes of input/output) and let a large community of developers figure out what tasks such a device can support. Specialization can come later with add-ons to a general-purpose architecture.
Finally some notes on the presented techniques of page turning, scrolling, and tilting:
Page turning: a particularly bad example. At least half of the work here was done inside the GUI and not on the physical interface. I suspect that adding any two buttons on the left and right side, inside or outside the touchscreen, would have led to a similar result.
Scrolling: Sony developed a better analogy to turning a Rolodex wheel years ago with their scroll wheel built into cell phones and PDAs.
Tilting: the authors sidestep the issue of individual differences - settings for "neutral tilt" were derived as an average of users. A better approach would have been to let each user set their individual comfortable value (calibration).
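A minimal sketch of the per-user calibration suggested above, not the authors' implementation. The function names, the sample readings, and the 2-degree dead zone are all assumptions made for illustration.

```python
from statistics import mean


def calibrate_neutral_tilt(samples_deg):
    """Return one user's neutral tilt as the mean of resting-angle samples (degrees)."""
    if not samples_deg:
        raise ValueError("need at least one calibration sample")
    return mean(samples_deg)


def relative_tilt(raw_deg, neutral_deg, dead_zone_deg=2.0):
    """Tilt relative to the user's own neutral angle.

    Offsets inside the (assumed) dead zone are reported as zero so that
    small hand tremors do not trigger scrolling.
    """
    offset = raw_deg - neutral_deg
    return 0.0 if abs(offset) <= dead_zone_deg else offset


# Hypothetical readings taken while the user holds the device comfortably.
neutral = calibrate_neutral_tilt([31.0, 29.5, 30.5, 29.0])
print(relative_tilt(31.0, neutral))  # inside the dead zone
print(relative_tilt(40.0, neutral))  # a deliberate tilt
```

The point of the design is that the neutral angle is a stored per-user value rather than a constant derived from a population average.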
Iterative Design of Seamless Collaboration Media, Hiroshi Ishii, Minoru Kobayashi, Kazuho Arita, Communications of the ACM, August 1994, pp. 83-9
This paper stands out in its presentation of a long-range view on the development of one particular application concept -- a distributed collaborative drawing space. The reader is taken through a history or archaeology of various prototypes, with discussions of how analysis of earlier stages informed subsequent implementations. Tradeoffs emerge as unavoidable choices central to any design: a feature that enables one function (e.g., tilting the display down to facilitate writing on the surface) will hinder another one that was well supported before (e.g., looking "at" your collaboration partner vs. looking down upon him on the display).
Towards the end of the article, the authors suggest a paradigm shift from HCI to HHI - computer-mediated human-to-human communication. In 2002, eight years after the publication of this paper, Andy van Dam raised the same point in his CRA "Grand Challenge" Statement (http://www.cra.org/Activities/grand.challenges/vandam.pdf). Apparently we have not made much substantial progress in this area in the meantime.
Interacting with Paper on the DigitalDesk, Pierre Wellner, Communications of the ACM, July 1993, pp. 87-96
While the idea of turning the desktop metaphor on its head sounds appealing at first, I am not sure whether this particular implementation presents a real enhancement of a paper-based workflow. Much attention is being paid to input of paper-based information into the computer system. However, if a significant volume of output is generated by the computational task, it is not clear how this result should be integrated back into the real world. The user can write down a number or a word, but what about a page, or ten? Should the system be connected to a printer? What if such a printed document needs further revision? Should it be scanned and printed again? The proposed camera-based input solution can also be improved upon: most of the discussed implementation issues resulted from the choice of overhead projection, which is incidental to the operation of the system. The previous paper introduced a better back projection method in the ClearBoard that circumvented most of the described problems. A positive point is the mention of handedness as a factor that should be taken into account in any interface that directly engages the user's hands.
At the end of the article, Wellner poses the following fundamental question: "Do we think of ourselves as working primarily in the computer but with access to physical world functionality, or do we think of ourselves as working primarily in the physical world but with access to computer functionality?" I believe the question is misguided in the presented context. While paper is a physical object, its primary function is not as an object per se, but as a storage medium for intangible information (Origami is the exception to the rule). Manipulating paper-based information is thus in my mind a bad example of "working in the physical world".
Multiple-Computer User Interfaces: "Beyond the Desktop" Direct Manipulation Environments, Jun Rekimoto, In Extended Abstracts of CHI 2000: ACM Conference on Human Factors in Computing Systems, pp. 6-7
This short article introduces (or at least describes) the concept of MCUIs - multiple computer user interfaces. Interaction and synchronization techniques abound for sharing and exchanging data between different applications/processes/windows on one particular device - but most if not all of the techniques break down when users try to bridge the divide between different devices (of similar or different kind). The problem is certainly important. Ken Hinckley is addressing the same issue in his TabletPC research. The term "direct manipulation" appears to be overloaded with different meanings by different subgroups of the HCI community (cf. Shneiderman, for whom the term is attached to GUI data mining).
While reading Snibbe et al.'s article "Haptic Techniques for Media Control" (UIST 2001, to be reviewed here shortly) I stumbled over a few references to the demise of the writers' former employer, Interval Research. The company was a secretive think tank focused on exploratory research, located on Page Mill Road, right behind Stanford's campus. Paul Allen started the lab with $100 million in 1992, then pronounced it a failure and closed it down in 2000.
Wired wrote about a "shift in focus" from basic research to cable product development at Interval in 1999: http://www.wired.com/wired/archive/7.12/interval.html
According to the Online Archive of California, most of Interval's documents are now held in Stanford's libraries: http://www.oac.cdlib.org/findaid/ark:/13030/tf1s2001tx
Salon.com published a short post-mortem: http://www.salon.com/tech/log/2000/04/22/interval/
Gavin Miller was a researcher at Interval, as was Scott Snibbe.
Off on a tangent, Bill Gaver's work on auditory interfaces is worth looking into. Reference: 'Auditory interfaces' (1997) in Handbook of Human-Computer Interaction (2nd ed.), Publisher: Elsevier, Editors: Helander, M G; Landauer, T K; and Prabhu, P. Link: http://www.rca.ac.uk/pages/research/dr_william_gaver_609.html
Two long articles - one on usability issues in a computer system installed in police patrol cars, one on pen-based computing - were published in the New York Times. Links:
Trying to Make the Pen as Mighty as the Keyboard
By AARON RICADELA
http://nytimes.com/2004/11/11/technology/circuits/11next.html
Wanted by the Police: A Good Interface
By KATIE HAFNER
http://nytimes.com/2004/11/11/technology/circuits/11cops.html
SpeechActs: A Spoken-Language Framework, Paul Martin, Frederick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich, IEEE Computer, July 1996, pp. 33-40
Note: I am not very excited about speech technology in general, which somewhat tinges my answers to this reading. I am not convinced that the translation problem from natural language (speech) to some form of machine-interpretable format is in fact one of the major challenges in HCI. Speech is an effective means for communication between a small number of co-located humans in the absence of other communication channels. That does not automatically make it a good candidate for HCI (cf. my stance on imitating human-to-human interaction methods in HCI in general from the last reading response).
SpeechActs' chosen scenario is information delivery for business travelers. The presented example fails to convince me of the usefulness of such a system. Granted, the article was written in 1996 - but where are business travelers faced with a situation that allows for phone calls but not internet access today? In hotel rooms, phones have data ports. In residences, you can unplug a phone and plug in your modem cord. Business-oriented cell phones can be used as modems through their infrared or Bluetooth ports. That leaves pay phones - which are getting harder to find by the day. In addition, some of the tasks performed by SpeechActs do not lend themselves to auditory presentation. The user has to keep her mental model of the SpeechActs application in mind so she knows what she _can_ ask and at the same time keep track of all the information presented so she knows what she _wants_ to ask. Imagine how long synchronizing schedules will take if there are more than two meeting participants and each participant already has a packed schedule.
The authors' primary goal was to build a speech application toolkit for software developers who do not have expertise in speech or natural language. Constructing the unified grammar seems to be quite a daunting task for such linguistically "naive" developers. On the positive side, the authors were careful to construct a future-proof software system by stressing independence from particular recognizer/TTS implementations and by supporting multiple applications to service voice requests. They also acknowledge that some of the challenges for speech systems are not related to technical implementation, but rather to human expectations. Prior work was not surveyed in enough detail to judge the specific contributions that SpeechActs made.
Here a more promising application area for voice technology is demonstrated: many situations exist where capturing an original audio stream is quite important because it comes from an authoritative source and will only be produced by that source once. Reviewing that original recording in random access fashion is complicated and frustrating with existing technologies used by the target audience (students and reporters; tape recorders). The abstract problem the paper addresses is automatic semantic segmentation of time-based media.
The authors provide an intriguing solution by augmenting a familiar interface that most members of the target audience already use - the paper notepad. A range of uses is supported to allow different interaction styles with the audio notebook: users can continue previous note taking activity without having to adjust at all - or change what and how information is written down to further simplify review later on.
The designers chose to employ the audio notebook both as the input device during note taking and as the output device during review. The requirements of these two processes can be quite different so we should not assume that a single interface will present an optimal solution. My personal preference would be for a central storage server that unites information from multiple input devices. This way one could recall the recording, hand-written notes, but also additional documents like lecture slides and pdf articles from one device connected to the central information server. The audio scrollbar is a low-bandwidth interface - low in information content and resolution. Also, phrase-snapping and segmentation make the audio scrollbar display non-linear, which complicates user predictions of how far in time the audio will jump when selecting a different LED. A graphical representation of the audio stream with additional segmentation mechanisms would provide for richer interaction. Allowing other applications to access the recorded voice data on a central storage server would also enable post-processing to improve the fidelity of the audio signal.
The authors' solution to the problem of incorrect segmentation by the topic suggestion algorithm is not very satisfying. If the audio will be reviewed multiple times, some direct user intervention to correct segmentation can be valuable. This would once again be a relatively easy task if other software could access the audio recordings. For music, tools like Steinberg's Recycle or the FruityLoops BeatSlicer perform semi-automatic user-correctable segmentation.
The long duration of the field test resulted in very rich usage data. A discussion of shortcomings and directions for future work was lacking.
(It would be interesting to port this work to a Tablet PC - here we already have all the required hardware, save for a decent microphone, and additional processing capabilities. Has anyone done this?)
Coral, a suite of tools to deal with time-based media, has three foci: capturing interaction unobtrusively, indexing the recordings, and accessing the recordings. A particular, narrow application domain - supporting casual group interaction - is picked to ground the research in real-world requirements. The authors present a useful taxonomy of indices (segmentation marks): intentional annotations, side-effect indices, derived indices and post hoc indices. Furthermore, capture and access situations are distinguished clearly, with completely different hardware supporting each stage (in contrast to the audio notebook). The loose collection of tools comprising Coral is based around shared communication protocols and interfaces - it is easily extensible. Building access tools is mentioned as the area where most work remains to be done. Bits and pieces for UIs that deal with time-based media exist in hardware and software used in audio and video editing (time lines, jog shuttles, interval-based selection, multi tracking). Uniting them in a common time-based framework would be worthwhile.
Direct Manipulation vs. Interface Agents, Ben Shneiderman and Pattie Maes, ACM Interactions, December 1997, pp. 42-61
The article presents excerpts from two separate debates between Shneiderman and Maes at the IUI and CHI conferences in 1997. Prepared statements and responses to specific attendee questions alternate. The chosen format is both misleading and suboptimal for printed reproduction. First, the exchange as presented in the article never actually happened in this form. The reader cannot assume a continuous narrative since responses came from two different events and may be presented interspersed and out of order. Second, transcriptions of verbal exchanges are structurally very different from written arguments. Spoken language is characterized by frequent redundancy to help the listeners remember significant points. In written communication it is easy for the reader to simply skip back a paragraph or two if she does not remember something. As a result, some basic positions of the two debaters are repeated frequently.
Shneiderman argues that we should build comprehensible, predictable user interfaces that afford direct manipulation of all relevant parameters. This combination, he says, will make users accept responsibility for the actions they take using the system. Maes argues that computing systems have become so complex that the user does not want to be concerned with all the details and that intelligent software can act as a representative of the user and proactively perform tasks without direct UI interaction. The two positions are not completely at odds with each other as both Shneiderman and Maes agree that direct, predictable interfaces are important and that hiding complexity under the hood can also be beneficial.
Shneiderman does not think that "human-to-human interaction is a good model for the design of user interfaces." (56) My own research background is in simulating embodied virtual agents. In that field, the richness of human-to-human communication is often given as motivation for building more realistic anthropomorphic agents. However, graphic simile comes along with a whole list of expectations regarding reasoning and behavioral processes. I am not sure if we are making substantial progress in these areas. Interacting with an agent is still significantly worse than having a teleconference with a real human - and we don't even like to engage in that latter form of interaction (cf. Hollan, Stornetta's Beyond Being There).
Maes states on pg. 50 that "More and more the World Wide Web and our browser is becoming the one and only interface." I disagree - otherwise we could reduce the work of HCI researchers to tweaking Javascript.
Horvitz et al. outline their work in the sensing and processing of user attention at Microsoft Research. They identify the important role attention plays both in human cognition and in social patterns of communication. In their research system, data from multiple input channels such as accelerometers, touch sensors, and gaze trackers are fed into Bayesian belief networks and HMMs to reason about the locus and object of user attention over time. Given such an internal representation of user attention, their "Notification Platform" makes choices about when and how to display different system messages to minimize disruption.
Providing feedback -- showing the outcome of the system's attention calculation to the user -- seems important to render the decisions of the system transparent and understandable. Examples of two visualizations - a color/intensity changing "lens" and an animated anthropomorphic agent were given, but a more systematic approach seems necessary.
On the last page, the authors claim that "robust solutions to the speech-target problem promise to significantly influence the overall sociology of human-computer interaction [...]." I disagree - the problems that prevent large-scale adoption of speech interaction are not purely technical in nature. As Shneiderman put it in the previous reading, "natural language interaction [...] has not been a success." Speech in general is not an efficient input/output channel - it requires a lot of cognitive processing and also faces cultural resistance (people feel uneasy being seen while talking to their computer).
Readings in Information Visualization: Using Vision to Think, Chapter 1, Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman, Morgan Kaufmann Publishers, pp. 1-34
The grounding of visualization techniques in the biology of the human perceptual system is essential. The authors make this point but do not have enough space to elaborate sufficiently on the specific properties of the human eye and visual pathway. An accessible psychology text that goes into further detail describing vision and other senses is: E. Bruce Goldstein, Sensation and Perception. Wadsworth, Sixth Edition, 2002. ISBN 0534539645.
While concepts relating to static presentation of data are discussed at length ("spatial substrate", "marks", and their graphical properties) the dynamic, temporal properties are only glossed over briefly. I believe more interesting work can be done here. Animation itself can be used to show physical or abstract phenomena, just as static properties of data marks. In sonification for example, all conveyed information is temporal in nature. Interestingly, different frequency ranges are perceived as different phenomena - can similar characteristic ranges be found for visual frequencies?
What about aesthetics and production value? The article explicitly states that many of the demo systems mentioned were not concerned with creating beautiful graphics. But maybe they should be. In "Emotional Design" I believe Don Norman argues that "attractive things work better" (haven't read the book yet- it's sitting on my desk). Can established graphic design practices and guidelines inform information visualization? Can they be captured in a set of heuristics?
Other odds and ends:
- The "Cost-of-Knowledge Characteristic Function" may be more significantly altered by non-visual tools such as Google. Structure isn't everything.
- Cel animators have been using selective distortion and exaggeration - such as squash and stretch - to create believable characters. How can exaggeration be employed in info vis to guide user attention towards salient/unusual data points or patterns?
- The "overview+detail" technique could be generalized to show n different viewpoints of the same frame of complex data sets to overcome human difficulty in understanding higher-dimensional spaces.
Comments on the demo videos linked from the course page:
FILMFINDER: User interface designers should be sensitive to the importance of production values (cf. in-class discussion of Nielsen's bad graph in the Heuristic Evaluation paper). This demo has an incredibly overdriven audio channel that makes it nearly impossible to listen to with headphones.
TREEMAP: Shneiderman's group places emphasis on the concept of "dynamic queries" - yet their examples in the Treemap demo are all based on visualization of Excel files that I assume are then manipulated internally in Treemap. Integrating their work with a relational database system that can answer the changing queries directly would make their approach more powerful.
In the example, users can specify their own color gradients. Which gradients maximize perceptible differences between elements is not intuitive. Maybe the color picker could be based on perceptual distances to aid the user?
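One way to make "perceptual distance" concrete is sketched below. It is an illustration of the idea and has nothing to do with Treemap's actual color picker: it converts sRGB colors to CIELAB (D65 white point) and uses the simple CIE76 difference, i.e. Euclidean distance in Lab. The helper `min_step_difference` and the example gradients are made up; newer formulas such as CIEDE2000 track perception more closely but are longer.

```python
import math


def _srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 0-255 channel value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_lab(rgb):
    """Convert an (R, G, B) tuple in 0-255 to CIELAB, D65 reference white."""
    r, g, b = (_srgb_to_linear(v) for v in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


def min_step_difference(gradient):
    """Smallest CIE76 difference between adjacent stops of a gradient."""
    return min(delta_e76(a, b) for a, b in zip(gradient, gradient[1:]))


even = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
lopsided = [(0, 0, 0), (5, 5, 5), (255, 255, 255)]
print(min_step_difference(even), min_step_difference(lopsided))
```

A picker could warn the user, or suggest alternative stops, whenever the smallest step of a chosen gradient falls below some threshold.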
The Table Lens: Merging Graphical and Symbolic Representations in an Interactive Focus+Context Visualization for Tabular Information, Ramana Rao and Stuart K. Card, CHI 1994: ACM Conference on Human Factors in Computing Systems, pp. 318-22
The paper's abstract tone made it difficult for me to imagine the operation of Table Lens in practice - the demo was much more instructive as to how their technique works. The principal contribution seems to be that much larger data sets can be shown at once than in standard spreadsheet applications. Also, the direct visual comparison affords an intuitive understanding of the characteristics of the data sets. Simple comparisons can be read off and don't have to be calculated. Grouping and complex relationships between a larger set of different variables are still hard to comprehend though since display is restricted to the 2D constraints of a flat area display. The paper was short on evaluation.
Thoughts on:
DENIM: An Informal Web Site Design Tool Inspired by Observations of Practice, Mark W. Newman, James Lin, Jason I. Hong, James A. Landay, Human-Computer Interaction, 2003. 18(3): pp. 259-324
Newman et al. show how fieldwork can play an important part in the UI design cycle. Their initial user study suggested the lack of tools for the early stages of website design. Consequently, the authors abandoned their original plan to design software for finished websites and refocused on developing an information organization tool that supports the early sketching phase of website design. Unwittingly, they further underline the importance of user-centered development in their implementation of "semantic zoom" levels within DENIM: three representation levels were suggested by the initial user study; two more were added ad-hoc by the programmers. The latter two levels then turned out to be of little use to practitioners during the reported evaluation study. Most insightful quote on this topic (pg. 317): "Much of [web design practice] literature is prescriptive rather than descriptive in nature and may not accurately reflect what designers are actually doing in the field. To learn what designers do, there is no substitute for direct connect."
Curiously, a discussion of data-driven, dynamic websites was absent from the otherwise very thorough and detailed article. In my own experience as a web designer and programmer, most sites have long moved from static HTML to database/CMS-backed, script-driven dynamic output. In this programming-heavy model, sub-page-level code modules are used as basic building blocks - logic widgets that often get reused in multiple different pages. DENIM focuses solely on surface realization of pages and does not address this underlying information infrastructure.
The authors write that "several designers were interested in having a tool that helped them keep track of project histories." This statement suggests an unfortunate disconnect between the realms of web design and general computer programming. The computer science community has long relied on freely available version control tools such as CVS. These systems readily work for HTML pages; given enough interest, it should also be possible to develop an extension for performing version comparisons on images - "visual diffs".
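As a small illustration of why text-based versioning carries over to HTML, here is a sketch using Python's standard `difflib` module. The two page revisions and the helper name `html_diff` are invented for the example; a real project would rely on the version control system's own diff.

```python
import difflib


def html_diff(old_html, new_html, name="index.html"):
    """Return a unified diff (list of lines) between two revisions of a page."""
    return list(
        difflib.unified_diff(
            old_html.splitlines(),
            new_html.splitlines(),
            fromfile=f"{name} (old)",
            tofile=f"{name} (new)",
            lineterm="",
        )
    )


# Two hypothetical revisions of the same page.
old_page = "<html>\n<body>\n<h1>Home</h1>\n</body>\n</html>"
new_page = "<html>\n<body>\n<h1>Welcome</h1>\n</body>\n</html>"

diff_lines = html_diff(old_page, new_page)
print("\n".join(diff_lines))
```

Line-based diffs of this kind are exactly what breaks down for binary image files, which is where a dedicated "visual diff" would be needed.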
A major shortcoming of the DENIM system is the lack of downstream integration into the further production pipeline. The notion of "semantic zoom" should be carried through to its conclusion. Finished site sketches apparently have to be reprogrammed from scratch for final production - GIF image-map exports are not very useful to work off.
Finally, it seems to me that the custom set of pen gesture commands developed for DENIM would hinder rather than help its adoption. Web designers have to learn a new list of UI commands that are only useful within one application. This is possibly a general problem of pen-based interfaces where standards are still lacking or inherently difficult to define across applications (cf. Ken Hinckley's CS547 talk from last week).
How to Conduct a Heuristic Evaluation, Jakob Nielsen, http://www.useit.com/papers/heuristic/heuristic_evaluation.html
Multiple people are likely to find more usability problems than a single evaluator. Nielsen argues that from a cost-benefit point of view, three to five evaluators provide the best payoff. Heuristic evaluation is contrasted repeatedly with user testing; however, no clear succinct definition of what constitutes a user test is given. Benefits of heuristic evaluation are likely to be incremental/evolutionary rather than revolutionary: emphasis is placed on small individual problems rather than large, conceptual issues. Because of the reliance on a set of heuristics, evaluators are less likely to find structural problems not well covered by these heuristics. Real work done with an application may expose a different set of problems. Nielsen's cost-benefit analysis is based on commercial software products. Open-source software is developed and distributed through a distinctly different model that necessitates a different kind of analysis. Cheap automatic evaluation techniques combined with the short cycle period between OSS release versions may help achieve better coverage of more usability problems than a single small group heuristic evaluation. Example: the Firefox browser interface went through many different incarnations; more so than Internet Explorer or any other browser. Even though the changes from version to version were rapid and took getting used to - something commercial vendors may not be able to subject their customers to - the UI has now become stable and is more functional than that of other browsers.
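The diminishing returns behind the "three to five evaluators" advice can be sketched with a simple model: assume each evaluator independently finds any given problem with the same probability, so n evaluators together find a fraction 1 - (1 - rate)^n of the problems. The detection rate of 0.3 below is an assumed illustrative value, not a number taken from the article, and real evaluators are neither identical nor independent.

```python
def proportion_found(evaluators, detection_rate=0.3):
    """Expected fraction of problems found by `evaluators` independent evaluators.

    `detection_rate` is the (assumed) chance that a single evaluator
    finds any given problem.
    """
    if evaluators < 0:
        raise ValueError("evaluators must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    return 1.0 - (1.0 - detection_rate) ** evaluators


# Print coverage and the marginal gain of each additional evaluator.
for n in range(1, 11):
    gain = proportion_found(n) - proportion_found(n - 1)
    print(f"{n:2d} evaluators: {proportion_found(n):.3f} found, +{gain:.3f}")
```

Under these assumptions each additional evaluator contributes less than the previous one, which is why the payoff curve flattens after the first handful.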
Methodology Matters: Doing Research in the behavioral and social sciences, Joseph E. McGrath, in Readings in Human-Computer Interaction: Toward the Year 2000 R. M. Baecker and J. Grudin and W. A. S. Buxton, ed. pp. 152-169.
McGrath presents a clear summary of social science research methodologies. In terms of knowledge communicated this article is the most dense so far and the most efficient in terms of "data ink" (concepts-per-page). The article touches upon all important issues to consider when planning social science research, but the limited length prohibits in-depth treatment - the buzzwords are there, but many are not explained exhaustively enough for a practitioner. A more detailed discussion can be found in: Investigating Communication - An Introduction to Research Methods. L.R. Frey, C.H. Botan, P. G. Friedman, G. L. Kreps. Prentice Hall, 1991 (my undergraduate textbook). The importance of triangulation between different methods is stressed to offset the inherent bias/weakness of any one particular methodology. For HCI researchers, adopting rigorous social science research methods means a two-step removal from usual computer science practices - first one needs to replace developers' intuition with external evaluation. One then needs to qualify any particular evaluation by considering the effect of the chosen measurement instrument. The importance of collaboration and cooperation between researchers and institutions emerges since producing research validated by multiple methodologies may be too large a project for any single individual.
Measuring API Usability, Steven Clarke, Dr. Dobb's Journal Special Windows/.NET Supplement, May 2004, pp. S6-S9
Clarke recognizes that APIs are as much a user interface as GUIs - so similar techniques can be used to evaluate their usability (breakdowns). Scenario approaches can help determine desired API functionality. Comparisons between user expectations and API affordances according to the "cognitive dimensions framework" help pinpoint potential areas of mismatch where an API needs to be redesigned.
While I am usually a proponent of concision, I would have liked this article to be more substantial. The absence of any references exacerbates the problem. Hiding the information of how to read the radial graph in Figure 1 is in itself a usability breakdown of the article.
Kristen Blair, Bill Gaver, Tony Dunne, Elena Pacenti. Cultural Probes, ACM Interactions, February 1999, pp. 21-9
Researchers can improve the quality of their data gathering by consulting and applying principles from the artistic/design tradition. High production values generate better responses. Using hand-crafted artifacts reinforces the impression that the people being observed are important as individuals, not just as an amorphous "target group". Likelihood of cooperation is thus increased. Moreover, the unconstrained, evocative techniques encourage rich responses and enable "user-centered inspiration" that informs the developer about previously undiscovered issues. In some parts, the study appears to veer too much towards gratuitous design tinkering, and the step from viewing probes to proposing designs is not sufficiently described. Principal take-away message for me: sustaining regular communication on a personal level with the target audience is essential. Being creative about the means employed to achieve this conversation can help.
Prototyping for Tiny Fingers, Marc Rettig, Communications of the ACM, April 1994, pp. 21-7
Rettig presents powerful arguments for using quick-and-dirty low fidelity prototypes very early on in the development process. Changing designs and going through more iterations becomes easier since the investment (in terms of time, resources and ego) in each prototype is kept low. Results can be obtained quickly and the danger of getting mired in details too early on is minimized. I have experienced the point that testers tend to comment on "fit and finish" issues in a different area - paper writing. Draft reviewers often focus on surface details such as tone or even punctuation, without addressing more fundamental questions such as structure or soundness of arguments. Switching to other media for design prototypes is in itself valuable - physical, tangible objects have very different "paths of least resistance" and constructing a prototype under such altered circumstances forces the developer to question notions of what is natural/easy/desirable in the interface.
Looking Across the Atlantic: Using Ethnographic Methods to Make Sense of Europe, Genevieve Bell, Intel Technology Journal, 3rd Quarter 2001
The difficult task the ethnographer-as-outsider faces to truly understand her subject of inquiry became apparent to me through some of the reported details. Having grown up in Europe, I noticed a few choices made that likely prevent the study from producing a balanced view. The chosen time frame over the summer months is problematic. Especially in southern Europe, daily habits fluctuate greatly between seasons and the heat common in July/August strongly influences behavior patterns since air conditioning is not common in private residences. So people spend the middle of the day indoors to avoid the sun and go out at night much more frequently than during the rest of the year. Furthermore, the places chosen exhibit some peculiar idiosyncrasies not representative of their national environments. In Italy, Venice was picked as the "major urban metropolis." With roughly 300,000 inhabitants and an economy strongly skewed towards tourism, Venice is very much unlike other Italian urbanizations. In Spain, both the Basque and Catalan regions are known for fierce separatist movements that seek to uphold local culture against "Spanification." Both regions pride themselves on their independence and difference from mainstream Spanish culture.
Despite these mischaracterizations, the study still resulted in a better understanding of Intel's potential European customers. The "domains of significance" that emerged from the ethnography should help position products in these markets. However, the author's own concern, voiced in one of the concluding paragraphs, that "finding ways to make our ethnographic work relevant" remains the biggest challenge, points to the hurdles open-ended, qualitative research faces in an industrial setting.
Past, Present, and Future of User Interface Software Tools, Brad Myers, Scott E. Hudson, Randy Pausch, ACM Transactions on Computer-Human Interaction, March 2000, pp. 3-28
The authors offer a clear and well-structured exposition on which user interface tools have worked in the past and which ones have not, along with reasons for the successes and failures. (Relative-to-human) scale emerges again as a factor strongly influencing design. Because of the success of ubicomp devices, the authors argue that UIST will once again become an important research topic as the mature, relatively monolithic set of user interface principles dominating the desktop computing environment cannot be transferred to devices of different scale. In the section "Future Prospects and Visions", development trends for the next couple of years are sketched. It is noted that "user interfaces are becoming more cinematic". I strongly dislike the use of gratuitous animation in user interfaces - the cinematic components often force the user to abandon her own internal timing for interacting with software and adhere to the developer's notion of speed. Many custom graphical interfaces seem sluggish to experienced users.
Another quote: "Furthermore, as people increasingly distribute their own computation across devices [...], there will be an increasing need for people to communicate with _themselves_ on different devices." I couldn't agree more. I work on at least three different machines every day and data synchronization has turned into a major headache. The trade-off seems to be between working with simple applications whose data can be synchronized easily (e.g., plain txt-files) and using complex applications with complex file formats that are troublesome to synchronize, especially if you modified versions of the file on more than one machine between syncs. A larger question is whether we want our data stored in a distributed or centralized fashion. Centralized servers assure consistency but may be more vulnerable to security risks. Also, what if you need access to your files but are temporarily offline? Maybe we can think of individual devices acting as local data caches for server-side storage. Then we could exploit existing caching algorithms from computer architecture to ensure data consistency.
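To make the "devices as local caches" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the class names (Server, Device) and the version-number check are invented for this note, and the scheme is a simple optimistic version comparison, not one of the actual cache coherence protocols from computer architecture. Each device remembers which server version its cached copy was based on; a push is rejected when the server has moved on in the meantime, which is exactly the "modified on more than one machine between syncs" case.

```python
class Server:
    """Hypothetical central store: file name -> (version, content)."""

    def __init__(self):
        self.files = {}

    def read(self, name):
        # Unknown files start at version 0 with empty content.
        return self.files.get(name, (0, ""))

    def write(self, name, base_version, content):
        current_version, _ = self.read(name)
        if base_version != current_version:
            # Another device wrote since this one last synced.
            return False
        self.files[name] = (current_version + 1, content)
        return True


class Device:
    """Hypothetical device holding a local cache of server files."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> [base_version, content, dirty]

    def pull(self, name):
        # Refresh the cached copy (discards unsynced local edits).
        version, content = self.server.read(name)
        self.cache[name] = [version, content, False]

    def edit(self, name, content):
        # Works offline: only the local cache is touched.
        entry = self.cache.setdefault(name, [0, "", False])
        entry[1] = content
        entry[2] = True

    def push(self, name):
        version, content, dirty = self.cache[name]
        if not dirty:
            return "clean"
        if self.server.write(name, version, content):
            self.cache[name] = [version + 1, content, False]
            return "synced"
        return "conflict"


server = Server()
laptop, desktop = Device(server), Device(server)
laptop.pull("notes.txt")
desktop.pull("notes.txt")
laptop.edit("notes.txt", "buy milk")
desktop.edit("notes.txt", "call bank")  # edited while out of sync
print(laptop.push("notes.txt"))   # first writer wins
print(desktop.push("notes.txt"))  # stale base version is detected
```

The sketch only detects the conflict; deciding how to merge the two versions is the hard part, and that is where complex file formats cause the trouble described above.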
Finally, Myers et al. mention designing for older adults. I agree that this topic needs to be addressed. You can already find pocket calculators, land line phones and wrist watches optimized for use by people with diminished perceptual skills. But trying to find a cell phone that is easily operable by the elderly is near impossible.
Natural Programming Languages and Environments, Brad A. Myers, John F. Pane, Andy Ko, To appear in Communications of the ACM
As in previous papers, the idea of building "natural" interfaces is stressed. Here though, an attempt is at least made to define what naturalness means. However, two conflicting definitions are given. In the opening paragraph, natural is identified with "closer to the way people think about their tasks." Later on, natural becomes "faithfully representing nature of life". To me, these statements are not at all equivalent. The paper mentions the Alice graphical programming environment. In 2002, I was a teaching assistant for a class called "Virtual World Design" in which groups of students developed a series of small interactive 3D worlds in Alice. The main complaint from the students was that while basic tasks accessible through the user interface were quickly completed, any deeper programming was frustrated by the developers' efforts to hide the internals of the engine from the user. In the authors' own terminology, the system had a low threshold, but also a low ceiling. Accommodating the different requirements posed by novices and expert programmers at the same time is likely to remain a challenge.
Thoughts on:
Anne Marie Piper & Nirav Mehta Tangible Bits: Towards Seamless Interfaces between People, Bits, and Atoms, Hiroshi Ishii and Brygg Ullmer, CHI 1997: ACM Conference on Human Factors in Computing Systems, pp. 234-41
Ishii and Ullmer outline their vision of tangible user interfaces - TUIs - that aim to leverage the haptic intelligence and processing capability which we have gained through interaction with physical objects for human computer interaction. Key concepts are 'interactive surfaces' (think Weiser's Tabs, Pads and Boards) that provide for manifold interfaces between physical and virtual worlds; the 'coupling of bits and atoms' which refers to the linking of computational processes to physical objects; and 'ambient media' which are concerned with peripheral perception as viable communication channels.
Beyond particular implementations, two fundamental design dichotomies (not exactly the right word) are at the core of this paper - figure/ground separation and generalized vs. specialized artifact design. The figure/ground distinction arises from selective human attention which qualitatively separates that which is attended to from its environment; it recurs as a basic constituting principle in the visual arts, music and probably any other field related to human perception. A recent example of amplifying figure/ground difference to increase UI legibility is Apple's use of dynamically resizing application windows that shrink and fade as they are "backgrounded". There seems to be a direct link between figure/ground distinction issues and Bayesian processes (conditional probabilities) at work in our perceptual system(s). Something to ponder in the future.
The generalized vs. specialized design question may help explain the uneasy position HCI has assumed within the computer science community. Predominantly, design and invention of physical artifacts has been concerned with building specialized tools optimized for helping humans perform one particular task. HCI folks follow in these footsteps by taking a user centric and thus application centric view. The mathematical tradition in contrast is based on axiomatic methods that are concerned with establishing generalizable results - while applications of these results to real world problems are accepted as welcome side effects, they do not represent the core problem. Arguably, the computer arose out of that tradition as a blank-slate general purpose device, capable of any number of feats, but optimized for none. The principal difference in approach inherently makes one group skeptical as to the merits of the other group's work.
A few more scattered notes:
Bimodal input seems to be indeed an important research direction. However, it is really independent of the question of whether phicons are useful or not. The two issues are muddled in the text. I also think there are two related yet distinct concepts to discuss in the ambientROOM: peripheral perception versus multimodal perception. Peripheral input can be restricted to one modality (presenting material in the edges of a large screen) while multimodal signals can all compete for the user's attention. The association is by no means mandatory. I can furthermore imagine that extended periods of multimodal interaction are tiring for users because of the greater cognitive load. The question of necessary rest periods or pauses comes up - how can one stop the constant information stream in an ambientROOM? On a desktop computer, you just have to lift your hands off the keyboard and look away from the screen. Finally, some nitpicking: the use of audio in the ambientROOM appears naive. The sampled sound of a raindrop - the chosen network activity indicator - is not a neutral signal. It has a complex and rich spectral structure in itself that carries connotations. An analogy in graphics would maybe be to use an intricate ornamental stencil mask or stamp to indicate every single data point in a graph plot.
Knowledge-Based Augmented Reality, Steven Feiner, Blair Macintyre, and Doree Seligman, Communications of the ACM, July 1993, pp. 53-62
Since the response to the previous article was overly elaborate, I'll keep it short here: The authors put their work in an interesting relation to Weiser's ubicomp vision - instead of embedding the computers in the environment itself, they simply project or overlay computer information onto the spaces we look at in the environment. The approach is much more economical and sidesteps the networking infrastructure required to integrate ubicomp devices. Reintroducing networking, I see promise for using augmented reality systems for cooperative tasks where groups of people need to work together. Each group member can have their own instructions shown, but all instructions can be synchronized. In this way, the complex sequential nature of team work (e.g., aircraft maintenance) could be accounted for. Late in the paper there is a reference to the utility of sound - the fact that we can hear things that are outside of our field of vision could be used to direct user attention to items in the real world that are relevant for his task but not currently visible. Generating 3D sound sources is an easy problem in psychoacoustics.
Reinventing the Familiar: Exploring an Augmented Reality Design Space for Air Traffic Control, Wendy E. Mackay, Anne-Laure Fayard, Laurent Frobert and Lionel Médini, CHI 1998: ACM Conference on Human Factors in Computing Systems, pp. 558-65
A very thorough exposition of how close cooperation between users and designers in iterative cycles results in a superior understanding of the underlying task. To say it with the authors: "exploration of the design space is essential." For such critical systems as air traffic control, reliability becomes a major issue. What about fallback possibilities if the software fails? Keeping the paper strips in the work process enables a graceful degradation in case of software bugs. Related link: "Glitch Grounds U.K. Air Traffic"
http://www.cbsnews.com/stories/2004/06/20/world/main624974.shtml
Again, peripheral perception emerges as an important interaction variable. The Stanford iRoom with its big smartBoards could be used to explore aspects of visual peripheral input.
Thoughts on:
The Computer for the 21st Century, Mark Weiser, Scientific American, September 1991, pp. 94-104
Positive/interesting points:
Weiser's reference to Polanyi's "tacit dimension" touched upon a topic I have been mulling over for a while. The concept of truly mastering a task/skill/activity by internalizing it through repetition beyond a barrier of conscious knowledge is found time and time again across disciplines and cultures (cf. Herrigel, "Zen in the Art of Archery"). Conversely, note how hard it is to break out of established patterns, the "habitual mind" (is there a good citation for this term?). I know I am way off on a tangent here, but at least the paper was thought provoking. Larry Gross from the Annenberg School of Communication teaches (or used to teach) a great course entitled "Art as Communication" that spends a considerable amount of time on this topic.
My second item of interest is more directly related to the article's core message: interacting with many "boards" and "pads" on a daily basis may actually force the computer user to engage in regular physical activity. This is healthy. Ending the user's transfixion in front of the desktop PC could seriously lessen work-related health risks.
On the downside, ubiquitous computing seems to needlessly import some of the problems the physical world exhibits into the digital domain. Would you rather shuffle through a deck of "tabs" or use a search engine to find some files? Also, producing lots of limited-use electronics seems to be a waste of natural resources. Do we have a plan for how to disassemble and recycle these mini-computers when they start to fail or are deemed outdated? We cannot even take care of this task for our comparatively few full-grown PCs today (shipping the trash to China is not a sustainable solution). A meta-comment: the paper ends with a number of blanket statements such as "Computer access will penetrate all groups in society." that are not backed by any evidence. Irritating.
Charting Past, Present, and Future Research in Ubiquitous Computing, Gregory D. Abowd and Elizabeth D. Mynatt, ACM Transactions on Computer-Human Interaction, March 2000, pp. 29-58
What does it mean to build a natural interface? The authors mention writing as a natural action - but we all need to learn this skill through a long, possibly arduous process. Writing is common, yes, but natural? Similarly, playing a guitar does not resemble any other activity we routinely perform in our lives. Yet millions of people have learned how to use this "unnatural" instrument. Mastering these interfaces is difficult, but the inherent complexity also makes great virtuosity possible - it largely defines their value. (Upshot: we should not aim too low. Intuitiveness may be inversely related to usefulness.)
I agree with the authors that scale is an extremely important attribute. Building devices that accommodate human scales is essential - issues of size, but also rhythm, nonlinear perception of time come to mind. Another good point: humans are effective - but far from perfect - recognizers. Computer recognition and context fusion techniques should therefore take an approach that incorporates the notion of uncertainty. Much of AI has already gone down this statistical route.
A little line on the importance of error handling caught my interest: how much research has been devoted not to the avoidance of errors, but to their constructive processing in interaction with the system user?
And yes, we need to introduce more associative models of information management. The image of a message morass perfectly described my own email account/folder mess. How can we visualize associative models appropriately? Graphs are good for showing connections, but don't mesh well with text.
I will post short responses to my reading assignments in HCI here from time to time. Most will not summarize the articles but rather take them as a starting point for various tangential thoughts.
As We May Think, Vannevar Bush, The Atlantic Monthly, July 1945
Bush, with striking insight, predicts that issues of knowledge management, meta-knowledge so to speak, will become the most important tasks to solve in the future. Key functions that science should enable are efficient extension, storage and consultation of the record of human knowledge. While enabling technologies are envisioned, the archival properties of these technologies are not addressed. Will his microfilm still be legible after spending 100 years on a shelf? This problem is also underrated today - see the recent discovery that many CD-Rs will de-laminate and disintegrate after a few years.
Most of Bush's contraptions are directly bound to mechanical machines or chemical processes. The abstraction of the function of a particular device (think software) from the underlying architecture (think general purpose computer) has not taken place yet.
The Xerox Star: A Retrospective, Jeff Johnson, Teresa L. Roberts, William Verplank, David C. Smith, Charles Irby, Marian Beard, Kevin Mackey, IEEE Computer, September 1989, pp. 11-27
Through careful planning, a strong task-based focus, and meticulous attention to detail, the STAR team anticipated and introduced many lasting design features of office software systems. Part of the success was due to the developers' choice to not just rely on their own judgments but to leverage external expertise - graphic designers were hired and user studies were conducted. The article itself situates the STAR system very well in the general context of user interface research at the time, showing its lineage, but also concurrent competing technologies. As for criticism, it was uncanny how many of my immediate concerns about the system while reading the paper were acknowledged and addressed by the authors just a few pages later. There is the danger of pushing the desktop metaphor too far - the life of data is not like the life of physical objects. Only allowing the user to act upon data in ways that have correspondences in the real world is limiting. There is also a problem of custom-tailoring a product too rigidly to an a priori model - what happens if the user's requirement profile changes - maybe as a function of becoming more computer proficient and reliant?
User Technology: From Pointing to Pondering, Stuart K. Card and Thomas P. Moran, ACM Conference on The history of personal workstations, 1986, pp. 183-98
Card and Moran outline a detailed "applied science of the user" - user behavior and processing capacity are rigorously studied. An interesting question at the level of their "conceptual interface": how can we as system designers/developers ensure that the user will have a reasonably accurate mental model of the system? The authors also point out the difficulty of aggregating scattered individual research studies from, e.g., psychology, into a unified computational model of human user behavior. An assumption is made that the user acts rationally to fulfill the given tasks. Is this always the case? When does irrationality come into play? Can we model it? Problematic on a technical level: repetitive use of terms that are overloaded with multiple, imprecise meanings in popular usage ("task") makes it hard to follow the flow of the argument at times.
Words, words, words:
London's Sunday Times Magazine has a great column titled "The best of all possible words". My two favorites from today's issue: "neoteny" (1|2) and "nyctophoniac" (1). Words are taken from this book (apparently only available in the UK).
It is acceptable in American English to use apostrophe+s to indicate plurals of abbreviations, not just possessives of nouns and indefinite pronouns or contractions. Two examples from today's NYT:
"No Wonder C.E.O.'s Love Those Mergers"
"As Smith sang, however, the sadness that flooded his five CD's swamped the room"
It looks very awkward to me.
What is the difference between impudence and insolence?
[written for and published at textone.org]
in mid-june, i visited berlin for the launch of the german creative commons licenses. as part of the launch panel, i gave a brief presentation about how CC licenses are already pervasively used in the netlabel scene today. in preparation, thinking about how to best explain the motivations behind the quick adoption of the creative commons model by artists and labels online, i came up with three good reasons for three distinct constituencies - promotion for the artists; community building for the music scene today; and future-proofing for the scene of tomorrow - which i will describe below. while i hope to hit on some unifying themes, i do not want to overgeneralize and cannot claim to represent other artists or labels. these are my personal convictions and i do encourage everyone to contribute their own opinion in the discussion thread linked to this article. most of what follows here is based on CC's music sharing license, with exceptions explicitly noted.
reason #1: promotion
this is the most direct and self-serving motive for the producer: releasing music online and allowing listeners to share that music with others has to make sense for the artist, otherwise the model will not find widespread use. well, for independent niche music it does make sense. most of the artists creating the kind of minimal electronic music we promote at textone cannot live off the profits from releasing records alone. many, if not most, of the physical records put out by small indy labels barely break even. instead, most income from music is generated by performances in clubs or other concert venues. however, to get booked, artists need to build a reputation through a discography first. releasing records then is more about reputation than about direct remuneration. but one can actually reach a larger audience by publishing works online for free. listeners are more likely to seek out new material if this comes at no cost to them and they will share the music with others if they are actively encouraged to pass the music on via file sharing networks, on cd, or however they want. this adds up to both wider and faster diffusion of the music. textone.org is evidence of the effectiveness of the strategy - our download numbers have by far eclipsed our previous vinyl record sales.
reason #2: community
communities live and die by the interaction between their members. innovation is facilitated by having a sense of what already exists. creativity in general never arises out of a void - it always incorporates prior experience and exposure. to build a vibrant, innovative, creative music scene we should foster interaction with each other and encourage artistic exchange. CC licenses create a positive, conducive environment for doing so. to clarify why this is the case, let me contrast the netlabel scene with the mainstream music market: we are not interested in creating the kind of artificial distinction between producers and consumers that is promoted there. we are not interested in building one-way pipelines that push out products conceived by the marketing masterminds to the lowly masses. in electronic music, where the means of production are available to nearly anyone with a computer, each listener is also likely to be a producer, or to turn into one in the future. the distribution system for such a kind of music should therefore reflect this equiposition of artists and audience. by building a system based on respect and trust rather than intimidation and litigation, a fair and open licensing scheme such as CC creates the positive base for future interaction. by contributing to the catalog of publicly available material, CC licensed material also facilitates creative exploration. one of the responses to my last article made the point that most physical records are simply not available and/or not affordable in south america. i have heard similar complaints from friends in eastern europe. by using free non-commercial distribution, we can build a truly international community where membership is not contingent upon living in the western, industrialized part of the northern hemisphere, or upon having a big bank account.
i should mention though that one of the most important steps towards true open collaboration has not been widely adopted thus far: the permission to create derivative works. a blanket license for remixing and otherwise altering existing works would surely spawn a wide range of interesting projects. however, it also raises some thorny issues about attribution that worry established artists in particular. since reputation is the main currency in the scene, having your name attached to a re-made track that you do not approve of is not an enticing thought. maybe derivative licenses are not for every work, but we could sure use a few more offerings.
reason #3: future-proofing
time for a reality check: how many of today's netlabels will still be around in five years? hopefully a good number, but almost certainly not all of them. how about in 15 years though? or 50? the indy market has always been characterized by a high fluctuation rate brought about by economic pressures (go pro or go broke), among other factors. therefore we should already think today about what will happen to our music tomorrow, if/when particular artists or labels are no longer around. picking up the thread of art always arising from the history of prior creations, we should be interested in making sure that future generations have full access to the music we create now. creative commons licenses ensure that this happens. many works published under the restrictive traditional copyright regime are in danger of being "orphaned" for an obscenely long time if the exclusive copyright holder dies or disappears. without a legal way of distributing and sharing these works, most basically disappear from the public's collective memory for so long that they are unlikely to be remembered/retrieved after the copyright has expired. in contrast, any work released under a creative commons license that allows for non-commercial distribution is more likely to survive since any single copy can legally spawn a future "re-release". as long as some user somewhere still has one copy of a CC work, the art is not lost - no matter if the artist is still around or not. longterm initiatives like the internet archive, which only offers material in the public domain or licensed under a non-commercial distribution scheme, increase the chances of a transmission of our work through time. thus a sense of history and continuity is created and we avoid depriving the future of the achievements of today.
the idea here is to take the practice of peer-reviewed publishing known from academic journals and apply it to a netaudio label. a pool of competent authors (i.e., artists, musicians themselves) judge a pool of submitted songs for inclusion in a compilation. an editor would assign multiple reviewers to each song, and each reviewer would receive a subset of the submitted tracks. note that this is different from collaborative filtering where the audience expresses preferences for already published material. issues to resolve include: anonymous submissions, (?); other aspects that may be translated: invited papers, book reviews. the netaudio competitions hosted by (ref?) and soulseek already incorporate some of the aspects of a peer reviewed netaudio journal.
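the assignment step could be sketched roughly as follows in python. this is only an illustration of the idea, not a worked-out process: the function name, the greedy "least busy reviewer first" rule, and the rule that artists never review their own track are all my assumptions, and the track and reviewer names are made up.

```python
import random


def assign_reviewers(tracks, reviewers, reviews_per_track=3, seed=0):
    """Assign each track to several distinct reviewers.

    tracks:    dict mapping track title -> submitting artist
    reviewers: list of reviewer names (artists may also be reviewers)
    Returns a dict mapping track title -> list of assigned reviewers.
    Assumption: nobody reviews their own submission.
    """
    rng = random.Random(seed)
    load = {r: 0 for r in reviewers}  # tracks assigned to each reviewer so far
    assignment = {}
    for title, artist in tracks.items():
        eligible = [r for r in reviewers if r != artist]
        if len(eligible) < reviews_per_track:
            raise ValueError(f"not enough eligible reviewers for {title!r}")
        rng.shuffle(eligible)                 # random tie-breaking
        eligible.sort(key=lambda r: load[r])  # stable sort: least busy first
        chosen = eligible[:reviews_per_track]
        for r in chosen:
            load[r] += 1
        assignment[title] = chosen
    return assignment


# hypothetical submissions: four tracks, four artists who also review
tracks = {"track_a": "ana", "track_b": "ben", "track_c": "cleo", "track_d": "dev"}
reviewers = ["ana", "ben", "cleo", "dev"]
for title, chosen in assign_reviewers(tracks, reviewers, reviews_per_track=2).items():
    print(title, "->", chosen)
```

the editor would still have to handle the open questions above by hand, e.g. stripping artist names from the files before sending them out if submissions are meant to be anonymous.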
The New York Times just published its annual summer reading suggestions online here. My personal non-fiction picks, from reading the abstracts, are below:
A CONTINENT FOR THE TAKING: The Tragedy and Hope of Africa. By Howard W. French.
THE CREATION OF THE MEDIA: Political Origins of Modern Communications. By Paul Starr. (Basic Books, $27.50.)
EATS, SHOOTS & LEAVES: The Zero Tolerance Approach to Punctuation. By Lynne Truss. (Gotham, $17.50.)
ON PARADISE DRIVE: How We Live Now (and Always Have) in the Future Tense. By David Brooks. (Simon & Schuster, $25.)
SOMETHING FROM THE OVEN: Reinventing Dinner in 1950s America. By Laura Shapiro. (Viking, $24.95.)
In the next two weeks I will be speaking/performing at two internet culture conferences - Free Bitflows in Vienna and Wizards of OS3 in Berlin. For a quick introduction I wrote about these events (in German) see http://www.phlow.net/arc/001032.php. The Free Bitflows presentation and all associated links will be at http://bjoern.org/freebitflows/.
(to be revised)
Paul Ekman, "Emotions Revealed", Owl Books ISBN: 080507516X
Ekman is an authority on the display of emotion on the human face. This mass market paperback summarizes his research findings as well as his personal theories on emotion for a general public. While the topic is engaging, the writing is somewhat repetitive and often wordy in a colloquial way. The book could have been condensed significantly.
The first part, chapters 1-4, outlines Ekman's trajectory as a researcher and introduces his understanding of emotions. A visit to New Guinea in the 1960s was a formative period for Ekman's research focus. According to him, there are a few basic categories of emotion that have universal corresponding expressions in the human face. Emotions are short term events that are triggered and take control over our thought process until the end of their refractory period. Individual differences may be large, but emotions in general cannot be completely controlled. Ekman suggests developing an emotional consciousness he calls _mindfulness_, in which we cannot stop our emotions but we do make the choice of acting on the emotions or merely observing them.
In the second part, starting with chapter 5, Ekman describes his basic categories of emotion: sadness and agony, anger, surprise and fear, disgust and contempt, enjoyable emotions. For each emotion (or pair thereof), Ekman introduces situations in which the emotion may arise and exhaustively lists variants of the emotion. He then asks the reader to induce the emotion in herself and to take note of physiological changes caused by the arousal. Next, facial expressions linked to the emotion are presented in a lego-like system of independent components. Ekman describes how presence of various components can be caused by different shades/variations in the underlying emotional state. The last subsection of each chapter, entitled "using information from expressions", repeatedly stresses the point that while we may be able to recognize an emotion from its signal, we cannot infer the cause of the emotion. For this step, context has to be taken into account. The author also discusses how to react to perceived emotional signals, often suggesting not a direct acknowledgment that the emotion was perceived, but an invitation to elaborate on the topic potentially having caused the emotion. For anger, this elaboration should not be immediate to avoid escalation.
The conclusion (not yet read) offers guidelines on "living with emotion" and tests the reader's sensitivity to facial signals of emotion.
From Business Day, South Africa, South Africa - 30 Dec 2003
"South African police have warned that they will crack down hard on anybody lobbing items such as old refrigerators from high-rise buildings over the New Year ..."
...a couple of days old, yet still hilarious.
[DRAFT VERSION]
While recently on vacation in Tanzania, I found myself listening in on a conversation of a group of anglophile retirees reminiscing about their time working in the Middle East 30 years ago. Somehow the topic of discussion turned to cars and I was astounded to hear that my hosts still remembered exactly in which year which particular car model was released; how many units were sold when and what modifications were made in subsequent editions. They probably even knew the whole list of available factory paint finishes. For me on the other hand the automotive market is a vast ocean of sameness that I do not care to delve into. (That was not always the case - I remember poring over catalogs from local dealerships when I was ten years old, but this little detail completely detracts from my main point, so please erase this entire sentence from your short-term memory.) What is important is the huge generational discrepancy of interest in cars that I perceived. Now, this might have simply been due to the fact that these people were a bunch of car nuts while I am a flag-bearer for public transport, but such a cul-de-sac of common sense argumentation would stop my entire train of thought here and I would have nothing more to write about. Instead, allow me to take you on a mental detour to an alternative explanation that may be far-fetched but makes up for it in overly broad ambition.
Leaving the particulars of the occasion aside for a minute, it seems to me that each generation has a) a formative period for establishing a certain world view which subsequently is clung-to by its constituents even as technological progress transforms everyday life; and b) a specific dominating social paradigm that controls notions of what is important/relevant during the formative period.
Cars have been around since the end of the 19th century; but only after WWII did they come within reach of the average consumer. The paradigm associated with automobile proliferation was that of individual transportation - one had the means to go wherever one wanted, whenever one wanted. And not just around the block - across the country if desired. It makes sense then that cars, as objects AND agents of this positive transformation, should have become the subject of adoration/intense study/etc. for the generation for which their wide-scale availability was a novelty.
For my age set, the convenience of ubiquitous means of transportation has always been taken for granted. The car is a commodity product, but only one of a range of possible mobility options. In fact, in urban areas, it has now often become a quite inefficient (slow and expensive) method of navigating over-crowded centres. Amsterdam has twice as many bicycles as it has registered cars. In New York, most of the city's inhabitants don't even have a driver's license. For me as a mid-20s city dweller, the car does not have the charismatic power of a dominant technology anymore. Instead, predictably, it is communication/information technology that most dramatically transforms my life. Accordingly, I can tell you much more about the history of the web and the rise and demise of various file sharing applications than about V8 versus V6 engines. To take the point even further, I know more about ways to encode and decode audio files than I know about handling any real world object. Vegetables? Cooking? Ah, let's just order take-out. [CONTINUE]
From here I can already see myself jumping off to another point about our new existence in a world of excessive information - the shift in retrieval and filtering methods brought about by instant access to vast amounts of data, but also the daily overload of information impinging on our feeble minds. Ah, even the headline emerges: "I don't recall: How Google replaced my short-term memory."
Yet another topic: Mass markets, economies of scale, and the resulting decline of specialized solutions - are we losing most of our practical knowledge that took so many centuries to acquire? - What are our survival chances if electricity did not exist anymore from tomorrow on? Are we putting too many of our eggs into a virtual model of a basket that disappears as soon as the power drops out?