Evaluation Techniques for Interactive Systems.

Apr 16, 2022

What is Evaluation👓?

Have you ever thought about how many times a day you interact with software products? From turning on your smartphone screen to check the time, to staring at your monitor for hours with a word processor or IDE to finish a college or company project, we interact with software products countless times per day. Since these interactions take up a considerable part of our day, they need to be pleasant and productive.

Let me explain this with a very simple example. Who wants to spend half of their day with an annoying, stressful, cranky person? That’s why people learn soft skills to interact with each other more pleasantly and productively. The same thing applies to software products too. Nobody wants to spend their time on a frustrating, inefficient and annoying software product. That’s why we need to ensure that the interaction between human and computer is effective, efficient and also enjoyable 😉. That’s where ‘Evaluation’ comes in. Evaluation is the process of assessing designs and systems to ensure that they actually behave as designers expect and meet their users’ requirements.

Goals of evaluation

Now we have a basic idea of why evaluation is needed: to ensure that the interaction between the human and the product is effective and efficient. To be more specific and precise, we can divide the goal of evaluation into three points.

1. Assess extent of system functionality (📚 Vs 💻)

Software products are introduced into a situation to increase the productivity of human tasks. Therefore getting things done with a software product must be easier than doing them the usual way. This includes not only making the appropriate functionality available within the system, but also making it clearly reachable by the user in terms of the actions the user needs to take to perform the task.

People also had a way of doing the same task before the software product came into the picture. The two ways of doing the task should be similar, because that makes it easy for people to adapt to using the software product to get the task done.

2. Assess effect of interface on user 🎭

It is very important to assess the user’s experience of the interaction and its impact on the user. Since the interaction exists to help people, it should satisfy user expectations in every possible way: how easy the system is to learn, its usability, and the enjoyment and emotional response it produces, especially for systems aimed at leisure or entertainment.

3. Identify specific problems 🔍

It’s humans who use these systems, and it’s very natural for humans to make mistakes. So the system should not frustrate them over small mistakes, and it should not confuse them with its behavior. Evaluation should pinpoint the specific parts of the design that cause such problems. This is related to both the functionality and usability of the design.

Now let’s get an idea of how this evaluation process happens. There are basically two ways of doing these evaluations.

Without the participation of the real users. (Evaluation through expert analysis) 🕵️‍♀️
With the participation of the real users. 🙎‍♀️

These two categories have their own pros and cons. Let’s discuss them in more detail to get a clear idea.

1. Evaluation through expert analysis

This evaluation process happens without involving the real users. You may see this as a drawback, but the methods used here are very efficient and help to build a quality interaction between human and computer.

A) Cognitive Walkthrough

This method was proposed by Polson et al.

This is one of the most efficient and cost-effective ways of increasing the usability of a system. Most users prefer to learn a product by doing things with it rather than by reading a manual or following a set of instructions. This evaluation therefore checks that the design is easy for a novice to pick up and that it takes little time to become an expert in using it.

How to conduct a Cognitive Walkthrough?

An expert ‘walks through’ every possible path of the design to understand what problems a user could face. The expert must think from the perspective of a potential user for the evaluation result to be accurate. For that reason the person who conducts it is usually an expert in cognitive psychology.

For each task walkthrough the expert should consider:

🔸 What impact will interaction have on user?

🔸 What cognitive processes are required?

🔸 What learning problems may occur?

To do the walkthrough, the expert needs a specification or prototype of the system, a description of the tasks, and a written list of the actions needed to complete each task with the proposed system.

A sample cognitive walkthrough

B) Heuristic Evaluation

This evaluation method was proposed by Nielsen and Molich. There are 10 well-defined usability heuristics. Several experts independently examine the design to see whether any of these heuristics are violated; three to five evaluators are usually enough.
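To make this a bit more concrete, here is a minimal sketch of how the findings of a few evaluators could be merged into one ranked problem list. Everything in it is an assumption made for illustration: the evaluators, the problems, the 0 to 4 severity scale and the function name `rank_problems` are hypothetical. Only the heuristic names come from the well-known list of ten.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical findings: (evaluator, heuristic violated, problem, severity).
# The 0-4 severity scale is an assumption for this sketch.
findings = [
    ("E1", "Visibility of system status", "No progress bar on upload", 3),
    ("E2", "Visibility of system status", "No progress bar on upload", 4),
    ("E3", "Error prevention", "Delete has no confirmation", 4),
    ("E1", "Error prevention", "Delete has no confirmation", 3),
    ("E2", "Consistency and standards", "Two different icons for save", 1),
]


def rank_problems(findings):
    """Group findings by problem and rank them by mean severity."""
    by_problem = defaultdict(list)
    for _evaluator, heuristic, problem, severity in findings:
        by_problem[(heuristic, problem)].append(severity)

    rows = [
        (heuristic, problem, len(scores), mean(scores))
        for (heuristic, problem), scores in by_problem.items()
    ]
    # Most severe first; ties are broken by how many evaluators reported it.
    return sorted(rows, key=lambda row: (row[3], row[2]), reverse=True)


for heuristic, problem, reporters, severity in rank_problems(findings):
    print(f"{severity:.1f}  ({reporters} evaluators)  {heuristic}: {problem}")
```

Merging the lists like this is also a quick way to see why more than one evaluator is used: no single evaluator in the sample data found all three problems.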

C) Review-based Evaluation

This is a method that relies on experimental results and empirical evidence from the literature (for instance from psychology, HCI, etc.) in order to support or refute parts of the user interface design.

A related expert technique is model-based evaluation. This means using a model of how a human would use a proposed system to obtain predicted usability measures by calculation or simulation.

Such models can be used to filter design options. Design rationale can also provide useful evaluation information in that filtering process.
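As an illustration of "predicted usability measures by calculation", here is a small sketch in the style of the Keystroke-Level Model, one well-known model of this kind. The article itself does not name a specific model, so treat the choice as an assumption. The operator times are commonly cited approximations, and the two designs being compared are hypothetical.

```python
# Commonly cited Keystroke-Level Model operator times, in seconds.
# Treat them as assumptions: real values vary with user skill and device.
OPERATOR_SECONDS = {
    "K": 0.2,   # press a key or button (skilled typist)
    "P": 1.1,   # point at a target with the mouse
    "H": 0.4,   # move the hand between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}


def predict_seconds(operators):
    """Predict task time by summing the time of each operator in sequence."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Two hypothetical designs for the task "save the document".
menu_design = "HMPKMPK"   # reach for mouse, open the File menu, click Save
shortcut_design = "MKK"   # think, then press Ctrl and S

for name, operators in [("menu", menu_design), ("shortcut", shortcut_design)]:
    print(f"{name}: about {predict_seconds(operators):.2f} s")
```

A prediction like this does not replace testing with users, but it can help rule out a clearly slower design option before anything is built.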

2. Evaluation through user participation

These are the methods that involve real users in evaluating the design. There are two styles of involving users in the evaluation process, and like the categories above, each has its own pros and cons. Let’s discuss them in detail now.

1. Laboratory studies 👩‍🔬 (Those performed under laboratory conditions)

In this case users are invited to a special environment to take part in the evaluation process. The main advantage here is the availability of specialist equipment for the evaluation, such as sophisticated audio/visual recording and analysis facilities and two-way mirrors. The main drawback, as you can imagine, is that it is hard to imitate the real user environment inside a lab. Even so, this style is the more appropriate one when the system’s real location is dangerous or impractical, for constrained single-user systems, and when you want controlled manipulation of how the system is used.

2. Field studies 👩‍💼 (Those conducted in the work environment or ‘in the field’)

The ability to conduct the evaluation in the natural environment is the biggest advantage here. But there can be distractions such as high levels of ambient noise, greater levels of movement and constant interruptions, such as phone calls. This style is highly appropriate where context is crucial and for longitudinal studies.

Experimental Evaluation

An experiment is a controlled evaluation of specific aspects of interactive behavior.

• The evaluator chooses a hypothesis to be tested.
• A number of experimental conditions are considered which differ only in the value of some controlled variable.
• Changes in the behavioral measure are attributed to the different conditions.

The experimental factors to decide on are:

• Subjects: who takes part; the sample should be representative and sufficiently large.
• Variables: the things to modify and measure.
• Hypothesis: what you’d like to show.
• Experimental design: how you are going to do it.
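Here is a minimal sketch of what analysing such an experiment could look like. It assumes a between-subjects design with two conditions (an old and a new menu, the controlled variable) and task completion time as the behavioral measure. The data are made up, and Welch's t statistic is just one reasonable choice of test, not something the method prescribes.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical task completion times in seconds (the measured variable)
# for two versions of a menu (the controlled variable).
times_old_menu = [41.0, 38.5, 45.2, 39.9, 43.1, 40.4]
times_new_menu = [35.2, 33.8, 37.5, 36.1, 34.0, 38.3]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t = welch_t(times_old_menu, times_new_menu)
print(f"mean old = {mean(times_old_menu):.1f} s")
print(f"mean new = {mean(times_new_menu):.1f} s")
print(f"t = {t:.2f}")
```

The hypothesis here would be "the new menu is faster". A large positive t supports it, but turning t into a p-value needs the t distribution, which a statistics package would normally provide and which is left out of this sketch.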

Observational Methods

1. Think Aloud

In this method the user is asked to perform a task and to describe what they are doing and why, what they think is happening, and so on. It is relatively simple, yet it can provide useful insight into problems with an interface and show how the system is actually used. The major disadvantage is that the information is subjective, because it depends on what the user chooses to say.

2. Cooperative evaluation

“Cooperative evaluation” is a think aloud variation in which the user is encouraged to see himself as a collaborator rather than just a subject in the evaluation. The evaluator can ask questions like “Why?” and “What if…?” in addition to encouraging the user to think aloud; similarly, the user can ask the evaluator for clarification if problems arise. This more relaxed approach has a variety of benefits. It is less constrained and hence easier for the evaluator, who is not compelled to sit in solemn silence; the user is encouraged to actively criticize the system rather than passively accept it; and the evaluator can clarify points of uncertainty as they occur, maximizing the approach’s effectiveness.

Usually the evaluator here is not the designer, but an independent person.

3. Protocol analysis

Protocol analysis is one of the most successful approaches for evaluating an information system’s usability and determining which components of the system should be modified to increase usability. The “protocol” in protocol analysis refers to the comprehensive recording (in textual, audio, and/or video form) of a user’s interaction with a system while that user “thinks out loud”, so that his or her perceptions, reasoning, and reactions to the system can be recorded. The “analysis” is provided by the researcher, who evaluates a number of these protocols (3–5 for each type of user) and draws conclusions about the system characteristics that cause problems for users.

4. Automated analysis

Workplace project

Post-task walkthrough: the user reacts to their actions after the event, which is used to fill in their intentions.

Advantages: the analyst has time to focus on relevant incidents, and excessive interruption of the task is avoided.

Disadvantages: lack of freshness, and the comments may be a post-hoc interpretation of events.

5. Post-task walkthroughs

Here a transcript or recording of the session is played back to the participant for comment. If this happens immediately, the events are still fresh in the participant’s mind; if it is delayed, the evaluator has time to identify questions to ask. This method is useful for identifying the reasons for actions and the alternatives considered, and it is necessary in cases where think aloud is not possible.

Query Techniques

1. Interviews

The main point of an interview is to let the real users talk. Usually the conversation is based on prepared questions. Sometimes these questions are open-ended, which allows users to express their ideas and comments freely, but sometimes respondents are asked to choose an answer from a fixed series of options given by the interviewer. (This form of interview is very similar to a closed questionnaire.) A fixed structure yields information that is easily quantified, ensures comparability of questions across respondents and makes certain that the necessary topics are covered. But it may prevent respondents from expressing what they really mean. Therefore the interviewer must strike a balance between the two kinds of question.

Interviews are a relatively cost-effective way to explore issues. But the answers given by users can be very subjective, and the method is somewhat time-consuming.

2. Questionnaires

This is a quick method that can be applied to a large set of users. It works by giving users a set of fixed questions. Those questions can be:

📝 general

📝 open-ended

📝 scalar

📝 multiple choice

📝 ranked

You might feel that this method is less flexible than interviews. But the data collected this way can be analyzed more rigorously.
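To show what "analyzed more rigorously" can mean in practice, here is a small sketch that summarises answers to scalar questions. The question wording, the 1 to 5 scale and the answers are all hypothetical, and `summarise_question` is just a name chosen for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical answers to two scalar questions on a 1-5 scale
# (1 = strongly disagree, 5 = strongly agree).
responses = {
    "The system was easy to learn": [4, 5, 3, 4, 4, 2, 5],
    "I could recover from mistakes": [2, 3, 2, 1, 3, 2, 4],
}


def summarise_question(scores):
    """Return the sample size, central tendency and answer distribution."""
    return {
        "n": len(scores),
        "mean": round(mean(scores), 2),
        "median": median(scores),
        "counts": dict(sorted(Counter(scores).items())),
    }


for question, scores in responses.items():
    print(question, summarise_question(scores))
```

Because every respondent answers the same fixed questions, summaries like these can be compared directly across questions, user groups or design versions.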

Evaluation through monitoring physiological responses

1. Eye Tracking

Where the user is looking is one of the most informative things to capture during evaluation. Head- or desk-mounted equipment tracks the position of the eye. Eye movement reflects the amount of cognitive processing a display requires. The measurements collected include:

👁 Fixations :- the eye maintains a stable position

👁 Number of fixations :- the more fixations, the less efficient the search strategy

👁 Fixation duration :- indicates the level of difficulty with the display

👁 Saccades :- rapid eye movements from one point of interest to another

👁 Scan paths :- moving straight to a target with a short fixation at the target is optimal
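The measures in this list can be computed once the eye tracker has produced a list of fixations. The sketch below assumes that fixation detection has already been done and that each fixation is a hypothetical `(x, y, duration)` tuple in pixels and milliseconds. Real eye-tracking software will have its own data format.

```python
from math import hypot
from statistics import mean

# Hypothetical fixations already detected by the eye tracker:
# (x, y) screen position in pixels and duration in milliseconds.
fixations = [
    (120, 80, 220),
    (400, 90, 180),
    (410, 300, 450),
    (640, 310, 260),
]


def fixation_metrics(fixations):
    """Summarise a non-empty sequence of fixations for one task."""
    durations = [duration for _, _, duration in fixations]
    # Scan path length: distance travelled between consecutive fixations.
    scan_path = sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(fixations, fixations[1:])
    )
    return {
        "count": len(fixations),
        "mean_duration_ms": mean(durations),
        "scan_path_px": scan_path,
    }


print(fixation_metrics(fixations))
```

Comparing these numbers between two layouts for the same task gives a rough idea of which one lets users find the target with fewer fixations and a shorter scan path.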

2. Physiological Measurements

Here we measure emotional responses, which are linked to physical changes. These measures help determine a user’s reaction to an interface.

The measurements taken here are as follows:

🧡 heart activity, including blood pressure, volume and pulse.

🙌 activity of sweat glands: Galvanic Skin Response (GSR)

💪 electrical activity in muscle: electromyogram (EMG)

🧠 electrical activity in brain: electroencephalogram (EEG)

However, more research is needed in some of these areas before the measurements become useful, because the responses are difficult to interpret.

So, I hope this article gave you a clear idea about evaluation techniques for interactive systems. Until we meet again with another interesting article, stay motivated and keep learning! 😊

Waruni Lalendra,

Software Engineering undergraduate,

University of Kelaniya Sri Lanka.