APP Behavior


Apps inherit three major characteristics from Zenbo's interactive behavior: variations based on the distance of interaction, voice as the primary means of interaction, and the conveying of emotions. The functional design of apps should also comply with the following principles, which can be summarized into six essential aspects:

  • Zenbo is a family member with humanistic qualities.
  • Zenbo is capable of expressing a wide variety of emotions.
  • Zenbo is very proactive. He will inform the user of any important information or ask the user if actions should be taken rather than just wait for a command passively.
  • Zenbo is mobile and can take the initiative to locate a user to provide services.
  • Zenbo's distance of interaction is much farther than that of the average smartphone or tablet device, and voice commands are used as the primary mode of communication. The conventional user interface is only for secondary use.
  • Similar to a normal dialog between people, each instance of interaction with Zenbo lasts a relatively short period of time to avoid lengthy monologues. More importantly, do not treat Zenbo as a tablet computer with extended periods of touchscreen operations.


If Zenbo is unable to understand a command issued by the user, one of two courses of action is taken. If there are at most three possible options and all of them are simple commands (single words or simple phrases rather than complete sentences), Zenbo will attempt to ascertain the user's intention through spoken dialog. If there are more than three options, or if the commands are more complex, Zenbo will provide hints on the screen by displaying all possible commands for the current step, so that the user knows how to communicate with Zenbo through voice commands.
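The rule above can be sketched as a small decision function. This is only an illustration, not part of any Zenbo SDK: the function names are hypothetical, and the three-word cutoff used to decide whether a command counts as "simple" is an assumption, since the guideline does not give an exact definition.

```python
MAX_VOICE_OPTIONS = 3  # from the guideline: at most three possible options
MAX_SIMPLE_WORDS = 3   # assumption: a "simple phrase" has at most three words


def is_simple_command(command: str) -> bool:
    """Treat a command as simple if it is a single word or a short phrase."""
    return 0 < len(command.split()) <= MAX_SIMPLE_WORDS


def choose_clarification_mode(options: list[str]) -> str:
    """Return "voice" to clarify by speech, or "screen" to display all commands."""
    few_enough = 0 < len(options) <= MAX_VOICE_OPTIONS
    if few_enough and all(is_simple_command(option) for option in options):
        return "voice"
    # More than three options, or at least one complex command.
    return "screen"
```

For instance, `choose_clarification_mode(["Start", "Stop", "Pause"])` selects the voice path, while adding a fourth option or a full-sentence command falls back to on-screen hints.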

If a voice response is expected from the user but none is provided within 12 seconds, Zenbo will conclude that the user may not know how to respond and will proceed to provide either voice or UI-based hints. The user may also access hints in any state within the app by issuing the "Tutorial" command. For this reason, the developer must ensure that hints relevant to each state are available to the user via voice or the UI.
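One way to model the timeout and the "Tutorial" command is shown below. It is a minimal sketch under stated assumptions: the state names, the hint table, and both function names are hypothetical, and only the 12-second limit and the "Tutorial" command come from the guideline.

```python
from typing import Optional

HINT_TIMEOUT_SECONDS = 12      # from the guideline
TUTORIAL_COMMAND = "tutorial"  # compared case-insensitively

# Hypothetical hint table: one entry per app state.
STATE_HINTS = {
    "idle": ["Timer", "Stopwatch"],
    "timer_running": ["Pause", "Cancel"],
}


def next_action(state: str, heard: Optional[str], seconds_waited: float):
    """Decide what to do while a voice response is expected.

    `heard` is the recognized utterance, or None if nothing was said yet.
    Returns an (action, payload) tuple.
    """
    if heard is not None and heard.strip().lower() == TUTORIAL_COMMAND:
        return ("show_hints", STATE_HINTS[state])
    if heard is None and seconds_waited >= HINT_TIMEOUT_SECONDS:
        return ("show_hints", STATE_HINTS[state])
    if heard is None:
        return ("keep_listening", None)
    return ("handle_command", heard)


def states_without_hints(states: list[str]) -> list[str]:
    """List the states that have no hints yet; this should come back empty."""
    return [state for state in states if not STATE_HINTS.get(state)]
```

Running `states_without_hints` over every state of an app during development is a cheap way to confirm that each state offers hints.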


In a voice interface, the available commands may not be obvious to first-time users. Therefore, we recommend that every app present a tutorial session the first time it is used.

The tutorial must follow the pattern below, presented as a dialog between Zenbo and the user:

  • Most common voice command to enter the app.
    For example, “Timer”
  • (If applicable) Advanced voice commands to enter the app. These usually include the command together with additional information. Variables should be shown in a different color.
    For example, “5 mins” or “20 mins”
  • App functions. These should also start with Zenbo’s instructions, followed by lists of available voice commands. If the voice commands are responses to a GUI, the full-screen GUI is shown first and then shrinks into a thumbnail in the voice command dialog.
  • Ending page. On this page, the user should be able to select “Replay tutorial” or “Start” to begin using the app.
    ※ Please remember the user’s initial command and guide the user to the target page after the tutorial. Unless the user did not specify an intention (e.g., launched the app by tapping its icon), do not jump directly to a default landing page after the tutorial. Doing so may go against the user’s wishes and cause confusion. If possible, remind users of their initial command, because they might forget it after a long tutorial.
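The note on the ending page can be sketched as follows. The class, its field, and the returned dictionary keys are all hypothetical names chosen for illustration; only the two choices, "Replay tutorial" and "Start", and the rule about the initial command come from the guideline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TutorialSession:
    # The voice command that launched the app, or None if the user
    # launched it without stating an intention (for example, by icon tap).
    initial_command: Optional[str] = None

    def finish(self, choice: str) -> dict:
        """Resolve the user's choice on the ending page."""
        if choice == "Replay tutorial":
            return {"page": "tutorial", "command": None, "reminder": None}
        if choice != "Start":
            raise ValueError(f"unexpected choice: {choice!r}")
        if self.initial_command is None:
            # No stated intention, so the default landing page is acceptable.
            return {"page": "default", "command": None, "reminder": None}
        # Resume the original request and remind the user what it was.
        return {
            "page": "target",
            "command": self.initial_command,
            "reminder": f'You asked for "{self.initial_command}".',
        }
```

Storing the initial command on the session object before the tutorial starts keeps it available however long the tutorial runs.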

Here is an example of a standard tutorial.


Voice Interface

All of Zenbo's interfaces are primarily voice-based. When designing a voice interface, please adhere to the principles listed below:


It is necessary for Zenbo to reflect his personality. Although each app may be unique in its functionality, consistency in terms of Zenbo's personality is still required as he is a member of the family.

  • Zenbo is neither a machine nor an interactive voice response system.

    Don't:
    "The system cannot continue due to error #3012. Please say '1' to close or '2' to contact customer service"

    Do:
    "Zenbo doesn't know how to do this. Would you like me to contact customer service?"
  • Zenbo is very polite and does not use profanity.

    Don't:
    "F***, how many times do you want me to repeat it?"

    Do:
    "Would you like Zenbo to repeat it?"
  • Zenbo is very enthusiastic and always tries to help others.

    Don't:
    "I can't help you with this."

    Do:
    "I can't seem to locate the information you're looking for. Should I go online and try to find it on the Internet for you?"

Voice Interaction

Whenever Zenbo hears the call "Hey, Zenbo", he will send out an audio prompt to signal that the user can begin issuing commands to him. The length of a voice input is set to 12 seconds; when the time is up the user will hear an audio prompt that signals the end of the input period. When composing voice interactions with Zenbo, please follow these guidelines:

  • Keep it brief and clear; try to limit the length of Zenbo's utterance to 12 seconds or less, or to within three sentences.

    Don't:
    "You've entered the voice recording function. Please record your speech after the beep. When finished just tell me to 'stop recording' and that's it"

    Do:
    "Ready! Begin recording after the beep. Say 'stop recording' when finished."
  • Provide hints in the question as to how the user should respond. It is best to include the possible responses within the question itself. Avoid open-ended questions.

    Don't:
    "What additional adjustments do you require?"

    Do:
    "Please tell me if you'd like it to be a little higher, lower, or as is?"
  • Ask only for a small amount of information in a single dialogue.

    Don't:
    "What are the destination, arrival time, departure city, vehicle type, and desired fare range for your ticket?"

    Do:
    "What is your destination?"
    "What is the arrival date and time?"
    "I've located the following. You can narrow down your choices by ticket price or vehicle type."
  • Avoid reading text as displayed on screen (unless the app is configured to read articles or web content aloud). Voice and the UI should be complementary rather than duplicating the same information. Speech can generally be expressed with a bit of emotion and liveliness.

    Don't:
    "Please select the name of the album you wish to listen to"

    Do:
    "Oh, which album are you in the mood for?"
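The guideline on asking for only a small amount of information per dialogue can be illustrated with a simple slot-filling loop that asks one question at a time. The slot names and the function are hypothetical; the prompts are taken from the ticket example in the list above.

```python
from typing import Optional

# Hypothetical required slots, each paired with a single short question.
REQUIRED_SLOTS = [
    ("destination", "What is your destination?"),
    ("arrival", "What is the arrival date and time?"),
]


def next_prompt(filled: dict) -> Optional[str]:
    """Return the next single question, or None once every slot is filled."""
    for slot, question in REQUIRED_SLOTS:
        if slot not in filled:
            return question
    return None
```

Once `next_prompt` returns `None`, the app can present the results and offer optional refinements, such as ticket price or vehicle type, instead of asking for everything up front.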