By Preston So
When was the first conversation? What WAS that first conversation? Spoken language has left no trace on our planet. We have evidence from early humans that indicates the kind of cultural mannerisms that characterize the human species today, but we don’t have any evidence of when language became part of the human condition.
About 1.5 to 2 million years ago, humans gained control of fire.
The earliest writing came around 3-4 millennia ago.
But there is a big abyss here between the beginning of controlled fire and the beginning of writing.
The origins of spoken language have left no trace.
We don’t know what the topic of the first conversation was, what language was used, where it happened…
Unlike spoken human language, we do know quite a bit about how computers and machines started to speak. Conversational interfaces have long been part and parcel of computer systems. But one thing that did not happen until much later was the development of voice interfaces and voice interactions.
When it comes to the true proliferation of voice interfaces, those only appeared starting in the early 1990s as part of interactive voice response (IVR) systems built on top of dictation programs and in-car systems. They were difficult to build, unaffordable, and very complex.
But now the world is almost unrecognizable in that we have a litany of devices and frameworks that have democratized voice interfaces unlike ever before. We now have the ability to program voice assistants and leverage frameworks like VoiceXML and Dialogflow.
It’s never been easier to build a voice interface, and now the audience we serve in our industry is beginning to catch up.
As of June 2019, about 25% of all American adults have a smart speaker at home. The adoption of voice interfaces may accelerate thanks to the pandemic.
Preston believes we’re witnessing the beginnings of a growth in voice interfaces that will make the delivery of content ubiquitous.
Voice interfaces are essentially human interfaces. They leverage the prime currency of how we communicate today.
What makes our human conversations so compelling?
What exactly is spoken content, and how does that jumble up all of the preexisting paradigms we have around user experience and design?
What is in a conversation?
When it comes to conversation, we know it’s a back and forth. Conversation is primordial. It is deeply ingrained within the human experience. It is one of the most fundamental aspects we can recondition.
“Conversation is not a new interface. It’s the oldest interface. Conversation is how humans interact with one another, and have for millennia.” – Erika Hall, Conversational Design
If you take a good look at controllers like keyboards and mice and game controllers, you’ll see how artificial they are. Not a single one of us was able to leave infancy and know at the root how to use a mouse or a video game controller. These behaviors are learned, they are taught.
In the case of earphones, it’s fairly clear what purpose that interface serves, but how do we know that there’s a volume control, or how to change the size of the earpiece if it doesn’t fit our ear? We also have to learn how to use touch screens and to perform different actions to achieve different things.
Augmented reality and virtual reality also require us to have some knowledge of how to use technology to engage with those experiences.
VOICE IN CONTEXT
We must distinguish between VOICE and other conversational interfaces.
Written conversational interfaces like chat bots and Slackbots still require the use of a learned interface: keyboards.
Multimodal conversational interfaces contain both visual and aural components to aid the user experience but they are still artificial interfaces. Ex: Amazon Echo
Some of this artificiality has led to situations we may not be big fans of at the moment.
Voice interfaces are the only authentically human interfaces. All of us grow up learning how to speak or sign a language. Voice interfaces are the ones that require the least amount of learning. They don’t require us to teach each other about how these interfaces work.
But how do we teach machines how to have a human spoken conversation? Paul Grice’s conversational maxims are a useful starting point:
* Quantity – provide just enough information without overwhelming anyone with too much of it
* Relation – stay on topic, stay relevant
* Quality – honesty, integrity, how do you keep up a conversation that maintains a modicum of truth? Be truthful.
* Manner – you want to be brief, orderly, take turns, make sure the other party has the ability to enter into the conversation, and be unambiguous.
* Politeness (Lakoff’s principle) – be polite. Is it pleasant? Is it a conversation we enjoy having? Is it a pro-social affair?
Preston is a trained linguist. There’s a figure in linguistics, Robin Lakoff, who said, “All language is political, and we all are, or had better become, politicians.”
She is famous for having investigated the notion that gender and sexuality dictate the ways in which we use language and the ways in which language influences us.
Lavender linguistics investigates the way the LGBTQ community uses language.
Lakoff has framed a 5th conversational maxim (see above).
Computers don’t really care about any of these maxims that relate to the human experience. “People exploit their human-human conversational competence, but they don’t treat the machine as another human.” – Randy Harris, Voice Interaction Design
We KNOW there are certain cues that betray the mechanical nature of voice interfaces that are machine based.
A TAXONOMY OF CONVERSATIONS
The vast majority of all conversations fit into 3 rough types:
1. Transactional – Task Led
2. Informational – Topic Led
3. Social / Pro-Social – these are merely about expressing empathy for your co-human.
A conversation is rarely JUST one type; conversations often shift from one type to another and back.
As someone very interested in content who plays in the sandbox of web content, Preston is most interested in these investigative journeys to uncover the truth. The vast majority of conversations we have with voice interfaces and each other are either transactional OR informational.
How do we enable our machines to embrace the complexity of human conversation?
Voice Interaction Styles:
1. System-centric (only interprets queries and commands)
2. Conversation-centric (capable of natural, colloquial human conversation)
Conversation-centric design is simply unrealistic for anyone who isn’t Amazon. Most organizations simply cannot afford to build a conversation-centric voice interface.
We need to shake off our web-only shackles in favor of channel agnosticism. An omnichannel content strategy means leveraging content-first approaches to lay a foundation for, dictate, and inform each individual channel’s approach.
Structured content – you want to be able to write content in a way that is predestined for various different formats and devices.
CMSs can now handle teasers vs. full cards vs. full articles of content. Remixing this content into a voice-compatible format requires us to shed some of the formats we used to use.
Microcontent – Minimal, concise, repeatable, low verbosity, also works well in social media, voice, extended reality, digital signage, kiosks, etc.
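To make the idea of structured content and microcontent concrete, here is a minimal sketch, not something shown in the talk. It assumes a hypothetical content model with teaser, card, and full-article variants, and a hypothetical mapping from channel to variant. The names ContentItem, CHANNEL_VARIANT, and render are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """One piece of structured content with length variants (hypothetical model)."""
    title: str
    teaser: str  # one-sentence microcontent
    card: str    # short summary
    body: str    # full article text


# Hypothetical mapping from delivery channel to the variant it receives.
CHANNEL_VARIANT = {
    "voice": "teaser",
    "smartwatch": "teaser",
    "digital_signage": "card",
    "web": "body",
}


def render(item: ContentItem, channel: str) -> str:
    """Return the variant suited to the channel; unknown channels get the teaser."""
    variant = CHANNEL_VARIANT.get(channel, "teaser")
    return getattr(item, variant)


item = ContentItem(
    title="Office hours",
    teaser="We are open weekdays from 9 to 5.",
    card="We are open weekdays from 9 to 5. Weekend visits are by appointment.",
    body=(
        "We are open weekdays from 9 to 5. Weekend visits are by appointment. "
        "Call the front desk to arrange a time that suits you."
    ),
)

print(render(item, "voice"))  # the shortest variant is what gets spoken
```

Falling back to the shortest variant for unknown channels is a design choice in this sketch: a new channel is more likely to cope with too little content than too much.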
Note: Some of the voice technologies we rely on are much lower level than HTML.
Voice content leverages all the sonic qualities of human speech.
Look for voice-friendly content. We can take a look at the voice-friendly content we already have and address the issues that need to be fixed to become voice-ready (structured content ready for parsing by voice interfaces).
With voice content, you have to keep things microscopic so that there’s no chance of a listener losing their place. Listeners tolerate less verbosity and retain less from voice content than from web content.
Content audits should take an omnichannel approach. The core objective is to place your content in other contexts and scrutinize it for head-scratchers and obstacles.
Content Auditing Techniques:
* Voice (read aloud content in contextless isolation)
* Extended reality (preview overlay content in real spaces)
* Digital signage (test display on a billboard)
* Smartwatches (read content in isolation in small viewports)
Look for links and calls to action, missing content, verbosity, and whether it is easy to transition between devices.
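Part of a voice audit could be roughed out in code. The sketch below is an illustration only, not a tool from the talk: it flags links, visual references, and long sentences in a piece of content. The 20-word threshold, the phrase list, and the function name audit_for_voice are all assumptions, and a script like this does not replace reading the content aloud.

```python
import re

# Hypothetical threshold and phrases; tune them against real listener testing.
MAX_WORDS_PER_SENTENCE = 20
LINK_PATTERN = re.compile(
    r"https?://\S+|\bclick here\b|\bsee (?:above|below)\b",
    re.IGNORECASE,
)


def audit_for_voice(text: str) -> list[str]:
    """Return human-readable findings for content that will be read aloud."""
    findings = []

    # Links and visual references have no obvious equivalent on a smart speaker.
    for match in LINK_PATTERN.finditer(text):
        findings.append(f"visual reference or link: {match.group(0)!r}")

    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sentence in sentences:
        word_count = len(sentence.split())
        if word_count > MAX_WORDS_PER_SENTENCE:
            findings.append(
                f"verbose sentence ({word_count} words): {sentence[:40]!r}"
            )

    return findings


for finding in audit_for_voice("Click here to register. Our office is open daily."):
    print(finding)
```

The findings are only prompts for a human reviewer. Whether a flagged call to action can be reworded for voice still has to be decided case by case.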
Calls to Action can really be problematic in a voice interface, because how can you ‘do’ that call to action on an Alexa, for example?
Voice interface users don’t have the ability to scroll up a bit to find the answer to their question. We have to recontextualize everything so it makes sense for those users.
Look at how users might interact with other devices to get information when interacting with a voice interface.
Acting on recommendations from an audit requires careful cross-channel consideration. Does a revision for voice stymie an experience of content on the website?
YOUR VOICE CONTENT HAS AN IDENTITY, TOO
Voice interfaces are basically disembodied voices. They’re synthesized. Voice content and voice interfaces are among the only digital experiences where a fully fledged human identity is expected of the machine.
We don’t expect our keyboard or mouse to have an identity, but we certainly do when we talk to our machine.
Voice ‘assistants’ – when you hear Alexa, Cortana, or Siri, who is it you are picturing having the conversation with?
How do our voice interfaces uphold or destroy systems of oppression that we face today?
The fact that we associate voice assistants with white women is a reflection of intrinsic misogyny and racism in how we build voice experiences.
The way we build our interfaces reflects deeply how we view the real world.
Voice content adds to your responsibilities as designer and content strategist. Representation matters in voice interfaces, too! Representation matters in contexts where we need to serve not only our primary audience but also marginalized and oppressed audiences.
In voice, we are crafting a person with an identity. This identity is critical to how we engage and experience the interfaces we build.
Giving your content a voice can give the unheard a voice too.