Now we’re Talking! Voice is the New Buzz

Season 1 Episode 11 | 23 minutes


When Apple released its voice interface in late 2011, watching somebody shouting at their phone repeatedly was good comic relief. Many of us would wait for a quiet moment alone before daring to ask our phone a question.


Today, via the likes of Alexa and Siri, more and more of us are confidently using voice as our UI. We search, call, compose and send messages through an interface that is arguably the most natural we have.

More than half of India’s Internet users are doing it. Voice search is seeing massive growth in Asia – India, China, and Indonesia being the top three. And importantly, voice is used more by the younger generation: under 35s are leading the change.


Why should we be listening to the talk about voice? What are the threats? What are the opportunities? What does it mean for the UX writer or chatbot developer?


In this episode Rew Shearer and Jam Mayer touch on the topic. And why not? Podcast is the new blog.

Episode Conversation

Topics that were discussed:

  • Hootsuite Q3 2019 Global Digital Statshot
  • Digital trends and the move to mobile
  • Alexa and Siri for voice command and search
  • Google Search, SEO and Answer box
  • Voice bots and conversation design
  • Human voice and nuance
  • Voice technology

Introduction

Rew (00:00:02.985)

When you think about it, voice is the only pre-programmed biological user interface we humans have. Think about it. The first thing we do when we come out of the womb is scream. It's our first search inquiry. And the result, well, it's usually right on the mark.


Rew (00:00:24.795)

Welcome to the Conversologist podcast, where we talk about the art and science of conversation in the digital space. We know that technology can be a powerful enabler in the customer journey, from marketing to customer service, but communication and emotional connection still need to be at the core. I'm your host Rew Shearer, this week with Chief Conversologist Jam Mayer, and I invite you to converse with us.


Rew (00:00:55.785)

Welcome to Episode 11...11...11. Voice is the new buzz


Rew (00:01:02.655)

Now Jam, hello.


Jam (00:01:04.745)

Hello.


Rew (00:01:05.985)

Just a heads up. When I listened back to my last podcast, about how I turned my friend into a chatbot... You remember that?


Jam (00:01:11.835)

Yeah


Rew (00:01:12.465)

I nearly put myself to sleep. Okay. I sounded that bored. So I'm going to try and sound a lot more animated and energetic this time around.


Jam (00:01:21.475)

Okay.


Rew (00:01:22.395)

How appropriate is that? Because we are indeed today talking about voice and all of its wonderful nuances. Now, this has woken up my inner nerd and it's got him reaching for his notebook and clicking his pen. This Jam is my jam.


Jam (00:01:38.115)

Yeah, definitely, with all of your experience. But I'm not surprised, we're fellow nerds. So let's start off with some statistics, shall we?

Technology Conversations are Going to Evolve to Voice

Jam (00:01:48.165)

So, being a Hootsuite ambassador and all that: according to the latest Hootsuite Q3 Global Digital Report, done by Simon Kemp in partnership with We Are Social (thank you, Simon, as always), more and more people are using voice interfaces like Siri and Alexa.


Jam (00:02:08.505)

It's intuitive, it's easy. Everyone can do it. We're actually doing it right now. So it's one of those trends that is inevitable. The conversations we have through technology are going to evolve more and more to voice.


Rew (00:02:21.015)

Right.


Jam (00:02:21.765)

As marketers, what do we need to be aware of? What should we be doing and what skills are we going to need for this next chapter in the digital story?


Rew (00:02:31.165)

OK, so guys, here's how it works. Since Jam is the brains and I'm just the beauty (don't tell me she's both, because I need something to hold on to), I'm going to ask the questions for the most part. Jam's going to give the answers as best she can, and I'll probably jump in from time to time with my own nerd-isms. And go, Jam. The Q3 2019 Global Digital Report with We Are Social: what does it say about voice? What are the trends and where are we headed?


Jam (00:03:01.065)

We know it's growing. Globally, on average, 43% of Internet users are actually using voice search and voice commands regularly. I think there are about 3.4, or is it 3.5, billion people who are already active, just to give you some context.


Rew (00:03:19.945)

That's a big, big number.


Jam (00:03:21.315)

The biggest group in terms of age, which I'm not really surprised by, is the 16 to 24 year old bracket. 52% of them have used it in the last thirty days, and it just goes down as you go through the age groups, which again is not surprising.


Rew (00:03:36.615)

Right.


Jam (00:03:37.005)

And if I'm not mistaken, the growth of voice was actually picked up by We Are Social in January this year as a trend to watch. So we are definitely seeing that. Basically, you could say that as voice interpretation gets better and devices become more available, more and more people are picking up on how easy it is to use voice as an everyday interface, as you say. You know, you can multitask, you can search, walk and look at the same time and, hopefully, drive safely and search.


Rew (00:04:10.335)

Oh, yeah, I've done that asking Google Maps for a location while I was driving.


Jam (00:04:14.895)

How did it go, by the way?


Rew (00:04:16.575)

Let's just say Google Maps and I didn't have a great start. They say AIs are not self-aware yet. I say it has got too much of a mean streak for me to believe that. Google Maps is like, "Here, you go anywhere I tell you," and I was driving in circles.


Jam (00:04:32.565)

Yeah, well, I kind of still like Google Maps, but that's just me. Yeah.

Digital Trends and the Move to Mobile

Rew (00:04:36.705)

Voice is also popular in developing countries for another reason, and you're going to ask, why is that?


Jam (00:04:42.285)

Why?


Rew (00:04:42.795)

Why? Literacy. Yes. If you check, you'll find that 80% of adults around the world are now literate. Now, let's stop one moment and appreciate that. OK, this is stepping away from the podcast, because while everyone talks about the world going to hell in a handbasket, we have things like a higher literacy rate than at any other time in human history, by far, even compared with the past few decades, and more than half the world with a smart device in their hands, using the Internet. Isn't that amazing? Isn't it cool?


Jam (00:05:13.245)

Yes.


Rew (00:05:14.205)

But anyway, there's literacy and there's literacy. Just because you can read and write doesn't mean you're quick at entering a query with your thumbs, and people who don't have super-flash thumbs, such as yourself, Jam Mayer, appreciate the ease and fluency of voice.


Jam (00:05:28.895)

I think you are also a fast thumb typist.


Rew (00:05:32.245)

I do have speedy little thumbs, I have to say so.


Jam (00:05:36.175)

So there was a live interview recently with Simon Kemp about languages that aren't as easy to type in quickly, like Chinese, which I believe was his example.


Rew (00:05:47.395)

Yep, that's right.


Jam (00:05:48.385)

What's interesting is that they're doing this on their smartphones, and desktop, not so much, which kind of fits the profile of the people leading the trend, the young and the restless, or people in developing countries where a smartphone is their only interface with the Internet.


Rew (00:06:07.495)

So they've got the smartphone in their hand using their voice. What are they doing exactly with voice?


Jam (00:06:13.855)

I don't know. How should I know? They're searching for, like, nearby businesses, where to buy or to eat, seeking answers, how to get somewhere. Like, oh, are you going in circles? They're searching for videos, content to watch, everyday stuff that we're all relying more and more on our smartphones to do.


Rew (00:06:33.745)

E-commerce too?


Jam (00:06:34.765)

Yes, voice is used in that capacity. But the big two are information exchange, so voice messages and voice commands, and search. I think search is by far the biggest.


Rew (00:06:46.375)

Big implication there. It just, like, smacked me right in the forehead. I don't know if you heard it. You ask your phone a question and you want it to tell you an answer. The answer, THE answer, one answer. Isn't that kinda scary?


Jam (00:06:58.855)

No! Me, scared? No, it's not in my vocabulary. I get really excited.


Rew (00:07:05.555)

OK


Jam (00:07:06.535)

But obviously there are threats to the status quo, and if you're not onto it, it could hit you hard, like smack on your head. So it's bad news and it's good news at the same time. We know that there are probably about 10,000 people working on Alexa and Google's one answer right now as we speak.


Rew (00:07:26.065)

There's a big office. I wonder if it's open plan.


Jam (00:07:29.605)

Oh, I can just imagine they've got their bot. But anyway, so significance.


Rew (00:07:34.315)

Yeah


Jam (00:07:34.795)

As you mentioned right at the beginning, with the baby crying, generally a voice search needs to be properly understood. And unlike a traditional Google search with us typing as fast as we can, I don't want you to give me 20 million answers. I want you to give me the right answer.


Rew (00:07:52.765)

Right.


Jam (00:07:53.695)

So that's what all this AI behind it is heading towards. You know, you see it at work already. If you start to ask a question on Google now, you'll see the answer in the dropdown search queries.


Rew (00:08:06.865)

Yeah.


Jam (00:08:07.645)

Now if we're talking about questions with absolute answers, what's the capital of Sweden?


Rew (00:08:13.055)

Stockholm.


Jam (00:08:13.555)

There's not much AI needed, right? OK, so you're going to get an answer that hits the mark and you'll be satisfied. But with even more complicated questions: why is the sky blue?


Rew (00:08:24.715)

Because the shorter wavelength of the blue.


Jam (00:08:27.355)

Right. Right. OK, it's really going to come down to a comprehensive answer in simple language, the kind people can digest and are satisfied with. The challenge is going to be for, say, marketers such as ourselves and businesses. I mean, SEO is changing. This isn't anything new. It's a progression of what we've been seeing anyway. So when everyone was using desktop, SEO was all about getting into the top 10 results, right? First page, not counting, obviously, the ads, which are now taking so much space.


Rew (00:08:58.985)

About half of the first page. Yeah.


Jam (00:09:02.065)

And there's, of course, Google's answer box. If you can actually get Google to place your page onto it. You know those answer boxes? I mean, just for people who don't know what it is. It's just all of a sudden those FAQs and you've got a link to a Web page and oh cool, Google answer box. Click. Traffic. People won't even bother to look at the other results.


Rew (00:09:23.095)

If I can jump in. What I've actually also read about that is that that's actually killing traffic because often the answer is provided so well in the answer box, they don't bother clicking through to the source.


Jam (00:09:36.985)

With the move to mobile as well, it's even more relevant. It's not about the top ten anymore. It's probably top five, maybe top three, to be honest, and you've got the keywords, snippets, rich content that people want to engage with. And there is the position zero.


Rew (00:09:54.355)

Position zero. Gulp, that sounds scary. I'm scared of everything today. I'm just scared. Tell me.


Jam (00:10:00.655)

It's like the default answer. It's the answer that gets read aloud, so to speak. And unless the person searching turns it down and wants another result, it's the only answer. It's the snippet, basically, and this results in fewer clicks because the answer is already there with a little bit of information, and they decide, oh no, that's enough, or just move on.


Rew (00:10:23.245)

So position zero is THE answer, the one that I was talking about when I asked this question. Position zero is: you're so damn good at SEO, or something happened with your website, and people love your website so much that your answer or your solution is the one that Google decides it's going to tell you about in voice, correct?


Jam (00:10:42.605)

Yep. And of course, I'm not an SEO expert here, so I'm calling out to all SEO experts: maybe you'd like to talk more about that? This isn't an SEO podcast.

What does the Voice Trend mean to Conversologists?

Rew (00:10:52.245)

No, it's not. Yeah. Let's bring it back to the Conversologist. What does it mean to us?


Jam (00:10:57.855)

OK, so personally, for the Conversologist, I say it's good news. Talking about ranking algorithms, as they get more sophisticated and now that there's voice, obviously it will be more about engagement and user feedback.


Rew (00:11:11.055)

Right.


Jam (00:11:11.655)

That's where meaningful interactions, the kind we Conversologists nurture, become more important.


Rew (00:11:17.745)

OK


Jam (00:11:18.435)

By interactions, I mean the kind of conversation where you're seeking feedback that you can then show in ways that Google can see and track.


Rew (00:11:26.285)

OK


Jam (00:11:26.775)

Let's use an example. Let's see, because I'm hungry right now while we're recording this.


Rew (00:11:33.195)

Right


Jam (00:11:33.585)

You own a burger bar.


Rew (00:11:34.815)

All right? Yep, I own a burger bar, I'm going to call it Rew Burgers.


Jam (00:11:38.605)

It kind of sounds like Australian Roadkill.


Rew (00:11:42.915)

Australian Roadkill. That's a great name for a burger. I love it!


Jam (00:11:45.975)

I don't even want to know what it is. But anyway, maybe part of your strategy is a voice bot.


Rew (00:11:52.035)

Right.


Jam (00:11:52.545)

It can take your order and recommend something. Remember past orders. Maybe they don't even need to come in store. The bot will even arrange delivery. But it also says, hey, it was great chatting with you. Mind if I come back in half an hour or so and ask you, how was your burger?


Rew (00:12:09.435)

Now, I'm actually hearing all of that in an Australian accent. Ey, it was great chatting with you how was the burger?


Jam (00:12:15.825)

You can have a fun conversation with the bot already. So you say, sure, why not? And the bot comes back and asks you to review Australian Roadkill or something.


Rew (00:12:25.485)

Australian, right. Yeah. Yeah. With the Australian road kill.


Jam (00:12:28.095)

Yeah, exactly. And this is all voice. Remember that. You say the burger was awesome and you've got your review by voice alone.


Rew (00:12:37.845)

Wow


Jam (00:12:39.015)

The thing is, OK, without a natural, engaging automated conversation you're not going to get the review, and before you know it, Rew Burgers is gone-burgers.


Rew (00:12:48.875)

I worked so hard on that place too.


Jam (00:12:50.205)

Oh I really want a burger now with cheese.

Creating Engaging Chats with Voicebots

Rew (00:12:54.045)

Wait. OK, because this is something, all right. It's one thing to create engaging conversations in chat, which is what we try and do all the time, but voice? Whoa, whoa. Yeah. I mean...


Jam (00:13:08.415)

What are you doing?


Rew (00:13:09.195)

I'm making noises. But you knew what I meant, right, when I made those noises.


Jam (00:13:12.975)

Sure..


Rew (00:13:13.935)

You did.


Jam (00:13:14.805)

OK, so you meant this is going to be a big challenge and it's scary, but exciting at the same time.


Rew (00:13:21.015)

Right. And you got that bit of the message without any words. This is where the big challenge is if you ask me, OK, for us, Conversologists. If we're going to step up to the challenge of voice response and really engaging voice bots, in a way, we're all going to have to become something between copywriters and musical composers, because here's the thing.


Rew (00:13:44.775)

Jam, say "I really love being a Conversologist" but make it passionate. Go.


Jam (00:13:50.775)

I really love being a Conversologist.


Rew (00:13:54.435)

OK, now do it sarcastic.


Jam (00:13:55.815)

I really love being a Conversologist.


Rew (00:13:58.905)

And now determined like you're trying to convince yourself.


Jam (00:14:03.285)

What are you doing? You're making me do stuff.


Rew (00:14:07.095)

Another example, you and I have had a few misunderstandings through messaging with our thumbs and WhatsApp, right?


Jam (00:14:14.025)

Well, I'd say probably arguments, not misunderstandings.


Rew (00:14:18.045)

Just not exactly arguments, just heated realizations that you're always right.


Jam (00:14:22.885)

Thank you very much.


Rew (00:14:24.105)

The fact is, there's a lot more understanding and a lot less friction when we're talking in voice. And that's because it's not just the words as it is with messaging. To put it simply, when we talk, we sing.


Jam (00:14:39.285)

Did I tell you I belonged to a musical theatre company? I love that time.


Rew (00:14:42.375)

Pay attention. Pace, timing, pitch, loudness, even vocalizations like cadence and vocal fry. They all help communicate a message. All the aspects of voice.


Jam (00:14:54.855)

Sorry, you got me at vocal fry because I'm hungry. I want fries. When you read a message in text, you can interpret it with your own emotions.


Rew (00:15:05.385)

Right. If I'm feeling something negative, I'm much more likely to read what you've written as being negative or sarcastic, even if that wasn't your intention. If I hear you say it, I hear your register, I hear the energy in your voice, even the subtleties like brighter consonants, because you are smiling at the time, there's only one interpretation, right?


Jam (00:15:27.855)

Right.


Rew (00:15:28.645)

So the key to truly meaningful conversations is realistic inflection and tone. Now, if you're trying to put that into a voice bot: I trawled online for good speech generators and none could come up with anything even remotely approaching the nuance of true human speech.


Rew (00:15:48.685)

They all sound pretty much a bit like this.


Voice Bot (00:15:51.745)

I really love being a Conversologist.


Rew (00:15:54.615)

Yep.


Jam (00:15:55.495)

Pretty expressionless.


Rew (00:15:57.625)

It's one thing for machine learning to interpret a customer's mood from listening to their voice, which it can do now. It's a mammoth task generating mood, nuance and inference in artificial speech, because there are so many possible ways to deliver it, so many combinations. And let's not lose sight of the fact that those nuances change from country to country, dialect to dialect.


Jam (00:16:22.105)

And a lot of my inflections as a speaker of Americanized English are going to be a lot different from yours or a Kiwi's.


Rew (00:16:29.665)

Programming that stuff convincingly and knowing when to apply which tone to really convey meaning and create engagement is no easy thing.


Jam (00:16:38.395)

I mean, it's hard, and even harder to decide the exact tone appropriate for the user's mood. I mean, do you match sarcasm with kindness or with cuteness? Do you match sadness with sympathy or positivity?


Rew (00:16:50.545)

Exactly. A lot will depend on context. The point is, with voice, conversations and user experience are going to a whole new level. And that's where we Conversologists are really going to shine, because the conversations we design will be the ones that stand out and get the results.

Human Voice and Nuance

Jam (00:17:11.725)

One more thing, I guess, around that is conversation design right now. I know there's a lot going on in the text arena and, of course, voice bots are coming into the game, and digital humans, et cetera. But we have to rely on technology. As you said, there's nothing yet. Well, I haven't heard of it. If there's anyone out there, please prove us wrong.


Jam (00:17:35.095)

If there's anything that's as close to human speech as possible, where it's easy to design it, etc. But in terms of voice, will the technology catch up to us humans?


Rew (00:17:45.715)

And if it does, when it does, what will it look like? The way I see it, this can go three ways if, as you say, Jam, we the conversation designers are going to create truly engaging automated voice conversations, with all of the light, shade and nuance it's going to need to sell the Australian Roadkill burger, right?


Jam (00:18:04.645)

Right.


Rew (00:18:05.455)

Option 1: prerecorded voice. This is basically the way a simple rule-based chatbot operates already with its quick reply buttons. It's also the way an IVR system works if, say, you're ordering pizza through a voice system over the phone by hitting the number pad. That's pretty straightforward as a conversation designer. It all exists now. You just need a good script and a talented voice.


Rew (00:18:28.585)

Option 2: AI really comes into its own and actually begins to master the natural sounds of speech. Given the size of the challenge, I personally think that one's a long way off. Option 3, and please, I hope somebody is working on this: here's where I talked about us Conversologists as being like composers. You can write your script, but there's an additional user interface that lets you control elements like pitch, register, speed and the emphasis on words through a sentence.


Rew (00:18:59.575)

This is where a conversation designer becomes, like I said, a composer.


Jam (00:19:04.915)

Yay


Rew (00:19:05.605)

Because it's almost like writing music. Where does it go up? Where does it go down? Where does it kind of waver in the middle? Now, am I just dreaming? Will all those technologies be here in a year's time? Five years? Or will it never exist, because it turns out there's actually no demand for it?


Jam (00:19:23.725)

That's a question for you to think about.


Rew (00:19:25.735)

I will think about it. I promise.


Jam (00:19:28.675)

All right.

Closing

Rew (00:19:29.095)

Now, just before we go, we'd like to acknowledge the voice messages left to us by Nathan, who gave some great feedback on some previous episodes, including Why Do We Hate Chat Bots and, Jam, your interview with Hillary Black of Black Ops. Now, among other things, Nathan pointed out that maybe we shouldn't offer the option to talk to a human at the start of the flow. Nathan, thank you. That was a valid suggestion and, client willing, one that makes an awful lot of sense.


Rew (00:19:59.785)

Now, Nathan also said the following:


Nathan Recorded voice (00:20:01.915)

I really enjoyed the part where we were talking about how people actually would prefer to talk to a chatbot. I do think that there's this element of wanting to be a little bit anonymous, also not wanting to waste somebody's time. If you have dumb questions that a chatbot can answer, you're much more likely to want to talk to that chatbot. So you're not embarrassing yourself in that conversation.


Rew (00:20:26.995)

Jam, do you want to talk to that one?


Jam (00:20:29.295)

Sure. And Nathan, by the way, thank you. And just for those who are listening, Nathan is one of our podcast listeners. Thank you so much. I remember him saying, I think, that he actually listened to all of our podcast episodes in one go. WOO HOO! Thank you.


Rew (00:20:44.148)

Binge Listen.


Jam (00:20:44.715)

Love ya! With regards to Nathan's comment, it sort of depends. It's a case by case basis because it depends on the personality of the person, the human being, and the comfort level of each person. For example, I for one agree that I don't mind talking to a chatbot for, like, simple questions. And because I am a very impatient person, you know that, and I like the instant gratification, I'll try the chatbot first if available, 'cause it's there 24/7. So it doesn't matter if it's three o'clock in the morning here, someone's going to answer me and help me with a correct answer. But what I find too is, if you're someone who doesn't have all the time in the world, a chatbot is OK versus a human, or else you might just end up spending so much time speaking to that person. And with my experience in call centers, BPO, contact centers and everything, it also depends on how good the rep is, right?


Jam (00:21:42.955)

So with a chat bot, easy peasy, it knows and interprets your questions. It has the answer. You get it. If not, then that's the time you actually get a human being. So yeah, it really depends. Do you have anything to add to that?


Rew (00:21:56.025)

Yes, I do. In terms of the dumb questions, I thought that was a really good point, that people might be more comfortable talking to a chatbot if they have what they believe is a dumb question, because humans being humans, sometimes you would rather not ask the question and screw everything up colossally...


Jam (00:22:15.315)

Right.


Rew (00:22:16.035)

...than ask the question and risk being mocked, laughed at or getting an audible roll of the eyes as you ask it. So, Nathan, I think that was a pretty valid point. Thanks for making it.


Jam (00:22:27.945)

Thanks, Nathan. OK, Rew.


Rew (00:22:30.225)

Right. That's it for this podcast. Thank you for listening. If you're on Anchor.FM, you can do like Nathan and leave a voice message. We will listen and we will respond. If you found us on social or your usual podcast app, drop a comment, join the conversation, or you can visit our page, beautifully created by Jam, TheConversologist.Show, and tell us what you think. The music bed was composed by Carlos O'Gara. And this podcast and images were produced by me.


Jam (00:23:00.915)

Yahoo!


Rew (00:23:01.215)

Till the next episode. Thank you for listening and keep talking.
