We live in a post-COVID-19 world where physical distancing, work from home, and contactless interactions have become the new norm. As a result, voice automation has seen a great deal of innovation and attention. One company working on voice AI solutions is Vernacular.ai, founded by alumni of IIT Roorkee. The co-founders are Akshay Deshraj, Prateek Gupta, Manoj Sarda, and Sourabh Gupta.

Vernacular.ai is an AI-first SaaS business that aims to enhance customer experience through intelligent voice conversations. Their vision is to build a unique voice AI platform that enables a multilingual audience to engage with interfaces online.

Designed By Keerti Charantimath

They have picked call centers as the first vertical to go after. Call centers traditionally have high costs and high attrition rates, and end users find IVRs frustrating and wait times irritating.

"Large enterprises in India will be spending $30 million – $50 million per year on call centers. But I would say getting a cab in five minutes is much higher than getting connected to a call center in five minutes, and this is where our product VIVA comes in and solves not just costs but also customer satisfaction," said co-founder Sourabh Gupta in an interview with Telegraph.

They have developed two major products: Virtual Intelligent Voice Assistant (VIVA) and Vernacular Automated Speech Recognition (VASR).

Vernacular.ai team Source: yourstory.com


VIVA helps accelerate engagement strategy and uses cutting-edge speech recognition and Natural Language Understanding (NLU) technology. It is an intelligent, multilingual platform that can help automate 80% of call centre operations.

VIVA has been trained on 1,00,00+ hours of speech data, and it keeps learning as it is used. It has been deployed to help enterprises boost customer stickiness and loyalty through a deep understanding of the customer’s context and intent.

Source: Vernacular.ai Website

VIVA supports 10 Indian languages and has features such as streaming and synchronous cognition, and content and intent identification. Streaming and synchronous cognition analyses conversations in both online and offline cases. Content and intent identification allows VIVA to identify the real meaning of the user’s speech input for the given domain. Its speech characteristics feature, which builds an understanding of each user, makes hyper-personalization of calls possible.


VASR enables enterprises to convert audio to text by applying powerful neural network models through an easy-to-use Application Programming Interface (API). The API recognizes 10 Indian languages to support the enterprise user base. VASR forms the foundation of the company's conversational AI platform.

The company describes it as “Vernacular Automated Speech Recognition that truly listens and analyses.” VASR raises flags in conversations in addition to spotting keywords and generating insights. It also supports context and sentiment analysis. Many of its features are the same as VIVA's, such as streaming and synchronous cognition, content and intent identification, and speech characteristics.

Technology behind Vernacular.ai:

We have seen the features and customizations that VIVA and VASR offer, and these are possible because of the underlying technology. The three significant challenges in building VIVA and VASR are incorporating the style factor in conversation, answering tough but essential questions, and keeping interactions smooth even when external problems arise. To solve these problems, Vernacular.ai has developed three technologies: the Idiolect layer, Contextual conversation clustering (C3), and Conversation Monitoring.


Idiolect layer:

Successful human conversations go beyond context. Both understanding and articulation are essential and must be incorporated. The idiolect layer plays a role in this.

Idiolect is formally defined as ‘speech habits peculiar to a particular person.’ While communicating, people use their understanding of these habits and patterns to derive an appropriate style along with the content, and the two together go out as a response.

VIVA’s engine understands various aspects of a caller’s idiolect in real time. These include named factors such as age and gender, latent factors, and the standard lexical components needed for regular Spoken Language Understanding (SLU). This understanding, along with the semantics collected from SLU, helps in generating a response.

Contextual conversation clustering (C3):

A variety of challenging but essential questions are asked in call centres. The response is usually compared manually by auditing/quality teams. This problem is solved using C3.

C3 scans through an unlimited number of calls and groups requests based on over 40 parameters. These groups help identify hidden conversational patterns in a user-bot session.

The identified patterns are then used to generate situational reports of why a particular pattern is happening, provide an actionable suggestion to a human-agent in case of a transfer, and power the conversation monitoring technology. Even after transferring, C3 learns from the human agent’s interactions.
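The grouping step can be illustrated with a small clustering sketch. This is not Vernacular.ai's implementation: the article does not say which algorithm or which parameters C3 uses, so plain k-means is swapped in here, and the three features (turn count, silence ratio, bot retries) are hypothetical stand-ins for the 40+ parameters mentioned above.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means. The first k points seed the centroids, so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical per-request features: (turn count, silence ratio, bot retries).
requests = [
    (2, 0.05, 0), (3, 0.10, 0), (2, 0.08, 1),   # short, smooth exchanges
    (9, 0.40, 4), (10, 0.35, 5), (8, 0.45, 3),  # long, troubled exchanges
]

labels, centroids = kmeans(requests, k=2)
for request, label in zip(requests, labels):
    print(f"request {request} -> group {label}")
```

With real data the features would need to be normalised first so that no single parameter dominates the distance, and the number of groups would have to be chosen or estimated rather than fixed at two.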

Conversation Monitoring:

Customer interactions aren’t always smooth because of factors such as background noise, insufficient knowledge on the part of the conversational agent, or a hard-to-understand accent. Thus, it is important to keep an eye on these conversational agents. Monitoring is usually done manually in call centres by auditing/quality teams. Conversation monitoring automates this process.

The technology works by continuously monitoring an ongoing call and classifying it as a good or bad conversation based on over 70 conversation-level features. It generates a score for each call, where a score of 0 means the call went badly and a score of 100 means the call went well.
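A scoring scheme of this kind can be sketched as a weighted sum of features squashed onto a 0 to 100 scale. The feature names, weights, and the threshold of 50 below are all assumptions made for illustration; the article does not disclose the 70+ features or the model that Vernacular.ai actually uses.

```python
from math import exp

# Hypothetical weights: positive values push the score towards a good call.
WEIGHTS = {
    "intent_confidence": 3.0,   # how sure the bot was about the caller's intent
    "noise_level": -2.0,        # background noise, 0.0 (quiet) to 1.0 (loud)
    "repeat_requests": -0.8,    # times the caller had to repeat themselves
}

def call_score(features):
    """Map conversation-level features to a score between 0 and 100."""
    z = sum(WEIGHTS[name] * value for name, value in features.items())
    # The logistic function keeps the result strictly between 0 and 100.
    return round(100 / (1 + exp(-z)))

def classify(score, threshold=50):
    """Label a call as good or bad; the threshold is an assumed cut-off."""
    return "good" if score >= threshold else "bad"

smooth_call = {"intent_confidence": 0.9, "noise_level": 0.1, "repeat_requests": 0}
rough_call = {"intent_confidence": 0.3, "noise_level": 0.8, "repeat_requests": 3}

for name, call in [("smooth", smooth_call), ("rough", rough_call)]:
    score = call_score(call)
    print(f"{name} call: score {score}, classified as {classify(score)}")
```

Because the score can be recomputed whenever the features are updated, the same function could be called repeatedly during a call to flag a conversation that starts going wrong.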


Identifying bad conversations is crucial, since disappointing customer service may hamper customer loyalty and damage the brand image. This technology helps to catch these instances early, before they can make a significant impact, and helps to ensure that customers have the best experience and that the voice bot gets the maximum possible satisfaction score.

Which industries is Vernacular.ai working in?

Vernacular.ai has a presence in many industries, namely banking and insurance, food and beverage, travel and hospitality, DTH, internet service providers, and even online gaming. They create customized, industry-specific solutions by integrating their voice and conversational intelligence into products through an independent platform that is always learning.

Their technology provides numerous benefits in these industries by increasing workers’ productivity and preventing the loss of customers due to long waits for customer service. It reduces the number of calls that call centres receive and thus reduces operating costs. It also stops interested customers from going to other providers by reducing delays in their calls.

Vernacular.ai’s self-sufficient, 24x7 service increases return on investment by freeing up agents to focus on converting high-quality leads. It improves customer experience by proactively guiding customers throughout the call, and it helps improve operations through customer feedback. A demo of their service can be heard on their website: https://vernacular.ai.

With their recent Series A funding, Vernacular.ai is planning to expand to newer markets in the US and Southeast Asia and develop their R&D. They are also planning to double their team to help drive their goal of becoming the leading voice automation/AI platform in the world.

"Voice automation has taken center stage for customer engagement for enterprises. This is also a time when voice-processing technologies need to evolve and communicate better than ever before. COVID-19 has accelerated this shift from touch to talk," said the co-founder in an interview with Telegraph.