Introducing Speech Summarization Powered by Domain-Specific Language Models
Josh Fox
tl;dr:
We’re announcing the public release of our first domain-specific language model (DSLM) for speech summarization of call center interactions
Fine-tuned using more than 200K domain-specific conversations
No token length or audio duration limits
See our new summarization model in our API Playground or contact us to learn more
Since our inception, Deepgram has been a foundational AI company on a mission to create the essential building blocks for Language AI that will power the future of human-computer interactions. Sure, we’re mostly known for our industry-leading speech-to-text models and API, but that was always just the first phase of our journey:
Phase 0: Develop an end-to-end infrastructure and operational pipeline to curate data, train deep learning models, adapt general models into custom-trained ones, and deploy/operate these models for customers.
Phase 1: Apply this process to voice data to produce transcripts with near-human accuracy across multiple languages, domains, and use cases.
Phase 2: Make AI-generated transcripts more legible to both humans and machines with enhanced formatting options and speaker diarization.
Phase 3 (current phase): Give users the most comprehensive understanding of what was said, how it was said, and who said it, using domain-specific language models (DSLMs) for high-level natural language understanding tasks such as summarization, sentiment analysis, and topic detection.
Unlike OpenAI, Anthropic, and Google, which are building massively scaled-up, general-purpose large language models (LLMs) such as ChatGPT, Claude, and Bard with hundreds of billions of parameters, we are taking a different approach. While these models are undoubtedly powerful, they are large, slow, and too expensive to serve specific use cases efficiently and accurately at scale, to say nothing of the safety, privacy, and security concerns that will inhibit widespread enterprise adoption.
In contrast, we are building domain-specific language models (DSLMs) trained on use-case-level data, with support for training on unique user-level data. These models provide several important benefits over their general-purpose counterparts:
Personalization
Superior accuracy on specialized topics
Low inference costs
Speed
Today we are proud to announce the general public release of our first such models for speech summarization of contact center and sales enablement interactions.
Deepgram Speech Summarization
The contact center industry is embracing AI-powered solutions to drive operational efficiency, cost reduction, and customer satisfaction. With the ever-increasing volume of customer interactions, contact centers are actively seeking new ways to manage and analyze these interactions efficiently. Contact center agents spend an average of six minutes of wrap-up time per call, manually updating notes from the call, documenting resolutions, and outlining next steps. This manual process leads to longer average handling times, overlooked details, and a worse overall customer experience.
To address these challenges head-on, we have developed a state-of-the-art DSLM-powered Summarization Model tailored specifically to contact center and sales enablement use cases. This model is now publicly available for pre-recorded English audio. With it, agents and supervisors can reduce average handling times, increase first-call resolution rates, and improve the overall customer experience. Our Summarization Model automates the process of summarizing customer interactions and extracting pertinent information at scale. This gives sales representatives highly accurate summaries of customer conversations, so they spend less time on administrative tasks and more time building meaningful connections with customers and prospects.
Our model has been carefully customized for the unique requirements of the contact center segment and offers several differentiated benefits:
Through extensive fine-tuning using more than 200K domain-specific conversations, our model surpasses the accuracy of alternative summarization methods for this segment.
No token length or audio duration limits, which is especially important in the call center space, where calls often run longer than the fixed context windows of current summarization solutions can handle.
Blazing-fast inference to support workflow automation.
Low cost per summary.
Unlike extractive approaches, which copy selected sentences verbatim from the transcript and therefore tend to miss context or produce misleading summaries, our DSLM-powered model takes an abstractive approach: it writes new sentences that capture the essence of the conversation. The resulting summaries convey the primary aspects of the conversation, including the reason for calling, agent responses, and identified follow-ups. The example below shows the difference between the two approaches.
Transcript:
[Speaker:0] Thank you for calling Honda dealership. This is Bob. How may I assist you today?
[Speaker:1] Hi, Bob. I'm interested in the new Honda Civic two thousand twenty three. Can you tell me more about it?
[Speaker:0] Of course, Jake. We have the Honda Civic two thousand twenty three available in white color. It's a great car with some fantastic features.
[Speaker:0] Would you like to schedule a test drive?
[Speaker:1] Yes. I would love to test drive it. But before that, do you have the hybrid model available?
[Speaker:0] Let me check our inventory for you. Can you hold for just a moment? Jake?
[Speaker:1] Sure. I can hold.
[Speaker:0] Thank you for holding. Jake.
[Speaker:0] Yes. We do have the Honda Civic two thousand twenty three hybrid available in our inventory. When would you like to come in for a test drive?
[Speaker:1] That's great news, Bob. I'm available this Friday at three PM. Can we schedule the test drive for that time?
[Speaker:0] Absolutely Jake. Let me get your contact information and we'll confirm the appointment by email or phone.
[Speaker:0] Can you please provide me with your name, phone number, and email address.
[Speaker:1] Sure. My name is Jake Smith. My phone number is five hundred fifty five one thousand two hundred thirty four, and my email address is Jake. Smith at email dot com.
[Speaker:0] Perfect. Thank you for that information, Jake. We have you scheduled for a test drive in the Honda Civic two twenty three hybrid this Friday at three PM.
[Speaker:0] We'll send you a confirmation by email shortly.
[Speaker:1] Thank you so much, Bob. I'm really excited to test drive the car.
[Speaker:0] It was my pleasure, Jake. We look forward to seeing you on Friday.
[Speaker:0] If you have any other questions or concerns, feel free to give us a call.
[Speaker:1] Thanks, Bob. Have a great day.
[Speaker:0] You too, Jake.
Alternative (extractive) summarization results:
The Honda Civic two thousand twenty three is a great car with some fantastic features. The hybrid model is available in white color.
Deepgram’s DSLM-powered Summarization Model:
The customer calls Honda dealership to inquire about the availability of the Honda Civic 2023 hybrid model. The representative confirms that they have it in white and offers a test drive on Friday at 3 PM. The customer provides their contact information and schedules the test drive for that time. The representative confirms the appointment and assures the customer that they will receive a confirmation email.
As this example shows, our DSLM-powered Summarization Model delivers results that more accurately capture the essence of the interaction and include important details that the extractive output lacks. Compared with LLM-based solutions, which have long latencies and a much higher cost per query, our model's low price and high speed enable reliable, efficient summarization that supports streamlined workflows.
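To make the extractive side of this comparison concrete, here is a minimal sketch of a classic frequency-based extractive summarizer. It is not the alternative system used in the example above and says nothing about how Deepgram's model works; the stopword list, the scoring rule, and the function names (`split_sentences`, `tokenize`, `extractive_summary`) are hypothetical choices for illustration. The key property it demonstrates is that an extractive method can only return sentences that already appear in the transcript, so it cannot restate who called, why, and what was agreed.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use much larger ones.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "it", "to",
    "of", "in", "on", "for", "with", "you", "i", "we", "me", "my", "your",
    "can", "do", "have", "this", "that", "at", "be", "by",
}

def split_sentences(text: str) -> list[str]:
    """Split text into sentences at terminal punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, verbatim and in original order.

    Each sentence is scored by the average frequency (across the whole text)
    of its content words, a simple extractive heuristic.
    """
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (highest first), keep the top k,
    # then restore transcript order so the summary reads chronologically.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

if __name__ == "__main__":
    transcript = (
        "Hi, Bob. I'm interested in the new Honda Civic two thousand twenty three. "
        "We have the Honda Civic two thousand twenty three available in white color. "
        "Would you like to schedule a test drive? Yes. I would love to test drive it."
    )
    for sentence in extractive_summary(transcript, k=2):
        print(sentence)
```

Whatever scoring rule is used, every output line is a copy of an input sentence, which is why extractive summaries of conversations tend to repeat product details while dropping the outcome of the call.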
By harnessing automated summarization, contact centers and sales enablement platforms can uncover critical insights that let leaders navigate thousands of conversations efficiently. Leaders can quickly identify calls that require in-depth review and follow-up, saving time and effort and making it easier to give agents targeted coaching.
To learn more, please visit our changelog or try out our new summarization model in our API Playground.
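For readers who want to call the model programmatically rather than through the API Playground, the sketch below builds a summarization request for hosted pre-recorded audio. The request shape is an assumption based on Deepgram's public pre-recorded API (a POST to the `/v1/listen` endpoint with a `summarize=v2` query parameter, `Token` authorization, and a JSON body containing the audio `url`), as is the response shape read by `extract_summary` (`results.summary.short`). The helper names are hypothetical; confirm every detail against the changelog and API reference before relying on it.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed pre-recorded transcription endpoint; verify against the API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"

def build_summary_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a summary of hosted audio.

    The `summarize=v2` parameter and the JSON `{"url": ...}` body are assumptions.
    """
    query = urllib.parse.urlencode({"summarize": "v2"})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=body,
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

def extract_summary(response: dict) -> Optional[str]:
    """Return the summary text from a parsed JSON response, or None if absent.

    Assumes the summary lives at results.summary.short with result == "success".
    """
    summary = response.get("results", {}).get("summary", {})
    return summary.get("short") if summary.get("result") == "success" else None

# Sending the request needs a real API key and network access, for example:
# req = build_summary_request("YOUR_DEEPGRAM_API_KEY", "https://example.com/call.wav")
# with urllib.request.urlopen(req) as resp:
#     print(extract_summary(json.load(resp)))
```

Keeping request construction separate from response parsing makes both halves easy to check offline and to adapt if the actual parameter names or response fields differ from these assumptions.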
If you have any feedback about this post, or anything else regarding Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions or contact us to talk to one of our product experts for more information today.