1. Executive Summary
This GigaOm Benchmark Report was commissioned by Zoom.
Artificial Intelligence has undergone a massive leap forward with the proliferation of generative AI and interactive language models. As recently as two years ago, AI solutions were available exclusively to large enterprises. Today, AI products are breaking ground as mass-market workforce multipliers that are available to end consumers and businesses of all sizes.
Zoom released its AI Companion in September 2023, and it is included at no additional cost with all paid Zoom plans. This report explores Zoom AI Companion as one of the first AI assistants to market in the industry and the value it can provide to organizations. The report will additionally provide insight into what to expect next, as Zoom is continuing to develop new features at an extraordinary pace.
In this report, we will convey a few key points with our testing of Zoom AI Companion: that it effectively and accurately captures the discussion topics of meetings, that it quickly answers questions in the chat interface, and, most importantly, that it is easy for workers to use.
About GigaOm Benchmarks
GigaOm Benchmarks consist of hands-on field tests and lab-based performance testing shaped to reflect real-world scenarios and assess claims made by vendors. Our Benchmark reports inform technology buyers with transparent, repeatable tests and results, backed by GigaOm’s expert analysis. Where quantitative metrics may not fully describe an experience, qualitative metrics and analyst commentary may be used to provide context and product positioning in the market.
While no testing can substitute for a real-world implementation, we design our benchmark test suites to validate a select set of hypotheses chosen to demonstrate potential business value and product differentiation against competitors.
2. Introduction
What is Zoom AI Companion?
Zoom AI Companion is a generative-AI virtual assistant powered by Large Language Models (LLMs). Users can interact with Zoom AI Companion across the Zoom Platform, including meetings, chat, email, and even phone.
In meetings, users can ask questions to the virtual assistant in a chat-based interface, and the assistant will generate answers based on the live meeting transcript in real time. In addition to the chat-based interface, AI Companion can work in the background and create a meeting summary that captures the high-level conversation topics and action items, which it sends to the host and attendees via email or chat at the end of the meeting.
What is Zoom’s Strategy with the AI Companion?
Adoption of AI tools is impeded in many organizations by three significant concerns: They are expensive to acquire, complicated to implement, and pose an exfiltration risk when internal data is used to train the model. Zoom AI Companion is positioned to alleviate each of these concerns:
- Cost: Zoom AI Companion is included at no extra cost with every paid subscription to Zoom.
- Ease of Use: It takes just two clicks to start the AI Companion for a given meeting, and it features a simple chat interface for users to interact with.
- Data Security: Zoom AI Companion does not use customer data for model training.
Zoom AI Companion is only one product in an ecosystem-wide and extensible portfolio of AI solutions under development by Zoom. The company’s platform features offerings grouped into segments, including flexible workspaces, productivity, business services, customer care, sales, and marketing. These are all connected by Zoom’s core communications products, such as Meetings and Team Chat.
Figure 1. Zoom Platform Structure
Zoom understands that most organizations have an array of heterogeneous communications platforms and has designed Zoom AI Companion to eventually extend to these ecosystems. Across mail, chat, meetings, and enterprise document storage, Zoom plans to support an extensible model for companies to integrate their enterprise resources into Zoom AI Companion. The two primary platforms targeted for integrations are Microsoft 365 and Google GSuite. It’s exciting to imagine an AI assistant that could suggest meeting times for follow-ups based on calendar availability or automatically create reminders for action items.
In addition to its enterprise platforms, Zoom has shared plans to support custom language models so companies can train their own LLMs with their business-specific use cases in mind.
Competitive Landscape
Zoom is not the only company with an AI assistant. Microsoft Teams Copilot has been identified as the primary competitor for Zoom AI Companion. Unlike Zoom’s pricing model, Microsoft Teams Copilot is an add-on charge to an existing Office 365 E3 or higher-level subscription.
In Table 1, we break down the costs of the Zoom and Microsoft collaboration platform AI offerings. To attain a feature set equivalent to Zoom One Enterprise Premier, organizations must add both Teams Premium and Copilot to their existing Office 365 E3 subscriptions.
The net impact on cost is significant: The add-on costs alone for Microsoft add up to more than the full licensed cost of the Zoom service. In fact, Microsoft Teams is shown to be at least 37% more expensive to acquire when adding Copilot functionality to an existing Microsoft 365/Teams environment than it is to acquire a Zoom plan with this functionality included. Table 1 shows the annual pricing as of February 2024.
Table 1. Competitive Annual Pricing Zoom vs. Microsoft Teams (USD)
| Zoom with Microsoft Office 365 E3 | | Teams with Microsoft Office 365 E3 | |
|---|---|---|---|
| Zoom One Enterprise Premier | $350 | Microsoft Teams | $0 |
| AI Companion | $0 | Microsoft Copilot | $360 |
| | | Microsoft Teams Premium | $120 |
| Annual Cost per 1,000 Users | $350,000 | | $480,000 |

Source: GigaOm 2024
Note also that Table 1 only examines the meetings functionality of Zoom and Teams and that additional cost factors should be considered for connected phones, calling plans, connected conference rooms, workspace reservation systems, and other features included as part of Zoom One Enterprise Premier. For more information on the value proposition of the Zoom One Enterprise Premier package, see Table 4 in the Appendix.
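The 37% figure above reduces to simple arithmetic on the per-1,000-user annual prices in Table 1; the following sketch checks it.

```python
# Annual per-1,000-user pricing from Table 1 (USD).
zoom_total = 350_000                    # Zoom One Enterprise Premier; AI Companion included
teams_addons = 360_000 + 120_000        # Microsoft Copilot + Teams Premium add-ons
teams_total = 0 + teams_addons          # Teams itself is $0 with Office 365 E3

premium = (teams_total - zoom_total) / zoom_total
print(f"Microsoft stack premium over Zoom: {premium:.0%}")  # 37%
```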
Figure 2. Zoom AI Companion Cost Advantage Over Microsoft Stack
Figure 2 shows that it costs 37% more to add Microsoft Teams Copilot and Teams Premium to an existing Microsoft 365 subscription than it does to acquire the Zoom One Enterprise Premier plan, which includes Zoom AI Companion at no additional cost.
3. Field Test Overview
To measure the efficacy and reliability of Zoom AI Companion, we developed four hypotheses. A number of quantitative and qualitative metrics were then gathered and measured to assess the hypotheses, which are as follows:
- Zoom AI Companion is easy to use and provides value to the way companies hold meetings. (Ease of use and value)
- Zoom AI Companion accurately transcribes meetings. (Transcription accuracy)
- Zoom AI Companion quickly and accurately answers questions. (Response accuracy)
- Zoom AI Companion is competitive with other language models in answer accuracy and speed. (Response comparison)
Hypothesis 1: Ease of Use and Value
To assess if Zoom AI Companion is easy to use and provides value to the way companies hold meetings, we developed a test scenario to measure user experience across GigaOm internal meetings.
We gathered ratings on feature usability from a panel of meeting hosts and attendees and extrapolated a 1-10 score from the options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree, and N/A. The specific statements presented were:
- As a meeting host, it’s easy for me to enable Zoom AI Companion for my meetings.
- As a meeting host, it’s easy for me to enable the AI Summary for my meetings.
- As a meeting attendee, I was notified that Zoom AI Companion was enabled and available in meetings.
- As a meeting attendee, the user interface to interact with Zoom AI Companion was intuitive and easy to use.
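The report does not publish the exact mapping from the Likert options to the 1-10 scale, so the scheme below is an illustrative assumption: the five options are spaced evenly across the scale, and N/A responses are excluded before averaging.

```python
# Hypothetical Likert-to-score mapping; GigaOm's actual mapping is not
# published, so these values are illustrative assumptions only.
LIKERT_SCORES = {
    "Strongly Disagree": 1.0,
    "Disagree": 3.25,
    "Neutral": 5.5,
    "Agree": 7.75,
    "Strongly Agree": 10.0,
}

def average_score(responses):
    """Average the mapped scores for one statement, ignoring N/A responses."""
    scored = [LIKERT_SCORES[r] for r in responses if r != "N/A"]
    return sum(scored) / len(scored) if scored else None

panel = ["Strongly Agree", "Agree", "Strongly Agree", "N/A", "Agree"]
print(average_score(panel))
```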
Hypothesis 2: Transcription Accuracy
Zoom AI Companion uses the live meeting transcript, computed separately from the post-meeting transcript associated with meetings that are recorded. To determine if Zoom AI Companion accurately transcribes meetings, we enabled live captioning for each meeting and viewed the resulting live transcript.
We applied two test scenarios in this test segment:
Test Scenario A
Measure Zoom AI Companion’s word error rate (WER) for the live-captioned transcript on real-world meetings.
We utilized the NIST SCTK (speech recognition scoring toolkit) to measure the WER of the transcription. Five meetings of various lengths, formats, and numbers of speakers were selected from the total pool of over 200 meetings conducted during the testing period. A manually corrected transcription was used as the reference, and the hypothesized transcription was compared against it to generate the percentage of correct and incorrect words.
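At its core, WER is the word-level edit distance between the reference and hypothesis transcripts, divided by the reference word count. The sketch below illustrates the metric itself; it is not the NIST SCTK tool, which adds alignment reports and many scoring options on top of this calculation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[-1][-1] / len(ref)

print(word_error_rate("the meeting starts at noon",
                      "the meeting started at noon"))  # 0.2
```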
Test Scenario B
Measure Zoom AI Companion’s WER for the live-captioned transcripts on synthetic meetings.
Using a programmatic test apparatus, we conducted synthetic meetings using pre-recorded audio streams for each speaker. This lets us play back the same meeting multiple times consistently.
In addition to WER, we can calculate the consistency of transcription by measuring the standard deviation between numerous transcriptions of the same meeting. Are the same words correct/incorrect in each version?
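One simple way to quantify that consistency is the standard deviation of per-run WER across repeated playbacks of the same meeting. The per-run values below are hypothetical, for illustration only.

```python
import statistics

# Hypothetical per-run WERs from replaying the same synthetic meeting three times.
run_wers = [0.024, 0.025, 0.023]

mean_wer = statistics.mean(run_wers)
stdev_wer = statistics.stdev(run_wers)  # sample standard deviation across runs
print(f"mean WER {mean_wer:.3f}, stdev {stdev_wer:.4f}")
```

A word-by-word agreement check between aligned transcripts would answer the finer question of whether the *same* words are right or wrong in each run.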
Hypothesis 3: Response Accuracy
To determine if Zoom AI Companion quickly and accurately answers questions, we developed a test scenario to measure the accuracy of Zoom AI Companion’s responses across real-world meetings. In all, we cataloged more than 200 interactions with Zoom AI Companion during the test period and ranked the responses for accuracy on a scale of 1 to 10, with 10 being most accurate and 1 being least.
Hypothesis 4: Response Comparison
Here, we compare the performance of Zoom AI Companion, both in terms of accuracy and speed of response, against ChatGPT. The intent is to determine if AI Companion is competitive with other language models. We applied two test scenarios.
Test Scenario A
A number of sample transcripts and questions were selected to compare the response produced by Zoom AI Companion against that provided by ChatGPT 4. As with earlier tests, a 1-10 score was assigned to the answers of each model, with 1 being worst and 10 being best.
Test Scenario B
In a series of synthetic meetings of various length and number of presenters, a predefined set of questions was asked of Zoom AI Companion, and the time-to-answer measured. We attempted to determine if a positive correlation existed between the current transcript length and the time to respond. We repeated this test against ChatGPT 4.
4. Field Test Report
Here, we explore the results of our hands-on and performance testing of Zoom AI Companion. Again, we address each of the four hypotheses presented in the Field Test Overview section.
Hypothesis 1: Ease of Use and Value
Zoom AI Companion was well-received by GigaOm panelists. All participants who hosted meetings found the user interface intuitive, with all applicable responses either “agreeing” or “strongly agreeing” with the first statement, which posits that the UI to enable Zoom AI Companion was intuitive and easy to use. The four statements were:
- Meeting Host: It’s easy for me to enable the Zoom AI Companion for my meetings.
- Meeting Host: It’s easy for me to enable the AI Summary for my meetings.
- Attendee: I was notified that the Zoom AI Companion was enabled and available in meetings.
- Attendee: The user interface to interact with the Zoom AI Companion was intuitive and easy to use.
The responses were mapped to a 1-10 scoring scale, and these scores were averaged to provide an overall score for each statement, as shown in Figure 3. For the question about Zoom AI Companion ease of use, the scored average came out to 8.6, indicating robust agreement.
Figure 3. Panelist Responses to the Four Ease of Use Statements
While the Zoom AI Summary has a UI similar to the AI Companion’s, the statement about AI Summary ease of use drew a few additional responses scored as “agree,” which pulled the overall average for the question down to 8.3. Some users expressed a desire to combine the AI tools into a “single pane of glass,” merging the AI Assistant and the AI Summary into one button. Overall, responses still either agreed or strongly agreed with the statement’s premise.
The third statement, about Zoom notifying attendees when Zoom AI Companion is enabled, earned the highest marks, with an overall average score of 9.6. Fully 80% of respondents marked “strongly agree” to this statement.
Finally, attendees were mostly enthusiastic about leveraging the AI Companion, with 17 of 20 respondents saying they “agreed” or “strongly agreed” that AI Companion is intuitive and easy to use. The most common feedback received was that Zoom AI Companion is currently available only during the meeting. (Note that the Zoom roadmap addresses extending the use of the AI Companion to completed meetings).
Next, we asked panelists to score how compelling they found specific use cases for Zoom AI Companion. Four use cases were provided, and respondents were asked to flag the ones they found compelling. The most popular use case, as shown in Figure 4, was “Joining a meeting late and getting a quick recap,” which earned 85% of total available points. The only other broadly identified use case was “Reviewing action items,” which earned 70%.
Figure 4. Use Case Alignment
Hypothesis 2: Transcription Accuracy
As the live meeting transcript is the primary input to Zoom AI Companion’s language model, its accuracy is critical to the AI Companion’s ability to answer questions correctly.
We found Zoom’s transcription highly accurate, with most transcripts exceeding 95% accuracy. As a note, we manually transcribed our sample meetings with a best-effort approach. We specifically corrected words that we had strong confidence were transcribed incorrectly, and mumbled or ambiguous speech that could be interpreted in multiple ways was left unchanged. We did not leverage human inference on intended speech to correct mumbled or undetected words; we relied only on what we heard. Our WER measurements may be conservative based on this approach. Zoom has shared that in their internal testing, WER approaches 5%.
We found that many errors occurred with proper nouns and the names of individuals or companies, which are often not English words. Additionally, we observed a higher error rate for non-native English speakers with strong accents. We did not find a strong correlation between overall words per minute in meetings and WER in our sample.
As shown in Table 2, the WER exceeded 4% just once out of the six samples. Taken together, the total word count and total error count of all six meetings yielded an overall average WER of 3.3%.
Table 2. Transcript Accuracy
| Meeting Name | Number of Speakers | Meeting Length | Total Words | Word Error Rate (WER) |
|---|---|---|---|---|
| Real-World Meeting 1 | 4 | 42:02 | 5,009 | 3.23% |
| Real-World Meeting 2 | 3 | 24:44 | 4,115 | 2.67% |
| Real-World Meeting 3 | 2 | 12:50 | 2,214 | 4.34% |
| Synthetic Meeting 1 | 3 | 16:05 | 2,331 | 2.40% |
| Synthetic Meeting 2 | 6 | 48:10 | 6,845 | 3.77% |
| Synthetic Meeting 3 | 5 | 38:29 | 5,543 | 3.20% |

Source: GigaOm 2024
To measure the consistency of transcription, each synthetic meeting was conducted at least three times, and each instance was compared against the others. In each case, we measured no meaningful inconsistency: the resulting transcripts were highly consistent, and the same audio generally produced the same results.
Hypothesis 3: Response Accuracy
Test Scenario A: Chat Interaction Sentiment Analysis
We collected 195 interactions in real-world meetings and 289 interactions in synthetic meetings during the testing period. Overall, AI Companion scored well, with approximately 57% of interactions scoring a 9 or 10, as shown in Figure 5. Interactions with the AI Companion were rated with a mean score of 7.33 out of 10.
Note that ratings below 7 are disproportionately weighted toward scores of 1, where the model simply does not recognize a topic that was discussed. The model appears biased toward declining to recognize a topic rather than providing answers in which it has low confidence. Zoom has done an excellent job of curbing AI “hallucinations,” and the model was not observed to provide answers that reference non-existent statements. When it does not recognize a conversation topic, however, it states with confidence that “there was no mention of that topic in the transcript” rather than suggesting that it could not find a reference to the topic.
Figure 5. Chat Interaction Ratings
After concluding the testing, panelists were asked to rate their overall perception of the AI Companion on a 1-10 scale, with 10 being best and 1 worst. As shown in Figure 6, scores of 7 and 8 were most awarded by panelists, while no score of 9 or 10 was awarded. The overall average for perception of chat interaction with Zoom AI Companion was 6.8.
Figure 6. Chat Interaction Perception
While Meeting Summary was used by fewer of our panelists, those who used it scored it marginally higher for accuracy, with stronger user sentiment showing through multiple scores of 9, as shown in Figure 7.
In this survey, a score of 1 means Zoom AI Summary never provides useful or trustworthy information about meeting content, while a 10 means Zoom AI Summary always successfully provides useful or trustworthy information. The overall average score was 7.13 out of 10.
Figure 7. Meeting Summary Perception
Hypothesis 4: Response Comparison
We loaded the transcripts from our sample meetings into ChatGPT running the GPT-4 language model, and asked the same questions from our question bank as were asked in the live Zoom meetings to compare the results.
The questions were asked at the end of the Zoom meeting, so the full transcripts were present.
Three synthetic meetings were selected for the comparative language model testing:
- Synthetic Meeting 2 – 48 Minutes, 6,845 Words
- Synthetic Meeting 4 – 2 Hours 39 Minutes, 15,439 Words
- Synthetic Meeting 5 – 2 Hours 22 Minutes, 19,946 Words
The same questions were asked in the Zoom meetings and of ChatGPT. Two mechanisms were tested for delivering the transcript to ChatGPT:
- Mechanism A: Copy and paste the transcript into the chat with the preamble “Assume the following meeting transcript and answer questions about it. <Transcript>”
- Mechanism B: Upload the transcript file to ChatGPT, then ask questions with the format “Based on the transcript, <Question Text>?”
We identified that ChatGPT was far more likely to produce hallucinations using Mechanism A than B. Mechanism B, however, increased the time-to-answer, as it appears to re-analyze the entire uploaded transcript for each question rather than rely on the active tokens from the chat history.
Mechanism B was selected for comparison, as it gave the most consistent answers in line with how Zoom AI Companion behaves.
Test Scenario A: Answer Quality Comparison
These comparative tests diverge from our standard real-world and synthetic testing in that, where a model fails to respond, we more actively re-attempt questions with different phrasings. The goal is to identify whether rephrasing could prompt the models to recognize known topics. As a result, the overall average score is lower than our user experience scores.
In this test scenario, Zoom AI Companion scored a mean rating of 5.53 out of 10, while ChatGPT 4 scored 5.94.
ChatGPT 4 was inclined to provide significantly longer answers than Zoom AI Companion, including formatting answers using bullets and multiple paragraphs.
When rated on a 1-10 scale, answer quality was largely correlated. Both systems occasionally failed to recognize conversation topics, but the content of the answers was generally correct.
Test Scenario B: Answer Speed Comparison
To measure response speed, we used a screen recording utility and measured the time differential from the start frame once the question was submitted to the final frame once the AI tool completed its response.
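The frame-differential method reduces to a small helper once the recording’s frame rate is known; the 30 fps default and the frame numbers below are illustrative assumptions, not values from our apparatus.

```python
def elapsed_seconds(start_frame: int, end_frame: int, fps: float = 30.0) -> float:
    """Convert a frame-index differential from a screen recording into seconds."""
    return (end_frame - start_frame) / fps

# e.g., question submitted at frame 1200, response completed at frame 1305
print(elapsed_seconds(1200, 1305))  # 3.5 seconds at 30 fps
```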
Zoom AI Companion was demonstrably faster than ChatGPT 4 in all meeting scenarios. This advantage widened when using Mechanism B, as described above.
Note that this test used the ChatGPT web interface, not the API. This is not a rigorous measure of the speed of the GPT-4 language model but a small-scale test of the user-facing product ChatGPT. Response speeds may vary on Microsoft Teams Copilot, which also uses the GPT-4 language model.
Table 3. Comparing Response Time (in Seconds)
| Meeting ID | Number of Words | Zoom AI (Mean) | Zoom AI (Worst) | ChatGPT (Mean) | ChatGPT (Worst) |
|---|---|---|---|---|---|
| Synthetic Meeting 2 | 6,845 | 3.5 | 5.3 | 11.1 | 18.7 |
| Synthetic Meeting 4 | 15,439 | 4.3 | 9.2 | 13.6 | 36.8 |
| Synthetic Meeting 5 | 19,946 | 4.96 | 9.7 | 28.01 | 54.4 |
| Combined | | 4.16 | 9.7 | 16.15 | 54.4 |

Source: GigaOm 2024
Table 3 shows that ChatGPT’s response time was, on average, nearly four times slower than Zoom AI’s (4.16 seconds for Zoom vs. 16.15 seconds for ChatGPT). ChatGPT also demonstrated a stronger correlation between transcript length and response time than Zoom AI Companion did. The longest answer time from ChatGPT was 460% longer than Zoom AI’s.
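Both ratio claims follow directly from the combined row of Table 3; the quick check below reproduces them (the worst-case delta rounds to 461%, which the text states as roughly 460%).

```python
# Combined response times from Table 3 (seconds).
zoom_mean, chatgpt_mean = 4.16, 16.15
zoom_worst, chatgpt_worst = 9.7, 54.4

print(f"mean slowdown: {chatgpt_mean / zoom_mean:.2f}x")                       # 3.88x
print(f"worst-case delta: {(chatgpt_worst - zoom_worst) / zoom_worst:.0%}")    # 461%
```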
5. Conclusion: Should You Use Zoom AI Companion?
GigaOm users responded favorably to Zoom AI Companion, with an average rating of 7.33 out of 10 for chat interactions or 8.83 out of 10 when you remove the AI’s occasional non-answers from the response pool. The post-testing rating panel resulted in a user perception score of 6.84 out of 10 for chat interactions, and 7.13 out of 10 for meeting summaries.
Zoom AI Companion has a highly compelling pricing model, where acquiring the closest competing product, Microsoft Copilot for Teams, would be 37% more expensive than the entire Zoom One Enterprise Premier plan, assuming you already have an existing Microsoft 365 subscription.
The features we most want added to Zoom AI Companion are the ability to use it after meetings have ended and to ask questions about past meetings. These features are in the works, and with Zoom’s aggressive biweekly update cadence, they should be in our hands soon.
In conclusion, Zoom AI Companion is a useful value-add to how we meet as a company. Allowing late joiners to catch up on what has been discussed and getting a summary of action items are popular activities that this solution effectively automates. With even more features and extensibility, Zoom AI Companion will become a force multiplier, streamline meeting outcomes, and organize the interconnected nature of meetings with enterprise ecosystems.
6. Appendix
Zoom One Enterprise Premier vs. Teams Full Pricing Comparison
While this report primarily focuses on the meetings functionality of Zoom, the Zoom One Enterprise Premier plan comes bundled with many services that on competing platforms carry additional cost. In Table 4, we show the expanded pricing gap for an enterprise implementing a connected phone system for 5,000 users, 50 conference rooms, and 100 common-area phone systems. The Microsoft platform is shown to be 91% more expensive while not meeting full feature equivalency with the Zoom platform’s core capabilities.
Table 4. Platform Pricing Comparison
| Zoom Platform | | Microsoft Platform | |
|---|---|---|---|
| Microsoft Office 365 E3 | | Microsoft Office 365 E3 | |
| Zoom One Enterprise Premier | $350 | Microsoft Teams | $0 |
| AI Companion | $0 | Microsoft Copilot | $360 |
| Meeting Summary | $0 | Microsoft Teams Premium | $120 |
| Domestic Calling Plan | $0 | Teams Phone w/Calling Plan | $180 |
| Common Area Phones | $0 | Teams Shared Device | $96 |
| SMS | $0 | Cloud Video Interop | Third Party |
| Conference Room Connector | $0 | Teams Room Pro | $480 |
| Workspace Reservation | $0 | Microsoft Places | Not Available |
| ANNUAL COST Per User ** | $350 | | $667 |
| ANNUAL COST 5,000 Users (with 50 conference rooms and 100 common-area phone systems) | $1,750,000 | | $3,333,600 |

Source: GigaOm 2024
** Costs for shared resources such as conference rooms and common area phones distributed across user count
7. About Eric Phenix
Eric Phenix is Engineering Manager at GigaOm, responsible for our cloud platforms and for guiding the engineering behind our research. He has worked as a senior consultant for Amazon Web Services, where he designed systems and teams for more than 20 Fortune 1000 enterprises, and as a cloud architect for BP, where he helped BPX Energy migrate its process control network from on-premises to AWS, creating the first 100% public-cloud control network and operating over $10 billion in energy assets in the Permian Basin.
8. About GigaOm
GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.
GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.
GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.
9. Copyright
© Knowingly, Inc. 2024 "GigaOm Benchmark: Testing Zoom AI Companion" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.