This GigaOm Research Reprint Expires Dec 31, 2025

The Business Case for Solving Visual Search with Computer Vision

Empower Users to Find Objects in Your Media Archives Quickly

What it Does

Visual search (VS) extracts information from digital media using computer vision algorithms and machine learning models. The extracted metadata is used to describe the media, search it, and trigger actions.

Benefits

  • Increase productivity by 25% to 50% when searching within media.
  • Improve video content sharing across the organization by 25% to 50%.
  • Raise online shopping conversion rates by 20% to 30% by including similar items in search results.

Urgency

HIGH: Worth immediate action if your business processes entail capturing video for record keeping, commerce, quality checks, or detecting anomalies.

Risk Level

MEDIUM: Risk factors include inadequate operational processes, information leakage that compromises security, and computer vision algorithms and models that perform unacceptably when identifying objects.

30/60/90 Plan

Identify use cases, VS vendors, and key performance indicators, then conduct vendor trials and POCs while assessing workflow integration challenges. Once a VS solution is acquired, draw out plans for expanded use and configure metrics to show benefits over time.

Time to Value

The value of integrating visual search into a solution should be realized within three to six months.

What Are the Scenarios of Use?

Video data is growing at an astronomical rate: approximately 500 hours of video are created each minute, which makes content difficult to find. Manually created metadata is structured but limiting because it fails to describe the objects within the video. Computer vision (CV) extracts objects such as faces, text, surveillance information, and manufacturing defects, and applies labels to the objects it finds. Each labeled object is associated with a timestamp within the media, which is what makes VS possible.
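To make the label-and-timestamp idea concrete, here is a minimal sketch in Python of how labeled detections could be turned into a searchable index. It is an illustration only, not how any particular CV service stores its metadata; the `Detection` record, the `build_index` and `search` helpers, and the sample file names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One labeled object found by a CV model (hypothetical record layout)."""
    video_id: str
    timestamp_s: float  # offset into the video, in seconds
    label: str


def build_index(detections):
    """Map each lowercase label to a sorted list of (video_id, timestamp_s) hits."""
    index = defaultdict(list)
    for d in detections:
        index[d.label.lower()].append((d.video_id, d.timestamp_s))
    for hits in index.values():
        hits.sort()
    return dict(index)


def search(index, label):
    """Return every (video_id, timestamp_s) where the label was detected."""
    return index.get(label.lower(), [])


# Sample detections, invented for illustration.
detections = [
    Detection("plant_cam_01.mp4", 12.5, "crack"),
    Detection("lobby_cam.mp4", 3.0, "face"),
    Detection("plant_cam_01.mp4", 4.0, "crack"),
]
index = build_index(detections)
print(search(index, "Crack"))
```

A search for "crack" returns the two offsets in the plant camera recording, so a user can jump directly to the relevant moments instead of scrubbing through the video.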

VS can be used in the following core scenarios:

  • Enterprise Search: Augment search results by looking for text, faces, and product occurrences in video recordings.
  • Security: Monitor spaces and facilities, and support forensic investigations of recorded footage.
  • E-Commerce: Recommend items similar to those shoppers have browsed and offer additional shopping choices.
  • Document Analysis and Tracking: Extract text from media for search, compliance, and automatic verification of documents such as passports and driver's licenses.
  • Manufacturing: Perform quality inspections by identifying visual defects like cracks or blemishes in recorded videos.

VS cannot exist on its own and needs to be a part of a larger solution. It can be included in a solution in the following ways:

  • Custom integration with AI platforms and services: Integrate media capture and CV services into the application. The CV service offers search and indexing.
  • Extend an enterprise video platform (EVP): Integrate a video platform as part of your media solution and integrate with a CV service if the platform supports custom metadata.
  • Use CV services provided by the video platform: Some video platform vendors provide additional services for CV. The platform’s search and indexing service indexes extracted objects. This is an up-and-coming trend from video platform vendors.

What Are the Alternatives?

The only alternative to implementing VS is to add search metadata manually. This entails significant labor costs, invites inconsistency because personnel vary in knowledge and diligence, and lengthens the turnaround time for labeling content because of the limits of human productivity.

What Are the Costs and Risks?

From a cost standpoint, organizations using a CV service can expect to spend between $5,000 and $6,000 to process two million images a month for visual search. Optical character recognition (OCR) and facial recognition cost less than object recognition. The costs vary based on how visual search is incorporated into the solution. Note: Cost is based on GigaOm’s research and may vary.
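As a rough planning aid, the cost range above can be converted into a per-thousand-image rate and scaled to a different volume. The sketch below assumes pricing scales linearly with volume, which is a simplification: real vendors often use tiered pricing, so treat the result as a ballpark only. The function names and the 500,000-image example volume are hypothetical.

```python
def rate_per_thousand(monthly_cost_usd, images_per_month):
    """Convert a monthly spend into a cost per 1,000 images."""
    return monthly_cost_usd * 1000 / images_per_month


def estimate_monthly_cost(images_per_month, rate_per_thousand_usd):
    """Scale a per-1,000-image rate to a monthly volume (assumes linear pricing)."""
    return images_per_month * rate_per_thousand_usd / 1000


# Figures from the text: $5,000 to $6,000 for two million images a month.
low_rate = rate_per_thousand(5_000, 2_000_000)
high_rate = rate_per_thousand(6_000, 2_000_000)
print(f"Rate: ${low_rate:.2f} to ${high_rate:.2f} per 1,000 images")

# Hypothetical archive volume of 500,000 images a month.
volume = 500_000
low_cost = estimate_monthly_cost(volume, low_rate)
high_cost = estimate_monthly_cost(volume, high_rate)
print(f"Estimate for {volume:,} images: ${low_cost:,.0f} to ${high_cost:,.0f} a month")
```

The quoted range works out to $2.50 to $3.00 per 1,000 images, or a quarter to three-tenths of a cent per image.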

Integration and maintenance costs are a factor, and organizations must plan for video storage costs when media is acquired as video. EVPs tend to price by video hours or storage used, and some include OCR services in their base pricing. EVP vendors may charge extra for additional CV services such as facial and object recognition. Developer training and consulting costs should be factored in when skills are unavailable internally.

The most significant VS risks are security, system performance, and operations. Security risks include cyber intrusion, unauthorized access to data, privacy violations, software problems, and configuration issues. System performance risks center on incorrectly labeled objects, which cause rework and a high volume of transactions. CV performance is commonly measured by false positives, where the system reports an object that is not present or applies the wrong label, and false negatives, where the system misses an object that is present. When the false positive and false negative rates are unacceptable, the CV systems need retuning and retraining, and the models may need to be updated. Operations risks are loss of regulatory compliance and inadequate business processes.
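One common way to turn false positive and false negative counts into acceptance criteria is to compute precision and recall on a labeled test set. The sketch below is a generic illustration, not a metric prescribed by this report; the `DetectionCounts` type, the `needs_retuning` helper, and the 0.9 thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Outcome counts from comparing model output against a labeled test set."""
    true_positives: int   # objects correctly detected and labeled
    false_positives: int  # detections that were wrong or not actually present
    false_negatives: int  # real objects the model missed


def precision(c):
    """Share of reported detections that were correct (falls as false positives rise)."""
    reported = c.true_positives + c.false_positives
    return c.true_positives / reported if reported else 0.0


def recall(c):
    """Share of real objects that were found (falls as false negatives rise)."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def needs_retuning(c, min_precision=0.9, min_recall=0.9):
    """Flag a model whose precision or recall is below the (assumed) thresholds."""
    return precision(c) < min_precision or recall(c) < min_recall


# Hypothetical evaluation: 90 correct detections, 10 wrong, 30 missed.
counts = DetectionCounts(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={precision(counts):.2f} recall={recall(counts):.2f}")
print("retune" if needs_retuning(counts) else "acceptable")
```

In this example precision meets the threshold but recall does not, so the model would be flagged: it rarely mislabels objects, yet it misses a quarter of the objects that are there. Appropriate thresholds depend on the use case; a missed defect in manufacturing may cost far more than a spurious match in enterprise search.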

Any CV application should also consider the following risks:

  • Ethical: Does the solution use data for purposes that violate business ethics? Is the data demographically biased? Is the application legal?
  • Economic: Does the solution have a potential for lawsuits, or could it impact the organization’s reputation?
  • Cultural: Is there close cooperation between teams to take advantage of the automation and prevent undesired outcomes that go undetected for long periods?

30/60/90 Plan

The path to adopting a SaaS-based visual search solution should be relatively quick. For general guidance, we highlight the following roadmap for a 30/60/90-day deployment and adoption:

30 Days: Prepare
Identify VS workflow scenarios and the size of your video archive. Define the metadata of interest, seek appropriate CV platform vendors supporting visual search, and perform a cost-benefit analysis.

60 Days: Evaluate
Understand workflow integration challenges. Shortlist vendors, sign up for trials, and conduct POCs with a cross-functional team. Explain use scenarios to vendors, use vendor guidance for optimizing the solution, and understand how long it may take to index the video archive. Verify that the vendor supports your metadata model and custom metadata.

90 Days: Implement
Purchase the platform and set up sample workflows. Draw out plans for expanded use of VS. Set up metrics that drive key performance indicators to show benefits. Determine the need for a custom model, pre-processing, or post-processing to optimize the computer vision workflows for VS. Run tests to understand and document turnaround times for extracting and indexing metadata. Draw out plans to go live in production.