Data for Machine Learning Gets a New Lease on Language


InVimage by Venga Global creates clean data sets for annotation of text within images

San Francisco, CA – Venga has released the third solution in its rapidly growing suite of products for Natural Language Processing (NLP) Data Collection. The new addition to the family is InVimage, a cloud-based solution for annotating text within images. With each annotation, we automatically capture the X and Y coordinates, run OCR (Optical Character Recognition) on the annotated text, and can then translate the captured text either by machine or manually.

Through our Human-in-the-Loop step, both the OCR and translation text can be reviewed and edited. This ensures Venga’s clients receive the cleanest datasets possible for their training models. InVimage was built with scalability in mind and can handle hundreds of thousands of images daily.

At the beginning of 2019, Venga released a completely redesigned version of our first solution, InVtext, which eliminates many of the quality issues that have plagued data set text translation. InVvoice followed that summer, simplifying the management and translation of voice data.

“It has been a busy year for us,” says Chris Phillips (COO), who designed and architected the Product Suite. “Our data collection work has grown exponentially and we have had to scale our supply chain and development efforts accordingly. We’ve built those tools into a platform offering more options for our clients and resources. We are planning further growth and continuous development in 2020.”

Venga started working on data collection projects back in 2017. Some of the larger data collection buyers were not getting the improvements they expected in their models from other providers and wanted to test Venga in this field. Venga is the first to admit that it wasn’t smooth sailing and that it suffered from delivery issues early on, but it learned very quickly and overcame many of the challenges that were causing models to stagnate in their development.

“We paid attention to our clients’ needs and invested heavily in our supply chain, quality, and technology. The effect was that our clients kept sending more and more data to us as a result of their machine learning improvements. As we scaled, we found that the traditional technology used in the language industry caused supply chain, quality, and delivery issues and needed to be adapted.

“When working with 130 language pairs and well over 1,000 linguists, your processes must be on point. If technology is causing problems, people get frustrated, pull out from projects, or find ways to cut corners.

“We gathered our internal expertise and went through every pain point with a fine-tooth comb. Then we designed solutions that were up to the task. Our systems have been put to the test with the volumes we have processed this year, and we haven’t had a single complaint or dropout due to technology,” continued Chris.

The three data tools, InVtext, InVvoice, and InVimage, were designed around specific customer needs but are flexible enough to adapt to project-specific requirements.

When asked about Venga’s 2020 plans, Chris concluded, “We have now succeeded in delivering the key points for large-scale NLP data collection. These include data integrity, adherence to researchers’ rules, working with low-resource languages, simple UX, and the ability to work in the cloud with low internet speeds.

“Our current solutions are tried and tested, so now we can focus on customizations and move development resources onto other interesting projects for our clients. We are currently talking to two clients about new solutions for text annotation and creative ways to collect colloquial conversation audio, so we are expecting an interesting 2020 for sure.”


With expertise in translation, localization, and creative services in over 100 languages, Venga partners with clients to support their global ambitions.

They follow a strategy of building robust programs for continuous translation and localization for enterprise clients. These programs are supported by an agile production team, an innovative tools and technology approach, a specialized supply chain, and an ISO-certified quality assurance team.

Clients can expect a long-term and transparent partnership. Venga is committed to continuous improvement and to supporting our clients’ accelerated growth and localization maturity.

For more information about Venga, please visit their website at


Stephanie Harris

Venga Global

+1 415 738-7705