App Engine + Google Voice = Super-charged Communication Platform


App Engine, a powerful and easy-to-use cloud-based development environment, has its gaps, and voice is one of them. Google Voice, on the other hand, is a nice solution for managing incoming calls, but an off-the-shelf one with a fairly rudimentary set of features. Combine the two, however, and you’d have a powerful and extensible communication platform: one that supports a basic set of features for ordinary users, plus the ability to build custom softswitch applications on App Engine that control how calls are handled.

What might this look like from an architectural standpoint? There would be two basic components. One would be a highly scalable, distributed softswitch that accepts calls via SIP and IAX2 (more on IAX in a moment). This switch would be fairly dumb in that it would simply answer and route calls, play prompts and capture utterances or keystrokes from callers, then report these events to App Engine via HTTP requests. The other would be a simple library for App Engine that idiot-proofs basic call handling tasks such as answering, call transfer, prompt playback and speech recognition. Since App Engine is a fully featured programming environment in its own right, with support for Python and Java, the call-handling applications built on it can be as sophisticated as developers want them to be.
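To make the "simple library" idea concrete, here is a minimal sketch of what the App Engine side might look like. Everything in it is hypothetical: the class name `CallScript`, the verbs (`answer`, `play`, `recognize`, `transfer`, `hangup`) and the JSON shape are assumptions for illustration, not an existing Google API, and the prompt URL is a placeholder.

```python
import json
from typing import Any, Dict, List


class CallScript:
    """Hypothetical App Engine-side helper that builds instructions for the softswitch.

    Each method appends one step and returns self, so calls can be chained.
    """

    def __init__(self) -> None:
        self._steps: List[Dict[str, Any]] = []

    def answer(self) -> "CallScript":
        self._steps.append({"verb": "answer"})
        return self

    def play(self, prompt_url: str) -> "CallScript":
        # The switch fetches and plays the audio prompt at this URL.
        self._steps.append({"verb": "play", "url": prompt_url})
        return self

    def recognize(self, choices: List[str], timeout_s: int = 5) -> "CallScript":
        # The switch listens for speech or keypresses matching one of the choices
        # and would report the result back to the app in a new HTTP request.
        self._steps.append({"verb": "recognize", "choices": choices, "timeout": timeout_s})
        return self

    def transfer(self, number: str) -> "CallScript":
        self._steps.append({"verb": "transfer", "to": number})
        return self

    def hangup(self) -> "CallScript":
        self._steps.append({"verb": "hangup"})
        return self

    def to_json(self) -> str:
        """Serialize the steps as the HTTP response body returned to the switch."""
        return json.dumps({"steps": self._steps})


if __name__ == "__main__":
    reply = (CallScript()
             .answer()
             .play("https://example.com/prompts/welcome.wav")
             .recognize(["sales", "support"]))
    print(reply.to_json())
```

The chained style keeps the developer-facing surface small: the app never touches SIP or IAX, it only describes what the switch should do next.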

Google doesn’t need to get into the business of selling phone numbers if it doesn’t want to. There are plenty of carriers that offer trunking services in dozens of countries (Voxbone, for example), so users can choose a carrier based on the type of inbound and outbound access required. The calls are then routed to the softswitch which, in turn, queries your web service for directions on what to do at each stage of the call.
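The per-stage loop the softswitch would run can be sketched as follows. This is a hypothetical model, not any product's API: the web service is a plain Python function standing in for an HTTP request to the developer's App Engine app, and the event names (`incoming`, `completed`, `recognized`) and verbs are assumptions.

```python
from typing import Callable, Dict, List

# Stand-in for the developer's web service: takes an event, returns an instruction.
# In a real deployment this would be an HTTP POST to the App Engine app.
WebService = Callable[[Dict[str, str]], Dict[str, str]]


def run_call(caller: str, dialed: str, ask: WebService,
             caller_inputs: List[str], max_steps: int = 50) -> List[str]:
    """Drive one call, asking the web service what to do after every event.

    `caller_inputs` simulates what the caller says or keys in, in order.
    Returns the list of verbs the switch executed.
    """
    log: List[str] = []
    pending = list(caller_inputs)
    event = {"type": "incoming", "from": caller, "to": dialed}
    for _ in range(max_steps):  # guard against a service that never ends the call
        verb = ask(event)["verb"]
        log.append(verb)
        if verb in ("hangup", "transfer"):
            return log  # the call leaves this switch's control
        if verb == "recognize":
            heard = pending.pop(0) if pending else ""
            event = {"type": "recognized", "value": heard}
        else:
            # answer, play, etc.: report completion and ask what comes next
            event = {"type": "completed", "verb": verb}
    return log


def demo_service(event: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical app logic: answer, ask for a department, route or hang up."""
    if event["type"] == "incoming":
        return {"verb": "answer"}
    if event["type"] == "completed" and event["verb"] == "answer":
        return {"verb": "recognize"}
    if event["type"] == "recognized" and event["value"] == "sales":
        return {"verb": "transfer", "to": "+14155550100"}
    return {"verb": "hangup"}


if __name__ == "__main__":
    print(run_call("4155551234", "4155554321", demo_service, ["sales"]))
```

Keeping the switch this dumb is the design point: all decisions live in `demo_service`, so changing call flow means redeploying a web app, not reconfiguring telecom gear.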

This is also an opportunity for Google to promote alternatives to SIP for voice/video over IP, such as IAX (Inter Asterisk eXchange protocol). Built by Digium, the company behind the Asterisk open-source PBX, it’s designed to do one thing and do it very well. As such, it’s more efficient, better at firewall traversal and generally easier to deal with than SIP. IAX supports low-bandwidth voice, high-fidelity voice, as well as video and media streams, so it’s a versatile protocol, and perfectly adequate for transporting calls. It can also interleave many calls into a single media stream, which both reduces network overhead and makes security easier, since all calls and their signaling messages come and go via a single port (UDP 4569 by default).
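To illustrate the "many calls in one stream" point, here is a simplified sketch that packs one voice frame from each of several calls into a single datagram. The layout is modeled loosely on the IAX2 trunk meta frame described in RFC 5456 (a zeroed 16-bit meta indicator, a meta command byte, a command-data byte and a 32-bit timestamp, followed by per-call entries of call number, length and payload); treat the exact field layout as an assumption and check the RFC before relying on it. The function names are hypothetical.

```python
import struct
from typing import Dict, List, Tuple

META_TRUNK = 1  # assumed meta command value for a trunk frame (see RFC 5456)
HEADER = "!HBBI"  # zero meta indicator, meta command, command data, timestamp
ENTRY = "!HH"     # source call number, payload length


def pack_trunk(timestamp_ms: int, frames: Dict[int, bytes]) -> bytes:
    """Pack one voice frame per call into a single trunk datagram."""
    # Command data 0 means entries carry no per-call timestamps.
    parts = [struct.pack(HEADER, 0, META_TRUNK, 0, timestamp_ms & 0xFFFFFFFF)]
    for call_no, payload in frames.items():
        parts.append(struct.pack(ENTRY, call_no & 0x7FFF, len(payload)))
        parts.append(payload)
    return b"".join(parts)


def unpack_trunk(datagram: bytes) -> Tuple[int, List[Tuple[int, bytes]]]:
    """Split a trunk datagram back into (call number, payload) entries."""
    zeros, cmd, _cmd_data, ts = struct.unpack_from(HEADER, datagram, 0)
    if zeros != 0 or cmd != META_TRUNK:
        raise ValueError("not a trunk frame")
    entries: List[Tuple[int, bytes]] = []
    offset = struct.calcsize(HEADER)
    while offset < len(datagram):
        call_no, length = struct.unpack_from(ENTRY, datagram, offset)
        offset += struct.calcsize(ENTRY)
        entries.append((call_no & 0x7FFF, datagram[offset:offset + length]))
        offset += length
    return ts, entries


if __name__ == "__main__":
    # Three calls, each with a 33-byte frame (the size of one 20 ms GSM frame).
    frames = {1: b"a" * 33, 2: b"b" * 33, 7: b"c" * 33}
    datagram = pack_trunk(160, frames)
    print(len(datagram), unpack_trunk(datagram)[0])
```

Here three calls share one 8-byte trunk header and one UDP/IP header, at a cost of 4 bytes per call, instead of sending three separate packets each with its own headers.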

The concept of App Engine + Voice is similar to what providers like Voxeo and Twilio offer. Whether Google should acquire this capability or build it internally is a straightforward build vs. buy decision. If it can find a company whose infrastructure is compatible with the way Google operates, buy is probably the least risky way to go. Building something on top of FreeSWITCH, another open-source softswitch, is another path worth investigating. The value to Google is the ability to create a developer community around communication applications. Google Voice is great, but it’s a pretty limited solution. Skype is awesome, but it’s a closed system (although the just-released SkypeKit SDK is a sign that may change). Moreover, there are a number of niche providers that offer cloud-based communication services, but none have Google’s resources or scale.

But most importantly, marrying such services would see Google do what it does best: provide an open and scalable platform, and let partners and developers figure out what to build and who to sell it to.



IAX has a couple of other benefits over SIP (besides the hot-knife-through-butter attitude to firewalls). It is simpler than SIP, both on the wire and to implement. It also supports encryption as standard.

As to the cloud aspects of this, you might want to read a nice booklet that Thomas Howe put together (I contributed an essay) on cloud telephony.

I think this sort of thing is inevitable, but exactly what form it will take and who will be the major players is unclear.

Jason Goecke

Great ideas, and definitely a direction Google should be investigating if they are not already. Tropo, the platform from Voxeo (referenced above) that offers new variations of real-time communications APIs, has released the Moho library for applications like this.

While HTTP is great for the web, it is not as well suited to real-time telephony, where latency becomes a more acute problem at massive scale. Over time we noticed that VXML was becoming increasingly ECMAScript/JavaScript-centric, with more processing happening in the VXML browsers. This was happening, in the same way it did with AJAX, because developers could get a lot more scale by pushing more processing to the VXML browser. Many of the large-scale users of our platform push complex documents to our cloud, eschewing a chatty request/response model that incurs overhead on every HTTP POST/GET/PUT.
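To put rough numbers on the chatty-versus-document trade-off, here is a back-of-the-envelope model. The round-trip time, server time, dialog length and call volumes are illustrative assumptions, not measurements, and the model ignores the larger size of a pushed script.

```python
def instruction_latency_ms(steps: int, rtt_ms: float, server_ms: float, chatty: bool) -> float:
    """Cumulative time spent on network round trips plus server work for instructions.

    chatty=True:  one HTTP request/response per dialog step.
    chatty=False: the whole dialog script is fetched once and then runs on the platform.
    """
    requests = steps if chatty else 1
    return requests * (rtt_ms + server_ms)


def request_rate(concurrent_calls: int, steps: int, call_seconds: float, chatty: bool) -> float:
    """Average HTTP requests per second arriving at the application tier."""
    per_call = steps if chatty else 1
    return concurrent_calls * per_call / call_seconds


if __name__ == "__main__":
    # Illustrative figures: a 10-step dialog, 80 ms RTT, 20 ms of server work.
    print(instruction_latency_ms(10, 80, 20, chatty=True))   # 1000
    print(instruction_latency_ms(10, 80, 20, chatty=False))  # 100
    # 10,000 concurrent 60-second calls.
    print(request_rate(10_000, 10, 60, chatty=True))   # about 1667
    print(request_rate(10_000, 10, 60, chatty=False))  # about 167
```

The absolute figures depend entirely on the assumptions; what carries over is that both the caller-facing delay and the request load grow with the number of dialog steps in the chatty model.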

Tropo Scripting was our first answer to this: just send us the ECMAScript and drop the VXML altogether. We then extended this to Groovy, PHP, Python and Ruby. That said, Tropo today is synchronous, which does not lend itself to driving a ‘softswitch in the cloud’, but it works great for the user-driven dialogs developers use it for today.

Moho is the answer to the ‘softswitch in the cloud’, as it is a fully asynchronous framework that can be driven at a low level. We expect to build more APIs on the framework to drive the kind of highly scalable cloud softswitch your article touches upon.

Writing scalable real-time cloud platforms is a different beast from doing the same with web apps, which may be why Google Voice has no meaningful APIs today: Google probably gets this.
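The synchronous/asynchronous distinction is easier to see in a small example. The sketch below is not Moho's API; it uses Python's standard `asyncio` to show the general idea of an asynchronous call-control framework: each call is a coroutine that suspends while waiting for the next switch event, so one thread can drive hundreds of concurrent calls instead of dedicating a blocked thread to each. `handle_call`, `fake_switch` and the event names are hypothetical.

```python
import asyncio
from typing import List


async def handle_call(call_id: int, events: asyncio.Queue) -> str:
    """One call as a coroutine: awaiting the next event yields the thread to other calls."""
    seen: List[str] = []
    while True:
        event = await events.get()
        seen.append(event)
        if event == "hangup":
            return f"call {call_id}: " + " -> ".join(seen)


async def fake_switch(events: asyncio.Queue, script: List[str], delay_s: float) -> None:
    """Stand-in for the switch: emits signaling/media events after a delay."""
    for event in script:
        await asyncio.sleep(delay_s)
        await events.put(event)


async def main(n_calls: int) -> List[str]:
    handlers = []
    switches = []
    for call_id in range(n_calls):
        queue: asyncio.Queue = asyncio.Queue()
        handlers.append(handle_call(call_id, queue))
        switches.append(fake_switch(queue, ["answered", "dtmf:1", "hangup"], 0.01))
    # Run every call concurrently on one event loop; gather keeps argument order,
    # so the first n_calls results are the handlers' transcripts.
    results = await asyncio.gather(*handlers, *switches)
    return list(results[:n_calls])


if __name__ == "__main__":
    for line in asyncio.run(main(3)):
        print(line)
```

A synchronous script would block on each wait and need a thread per call; here the waits of all calls overlap on a single event loop.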

Aswath Rao

I would think the choice of protocol – SIP or IAX2 – will be determined by the trunking service provider. After all they are the ones who deliver the calls. Am I missing something here?

Mike Dent

Interesting, I think there is a lot of scope for Google Voice and App Engine. It’s just a shame Google have not managed to roll out Voice anywhere beyond the US so far.
I can imagine though that going ‘outside’ the US will be a logistical nightmare!?


I definitely like what you are proposing here, and have often thought that with a little extra oomph Google Voice would be a fantastic product for SMBs, but I have a few issues with it:
1. Google Voice is not for business use (as per its terms of service), at least not as of now. Using it for this purpose will cause you to lose your Google Voice number.
2. I am not sure IAX2 is that good a suggestion at this point. I have seen quite a few IAX2 line providers actually move to SIP because it’s more standard and has fewer “jitter” issues. I agree that better firewall traversal is a huge issue, but IAX2 apparently does not look like the right answer.
3. From an infrastructure perspective, how realistic is it to use App Engine? Is GAE good enough for voice? I don’t use it at all, so I am just extrapolating here, but I know that quite a few providers have stayed away from AWS for voice and use Voxeo instead, since its infrastructure is designed for voice, which provides better quality.


I’m pretty sure G Voice is going to lose those business restrictions. Only thing that makes sense?

Brian McConnell

App Engine is essentially a web scripting / database environment, not unlike PHP + MySQL except it runs on Python and uses Google’s own datastore. App Engine would most likely not handle the calls itself, but would just respond to requests from a switch handling the calls (i.e. “I just received a call from 4155551234 for 4155554321, what should I do?”).

This is similar to the way web servers control VoiceXML servers, and plays to the strengths of each component. So as a developer you can script phone applications much like web applications, and you do not need to know the messy details of how telecom servers work except for a few basic functions.
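The request quoted above ("I just received a call from 4155551234 for 4155554321, what should I do?") maps onto an ordinary web handler. Here is a minimal sketch as a plain WSGI application, the interface App Engine's Python runtime supports, using only the standard library. The query parameters `from` and `to`, the routing table, the JSON reply format and the prompt URL are all hypothetical.

```python
import json
from urllib.parse import parse_qs

# Hypothetical routing table: dialed number -> number to transfer the call to.
ROUTES = {"4155554321": "+14155550100"}


def app(environ, start_response):
    """Answer the switch's question: a call came in from X for Y, what now?"""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    caller = params.get("from", [""])[0]
    dialed = params.get("to", [""])[0]
    if dialed in ROUTES:
        reply = {"verb": "transfer", "to": ROUTES[dialed], "caller_id": caller}
    else:
        # Unknown number: tell the switch to play a placeholder prompt.
        reply = {"verb": "play", "url": "https://example.com/not-in-service.wav"}
    body = json.dumps(reply).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Local testing only, e.g. GET /?from=4155551234&to=4155554321
    from wsgiref.simple_server import make_server
    make_server("", 8080, app).serve_forever()
```

Nothing here is telecom-specific: the handler parses a query string and returns JSON, which is exactly the division of labor the comment describes.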
