Auto-transcribing podcasts?

13 Comments

I think it would be really cool if someone could come up with a way to automatically derive written transcripts from podcasts. This would work much like a speech recognition program, but it would need to be speaker-independent to be effective, rather like the closed captioning used by TV stations. The purpose of producing written transcripts would be twofold: to make podcasts accessible to deaf members of the community who cannot otherwise get at the information shared in the many podcasts now available, and to make the shows searchable once the transcripts are published.
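On the searchability half of the idea: once transcripts exist as plain text, even a tiny inverted index is enough to find which episodes mention a given term. A minimal Python sketch (the episode ids and transcript text here are invented for illustration, not from any real feed):

```python
import re
from collections import defaultdict

def build_index(transcripts):
    """Map each word to the set of episode ids whose transcript contains it."""
    index = defaultdict(set)
    for episode_id, text in transcripts.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(episode_id)
    return index

def search(index, query):
    """Return episode ids whose transcripts contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Invented sample data standing in for published transcripts.
transcripts = {
    "ep1": "Today we discuss speech recognition and transcripts.",
    "ep2": "A show about accessibility and closed captioning.",
}
index = build_index(transcripts)
print(search(index, "speech recognition"))  # finds only ep1
```

Even imperfect machine transcripts would make this kind of lookup possible, since a search only needs the query words to be recognized somewhere in the episode.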

I would love to hear from anyone in a position to suggest possible solutions built from existing technology. I am willing to work with anyone, using any of the podcasts I produce as a test case, in the hope of eventually applying this to a wide range of podcasts. Please contact me if you have any ideas along these lines, or post a comment here. Let’s see what we can come up with as a community effort.


Larry Hendrick

James, I asked this same question on a podcast a few weeks ago and a listener directed me to http://www.podzinger.com, which scans your mp3 and turns it into text. It still has a few shortcomings and will need to improve, but the recognition is about the same as using Word and a microphone. It is not a complete solution, but a good beginning.

Larry O'Brien

I’ve written prototypes and talked with some of the technical architects at MS about speech recognition technology. The current APIs are geared for realtime recognition — dictation — and it’s actually quite difficult to use them in a way that exploits offline CPU power (that is, “I’ll wait 8 hours for the job to finish, if the result is better”).

This is not to say that the task is impossible; it will certainly be achieved someday. With today’s APIs, lower-quality but still valuable work could be done (extraction of keywords and so forth). However, it is a decidedly non-trivial task.
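The keyword-extraction fallback Larry mentions could be approximated even when word-for-word recognition is poor: rank whatever terms the recognizer does produce by frequency and keep the most common ones. A rough Python sketch of that idea (the stopword list and sample text are illustrative assumptions, not any particular speech API):

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with", "as", "be", "are", "was",
}

def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

sample = "Podcasts are audio shows. Transcribing podcasts makes podcasts searchable."
print(extract_keywords(sample))  # 'podcasts' ranks first
```

Keywords like these would not serve deaf listeners the way a full transcript would, but they could make episodes searchable long before recognition accuracy catches up.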

Mike

I’ve been looking at tools like this for years, from companies like Virage and other digital AV transcription services. All of it is expensive, though very cool. I wanted to use it to provide transcripts of digitally recorded distance-education classes. I still do; I just wish it were cheaper.

Nick

I would also be interested in this, though more from a business point of view.

I work for a market research company; if we could run a recording of an interview through software and get a transcript out the other end, that would help us tremendously.

We’ve tried basic speech recognition, but it isn’t accurate enough.

Will watch this with interest…

Comments are closed.