I think it would be really cool if someone could come up with a way to automatically derive written transcriptions from podcasts. It would work much like a speech recognition program, but it would need to be speaker independent to be effective, similar to the closed captioning used by TV stations. The purpose of producing written transcripts would be twofold: to make podcasts accessible to deaf members of the community who otherwise cannot get at the information shared in the many podcasts now available, and to make the shows searchable once the transcripts are published.

I would love to hear from anyone who has thoughts about possible solutions based on existing technology. I am willing to work with anyone, using any of the podcasts I produce as a test case, in the hope of eventually applying this to a wide range of podcasts. Please contact me if you have any ideas along these lines, or post a comment here. Let’s see what we can come up with as a community effort.

Comments

  1. The only thing I have ever seen on this is this:

    http://www.podtranscript.com/index.php?categoryid=13&s=&

  2. Great idea, James.
    I have often thought of working on something like that with the Microsoft Speech SDK.

  3. Would also be interested in this – more from a business point of view though.

    I work for a market research company – if we could process a recording of an interview through software and come out with a transcript at the other end, that would help us tremendously.

    We’ve tried basic speech recognition, but that isn’t accurate enough.

    Will watch this with interest…

  4. Compaq Research had a project called SpeechBot some years ago – it was a search engine for Internet radio shows.

    Unfortunately, it seems to be offline now. Read something about it here:
    http://www.corante.com/getreal/archives/003865.html

  5. I’ve been looking at stuff like this for years, from people like Virage and other digital AV transcription services. All of it is expensive, though very cool. I wanted to use it to provide transcripts of digitally recorded Distance Ed classes. Still do, just wish it was cheaper.

  6. Larry O’Brien Friday, December 23, 2005

    I’ve written prototypes and talked with some of the technical architects at MS about speech recognition technology. The current APIs are geared for realtime recognition — dictation — and it’s actually quite difficult to use them in a way that exploits offline CPU power (that is, “I’ll wait 8 hours for the job to finish, if the result is better”).

    This is not to say that the task is impossible; it will certainly be achieved someday. With today’s APIs, lower-quality but still valuable work could be done (extraction of keywords and so forth). However, it is a decidedly non-trivial task.

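For anyone who wants to experiment with the batch-style, offline recognition described in the last comment, here is a minimal sketch. It assumes the open-source SpeechRecognition package with the PocketSphinx engine rather than the Microsoft Speech SDK mentioned earlier, and the file name and chunk length are placeholders; raw accuracy on speaker-independent podcast audio will be well below what a human transcriber produces, but it shows the shape of the pipeline: decode the recording in chunks, stitch the text together, and fall back to keyword extraction for search indexing.

    # Minimal sketch: offline, batch transcription of a recorded podcast, not live dictation.
    # Assumes the open-source SpeechRecognition package with PocketSphinx installed
    # (pip install SpeechRecognition pocketsphinx); "episode.wav" is a placeholder
    # name for a mono WAV export of the show.
    import re
    from collections import Counter

    import speech_recognition as sr

    CHUNK_SECONDS = 30  # decode the show in short pieces so one bad stretch doesn't spoil the run

    recognizer = sr.Recognizer()
    parts = []

    with sr.AudioFile("episode.wav") as source:
        while True:
            chunk = recognizer.record(source, duration=CHUNK_SECONDS)
            if len(chunk.frame_data) == 0:
                break  # end of the recording reached
            try:
                # PocketSphinx runs entirely offline and is speaker independent, though rough.
                parts.append(recognizer.recognize_sphinx(chunk))
            except sr.UnknownValueError:
                parts.append("[inaudible]")

    transcript = " ".join(parts)
    print(transcript)

    # Lower-quality but still useful fallback: pull likely keywords out for a search index.
    words = re.findall(r"[a-z']+", transcript.lower())
    print(Counter(w for w in words if len(w) > 4).most_common(20))

Even when the word error rate is too high for a publishable transcript, the keyword list at the end is often good enough to make a show searchable, which is the lower-quality but still valuable work that comment describes.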

Comments have been disabled for this post