"speech synthesis"

- Allan Adler
  
posted
18 years ago

Mon, May 16, 2005 8:37 PM

I don't know anything about how people actually do speech synthesis. I have two questions:

(1) I looked at some of the talk filters at

formatting link

and it seems to me that something similar might be used in speech synthesis. In other words, there must be some kind of mapping of certain sequences of letters to certain sounds. So, I'd like to know where I can find such a map.

Let me make this more precise. What I'd like is a filter, like the flex scripts at the talkfilters site, which can be used to convert text into ascii representation of international phonetic symbols. If that already exists, that's great. If not, I'd like to know where I can find some source code that contains data for mapping printed text to some other notation for producing sounds.

(2) What kind of progress has been made on getting robots to talk by producing the sounds mechanically, instead of electronically?

- Gordon McComb
  
posted
18 years ago

Mon, May 16, 2005 9:40 PM

Are you interested in speech synthesis on a pathological level, or on a practical level given some target platform? If the former, there are some good resources on speech synthesis going back to the Voder, a device created in the 1930s that indeed used mechanical means to mimic the human vocal tract. It was the precursor to the Vocoder, which was designed by Bell Labs in order to better understand speech, and how to manipulate it. (Bell was interested in how to put more talk down a thinner and thinner wire. Same stuff people worry about today.)

Look up "voder" or "homer dudley" for more information. There are even some newsreel shots from the '39 World's Fair when it was demonstrated. It is operated much like a stenograph machine.

If you're looking at speech on the practical level, today most people just use the Microsoft speech engine that comes with Windows, or one of the various open source or licensed speech synthesis programs from folks like Lucent. Here's an online example:

formatting link

For representing speech sounds phonetically, a lot of research has been done at CMU. Look up their Sphinx program.

For small robots, in the "old days" we used Votrax SC-01 or General Instrument SP0256 chips, which were based on parametric synthesis of the vocal tract, but neither is still made. Today, there are some replacements, like the SpeakJet -- which is based on a modern-day PIC. It took them a LONG time to write the code for this.

As far as stringing discrete sounds to make speech: a little harder than that. Discrete sounds -- phonemes or allophones -- by themselves don't have enough information in them for recognition. People need to hear the transition between these sounds to actually recognize the speech, and be able to comprehend what is said. You could map sounds and link them together, and because there are only about 63 phonemes for English, that's not hard. Much harder is building a transition model to bridge the sounds, and then adding things like inflection that aid in our comprehension of speech. There can be thousands of permutations.

Many of the technical papers on speech synthesis, even those going back decades, talk about these and other issues. You can find some of them on Google, or in speech and language pathology texts available at a good university library.

-- Gordon

- aiiadict
  
posted
18 years ago

Mon, May 16, 2005 10:01 PM

phoneme

n : (linguistics) one of a small set of speech sounds that are distinguished by the speakers of a particular language

The speech synthesizers that I have played with have a database of letter combinations, and what phoneme they correspond to:

read

letter combination = phoneme
r = r
ea = E
d = d

r-E-d
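A minimal sketch of that kind of lookup, in Python rather than flex. Only the `r`, `ea`, and `d` rules come from the example above; the other entries, the `RULES` and `to_phonemes` names, and the `?` marker for unmatched letters are assumptions made up for illustration. It tries the longest letter combination first, so `ea` is matched as a unit.

```python
# Hypothetical rule table: letter combination -> phoneme symbol.
# Only "r", "ea", and "d" come from the post; the rest are illustrative.
RULES = {
    "ea": "E",
    "r": "r",
    "d": "d",
    "sh": "S",
    "s": "s",
    "t": "t",
}


def to_phonemes(word, rules=RULES):
    """Greedy longest-match conversion of a word into phoneme symbols."""
    word = word.lower()
    max_len = max(len(key) for key in rules)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest letter combination first, so "ea" is one unit.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: keep the letter, marked as unknown.
            phonemes.append("?" + word[i])
            i += 1
    return phonemes


print("-".join(to_phonemes("read")))  # r-E-d
```

A real table would need many more rules, plus some way to handle context (for example, "read" in the past tense), which this sketch ignores.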

To make it sound more human, you need to add programming that I've never even thought of looking at.

The ones I've used allowed you to change the pitch and speed at which the words were spoken.

Rich

- Allan Adler
  
posted
18 years ago

Tue, May 17, 2005 10:58 AM

That sounds like the kind of thing I'm looking for, i.e. a filter. I can write a filter too; however, if I look at theirs, it will tell me all of the things that need to go on the left hand side of the equals sign.

Where can I find the source code for one of these programs?

- Allan Adler
  
posted
18 years ago

Tue, May 17, 2005 11:08 AM

My two questions were not related to each other, except that they were about different aspects of speech synthesis. I don't have plans to build any robots; I was just curious about what has been accomplished along the lines of getting robots to produce mechanical sounds. On the other hand, I might actually want to do something impractical with the speech synthesis programs, such as write a simple filter using a flex file, like the stuff at the talkfilters site I mentioned. The main purpose of doing so would be to get practice in writing such filters and drill in using ascii representations of the international phonetic alphabet.

I went there but didn't find any source code I could study. The text to spoken pig latin is a nice touch, though.

I don't actually want to produce speech, at least not at this stage. I just want to turn text into ascii representation of the corresponding speech using the international phonetic alphabet. In other words, I'm just interested in the easy part of what you said, just the discrete sounds. I just want to take someone else's source code and tweak it a little to produce ascii IPA.
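For what it's worth, a text-to-ASCII-phonetics filter of this kind could be sketched as below, in Python rather than flex. Everything here is an assumption for illustration: the `LEXICON` entries, the symbols in them (they are not a vetted ASCII IPA scheme), and the choice to wrap transcriptions in slashes. Unknown words pass through unchanged, so gaps in the lexicon stay visible.

```python
import re

# Hypothetical mini-lexicon: word -> ASCII phonetic spelling.
# The symbols are illustrative only, not a vetted ASCII-IPA scheme.
LEXICON = {
    "read": "r E d",
    "the": "D @",
    "text": "t e k s t",
}

# A "word" here is a run of letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def transcribe(text, lexicon=LEXICON):
    """Replace each known word with /phonemes/; leave everything else alone."""
    def replace(match):
        word = match.group(0)
        phonemes = lexicon.get(word.lower())
        if phonemes is None:
            return word  # unknown word: pass through unchanged
        return "/" + phonemes + "/"
    return WORD.sub(replace, text)


print(transcribe("Read the text, please."))
# /r E d/ /D @/ /t e k s t/, please.
```

The same structure maps fairly directly onto a flex file: one pattern for words, with an action that looks the word up, and a default rule that echoes everything else.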

- KenLem
  
posted
18 years ago

Thu, May 19, 2005 8:10 PM

My SP0256 emulator software, ChipTalk, strings phonemes together without doing any transitioning, and it's almost as intelligible as an SP0256-AL2. Granted, there are pops and clicks that distract from comprehension, but with a little practice one can understand it. It is also a great example of why you would bother to transition from one sound to another.

Ken

formatting link
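To illustrate the pops-and-clicks point, here is a rough sketch comparing a hard join with a short linear crossfade. It is not how ChipTalk or the SP0256 works; the sine tones stand in for recorded phonemes, and the function names, the 8000 Hz rate, and the 80-sample overlap are all assumptions. A crossfade is only the crudest possible "transition"; it smooths the waveform at the joint but does not model how one speech sound actually glides into the next.

```python
import math


def tone(freq_hz, n_samples, rate=8000):
    """Stand-in for a recorded phoneme: a plain sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]


def concat(segments):
    """Butt-join segments with no transition (discontinuities cause clicks)."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out


def crossfade_concat(segments, overlap):
    """Join segments, linearly crossfading `overlap` samples at each boundary."""
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming segment
            out[start + k] = (1 - w) * out[start + k] + w * seg[k]
        out.extend(seg[n:])
    return out


def max_jump(samples):
    """Largest sample-to-sample step; a crude proxy for an audible click."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


a = tone(200, 410)  # ends away from a zero crossing
b = tone(310, 400)  # starts at zero
hard = concat([a, b])
soft = crossfade_concat([a, b], overlap=80)

print("hard join max jump:", round(max_jump(hard), 3))
print("crossfade max jump:", round(max_jump(soft), 3))
```

The hard join shows a much larger sample-to-sample jump at the boundary than anywhere inside either tone, which is the kind of discontinuity heard as a click.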