"speech synthesis"



I don't know anything about how people actually do speech synthesis.
I have two questions:

(1) I looked at some of the talk filters at
    http://www.dystance.net/software/talkfilters/
    and it seems to me that something similar might be used in speech
    synthesis. In other words, there must be some kind of mapping of
    certain sequences of letters to certain sounds. So, I'd like to know
    where I can find such a map.

    Let me make this more precise. What I'd like is a filter, like the flex
    scripts at the talkfilters site, which can be used to convert text into
    ascii representation of international phonetic symbols. If that already
    exists, that's great. If not, I'd like to know where I can find some
    source code that contains data for mapping printed text to some other
    notation for producing sounds.

(2) What kind of progress has been made on getting robots to talk
    by producing the sounds mechanically, instead of electronically?
--
Ignorantly,
* Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
* comments do not reflect in any way on MIT. Also, I am nowhere near Boston.

Re: "speech synthesis"


Are you interested in speech synthesis on a pathological level, or on a
practical level given some target platform? If the former, there are
some good resources on speech synthesis going back to the Voder, a
device created in the 1930s that an operator played by hand to mimic
the human vocal tract, although the sound itself was generated
electronically. It grew out of the Vocoder, which was
designed by Bell Labs in order to better understand speech, and how to
manipulate it. (Bell was interested in how to put more talk down a
thinner and thinner wire. Same stuff people worry about today.)

Look up "voder" or "homer dudley" for more information. There are even
some newsreel shots from the '39 World's Fair when it was demonstrated.
It is operated much like a stenograph machine.

If you're looking at speech on the practical level, today most people
just use the Microsoft speech engine that comes with Windows, or one of
the various open source or licensed speech synthesis programs from folks
like Lucent. Here's an online example:

http://www.bell-labs.com/project/tts/voices.html

For representing speech sounds phonetically, a lot of research has been
done at CMU. Look up their Sphinx program.

For small robots, in the "old days" we used Votrax SC-01 or General
Instruments SP0256 chips, which were based on parametric synthesis of
the vocal tract, but neither is still made. Today, there are some
replacements, like the SpeakJet -- which is based on a modern-day PIC.
It took them a LONG time to write the code for this.

As far as stringing discrete sounds to make speech: it's a little harder
than that. Discrete sounds -- phonemes or allophones -- by themselves don't
have enough information in them for recognition. People need to hear
the transition between these sounds to actually recognize the speech,
and be able to comprehend what is said. You could map sounds and link
them together, and because there are only about 63 phonemes for English,
that's not hard. Much harder is building a transition model to bridge
the sounds, and then adding things like inflection that aid in our
comprehension of speech. There can be thousands of permutations.
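
To put a rough number on "thousands of permutations", here is a minimal
sketch in Python. It assumes every sound can be followed by every other
sound (itself included) and simply reuses the figure of 63 quoted above;
the function names and the sample phoneme list are made up for
illustration and do not come from any particular synthesizer.

```python
def transitions(phonemes):
    """Return the adjacent sound pairs a synthesizer would have to bridge."""
    return list(zip(phonemes, phonemes[1:]))


def count_possible_transitions(inventory_size):
    """Count ordered pairs, assuming any sound may follow any sound."""
    return inventory_size ** 2


# A four-sound utterance needs three bridges.
print(transitions(["h", "E", "l", "O"]))
# [('h', 'E'), ('E', 'l'), ('l', 'O')]

# Assumption: 63 sounds, the figure quoted in the post.
print(count_possible_transitions(63))
# 3969
```

So even before inflection is considered, a transition model built this
way would have to cover a few thousand sound pairs.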

Many of the technical papers on speech synthesis, even those going back
decades, talk about these and other issues. You can find some of them on
Google, or in speech and language pathology texts available at a good
university library.

-- Gordon

Re: "speech synthesis"



My two questions were not related to each other, except that they were
about different aspects of speech synthesis. I don't have plans to build
any robots; I was just curious about what has been accomplished along the
lines of getting robots to produce mechanical sounds. On the other hand,
I might actually want to do something impractical with the speech synthesis
programs, such as writing a simple filter using a flex file, like the
stuff at the talkfilters site I mentioned. The main purpose of doing so
would be to get practice in writing such filters and drill in using ascii
representations of the international phonetic alphabet.


I went there but didn't find any source code I could study. The text to
spoken pig latin is a nice touch, though.


I don't actually want to produce speech, at least not at this stage.
I just want to turn text into ascii representation of the corresponding
speech using the international phonetic alphabet. In other words, I'm
just interested in the easy part of what you said, just the discrete
sounds. I just want to take someone else's source code and tweak it
a little to produce ascii IPA.
--
Ignorantly,
* Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
* comments do not reflect in any way on MIT. Also, I am nowhere near Boston.

Re: "speech synthesis"


My SP0256 emulator software, ChipTalk, strings phonemes together
without doing any transitioning and it's almost as intelligible as an
SP0256-AL2.  Granted, there are pops and clicks that distract from
comprehension, but with a little practice, one can understand it.  It
is also a great example of why you would bother to transition from one
sound to another.
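
One way to see where pops and clicks come from, as a rough sketch in
Python: butting two sample buffers together leaves a sudden level jump at
the joint, while a short linear crossfade spreads that jump over several
samples. This is only an illustration of the general idea; it is not how
ChipTalk or the SP0256 works, and the function names and test signals are
invented for the example. A real transition model also reshapes the sound
itself, which a crossfade does not do.

```python
def concat_plain(a, b):
    """Butt two sample lists together with no smoothing at the joint."""
    return a + b


def concat_crossfade(a, b, overlap):
    """Join two sample lists, fading a out and b in over `overlap` samples."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter length")
    head = a[:-overlap]          # part of a that is left untouched
    tail = b[overlap:]           # part of b that is left untouched
    mixed = []
    for i in range(overlap):
        # Weight of b rises steadily from just above 0 to just below 1.
        w = (i + 1) / (overlap + 1)
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + tail


def largest_jump(samples):
    """Largest difference between neighbouring samples."""
    return max(abs(y - x) for x, y in zip(samples, samples[1:]))


# Two made-up "sounds" that sit at opposite levels.
a = [1.0] * 8
b = [-1.0] * 8

print(largest_jump(concat_plain(a, b)))         # 2.0, one hard step
print(largest_jump(concat_crossfade(a, b, 4)))  # roughly 0.4
```

The crossfaded result is shorter than the plain one by the overlap length,
since the two buffers share those samples.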
Ken
www.speechchips.com


Re: "speech synthesis"

phoneme

n : (linguistics) one of a small set of speech sounds that are
distinguished by the speakers of a particular language


The speech synthesizers that I have played with
have a database of letter combinations, and what
phoneme they correspond to:

read
letter combination = phoneme
r   =  r
ea = E
d   = d

r-E-d
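
The lookup above can be sketched as a small filter in Python. This is a
minimal illustration, not the code of any particular synthesizer: the
rule table holds only the three "read" entries from this post, the names
RULES and to_phonemes are invented for the example, and trying the
longest letter combination first is an assumption about how ties between
rules such as "e" and "ea" would be settled.

```python
# Hypothetical rule table: letter combination -> phoneme symbol.
# Only the three "read" rules come from the post; a real table is far larger.
RULES = {
    "r": "r",
    "ea": "E",
    "d": "d",
}


def to_phonemes(word, rules=RULES):
    """Convert a word to phoneme symbols by greedy longest-match lookup."""
    word = word.lower()
    longest = max(len(key) for key in rules)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest letter combination first, so "ea" wins over "e".
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: mark the letter as unknown and move on.
            phonemes.append("?")
            i += 1
    return phonemes


print("-".join(to_phonemes("read")))  # r-E-d
```

Letters with no rule come out as "?", which makes gaps in the table easy
to spot while it is being filled in.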

To make it sound more human, you need to add programming
that I've never even thought of looking at.

The ones I've used let you change the pitch and
speed at which the words were spoken.

Rich


Re: "speech synthesis"

aiiadict@gmail.com writes:

That sounds like the kind of thing I'm looking for, i.e. a filter.
I can write a filter too; however, if I look at theirs, it will tell
me all of the things that need to go on the left hand side of the
equals sign.

Where can I find the source code for one of these programs?
--
Ignorantly,
* Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
* comments do not reflect in any way on MIT. Also, I am nowhere near Boston.
