Voice morphing is the gradual transition of one speech signal into another. Voice morphing technology enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics. Speech morphing is analogous to image morphing: both preserve the shared characteristics of the starting and final signals while generating a smooth transition between them. In image morphing, the in-between images show one face smoothly changing its shape and texture until it turns into the target face. In speech morphing, one speech signal should smoothly change into another, keeping the shared characteristics of the starting and ending signals while smoothly changing the other properties. Pitch and formant information in each signal is extracted using the cepstral approach. The processing needed to obtain the morphed speech signal includes cross-fading of envelope information, dynamic time warping to match the major signal features (pitch), and signal re-estimation to convert the morphed speech signal back into an acoustic waveform.
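The dynamic time warping step mentioned above can be illustrated with a minimal sketch. It assumes each signal has already been reduced to a pitch contour (one pitch value in Hz per analysis frame) and uses the absolute pitch difference as the local cost. The function name dtw_align and the sample contours are hypothetical and chosen only for illustration; they do not come from the report.

```python
def dtw_align(source, target):
    """Align two pitch contours with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of
    (source_index, target_index) pairs from the first to the last frame.
    The local cost |source[i] - target[j]| is an assumption of this sketch.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(source[i - 1] - target[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both contours
                cost[i - 1][j],      # advance the source only
                cost[i][j - 1],      # advance the target only
            )

    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# Two hypothetical contours with the same shape but different timing.
source_pitch = [100.0, 100.0, 120.0, 110.0]
target_pitch = [100.0, 120.0, 120.0, 110.0]
total_cost, warp_path = dtw_align(source_pitch, target_pitch)
print(total_cost)  # 0.0, because the contours differ only in timing
print(warp_path)
```

Each pair in the returned path names a source frame and the target frame it should be combined with, so the later cross-fading operates on features that occur at matching points in the two utterances.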
A successful procedure for voice morphing requires a representation of the speech signal in a parametric space, using a suitable mathematical model that allows interpolation between the characteristics of the two speakers. In other words, for the speech characteristics of the source speaker's voice to change gradually to those of the target speaker, the pitch, duration and spectral parameters must be extracted from both speakers. Natural-sounding synthetic intermediates, with a new voice timbre, can then be produced. A key element in the morphing is the manipulation of the pitch information. If two signals with different pitches are simply cross-faded, it is highly likely that two separate sounds will be heard. A complete voice morphing system incorporates a voice conversion algorithm and the necessary tools for pre- and post-processing, as well as for analysis and testing. The processing tools include waveform editing, duration scaling and other enhancements needed so that the resulting speech is of high quality and is perceived as the target speaker.
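The point about cross-fading can be checked with a small numerical sketch. It assumes two pure tones at 100 Hz and 150 Hz stand in for voices with different pitches, and it interpolates the pitch linearly; these values, the linear interpolation and the helper names are illustrative assumptions, not the report's method. A plain discrete Fourier transform shows which frequencies are present in each result.

```python
import cmath
import math


def sine(freq_hz, sample_rate, num_samples):
    """Generate a unit-amplitude sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(num_samples)]


def dominant_frequencies(signal, sample_rate, rel_threshold=0.5):
    """Return the DFT bin frequencies whose magnitude is at least
    rel_threshold times the largest magnitude (direct O(N^2) DFT)."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        magnitudes.append(abs(acc))
    peak = max(magnitudes)
    return [k * sample_rate / n
            for k, mag in enumerate(magnitudes) if mag >= rel_threshold * peak]


SAMPLE_RATE = 1000   # Hz, hypothetical
NUM_SAMPLES = 200    # gives 5 Hz frequency resolution
F_SOURCE, F_TARGET = 100.0, 150.0
ALPHA = 0.5          # morphing factor: 0 = source, 1 = target

source = sine(F_SOURCE, SAMPLE_RATE, NUM_SAMPLES)
target = sine(F_TARGET, SAMPLE_RATE, NUM_SAMPLES)

# Naive approach: mix the two waveforms sample by sample.
crossfaded = [(1 - ALPHA) * s + ALPHA * t for s, t in zip(source, target)]

# Parametric approach: interpolate the pitch, then synthesize one tone.
morphed = sine((1 - ALPHA) * F_SOURCE + ALPHA * F_TARGET,
               SAMPLE_RATE, NUM_SAMPLES)

print(dominant_frequencies(crossfaded, SAMPLE_RATE))  # [100.0, 150.0]
print(dominant_frequencies(morphed, SAMPLE_RATE))     # [125.0]
```

The cross-faded signal still contains both original pitches, which is why a listener would tend to hear two separate sounds, whereas interpolating the pitch parameter before synthesis yields a single intermediate pitch.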
The figure below shows the general block diagram of the voice morphing system. A recognition and alignment module was added to synchronize the user's voice with the target voice before the morphing is done. Before we can morph a particular file, we have to supply information about the file to be morphed and the file recording itself (Target Information and File Information). The system requires the phonetic transcription of the lyrics, the melody as MIDI data, and the actual recording to be used as the target audio data. Thus, a good impersonator of the person who originally spoke the speech has to be recorded. This recording has to be analyzed with SMS (Spectral Modeling Synthesis), segmented into "morphing units", and each unit labeled with the appropriate note and phonetic information of the file. This preparation stage is done semi-automatically, using a non-real-time application developed for this task. The first module of the running system includes the real-time analysis and the recognition/alignment steps. Each analysis frame, with the appropriate parameterization, is associated with the phoneme of a specific moment of the song and thus with a target frame. Once a user frame is matched with a target frame, we morph them by interpolating data from both frames and synthesize the output sound. Only voiced phonemes are morphed, and the user has control over which parameters are interpolated and by how much. The frames belonging to unvoiced phonemes are left untouched, so the output always keeps the user's consonants.
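The per-frame morphing rule described above can be sketched as follows. This is a simplified illustration, not the system's actual implementation: the Frame type, its fields, the function morph_frame and the linear interpolation are all assumptions made for this example, and a real SMS frame would carry sinusoidal partials and a residual rather than a short envelope list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """Hypothetical, heavily simplified analysis frame."""
    voiced: bool
    pitch_hz: float
    envelope: List[float]  # coarse spectral envelope samples


def lerp(a, b, weight):
    """Linear interpolation: weight 0 returns a, weight 1 returns b."""
    return (1 - weight) * a + weight * b


def morph_frame(user, target, pitch_weight, envelope_weight):
    """Morph one aligned user/target frame pair.

    Each parameter has its own weight, so the user controls which
    parameters are interpolated and by how much. Unvoiced frames are
    returned untouched, which keeps the user's consonants. Requiring
    the target frame to be voiced as well is an assumption of this
    sketch.
    """
    if not (user.voiced and target.voiced):
        return user
    return Frame(
        voiced=True,
        pitch_hz=lerp(user.pitch_hz, target.pitch_hz, pitch_weight),
        envelope=[lerp(u, t, envelope_weight)
                  for u, t in zip(user.envelope, target.envelope)],
    )


# Hypothetical aligned frames: one voiced pair and one unvoiced user frame.
user_vowel = Frame(voiced=True, pitch_hz=100.0, envelope=[1.0, 0.0])
target_vowel = Frame(voiced=True, pitch_hz=200.0, envelope=[0.0, 1.0])
user_consonant = Frame(voiced=False, pitch_hz=0.0, envelope=[0.2, 0.8])

halfway = morph_frame(user_vowel, target_vowel,
                      pitch_weight=0.5, envelope_weight=0.5)
print(halfway.pitch_hz, halfway.envelope)  # 150.0 [0.5, 0.5]
print(morph_frame(user_consonant, target_vowel, 1.0, 1.0) is user_consonant)  # True
```

Setting a weight to 0 leaves that parameter entirely the user's, while a weight of 1 takes it entirely from the target, so pitch and timbre can be morphed independently.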
This report has been subdivided into nine chapters. The second chapter gives details of the history of voice morphing. The third chapter concisely outlines the various processes involved in voice morphing. The fourth chapter presents, in a straightforward manner, a thorough analysis of the procedure used to accomplish morphing and the necessary theory involved. Processes such as preprocessing, cepstral analysis, dynamic time warping and signal re-estimation are described with the necessary diagrams. The fifth chapter examines the actual morphing process in depth. The conversion of the morphed signal into an acoustic waveform is dealt with in detail in the sixth chapter. Chapter six summarizes the whole morphing process with the help of a block diagram. Chapter seven lists the conclusions that have been drawn from this project.