Patent No. 4305131 Dialog between TV movies and human viewers (Best, Dec 8, 1981)
Abstract
A video amusement system by which one or more viewers influence the course of a motion picture as if each viewer were a participant in a real-life drama or dialog. A speech-recognition unit recognizes a few spoken words such as "yes" and "run" spoken by a viewer at branch points in the movie, thus simulating a dialog between the screen actors and the viewer. The apparatus may read an optical videodisc containing independently addressable video frames, blocks of compressed audio, and/or animated cartoon graphics for the multiple story lines which the movie may take. A record retrieval circuit reads blocks of binary-coded control information comprising a branching structure of digital points specifying the frame sequence for each story line. A dispatcher circuit assembles a schedule of cueing commands specifying precisely which video frames, cartoon frames, and portions of audio are to be presented at which instant of time. A cueing circuit executes these commands by generating precisely timed video and audio signals, so that a motion picture with lip-synchronized sound is presented to the viewer. Recordings of the viewers' names may be inserted into the dialog so that the actors speak to each viewer using the viewer's own name. The apparatus can thus provide each viewer with an illusion of individualized and active participation in the motion picture.
Notes:
  Parent Case Text
  
  This is a continuation of U.S. patent application Ser. No. 009,533, filed Feb. 
  5, 1979, now abandoned. 
  BACKGROUND OF THE INVENTION 
  
  1. Field of the Invention 
  
  The apparatus and methods of this invention relate to the following classes: 
  voice controlled television, electric amusement devices, motion picture and 
  sound synchronizing, videodisc retrieval, digital generating of animated cartoons, 
  and branching motion pictures. 
  
  2. Description of the Prior Art 
  
  Since the beginning of the motion picture industry, movies have generally been 
  constrained to a predetermined sequence of predetermined scenes. Although a 
  vicarious sense of involvement is often felt by each viewer, the immutability 
  of the sequence of scenes limits the viewer's actual participation to a few 
  primitive options such as cheering, commenting, and selecting what to watch. 
  This limitation in prior-art movies has not changed substantially with the advent 
  of television, video games, and audience-response systems. 
  
  Although the prior art includes devices capable of providing viewer participation, 
  such devices do not provide all of the following features in one entertainment 
  medium: 
  
  (1) vivid motion picture imagery; 
  
  (2) lip-synchronized sound; 
  
  (3) story lines (plots) which branch (have alternative sequences); 
  
  (4) elaborately developed story lines as in motion picture drama; 
  
  (5) scene changes responsive to inputs from each individual viewer; 
  
  (6) seamless transitions between shots; 
  
  (7) many hours of non-repetitive entertainment. 
  
  Furthermore, no prior-art device can conduct a voice dialog with each viewer 
  in which the screen actors respond to the viewer's voice in a natural conversational 
  manner. 
  
  Prior-art video game devices enable players to control video images via buttons, 
  knobs, and control sticks. But in these devices the images are limited to one 
  stereotyped scene such as a battlefield, an automobile race, a gun fight, or 
  a surface on which a ball is moved around. Such game devices generate simple 
  moving figures on a television screen, but the elaborate plot, dialog, characterization, 
  and most of the cinematic art are absent. 
  
  Another problem faced by the present invention is providing many hours of interactive 
  entertainment without obvious repetition. Prior-art video games can be played 
  for many hours only because they involve ritualistic cycles in their mechanism 
  of play. Such cycles lack the variety, suspense, and realism of conventional 
  movies. 
  
  The use of microcomputer-controlled videodiscs for interactive instruction has 
  been discussed in the literature (see, for instance, "Special Purpose Applications 
  of the Optical Videodisc System", by George C. Kenney, IEEE Transactions on 
  Consumer Electronics, November 1976, pages 327-338). Such computer-assisted 
  instructional devices present conventional movie portions and still frames with 
  narration in response to information entered by the student via push-buttons. 
  But this prior art does not teach how to synchronize multiple alternative motion 
  picture sequences with multiple alternative audio tracks so that spoken words 
  from any of the audio tracks are realistically synchronized with the moving 
  lips of the human actors in the video image. Nor does the prior art teach a 
  method for automatically inserting spoken names of the players into a prerecorded 
  spoken dialog so that lip-synchronization (lip-sync) is maintained. Nor does 
  the prior art teach a method for making a television movie responsive to spoken 
  words from the viewers/players so that an illusion of personal viewer participation 
  results. 
  
  Prior-art systems for recognizing voice inputs and generating voice responses, 
  such as those described in U.S. Pat. No. 4,016,540, do not present a motion picture 
  and therefore cannot simulate a face-to-face conversation. 
  
  Prior-art voice-controlled systems, such as those described in U.S. Pat. No. 3,601,530, 
  provide control of transmitted TV images of live people, but cannot provide 
  a dialog with pre-recorded images. 
  
  Prior-art systems have been used with educational television in which the apparatus 
  switches between two or more channels or picture quadrants depending on the 
  student's answers to questions. Such systems cannot provide the rapid response, 
  precise timing, and smooth transitions which the present invention achieves, 
  because the multi-channel broadcast proceeds in a rigid sequence regardless 
  of the student's choices. 
  
  The prior art also includes two-way "participatory television" which enables 
  each subscriber of a cable-TV system to communicate via push-buttons with the 
  broadcaster's central computer so that statistics may be gathered on the aggregate 
  responses of the viewers to broadcast questions and performances. Similar systems 
  use telephone lines to communicate viewers' preferences to the broadcaster's 
  computer. Although the central computer can record each viewer's response, it 
  is not possible for the computer to customize the subsequent picture and sound 
  for every individual viewer. The individual's response is averaged with the 
  responses from many other subscribers. Although such systems permit each person 
  to participate, the participation is not "individualized" in the sense used 
  herein, because the system cannot give each individual a response that is adapted 
  to him alone. 
  
  The prior art for synchronizing audio with motion pictures is largely concerned 
  with film and video tape editing. Such devices, as described in U.S. Pat. No. 
  3,721,757, are based on the presumption that most of the editing decisions as 
  to which frames will be synchronized with which portions of the audio have been 
  made prior to the "final cut" or broadcast. If multiple audio tracks are to 
  be mixed and synchronized with a motion picture, such editing typically takes 
  many hours more than the show itself. It is not humanly possible to make the 
  editing decisions for frame-by-frame finecut editing and precise lip-sync dubbing 
  during the show. For this reason, prior-art editing and synchronizing apparatus 
  (whether preprogrammed or not) cannot provide each individual player with an 
  individualized dialog and story line, and are therefore not suitable for interactive 
  participatory movies and simulated voice conversations which are automatically 
  edited and synchronized by the apparatus during the show. 
  
  Another problem not addressed in the prior art is the automatic selection of 
  a portion of audio (from several alternative portions) which may be automatically 
  inserted into predetermined points in the audio signal by the apparatus during 
  the show. For example, the names of the players, selected from 
  a catalog of thousands of common names, may be inserted into a dialog so that the actors not 
  only respond to the players but call them by name. Recording a separate audio 
  track for each of the thousands of names would require an impractically large 
  amount of disc space. But using a catalog of names requires that each name be 
  inserted at several points in the dialog, whenever an actor speaks the name 
  of the then-current player. The task of synchronizing audio insertion so that 
  the dialog flows smoothly without gaps or broken rhythm at the splice is one 
  heretofore performed by skilled editors who know in advance of the editing procedure 
  which frames and audio tracks are to be assembled and mixed. In the present 
  apparatus this finecut editing cannot be done until after the show has started, 
  because no human editor can know in advance the name of each player and the 
  sequence of the dialog which will change from performance to performance. The 
  present invention solves these editing and synchronizing problems. 
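
  The name-insertion problem described above can be illustrated with a minimal 
  sketch. It is not the circuitry of the invention: the names Clip, NAME_SLOT, 
  splice_name, and clips_needed, and all durations and counts, are hypothetical 
  and chosen only for illustration. The sketch shows (a) why a shared catalog 
  of names needs far fewer recordings than one track per name per line, and 
  (b) how a name of any length might be spliced into a line so that the 
  following audio starts exactly where the name ends, leaving no gap. It does 
  not attempt to model lip-sync with the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A piece of recorded audio (hypothetical model: label plus length)."""
    label: str
    duration_ms: int


# Sentinel marking the point in a line where the player's name is spoken.
NAME_SLOT = object()


def splice_name(template, name_catalog, player_name):
    """Return a list of (start_ms, clip) pairs for one spoken line.

    Each clip starts exactly when the previous one ends, so the splice
    has no gap regardless of how long the inserted name is.
    """
    schedule = []
    start_ms = 0
    for item in template:
        clip = name_catalog[player_name] if item is NAME_SLOT else item
        schedule.append((start_ms, clip))
        start_ms += clip.duration_ms
    return schedule


def clips_needed(num_names, num_lines, use_catalog):
    """Count recordings: one per name per line, or names and lines separately."""
    if use_catalog:
        return num_names + num_lines
    return num_names * num_lines


if __name__ == "__main__":
    catalog = {"Ann": Clip("Ann", 300), "Roberto": Clip("Roberto", 700)}
    line = [Clip("Hello,", 500), NAME_SLOT, Clip("are you ready?", 1200)]
    for start, clip in splice_name(line, catalog, "Roberto"):
        print(start, clip.label)
    print(clips_needed(1000, 50, use_catalog=False))
    print(clips_needed(1000, 50, use_catalog=True))
```

  With the assumed figures of 1000 names and 50 lines of dialog, recording 
  every combination would take 50000 clips, while a catalog takes 1050. The 
  price of the catalog is that the splice timing must be computed during the 
  show, which is the editing problem the paragraph above describes.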
  
  While watching a prior art branching movie as described in U.S. Pat. No. 3,960,380, 
  a viewer cannot talk with the screen actors and have them reply responsively. 
  Applying prior art speech-recognition techniques to control such branching movies 
  would not provide a realistic conversational dialog because of the following 
  problem: If the number of words which a viewer of any age and sex can speak 
  and be understood by the apparatus is sufficiently large to permit a realistic 
  conversation, then prior art speech-recognition techniques are unreliable. But, 
  if the vocabulary is restricted to only a few words to make speech recognition 
  reliable, then a realistic conversation would not result. This problem is resolved 
  in the present invention. 
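
  The text above does not say here how the vocabulary problem is resolved. One 
  plausible reading, consistent with the abstract's "a few spoken words ... at 
  branch points", is that each branch point accepts only its own small set of 
  words, while the set differs from one branch point to the next. The sketch 
  below illustrates that reading only. BranchPoint, next_branch, the story 
  table, and the fallback behavior for unrecognized words are all assumptions, 
  and the recognizer itself is replaced by a plain string.

```python
from dataclasses import dataclass


@dataclass
class BranchPoint:
    """A point in the story where the viewer is expected to speak."""
    prompt: str
    choices: dict   # recognized word -> id of the next branch point
    default: str    # branch taken when the word is not recognized (assumed)


def next_branch(branch_point, heard):
    """Pick the next branch from the small vocabulary active at this point."""
    word = heard.strip().lower()
    return branch_point.choices.get(word, branch_point.default)


# Hypothetical story table: each point listens for only two words,
# but the story as a whole uses a larger vocabulary.
STORY = {
    "door": BranchPoint("Shall we go in?", {"yes": "hall", "no": "street"}, "street"),
    "hall": BranchPoint("Someone is coming!", {"run": "street", "hide": "closet"}, "closet"),
}

if __name__ == "__main__":
    here = "door"
    for spoken in ["Yes", "run"]:
        here = next_branch(STORY[here], spoken)
        print(here)
```

  Under this assumption the recognizer only ever has to tell a few words apart 
  at a time, yet the viewer's replies vary from scene to scene.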
  
  SUMMARY OF THE INVENTION 
  
  This invention provides a form of entertainment heretofore not provided by any 
  prior-art system. With this invention one or more people can participate in 
  a motion picture by steering it in a direction of their own choosing and with 
  the consequences of their participation explicitly performed by motion picture 
  images and voices of actors or cartoon characters. Users of the system can carry 
  on simulated conversations with the screen actors who may address each player 
  by the player's own name. The invention enables television viewers to participate 
  in simulated conversations with famous people, and choose the direction the 
  conversation takes as it progresses. The invention eliminates the need for the 
  ritualistic cycles characteristic of prior-art games, by permitting each show 
  to be significantly different from any recent show. This is accomplished by 
  a special-purpose microcomputer which may automatically schedule and control 
  presentation of video frames, and/or digitally-generated animated cartoons, 
  and digitized audio which is automatically lip-synced with the motion picture. 
  
  
  Some embodiments of the invention include voice-recognition circuitry so that 
  the course of the movie can be influenced by words or other sounds spoken by 
  each viewer to produce an illusion of individualized participation. 
  
  Some embodiments include processing of branching schedules of control commands 
  which specify precise sequences and timing of video, audio, and graphics to 
  provide a lip-synchronized movie having a seamless flow through alternative 
  story lines. 
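
  As a rough illustration of a branching schedule of control commands, the 
  sketch below walks a small branching structure and assembles a flat, timed 
  list of cues. It is a software analogy under stated assumptions, not the 
  dispatcher circuit of the invention: Segment, assemble_schedule, the frame 
  numbers, the audio block labels, and the rate of 30 frames per second are 
  all hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

FPS = 30  # assumed frame rate, for illustration only


@dataclass
class Segment:
    """One stretch of a story line: a frame range plus its audio block."""
    first_frame: int
    last_frame: int
    audio_block: str
    next_by_choice: dict  # viewer's word -> id of the following segment


def assemble_schedule(segments, start_id, choices):
    """Return (time_in_seconds, kind, payload) cues along the chosen path.

    Frames from consecutive segments are scheduled one tick apart, so the
    transition between story lines has no gap, and each audio block is cued
    at the same instant as the first frame of its segment.
    """
    schedule = []
    tick = 0
    pending = iter(choices)
    seg_id = start_id
    while seg_id is not None:
        seg = segments[seg_id]
        schedule.append((Fraction(tick, FPS), "audio", seg.audio_block))
        for frame in range(seg.first_frame, seg.last_frame + 1):
            schedule.append((Fraction(tick, FPS), "video", frame))
            tick += 1
        if seg.next_by_choice:
            # An unrecognized or missing choice ends the walk in this sketch.
            seg_id = seg.next_by_choice.get(next(pending, None))
        else:
            seg_id = None
    return schedule


if __name__ == "__main__":
    segments = {
        "intro": Segment(100, 102, "A1", {"yes": "go", "no": "stay"}),
        "go": Segment(500, 501, "A2", {}),
        "stay": Segment(900, 900, "A3", {}),
    }
    for cue in assemble_schedule(segments, "intro", ["yes"]):
        print(cue)
```

  Exact fractions are used for cue times so that the frame timing does not 
  drift through rounding, which matters when sound must stay aligned with 
  lip movement.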
  
  Some embodiments include synchronizing multiple video frames and/or animated 
  cartoon frames with alternative audio portions during the show, such as inserted 
  names of the players/viewers, while preserving lip-sync and seamless flow. 
  
  This invention comprises various apparatus and methods for performing the functions 
  or combination of functions which may provide individualized participation in 
  a motion picture and simulated conversations with people. Some of these functions 
  may in some embodiments be performed by microprocessors executing programs which 
  may be fixed as firmware incorporated into the same semiconductor chips as the 
  conventional processing circuits. These programmed microprocessors are in essence 
  special-purpose circuits. Microprocessors executing separately-stored programs 
  may also be used. 
  
  The claims appended hereto should be consulted for a complete definition of 
  the invention which is summarized in part in the present summary. 
 
 
 
 
 