CMUSphinx never recognizes any words from audio files

Sphinx does not seem to recognize or process the audio files I give it: the audio stream yields an empty result (SpeechResult). I don't think the problem is the particular audio file, because I have tried several and none of them work. Does anyone have an audio file that they know works? And is there anything specific that would cause the stream to produce no transcription?

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class AutomaticSpeechRecognition {

    public static void main(String[] args) throws IOException {
        // Default US English models shipped in the sphinx4-data jar.
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        //recognizer.startRecognition(new FileInputStream("E:/1video/hello-5.mp3"));

        // Audio file to transcribe (a second, unused FileInputStream on the same file was removed).
        File file = new File("E:/1video/bargain_not.wav");
        InputStream is = new FileInputStream(file);

        //is = AutomaticSpeechRecognition.class.getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav");
        recognizer.startRecognition(is);

        // getResult() returns null once the end of the stream is reached.
        SpeechResult result = null;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getResult());
            System.out.println(result.getHypothesis());
            System.out.println(result.getWords());
        }
        //result = recognizer.getResult();
        //System.out.println(result);
        //System.out.println(result.toString());
        //System.out.println(result.getWords());
        /*for (WordResult wordResult : result.getWords())
        {
            System.out.println(wordResult);
        }*/
        recognizer.stopRecognition();
        is.close();
    }
}

Here is the output from running it. It does not appear to fail anywhere:

 09:31:13.430 INFO unitManager          CI Unit: *+NSN+
 09:31:13.433 INFO unitManager          CI Unit: *+SPN+
 09:31:13.433 INFO unitManager          CI Unit: AA
 09:31:13.434 INFO unitManager          CI Unit: AE
 09:31:13.434 INFO unitManager          CI Unit: AH
 09:31:13.434 INFO unitManager          CI Unit: AO
 09:31:13.434 INFO unitManager          CI Unit: AW
 09:31:13.434 INFO unitManager          CI Unit: AY
 09:31:13.434 INFO unitManager          CI Unit: B
 09:31:13.434 INFO unitManager          CI Unit: CH
 09:31:13.434 INFO unitManager          CI Unit: D
 09:31:13.434 INFO unitManager          CI Unit: DH
 09:31:13.434 INFO unitManager          CI Unit: EH
 09:31:13.435 INFO unitManager          CI Unit: ER
 09:31:13.435 INFO unitManager          CI Unit: EY
 09:31:13.435 INFO unitManager          CI Unit: F
 09:31:13.435 INFO unitManager          CI Unit: G
 09:31:13.435 INFO unitManager          CI Unit: HH
 09:31:13.435 INFO unitManager          CI Unit: IH
 09:31:13.435 INFO unitManager          CI Unit: IY
 09:31:13.435 INFO unitManager          CI Unit: JH
 09:31:13.435 INFO unitManager          CI Unit: K
 09:31:13.435 INFO unitManager          CI Unit: L
 09:31:13.435 INFO unitManager          CI Unit: M
 09:31:13.436 INFO unitManager          CI Unit: N
 09:31:13.436 INFO unitManager          CI Unit: NG
 09:31:13.436 INFO unitManager          CI Unit: OW
 09:31:13.436 INFO unitManager          CI Unit: OY
 09:31:13.436 INFO unitManager          CI Unit: P
 09:31:13.436 INFO unitManager          CI Unit: R
 09:31:13.436 INFO unitManager          CI Unit: S
 09:31:13.436 INFO unitManager          CI Unit: SH
 09:31:13.436 INFO unitManager          CI Unit: T
 09:31:13.436 INFO unitManager          CI Unit: TH
 09:31:13.436 INFO unitManager          CI Unit: UH
 09:31:13.437 INFO unitManager          CI Unit: UW
 09:31:13.437 INFO unitManager          CI Unit: V
 09:31:13.437 INFO unitManager          CI Unit: W
 09:31:13.437 INFO unitManager          CI Unit: Y
 09:31:13.437 INFO unitManager          CI Unit: Z
 09:31:13.437 INFO unitManager          CI Unit: ZH
 09:31:14.014 INFO autoCepstrum         Cepstrum component auto-configured      as follows: autoCepstrum {MelFrequencyFilterBank, Denoise,      DiscreteCosineTransform2, Lifter}
 09:31:14.030 INFO dictionary           Loading dictionary from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
 09:31:14.132 INFO dictionary           Loading filler dictionary from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
 09:31:14.132 INFO acousticModelLoader  Loading tied-state acoustic model from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us
 09:31:14.133 INFO acousticModelLoader  Pool means Entries: 16128
 09:31:14.133 INFO acousticModelLoader  Pool variances Entries: 16128
 09:31:14.133 INFO acousticModelLoader  Pool transition_matrices Entries: 42
 09:31:14.133 INFO acousticModelLoader  Pool senones Entries: 5126
 09:31:14.133 INFO acousticModelLoader  Gaussian weights: mixture_weights. Entries: 15378
 09:31:14.133 INFO acousticModelLoader  Pool senones Entries: 5126
 09:31:14.133 INFO acousticModelLoader  Context Independent Unit Entries: 42
 09:31:14.133 INFO acousticModelLoader  HMM Manager: 137095 hmms
 09:31:14.134 INFO acousticModel        CompositeSenoneSequences: 0
 09:31:14.134 INFO largeTrigramModel    Loading n-gram language model from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us.lm.dmp
 09:31:14.807 INFO largeTrigramModel    1-grams: 19794
 09:31:14.807 INFO largeTrigramModel    2-grams: 1377200
 09:31:14.807 INFO largeTrigramModel    3-grams: 3178194
 09:31:15.582 INFO lexTreeLinguist      Max CI Units 43
 09:31:15.583 INFO lexTreeLinguist      Unit table size 79507
 09:31:15.585 INFO speedTracker         # ----------------------------- Timers----------------------------------------
 09:31:15.585 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
 09:31:15.586 INFO speedTracker         Load Dictionary      1       0.1020s   0.1020s   0.1020s   0.1020s   0.1020s   
 09:31:15.586 INFO speedTracker         Load LM              1       0.6730s   0.6730s   0.6730s   0.6730s   0.6730s   
 09:31:15.586 INFO speedTracker         Compile              1       0.7760s   0.7760s   0.7760s   0.7760s   0.7760s   
 09:31:15.586 INFO speedTracker         Load AM              1       1.5450s   1.5450s   1.5450s   1.5450s   1.5450s   
 09:31:15.608 INFO speedTracker            This  Time Audio: 1.94s  Proc: 0.01s  Speed: 0.00 X real time
 09:31:15.608 INFO speedTracker            Total Time Audio: 1.94s  Proc: 0.01s 0.00 X real time
 09:31:15.609 INFO memoryTracker           Mem  Total: 454.75 Mb  Free: 262.35 Mb
 09:31:15.609 INFO memoryTracker           Used: This: 192.40 Mb  Avg: 192.40 Mb  Max: 192.40 Mb
 09:31:15.610 INFO largeTrigramModel    LM Cache Size: 0 Hits: 0 Misses: 0
 <s> </s>

asked by kevinn2065, 25.04.2015
Comment:
A working audio file is included in the demo. Your file is not recognized, most likely because it is in the wrong format. It must be a 16 kHz, 16-bit, mono MSWAV file. - Nikolay Shmyrev, 25.04.2015


Answers (1)


As Nikolay Shmyrev said, the file must be a 16 kHz, 16-bit, mono MSWAV. You can record such a file with Audacity: set it to 16 kHz and mono.

Export the file and make sure you select WAV (Microsoft), signed 16-bit PCM.

answered by Travis, 14.05.2015
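
Before feeding a file to the recognizer, it may help to confirm that it really is 16 kHz, 16-bit, mono, signed PCM. The following is a minimal sketch using only the standard `javax.sound.sampled` API, not anything from Sphinx itself. The class name `WavFormatCheck` and its two helper methods are hypothetical names chosen for this illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: reports how a file differs from 16 kHz, 16-bit, mono, signed PCM.
class WavFormatCheck {

    /** Returns null when the format matches, otherwise a description of every mismatch. */
    static String describeMismatch(AudioFormat f) {
        StringBuilder problems = new StringBuilder();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            problems.append("encoding = ").append(f.getEncoding()).append(" (want PCM_SIGNED); ");
        }
        if (Math.round(f.getSampleRate()) != 16000) {
            problems.append("sample rate = ").append(f.getSampleRate()).append(" Hz (want 16000); ");
        }
        if (f.getSampleSizeInBits() != 16) {
            problems.append("sample size = ").append(f.getSampleSizeInBits()).append(" bits (want 16); ");
        }
        if (f.getChannels() != 1) {
            problems.append("channels = ").append(f.getChannels()).append(" (want 1); ");
        }
        if (f.isBigEndian()) {
            // PCM data in a Microsoft WAV file is little-endian.
            problems.append("byte order = big-endian (want little-endian); ");
        }
        return problems.length() == 0 ? null : problems.toString().trim();
    }

    /** Reads only the file header; throws UnsupportedAudioFileException if the JDK cannot parse it. */
    static String checkFile(File audioFile) throws IOException, UnsupportedAudioFileException {
        AudioFormat format = AudioSystem.getAudioFileFormat(audioFile).getFormat();
        return describeMismatch(format);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        String problem = checkFile(new File(args[0]));
        System.out.println(problem == null ? "Format looks right" : "Format mismatch: " + problem);
    }
}
```

Running it against the file from the question (`java WavFormatCheck E:/1video/bargain_not.wav`) should show whether the sample rate or channel count is the culprit. A file the stock JDK cannot parse at all, such as an MP3, fails with `UnsupportedAudioFileException` instead of producing a report.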