82 likes · 2,228 views

The MOST Accurate Speech-to-Text in 2025 💥 Nvidia Parakeet Python Tutorial 💥

parakeet-tdt-0.6b-v2 is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, with support for punctuation, capitalization, and accurate timestamp prediction. Try the demo here: https://huggingface.co/spaces/nvidia/...

This XL variant of the FastConformer [1] architecture integrates the TDT [2] decoder and is trained with full attention, enabling efficient transcription of audio segments up to 24 minutes long in a single pass. The model achieves an RTFx (inverse real-time factor) of 3380 on the HF-Open-ASR leaderboard with a batch size of 128. Note: RTFx performance may vary depending on dataset audio duration and batch size.
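To make the RTFx figure concrete, here is a minimal sketch of the arithmetic, assuming RTFx is defined as seconds of audio processed per second of compute (the usual inverse real-time factor). The function names are hypothetical helpers, not part of any library, and the leaderboard number is a batched throughput figure, so latency for a single file on your own hardware will likely differ.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, rtfx_value: float) -> float:
    """Rough compute time implied by a given RTFx (assumes throughput stays constant)."""
    if rtfx_value <= 0:
        raise ValueError("rtfx_value must be positive")
    return audio_seconds / rtfx_value


# A 24-minute clip (the single-pass limit quoted above) at the reported RTFx of 3380.
clip_seconds = 24 * 60
print(f"{estimated_processing_seconds(clip_seconds, 3380.0):.3f} s of compute (idealized)")
```

Under that assumption a full 24-minute clip works out to well under one second of compute, which is why the note about batch size and audio duration matters when comparing against your own timings.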

Key Features

Accurate word-level timestamp predictions
Automatic punctuation and capitalization
Robust performance on spoken numbers and song lyrics
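The features above can be tried with a short script. This is a sketch, not necessarily what the Colab in the video does: it assumes the NVIDIA NeMo toolkit is installed (for example via pip install -U "nemo_toolkit[asr]"), that the model is published on Hugging Face as nvidia/parakeet-tdt-0.6b-v2, and that transcribe(..., timestamps=True) returns segment dictionaries with 'start', 'end', and 'segment' keys, as in the usage published with the model. The helper names and the audio path are hypothetical.

```python
from typing import Dict, List


def format_clock(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: List[Dict]) -> List[str]:
    """Turn segment dicts (assumed keys: 'start', 'end', 'segment') into readable lines."""
    return [
        f"[{format_clock(s['start'])} -> {format_clock(s['end'])}] {s['segment']}"
        for s in segments
    ]


def transcribe_with_timestamps(audio_path: str) -> List[str]:
    """Load the model and return punctuated, capitalized segments with timestamps."""
    # Imported here because NeMo is a heavy dependency; the helpers above work without it.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v2"  # assumed Hugging Face model ID
    )
    output = asr_model.transcribe([audio_path], timestamps=True)
    # Word-level entries are assumed to live under output[0].timestamp["word"].
    return format_segments(output[0].timestamp["segment"])


# Example call ("sample.wav" is a placeholder for a 16 kHz mono recording):
# for line in transcribe_with_timestamps("sample.wav"):
#     print(line)
```

Keeping the formatting helpers separate from the model call means you can check the output layout without downloading the model or needing a GPU.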

Colab used in the video:
https://colab.research.google.com/dri...

❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - https://ko-fi.com/1littlecoder

🧭 Follow me on 🧭
Twitter - / 1littlecoder
