Transcripts of the spring 2023 IOB seminars at UGA, auto-generated by the OpenAI Whisper large model.

Description

Transcripts of all IOB spring seminars as of April 2023, in several file formats, are in the out folder of the GitHub repo.

The scripts used to generate the transcripts are in the scripts folder. I first used yt_dlp.sh to download the YouTube videos and convert them to mp3 (ty yt-dlp):

yt-dlp -x --audio-format mp3 -o '%(title)s.%(ext)s' {youtube link}
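
For more than a handful of videos, the same command can be driven from Python. This is only a sketch: the function names are made up for illustration, it assumes yt-dlp is on the PATH, and it is not necessarily what yt_dlp.sh does.

```python
import subprocess


def yt_dlp_argv(url):
    """Build the argument list for the yt-dlp command shown above."""
    # Passing a list (no shell) means the output template needs no quoting.
    return ["yt-dlp", "-x", "--audio-format", "mp3",
            "-o", "%(title)s.%(ext)s", url]


def download_all(urls):
    """Download each URL in turn and stop at the first failure."""
    for url in urls:
        subprocess.run(yt_dlp_argv(url), check=True)
```

Calling download_all with a list of video links produces one mp3 per video in the current directory.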

Then I used prep_whisper_job.py to generate a whisper command for each audio file. It writes the commands to cmd.sh, which I used to submit a batch of jobs running Whisper on UGA’s Sapelo2 cluster:

whisper --model large -o out -- './{filename}';
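
A rough sketch of how such a command file could be generated, for anyone who wants to do the same on their own audio. The function names, the mp3-only filter, and the one-command-per-line layout of cmd.sh are assumptions; the actual prep_whisper_job.py in the scripts folder may differ.

```python
import shlex
from pathlib import Path

# Assumption: the download step left only mp3 files to transcribe.
AUDIO_SUFFIXES = {".mp3"}


def build_commands(audio_dir, out_dir="out", model="large"):
    """Return one whisper command line per audio file in audio_dir."""
    commands = []
    for path in sorted(Path(audio_dir).iterdir()):
        if path.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        # shlex.quote protects video titles that contain spaces or quotes.
        audio = shlex.quote(f"./{path.name}")
        commands.append(
            f"whisper --model {model} -o {shlex.quote(out_dir)} -- {audio}"
        )
    return commands


def write_cmd_file(commands, cmd_path="cmd.sh"):
    """Write the commands to a file, one per line."""
    Path(cmd_path).write_text("\n".join(commands) + "\n", encoding="utf-8")
```

Running write_cmd_file(build_commands(".")) in the folder holding the mp3 files gives a cmd.sh with one line per file. How those lines are turned into cluster jobs depends on the scheduler setup and is not shown here.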

Finally, I used prep_jekyll_page.py to generate a markdown file for each transcript so that it shows up on this GitHub page.
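
As an illustration of that last step, here is a minimal sketch of turning one transcript into a Jekyll post. Jekyll expects post files named YYYY-MM-DD-title.md that start with a YAML front matter block; everything else here (the function name, the "post" layout, the way the slug is derived) is an assumption and may not match prep_jekyll_page.py.

```python
import json
import re


def make_jekyll_page(title, transcript_text, date):
    """Return (file name, markdown text) for one transcript.

    date is a "YYYY-MM-DD" string used as the file name prefix.
    """
    # Lowercase the title and collapse anything non-alphanumeric into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    front_matter = "\n".join([
        "---",
        "layout: post",
        # json.dumps yields a double-quoted string that YAML also accepts,
        # so titles containing ":" or quotes stay valid.
        f"title: {json.dumps(title)}",
        "---",
    ])
    return f"{date}-{slug}.md", f"{front_matter}\n\n{transcript_text}"
```

The returned text would be saved under the returned file name in the site's _posts folder.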

Status

I will update when I want to. Please feel free to use the transcripts for your own purposes, or contact me about more interesting projects.

links: https://y.at/💻🌲🎓🚀🌕
