This is a Python program that gets all the captions from a YouTube link:
from pytube import YouTube

yt = YouTube('https://youtu.be/5MgBikgcWnY')
captions = yt.captions.all()
for caption in captions:
    print(caption)
The output of the above program is:
<Caption lang="Arabic" code="ar">
<Caption lang="Chinese (China)" code="zh-CN">
<Caption lang="English" code="en">
<Caption lang="English (auto-generated)" code="a.en">
<Caption lang="French" code="fr">
<Caption lang="German" code="de">
<Caption lang="Hungarian" code="hu">
<Caption lang="Italian" code="it">
But I want only the lang and code from the above output, stored as key-value pairs in a dictionary:
{"Arabic": "ar", "Chinese": "zh-CN", "English": "en", "French": "fr", "German": "de", "Hungarian": "hu", "Italian": "it"}
Thanks in advance.
Answer
It’s pretty simple: each caption object has a name and a code attribute, so you can map one to the other.
from pytube import YouTube

yt = YouTube('https://youtu.be/5MgBikgcWnY')
captions = yt.captions.all()

captions_dict = {}
for caption in captions:
    # Mapping the caption name to the caption code
    captions_dict[caption.name] = caption.code
If you want a one-liner, use a dict comprehension:
captions_dict = {caption.name: caption.code for caption in captions}
Output
{'Arabic': 'ar', 'Bangla': 'bn', 'Burmese': 'my', 'Chinese (China)': 'zh-CN', 'Chinese (Taiwan)': 'zh-TW', 'Croatian': 'hr', 'English': 'en', 'English (auto-generated)': 'a.en', 'French': 'fr', 'German': 'de', 'Hebrew': 'iw', 'Hungarian': 'hu', 'Italian': 'it', 'Japanese': 'ja', 'Persian': 'fa', 'Polish': 'pl', 'Portuguese (Brazil)': 'pt-BR', 'Russian': 'ru', 'Serbian': 'sr', 'Slovak': 'sk', 'Spanish': 'es', 'Spanish (Spain)': 'es-ES', 'Thai': 'th', 'Turkish': 'tr', 'Ukrainian': 'uk', 'Vietnamese': 'vi'}
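The dictionary asked for in the question leaves out the auto-generated track, while the output above still includes it. A minimal sketch of one way to drop it, assuming auto-generated tracks can be recognised by a code starting with "a." (as with "a.en" above). The FakeCaption type and the captions_to_dict helper are hypothetical names used only so the example runs without a network call; with pytube you would pass the real captions instead.

```python
from collections import namedtuple

# Hypothetical stand-in for pytube's caption objects, so the sketch runs offline.
FakeCaption = namedtuple("FakeCaption", ["name", "code"])


def captions_to_dict(captions, skip_auto_generated=True):
    """Map caption names to codes, optionally dropping auto-generated tracks."""
    result = {}
    for caption in captions:
        # Assumption: auto-generated tracks have codes starting with "a." (e.g. "a.en").
        if skip_auto_generated and caption.code.startswith("a."):
            continue
        result[caption.name] = caption.code
    return result


sample = [
    FakeCaption("Arabic", "ar"),
    FakeCaption("English", "en"),
    FakeCaption("English (auto-generated)", "a.en"),
]
print(captions_to_dict(sample))  # {'Arabic': 'ar', 'English': 'en'}
```

Passing skip_auto_generated=False keeps every track, which matches the behaviour of the one-liner above.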