Only folder structure without videos #649

Open
carlosvega opened this issue Sep 24, 2020 · 58 comments

Comments

@carlosvega

Subject of the issue

edx-dl creates the folder structure but does not download any video

Your environment

  • Operating System (name/version): macOS
  • Python version: 3.8.5
  • youtube-dl version: 2020.9.20
  • edx-dl version: 0.1.13

Steps to reproduce

edx-dl -u https://courses.edx.org/courses/course-v1:KTHx+PHSC01.1x+1T2020/course/

Expected behaviour

I would expect it to download all videos

Actual behaviour

It only creates the folder structure.

@diptomondal007

same here

@0n0n0m0uz

edx recently changed the structure of the website and this package isn't being maintained as it was before. It's going to be up to one of us to fix it I think. Not sure if they are still devoting time to this package

@RJFeddeler

I started playing with the code yesterday to see if I could get it to work. I haven't used Python much, so my code isn't very pretty, but it's just about working. I just need to get PDFs and other files downloading; the videos and subtitles work. I'm not sure if I can advertise a fork here, but when it's done I'll upload it. Let me know if it's OK to post here.

@carlosvega
Author

I noticed that some videos are only loaded afterwards via JS introducing an iframe.
How do you work around that?

@carlosvega
Author

Meanwhile, since I couldn't use this tool, I created my own chrome extension for that. You can find it here.
https://github.com/carlosvega/edx-video-extension

@RJFeddeler

I noticed that some videos are only loaded afterwards via JS introducing an iframe.
How do you work around that?

There is an API call now that returns unit IDs and other info. The section code stays the same, but to get the units you make an API call for each section; it returns the IDs of that section's units, and you then prepend https://courses.edx.org/xblock/ to a unit ID to get what is loaded in the iframe. I'm testing my final code now. When I upload it you can take a look.
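A minimal sketch of the last step only, assuming each unit ID is a full block ID that can be appended directly to the prefix. The API call that returns the unit IDs is not shown, because its endpoint is not given in this thread; the unit ID below is hypothetical sample data.

```python
XBLOCK_PREFIX = "https://courses.edx.org/xblock/"


def xblock_url(unit_id):
    """Build the URL of the page that the iframe loads for one unit."""
    return XBLOCK_PREFIX + unit_id


# Hypothetical unit ID; a real one would come from the per-section API call.
unit_ids = ["block-v1:ExampleX+Demo101+2020+type@vertical+block@0123456789abcdef"]

for uid in unit_ids:
    print(xblock_url(uid))
```

Each resulting URL would still have to be fetched with the logged-in session's headers.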

@carlosvega
Author

Great, I think I used a similar approach for the chrome extension.

@jturner421

jturner421 commented Sep 29, 2020

I'd love to see what you've done. I'm working with the code myself and got to the point of correctly identifying the urls for all subsections. Then I ran into this regex on line #92 of parsing.py:

re_units = re.compile('(<div?[^>]id="seq_contents_\d+".*?>.*?<\/div>)', re.DOTALL)

As far as I can tell, this method and its associated regex are what is causing the script to fail to identify units.
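A small sketch of how a pattern of this shape behaves. It uses a raw-string form with lazy wildcards (`.*?`), which is an assumption about the intended pattern, and both HTML samples are made up for illustration: one contains a `seq_contents_` container and one only has an iframe placeholder.

```python
import re

# Assumed form of the unit pattern, written as a raw string.
RE_UNITS = re.compile(r'(<div?[^>]id="seq_contents_\d+".*?>.*?</div>)', re.DOTALL)

# Hypothetical markup: a page that embeds unit content directly...
OLD_STYLE = '<div id="seq_contents_0" class="seq_contents">unit markup</div>'
# ...and a page where the unit is loaded later into an iframe.
NEW_STYLE = '<div id="root"><iframe title="unit"></iframe></div>'


def find_units(page):
    """Return every seq_contents block found in the page source."""
    return RE_UNITS.findall(page)


print(len(find_units(OLD_STYLE)))
print(len(find_units(NEW_STYLE)))
```

If the served page no longer contains any `seq_contents_` element, the pattern has nothing to match and no units are found, which would be consistent with empty folders being created.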

Using the following URL as an example: https://courses.edx.org/courses/course-v1:BerkeleyX+Data8.1x+2T2020/jump_to/block-v1:BerkeleyX+Data8.1x+2T2020+type@sequential+block@851eafb36585493aa5ce5c54f8d56d4a

which part do you append to https://courses.edx.org/xblock?

@carlosvega
Author

carlosvega commented Sep 29, 2020

The advantage of the extension is that it can wait for the JavaScript to load. The iframe src pointing to https://courses.edx.org/xblock/block... is created dynamically via JS through a very convoluted function.
The server could even generate a dynamic JS file. There is one file, called something like https://learning.edx.org/vendors~app … .js that has the function that initialises the video, or some JS that loads the iframe.

I think it won't be possible to build a successful scraper without JS rendering.

In my case I wait for the page to load, then redirect to the iframe src, then read $('.video.is-initialized').attr('data-metadata') and get the video URL from it. I can't even parse the iframe content directly, since they use a different domain for the iframe and the website. They really went far to prevent any scraping.

From https://learning.edx.org/course/course... they change to https://courses.edx.org/xblock/block... but an ID is added; that ID is what's dynamically recalculated through a very obfuscated process.
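For comparison, a hedged Python sketch of reading the same `data-metadata` attribute from xblock HTML that has already been fetched. It uses only the standard library; the sample markup and the `sources` key inside the JSON are assumptions for illustration, not the confirmed layout of the real attribute.

```python
import json
from html.parser import HTMLParser


class VideoMetadataParser(HTMLParser):
    """Collect the JSON stored in data-metadata on elements with class 'video'."""

    def __init__(self):
        super().__init__()
        self.metadata = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        raw = attr_map.get("data-metadata")
        if "video" in classes and raw:
            # Attribute values arrive with HTML entities already unescaped.
            self.metadata.append(json.loads(raw))


def extract_video_metadata(html):
    parser = VideoMetadataParser()
    parser.feed(html)
    return parser.metadata


# Hypothetical markup; the "sources" key is an assumption.
SAMPLE = (
    '<div class="video is-initialized" '
    'data-metadata="{&quot;sources&quot;: [&quot;https://example.com/video.mp4&quot;]}">'
    "</div>"
)

print(extract_video_metadata(SAMPLE))
```

This only helps when the attribute is present in the HTML that the server returns; it does not execute any JavaScript.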

@RJFeddeler

RJFeddeler commented Sep 30, 2020

I published my code, I modified edx_dl.py and parsing.py

https://github.com/RJFeddeler/edx-dl/

I decided to switch the way youtube-dl is used to the embedded method so I'm still playing around with the settings for that. It doesn't show progress or anything and downloads the best quality and muxes the audio/video together which is slower (ffmpeg is required for that, I forget if it defaults to the normal video file if not installed). I did that because I've been getting a lot of 500 errors when trying to download the default video+audio.

@RJFeddeler

@jturner421

The xblock has both the sequential block id of the sub-section and the vertical block id of the unit. The sequential block IDs (section/subsections) are still working as usual but you use api calls to get the vertical block IDs of the units of each section.
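To make the two kinds of block ID concrete, here is a sketch that splits a block ID into its parts. The field layout (`org+course+run+type@...+block@...`) is inferred from the example URL quoted earlier in this thread and is an assumption, not a documented format.

```python
import re

# Assumed layout, inferred from the example URL in this thread.
USAGE_KEY = re.compile(
    r"block-v1:(?P<org>[^+]+)\+(?P<course>[^+]+)\+(?P<run>[^+]+)"
    r"\+type@(?P<type>[^+]+)\+block@(?P<block>[^+/?#]+)"
)


def parse_usage_key(text):
    """Return the parts of the first block ID found in text, or None."""
    match = USAGE_KEY.search(text)
    return match.groupdict() if match else None


url = (
    "https://courses.edx.org/courses/course-v1:BerkeleyX+Data8.1x+2T2020/jump_to/"
    "block-v1:BerkeleyX+Data8.1x+2T2020+type@sequential+block@851eafb36585493aa5ce5c54f8d56d4a"
)
key = parse_usage_key(url)
print(key["type"], key["block"])
```

A `sequential` type marks a sub-section, while the unit IDs returned by the API calls would carry the `vertical` type.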

@shad90

shad90 commented Sep 30, 2020

@RJFeddeler Your code is working. I had one problem: with the default settings it tries to download the YouTube videos. It downloads one video, then raises an exception with an error while downloading the second video. I ran the script again and it downloaded the second video and raised the exception again for the third video.

Something about a connection timeout. It also takes a long time to download a video. EDIT: I tried again and it was going fine without any problem. I will update on it.

Sorry, I don't have details available now. But I used --prefer-cdn-videos and it is working flawlessly. The YouTube download method does have the advantage of naming the files properly.

Downloading courses before edx decides to update their system again

@RJFeddeler

RJFeddeler commented Sep 30, 2020

@shad90 I'm not sure why YouTube downloading isn't working for you. One thing you could try is to add the command line argument --ignore-errors (download the latest version for that). I mentioned above why the downloads take a long time: it's downloading the best quality audio and video separately and muxing them together. You can add the argument --format "best" and it should download from YouTube quicker.

EDIT: It is --format "mp4" (or: -f "mp4")
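For readers unfamiliar with the embedded way of using youtube-dl, here is a rough sketch of how these two arguments could map onto `youtube_dl.YoutubeDL` options. The helper names are hypothetical and this is not the fork's actual code; `format`, `ignoreerrors` and `outtmpl` are regular youtube-dl option keys.

```python
def build_ydl_opts(fmt="mp4", ignore_errors=True, outtmpl="%(title)s.%(ext)s"):
    """Return an options dict for youtube_dl.YoutubeDL (hypothetical helper)."""
    return {
        "format": fmt,                  # same meaning as --format / -f
        "ignoreerrors": ignore_errors,  # same meaning as --ignore-errors / -i
        "outtmpl": outtmpl,             # output file name template
    }


def download(urls, opts):
    """Download the given URLs with an embedded YoutubeDL instance."""
    # Imported lazily so build_ydl_opts works without youtube-dl installed.
    import youtube_dl

    with youtube_dl.YoutubeDL(opts) as ydl:
        return ydl.download(urls)


print(build_ydl_opts())
```

Requesting a single pre-muxed format such as "mp4" avoids the separate audio and video downloads and the ffmpeg merge step mentioned above.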

@vobisie

vobisie commented Oct 4, 2020

I published my code, I modified edx_dl.py and parsing.py

https://github.com/RJFeddeler/edx-dl/

I decided to switch the way youtube-dl is used to the embedded method so I'm still playing around with the settings for that. It doesn't show progress or anything and downloads the best quality and muxes the audio/video together which is slower (ffmpeg is required for that, I forget if it defaults to the normal video file if not installed). I did that because I've been getting a lot of 500 errors when trying to download the default video+audio.

Hey @RJFeddeler I used your repository and got the following error message

Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "edx-dl.py", line 8, in
edx_dl.main()
File "C:\Users\iobis\Desktop\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 1112, in main
all_selections = {selected_course:
File "C:\Users\iobis\Desktop\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 1113, in
get_available_sections(selected_course.url.replace('info', 'course'),
File "C:\Users\iobis\Desktop\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 232, in get_available_sections
sections = page_extractor.extract_sections_from_html(page, BASE_URL)
File "C:\Users\iobis\Desktop\edx-dl-master\edx-dl-master\edx_dl\parsing.py", line 457, in extract_sections_from_html
sections = [Section(position=i,
File "C:\Users\iobis\Desktop\edx-dl-master\edx-dl-master\edx_dl\parsing.py", line 459, in
url=_make_url(section_soup),
File "C:\Users\iobis\Desktop\edx-dl-master\edx-dl-master\edx_dl\parsing.py", line 430, in _make_url
return section_soup.a['href']
TypeError: 'NoneType' object is not subscriptable

@RJFeddeler

@0n0n0m0uz No problem, I'm happy it worked! I am currently working on an update to make the output nicer and more useful (progress bars for everything, not just the current video) and to handle youtube errors better (downloading alternative videos)

@vobisie I didn't modify that at all, extracting the sections still works from the old code, it's getting the units from the sections that was the problem, not sure why you are getting that error.
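What the traceback does establish is that `section_soup.a` was `None` when `_make_url` ran, meaning at least one parsed section block contained no `<a>` element; the thread does not establish why. A hedged sketch of a guard follows, using `SimpleNamespace` stand-ins instead of real BeautifulSoup tags so that it runs on its own. It is not the fork's code, and the caller would still need to handle a `None` URL.

```python
from types import SimpleNamespace


def make_url(section_soup):
    """Return the section link, or None when the section has no <a> element."""
    anchor = getattr(section_soup, "a", None)
    if anchor is None:
        # The unguarded version fails here with
        # "TypeError: 'NoneType' object is not subscriptable".
        return None
    return anchor["href"]


# Stand-ins for parsed sections; real code would pass BeautifulSoup tags.
with_link = SimpleNamespace(a={"href": "/courses/example/section-1"})
without_link = SimpleNamespace(a=None)

print(make_url(with_link))
print(make_url(without_link))
```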

@vobisie

vobisie commented Oct 7, 2020

@RJFeddeler

@vobisie I didn't modify that at all, extracting the sections still works from the old code, it's getting the units from the sections that was the problem, not sure why you are getting that error.

(base) C:\Users\iobis\Desktop\edx-dl-master>python edx-dl.py -u *******@gmail.com https://courses.edx.org/courses/course-v1:MITx+JPAL102x+3T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "edx-dl.py", line 8, in
edx_dl.main()
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1112, in main
all_selections = {selected_course:
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1113, in
get_available_sections(selected_course.url.replace('info', 'course'),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 232, in get_available_sections
sections = page_extractor.extract_sections_from_html(page, BASE_URL)
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 457, in extract_sections_from_html
sections = [Section(position=i,
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 459, in
url=_make_url(section_soup),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 430, in _make_url
return section_soup.a['href']
TypeError: 'NoneType' object is not subscriptable

The above is the complete output I get.

@RJFeddeler

RJFeddeler commented Oct 8, 2020

@vobisie I haven't looked at the section extraction code much. Your guess is as good as mine. You sure you have the right URL for the course? Do other courses work or same problem?

EDIT: Working on some code now and I see it verifies the URL is in your list before starting, so guess it's not a problem of a wrong URL.

@Coperbytes

Coperbytes commented Oct 8, 2020

I am getting different errors after using your code.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\appdata\local\programs\python\python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\AppData\Local\Programs\Python\Python37\Scripts\edx-dl.exe\__main__.py", line 7, in
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 1165, in main
downloadCount = download(args, selections, filtered_units, headers)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 940, in download
headers)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 898, in download_unit
headers)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 875, in download_video
downloadCount += skip_or_download(youtube_downloads, headers, args)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 858, in skip_or_download
f(url, filename, headers, args)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 763, in download_url
download_youtube_url(url, filename, headers, args)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\edx_dl\edx_dl.py", line 827, in download_youtube_url
ydl.download([url])
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\youtube_dl\YoutubeDL.py", line 2019, in download
url, force_generic_extractor=self.params.get('force_generic_extractor', False))
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\youtube_dl\YoutubeDL.py", line 820, in extract_info
self.report_error(compat_str(e), e.format_traceback())
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\youtube_dl\YoutubeDL.py", line 625, in report_error
self.trouble(error_message, tb)
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\youtube_dl\YoutubeDL.py", line 595, in trouble
raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: VgYkGzp_3jk: YouTube said: Unable to extract video data

@RJFeddeler

@techfre

Unable to extract video data is a problem with either youtube or youtube-dl, I get those somewhat frequently so I'm in the process of detecting those errors and downloading a different version of the video (youtube hosts multiple versions of each video with different encodings/resolutions/etc)

You can see the error in regard to the exception comes from youtube-dl so I can't do anything about it. I believe if you just use the flag -i AKA --ignore-errors then it should skip that video and continue downloading.

@jturner421

jturner421 commented Oct 10, 2020

@RJFeddeler I can confirm that your code works well. Been able to download several courses. The only change I've made so far is to add downloading progress to the console. The output is not pretty, but it works.

Add the lines that print progress while the status is 'downloading' to the my_hook method of the MyLogger class starting at line 115 of edx_dl.py

import os

def my_hook(d):
    # Progress hook called by youtube-dl; d describes the current download state.
    if d['status'] == 'error':
        print('Error downloading video from YouTube!')
    if d['status'] == 'downloading':
        # Print the file name, percent complete, and ETA while downloading.
        print(d['filename'], d['_percent_str'], d['_eta_str'])
    if d['status'] == 'finished':
        file_tuple = os.path.split(os.path.abspath(d['filename']))
        print("Done downloading {}".format(file_tuple[1]))

@RJFeddeler

I updated my repository with my latest version. It isn't perfect but it displays progress for the course/section/unit/video. I thought it was worth posting even though it isn't finished. It uses tqdm for progress bars. I also added an additional argument which I haven't tested:

  • -a (or --all): downloads all available courses sequentially. Do NOT specify any course urls with this arg, if you do, this arg is ignored.

@vobisie

vobisie commented Oct 11, 2020

@RJFeddeler still having the same issues.
Tried with some of my other edx courses, below is the output.

(base) C:\Users\iobis\Desktop\edx-dl-master>python edx-dl.py -u ***@gmail.com https://courses.edx.org/courses/course-v1:MITx+JPAL102x+3T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "edx-dl.py", line 8, in
edx_dl.main()
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1183, in main
all_selections = {selected_course:
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1184, in
get_available_sections(selected_course.url.replace('info', 'course'),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 285, in get_available_sections
sections = page_extractor.extract_sections_from_html(page, BASE_URL)
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 457, in extract_sections_from_html
sections = [Section(position=i,
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 459, in
url=_make_url(section_soup),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 430, in _make_url
return section_soup.a['href']
TypeError: 'NoneType' object is not subscriptable

(base) C:\Users\iobis\Desktop\edx-dl-master>python edx-dl.py -u ***@gmail.com https://courses.edx.org/courses/course-v1:MITx+14.740x+3T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "edx-dl.py", line 8, in
edx_dl.main()
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1183, in main
all_selections = {selected_course:
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1184, in
get_available_sections(selected_course.url.replace('info', 'course'),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 285, in get_available_sections
sections = page_extractor.extract_sections_from_html(page, BASE_URL)
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 457, in extract_sections_from_html
sections = [Section(position=i,
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 459, in
url=_make_url(section_soup),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 430, in _make_url
return section_soup.a['href']
TypeError: 'NoneType' object is not subscriptable

(base) C:\Users\iobis\Desktop\edx-dl-master>python edx-dl.py -u ****@gmail.com https://courses.edx.org/courses/course-v1:MITx+15.415.1x+1T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "edx-dl.py", line 8, in
edx_dl.main()
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1183, in main
all_selections = {selected_course:
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1184, in
get_available_sections(selected_course.url.replace('info', 'course'),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 285, in get_available_sections
sections = page_extractor.extract_sections_from_html(page, BASE_URL)
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 457, in extract_sections_from_html
sections = [Section(position=i,
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 459, in
url=_make_url(section_soup),
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\parsing.py", line 430, in _make_url
return section_soup.a['href']
TypeError: 'NoneType' object is not subscriptable

However, it could be an issue with MIT courses because I was able to download this course without much hassle, while I struggled previously.

(base) C:\Users\iobis\Desktop\edx-dl-master>python edx-dl.py -u ***@gmail.com https://courses.edx.org/courses/course-v1:edX+edx201+1T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading How to Learn Online [course-v1:edX+edx201+1T2020/co]
Section 1: Welcome
Getting Started
The edX Team
Section 2: Self-Care for Learning
Managing Stress
Memory and Learning
Take Five for Yourself
(1 Question)
Section 3: Space, Time and Technology
Creating Space for Learning
Time Management
Managing Your Technology
(1 Question)
Section 4: Learning Strategies
Self-Regulation and Learning
Durable Learning
(1 Question)
Section 5: Communication and Community
Learning Together
Working Together
(1 Question)
Section 6: What's Next?
Keep Learning

Processing units...

Removed 0 duplicated urls from 24 in total
Output directory: Downloaded

Please advise and assist if possible.

@weirdsourcer

I updated my repository with my latest version. It isn't perfect but it displays progress for the course/section/unit/video. I thought it was worth posting even though it isn't finished. It uses tqdm for progress bars. I also added an additional argument which I haven't tested:

  • -a (or --all): downloads all available courses sequentially. Do NOT specify any course urls with this arg, if you do, this arg is ignored.

I used edx-dl 2 months ago and it worked smoothly. I came back to it today and discovered this issue; thanks for resolving it. However, I'm a novice with GitHub: how do I incorporate your code into the edx-dl folder on my PC? I tried pip install --upgrade edx-dl, but the output says all requirements are already satisfied, and I still only see empty folders.

Kindly help.

@RJFeddeler

@weirdsourcer I'm actually not sure where pip pulls stuff from. I can try to figure it out later, but for now you just have to replace two files in your edx-dl folder with the ones from my source. The two modified files are edx_dl.py and parsing.py. I'm not sure exactly where pip installs your packages, but you can type:

pip show edx-dl

to find out.
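An alternative way to locate the files from Python itself, as a minimal sketch. It assumes the package is importable as `edx_dl`, which matches the paths in the tracebacks above; `package_dir` is a hypothetical helper name.

```python
import importlib.util
import os


def package_dir(module_name):
    """Return the directory an importable module is loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# Prints the folder that holds edx_dl.py and parsing.py, or None if not installed.
print(package_dir("edx_dl"))
```

Run it with the same Python interpreter that runs edx-dl, otherwise it may report a different installation.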

@weirdsourcer

weirdsourcer commented Oct 15, 2020

I tried it exactly according to your guide, but unfortunately it still only creates empty folders.

I used the command below; I'm almost certain it is correct, as it is what I used to download the Microsoft courses I took 2 months ago.

(base) C:\Users\****>edx-dl -u ********@gmail.com -p ******! -o "C:\Users\********\OneDrive\Desktop\Online Course\EDX\MITx" --cache --youtube-dl-options="-f bestvideo[height<=1080]+bestaudio/best[height<=1080]" "https://courses.edx.org/courses/course-v1:MITx+15.071x+2T2020/course/"

UPDATE: the command works now after I removed --cache from it, which makes me wonder if it will still work when I want to continue my course download later, as MITx releases course contents every week.

Update: it stops downloading after a while with the error

Removed 3 duplicated urls from 330 in total
Output directory: C:\Users******\OneDrive\Desktop\Online Course\EDX\MITx

Traceback (most recent call last):
  File "c:\users\******\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 797, in extract_info
    ie_result = ie.extract(url)
  File "c:\users\******anaconda3\lib\site-packages\youtube_dl\extractor\common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "c:\users\******\anaconda3\lib\site-packages\youtube_dl\extractor\youtube.py", line 1893, in _real_extract
    'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
youtube_dl.utils.ExtractorError: idRDTAUV8uY: YouTube said: Unable to extract video data

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\******\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\******\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\******\anaconda3\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1233, in main
    download(args, selections, filtered_units, headers)
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1002, in download
    download_unit(unit, args, target_dir, filename_prefix, headers)
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 937, in download_unit
    download_video(unit.videos[0], args, target_dir, filename_prefix, headers)
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 920, in download_video
    skip_or_download(youtube_downloads, FileType.Video, headers, args)
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 909, in skip_or_download
    f(url, filename, headers, args)
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 827, in download_url
    download_youtube_url(url, filename, headers, args)
  File "c:\users\******\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 880, in download_youtube_url
    ydl.download([url])
  File "c:\users\w******\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 2019, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "c:\users\******\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 820, in extract_info
    self.report_error(compat_str(e), e.format_traceback())
  File "c:\users\******\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 625, in report_error
    self.trouble(error_message, tb)
  File "c:\users\******\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 595, in trouble
    raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: idRDTAUV8uY: YouTube said: Unable to extract video data

@txeni

txeni commented Oct 16, 2020

@RJFeddeler You are the man, thanks so much!

Just in case someone else runs into the same problem I did, it seems for a course I was downloading the separated files and then merging was giving me problems. If anyone has a similar problem, with errors from youtubedl or ffmpeg, try the -f "mp4" argument. It solved the problem for me. Before that I was getting the following error:

Traceback (most recent call last):                                                                                                 | 0/6 [?]
  File "/home/carlos/anaconda2/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 2065, in post_process                | 3/4 [00:03]
    files_to_delete, info = pp.run(info)                                                                                                    
  File "/home/carlos/anaconda2/lib/python3.7/site-packages/youtube_dl/postprocessor/ffmpeg.py", line 523, in run
    self.run_ffmpeg_multiple_files(info['__files_to_merge'], temp_filename, args)
  File "/home/carlos/anaconda2/lib/python3.7/site-packages/youtube_dl/postprocessor/ffmpeg.py", line 235, in run_ffmpeg_multiple_files
    raise FFmpegPostProcessorError(msg)
youtube_dl.postprocessor.ffmpeg.FFmpegPostProcessorError: Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument

@RJFeddeler

@weirdsourcer --cache isn't very well implemented (by the original authors, I didn't touch it) and it really only saves you like a minute in time. You can resume downloading a course each week just by downloading the course without the --cache argument. It is supposed to skip files already downloaded, which it does for everything but youtube downloads which I'm currently working on fixing. The original code relies on youtube-dl to skip the youtube download which works okay but wastes some time.

As far as the error you get now where it stops downloading, that is something that's always happened for me. That's why I always use the --ignore-errors (or just -i) argument. At least then when it encounters that error it will keep going. As for a better solution, I am having it download alternate versions of the videos if one fails like that. I haven't tested it but it should be working; I'll publish the new code soon. You need to use the --ignore-errors argument though, or any error youtube-dl encounters is just gonna end the program.

@txeni Yea I've had it default to downloading the best quality audio and video separately and using ffmpeg to combine them but I think it makes more sense to use the standard mp4 format argument as the default. I was just having trouble with the original code with the same error @weirdsourcer was having so I was looking for a format source that was more reliable but none are. I'll decide on a format order to go through when errors are encountered so that it doesn't just skip the file, but --ignore-errors must be specified.
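The idea of going through a format order when errors are encountered can be sketched as below. This is an illustration under assumptions, not the fork's implementation: `download_with_fallback` and the injected `download_fn` are hypothetical names, and the fake downloader only simulates a failure.

```python
def download_with_fallback(url, formats, download_fn):
    """Try each format in order; return the first that succeeds, else None."""
    for fmt in formats:
        try:
            download_fn(url, fmt)
        except Exception as error:  # broad on purpose: any failure moves on
            print("Format {!r} failed for {}: {}".format(fmt, url, error))
            continue
        return fmt
    return None


def fake_download(url, fmt):
    """Stand-in downloader that only succeeds for the plain mp4 format."""
    if fmt != "mp4":
        raise RuntimeError("Unable to extract video data")


used = download_with_fallback(
    "https://example.invalid/video", ["bestvideo+bestaudio", "mp4"], fake_download
)
print(used)
```

In real use `download_fn` would wrap the youtube-dl call for a single URL and format, so that one failing format does not skip the file entirely.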

@nalam002

nalam002 commented Oct 16, 2020

@RJFeddeler Thanks for your new code, however I always keep getting a traceback that seems different from the ones reported so far, even if I'm typing nothing but edx-dl without any arguments at all.

Traceback (most recent call last):
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\teacher.DESKTOP-6R84B69\miniconda3\Scripts\edx-dl.exe\__main__.py", line 4, in <module>
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\edx_dl\edx_dl.py", line 24, in <module>
    import youtube_dl
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\youtube_dl\__init__.py", line 15, in <module>
    from .options import (
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\youtube_dl\options.py", line 8, in <module>
    from .downloader.external import list_external_downloaders
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\youtube_dl\downloader\__init__.py", line 5, in <module>
    from .hls import HlsFD
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\youtube_dl\downloader\hls.py", line 6, in <module>
    from Crypto.Cipher import AES
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\Crypto\Cipher\__init__.py", line 27, in <module>
    from Crypto.Cipher._mode_ecb import _create_ecb_cipher
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\Crypto\Cipher\_mode_ecb.py", line 35, in <module>
    raw_ecb_lib = load_pycryptodome_raw_lib("Crypto.Cipher._raw_ecb", """
  File "c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\Crypto\Util\_raw_api.py", line 308, in load_pycryptodome_raw_lib
    raise OSError("Cannot load native module '%s': %s" % (name, ", ".join(attempts)))
OSError: Cannot load native module 'Crypto.Cipher._raw_ecb': Trying '_raw_ecb.cp38-win_amd64.pyd': cannot load library 'c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\Crypto\Util\..\Cipher\_raw_ecb.cp38-win_amd64.pyd': error 0x7e.  Additionally, ctypes.util.find_library() did not manage to locate a library called 'c:\\users\\teacher.desktop-6r84b69\\miniconda3\\lib\\site-packages\\Crypto\\Util\\..\\Cipher\\_raw_ecb.cp38-win_amd64.pyd', Trying '_raw_ecb.pyd': cannot load library 'c:\users\teacher.desktop-6r84b69\miniconda3\lib\site-packages\Crypto\Util\..\Cipher\_raw_ecb.pyd': error 0x7e.  Additionally, ctypes.util.find_library() did not manage to locate a library called 'c:\\users\\teacher.desktop-6r84b69\\miniconda3\\lib\\site-packages\\Crypto\\Util\\..\\Cipher\\_raw_ecb.pyd'

I upgraded youtube-dl and Crypto packages just to be safe, but nothing changed. :( BTW I'm running python 3.8 if that means anything.

EDIT: Nvm, I reinstalled python (latest miniconda) and redid everything, and now the error is gone.

@weirdsourcer

@RJFeddeler thanks for your effort. Is it possible with your latest commit to download a list of courses with just one request? I noticed the progress bar counts courses as 0/1. What is the command for downloading, say, a whole programme with 3 courses in it, i.e. 3 courses in one run without sending a request per course? This should result in a course count like 0/3, 1/3, 2/3; you get the idea.

@PencilWarrior1

@RJFeddeler I installed your version, but when I run edx-dl I'm getting this error... any ideas? :-)

"C:\Users*****\AppData\Local\Programs\Python\Python38-32\lib\site-packages\edx_dl-0.1.13-py3.8.egg\edx_dl\edx_dl.py", line 27, in
ModuleNotFoundError: No module named 'tqdm'
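This error means tqdm, which the updated fork uses for its progress bars, is not installed in the Python environment that runs edx-dl. A small sketch for checking which modules are missing before running the tool; the list of names passed in is an assumption based on the imports visible in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the tracebacks in this thread.
print(missing_modules(["tqdm", "youtube_dl"]))
```

Anything it reports can usually be installed with pip into that same environment, for example `pip install tqdm`.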

@okyere

okyere commented Oct 22, 2020

I updated my repository with my latest version. It isn't perfect but it displays progress for the course/section/unit/video. I thought it was worth posting even though it isn't finished. It uses tqdm for progress bars. I also added an additional argument which I haven't tested:

  • -a (or --all): downloads all available courses sequentially. Do NOT specify any course urls with this arg, if you do, this arg is ignored.

Great work. Thanks for sharing.

@diamneth

@RJFeddeler Hello, I appreciate all your work and effort on this. Have you found a solution for the YouTube "Unable to extract video data" error? I am getting the same error as some of the people above.

Traceback (most recent call last):
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 797, in extract_info
ie_result = ie.extract(url)
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\extractor\common.py", line 532, in extract
ie_result = self._real_extract(url)
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\extractor\youtube.py", line 1909, in _real_extract
raise ExtractorError(
youtube_dl.utils.ExtractorError: MjTmGAJCviA: YouTube said: Unable to extract video data

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\r\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\r\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\r\anaconda3\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1236, in main
download(args, selections, filtered_units, headers)
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1004, in download
download_unit(unit, args, target_dir, filename_prefix, headers)
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 939, in download_unit
download_video(unit.videos[0], args, target_dir, filename_prefix, headers)
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 922, in download_video
skip_or_download(youtube_downloads, FileType.Video, headers, args)
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 911, in skip_or_download
f(url, filename, headers, args)
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 829, in download_url
download_youtube_url(url, filename, headers, args)
File "c:\users\r\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 882, in download_youtube_url
ydl.download([url])
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 2018, in download
res = self.extract_info(
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 820, in extract_info
self.report_error(compat_str(e), e.format_traceback())
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 625, in report_error
self.trouble(error_message, tb)
File "c:\users\r\anaconda3\lib\site-packages\youtube_dl\YoutubeDL.py", line 595, in trouble
raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: MjTmGAJCviA: YouTube said: Unable to extract video data

@vobisie

vobisie commented Oct 23, 2020

Does anyone have any potential solutions to resolve this issue?

(base) C:\Users*\Desktop\edx-dl-master>python edx-dl.py -u @gmail.com https://courses.edx.org/courses/course-v1:MITx+14.740x+3T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "edx-dl.py", line 8, in <module>
edx_dl.main()
File "C:\Users*\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1183, in main
all_selections = {selected_course:
File "C:\Users*\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1184, in
get_available_sections(selected_course.url.replace('info', 'course'),
File "C:\Users*\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 285, in get_available_sections
sections = page_extractor.extract_sections_from_html(page, BASE_URL)
File "C:\Users*\Desktop\edx-dl-master\edx_dl\parsing.py", line 457, in extract_sections_from_html
sections = [Section(position=i,
File "C:\Users*\Desktop\edx-dl-master\edx_dl\parsing.py", line 459, in
url=_make_url(section_soup),
File "C:\Users*\Desktop\edx-dl-master\edx_dl\parsing.py", line 430, in _make_url
return section_soup.a['href']
TypeError: 'NoneType' object is not subscriptable

@RJFeddeler

@weirdsourcer you can list multiple course urls or you can list no course urls and use the -a or --all flag to download all available courses.

@diamneth use the --ignore-errors flag. My latest code will try the download again in a different format, and if that fails it will at least continue downloading the rest of the videos. I'm guessing that error is a problem with youtube-dl; I've always gotten it randomly.
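
The retry-in-another-format behaviour described here can be sketched roughly as follows. This is not the code from the fork: `download_with_fallback`, the `fetch` callable and the format names are all hypothetical, and the real tool hands the download to youtube-dl rather than to a plain function.

```python
def download_with_fallback(url, formats, fetch, ignore_errors=True):
    """Try each format in turn and return the first one that downloads.

    `fetch` is a hypothetical callable (url, fmt) that raises on failure.
    With ignore_errors=True a total failure returns None so the caller
    can carry on with the next video instead of aborting the whole run.
    """
    last_error = None
    for fmt in formats:
        try:
            fetch(url, fmt)
            return fmt
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
    if ignore_errors:
        return None
    raise last_error or RuntimeError("no formats were given")


def flaky_fetch(url, fmt):
    # Stand-in downloader: pretend that only the 'webm' format is available.
    if fmt != "webm":
        raise IOError("Unable to extract video data")


print(download_with_fallback("https://example.invalid/video", ["mp4", "webm"], flaky_fetch))
```

Returning None instead of raising is what lets the outer loop move on to the remaining videos.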

@sorin71

sorin71 commented Oct 28, 2020

@vobisie I had the same problem. I fixed it in parsing.py by changing the line 431 from:
except AttributeError:
to
except:
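
A narrower variant of the same fix, as a sketch: the traceback above fails because `section_soup.a` is None, which raises TypeError rather than AttributeError. Catching the specific exceptions avoids a bare `except:`. The function below is a stand-in, not the actual `_make_url` from parsing.py, and returning None on failure is an assumption about what the caller expects.

```python
from types import SimpleNamespace


def make_url(section_soup):
    """Return the section link, or None when the <a> tag or its href is missing."""
    try:
        return section_soup.a['href']
    except (AttributeError, TypeError, KeyError):
        # AttributeError: section_soup has no .a attribute at all
        # TypeError: section_soup.a is None, as in the reported traceback
        # KeyError: the tag exists but carries no href
        return None


# SimpleNamespace objects stand in for parsed HTML sections in this sketch.
with_link = SimpleNamespace(a={'href': '/courses/demo/'})
without_link = SimpleNamespace(a=None)

print(make_url(with_link))
print(make_url(without_link))
```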

@vobisie

vobisie commented Oct 30, 2020

Thank you @sorin71 . Do you have any idea how to fix the issue of there being no sound with the videos downloaded? Thank you

@vobisie

vobisie commented Oct 31, 2020

Also, does the take down of youtube-dl impact edx-dl?
Can this work with youtube-dlc? If so how?

Thank you

@sorin71

sorin71 commented Oct 31, 2020

The takedown of youtube-dl will have an impact on edx-dl, but probably only in the longer term, when YouTube makes format changes. youtube-dlc might end up being taken down as well, since it seems to be a fork of youtube-dl.

The problem with no sound for the downloaded videos is a false one. The video (mp4) and the audio (m4a) are in separate files, and you have to combine them in a single file using a tool like ffmpeg.
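
A minimal sketch of that merge step driven from Python, assuming ffmpeg is installed and on PATH. The file names are placeholders, and `-c copy` tells ffmpeg to copy both streams into the output container without re-encoding.

```python
import shutil
import subprocess


def build_merge_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes one video and one audio file."""
    return ["ffmpeg", "-i", video_path, "-i", audio_path, "-c", "copy", output_path]


def merge(video_path, audio_path, output_path):
    """Run the merge; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)


# Placeholder file names, for illustration only.
print(" ".join(build_merge_command("lecture.mp4", "lecture.m4a", "lecture_merged.mp4")))
```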

@jmfontana

Same Error 403 problem:

File "/Users/blahuser/.pyenv/versions/3.8.3/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@MissGorgeousTech

it is working great
what I did was pip uninstall edx-dl (the original)
upgraded youtube-dl
I also have the python version 3.8
then download your zip code...unzipped
from cmd changed the directory to the unzipped folder then run python edx-dl.py -u [email protected] --list-courses
[if the error ModuleNotFoundError: No module named 'tqdm' ---you do : pip install tqdm and then try again]
then choose the URL from the course

note the something I noted is sometimes some videos are separated from track audio but nevertheless works great
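
To see up front which packages are missing rather than hitting ModuleNotFoundError halfway through, a small check like the one below may help. The module names `tqdm` and `youtube_dl` are taken from the tracebacks in this thread; the helper name is made up for the sketch.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Anything listed in the output still needs a `pip install`.
print(missing_modules(["tqdm", "youtube_dl"]))
```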

@vobisie

vobisie commented Nov 5, 2020

@sorin71 & @RJFeddeler do you have any idea why I get this output for python edx-dl -u *** -a -i

Traceback (most recent call last):
File "edx-dl.py", line 8, in <module>
edx_dl.main()
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 1213, in main
all_units = extractor(all_urls, headers, file_formats)
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 590, in extract_all_units_in_parallel
units = pool.map(mapfunc, urls)
File "C:\Users\iobis\anaconda3\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\iobis\anaconda3\lib\multiprocessing\pool.py", line 771, in get
raise self._value
File "C:\Users\iobis\anaconda3\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\iobis\anaconda3\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\edx_dl.py", line 559, in extract_units
unit_page = get_page_contents(unit_url, headers)
File "C:\Users\iobis\Desktop\edx-dl-master\edx_dl\utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "C:\Users\iobis\anaconda3\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\iobis\anaconda3\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\iobis\anaconda3\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\iobis\anaconda3\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\iobis\anaconda3\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\iobis\anaconda3\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

@jmfontana

it is working great
what I did was pip uninstall edx-dl (the original)
upgraded youtube-dl
I also have the python version 3.8
then download your zip code...unzipped
from cmd changed the directory to the unzipped folder then run python edx-dl.py -u [email protected] --list-courses
[if the error ModuleNotFoundError: No module named 'tqdm' ---you do : pip install tqdm and then try again]
then choose the URL from the course

note the something I noted is sometimes some videos are separated from track audio but nevertheless works great

This is it! Yes, this worked for me. I had tried everything else but until I did 'pip uninstall edx-dl' nothing worked. Thanks!

JM

@MagTun

MagTun commented Nov 8, 2020

@jmfontana and @MissGorgeousTech, what do you mean by "zip code" in:

then download your zip code...unzipped

Is it possible to have the link? Thanks !

@MissGorgeousTech

@jmfontana and @MissGorgeousTech, what do you mean by "zip code" in:

then download your zip code...unzipped

Is it possible to have the link? Thanks!

Hi. I mean downloading the code as a zip file: look for the green button labelled Code, click it, and choose Download ZIP. Then go from there. If you still have difficulties, feel free to tell me and I will try with screenshots.

@MagTun

MagTun commented Nov 8, 2020

Thanks for your help @MissGorgeousTech ! I found the green button on the home page but I am still getting empty folders.

4 days ago, I was able to get some videos by following @RJFeddeler's code, but at some point during the download I got the error RuntimeError: cannot join current thread. When I tried again, the script stayed on "Processing" for hours. The first time I got 13 videos; the second time I tried again from scratch and got 21 videos (I guess there are over 100 videos in my course: I only got the review videos, and the download didn't even reach week 1 of a 5-week course).

I am on python 3.6 (can't update yet to 3.8)

@weirdsourcer

weirdsourcer commented Nov 9, 2020

Could the youtube-dl takedown be the culprit for the following error? If so, is there a way to resolve it? I'm on Python 3.8.

Traceback (most recent call last):
  File "c:\users\user\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\user\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\USER\anaconda3\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\user\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1213, in main
    all_units = extractor(all_urls, headers, file_formats)
  File "c:\users\user\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 590, in extract_all_units_in_parallel
    units = pool.map(mapfunc, urls)
  File "c:\users\user\anaconda3\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\user\anaconda3\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
  File "c:\users\user\anaconda3\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\user\anaconda3\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\user\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 553, in extract_units
    page = get_page_contents(url, headers)
  File "c:\users\user\anaconda3\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\users\user\anaconda3\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\user\anaconda3\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "c:\users\user\anaconda3\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "c:\users\user\anaconda3\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "c:\users\user\anaconda3\lib\urllib\request.py", line 1393, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "c:\users\user\anaconda3\lib\urllib\request.py", line 1354, in do_open
    r = h.getresponse()
  File "c:\users\user\anaconda3\lib\http\client.py", line 1332, in getresponse
    response.begin()
  File "c:\users\user\anaconda3\lib\http\client.py", line 303, in begin
    version, status, reason = self._read_status()
  File "c:\users\user\anaconda3\lib\http\client.py", line 272, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

@MagTun

MagTun commented Nov 22, 2020

After having seen that youtube-dl was updated, I tried again yesterday, and I was able to get all the videos from my courses.
First I updated youtube-dl with: python -m pip install --upgrade youtube-dl
Then I made sure that edx-dl was also up to date: python -m pip install --upgrade edx-dl
Then I replaced the files edx-dl.py and parsing.py according to @RJFeddeler's comment.

Thanks for your help!

@Oscarhg42

@RJFeddeler you rock!! Thanks for sharing your repository!

@MATRIX30

🚨Please review the Troubleshooting section
before reporting any issue. Don't forget also to check the current issues to
avoid duplicates.

Subject of the issue

After modifying my edx.py and parsing.py as prescribed by @RJFeddeler, I still get this error. Can someone figure out what's wrong?

Your environment

  • Operating System (name/version): windows 10
  • Python version: Python 3.9.0
  • youtube-dl version: 2020.11.26
  • edx-dl version: 0.1.13

Steps to reproduce

edx-dl -u email -p password --ignore-errors --cache https://courses.edx.org/courses/course-v1:USMx+ENCE607.1x+3T2019/course/

Expected behaviour

download should have started normally

Actual behaviour

I get this Error message

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Applied Scrum for Agile Project Management [course-v1:USMx+ENCE607.1x+3T2019/co]
Section 1: Welcome!
Welcome to Applied Scrum
Getting Started with Goals!
Section 2: Week 1: Why Agile?
1.0 Introduction to Week 1
1.1 Agile Basics
1.2 Proof Agile Works
1.3 Evolution of Agile
1.4 Netflix Case Study
1.5 18F Case Study
1.6 Week 1 Quiz
1.7 Week 1 Takeaways & Feedback
Verify Your Knowledge and Skills!
Section 3: Week 2: Who Uses Agile?
2.0 Introduction to Week 2
2.1 Simple PM Methods
2.2 Approaching the Triple Cost Constraint
2.3 Comparing Methods Across Industries
2.4 Comparing Methods of Customer Management
2.5 Comparing Methods of Engineering Management
2.6 Week 2 Quiz
2.7 Week 2 Takeaways & Feedback
Verify Your Knowledge and Skills!
Section 4: Week 3: How to Scrum And Be Agile?
3.0 Introduction to How to Scrum and Be Agile?
3.1 Scrum Team Formation
3.2 Three-Part User Story
3.3 Sprint Planning
3.4 Sprint Development
3.5 Sprint Retro & Review
3.6 Week 3 Quiz
3.7 Week 3 Takeaways & Feedback
Verify Your Knowledge and Skills!
Section 5: Week 4: What Scrum Framework Fits Best?
4.0 Introduction to What Scrum Framework Fits Best?
4.1 Scrum in the World of Agile
4.2 Exploring the Scaled Agile Framework (SAFe)
4.3 Exploring Disciplined Agile Delivery (DAD)
4.4 Exploring Large Scale Scrum (LeSS)
4.5 Pitfalls and Benefits of Agile at Scale
4.6 Week 4 Quiz
4.7 Week 4 Takeaways & Feedback
Verify Your Knowledge and Skills!
Section 6: Course Final for Verified Students
Course Final for Verified Students
Section 7: Congratulations! Now Keep Going!
Thank You! Now Will You Continue?
Feedback Quiz
Processing units...

Removed 0 duplicated urls from 76 in total

edx_dl version 0.1.13
loading 3212 urls from cache [edx-dl.cache]
Traceback (most recent call last):
File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Cyanide Systems\AppData\Local\Programs\Python\Python39\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\edx_dl.py", line 1233, in main
download(args, selections, filtered_units, headers)
File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\edx_dl.py", line 989, in download
coursename = directory_name(selected_course.name)
File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\utils.py", line 49, in directory_name
result = clean_filename(initial_name)
File "c:\users\cyanide systems\appdata\local\programs\python\python39\lib\site-packages\edx_dl\utils.py", line 123, in clean_filename
s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'

@diamneth

diamneth commented Dec 5, 2020

Anyone ever got this error and knows anything about it?

Course : 0%| | 0/1Got SSL/Connection error: HTTPConnectionPool(host='www.math.umt.edu', port=80): Max retries exceeded with url: /bardsley/courses/495/Projects/HIV/PerelsonEtAl1996.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001DD37B2C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
SSL/Connection error ignored: HTTPConnectionPool(host='www.math.umt.edu', port=80): Max retries exceeded with url: /bardsley/courses/495/Projects/HIV/PerelsonEtAl1996.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001DD37B2C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

@jackforfaltu

jackforfaltu commented Dec 11, 2020

@MATRIX30

Look here... #778
Basically downgrade Python 3.9.x to 3.8 or lower.
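
For anyone who would rather patch than downgrade: the `HTMLParser.unescape` method was removed in Python 3.9, and the standard-library function `html.unescape` (available since Python 3.4) does the same job. The sketch below only shows that replacement call; `unescape_name` is a made-up helper, not the actual `clean_filename` from utils.py.

```python
import html


def unescape_name(s):
    """Decode HTML entities in a name; works on Python 3.4 and later, including 3.9."""
    return html.unescape(s)


print(unescape_name("1.7 Week 1 Takeaways &amp; Feedback"))
# prints: 1.7 Week 1 Takeaways & Feedback
```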

@abeckman

First a thank you to @RJFeddeler for the great work!

When I started downloading courses using that version I was getting the split audio and video. I'm on CentOS 8 and didn't have ffmpeg installed. Once I got that installed (something of a battle in itself), rerunning edx-dl not only merged the files, but also got rid of the separated ones already there.

@stacyH8

stacyH8 commented Dec 28, 2020

Meanwhile, since I couldn't use this tool, I created my own chrome extension for that. You can find it here.
https://github.com/carlosvega/edx-video-extension

Thank you for trying, but it doesn't work.

@JunaidShafi

Thank you for your work @RJFeddeler, but while downloading a course mine gets stuck at 24%. Up to that point it always works flawlessly, and after reaching it, it just hangs there.

@owaiss007

Subject of the issue
It is giving me an error that no downloadable video found. I think edx may have changed their structure. Could anyone confirm whether it is working with them?

Your environment
Operating System (name/version): Windows 10
Python version: 3.8
youtube-dl version:
edx-dl version: 0.1.13
Steps to reproduce
Tell us how to reproduce this issue. Please provide us the course URL, and the
specific subsection or unit if possible.
https://courses.edx.org/courses/course-v1:PurdueX+CE597.1+1T2021/course/

Expected behaviour
It would say No downloadable video found.

Actual behaviour
image

@RJFeddeler Can you please help me?

@fedyd

fedyd commented Mar 6, 2021

I get the "No downloadable video found" message too.
Previously I got "HTTP Error 403: Forbidden" (issue #662). There, @mobiiin suggested editing line 425 of edx_dl.py to read:
'User-Agent': 'Chrome/88.0.4324.190'
(substituting your own Chrome version).
After that change I started getting "No downloadable video found" instead.
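
For reference, this is roughly how such a header ends up on a request. The tracebacks earlier in the thread show edx-dl calling `urlopen(Request(url, None, headers))`; the sketch below only builds the request and inspects it, without contacting any server. The helper name is made up for illustration.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Create a urllib Request that carries a custom User-Agent header."""
    headers = {"User-Agent": user_agent}
    return Request(url, None, headers)


req = build_request("https://courses.edx.org", "Chrome/88.0.4324.190")
# urllib stores header names capitalised, hence the "User-agent" spelling here.
print(req.get_header("User-agent"))
```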

@fedyd

fedyd commented Mar 6, 2021

I have also noted that if I change in edx_dl.py the line n.63 from:

'url': 'https://courses.edx.org'
to
'url': 'https://learning.edx.org'

I get again the "HTTP Error 403: Forbidden", while re-changing to the original value I get the "No downloadable video found" message.
