- cmd: python cambridge.urls.py
- goal: used to capture all urls in https://dictionary.cambridge.org/browse/english-chinese-traditional/
- implementation:
- got guideurls by appending https://dictionary.cambridge.org/browse/english-chinese-traditional/ from a to z
- got extendurls by capturing urls from each guideurls
- cmd: python cambridge.dictionary.py
- goal: crawled all pages in cambridge dictionary
- implementation:
- accese extendurls and preprocessed html