-rw-r--r--  CHANGELOG.md                            |  39
-rw-r--r--  PKG-INFO                                | 698
-rw-r--r--  README.rst                              |   7
-rw-r--r--  data/completion/_gallery-dl             |  11
-rw-r--r--  data/completion/gallery-dl              |   4
-rw-r--r--  data/man/gallery-dl.1                   |  21
-rw-r--r--  data/man/gallery-dl.conf.5              |  75
-rw-r--r--  docs/gallery-dl.conf                    |   6
-rw-r--r--  gallery_dl.egg-info/PKG-INFO            | 698
-rw-r--r--  gallery_dl.egg-info/SOURCES.txt         |   7
-rw-r--r--  gallery_dl/__init__.py                  |  36
-rw-r--r--  gallery_dl/downloader/ytdl.py           |   2
-rw-r--r--  gallery_dl/extractor/2chan.py           |   2
-rw-r--r--  gallery_dl/extractor/500px.py           | 169
-rw-r--r--  gallery_dl/extractor/__init__.py        |   3
-rw-r--r--  gallery_dl/extractor/artstation.py      |   2
-rw-r--r--  gallery_dl/extractor/blogger.py         |   2
-rw-r--r--  gallery_dl/extractor/common.py          |  11
-rw-r--r--  gallery_dl/extractor/cyberdrop.py       |  23
-rw-r--r--  gallery_dl/extractor/deviantart.py      |  60
-rw-r--r--  gallery_dl/extractor/exhentai.py        |  16
-rw-r--r--  gallery_dl/extractor/fanbox.py          |  23
-rw-r--r--  gallery_dl/extractor/fantia.py          |   2
-rw-r--r--  gallery_dl/extractor/flickr.py          |   2
-rw-r--r--  gallery_dl/extractor/furaffinity.py     |   7
-rw-r--r--  gallery_dl/extractor/generic.py         | 208
-rw-r--r--  gallery_dl/extractor/hitomi.py          |  37
-rw-r--r--  gallery_dl/extractor/imgbb.py           |   2
-rw-r--r--  gallery_dl/extractor/inkbunny.py        |  22
-rw-r--r--  gallery_dl/extractor/instagram.py       |  31
-rw-r--r--  gallery_dl/extractor/keenspot.py        |   2
-rw-r--r--  gallery_dl/extractor/kemonoparty.py     |  43
-rw-r--r--  gallery_dl/extractor/lolisafe.py        |  79
-rw-r--r--  gallery_dl/extractor/myportfolio.py     |   4
-rw-r--r--  gallery_dl/extractor/newgrounds.py      |   4
-rw-r--r--  gallery_dl/extractor/patreon.py         |   2
-rw-r--r--  gallery_dl/extractor/philomena.py       |   2
-rw-r--r--  gallery_dl/extractor/photobucket.py     |  10
-rw-r--r--  gallery_dl/extractor/pixiv.py           |  15
-rw-r--r--  gallery_dl/extractor/pixnet.py          |   2
-rw-r--r--  gallery_dl/extractor/pornhub.py         |   2
-rw-r--r--  gallery_dl/extractor/rule34us.py        | 130
-rw-r--r--  gallery_dl/extractor/sexcom.py          |   9
-rw-r--r--  gallery_dl/extractor/slickpic.py        |   2
-rw-r--r--  gallery_dl/extractor/smugmug.py         |   2
-rw-r--r--  gallery_dl/extractor/tumblr.py          |   2
-rw-r--r--  gallery_dl/extractor/tumblrgallery.py   | 115
-rw-r--r--  gallery_dl/extractor/twitter.py         |   2
-rw-r--r--  gallery_dl/extractor/wordpress.py       |  41
-rw-r--r--  gallery_dl/extractor/xhamster.py        |   2
-rw-r--r--  gallery_dl/extractor/ytdl.py            |  10
-rw-r--r--  gallery_dl/option.py                    |  52
-rw-r--r--  gallery_dl/output.py                    |  30
-rw-r--r--  gallery_dl/path.py                      |   7
-rw-r--r--  gallery_dl/util.py                      |  20
-rw-r--r--  gallery_dl/version.py                   |   2
-rw-r--r--  gallery_dl/ytdl.py                      |  41
-rw-r--r--  test/test_results.py                    |  31
-rw-r--r--  test/test_util.py                       |  25
-rw-r--r--  test/test_ytdl.py                       | 545
60 files changed, 2478 insertions(+), 981 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 16e843f..1dc4a21 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,44 @@
# Changelog
+## 1.20.0 - 2021-12-29
+### Additions
+- [500px] add `favorite` extractor ([#1927](https://github.com/mikf/gallery-dl/issues/1927))
+- [exhentai] add `source` option
+- [fanbox] support pixiv redirects ([#2122](https://github.com/mikf/gallery-dl/issues/2122))
+- [inkbunny] add `search` extractor ([#2094](https://github.com/mikf/gallery-dl/issues/2094))
+- [kemonoparty] support coomer.party ([#2100](https://github.com/mikf/gallery-dl/issues/2100))
+- [lolisafe] add generic album extractor for lolisafe/chibisafe instances ([#2038](https://github.com/mikf/gallery-dl/issues/2038), [#2105](https://github.com/mikf/gallery-dl/issues/2105))
+- [rule34us] add `tag` and `post` extractors ([#1527](https://github.com/mikf/gallery-dl/issues/1527))
+- add a generic extractor ([#735](https://github.com/mikf/gallery-dl/issues/735), [#683](https://github.com/mikf/gallery-dl/issues/683))
+- add `-d/--directory` and `-f/--filename` command-line options
+- add `--sleep-request` and `--sleep-extractor` command-line options
+- allow specifying `sleep-*` options as string
+### Changes
+- [cyberdrop] include file ID in default filenames
+- [hitomi] disable `metadata` by default
+- [kemonoparty] use `service` as subcategory ([#2147](https://github.com/mikf/gallery-dl/issues/2147))
+- [kemonoparty] change default `files` order to `attachments,file,inline` ([#1991](https://github.com/mikf/gallery-dl/issues/1991))
+- [output] write download progress indicator to stderr
+- [ytdl] prefer yt-dlp over youtube-dl ([#1850](https://github.com/mikf/gallery-dl/issues/1850), [#2028](https://github.com/mikf/gallery-dl/issues/2028))
+- rename `--write-infojson` to `--write-info-json`
+### Fixes
+- [500px] create directories per photo
+- [artstation] create directories per asset ([#2136](https://github.com/mikf/gallery-dl/issues/2136))
+- [deviantart] use `/browse/newest` for most-recent searches ([#2096](https://github.com/mikf/gallery-dl/issues/2096))
+- [hitomi] fix image URLs
+- [instagram] fix error when PostPage data is not in GraphQL format ([#2037](https://github.com/mikf/gallery-dl/issues/2037))
+- [instagram] match post URLs with usernames ([#2085](https://github.com/mikf/gallery-dl/issues/2085))
+- [instagram] allow downloading specific stories ([#2088](https://github.com/mikf/gallery-dl/issues/2088))
+- [furaffinity] warn when no session cookies were found
+- [pixiv] respect date ranges in search URLs ([#2133](https://github.com/mikf/gallery-dl/issues/2133))
+- [sexcom] fix and improve embed extraction ([#2145](https://github.com/mikf/gallery-dl/issues/2145))
+- [tumblrgallery] fix extraction ([#2112](https://github.com/mikf/gallery-dl/issues/2112))
+- [tumblrgallery] improve `id` extraction ([#2115](https://github.com/mikf/gallery-dl/issues/2115))
+- [tumblrgallery] improve search pagination ([#2132](https://github.com/mikf/gallery-dl/issues/2132))
+- [twitter] include `4096x4096` as a default image fallback ([#1881](https://github.com/mikf/gallery-dl/issues/1881), [#2107](https://github.com/mikf/gallery-dl/issues/2107))
+- [ytdl] update argument parsing to latest yt-dlp changes ([#2124](https://github.com/mikf/gallery-dl/issues/2124))
+- handle UNC paths ([#2113](https://github.com/mikf/gallery-dl/issues/2113))
+
## 1.19.3 - 2021-11-27
### Additions
- [dynastyscans] add `manga` extractor ([#2035](https://github.com/mikf/gallery-dl/issues/2035))
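
The changelog above introduces several new sleep-related command-line options. A minimal, hedged usage sketch (the tag-search URL is the same one used in the README examples further down; the values are illustrative, and string ranges are chosen uniformly at random per the Duration documentation in the man-page hunks below):

.. code:: bash

    # wait 1.5s between HTTP requests during extraction, and a random
    # 2.0-3.5s before extraction of each input URL starts
    $ gallery-dl --sleep-request 1.5 --sleep-extractor "2.0-3.5" \
                 "https://danbooru.donmai.us/posts?tags=bonocho"
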
diff --git a/PKG-INFO b/PKG-INFO
index b758e4c..08b652d 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: gallery_dl
-Version: 1.19.3
+Version: 1.20.0
Summary: Command-line program to download image galleries and collections from several image hosting sites
Home-page: https://github.com/mikf/gallery-dl
Author: Mike Fährmann
@@ -9,352 +9,6 @@ Maintainer: Mike Fährmann
Maintainer-email: mike_faehrmann@web.de
License: GPLv2
Download-URL: https://github.com/mikf/gallery-dl/releases/latest
-Description: ==========
- gallery-dl
- ==========
-
- *gallery-dl* is a command-line program to download image galleries and
- collections from several image hosting sites (see `Supported Sites`_).
- It is a cross-platform tool with many configuration options
- and powerful `filenaming capabilities <Formatting_>`_.
-
-
- |pypi| |build| |gitter|
-
- .. contents::
-
-
- Dependencies
- ============
-
- - Python_ 3.4+
- - Requests_
-
- Optional
- --------
-
- - FFmpeg_: Pixiv Ugoira to WebM conversion
- - youtube-dl_: Video downloads
-
-
- Installation
- ============
-
-
- Pip
- ---
-
- The stable releases of *gallery-dl* are distributed on PyPI_ and can be
- easily installed or upgraded using pip_:
-
- .. code:: bash
-
- $ python3 -m pip install -U gallery-dl
-
- Installing the latest dev version directly from GitHub can be done with
- pip_ as well:
-
- .. code:: bash
-
- $ python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
-
- Note: Windows users should use :code:`py -3` instead of :code:`python3`.
-
- It is advised to use the latest version of pip_,
- including the essential packages :code:`setuptools` and :code:`wheel`.
- To ensure these packages are up-to-date, run
-
- .. code:: bash
-
- $ python3 -m pip install --upgrade pip setuptools wheel
-
-
- Standalone Executable
- ---------------------
-
- Prebuilt executable files with a Python interpreter and
- required Python packages included are available for
-
- - `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.19.3/gallery-dl.exe>`__
- - `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.19.3/gallery-dl.bin>`__
-
- | Executables build from the latest commit can be found at
- | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml
-
-
- Snap
- ----
-
- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store:
-
- .. code:: bash
-
- $ snap install gallery-dl
-
-
- Chocolatey
- ----------
-
- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository:
-
- .. code:: powershell
-
- $ choco install gallery-dl
-
-
- Scoop
- -----
-
- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users:
-
- .. code:: powershell
-
- $ scoop install gallery-dl
-
-
- Usage
- =====
-
- To use *gallery-dl* simply call it with the URLs you wish to download images
- from:
-
- .. code:: bash
-
- $ gallery-dl [OPTION]... URL...
-
- See also :code:`gallery-dl --help`.
-
-
- Examples
- --------
-
- Download images; in this case from danbooru via tag search for 'bonocho':
-
- .. code:: bash
-
- $ gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho"
-
-
- Get the direct URL of an image from a site that requires authentication:
-
- .. code:: bash
-
- $ gallery-dl -g -u "<username>" -p "<password>" "https://seiga.nicovideo.jp/seiga/im3211703"
-
-
- Filter manga chapters by language and chapter number:
-
- .. code:: bash
-
- $ gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/"
-
-
- | Search a remote resource for URLs and download images from them:
- | (URLs for which no extractor can be found will be silently ignored)
-
- .. code:: bash
-
- $ gallery-dl "r:https://pastebin.com/raw/FLwrCYsT"
-
-
- If a site's address is nonstandard for its extractor, you can prefix the URL with the
- extractor's name to force the use of a specific extractor:
-
- .. code:: bash
-
- $ gallery-dl "tumblr:https://sometumblrblog.example"
-
-
- Configuration
- =============
-
- Configuration files for *gallery-dl* use a JSON-based file format.
-
- | For a (more or less) complete example with options set to their default values,
- see gallery-dl.conf_.
- | For a configuration file example with more involved settings and options,
- see gallery-dl-example.conf_.
- | A list of all available configuration options and their
- descriptions can be found in configuration.rst_.
- |
-
- *gallery-dl* searches for configuration files in the following places:
-
- Windows:
- * ``%APPDATA%\gallery-dl\config.json``
- * ``%USERPROFILE%\gallery-dl\config.json``
- * ``%USERPROFILE%\gallery-dl.conf``
-
- (``%USERPROFILE%`` usually refers to the user's home directory,
- i.e. ``C:\Users\<username>\``)
-
- Linux, macOS, etc.:
- * ``/etc/gallery-dl.conf``
- * ``${XDG_CONFIG_HOME}/gallery-dl/config.json``
- * ``${HOME}/.config/gallery-dl/config.json``
- * ``${HOME}/.gallery-dl.conf``
-
- Values in later configuration files will override previous ones.
-
- Command line options will override all related settings in the configuration file(s),
- e.g. using ``--write-metadata`` will enable writing metadata using the default values
- for all ``postprocessors.metadata.*`` settings, overriding any specific settings in
- configuration files.
-
-
- Authentication
- ==============
-
- Username & Password
- -------------------
-
- Some extractors require you to provide valid login credentials in the form of
- a username & password pair. This is necessary for
- ``nijie`` and ``seiga``
- and optional for
- ``aryion``,
- ``danbooru``,
- ``e621``,
- ``exhentai``,
- ``idolcomplex``,
- ``imgbb``,
- ``inkbunny``,
- ``instagram``,
- ``mangadex``,
- ``mangoxo``,
- ``pillowfort``,
- ``sankaku``,
- ``subscribestar``,
- ``tapas``,
- ``tsumino``,
- and ``twitter``.
-
- You can set the necessary information in your configuration file
- (cf. gallery-dl.conf_)
-
- .. code:: json
-
- {
- "extractor": {
- "seiga": {
- "username": "<username>",
- "password": "<password>"
- }
- }
- }
-
- or you can provide them directly via the
- :code:`-u/--username` and :code:`-p/--password` or via the
- :code:`-o/--option` command-line options
-
- .. code:: bash
-
- $ gallery-dl -u <username> -p <password> URL
- $ gallery-dl -o username=<username> -o password=<password> URL
-
-
- Cookies
- -------
-
- For sites where login with username & password is not possible due to
- CAPTCHA or similar, or has not been implemented yet, you can use the
- cookies from a browser login session and input them into *gallery-dl*.
-
- This can be done via the
- `cookies <https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies>`__
- option in your configuration file by specifying
-
- - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
- | (e.g. `Get cookies.txt <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/>`__ for Chrome,
- `Export Cookies <https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/>`__ for Firefox)
-
- - | a list of name-value pairs gathered from your browser's web developer tools
- | (in `Chrome <https://developers.google.com/web/tools/chrome-devtools/storage/cookies>`__,
- in `Firefox <https://developer.mozilla.org/en-US/docs/Tools/Storage_Inspector>`__)
-
- For example:
-
- .. code:: json
-
- {
- "extractor": {
- "instagram": {
- "cookies": "$HOME/path/to/cookies.txt"
- },
- "patreon": {
- "cookies": {
- "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a"
- }
- }
- }
- }
-
- You can also specify a cookies.txt file with
- the :code:`--cookies` command-line option:
-
- .. code:: bash
-
- $ gallery-dl --cookies "$HOME/path/to/cookies.txt" URL
-
-
- OAuth
- -----
-
- *gallery-dl* supports user authentication via OAuth_ for
- ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``,
- and ``mastodon`` instances.
- This is mostly optional, but grants *gallery-dl* the ability
- to issue requests on your account's behalf and enables it to access resources
- which would otherwise be unavailable to a public user.
-
- To link your account to *gallery-dl*, start by invoking it with
- ``oauth:<sitename>`` as an argument. For example:
-
- .. code:: bash
-
- $ gallery-dl oauth:flickr
-
- You will be sent to the site's authorization page and asked to grant read
- access to *gallery-dl*. Authorize it and you will be shown one or more
- "tokens", which should be added to your configuration file.
-
- To authenticate with a ``mastodon`` instance, run *gallery-dl* with
- ``oauth:mastodon:<instance>`` as argument. For example:
-
- .. code:: bash
-
- $ gallery-dl oauth:mastodon:pawoo.net
- $ gallery-dl oauth:mastodon:https://mastodon.social/
-
-
-
- .. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf
- .. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf
- .. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
- .. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md
- .. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md
-
- .. _Python: https://www.python.org/downloads/
- .. _PyPI: https://pypi.org/
- .. _pip: https://pip.pypa.io/en/stable/
- .. _Requests: https://requests.readthedocs.io/en/master/
- .. _FFmpeg: https://www.ffmpeg.org/
- .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/
- .. _pyOpenSSL: https://pyopenssl.org/
- .. _Snapd: https://docs.snapcraft.io/installing-snapd
- .. _OAuth: https://en.wikipedia.org/wiki/OAuth
- .. _Chocolatey: https://chocolatey.org/install
- .. _Scoop: https://scoop.sh
-
- .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg
- :target: https://pypi.org/project/gallery-dl/
-
- .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg
- :target: https://github.com/mikf/gallery-dl/actions
-
- .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg
- :target: https://gitter.im/gallery-dl/main
-
Keywords: image gallery downloader crawler scraper
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
@@ -376,3 +30,353 @@ Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Utilities
Requires-Python: >=3.4
Provides-Extra: video
+License-File: LICENSE
+
+==========
+gallery-dl
+==========
+
+*gallery-dl* is a command-line program to download image galleries and
+collections from several image hosting sites (see `Supported Sites`_).
+It is a cross-platform tool with many configuration options
+and powerful `filenaming capabilities <Formatting_>`_.
+
+
+|pypi| |build| |gitter|
+
+.. contents::
+
+
+Dependencies
+============
+
+- Python_ 3.4+
+- Requests_
+
+Optional
+--------
+
+- FFmpeg_: Pixiv Ugoira to WebM conversion
+- yt-dlp_ or youtube-dl_: Video downloads
+
+
+Installation
+============
+
+
+Pip
+---
+
+The stable releases of *gallery-dl* are distributed on PyPI_ and can be
+easily installed or upgraded using pip_:
+
+.. code:: bash
+
+ $ python3 -m pip install -U gallery-dl
+
+Installing the latest dev version directly from GitHub can be done with
+pip_ as well:
+
+.. code:: bash
+
+ $ python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
+
+Note: Windows users should use :code:`py -3` instead of :code:`python3`.
+
+It is advised to use the latest version of pip_,
+including the essential packages :code:`setuptools` and :code:`wheel`.
+To ensure these packages are up-to-date, run
+
+.. code:: bash
+
+ $ python3 -m pip install --upgrade pip setuptools wheel
+
+
+Standalone Executable
+---------------------
+
+Prebuilt executable files with a Python interpreter and
+required Python packages included are available for
+
+- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.20.0/gallery-dl.exe>`__
+- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.20.0/gallery-dl.bin>`__
+
+| Executables built from the latest commit can be found at
+| https://github.com/mikf/gallery-dl/actions/workflows/executables.yml
+
+
+Snap
+----
+
+Linux users on a distro supported by Snapd_ can install *gallery-dl* from the Snap Store:
+
+.. code:: bash
+
+ $ snap install gallery-dl
+
+
+Chocolatey
+----------
+
+Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository:
+
+.. code:: powershell
+
+ $ choco install gallery-dl
+
+
+Scoop
+-----
+
+*gallery-dl* is also available in the Scoop_ "main" bucket for Windows users:
+
+.. code:: powershell
+
+ $ scoop install gallery-dl
+
+
+Usage
+=====
+
+To use *gallery-dl* simply call it with the URLs you wish to download images
+from:
+
+.. code:: bash
+
+ $ gallery-dl [OPTION]... URL...
+
+See also :code:`gallery-dl --help`.
+
+
+Examples
+--------
+
+Download images; in this case from danbooru via tag search for 'bonocho':
+
+.. code:: bash
+
+ $ gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho"
+
+
+Get the direct URL of an image from a site that requires authentication:
+
+.. code:: bash
+
+ $ gallery-dl -g -u "<username>" -p "<password>" "https://seiga.nicovideo.jp/seiga/im3211703"
+
+
+Filter manga chapters by language and chapter number:
+
+.. code:: bash
+
+ $ gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/"
+
+
+| Search a remote resource for URLs and download images from them:
+| (URLs for which no extractor can be found will be silently ignored)
+
+.. code:: bash
+
+ $ gallery-dl "r:https://pastebin.com/raw/FLwrCYsT"
+
+
+If a site's address is nonstandard for its extractor, you can prefix the URL with the
+extractor's name to force the use of a specific extractor:
+
+.. code:: bash
+
+ $ gallery-dl "tumblr:https://sometumblrblog.example"
+
+
+Configuration
+=============
+
+Configuration files for *gallery-dl* use a JSON-based file format.
+
+| For a (more or less) complete example with options set to their default values,
+ see gallery-dl.conf_.
+| For a configuration file example with more involved settings and options,
+ see gallery-dl-example.conf_.
+| A list of all available configuration options and their
+ descriptions can be found in configuration.rst_.
+|
+
+*gallery-dl* searches for configuration files in the following places:
+
+Windows:
+ * ``%APPDATA%\gallery-dl\config.json``
+ * ``%USERPROFILE%\gallery-dl\config.json``
+ * ``%USERPROFILE%\gallery-dl.conf``
+
+ (``%USERPROFILE%`` usually refers to the user's home directory,
+ i.e. ``C:\Users\<username>\``)
+
+Linux, macOS, etc.:
+ * ``/etc/gallery-dl.conf``
+ * ``${XDG_CONFIG_HOME}/gallery-dl/config.json``
+ * ``${HOME}/.config/gallery-dl/config.json``
+ * ``${HOME}/.gallery-dl.conf``
+
+Values in later configuration files will override previous ones.
+
+Command line options will override all related settings in the configuration file(s),
+e.g. using ``--write-metadata`` will enable writing metadata using the default values
+for all ``postprocessors.metadata.*`` settings, overriding any specific settings in
+configuration files.
+
+
+Authentication
+==============
+
+Username & Password
+-------------------
+
+Some extractors require you to provide valid login credentials in the form of
+a username & password pair. This is necessary for
+``nijie`` and ``seiga``
+and optional for
+``aryion``,
+``danbooru``,
+``e621``,
+``exhentai``,
+``idolcomplex``,
+``imgbb``,
+``inkbunny``,
+``instagram``,
+``mangadex``,
+``mangoxo``,
+``pillowfort``,
+``sankaku``,
+``subscribestar``,
+``tapas``,
+``tsumino``,
+and ``twitter``.
+
+You can set the necessary information in your configuration file
+(cf. gallery-dl.conf_)
+
+.. code:: json
+
+ {
+ "extractor": {
+ "seiga": {
+ "username": "<username>",
+ "password": "<password>"
+ }
+ }
+ }
+
+or you can provide them directly via the
+:code:`-u/--username` and :code:`-p/--password` or via the
+:code:`-o/--option` command-line options
+
+.. code:: bash
+
+ $ gallery-dl -u <username> -p <password> URL
+ $ gallery-dl -o username=<username> -o password=<password> URL
+
+
+Cookies
+-------
+
+For sites where login with username & password is not possible due to
+CAPTCHA or similar, or has not been implemented yet, you can use the
+cookies from a browser login session and input them into *gallery-dl*.
+
+This can be done via the
+`cookies <https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies>`__
+option in your configuration file by specifying
+
+- | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
+ | (e.g. `Get cookies.txt <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/>`__ for Chrome,
+ `Export Cookies <https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/>`__ for Firefox)
+
+- | a list of name-value pairs gathered from your browser's web developer tools
+ | (in `Chrome <https://developers.google.com/web/tools/chrome-devtools/storage/cookies>`__,
+ in `Firefox <https://developer.mozilla.org/en-US/docs/Tools/Storage_Inspector>`__)
+
+For example:
+
+.. code:: json
+
+ {
+ "extractor": {
+ "instagram": {
+ "cookies": "$HOME/path/to/cookies.txt"
+ },
+ "patreon": {
+ "cookies": {
+ "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a"
+ }
+ }
+ }
+ }
+
+You can also specify a cookies.txt file with
+the :code:`--cookies` command-line option:
+
+.. code:: bash
+
+ $ gallery-dl --cookies "$HOME/path/to/cookies.txt" URL
+
+
+OAuth
+-----
+
+*gallery-dl* supports user authentication via OAuth_ for
+``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``,
+and ``mastodon`` instances.
+This is mostly optional, but grants *gallery-dl* the ability
+to issue requests on your account's behalf and enables it to access resources
+which would otherwise be unavailable to a public user.
+
+To link your account to *gallery-dl*, start by invoking it with
+``oauth:<sitename>`` as an argument. For example:
+
+.. code:: bash
+
+ $ gallery-dl oauth:flickr
+
+You will be sent to the site's authorization page and asked to grant read
+access to *gallery-dl*. Authorize it and you will be shown one or more
+"tokens", which should be added to your configuration file.
+
+To authenticate with a ``mastodon`` instance, run *gallery-dl* with
+``oauth:mastodon:<instance>`` as argument. For example:
+
+.. code:: bash
+
+ $ gallery-dl oauth:mastodon:pawoo.net
+ $ gallery-dl oauth:mastodon:https://mastodon.social/
+
+
+
+.. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf
+.. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf
+.. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
+.. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md
+.. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md
+
+.. _Python: https://www.python.org/downloads/
+.. _PyPI: https://pypi.org/
+.. _pip: https://pip.pypa.io/en/stable/
+.. _Requests: https://requests.readthedocs.io/en/master/
+.. _FFmpeg: https://www.ffmpeg.org/
+.. _yt-dlp: https://github.com/yt-dlp/yt-dlp
+.. _youtube-dl: https://ytdl-org.github.io/youtube-dl/
+.. _pyOpenSSL: https://pyopenssl.org/
+.. _Snapd: https://docs.snapcraft.io/installing-snapd
+.. _OAuth: https://en.wikipedia.org/wiki/OAuth
+.. _Chocolatey: https://chocolatey.org/install
+.. _Scoop: https://scoop.sh
+
+.. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg
+ :target: https://pypi.org/project/gallery-dl/
+
+.. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg
+ :target: https://github.com/mikf/gallery-dl/actions
+
+.. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg
+ :target: https://gitter.im/gallery-dl/main
+
+
diff --git a/README.rst b/README.rst
index 72f7c82..c8b7afd 100644
--- a/README.rst
+++ b/README.rst
@@ -23,7 +23,7 @@ Optional
--------
- FFmpeg_: Pixiv Ugoira to WebM conversion
-- youtube-dl_: Video downloads
+- yt-dlp_ or youtube-dl_: Video downloads
Installation
@@ -64,8 +64,8 @@ Standalone Executable
Prebuilt executable files with a Python interpreter and
required Python packages included are available for
-- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.19.3/gallery-dl.exe>`__
-- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.19.3/gallery-dl.bin>`__
+- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.20.0/gallery-dl.exe>`__
+- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.20.0/gallery-dl.bin>`__
| Executables built from the latest commit can be found at
| https://github.com/mikf/gallery-dl/actions/workflows/executables.yml
@@ -328,6 +328,7 @@ To authenticate with a ``mastodon`` instance, run *gallery-dl* with
.. _pip: https://pip.pypa.io/en/stable/
.. _Requests: https://requests.readthedocs.io/en/master/
.. _FFmpeg: https://www.ffmpeg.org/
+.. _yt-dlp: https://github.com/yt-dlp/yt-dlp
.. _youtube-dl: https://ytdl-org.github.io/youtube-dl/
.. _pyOpenSSL: https://pyopenssl.org/
.. _Snapd: https://docs.snapcraft.io/installing-snapd
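
With 1.20.0 preferring yt-dlp over youtube-dl for video downloads, a hedged sketch of installing it alongside gallery-dl (yt-dlp is distributed on PyPI; this mirrors the pip usage shown in the README):

.. code:: bash

    # gallery-dl now tries to import yt_dlp first, falling back to youtube_dl
    $ python3 -m pip install -U yt-dlp
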
diff --git a/data/completion/_gallery-dl b/data/completion/_gallery-dl
index 22a5f25..2ac93f7 100644
--- a/data/completion/_gallery-dl
+++ b/data/completion/_gallery-dl
@@ -7,8 +7,10 @@ local rc=1
_arguments -C -S \
{-h,--help}'[Print this help message and exit]' \
--version'[Print program version and exit]' \
-{-d,--dest}'[Destination directory]':'<dest>':_files \
+--dest'[==SUPPRESS==]':'<dest>':_files \
{-i,--input-file}'[Download URLs found in FILE ("-" for stdin). More than one --input-file can be specified]':'<file>':_files \
+{-f,--filename}'[Filename format string for downloaded files ("/O" for "original" filenames)]':'<format>' \
+{-d,--directory}'[Target location for file downloads]':'<path>' \
--cookies'[File to load additional cookies from]':'<file>':_files \
--proxy'[Use the specified proxy]':'<url>' \
--clear-cache'[Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)]':'<module>' \
@@ -28,7 +30,9 @@ _arguments -C -S \
{-r,--limit-rate}'[Maximum download rate (e.g. 500k or 2.5M)]':'<rate>' \
{-R,--retries}'[Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)]':'<n>' \
--http-timeout'[Timeout for HTTP connections (default: 30.0)]':'<seconds>' \
---sleep'[Number of seconds to sleep before each download]':'<seconds>' \
+--sleep'[Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)]':'<seconds>' \
+--sleep-request'[Number of seconds to wait between HTTP requests during data extraction]':'<seconds>' \
+--sleep-extractor'[Number of seconds to wait before starting data extraction for an input URL]':'<seconds>' \
--filesize-min'[Do not download files smaller than SIZE (e.g. 500k or 2.5M)]':'<size>' \
--filesize-max'[Do not download files larger than SIZE (e.g. 500k or 2.5M)]':'<size>' \
--no-part'[Do not use .part files]' \
@@ -54,7 +58,8 @@ _arguments -C -S \
--ugoira-conv'[Convert Pixiv Ugoira to WebM (requires FFmpeg)]' \
--ugoira-conv-lossless'[Convert Pixiv Ugoira to WebM in VP9 lossless mode]' \
--write-metadata'[Write metadata to separate JSON files]' \
---write-infojson'[Write gallery metadata to a info.json file]' \
+--write-info-json'[Write gallery metadata to an info.json file]' \
+--write-infojson'[==SUPPRESS==]' \
--write-tags'[Write image tags to separate text files]' \
--mtime-from-date'[Set file modification times according to "date" metadata]' \
--exec'[Execute CMD for each downloaded file. Example: --exec "convert {} {}.png && rm {}"]':'<cmd>' \
diff --git a/data/completion/gallery-dl b/data/completion/gallery-dl
index c2ef896..4085bb9 100644
--- a/data/completion/gallery-dl
+++ b/data/completion/gallery-dl
@@ -7,10 +7,10 @@ _gallery_dl()
if [[ "${prev}" =~ ^(-i|--input-file|--cookies|--write-log|--write-unsupported|-c|--config|--config-yaml|--download-archive)$ ]]; then
COMPREPLY=( $(compgen -f -- "${cur}") )
- elif [[ "${prev}" =~ ^(-d|--dest)$ ]]; then
+ elif [[ "${prev}" =~ ^(--dest)$ ]]; then
COMPREPLY=( $(compgen -d -- "${cur}") )
else
- COMPREPLY=( $(compgen -W "--help --version --dest --input-file --cookies --proxy --clear-cache --quiet --verbose --get-urls --resolve-urls --dump-json --simulate --extractor-info --list-keywords --list-modules --list-extractors --write-log --write-unsupported --write-pages --limit-rate --retries --http-timeout --sleep --filesize-min --filesize-max --no-part --no-skip --no-mtime --no-download --no-check-certificate --config --config-yaml --option --ignore-config --username --password --netrc --download-archive --abort --terminate --range --chapter-range --filter --chapter-filter --zip --ugoira-conv --ugoira-conv-lossless --write-metadata --write-infojson --write-tags --mtime-from-date --exec --exec-after --postprocessor" -- "${cur}") )
+ COMPREPLY=( $(compgen -W "--help --version --dest --input-file --filename --directory --cookies --proxy --clear-cache --quiet --verbose --get-urls --resolve-urls --dump-json --simulate --extractor-info --list-keywords --list-modules --list-extractors --write-log --write-unsupported --write-pages --limit-rate --retries --http-timeout --sleep --sleep-request --sleep-extractor --filesize-min --filesize-max --no-part --no-skip --no-mtime --no-download --no-check-certificate --config --config-yaml --option --ignore-config --username --password --netrc --download-archive --abort --terminate --range --chapter-range --filter --chapter-filter --zip --ugoira-conv --ugoira-conv-lossless --write-metadata --write-info-json --write-infojson --write-tags --mtime-from-date --exec --exec-after --postprocessor" -- "${cur}") )
fi
}
diff --git a/data/man/gallery-dl.1 b/data/man/gallery-dl.1
index e7741ef..a7f51a7 100644
--- a/data/man/gallery-dl.1
+++ b/data/man/gallery-dl.1
@@ -1,4 +1,4 @@
-.TH "GALLERY-DL" "1" "2021-11-27" "1.19.3" "gallery-dl Manual"
+.TH "GALLERY-DL" "1" "2021-12-29" "1.20.0" "gallery-dl Manual"
.\" disable hyphenation
.nh
@@ -23,12 +23,15 @@ Print this help message and exit
.B "\-\-version"
Print program version and exit
.TP
-.B "\-d, \-\-dest" \f[I]DEST\f[]
-Destination directory
-.TP
.B "\-i, \-\-input\-file" \f[I]FILE\f[]
Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified
.TP
+.B "\-f, \-\-filename" \f[I]FORMAT\f[]
+Filename format string for downloaded files ('/O' for "original" filenames)
+.TP
+.B "\-d, \-\-directory" \f[I]PATH\f[]
+Target location for file downloads
+.TP
.B "\-\-cookies" \f[I]FILE\f[]
File to load additional cookies from
.TP
@@ -87,7 +90,13 @@ Maximum number of retries for failed HTTP requests or -1 for infinite retries (d
Timeout for HTTP connections (default: 30.0)
.TP
.B "\-\-sleep" \f[I]SECONDS\f[]
-Number of seconds to sleep before each download
+Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)
+.TP
+.B "\-\-sleep\-request" \f[I]SECONDS\f[]
+Number of seconds to wait between HTTP requests during data extraction
+.TP
+.B "\-\-sleep\-extractor" \f[I]SECONDS\f[]
+Number of seconds to wait before starting data extraction for an input URL
.TP
.B "\-\-filesize\-min" \f[I]SIZE\f[]
Do not download files smaller than SIZE (e.g. 500k or 2.5M)
@@ -161,7 +170,7 @@ Convert Pixiv Ugoira to WebM in VP9 lossless mode
.B "\-\-write\-metadata"
Write metadata to separate JSON files
.TP
-.B "\-\-write\-infojson"
+.B "\-\-write\-info\-json"
Write gallery metadata to an info.json file
.TP
.B "\-\-write\-tags"
diff --git a/data/man/gallery-dl.conf.5 b/data/man/gallery-dl.conf.5
index 09d2820..a574625 100644
--- a/data/man/gallery-dl.conf.5
+++ b/data/man/gallery-dl.conf.5
@@ -1,4 +1,4 @@
-.TH "GALLERY-DL.CONF" "5" "2021-11-27" "1.19.3" "gallery-dl Manual"
+.TH "GALLERY-DL.CONF" "5" "2021-12-29" "1.20.0" "gallery-dl Manual"
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
@@ -1235,18 +1235,6 @@ or whenever your \f[I]cache file\f[] is deleted or cleared.
Minimum wait time in seconds before API requests.
-.SS extractor.exhentai.limits
-.IP "Type:" 6
-\f[I]integer\f[]
-
-.IP "Default:" 9
-\f[I]null\f[]
-
-.IP "Description:" 4
-Sets a custom image download limit and
-stops extraction when it gets exceeded.
-
-
.SS extractor.exhentai.domain
.IP "Type:" 6
\f[I]string\f[]
@@ -1264,6 +1252,18 @@ depending on the input URL
* \f[I]"exhentai.org"\f[]: Use \f[I]exhentai.org\f[] for all URLs
+.SS extractor.exhentai.limits
+.IP "Type:" 6
+\f[I]integer\f[]
+
+.IP "Default:" 9
+\f[I]null\f[]
+
+.IP "Description:" 4
+Sets a custom image download limit and
+stops extraction when it gets exceeded.
+
+
.SS extractor.exhentai.metadata
.IP "Type:" 6
\f[I]bool\f[]
@@ -1290,6 +1290,20 @@ Makes \f[I]date\f[] and \f[I]filesize\f[] more precise.
Download full-sized original images if available.
+.SS extractor.exhentai.source
+.IP "Type:" 6
+\f[I]string\f[]
+
+.IP "Default:" 9
+\f[I]"gallery"\f[]
+
+.IP "Description:" 4
+Selects an alternative source to download files from.
+
+.br
+* \f[I]"hitomi"\f[]: Download the corresponding gallery from \f[I]hitomi.la\f[]
+
+
.SS extractor.fanbox.embeds
.IP "Type:" 6
\f[I]bool\f[] or \f[I]string\f[]
@@ -1399,6 +1413,18 @@ Possible values are
You can use \f[I]"all"\f[] instead of listing all values separately.
+.SS extractor.generic.enabled
+.IP "Type:" 6
+\f[I]bool\f[]
+
+.IP "Default:" 9
+\f[I]false\f[]
+
+.IP "Description:" 4
+Match **all** URLs not otherwise supported by gallery-dl,
+even ones without a \f[I]generic:\f[] prefix.
+
+
.SS extractor.gfycat.format
.IP "Type:" 6
.br
@@ -1446,7 +1472,7 @@ You can use \f[I]"all"\f[] instead of listing all values separately.
\f[I]bool\f[]
.IP "Default:" 9
-\f[I]true\f[]
+\f[I]false\f[]
.IP "Description:" 4
Try to extract
@@ -1545,7 +1571,7 @@ Extract a user's direct messages as \f[I]dms\f[] metadata.
\f[I]list\f[] of \f[I]strings\f[]
.IP "Default:" 9
-\f[I]["file", "attachments", "inline"]\f[]
+\f[I]["attachments", "file", "inline"]\f[]
.IP "Description:" 4
Determines the type and order of files to be downloaded.
@@ -2287,7 +2313,7 @@ Fetch media from all Tweets and replies in a \f[I]conversation
\f[I]list\f[] of \f[I]strings\f[]
.IP "Default:" 9
-\f[I]["orig", "large", "medium", "small"]\f[]
+\f[I]["orig", "4096x4096", "large", "medium", "small"]\f[]
.IP "Description:" 4
The image version to download.
@@ -2566,11 +2592,14 @@ Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in
\f[I]string\f[]
.IP "Default:" 9
-\f[I]"youtube_dl"\f[]
+\f[I]null\f[]
.IP "Description:" 4
Name of the youtube-dl Python module to import.
+Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[]
+followed by \f[I]"youtube_dl"\f[] as fallback.
+
.SS extractor.ytdl.raw-options
.IP "Type:" 6
@@ -2885,11 +2914,14 @@ Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in
\f[I]string\f[]
.IP "Default:" 9
-\f[I]"youtube_dl"\f[]
+\f[I]null\f[]
.IP "Description:" 4
Name of the youtube-dl Python module to import.
+Setting this to \f[I]null\f[] will first try to import \f[I]"yt_dlp"\f[]
+and use \f[I]"youtube_dl"\f[] as fallback.
+
.SS downloader.ytdl.outtmpl
.IP "Type:" 6
@@ -3736,12 +3768,16 @@ A \f[I]Date\f[] value represents a specific point in time.
* \f[I]float\f[]
.br
* \f[I]list\f[] with 2 \f[I]floats\f[]
+.br
+* \f[I]string\f[]
.IP "Example:" 4
.br
* 2.85
.br
* [1.5, 3.0]
+.br
+* "2.85", "1.5-3.0"
.IP "Description:" 4
A \f[I]Duration\f[] represents a span of time in seconds.
@@ -3752,6 +3788,9 @@ A \f[I]Duration\f[] represents a span of time in seconds.
* If given as a \f[I]list\f[] with 2 floating-point numbers \f[I]a\f[] & \f[I]b\f[],
it will be randomly chosen with uniform distribution such that \f[I]a <= N <= b\f[].
(see \f[I]random.uniform()\f[])
+.br
+* If given as a \f[I]string\f[], it can either represent a single \f[I]float\f[]
+value (\f[I]"2.85"\f[]) or a range (\f[I]"1.5-3.0"\f[]).
.SS Path
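
Durations can now also be given as strings, either a single value or a range. A hedged configuration sketch (assuming the extractor-level `sleep` and `sleep-request` options accept Duration values, matching the new `--sleep*` command-line options):

.. code:: json

    {
        "extractor": {
            "sleep": "2.85",
            "sleep-request": "1.5-3.0"
        }
    }
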
diff --git a/docs/gallery-dl.conf b/docs/gallery-dl.conf
index 0800ec7..8e7ff6d 100644
--- a/docs/gallery-dl.conf
+++ b/docs/gallery-dl.conf
@@ -114,7 +114,7 @@
},
"hitomi":
{
- "metadata": true
+ "metadata": false
},
"idolcomplex":
{
@@ -303,7 +303,7 @@
"format": null,
"generic": true,
"logging": true,
- "module": "youtube_dl",
+ "module": null,
"raw-options": null
},
"booru":
@@ -337,7 +337,7 @@
"format": null,
"forward-cookies": false,
"logging": true,
- "module": "youtube_dl",
+ "module": null,
"outtmpl": null,
"raw-options": null
}
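
Since `module` now defaults to `null` (import `yt_dlp` first, then `youtube_dl`), the pre-1.20.0 behavior should be recoverable by pinning the module explicitly; a sketch:

.. code:: json

    {
        "downloader": {
            "ytdl": {
                "module": "youtube_dl"
            }
        }
    }
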
diff --git a/gallery_dl.egg-info/PKG-INFO b/gallery_dl.egg-info/PKG-INFO
index bf70cac..8b87746 100644
--- a/gallery_dl.egg-info/PKG-INFO
+++ b/gallery_dl.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: gallery-dl
-Version: 1.19.3
+Version: 1.20.0
Summary: Command-line program to download image galleries and collections from several image hosting sites
Home-page: https://github.com/mikf/gallery-dl
Author: Mike Fährmann
@@ -9,352 +9,6 @@ Maintainer: Mike Fährmann
Maintainer-email: mike_faehrmann@web.de
License: GPLv2
Download-URL: https://github.com/mikf/gallery-dl/releases/latest
-Description: ==========
- gallery-dl
- ==========
-
- *gallery-dl* is a command-line program to download image galleries and
- collections from several image hosting sites (see `Supported Sites`_).
- It is a cross-platform tool with many configuration options
- and powerful `filenaming capabilities <Formatting_>`_.
-
-
- |pypi| |build| |gitter|
-
- .. contents::
-
-
- Dependencies
- ============
-
- - Python_ 3.4+
- - Requests_
-
- Optional
- --------
-
- - FFmpeg_: Pixiv Ugoira to WebM conversion
- - youtube-dl_: Video downloads
-
-
- Installation
- ============
-
-
- Pip
- ---
-
- The stable releases of *gallery-dl* are distributed on PyPI_ and can be
- easily installed or upgraded using pip_:
-
- .. code:: bash
-
- $ python3 -m pip install -U gallery-dl
-
- Installing the latest dev version directly from GitHub can be done with
- pip_ as well:
-
- .. code:: bash
-
- $ python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
-
- Note: Windows users should use :code:`py -3` instead of :code:`python3`.
-
- It is advised to use the latest version of pip_,
- including the essential packages :code:`setuptools` and :code:`wheel`.
- To ensure these packages are up-to-date, run
-
- .. code:: bash
-
- $ python3 -m pip install --upgrade pip setuptools wheel
-
-
- Standalone Executable
- ---------------------
-
- Prebuilt executable files with a Python interpreter and
- required Python packages included are available for
-
- - `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.19.3/gallery-dl.exe>`__
- - `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.19.3/gallery-dl.bin>`__
-
- | Executables build from the latest commit can be found at
- | https://github.com/mikf/gallery-dl/actions/workflows/executables.yml
-
-
- Snap
- ----
-
- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store:
-
- .. code:: bash
-
- $ snap install gallery-dl
-
-
- Chocolatey
- ----------
-
- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository:
-
- .. code:: powershell
-
- $ choco install gallery-dl
-
-
- Scoop
- -----
-
- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users:
-
- .. code:: powershell
-
- $ scoop install gallery-dl
-
-
- Usage
- =====
-
- To use *gallery-dl* simply call it with the URLs you wish to download images
- from:
-
- .. code:: bash
-
- $ gallery-dl [OPTION]... URL...
-
- See also :code:`gallery-dl --help`.
-
-
- Examples
- --------
-
- Download images; in this case from danbooru via tag search for 'bonocho':
-
- .. code:: bash
-
- $ gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho"
-
-
- Get the direct URL of an image from a site that requires authentication:
-
- .. code:: bash
-
- $ gallery-dl -g -u "<username>" -p "<password>" "https://seiga.nicovideo.jp/seiga/im3211703"
-
-
- Filter manga chapters by language and chapter number:
-
- .. code:: bash
-
- $ gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/"
-
-
- | Search a remote resource for URLs and download images from them:
- | (URLs for which no extractor can be found will be silently ignored)
-
- .. code:: bash
-
- $ gallery-dl "r:https://pastebin.com/raw/FLwrCYsT"
-
-
- If a site's address is nonstandard for its extractor, you can prefix the URL with the
- extractor's name to force the use of a specific extractor:
-
- .. code:: bash
-
- $ gallery-dl "tumblr:https://sometumblrblog.example"
-
-
- Configuration
- =============
-
- Configuration files for *gallery-dl* use a JSON-based file format.
-
- | For a (more or less) complete example with options set to their default values,
- see gallery-dl.conf_.
- | For a configuration file example with more involved settings and options,
- see gallery-dl-example.conf_.
- | A list of all available configuration options and their
- descriptions can be found in configuration.rst_.
- |
-
- *gallery-dl* searches for configuration files in the following places:
-
- Windows:
- * ``%APPDATA%\gallery-dl\config.json``
- * ``%USERPROFILE%\gallery-dl\config.json``
- * ``%USERPROFILE%\gallery-dl.conf``
-
- (``%USERPROFILE%`` usually refers to the user's home directory,
- i.e. ``C:\Users\<username>\``)
-
- Linux, macOS, etc.:
- * ``/etc/gallery-dl.conf``
- * ``${XDG_CONFIG_HOME}/gallery-dl/config.json``
- * ``${HOME}/.config/gallery-dl/config.json``
- * ``${HOME}/.gallery-dl.conf``
-
- Values in later configuration files will override previous ones.
-
- Command line options will override all related settings in the configuration file(s),
- e.g. using ``--write-metadata`` will enable writing metadata using the default values
- for all ``postprocessors.metadata.*`` settings, overriding any specific settings in
- configuration files.
-
-
- Authentication
- ==============
-
- Username & Password
- -------------------
-
- Some extractors require you to provide valid login credentials in the form of
- a username & password pair. This is necessary for
- ``nijie`` and ``seiga``
- and optional for
- ``aryion``,
- ``danbooru``,
- ``e621``,
- ``exhentai``,
- ``idolcomplex``,
- ``imgbb``,
- ``inkbunny``,
- ``instagram``,
- ``mangadex``,
- ``mangoxo``,
- ``pillowfort``,
- ``sankaku``,
- ``subscribestar``,
- ``tapas``,
- ``tsumino``,
- and ``twitter``.
-
- You can set the necessary information in your configuration file
- (cf. gallery-dl.conf_)
-
- .. code:: json
-
- {
- "extractor": {
- "seiga": {
- "username": "<username>",
- "password": "<password>"
- }
- }
- }
-
- or you can provide them directly via the
- :code:`-u/--username` and :code:`-p/--password` or via the
- :code:`-o/--option` command-line options
-
- .. code:: bash
-
- $ gallery-dl -u <username> -p <password> URL
- $ gallery-dl -o username=<username> -o password=<password> URL
-
-
- Cookies
- -------
-
- For sites where login with username & password is not possible due to
- CAPTCHA or similar, or has not been implemented yet, you can use the
- cookies from a browser login session and input them into *gallery-dl*.
-
- This can be done via the
- `cookies <https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies>`__
- option in your configuration file by specifying
-
- - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
- | (e.g. `Get cookies.txt <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/>`__ for Chrome,
- `Export Cookies <https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/>`__ for Firefox)
-
- - | a list of name-value pairs gathered from your browser's web developer tools
- | (in `Chrome <https://developers.google.com/web/tools/chrome-devtools/storage/cookies>`__,
- in `Firefox <https://developer.mozilla.org/en-US/docs/Tools/Storage_Inspector>`__)
-
- For example:
-
- .. code:: json
-
- {
- "extractor": {
- "instagram": {
- "cookies": "$HOME/path/to/cookies.txt"
- },
- "patreon": {
- "cookies": {
- "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a"
- }
- }
- }
- }
-
- You can also specify a cookies.txt file with
- the :code:`--cookies` command-line option:
-
- .. code:: bash
-
- $ gallery-dl --cookies "$HOME/path/to/cookies.txt" URL
-
-
- OAuth
- -----
-
- *gallery-dl* supports user authentication via OAuth_ for
- ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``,
- and ``mastodon`` instances.
- This is mostly optional, but grants *gallery-dl* the ability
- to issue requests on your account's behalf and enables it to access resources
- which would otherwise be unavailable to a public user.
-
- To link your account to *gallery-dl*, start by invoking it with
- ``oauth:<sitename>`` as an argument. For example:
-
- .. code:: bash
-
- $ gallery-dl oauth:flickr
-
- You will be sent to the site's authorization page and asked to grant read
- access to *gallery-dl*. Authorize it and you will be shown one or more
- "tokens", which should be added to your configuration file.
-
- To authenticate with a ``mastodon`` instance, run *gallery-dl* with
- ``oauth:mastodon:<instance>`` as argument. For example:
-
- .. code:: bash
-
- $ gallery-dl oauth:mastodon:pawoo.net
- $ gallery-dl oauth:mastodon:https://mastodon.social/
-
-
-
- .. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf
- .. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf
- .. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
- .. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md
- .. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md
-
- .. _Python: https://www.python.org/downloads/
- .. _PyPI: https://pypi.org/
- .. _pip: https://pip.pypa.io/en/stable/
- .. _Requests: https://requests.readthedocs.io/en/master/
- .. _FFmpeg: https://www.ffmpeg.org/
- .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/
- .. _pyOpenSSL: https://pyopenssl.org/
- .. _Snapd: https://docs.snapcraft.io/installing-snapd
- .. _OAuth: https://en.wikipedia.org/wiki/OAuth
- .. _Chocolatey: https://chocolatey.org/install
- .. _Scoop: https://scoop.sh
-
- .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg
- :target: https://pypi.org/project/gallery-dl/
-
- .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg
- :target: https://github.com/mikf/gallery-dl/actions
-
- .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg
- :target: https://gitter.im/gallery-dl/main
-
Keywords: image gallery downloader crawler scraper
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
@@ -376,3 +30,353 @@ Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Utilities
Requires-Python: >=3.4
Provides-Extra: video
+License-File: LICENSE
+
+==========
+gallery-dl
+==========
+
+*gallery-dl* is a command-line program to download image galleries and
+collections from several image hosting sites (see `Supported Sites`_).
+It is a cross-platform tool with many configuration options
+and powerful `filenaming capabilities <Formatting_>`_.
+
+
+|pypi| |build| |gitter|
+
+.. contents::
+
+
+Dependencies
+============
+
+- Python_ 3.4+
+- Requests_
+
+Optional
+--------
+
+- FFmpeg_: Pixiv Ugoira to WebM conversion
+- yt-dlp_ or youtube-dl_: Video downloads
+
+
+Installation
+============
+
+
+Pip
+---
+
+The stable releases of *gallery-dl* are distributed on PyPI_ and can be
+easily installed or upgraded using pip_:
+
+.. code:: bash
+
+ $ python3 -m pip install -U gallery-dl
+
+Installing the latest dev version directly from GitHub can be done with
+pip_ as well:
+
+.. code:: bash
+
+ $ python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
+
+Note: Windows users should use :code:`py -3` instead of :code:`python3`.
+
+It is advised to use the latest version of pip_,
+including the essential packages :code:`setuptools` and :code:`wheel`.
+To ensure these packages are up-to-date, run
+
+.. code:: bash
+
+ $ python3 -m pip install --upgrade pip setuptools wheel
+
+
+Standalone Executable
+---------------------
+
+Prebuilt executable files with a Python interpreter and
+required Python packages included are available for
+
+- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.20.0/gallery-dl.exe>`__
+- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.20.0/gallery-dl.bin>`__
+
+| Executables built from the latest commit can be found at
+| https://github.com/mikf/gallery-dl/actions/workflows/executables.yml
+
+
+Snap
+----
+
+Linux users on a distro supported by Snapd_ can install *gallery-dl* from the Snap Store:
+
+.. code:: bash
+
+ $ snap install gallery-dl
+
+
+Chocolatey
+----------
+
+Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository:
+
+.. code:: powershell
+
+ $ choco install gallery-dl
+
+
+Scoop
+-----
+
+*gallery-dl* is also available in the Scoop_ "main" bucket for Windows users:
+
+.. code:: powershell
+
+ $ scoop install gallery-dl
+
+
+Usage
+=====
+
+To use *gallery-dl* simply call it with the URLs you wish to download images
+from:
+
+.. code:: bash
+
+ $ gallery-dl [OPTION]... URL...
+
+See also :code:`gallery-dl --help`.
+
+
+Examples
+--------
+
+Download images; in this case from danbooru via tag search for 'bonocho':
+
+.. code:: bash
+
+ $ gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho"
+
+
+Get the direct URL of an image from a site that requires authentication:
+
+.. code:: bash
+
+ $ gallery-dl -g -u "<username>" -p "<password>" "https://seiga.nicovideo.jp/seiga/im3211703"
+
+
+Filter manga chapters by language and chapter number:
+
+.. code:: bash
+
+ $ gallery-dl --chapter-filter "lang == 'fr' and 10 <= chapter < 20" "https://mangadex.org/title/2354/"
+
+
+| Search a remote resource for URLs and download images from them:
+| (URLs for which no extractor can be found will be silently ignored)
+
+.. code:: bash
+
+ $ gallery-dl "r:https://pastebin.com/raw/FLwrCYsT"
+
+
+If a site's address is nonstandard for its extractor, you can prefix the URL with the
+extractor's name to force the use of a specific extractor:
+
+.. code:: bash
+
+ $ gallery-dl "tumblr:https://sometumblrblog.example"
+
+
+Configuration
+=============
+
+Configuration files for *gallery-dl* use a JSON-based file format.
+
+| For a (more or less) complete example with options set to their default values,
+ see gallery-dl.conf_.
+| For a configuration file example with more involved settings and options,
+ see gallery-dl-example.conf_.
+| A list of all available configuration options and their
+ descriptions can be found in configuration.rst_.
+|
+
+*gallery-dl* searches for configuration files in the following places:
+
+Windows:
+ * ``%APPDATA%\gallery-dl\config.json``
+ * ``%USERPROFILE%\gallery-dl\config.json``
+ * ``%USERPROFILE%\gallery-dl.conf``
+
+ (``%USERPROFILE%`` usually refers to the user's home directory,
+ i.e. ``C:\Users\<username>\``)
+
+Linux, macOS, etc.:
+ * ``/etc/gallery-dl.conf``
+ * ``${XDG_CONFIG_HOME}/gallery-dl/config.json``
+ * ``${HOME}/.config/gallery-dl/config.json``
+ * ``${HOME}/.gallery-dl.conf``
+
+Values in later configuration files will override previous ones.
+
+Command line options will override all related settings in the configuration file(s),
+e.g. using ``--write-metadata`` will enable writing metadata using the default values
+for all ``postprocessors.metadata.*`` settings, overriding any specific settings in
+configuration files.
+
+
+Authentication
+==============
+
+Username & Password
+-------------------
+
+Some extractors require you to provide valid login credentials in the form of
+a username & password pair. This is necessary for
+``nijie`` and ``seiga``
+and optional for
+``aryion``,
+``danbooru``,
+``e621``,
+``exhentai``,
+``idolcomplex``,
+``imgbb``,
+``inkbunny``,
+``instagram``,
+``mangadex``,
+``mangoxo``,
+``pillowfort``,
+``sankaku``,
+``subscribestar``,
+``tapas``,
+``tsumino``,
+and ``twitter``.
+
+You can set the necessary information in your configuration file
+(cf. gallery-dl.conf_)
+
+.. code:: json
+
+ {
+ "extractor": {
+ "seiga": {
+ "username": "<username>",
+ "password": "<password>"
+ }
+ }
+ }
+
+or you can provide them directly via the
+:code:`-u/--username` and :code:`-p/--password` or via the
+:code:`-o/--option` command-line options
+
+.. code:: bash
+
+ $ gallery-dl -u <username> -p <password> URL
+ $ gallery-dl -o username=<username> -o password=<password> URL
+
+
+Cookies
+-------
+
+For sites where login with username & password is not possible due to
+CAPTCHA or similar, or has not been implemented yet, you can use the
+cookies from a browser login session and input them into *gallery-dl*.
+
+This can be done via the
+`cookies <https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorcookies>`__
+option in your configuration file by specifying
+
+- | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon
+ | (e.g. `Get cookies.txt <https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/>`__ for Chrome,
+ `Export Cookies <https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/>`__ for Firefox)
+
+- | a list of name-value pairs gathered from your browser's web developer tools
+ | (in `Chrome <https://developers.google.com/web/tools/chrome-devtools/storage/cookies>`__,
+ in `Firefox <https://developer.mozilla.org/en-US/docs/Tools/Storage_Inspector>`__)
+
+For example:
+
+.. code:: json
+
+ {
+ "extractor": {
+ "instagram": {
+ "cookies": "$HOME/path/to/cookies.txt"
+ },
+ "patreon": {
+ "cookies": {
+ "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a"
+ }
+ }
+ }
+ }
+
+You can also specify a cookies.txt file with
+the :code:`--cookies` command-line option:
+
+.. code:: bash
+
+ $ gallery-dl --cookies "$HOME/path/to/cookies.txt" URL
+
+
+OAuth
+-----
+
+*gallery-dl* supports user authentication via OAuth_ for
+``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``,
+and ``mastodon`` instances.
+This is mostly optional, but grants *gallery-dl* the ability
+to issue requests on your account's behalf and enables it to access resources
+which would otherwise be unavailable to a public user.
+
+To link your account to *gallery-dl*, start by invoking it with
+``oauth:<sitename>`` as an argument. For example:
+
+.. code:: bash
+
+ $ gallery-dl oauth:flickr
+
+You will be sent to the site's authorization page and asked to grant read
+access to *gallery-dl*. Authorize it and you will be shown one or more
+"tokens", which should be added to your configuration file.
+
+To authenticate with a ``mastodon`` instance, run *gallery-dl* with
+``oauth:mastodon:<instance>`` as an argument. For example:
+
+.. code:: bash
+
+ $ gallery-dl oauth:mastodon:pawoo.net
+ $ gallery-dl oauth:mastodon:https://mastodon.social/
+
+
+
+.. _gallery-dl.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf
+.. _gallery-dl-example.conf: https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf
+.. _configuration.rst: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
+.. _Supported Sites: https://github.com/mikf/gallery-dl/blob/master/docs/supportedsites.md
+.. _Formatting: https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md
+
+.. _Python: https://www.python.org/downloads/
+.. _PyPI: https://pypi.org/
+.. _pip: https://pip.pypa.io/en/stable/
+.. _Requests: https://requests.readthedocs.io/en/master/
+.. _FFmpeg: https://www.ffmpeg.org/
+.. _yt-dlp: https://github.com/yt-dlp/yt-dlp
+.. _youtube-dl: https://ytdl-org.github.io/youtube-dl/
+.. _pyOpenSSL: https://pyopenssl.org/
+.. _Snapd: https://docs.snapcraft.io/installing-snapd
+.. _OAuth: https://en.wikipedia.org/wiki/OAuth
+.. _Chocolatey: https://chocolatey.org/install
+.. _Scoop: https://scoop.sh
+
+.. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg
+ :target: https://pypi.org/project/gallery-dl/
+
+.. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg
+ :target: https://github.com/mikf/gallery-dl/actions
+
+.. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg
+ :target: https://gitter.im/gallery-dl/main
+
+
diff --git a/gallery_dl.egg-info/SOURCES.txt b/gallery_dl.egg-info/SOURCES.txt
index d05066c..127354e 100644
--- a/gallery_dl.egg-info/SOURCES.txt
+++ b/gallery_dl.egg-info/SOURCES.txt
@@ -76,6 +76,7 @@ gallery_dl/extractor/fuskator.py
gallery_dl/extractor/gelbooru.py
gallery_dl/extractor/gelbooru_v01.py
gallery_dl/extractor/gelbooru_v02.py
+gallery_dl/extractor/generic.py
gallery_dl/extractor/gfycat.py
gallery_dl/extractor/hbrowse.py
gallery_dl/extractor/hentai2read.py
@@ -105,6 +106,7 @@ gallery_dl/extractor/khinsider.py
gallery_dl/extractor/komikcast.py
gallery_dl/extractor/lineblog.py
gallery_dl/extractor/livedoor.py
+gallery_dl/extractor/lolisafe.py
gallery_dl/extractor/luscious.py
gallery_dl/extractor/mangadex.py
gallery_dl/extractor/mangafox.py
@@ -147,6 +149,7 @@ gallery_dl/extractor/readcomiconline.py
gallery_dl/extractor/recursive.py
gallery_dl/extractor/reddit.py
gallery_dl/extractor/redgifs.py
+gallery_dl/extractor/rule34us.py
gallery_dl/extractor/sankaku.py
gallery_dl/extractor/sankakucomplex.py
gallery_dl/extractor/seiga.py
@@ -177,6 +180,7 @@ gallery_dl/extractor/webtoons.py
gallery_dl/extractor/weibo.py
gallery_dl/extractor/wikiart.py
gallery_dl/extractor/wikieat.py
+gallery_dl/extractor/wordpress.py
gallery_dl/extractor/xhamster.py
gallery_dl/extractor/xvideos.py
gallery_dl/extractor/ytdl.py
@@ -201,4 +205,5 @@ test/test_output.py
test/test_postprocessor.py
test/test_results.py
test/test_text.py
-test/test_util.py
\ No newline at end of file
+test/test_util.py
+test/test_ytdl.py
\ No newline at end of file
diff --git a/gallery_dl/__init__.py b/gallery_dl/__init__.py
index 2cad029..ad8286e 100644
--- a/gallery_dl/__init__.py
+++ b/gallery_dl/__init__.py
@@ -115,6 +115,13 @@ def main():
config.load(args.cfgfiles, strict=True)
if args.yamlfiles:
config.load(args.yamlfiles, strict=True, fmt="yaml")
+ if args.filename:
+ if args.filename == "/O":
+ args.filename = "{filename}.{extension}"
+ config.set((), "filename", args.filename)
+ if args.directory:
+ config.set((), "base-directory", args.directory)
+ config.set((), "directory", ())
if args.postprocessors:
config.set((), "postprocessors", args.postprocessors)
if args.abort:
@@ -142,20 +149,23 @@ def main():
import os.path
import requests
- head = ""
- try:
- out, err = subprocess.Popen(
- ("git", "rev-parse", "--short", "HEAD"),
- stdout=subprocess.PIPE,
- stderr=subprocess.PIPE,
- cwd=os.path.dirname(os.path.abspath(__file__)),
- ).communicate()
- if out and not err:
- head = " - Git HEAD: " + out.decode().rstrip()
- except (OSError, subprocess.SubprocessError):
- pass
+ extra = ""
+ if getattr(sys, "frozen", False):
+ extra = " - Executable"
+ else:
+ try:
+ out, err = subprocess.Popen(
+ ("git", "rev-parse", "--short", "HEAD"),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ cwd=os.path.dirname(os.path.abspath(__file__)),
+ ).communicate()
+ if out and not err:
+ extra = " - Git HEAD: " + out.decode().rstrip()
+ except (OSError, subprocess.SubprocessError):
+ pass
- log.debug("Version %s%s", __version__, head)
+ log.debug("Version %s%s", __version__, extra)
log.debug("Python %s - %s",
platform.python_version(), platform.platform())
try:
diff --git a/gallery_dl/downloader/ytdl.py b/gallery_dl/downloader/ytdl.py
index 8416ca0..30f628e 100644
--- a/gallery_dl/downloader/ytdl.py
+++ b/gallery_dl/downloader/ytdl.py
@@ -39,7 +39,7 @@ class YoutubeDLDownloader(DownloaderBase):
if not ytdl_instance:
ytdl_instance = self.ytdl_instance
if not ytdl_instance:
- module = __import__(self.config("module") or "youtube_dl")
+ module = ytdl.import_module(self.config("module"))
self.ytdl_instance = ytdl_instance = ytdl.construct_YoutubeDL(
module, self, self.ytdl_opts)
if self.outtmpl == "default":
diff --git a/gallery_dl/extractor/2chan.py b/gallery_dl/extractor/2chan.py
index c92969b..38b2d5a 100644
--- a/gallery_dl/extractor/2chan.py
+++ b/gallery_dl/extractor/2chan.py
@@ -20,7 +20,7 @@ class _2chanThreadExtractor(Extractor):
filename_fmt = "{tim}.{extension}"
archive_fmt = "{board}_{thread}_{tim}"
url_fmt = "https://{server}.2chan.net/{board}/src/{filename}"
- pattern = r"(?:https?://)?([^.]+)\.2chan\.net/([^/]+)/res/(\d+)"
+ pattern = r"(?:https?://)?([\w-]+)\.2chan\.net/([^/]+)/res/(\d+)"
test = ("http://dec.2chan.net/70/res/4752.htm", {
"url": "f49aa31340e9a3429226af24e19e01f5b819ca1f",
"keyword": "44599c21b248e79692b2eb2da12699bd0ed5640a",
diff --git a/gallery_dl/extractor/500px.py b/gallery_dl/extractor/500px.py
index 8c6fa09..88ceaeb 100644
--- a/gallery_dl/extractor/500px.py
+++ b/gallery_dl/extractor/500px.py
@@ -21,13 +21,13 @@ class _500pxExtractor(Extractor):
filename_fmt = "{id}_{name}.{extension}"
archive_fmt = "{id}"
root = "https://500px.com"
+ cookiedomain = ".500px.com"
def __init__(self, match):
Extractor.__init__(self, match)
self.session.headers["Referer"] = self.root + "/"
def items(self):
- first = True
data = self.metadata()
for photo in self.photos():
@@ -35,9 +35,7 @@ class _500pxExtractor(Extractor):
photo["extension"] = photo["image_format"]
if data:
photo.update(data)
- if first:
- first = False
- yield Message.Directory, photo
+ yield Message.Directory, photo
yield Message.Url, url, photo
def metadata(self):
@@ -72,24 +70,33 @@ class _500pxExtractor(Extractor):
self.log.warning("Unable to fetch photo %s", pid)
]
- def _request_api(self, url, params, csrf_token=None):
- headers = {"Origin": self.root, "X-CSRF-Token": csrf_token}
+ def _request_api(self, url, params):
+ headers = {
+ "Origin": self.root,
+ "x-csrf-token": self.session.cookies.get(
+ "x-csrf-token", domain=".500px.com"),
+ }
return self.request(url, headers=headers, params=params).json()
def _request_graphql(self, opname, variables):
url = "https://api.500px.com/graphql"
+ headers = {
+ "x-csrf-token": self.session.cookies.get(
+ "x-csrf-token", domain=".500px.com"),
+ }
data = {
"operationName": opname,
"variables" : json.dumps(variables),
"query" : QUERIES[opname],
}
- return self.request(url, method="POST", json=data).json()["data"]
+ return self.request(
+ url, method="POST", headers=headers, json=data).json()["data"]
class _500pxUserExtractor(_500pxExtractor):
"""Extractor for photos from a user's photostream on 500px.com"""
subcategory = "user"
- pattern = BASE_PATTERN + r"/(?!photo/)(?:p/)?([^/?#]+)/?(?:$|[?#])"
+ pattern = BASE_PATTERN + r"/(?!photo/|liked)(?:p/)?([^/?#]+)/?(?:$|[?#])"
test = (
("https://500px.com/p/light_expression_photography", {
"pattern": r"https?://drscdn.500px.org/photo/\d+/m%3D4096/v2",
@@ -137,10 +144,6 @@ class _500pxGalleryExtractor(_500pxExtractor):
"user": dict,
},
}),
- # unavailable photos (#1335)
- ("https://500px.com/p/Light_Expression_Photography/galleries/street", {
- "count": 4,
- }),
("https://500px.com/fashvamp/galleries/lera"),
)
@@ -194,6 +197,30 @@ class _500pxGalleryExtractor(_500pxExtractor):
)["galleryByOwnerIdAndSlugOrToken"]["photos"]
+class _500pxFavoriteExtractor(_500pxExtractor):
+ """Extractor for favorite 500px photos"""
+ subcategory = "favorite"
+ pattern = BASE_PATTERN + r"/liked/?$"
+ test = ("https://500px.com/liked",)
+
+ def photos(self):
+ variables = {"pageSize": 20}
+ photos = self._request_graphql(
+ "LikedPhotosQueryRendererQuery", variables,
+ )["likedPhotos"]
+
+ while True:
+ yield from self._extend(photos["edges"])
+
+ if not photos["pageInfo"]["hasNextPage"]:
+ return
+
+ variables["cursor"] = photos["pageInfo"]["endCursor"]
+ photos = self._request_graphql(
+ "LikedPhotosPaginationContainerQuery", variables,
+ )["likedPhotos"]
+
+
class _500pxImageExtractor(_500pxExtractor):
"""Extractor for individual images from 500px.com"""
subcategory = "image"
@@ -640,4 +667,122 @@ fragment GalleriesDetailPaginationContainer_gallery_3e6UuE on Gallery {
}
""",
+ "LikedPhotosQueryRendererQuery": """\
+query LikedPhotosQueryRendererQuery($pageSize: Int) {
+ ...LikedPhotosPaginationContainer_query_RlXb8
+}
+
+fragment LikedPhotosPaginationContainer_query_RlXb8 on Query {
+ likedPhotos(first: $pageSize) {
+ edges {
+ node {
+ id
+ legacyId
+ canonicalPath
+ name
+ description
+ category
+ uploadedAt
+ location
+ width
+ height
+ isLikedByMe
+ notSafeForWork
+ tags
+ photographer: uploader {
+ id
+ legacyId
+ username
+ displayName
+ canonicalPath
+ avatar {
+ images {
+ url
+ id
+ }
+ id
+ }
+ followedByUsers {
+ totalCount
+ isFollowedByMe
+ }
+ }
+ images(sizes: [33, 35]) {
+ size
+ url
+ jpegUrl
+ webpUrl
+ id
+ }
+ __typename
+ }
+ cursor
+ }
+ pageInfo {
+ endCursor
+ hasNextPage
+ }
+ }
+}
+""",
+
+ "LikedPhotosPaginationContainerQuery": """\
+query LikedPhotosPaginationContainerQuery($cursor: String, $pageSize: Int) {
+ ...LikedPhotosPaginationContainer_query_3e6UuE
+}
+
+fragment LikedPhotosPaginationContainer_query_3e6UuE on Query {
+ likedPhotos(first: $pageSize, after: $cursor) {
+ edges {
+ node {
+ id
+ legacyId
+ canonicalPath
+ name
+ description
+ category
+ uploadedAt
+ location
+ width
+ height
+ isLikedByMe
+ notSafeForWork
+ tags
+ photographer: uploader {
+ id
+ legacyId
+ username
+ displayName
+ canonicalPath
+ avatar {
+ images {
+ url
+ id
+ }
+ id
+ }
+ followedByUsers {
+ totalCount
+ isFollowedByMe
+ }
+ }
+ images(sizes: [33, 35]) {
+ size
+ url
+ jpegUrl
+ webpUrl
+ id
+ }
+ __typename
+ }
+ cursor
+ }
+ pageInfo {
+ endCursor
+ hasNextPage
+ }
+ }
+}
+""",
+
}
diff --git a/gallery_dl/extractor/__init__.py b/gallery_dl/extractor/__init__.py
index dd9da01..65c994d 100644
--- a/gallery_dl/extractor/__init__.py
+++ b/gallery_dl/extractor/__init__.py
@@ -108,6 +108,7 @@ modules = [
"readcomiconline",
"reddit",
"redgifs",
+ "rule34us",
"sankaku",
"sankakucomplex",
"seiga",
@@ -144,12 +145,14 @@ modules = [
"foolslide",
"mastodon",
"shopify",
+ "lolisafe",
"imagehosts",
"directlink",
"recursive",
"oauth",
"test",
"ytdl",
+ "generic",
]
diff --git a/gallery_dl/extractor/artstation.py b/gallery_dl/extractor/artstation.py
index f687ff8..5675081 100644
--- a/gallery_dl/extractor/artstation.py
+++ b/gallery_dl/extractor/artstation.py
@@ -29,12 +29,12 @@ class ArtstationExtractor(Extractor):
def items(self):
data = self.metadata()
- yield Message.Directory, data
for project in self.projects():
for asset in self.get_project_assets(project["hash_id"]):
asset.update(data)
adict = asset["asset"]
+ yield Message.Directory, asset
if adict["has_embedded_player"] and self.external:
player = adict["player_embedded"]
diff --git a/gallery_dl/extractor/blogger.py b/gallery_dl/extractor/blogger.py
index 7e7c282..9a86cc4 100644
--- a/gallery_dl/extractor/blogger.py
+++ b/gallery_dl/extractor/blogger.py
@@ -15,7 +15,7 @@ import re
BASE_PATTERN = (
r"(?:blogger:(?:https?://)?([^/]+)|"
- r"(?:https?://)?([^.]+\.blogspot\.com))")
+ r"(?:https?://)?([\w-]+\.blogspot\.com))")
class BloggerExtractor(Extractor):
diff --git a/gallery_dl/extractor/common.py b/gallery_dl/extractor/common.py
index e80366e..c440aee 100644
--- a/gallery_dl/extractor/common.py
+++ b/gallery_dl/extractor/common.py
@@ -571,7 +571,11 @@ class BaseExtractor(Extractor):
if not self.category:
for index, group in enumerate(match.groups()):
if group is not None:
- self.category, self.root = self.instances[index]
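+                    # group 0 matches the explicit "<basecategory>:<root>"
+                    # form; derive root and category from the URL itself.
+                    # All other groups map to known instances, offset by 1.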
+ if index:
+ self.category, self.root = self.instances[index-1]
+ else:
+ self.root = group
+ self.category = group.partition("://")[2]
break
Extractor.__init__(self, match)
@@ -594,7 +598,10 @@ class BaseExtractor(Extractor):
pattern = re.escape(root[root.index(":") + 3:])
pattern_list.append(pattern + "()")
- return r"(?:https?://)?(?:" + "|".join(pattern_list) + r")"
+ return (
+ r"(?:" + cls.basecategory + r":(https?://[^/?#]+)|"
+ r"(?:https?://)?(?:" + "|".join(pattern_list) + r"))"
+ )
class HTTPSAdapter(HTTPAdapter):
diff --git a/gallery_dl/extractor/cyberdrop.py b/gallery_dl/extractor/cyberdrop.py
index dbaa97e..6d6e192 100644
--- a/gallery_dl/extractor/cyberdrop.py
+++ b/gallery_dl/extractor/cyberdrop.py
@@ -6,16 +6,13 @@
"""Extractors for https://cyberdrop.me/"""
-from .common import Extractor, Message
+from . import lolisafe
from .. import text
-class CyberdropAlbumExtractor(Extractor):
+class CyberdropAlbumExtractor(lolisafe.LolisafeAlbumExtractor):
category = "cyberdrop"
- subcategory = "album"
root = "https://cyberdrop.me"
- directory_fmt = ("{category}", "{album_name} ({album_id})")
- archive_fmt = "{album_id}_{id}"
pattern = r"(?:https?://)?(?:www\.)?cyberdrop\.me/a/([^/?#]+)"
test = (
# images
@@ -44,11 +41,7 @@ class CyberdropAlbumExtractor(Extractor):
}),
)
- def __init__(self, match):
- Extractor.__init__(self, match)
- self.album_id = match.group(1)
-
- def items(self):
+ def fetch_album(self, album_id):
url = self.root + "/a/" + self.album_id
extr = text.extract_from(self.request(url).text)
@@ -58,9 +51,9 @@ class CyberdropAlbumExtractor(Extractor):
url = extr('id="file" href="', '"')
if not url:
break
- append(text.unescape(url))
+ append({"file": text.unescape(url)})
- data = {
+ return files, {
"album_id" : self.album_id,
"album_name" : extr("name: '", "'"),
"date" : text.parse_timestamp(extr("timestamp: ", ",")),
@@ -68,9 +61,3 @@ class CyberdropAlbumExtractor(Extractor):
"description": extr("description: `", "`"),
"count" : len(files),
}
-
- yield Message.Directory, data
- for url in files:
- text.nameext_from_url(url, data)
- data["filename"], _, data["id"] = data["filename"].rpartition("-")
- yield Message.Url, url, data
diff --git a/gallery_dl/extractor/deviantart.py b/gallery_dl/extractor/deviantart.py
index 61affb5..94fec16 100644
--- a/gallery_dl/extractor/deviantart.py
+++ b/gallery_dl/extractor/deviantart.py
@@ -772,6 +772,7 @@ class DeviantartPopularExtractor(DeviantartExtractor):
if trange.startswith("popular-"):
trange = trange[8:]
self.time_range = {
+ "newest" : "now",
"most-recent" : "now",
"this-week" : "1week",
"this-month" : "1month",
@@ -786,6 +787,8 @@ class DeviantartPopularExtractor(DeviantartExtractor):
}
def deviations(self):
+ if self.time_range == "now":
+ return self.api.browse_newest(self.search_term, self.offset)
return self.api.browse_popular(
self.search_term, self.time_range, self.offset)
@@ -1034,21 +1037,32 @@ class DeviantartOAuthAPI():
def browse_deviantsyouwatch(self, offset=0):
"""Yield deviations from users you watch"""
- endpoint = "browse/deviantsyouwatch"
+ endpoint = "/browse/deviantsyouwatch"
params = {"limit": "50", "offset": offset,
"mature_content": self.mature}
return self._pagination(endpoint, params, public=False)
def browse_posts_deviantsyouwatch(self, offset=0):
"""Yield posts from users you watch"""
- endpoint = "browse/posts/deviantsyouwatch"
+ endpoint = "/browse/posts/deviantsyouwatch"
params = {"limit": "50", "offset": offset,
"mature_content": self.mature}
return self._pagination(endpoint, params, public=False, unpack=True)
+ def browse_newest(self, query=None, offset=0):
+ """Browse newest deviations"""
+ endpoint = "/browse/newest"
+ params = {
+ "q" : query,
+ "limit" : 50 if self.metadata else 120,
+ "offset" : offset,
+ "mature_content": self.mature,
+ }
+ return self._pagination(endpoint, params)
+
def browse_popular(self, query=None, timerange=None, offset=0):
"""Yield popular deviations"""
- endpoint = "browse/popular"
+ endpoint = "/browse/popular"
params = {
"q" : query,
"limit" : 50 if self.metadata else 120,
@@ -1060,7 +1074,7 @@ class DeviantartOAuthAPI():
def browse_tags(self, tag, offset=0):
""" Browse a tag """
- endpoint = "browse/tags"
+ endpoint = "/browse/tags"
params = {
"tag" : tag,
"offset" : offset,
@@ -1071,14 +1085,14 @@ class DeviantartOAuthAPI():
def browse_user_journals(self, username, offset=0):
"""Yield all journal entries of a specific user"""
- endpoint = "browse/user/journals"
+ endpoint = "/browse/user/journals"
params = {"username": username, "offset": offset, "limit": 50,
"mature_content": self.mature, "featured": "false"}
return self._pagination(endpoint, params)
def collections(self, username, folder_id, offset=0):
"""Yield all Deviation-objects contained in a collection folder"""
- endpoint = "collections/" + folder_id
+ endpoint = "/collections/" + folder_id
params = {"username": username, "offset": offset, "limit": 24,
"mature_content": self.mature}
return self._pagination(endpoint, params)
@@ -1086,21 +1100,21 @@ class DeviantartOAuthAPI():
@memcache(keyarg=1)
def collections_folders(self, username, offset=0):
"""Yield all collection folders of a specific user"""
- endpoint = "collections/folders"
+ endpoint = "/collections/folders"
params = {"username": username, "offset": offset, "limit": 50,
"mature_content": self.mature}
return self._pagination_list(endpoint, params)
def comments_deviation(self, deviation_id, offset=0):
"""Fetch comments posted on a deviation"""
- endpoint = "comments/deviation/" + deviation_id
+ endpoint = "/comments/deviation/" + deviation_id
params = {"maxdepth": "5", "offset": offset, "limit": 50,
"mature_content": self.mature}
return self._pagination_list(endpoint, params=params, key="thread")
def deviation(self, deviation_id, public=True):
"""Query and return info about a single Deviation"""
- endpoint = "deviation/" + deviation_id
+ endpoint = "/deviation/" + deviation_id
deviation = self._call(endpoint, public=public)
if self.metadata:
self._metadata((deviation,))
@@ -1110,13 +1124,13 @@ class DeviantartOAuthAPI():
def deviation_content(self, deviation_id, public=False):
"""Get extended content of a single Deviation"""
- endpoint = "deviation/content"
+ endpoint = "/deviation/content"
params = {"deviationid": deviation_id}
return self._call(endpoint, params=params, public=public)
def deviation_download(self, deviation_id, public=True):
"""Get the original file download (if allowed)"""
- endpoint = "deviation/download/" + deviation_id
+ endpoint = "/deviation/download/" + deviation_id
params = {"mature_content": self.mature}
return self._call(endpoint, params=params, public=public)
@@ -1124,7 +1138,7 @@ class DeviantartOAuthAPI():
""" Fetch deviation metadata for a set of deviations"""
if not deviations:
return []
- endpoint = "deviation/metadata?" + "&".join(
+ endpoint = "/deviation/metadata?" + "&".join(
"deviationids[{}]={}".format(num, deviation["deviationid"])
for num, deviation in enumerate(deviations)
)
@@ -1133,14 +1147,14 @@ class DeviantartOAuthAPI():
def gallery(self, username, folder_id, offset=0, extend=True, public=True):
"""Yield all Deviation-objects contained in a gallery folder"""
- endpoint = "gallery/" + folder_id
+ endpoint = "/gallery/" + folder_id
params = {"username": username, "offset": offset, "limit": 24,
"mature_content": self.mature, "mode": "newest"}
return self._pagination(endpoint, params, extend, public)
def gallery_all(self, username, offset=0):
"""Yield all Deviation-objects of a specific user"""
- endpoint = "gallery/all"
+ endpoint = "/gallery/all"
params = {"username": username, "offset": offset, "limit": 24,
"mature_content": self.mature}
return self._pagination(endpoint, params)
@@ -1148,7 +1162,7 @@ class DeviantartOAuthAPI():
@memcache(keyarg=1)
def gallery_folders(self, username, offset=0):
"""Yield all gallery folders of a specific user"""
- endpoint = "gallery/folders"
+ endpoint = "/gallery/folders"
params = {"username": username, "offset": offset, "limit": 50,
"mature_content": self.mature}
return self._pagination_list(endpoint, params)
@@ -1156,12 +1170,12 @@ class DeviantartOAuthAPI():
@memcache(keyarg=1)
def user_profile(self, username):
"""Get user profile information"""
- endpoint = "user/profile/" + username
+ endpoint = "/user/profile/" + username
return self._call(endpoint, fatal=False)
def user_friends_watch(self, username):
"""Watch a user"""
- endpoint = "user/friends/watch/" + username
+ endpoint = "/user/friends/watch/" + username
data = {
"watch[friend]" : "0",
"watch[deviations]" : "0",
@@ -1179,7 +1193,7 @@ class DeviantartOAuthAPI():
def user_friends_unwatch(self, username):
"""Unwatch a user"""
- endpoint = "user/friends/unwatch/" + username
+ endpoint = "/user/friends/unwatch/" + username
return self._call(
endpoint, method="POST", public=False, fatal=False,
).get("success")
@@ -1217,7 +1231,7 @@ class DeviantartOAuthAPI():
def _call(self, endpoint, fatal=True, public=True, **kwargs):
"""Call an API endpoint"""
- url = "https://www.deviantart.com/api/v1/oauth2/" + endpoint
+ url = "https://www.deviantart.com/api/v1/oauth2" + endpoint
kwargs["fatal"] = None
while True:
@@ -1357,7 +1371,7 @@ class DeviantartEclipseAPI():
self.log = extractor.log
def deviation_extended_fetch(self, deviation_id, user=None, kind=None):
- endpoint = "da-browse/shared_api/deviation/extended_fetch"
+ endpoint = "/da-browse/shared_api/deviation/extended_fetch"
params = {
"deviationid" : deviation_id,
"username" : user,
@@ -1367,7 +1381,7 @@ class DeviantartEclipseAPI():
return self._call(endpoint, params)
def gallery_scraps(self, user, offset=None):
- endpoint = "da-user-profile/api/gallery/contents"
+ endpoint = "/da-user-profile/api/gallery/contents"
params = {
"username" : user,
"offset" : offset,
@@ -1377,7 +1391,7 @@ class DeviantartEclipseAPI():
return self._pagination(endpoint, params)
def user_watching(self, user, offset=None):
- endpoint = "da-user-profile/api/module/watching"
+ endpoint = "/da-user-profile/api/module/watching"
params = {
"username": user,
"moduleid": self._module_id_watching(user),
@@ -1387,7 +1401,7 @@ class DeviantartEclipseAPI():
return self._pagination(endpoint, params)
def _call(self, endpoint, params=None):
- url = "https://www.deviantart.com/_napi/" + endpoint
+ url = "https://www.deviantart.com/_napi" + endpoint
headers = {"Referer": "https://www.deviantart.com/"}
response = self.extractor._limited_request(
diff --git a/gallery_dl/extractor/exhentai.py b/gallery_dl/extractor/exhentai.py
index 7ffb214..cf9706b 100644
--- a/gallery_dl/extractor/exhentai.py
+++ b/gallery_dl/extractor/exhentai.py
@@ -176,6 +176,10 @@ class ExhentaiGalleryExtractor(ExhentaiExtractor):
self.image_token = match.group(4)
self.image_num = text.parse_int(match.group(6), 1)
+ source = self.config("source")
+ if source == "hitomi":
+ self.items = self._items_hitomi
+
def items(self):
self.login()
@@ -221,6 +225,18 @@ class ExhentaiGalleryExtractor(ExhentaiExtractor):
data["_http_validate"] = None
yield Message.Url, url, data
+ def _items_hitomi(self):
+ if self.config("metadata", False):
+ data = self.metadata_from_api()
+ data["date"] = text.parse_timestamp(data["posted"])
+ else:
+ data = {}
+
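+        # hand the gallery off to the hitomi extractor, optionally
+        # with metadata from the API attached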
+ from .hitomi import HitomiGalleryExtractor
+ url = "https://hitomi.la/galleries/{}.html".format(self.gallery_id)
+ data["_extractor"] = HitomiGalleryExtractor
+ yield Message.Queue, url, data
+
def get_metadata(self, page):
"""Extract gallery metadata"""
data = self.metadata_from_page(page)
diff --git a/gallery_dl/extractor/fanbox.py b/gallery_dl/extractor/fanbox.py
index cc6ee97..ef79808 100644
--- a/gallery_dl/extractor/fanbox.py
+++ b/gallery_dl/extractor/fanbox.py
@@ -33,7 +33,7 @@ class FanboxExtractor(Extractor):
def items(self):
if self._warning:
- if "FANBOXSESSID" not in self.session.cookies:
+ if not self._check_cookies(("FANBOXSESSID",)):
self.log.warning("no 'FANBOXSESSID' cookie set")
FanboxExtractor._warning = False
@@ -280,3 +280,24 @@ class FanboxPostExtractor(FanboxExtractor):
def posts(self):
return (self._get_post_data_from_id(self.post_id),)
+
+
+class FanboxRedirectExtractor(Extractor):
+ """Extractor for pixiv redirects to fanbox.cc"""
+ category = "fanbox"
+ subcategory = "redirect"
+ pattern = r"(?:https?://)?(?:www\.)?pixiv\.net/fanbox/creator/(\d+)"
+ test = ("https://www.pixiv.net/fanbox/creator/52336352", {
+ "pattern": FanboxCreatorExtractor.pattern,
+ })
+
+ def __init__(self, match):
+ Extractor.__init__(self, match)
+ self.user_id = match.group(1)
+
+ def items(self):
+ url = "https://www.pixiv.net/fanbox/creator/" + self.user_id
+ data = {"_extractor": FanboxCreatorExtractor}
+ response = self.request(
+ url, method="HEAD", allow_redirects=False, notfound="user")
+ yield Message.Queue, response.headers["Location"], data
diff --git a/gallery_dl/extractor/fantia.py b/gallery_dl/extractor/fantia.py
index 62f7429..89a965f 100644
--- a/gallery_dl/extractor/fantia.py
+++ b/gallery_dl/extractor/fantia.py
@@ -22,7 +22,7 @@ class FantiaExtractor(Extractor):
def items(self):
if self._warning:
- if "_session_id" not in self.session.cookies:
+ if not self._check_cookies(("_session_id",)):
self.log.warning("no '_session_id' cookie set")
FantiaExtractor._warning = False
diff --git a/gallery_dl/extractor/flickr.py b/gallery_dl/extractor/flickr.py
index 6c5c7df..2bd8c6b 100644
--- a/gallery_dl/extractor/flickr.py
+++ b/gallery_dl/extractor/flickr.py
@@ -56,7 +56,7 @@ class FlickrImageExtractor(FlickrExtractor):
subcategory = "image"
pattern = (r"(?:https?://)?(?:"
r"(?:(?:www\.|m\.)?flickr\.com/photos/[^/]+/"
- r"|[^.]+\.static\.?flickr\.com/(?:\d+/)+)(\d+)"
+ r"|[\w-]+\.static\.?flickr\.com/(?:\d+/)+)(\d+)"
r"|flic\.kr/p/([A-Za-z1-9]+))")
test = (
("https://www.flickr.com/photos/departingyyz/16089302239", {
diff --git a/gallery_dl/extractor/furaffinity.py b/gallery_dl/extractor/furaffinity.py
index b5ecbd6..891e0c1 100644
--- a/gallery_dl/extractor/furaffinity.py
+++ b/gallery_dl/extractor/furaffinity.py
@@ -22,6 +22,7 @@ class FuraffinityExtractor(Extractor):
archive_fmt = "{id}"
cookiedomain = ".furaffinity.net"
root = "https://www.furaffinity.net"
+ _warning = True
def __init__(self, match):
Extractor.__init__(self, match)
@@ -32,6 +33,12 @@ class FuraffinityExtractor(Extractor):
self._process_description = str.strip
def items(self):
+
+ if self._warning:
+ if not self._check_cookies(("a", "b")):
+ self.log.warning("no 'a' and 'b' session cookies set")
+ FuraffinityExtractor._warning = False
+
external = self.config("external", False)
metadata = self.metadata()
for post_id in util.advance(self.posts(), self.offset):
diff --git a/gallery_dl/extractor/generic.py b/gallery_dl/extractor/generic.py
new file mode 100644
index 0000000..bece905
--- /dev/null
+++ b/gallery_dl/extractor/generic.py
@@ -0,0 +1,208 @@
+# -*- coding: utf-8 -*-
+
+"""Extractor for images in a generic web page."""
+
+from .common import Extractor, Message
+from .. import config, text
+import re
+import os.path
+
+
+class GenericExtractor(Extractor):
+ """Extractor for images in a generic web page."""
+
+ category = "generic"
+ directory_fmt = ("{category}", "{pageurl}")
+ archive_fmt = "{imageurl}"
+
+ # By default, the generic extractor is disabled
+ # and the "g(eneric):" prefix in url is required.
+ # If the extractor is enabled, make the prefix optional
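+    # (e.g. "generic:https://example.org/page" or "g:example.org/page")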
+ pattern = r"(?ix)(?P<generic>g(?:eneric)?:)"
+ if config.get(("extractor", "generic"), "enabled"):
+ pattern += r"?"
+
+ # The generic extractor pattern should match (almost) any valid url
+ # Based on: https://tools.ietf.org/html/rfc3986#appendix-B
+ pattern += r"""
+ (?P<scheme>https?://)? # optional http(s) scheme
+ (?P<domain>[-\w\.]+) # required domain
+ (?P<path>/[^?&#]*)? # optional path
+ (?:\?(?P<query>[^/?#]*))? # optional query
+ (?:\#(?P<fragment>.*))?$ # optional fragment
+ """
+
+ def __init__(self, match):
+ """Init."""
+ Extractor.__init__(self, match)
+
+ # Strip the "g(eneric):" prefix
+ # and inform about "forced" or "fallback" mode
+ if match.group('generic'):
+ self.log.info("Forcing use of generic information extractor.")
+ self.url = match.group(0).partition(":")[2]
+ else:
+ self.log.info("Falling back on generic information extractor.")
+ self.url = match.group(0)
+
+ # Make sure we have a scheme, or use https
+ if match.group('scheme'):
+ self.scheme = match.group('scheme')
+ else:
+ self.scheme = 'https://'
+ self.url = self.scheme + self.url
+
+ # Used to resolve relative image urls
+ self.root = self.scheme + match.group('domain')
+
+ def items(self):
+ """Get page, extract metadata & images, yield them in suitable messages.
+
+ Adapted from common.GalleryExtractor.items()
+
+ """
+ page = self.request(self.url).text
+ data = self.metadata(page)
+ imgs = self.images(page)
+
+ try:
+ data["count"] = len(imgs)
+ except TypeError:
+ pass
+ images = enumerate(imgs, 1)
+
+ yield Message.Version, 1
+ yield Message.Directory, data
+
+ for data["num"], (url, imgdata) in images:
+ if imgdata:
+ data.update(imgdata)
+ if "extension" not in imgdata:
+ text.nameext_from_url(url, data)
+ else:
+ text.nameext_from_url(url, data)
+ yield Message.Url, url, data
+
+ def metadata(self, page):
+ """Extract generic webpage metadata, return them in a dict."""
+ data = {}
+ data['pageurl'] = self.url
+ data['title'] = text.extract(page, '<title>', "</title>")[0] or ""
+ data['description'] = text.extract(
+ page, '<meta name="description" content="', '"')[0] or ""
+ data['keywords'] = text.extract(
+ page, '<meta name="keywords" content="', '"')[0] or ""
+ data['language'] = text.extract(
+ page, '<meta name="language" content="', '"')[0] or ""
+ data['name'] = text.extract(
+ page, '<meta itemprop="name" content="', '"')[0] or ""
+ data['copyright'] = text.extract(
+ page, '<meta name="copyright" content="', '"')[0] or ""
+ data['og_site'] = text.extract(
+ page, '<meta property="og:site" content="', '"')[0] or ""
+ data['og_site_name'] = text.extract(
+ page, '<meta property="og:site_name" content="', '"')[0] or ""
+ data['og_title'] = text.extract(
+ page, '<meta property="og:title" content="', '"')[0] or ""
+ data['og_description'] = text.extract(
+ page, '<meta property="og:description" content="', '"')[0] or ""
+
+ data = {k: text.unescape(data[k]) for k in data if data[k] != ""}
+
+ return data
+
+ def images(self, page):
+ """Extract image urls, return a list of (image url, metadata) tuples.
+
+ The extractor aims at finding as many _likely_ image urls as possible,
+ using two strategies (see below); since these often overlap, any
+ duplicate urls will be removed at the end of the process.
+
+        Note: since we are using re.findall() (see below), it's essential
+        that the following patterns contain at most one capturing group, so
+        that re.findall() returns a list of urls (instead of a list of
+        tuples of matching groups). All other groups used in the pattern
+        should be non-capturing (?:...).
+
+ 1: Look in src/srcset attributes of img/video/source elements
+
+ See:
+ https://www.w3schools.com/tags/att_src.asp
+ https://www.w3schools.com/tags/att_source_srcset.asp
+
+ We allow both absolute and relative urls here.
+
+        Note that srcset attributes often contain multiple space-separated
+        image urls; this pattern matches only the first url; remaining urls
+        will be matched by the "imageurl_pattern_ext" pattern below.
+ """
+ imageurl_pattern_src = r"""(?ix)
+ <(?:img|video|source)\s.*? # <img>, <video> or <source>
+ src(?:set)?=["']? # src or srcset attributes
+ (?P<URL>[^"'\s>]+) # url
+ """
+
+ """
+ 2: Look anywhere for urls containing common image/video extensions
+
+ The list of allowed extensions is borrowed from the directlink.py
+        extractor; others could be added, see
+ https://en.wikipedia.org/wiki/List_of_file_formats
+
+ Compared to the "pattern" class variable, here we must exclude also
+ other special characters (space, ", ', >), since we are looking for
+ urls in html tags.
+ """
+
+ imageurl_pattern_ext = r"""(?ix)
+ (?:[^?&#"'>\s]+) # anything until dot+extension
+ \.(?:jpe?g|jpe|png|gif
+ |web[mp]|mp4|mkv|og[gmv]|opus) # dot + image/video extensions
+ (?:[^"'>\s]*)? # optional query and fragment
+ """
+
+ imageurls_src = re.findall(imageurl_pattern_src, page)
+ imageurls_ext = re.findall(imageurl_pattern_ext, page)
+ imageurls = imageurls_src + imageurls_ext
+
+ # Resolve relative urls
+ #
+        # Image urls caught so far may be relative, so we must resolve them
+ # by prepending a suitable base url.
+ #
+ # If the page contains a <base> element, use it as base url
+ basematch = re.search(
+ r"(?i)(?:<base\s.*?href=[\"']?)(?P<url>[^\"' >]+)", page)
+ if basematch:
+ self.baseurl = basematch.group('url').rstrip('/')
+ # Otherwise, extract the base url from self.url
+ else:
+ if self.url.endswith("/"):
+ self.baseurl = self.url.rstrip('/')
+ else:
+ self.baseurl = os.path.dirname(self.url)
+
+ # Build the list of absolute image urls
+ absimageurls = []
+ for u in imageurls:
+ # Absolute urls are taken as-is
+ if u.startswith('http'):
+ absimageurls.append(u)
+ # // relative urls are prefixed with current scheme
+ elif u.startswith('//'):
+ absimageurls.append(self.scheme + u.lstrip('/'))
+ # / relative urls are prefixed with current scheme+domain
+ elif u.startswith('/'):
+ absimageurls.append(self.root + u)
+ # other relative urls are prefixed with baseurl
+ else:
+ absimageurls.append(self.baseurl + '/' + u)
+
+ # Remove duplicates
+ absimageurls = set(absimageurls)
+
+ # Create the image metadata dict and add imageurl to it
+ # (image filename and extension are added by items())
+ images = [(u, {'imageurl': u}) for u in absimageurls]
+
+ return images
diff --git a/gallery_dl/extractor/hitomi.py b/gallery_dl/extractor/hitomi.py
index a4ce925..88cf98c 100644
--- a/gallery_dl/extractor/hitomi.py
+++ b/gallery_dl/extractor/hitomi.py
@@ -10,9 +10,11 @@
from .common import GalleryExtractor, Extractor, Message
from .nozomi import decode_nozomi
+from ..cache import memcache
from .. import text, util
import string
import json
+import re
class HitomiGalleryExtractor(GalleryExtractor):
@@ -24,8 +26,10 @@ class HitomiGalleryExtractor(GalleryExtractor):
r"/(?:[^/?#]+-)?(\d+)")
test = (
("https://hitomi.la/galleries/867789.html", {
- "pattern": r"https://[a-c]b.hitomi.la/images/./../[0-9a-f]+.jpg",
+ "pattern": r"https://[a-c]b.hitomi.la/images/1639745412/\d+"
+ r"/[0-9a-f]{64}\.jpg",
"keyword": "4873ef9a523621fc857b114e0b2820ba4066e9ae",
+ "options": (("metadata", True),),
"count": 16,
}),
# download test
@@ -35,12 +39,12 @@ class HitomiGalleryExtractor(GalleryExtractor):
}),
# Game CG with scenes (#321)
("https://hitomi.la/galleries/733697.html", {
- "url": "0cb629ab2bfe93d994a7972f68ad2a5a64ecc161",
+ "url": "479d16fe92117a6a2ce81b4e702e6347922c81e3",
"count": 210,
}),
# fallback for galleries only available through /reader/ URLs
("https://hitomi.la/galleries/1045954.html", {
- "url": "b420755d56a1135104ca8ca0765f44e290db70c3",
+ "url": "ebc1415c5d7f634166ef7e2635b77735de1ea7a2",
"count": 1413,
}),
# gallery with "broken" redirect
@@ -71,7 +75,7 @@ class HitomiGalleryExtractor(GalleryExtractor):
self.info = info = json.loads(page.partition("=")[2])
data = self._data_from_gallery_info(info)
- if self.config("metadata", True):
+ if self.config("metadata", False):
data.update(self._data_from_gallery_page(info))
return data
@@ -133,19 +137,19 @@ class HitomiGalleryExtractor(GalleryExtractor):
}
def images(self, _):
+ # see https://ltn.hitomi.la/gg.js
+ gg_m, gg_b = _parse_gg(self)
+
result = []
for image in self.info["files"]:
ihash = image["hash"]
idata = text.nameext_from_url(image["name"])
# see https://ltn.hitomi.la/common.js
- inum = int(ihash[-3:-1], 16)
- offset = 1 if inum < 0x7c else 0
-
+ inum = int(ihash[-1] + ihash[-3:-1], 16)
url = "https://{}b.hitomi.la/images/{}/{}/{}.{}".format(
- chr(97 + offset),
- ihash[-1], ihash[-3:-1], ihash,
- idata["extension"],
+ chr(97 + gg_m.get(inum, 0)),
+ gg_b, inum, ihash, idata["extension"],
)
result.append((url, idata))
return result
@@ -185,3 +189,16 @@ class HitomiTagExtractor(Extractor):
for gallery_id in decode_nozomi(self.request(url).content):
url = "https://hitomi.la/galleries/{}.html".format(gallery_id)
yield Message.Queue, url, data
+
+
+@memcache()
+def _parse_gg(extr):
+ page = extr.request("https://ltn.hitomi.la/gg.js").text
+
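+    # gg.js consists of a JavaScript switch statement with entries of the
+    # form "case <num>: o = <offset>; break;" and a path component
+    # "b: '<value>'"; collect the case/offset pairs and the path prefix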
+ m = {
+ int(match.group(1)): int(match.group(2))
+ for match in re.finditer(r"case (\d+): o = (\d+); break;", page)
+ }
+ b = re.search(r"b:\s*[\"'](.+)[\"']", page)
+
+ return m, b.group(1).strip("/")
diff --git a/gallery_dl/extractor/imgbb.py b/gallery_dl/extractor/imgbb.py
index 1e875f0..f32093a 100644
--- a/gallery_dl/extractor/imgbb.py
+++ b/gallery_dl/extractor/imgbb.py
@@ -169,7 +169,7 @@ class ImgbbAlbumExtractor(ImgbbExtractor):
class ImgbbUserExtractor(ImgbbExtractor):
"""Extractor for user profiles in imgbb.com"""
subcategory = "user"
- pattern = r"(?:https?://)?([^.]+)\.imgbb\.com/?(?:\?([^#]+))?$"
+ pattern = r"(?:https?://)?([\w-]+)\.imgbb\.com/?(?:\?([^#]+))?$"
test = ("https://folkie.imgbb.com", {
"range": "1-80",
"pattern": r"https?://i\.ibb\.co/\w+/[^/?#]+",
diff --git a/gallery_dl/extractor/inkbunny.py b/gallery_dl/extractor/inkbunny.py
index 3d09d79..8ee8ca9 100644
--- a/gallery_dl/extractor/inkbunny.py
+++ b/gallery_dl/extractor/inkbunny.py
@@ -205,6 +205,28 @@ class InkbunnyFavoriteExtractor(InkbunnyExtractor):
return self.api.search(params)
+class InkbunnySearchExtractor(InkbunnyExtractor):
+ """Extractor for inkbunny search results"""
+ subcategory = "search"
+ pattern = (BASE_PATTERN +
+ r"/submissionsviewall\.php\?([^#]+&mode=search&[^#]+)")
+ test = (("https://inkbunny.net/submissionsviewall.php?rid=ffffffffff"
+ "&mode=search&page=1&orderby=create_datetime&text=cute"
+ "&stringtype=and&keywords=yes&title=yes&description=no&artist="
+ "&favsby=&type=&days=&keyword_id=&user_id=&random=&md5="), {
+ "range": "1-10",
+ "count": 10,
+ })
+
+ def __init__(self, match):
+ InkbunnyExtractor.__init__(self, match)
+ self.params = text.parse_query(match.group(1))
+ self.params.pop("rid", None)
+
+ def posts(self):
+ return self.api.search(self.params)
+
+
class InkbunnyFollowingExtractor(InkbunnyExtractor):
"""Extractor for inkbunny user watches"""
subcategory = "following"
diff --git a/gallery_dl/extractor/instagram.py b/gallery_dl/extractor/instagram.py
index a1dd465..781bf01 100644
--- a/gallery_dl/extractor/instagram.py
+++ b/gallery_dl/extractor/instagram.py
@@ -174,10 +174,16 @@ class InstagramExtractor(Extractor):
if post.get("is_video") and "video_url" not in post:
url = "{}/tv/{}/".format(self.root, post["shortcode"])
post = self._extract_post_page(url)
+ if "items" in post:
+ return self._parse_post_api({"media": post["items"][0]})
+ post = post["graphql"]["shortcode_media"]
elif typename == "GraphSidecar" and \
"edge_sidecar_to_children" not in post:
url = "{}/p/{}/".format(self.root, post["shortcode"])
post = self._extract_post_page(url)
+ if "items" in post:
+ return self._parse_post_api({"media": post["items"][0]})
+ post = post["graphql"]["shortcode_media"]
owner = post["owner"]
data = {
@@ -347,7 +353,7 @@ class InstagramExtractor(Extractor):
data = self._extract_shared_data(url)["entry_data"]
if "HttpErrorPage" in data:
raise exception.NotFoundError("post")
- return data["PostPage"][0]["graphql"]["shortcode_media"]
+ return data["PostPage"][0]
def _get_edge_data(self, user, key):
cursor = self.config("cursor")
@@ -564,7 +570,7 @@ class InstagramPostExtractor(InstagramExtractor):
"""Extractor for an Instagram post"""
subcategory = "post"
pattern = (r"(?:https?://)?(?:www\.)?instagram\.com"
- r"/(?:p|tv|reel)/([^/?#]+)")
+ r"/(?:[^/?#]+/)?(?:p|tv|reel)/([^/?#]+)")
test = (
# GraphImage
("https://www.instagram.com/p/BqvsDleB3lV/", {
@@ -663,6 +669,9 @@ class InstagramPostExtractor(InstagramExtractor):
}
}),
+ # URL with username (#2085)
+ ("https://www.instagram.com/dm/p/CW042g7B9CY/"),
+
("https://www.instagram.com/reel/CDg_6Y1pxWu/"),
)
@@ -686,14 +695,15 @@ class InstagramStoriesExtractor(InstagramExtractor):
"""Extractor for Instagram stories"""
subcategory = "stories"
pattern = (r"(?:https?://)?(?:www\.)?instagram\.com"
- r"/stories/(?:highlights/(\d+)|([^/?#]+))")
+ r"/stories/(?:highlights/(\d+)|([^/?#]+)(?:/(\d+))?)")
test = (
("https://www.instagram.com/stories/instagram/"),
("https://www.instagram.com/stories/highlights/18042509488170095/"),
+ ("https://instagram.com/stories/geekmig/2724343156064789461"),
)
def __init__(self, match):
- self.highlight_id, self.user = match.groups()
+ self.highlight_id, self.user, self.media_id = match.groups()
if self.highlight_id:
self.subcategory = InstagramHighlightsExtractor.subcategory
InstagramExtractor.__init__(self, match)
@@ -712,7 +722,18 @@ class InstagramStoriesExtractor(InstagramExtractor):
endpoint = "/v1/feed/reels_media/"
params = {"reel_ids": reel_id}
- return self._request_api(endpoint, params=params)["reels"].values()
+ reels = self._request_api(endpoint, params=params)["reels"]
+
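+        # if a specific story media ID was part of the URL, narrow the
+        # reel down to that single item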
+ if self.media_id:
+ reel = reels[reel_id]
+ for item in reel["items"]:
+ if item["pk"] == self.media_id:
+ reel["items"] = (item,)
+ break
+ else:
+ raise exception.NotFoundError("story")
+
+ return reels.values()
class InstagramHighlightsExtractor(InstagramExtractor):
diff --git a/gallery_dl/extractor/keenspot.py b/gallery_dl/extractor/keenspot.py
index 4012760..50ce0d3 100644
--- a/gallery_dl/extractor/keenspot.py
+++ b/gallery_dl/extractor/keenspot.py
@@ -19,7 +19,7 @@ class KeenspotComicExtractor(Extractor):
directory_fmt = ("{category}", "{comic}")
filename_fmt = "{filename}.{extension}"
archive_fmt = "{comic}_{filename}"
- pattern = r"(?:https?://)?(?!www\.|forums\.)([^.]+)\.keenspot\.com(/.+)?"
+ pattern = r"(?:https?://)?(?!www\.|forums\.)([\w-]+)\.keenspot\.com(/.+)?"
test = (
("http://marksmen.keenspot.com/", { # link
"range": "1-3",
diff --git a/gallery_dl/extractor/kemonoparty.py b/gallery_dl/extractor/kemonoparty.py
index 6483278..f1d7bcf 100644
--- a/gallery_dl/extractor/kemonoparty.py
+++ b/gallery_dl/extractor/kemonoparty.py
@@ -14,7 +14,7 @@ from ..cache import cache
import itertools
import re
-BASE_PATTERN = r"(?:https?://)?(?:www\.)?kemono\.party"
+BASE_PATTERN = r"(?:https?://)?(?:www\.)?(kemono|coomer)\.party"
USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)/user/([^/?#]+)"
@@ -27,17 +27,30 @@ class KemonopartyExtractor(Extractor):
archive_fmt = "{service}_{user}_{id}_{num}"
cookiedomain = ".kemono.party"
+ def __init__(self, match):
+ if match.group(1) == "coomer":
+ self.category = "coomerparty"
+ self.root = "https://coomer.party"
+ self.cookiedomain = ".coomer.party"
+ Extractor.__init__(self, match)
+
def items(self):
self._prepare_ddosguard_cookies()
self._find_inline = re.compile(
- r'src="(?:https?://kemono\.party)?(/inline/[^"]+'
+ r'src="(?:https?://(?:kemono|coomer)\.party)?(/inline/[^"]+'
r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall
find_hash = re.compile("/[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f]{64})").match
generators = self._build_file_generators(self.config("files"))
comments = self.config("comments")
username = dms = None
+        # prevent coomer.party files from being sent with gzip compression
+ if "coomer" in self.root:
+ headers = {"Accept-Encoding": "identity"}
+ else:
+ headers = None
+
if self.config("metadata"):
username = text.unescape(text.extract(
self.request(self.user_url).text,
@@ -83,10 +96,11 @@ class KemonopartyExtractor(Extractor):
post["type"] = file["type"]
post["num"] += 1
+ post["_http_headers"] = headers
if url[0] == "/":
url = self.root + "/data" + url
- elif url.startswith("https://kemono.party"):
+ elif url.startswith(self.root):
url = self.root + "/data" + url[20:]
text.nameext_from_url(file["name"], post)
@@ -129,7 +143,7 @@ class KemonopartyExtractor(Extractor):
def _build_file_generators(self, filetypes):
if filetypes is None:
- return (self._file, self._attachments, self._inline)
+ return (self._attachments, self._file, self._inline)
genmap = {
"file" : self._file,
"attachments": self._attachments,
@@ -191,8 +205,9 @@ class KemonopartyUserExtractor(KemonopartyExtractor):
)
def __init__(self, match):
+ _, service, user_id, offset = match.groups()
+ self.subcategory = service
KemonopartyExtractor.__init__(self, match)
- service, user_id, offset = match.groups()
self.api_url = "{}/api/{}/user/{}".format(self.root, service, user_id)
self.user_url = "{}/{}/user/{}".format(self.root, service, user_id)
self.offset = text.parse_int(offset)
@@ -233,7 +248,7 @@ class KemonopartyPostExtractor(KemonopartyExtractor):
"published": "Sun, 11 Aug 2019 02:09:04 GMT",
"service": "fanbox",
"shared_file": False,
- "subcategory": "post",
+ "subcategory": "fanbox",
"title": "c96取り置き",
"type": "file",
"user": "6993449",
@@ -249,7 +264,7 @@ class KemonopartyPostExtractor(KemonopartyExtractor):
# kemono.party -> data.kemono.party
("https://kemono.party/gumroad/user/trylsc/post/IURjT", {
"pattern": r"https://kemono\.party/data/("
- r"files/gumroad/trylsc/IURjT/reward8\.jpg|"
+ r"a4/7b/a47bfe938d8c1682eef06e885927484cd8df1b.+\.jpg|"
r"c6/04/c6048f5067fd9dbfa7a8be565ac194efdfb6e4.+\.zip)",
}),
# username (#1548, #1652)
@@ -272,13 +287,19 @@ class KemonopartyPostExtractor(KemonopartyExtractor):
"date": "2021-07-31 02:47:51.327865",
}]},
}),
+ # coomer.party (#2100)
+ ("https://coomer.party/onlyfans/user/alinity/post/125962203", {
+ "pattern": r"https://coomer\.party/data/7d/3f/7d3fd9804583dc224968"
+ r"c0591163ec91794552b04f00a6c2f42a15b68231d5a8\.jpg",
+ }),
("https://kemono.party/subscribestar/user/alcorart/post/184330"),
("https://www.kemono.party/subscribestar/user/alcorart/post/184330"),
)
def __init__(self, match):
+ _, service, user_id, post_id = match.groups()
+ self.subcategory = service
KemonopartyExtractor.__init__(self, match)
- service, user_id, post_id = match.groups()
self.api_url = "{}/api/{}/user/{}/post/{}".format(
self.root, service, user_id, post_id)
self.user_url = "{}/{}/user/{}".format(self.root, service, user_id)
@@ -319,7 +340,7 @@ class KemonopartyDiscordExtractor(KemonopartyExtractor):
def __init__(self, match):
KemonopartyExtractor.__init__(self, match)
- self.server, self.channel, self.channel_name = match.groups()
+ _, self.server, self.channel, self.channel_name = match.groups()
def items(self):
self._prepare_ddosguard_cookies()
@@ -353,7 +374,7 @@ class KemonopartyDiscordExtractor(KemonopartyExtractor):
url = file["path"]
if url[0] == "/":
url = self.root + "/data" + url
- elif url.startswith("https://kemono.party"):
+ elif url.startswith(self.root):
url = self.root + "/data" + url[20:]
text.nameext_from_url(file["name"], post)
@@ -392,7 +413,7 @@ class KemonopartyDiscordServerExtractor(KemonopartyExtractor):
def __init__(self, match):
KemonopartyExtractor.__init__(self, match)
- self.server = match.group(1)
+ self.server = match.group(2)
def items(self):
url = "{}/api/discord/channels/lookup?q={}".format(
diff --git a/gallery_dl/extractor/lolisafe.py b/gallery_dl/extractor/lolisafe.py
new file mode 100644
index 0000000..cdaf22b
--- /dev/null
+++ b/gallery_dl/extractor/lolisafe.py
@@ -0,0 +1,79 @@
+# -*- coding: utf-8 -*-
+
+# Copyright 2021 Mike Fährmann
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+
+"""Extractors for lolisafe/chibisafe instances"""
+
+from .common import BaseExtractor, Message
+from .. import text
+
+
+class LolisafeExtractor(BaseExtractor):
+ """Base class for lolisafe extractors"""
+ basecategory = "lolisafe"
+ directory_fmt = ("{category}", "{album_name} ({album_id})")
+ archive_fmt = "{album_id}_{id}"
+
+
+BASE_PATTERN = LolisafeExtractor.update({
+ "bunkr": {"root": "https://bunkr.is", "pattern": r"bunkr\.(?:is|to)"},
+ "zzzz" : {"root": "https://zz.ht" , "pattern": r"zz\.(?:ht|fo)"},
+})
+
+
+class LolisafeAlbumExtractor(LolisafeExtractor):
+ subcategory = "album"
+ pattern = BASE_PATTERN + "/a/([^/?#]+)"
+ test = (
+ ("https://bunkr.is/a/Lktg9Keq", {
+ "pattern": r"https://cdn\.bunkr\.is/test-テスト-\"&>-QjgneIQv\.png",
+ "content": "0c8768055e4e20e7c7259608b67799171b691140",
+ "keyword": {
+ "album_id": "Lktg9Keq",
+ "album_name": 'test テスト "&>',
+ "count": 1,
+ "filename": 'test-テスト-"&>-QjgneIQv',
+ "id": "QjgneIQv",
+ "name": 'test-テスト-"&>',
+ "num": int,
+ },
+ }),
+ ("https://bunkr.to/a/Lktg9Keq"),
+ ("https://zz.ht/a/lop7W6EZ", {
+ "pattern": r"https://z\.zz\.fo/(4anuY|ih560)\.png",
+ "count": 2,
+ "keyword": {
+ "album_id": "lop7W6EZ",
+ "album_name": "ferris",
+ },
+ }),
+ ("https://zz.fo/a/lop7W6EZ"),
+ )
+
+ def __init__(self, match):
+ LolisafeExtractor.__init__(self, match)
+ self.album_id = match.group(match.lastindex)
+
+ def items(self):
+ files, data = self.fetch_album(self.album_id)
+
+ yield Message.Directory, data
+ for data["num"], file in enumerate(files, 1):
+ url = file["file"]
+ text.nameext_from_url(url, data)
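+            # lolisafe filenames end in "-<id>"; split that off into
+            # separate 'name' and 'id' fields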
+ data["name"], sep, data["id"] = data["filename"].rpartition("-")
+ yield Message.Url, url, data
+
+ def fetch_album(self, album_id):
+ url = "{}/api/album/get/{}".format(self.root, album_id)
+ data = self.request(url).json()
+
+ return data["files"], {
+ "album_id" : self.album_id,
+ "album_name": text.unescape(data["title"]),
+ "count" : data["count"],
+ }
diff --git a/gallery_dl/extractor/myportfolio.py b/gallery_dl/extractor/myportfolio.py
index 5c202f3..f06ab70 100644
--- a/gallery_dl/extractor/myportfolio.py
+++ b/gallery_dl/extractor/myportfolio.py
@@ -20,8 +20,8 @@ class MyportfolioGalleryExtractor(Extractor):
filename_fmt = "{num:>02}.{extension}"
archive_fmt = "{user}_{filename}"
pattern = (r"(?:myportfolio:(?:https?://)?([^/]+)|"
- r"(?:https?://)?([^.]+\.myportfolio\.com))"
- r"(/[^/?#]+)?")
+ r"(?:https?://)?([\w-]+\.myportfolio\.com))"
+ r"(/[^/?&#]+)?")
test = (
("https://andrewling.myportfolio.com/volvo-xc-90-hybrid", {
"url": "acea0690c76db0e5cf267648cefd86e921bc3499",
diff --git a/gallery_dl/extractor/newgrounds.py b/gallery_dl/extractor/newgrounds.py
index a699401..4351b3e 100644
--- a/gallery_dl/extractor/newgrounds.py
+++ b/gallery_dl/extractor/newgrounds.py
@@ -420,7 +420,7 @@ class NewgroundsFavoriteExtractor(NewgroundsExtractor):
"""Extractor for posts favorited by a newgrounds user"""
subcategory = "favorite"
directory_fmt = ("{category}", "{user}", "Favorites")
- pattern = (r"(?:https?://)?([^.]+)\.newgrounds\.com"
+ pattern = (r"(?:https?://)?([\w-]+)\.newgrounds\.com"
r"/favorites(?!/following)(?:/(art|audio|movies))?/?")
test = (
("https://tomfulp.newgrounds.com/favorites/art", {
@@ -475,7 +475,7 @@ class NewgroundsFavoriteExtractor(NewgroundsExtractor):
class NewgroundsFollowingExtractor(NewgroundsFavoriteExtractor):
"""Extractor for a newgrounds user's favorited users"""
subcategory = "following"
- pattern = r"(?:https?://)?([^.]+)\.newgrounds\.com/favorites/(following)"
+ pattern = r"(?:https?://)?([\w-]+)\.newgrounds\.com/favorites/(following)"
test = ("https://tomfulp.newgrounds.com/favorites/following", {
"pattern": NewgroundsUserExtractor.pattern,
"range": "76-125",
diff --git a/gallery_dl/extractor/patreon.py b/gallery_dl/extractor/patreon.py
index 62e4f58..f8c80ef 100644
--- a/gallery_dl/extractor/patreon.py
+++ b/gallery_dl/extractor/patreon.py
@@ -29,7 +29,7 @@ class PatreonExtractor(Extractor):
def items(self):
if self._warning:
- if "session_id" not in self.session.cookies:
+ if not self._check_cookies(("session_id",)):
self.log.warning("no 'session_id' cookie set")
PatreonExtractor._warning = False
generators = self._build_file_generators(self.config("files"))
diff --git a/gallery_dl/extractor/philomena.py b/gallery_dl/extractor/philomena.py
index 51a0d38..6377fb0 100644
--- a/gallery_dl/extractor/philomena.py
+++ b/gallery_dl/extractor/philomena.py
@@ -46,7 +46,7 @@ class PhilomenaExtractor(BooruExtractor):
try:
params["filter_id"] = INSTANCES[self.category]["filter_id"]
except (KeyError, TypeError):
- pass
+ params["filter_id"] = "2"
while True:
data = self.request(url, params=params).json()
diff --git a/gallery_dl/extractor/photobucket.py b/gallery_dl/extractor/photobucket.py
index bea0276..1993ab6 100644
--- a/gallery_dl/extractor/photobucket.py
+++ b/gallery_dl/extractor/photobucket.py
@@ -21,8 +21,8 @@ class PhotobucketAlbumExtractor(Extractor):
directory_fmt = ("{category}", "{username}", "{location}")
filename_fmt = "{offset:>03}{pictureId:?_//}_{titleOrFilename}.{extension}"
archive_fmt = "{id}"
- pattern = (r"(?:https?://)?((?:[^.]+\.)?photobucket\.com)"
- r"/user/[^/?#]+/library(?:/[^?#]*)?")
+ pattern = (r"(?:https?://)?((?:[\w-]+\.)?photobucket\.com)"
+ r"/user/[^/?&#]+/library(?:/[^?&#]*)?")
test = (
("https://s369.photobucket.com/user/CrpyLrkr/library", {
"pattern": r"https?://[oi]+\d+.photobucket.com/albums/oo139/",
@@ -109,9 +109,9 @@ class PhotobucketImageExtractor(Extractor):
directory_fmt = ("{category}", "{username}")
filename_fmt = "{pictureId:?/_/}{titleOrFilename}.{extension}"
archive_fmt = "{username}_{id}"
- pattern = (r"(?:https?://)?(?:[^.]+\.)?photobucket\.com"
- r"(?:/gallery/user/([^/?#]+)/media/([^/?#]+)"
- r"|/user/([^/?#]+)/media/[^?#]+\.html)")
+ pattern = (r"(?:https?://)?(?:[\w-]+\.)?photobucket\.com"
+ r"(?:/gallery/user/([^/?&#]+)/media/([^/?&#]+)"
+ r"|/user/([^/?&#]+)/media/[^?&#]+\.html)")
test = (
(("https://s271.photobucket.com/user/lakerfanryan"
"/media/Untitled-3-1.jpg.html"), {
diff --git a/gallery_dl/extractor/pixiv.py b/gallery_dl/extractor/pixiv.py
index 8e47e2e..8943747 100644
--- a/gallery_dl/extractor/pixiv.py
+++ b/gallery_dl/extractor/pixiv.py
@@ -456,7 +456,9 @@ class PixivSearchExtractor(PixivExtractor):
self.sort = self.target = None
def works(self):
- return self.api.search_illust(self.word, self.sort, self.target)
+ return self.api.search_illust(
+ self.word, self.sort, self.target,
+ date_start=self.date_start, date_end=self.date_end)
def metadata(self):
query = text.parse_query(self.query)
@@ -489,10 +491,15 @@ class PixivSearchExtractor(PixivExtractor):
target = "s_tag"
self.target = target_map[target]
+ self.date_start = query.get("scd")
+ self.date_end = query.get("ecd")
+
return {"search": {
"word": self.word,
"sort": self.sort,
"target": self.target,
+ "date_start": self.date_start,
+ "date_end": self.date_end,
}}
@@ -710,9 +717,11 @@ class PixivAppAPI():
params = {"illust_id": illust_id}
return self._pagination("v2/illust/related", params)
- def search_illust(self, word, sort=None, target=None, duration=None):
+ def search_illust(self, word, sort=None, target=None, duration=None,
+ date_start=None, date_end=None):
params = {"word": word, "search_target": target,
- "sort": sort, "duration": duration}
+ "sort": sort, "duration": duration,
+ "start_date": date_start, "end_date": date_end}
return self._pagination("v1/search/illust", params)
def user_bookmarks_illust(self, user_id, tag=None, restrict="public"):
diff --git a/gallery_dl/extractor/pixnet.py b/gallery_dl/extractor/pixnet.py
index 98928d6..a52071e 100644
--- a/gallery_dl/extractor/pixnet.py
+++ b/gallery_dl/extractor/pixnet.py
@@ -12,7 +12,7 @@ from .common import Extractor, Message
from .. import text, exception
-BASE_PATTERN = r"(?:https?://)?(?!www\.)([^.]+)\.pixnet.net"
+BASE_PATTERN = r"(?:https?://)?(?!www\.)([\w-]+)\.pixnet\.net"
class PixnetExtractor(Extractor):
diff --git a/gallery_dl/extractor/pornhub.py b/gallery_dl/extractor/pornhub.py
index f976e82..f8497c0 100644
--- a/gallery_dl/extractor/pornhub.py
+++ b/gallery_dl/extractor/pornhub.py
@@ -12,7 +12,7 @@ from .common import Extractor, Message
from .. import text, exception
-BASE_PATTERN = r"(?:https?://)?(?:[^.]+\.)?pornhub\.com"
+BASE_PATTERN = r"(?:https?://)?(?:[\w-]+\.)?pornhub\.com"
class PornhubExtractor(Extractor):
diff --git a/gallery_dl/extractor/rule34us.py b/gallery_dl/extractor/rule34us.py
new file mode 100644
index 0000000..00b6972
--- /dev/null
+++ b/gallery_dl/extractor/rule34us.py
@@ -0,0 +1,130 @@
+# -*- coding: utf-8 -*-
+
+# Copyright 2021 Mike Fährmann
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+
+"""Extractors for https://rule34.us/"""
+
+from .booru import BooruExtractor
+from .. import text
+import re
+import collections
+
+
+class Rule34usExtractor(BooruExtractor):
+ category = "rule34us"
+ root = "https://rule34.us"
+ per_page = 42
+
+ def __init__(self, match):
+ BooruExtractor.__init__(self, match)
+ self._find_tags = re.compile(
+ r'<li class="([^-"]+)-tag"[^>]*><a href="[^;"]+;q=([^"]+)').findall
+
+ def _parse_post(self, post_id):
+ url = "{}/index.php?r=posts/view&id={}".format(self.root, post_id)
+ page = self.request(url).text
+ extr = text.extract_from(page)
+
+ post = {
+ "id" : post_id,
+ "tags" : text.unescape(extr(
+ 'name="keywords" content="', '"').rstrip(", ")),
+ "uploader": text.extract(extr('Added by: ', '</li>'), ">", "<")[0],
+ "score" : text.extract(extr('Score: ', '> - <'), ">", "<")[0],
+ "width" : extr('Size: ', 'w'),
+ "height" : extr(' x ', 'h'),
+ "file_url": extr(' src="', '"'),
+ }
+ post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0]
+
+ tags = collections.defaultdict(list)
+ for tag_type, tag_name in self._find_tags(page):
+ tags[tag_type].append(text.unquote(tag_name))
+ for key, value in tags.items():
+ post["tags_" + key] = " ".join(value)
+
+ return post
+
+
+class Rule34usTagExtractor(Rule34usExtractor):
+ subcategory = "tag"
+ directory_fmt = ("{category}", "{search_tags}")
+ archive_fmt = "t_{search_tags}_{id}"
+ pattern = r"(?:https?://)?rule34\.us/index\.php\?r=posts/index&q=([^&#]+)"
+ test = ("https://rule34.us/index.php?r=posts/index&q=[terios]_elysion", {
+ "pattern": r"https://img\d*\.rule34\.us"
+ r"/images/../../[0-9a-f]{32}\.\w+",
+ "count": 10,
+ })
+
+ def __init__(self, match):
+ Rule34usExtractor.__init__(self, match)
+ self.tags = text.unquote(match.group(1).replace("+", " "))
+
+ def metadata(self):
+ return {"search_tags": self.tags}
+
+ def posts(self):
+ url = self.root + "/index.php"
+ params = {
+ "r" : "posts/index",
+ "q" : self.tags,
+ "page": self.page_start,
+ }
+
+ while True:
+ page = self.request(url, params=params).text
+
+ cnt = 0
+ for post_id in text.extract_iter(page, '><a id="', '"'):
+ yield self._parse_post(post_id)
+ cnt += 1
+
+ if cnt < self.per_page:
+ return
+
+ if "page" in params:
+ del params["page"]
+ params["q"] = self.tags + " id:<" + post_id
+
+
+class Rule34usPostExtractor(Rule34usExtractor):
+ subcategory = "post"
+ archive_fmt = "{id}"
+ pattern = r"(?:https?://)?rule34\.us/index\.php\?r=posts/view&id=(\d+)"
+ test = (
+ ("https://rule34.us/index.php?r=posts/view&id=3709005", {
+ "pattern": r"https://img\d*\.rule34\.us/images/14/7b"
+ r"/147bee6fc2e13f73f5f9bac9d4930b13\.png",
+ "content": "d714342ea84050f82dda5f0c194d677337abafc5",
+ }),
+ ("https://rule34.us/index.php?r=posts/view&id=4576310", {
+ "pattern": r"https://video\.rule34\.us/images/a2/94"
+ r"/a294ff8e1f8e0efa041e5dc9d1480011\.mp4",
+ "keyword": {
+ "extension": "mp4",
+ "file_url": str,
+ "filename": "a294ff8e1f8e0efa041e5dc9d1480011",
+ "height": "3982",
+ "id": "4576310",
+ "md5": "a294ff8e1f8e0efa041e5dc9d1480011",
+ "score": r"re:\d+",
+ "tags": "tagme, video",
+ "tags_general": "video",
+ "tags_metadata": "tagme",
+ "uploader": "Anonymous",
+ "width": "3184",
+ },
+ }),
+ )
+
+ def __init__(self, match):
+ Rule34usExtractor.__init__(self, match)
+ self.post_id = match.group(1)
+
+ def posts(self):
+ return (self._parse_post(self.post_id),)
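A note on the tag extractor's pagination above: only the first request uses a numeric `page` parameter. Once a page yields fewer than 42 posts the extractor stops; otherwise it drops `page` and appends an `id:<LAST_ID` term to the tag query, so each following request returns posts older than the last one seen. A toy illustration of how the query evolves (tag and ID values are made up):

    params = {"r": "posts/index", "q": "example_tag", "page": 0}
    # ...first page returned a full 42 posts, the last one with ID 4576310...
    del params["page"]
    params["q"] = "example_tag" + " id:<" + "4576310"
    # next request: /index.php?r=posts/index&q=example_tag id:<4576310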
diff --git a/gallery_dl/extractor/sexcom.py b/gallery_dl/extractor/sexcom.py
index ccedff3..199b1ba 100644
--- a/gallery_dl/extractor/sexcom.py
+++ b/gallery_dl/extractor/sexcom.py
@@ -78,9 +78,14 @@ class SexcomExtractor(Extractor):
path += "/hd"
data["url"] = self.root + path
else:
+ iframe = extr('<iframe', '>')
+ src = (text.extract(iframe, ' src="', '"')[0] or
+ text.extract(iframe, " src='", "'")[0])
+ if not src:
+ self.log.warning("Unable to fetch media from %s", url)
+ return None
data["extension"] = None
- data["url"] = "ytdl:" + text.extract(
- extr('<iframe', '>'), ' src="', '"')[0]
+ data["url"] = "ytdl:" + src
else:
data["url"] = text.unescape(extr(' src="', '"').partition("?")[0])
text.nameext_from_url(data["url"], data)
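The fallback added above covers embeds whose `src` attribute uses single quotes instead of double quotes. A minimal sketch of the same two-step extraction with gallery-dl's `text.extract()` (the iframe markup here is invented):

    from gallery_dl import text

    iframe = "<iframe src='https://example.org/embed/123' width='640'"
    src = (text.extract(iframe, ' src="', '"')[0] or
           text.extract(iframe, " src='", "'")[0])
    # src -> "https://example.org/embed/123"; None if neither form matches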
diff --git a/gallery_dl/extractor/slickpic.py b/gallery_dl/extractor/slickpic.py
index b5fbdc2..7b5982a 100644
--- a/gallery_dl/extractor/slickpic.py
+++ b/gallery_dl/extractor/slickpic.py
@@ -13,7 +13,7 @@ from .. import text
import time
-BASE_PATTERN = r"(?:https?://)?([^.]+)\.slickpic\.com"
+BASE_PATTERN = r"(?:https?://)?([\w-]+)\.slickpic\.com"
class SlickpicExtractor(Extractor):
diff --git a/gallery_dl/extractor/smugmug.py b/gallery_dl/extractor/smugmug.py
index 5d582b5..bdf6036 100644
--- a/gallery_dl/extractor/smugmug.py
+++ b/gallery_dl/extractor/smugmug.py
@@ -13,7 +13,7 @@ from .. import text, oauth, exception
BASE_PATTERN = (
r"(?:smugmug:(?!album:)(?:https?://)?([^/]+)|"
- r"(?:https?://)?([^.]+)\.smugmug\.com)")
+ r"(?:https?://)?([\w-]+)\.smugmug\.com)")
class SmugmugExtractor(Extractor):
diff --git a/gallery_dl/extractor/tumblr.py b/gallery_dl/extractor/tumblr.py
index 243710d..358bc95 100644
--- a/gallery_dl/extractor/tumblr.py
+++ b/gallery_dl/extractor/tumblr.py
@@ -35,7 +35,7 @@ POST_TYPES = frozenset((
BASE_PATTERN = (
r"(?:tumblr:(?:https?://)?([^/]+)|"
- r"(?:https?://)?([^.]+\.tumblr\.com))")
+ r"(?:https?://)?([\w-]+\.tumblr\.com))")
class TumblrExtractor(Extractor):
diff --git a/gallery_dl/extractor/tumblrgallery.py b/gallery_dl/extractor/tumblrgallery.py
index 849dc49..e790613 100644
--- a/gallery_dl/extractor/tumblrgallery.py
+++ b/gallery_dl/extractor/tumblrgallery.py
@@ -19,6 +19,20 @@ class TumblrgalleryExtractor(GalleryExtractor):
directory_fmt = ("{category}", "{gallery_id} {title}")
root = "https://tumblrgallery.xyz"
+ @staticmethod
+ def _urls_from_page(page):
+ return text.extract_iter(
+ page, '<div class="report"> <a class="xx-co-me" href="', '"')
+
+ @staticmethod
+ def _data_from_url(url):
+ filename = text.nameext_from_url(url)["filename"]
+ parts = filename.split("_")
+ try:
+ return {"id": parts[1] if parts[1] != "inline" else parts[2]}
+ except IndexError:
+ return {"id": filename}
+
class TumblrgalleryTumblrblogExtractor(TumblrgalleryExtractor):
"""Extractor for Tumblrblog on tumblrgallery.xyz"""
@@ -39,34 +53,27 @@ class TumblrgalleryTumblrblogExtractor(TumblrgalleryExtractor):
def images(self, _):
page_num = 1
while True:
- response = self.request(
- "{}/tumblrblog/gallery/{}/{}.html"
- .format(self.root, self.gallery_id, page_num),
- allow_redirects=False
- )
- if response.status_code != 200:
+ url = "{}/tumblrblog/gallery/{}/{}.html".format(
+ self.root, self.gallery_id, page_num)
+ response = self.request(url, allow_redirects=False, fatal=False)
+
+ if response.status_code >= 300:
return
- page = response.text
+ for url in self._urls_from_page(response.text):
+ yield url, self._data_from_url(url)
page_num += 1
- urls = list(text.extract_iter(
- page,
- '<div class="report xx-co-me"> <a href="',
- '" data-fancybox="gallery"'
- ))
-
- for image_src in urls:
- yield image_src, {
- "id": text.extract(image_src, "tumblr_", "_")[0]
- }
-
class TumblrgalleryPostExtractor(TumblrgalleryExtractor):
"""Extractor for Posts on tumblrgallery.xyz"""
subcategory = "post"
pattern = BASE_PATTERN + r"(/post/(\d+)\.html)"
- test = ("https://tumblrgallery.xyz/post/405674.html",)
+ test = ("https://tumblrgallery.xyz/post/405674.html", {
+ "pattern": r"https://78\.media\.tumblr\.com/bec67072219c1f3bc04fd9711d"
+ r"ec42ef/tumblr_p51qq1XCHS1txhgk3o1_1280\.jpg",
+ "count": 3,
+ })
def __init__(self, match):
TumblrgalleryExtractor.__init__(self, match)
@@ -81,17 +88,8 @@ class TumblrgalleryPostExtractor(TumblrgalleryExtractor):
}
def images(self, page):
- urls = list(text.extract_iter(
- page,
- '<div class="report xx-co-me"> <a href="',
- '" data-fancybox="gallery"'
- ))
-
- for image_src in urls:
- yield image_src, {
- "id": text.extract(image_src, "tumblr_", "_")[0] or
- text.nameext_from_url(image_src)["filename"]
- }
+ for url in self._urls_from_page(page):
+ yield url, self._data_from_url(url)
class TumblrgallerySearchExtractor(TumblrgalleryExtractor):
@@ -100,7 +98,10 @@ class TumblrgallerySearchExtractor(TumblrgalleryExtractor):
filename_fmt = "{category}_{num:>03}_{gallery_id}_{id}_{title}.{extension}"
directory_fmt = ("{category}", "{search_term}")
pattern = BASE_PATTERN + r"(/s\.php\?q=([^&#]+))"
- test = ("https://tumblrgallery.xyz/s.php?q=everyday-life",)
+ test = ("https://tumblrgallery.xyz/s.php?q=everyday-life", {
+ "pattern": r"https://\d+\.media\.tumblr\.com/.+",
+ "count": "< 1000",
+ })
def __init__(self, match):
TumblrgalleryExtractor.__init__(self, match)
@@ -112,38 +113,26 @@ class TumblrgallerySearchExtractor(TumblrgalleryExtractor):
}
def images(self, _):
- page_num = 1
+ page_url = "s.php?q=" + self.search_term
while True:
- response = self.request(
- "{}/s.php?q={}&page={}"
- .format(self.root, self.search_term, page_num),
- allow_redirects=False
- )
- if response.status_code != 200:
- return
+ page = self.request(self.root + "/" + page_url).text
- page = response.text
- page_num += 1
+ for gallery_id in text.extract_iter(
+ page, '<div class="title"><a href="post/', '.html'):
- gallery_ids = list(text.extract_iter(
- page,
- '<div class="title"><a href="post/',
- '.html'
- ))
-
- for gallery_id in gallery_ids:
- post_page = self.request(
- "{}/post/{}.html"
- .format(self.root, gallery_id),
- allow_redirects=False
- ).text
- for image_src in TumblrgalleryPostExtractor.images(
- self, post_page
- ):
- image_src[1]["title"] = text.remove_html(
- text.unescape(
- text.extract(post_page, "<title>", "</title>")[0]
- )
- ).replace("_", "-")
- image_src[1]["gallery_id"] = gallery_id
- yield image_src
+ url = "{}/post/{}.html".format(self.root, gallery_id)
+ post_page = self.request(url).text
+
+ for url in self._urls_from_page(post_page):
+ data = self._data_from_url(url)
+ data["gallery_id"] = gallery_id
+ data["title"] = text.remove_html(text.unescape(
+ text.extract(post_page, "<title>", "</title>")[0]
+ )).replace("_", "-")
+ yield url, data
+
+ next_url = text.extract(
+ page, '</span> <a class="btn btn-primary" href="', '"')[0]
+ if not next_url or page_url == next_url:
+ return
+ page_url = next_url
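For reference, what the shared `_data_from_url()` helper yields for typical tumblr filenames, assuming the usual `tumblr_<id>_<width>` and `tumblr_inline_<id>_<width>` naming (sample URLs invented):

    from gallery_dl.extractor.tumblrgallery import TumblrgalleryExtractor

    f = TumblrgalleryExtractor._data_from_url
    f("https://example.org/tumblr_p51qq1XCHS1txhgk3o1_1280.jpg")
    # -> {"id": "p51qq1XCHS1txhgk3o1"}
    f("https://example.org/tumblr_inline_pabc123xyz_500.png")
    # -> {"id": "pabc123xyz"}
    f("https://example.org/cover.jpg")
    # -> {"id": "cover"}   (no "_" parts to split on)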
diff --git a/gallery_dl/extractor/twitter.py b/gallery_dl/extractor/twitter.py
index f1c392d..a49f1f2 100644
--- a/gallery_dl/extractor/twitter.py
+++ b/gallery_dl/extractor/twitter.py
@@ -47,7 +47,7 @@ class TwitterExtractor(Extractor):
size = self.config("size")
if size is None:
self._size_image = "orig"
- self._size_fallback = ("large", "medium", "small")
+ self._size_fallback = ("4096x4096", "large", "medium", "small")
else:
if isinstance(size, str):
size = size.split(",")
diff --git a/gallery_dl/extractor/wordpress.py b/gallery_dl/extractor/wordpress.py
new file mode 100644
index 0000000..dd7d28a
--- /dev/null
+++ b/gallery_dl/extractor/wordpress.py
@@ -0,0 +1,41 @@
+# -*- coding: utf-8 -*-
+
+# Copyright 2021 Mike Fährmann
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+
+"""Extractors for WordPress blogs"""
+
+from .common import BaseExtractor, Message
+from .. import text
+
+
+class WordpressExtractor(BaseExtractor):
+ """Base class for wordpress extractors"""
+ basecategory = "wordpress"
+
+ def items(self):
+ for post in self.posts():
+            yield Message.Directory, post
+
+
+BASE_PATTERN = WordpressExtractor.update({})
+
+
+class WordpressBlogExtractor(WordpressExtractor):
+ """Extractor for WordPress blogs"""
+ subcategory = "blog"
+ directory_fmt = ("{category}", "{blog}")
+ pattern = BASE_PATTERN + r"/?$"
+
+ def posts(self):
+ url = self.root + "/wp-json/wp/v2/posts"
+ params = {"page": 1, "per_page": "100"}
+
+        while True:
+            response = self.request(url, params=params, fatal=False)
+            if response.status_code >= 400:
+                return
+            yield from response.json()
+            params["page"] += 1
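The loop in `posts()` walks the standard WordPress REST API, which serves posts in pages of up to 100 and answers requests past the last page with an HTTP 400 error, which serves as the stopping condition. A standalone sketch with `requests` (endpoint path from the hunk above; everything else illustrative):

    import requests

    def wordpress_posts(root):
        """Yield all posts of a WordPress blog via its REST API."""
        url = root + "/wp-json/wp/v2/posts"
        params = {"page": 1, "per_page": 100}
        while True:
            response = requests.get(url, params=params)
            if response.status_code >= 400:  # past the last page
                return
            yield from response.json()
            params["page"] += 1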
diff --git a/gallery_dl/extractor/xhamster.py b/gallery_dl/extractor/xhamster.py
index f7a0a7e..146ab04 100644
--- a/gallery_dl/extractor/xhamster.py
+++ b/gallery_dl/extractor/xhamster.py
@@ -13,7 +13,7 @@ from .. import text
import json
-BASE_PATTERN = (r"(?:https?://)?((?:[^.]+\.)?xhamster"
+BASE_PATTERN = (r"(?:https?://)?((?:[\w-]+\.)?xhamster"
r"(?:\d?\.(?:com|one|desi)|\.porncache\.net))")
diff --git a/gallery_dl/extractor/ytdl.py b/gallery_dl/extractor/ytdl.py
index 8eb0c83..8f3ef9a 100644
--- a/gallery_dl/extractor/ytdl.py
+++ b/gallery_dl/extractor/ytdl.py
@@ -23,9 +23,9 @@ class YoutubeDLExtractor(Extractor):
def __init__(self, match):
# import main youtube_dl module
- module_name = self.ytdl_module_name = config.get(
- ("extractor", "ytdl"), "module") or "youtube_dl"
- module = __import__(module_name)
+ ytdl_module = ytdl.import_module(config.get(
+ ("extractor", "ytdl"), "module"))
+ self.ytdl_module_name = ytdl_module.__name__
# find suitable youtube_dl extractor
self.ytdl_url = url = match.group(1)
@@ -34,7 +34,7 @@ class YoutubeDLExtractor(Extractor):
self.ytdl_ie_key = "Generic"
self.force_generic_extractor = True
else:
- for ie in module.extractor.gen_extractor_classes():
+ for ie in ytdl_module.extractor.gen_extractor_classes():
if ie.suitable(url):
self.ytdl_ie_key = ie.ie_key()
break
@@ -48,7 +48,7 @@ class YoutubeDLExtractor(Extractor):
def items(self):
# import subcategory module
- ytdl_module = __import__(
+ ytdl_module = ytdl.import_module(
config.get(("extractor", "ytdl", self.subcategory), "module") or
self.ytdl_module_name)
self.log.debug("Using %s", ytdl_module)
diff --git a/gallery_dl/option.py b/gallery_dl/option.py
index 5f7b281..1967bf7 100644
--- a/gallery_dl/option.py
+++ b/gallery_dl/option.py
@@ -92,9 +92,9 @@ def build_parser():
help="Print program version and exit",
)
general.add_argument(
- "-d", "--dest",
+ "--dest",
dest="base-directory", metavar="DEST", action=ConfigAction,
- help="Destination directory",
+ help=argparse.SUPPRESS,
)
general.add_argument(
"-i", "--input-file",
@@ -103,6 +103,17 @@ def build_parser():
"More than one --input-file can be specified"),
)
general.add_argument(
+ "-f", "--filename",
+ dest="filename", metavar="FORMAT",
+ help=("Filename format string for downloaded files "
+ "('/O' for \"original\" filenames)"),
+ )
+ general.add_argument(
+ "-d", "--directory",
+ dest="directory", metavar="PATH",
+ help="Target location for file downloads",
+ )
+ general.add_argument(
"--cookies",
dest="cookies", metavar="FILE", action=ConfigAction,
help="File to load additional cookies from",
@@ -211,8 +222,22 @@ def build_parser():
)
downloader.add_argument(
"--sleep",
- dest="sleep", metavar="SECONDS", type=float, action=ConfigAction,
- help="Number of seconds to sleep before each download",
+ dest="sleep", metavar="SECONDS", action=ConfigAction,
+ help=("Number of seconds to wait before each download. "
+ "This can be either a constant value or a range "
+ "(e.g. 2.7 or 2.0-3.5)"),
+ )
+ downloader.add_argument(
+ "--sleep-request",
+ dest="sleep-request", metavar="SECONDS", action=ConfigAction,
+ help=("Number of seconds to wait between HTTP requests "
+ "during data extraction"),
+ )
+ downloader.add_argument(
+ "--sleep-extractor",
+ dest="sleep-extractor", metavar="SECONDS", action=ConfigAction,
+ help=("Number of seconds to wait before starting data extraction "
+ "for an input URL"),
)
downloader.add_argument(
"--filesize-min",
@@ -337,6 +362,11 @@ def build_parser():
"and other delegated URLs"),
)
+ infojson = {
+ "name" : "metadata",
+ "event" : "init",
+ "filename": "info.json",
+ }
postprocessor = parser.add_argument_group("Post-processing Options")
postprocessor.add_argument(
"--zip",
@@ -372,16 +402,18 @@ def build_parser():
help="Write metadata to separate JSON files",
)
postprocessor.add_argument(
- "--write-infojson",
+ "--write-info-json",
dest="postprocessors",
- action="append_const", const={
- "name" : "metadata",
- "event" : "init",
- "filename": "info.json",
- },
+ action="append_const", const=infojson,
help="Write gallery metadata to a info.json file",
)
postprocessor.add_argument(
+ "--write-infojson",
+ dest="postprocessors",
+ action="append_const", const=infojson,
+ help=argparse.SUPPRESS,
+ )
+ postprocessor.add_argument(
"--write-tags",
dest="postprocessors",
action="append_const", const={"name": "metadata", "mode": "tags"},
diff --git a/gallery_dl/output.py b/gallery_dl/output.py
index d4d295f..7e00e1a 100644
--- a/gallery_dl/output.py
+++ b/gallery_dl/output.py
@@ -265,10 +265,14 @@ class NullOutput():
class PipeOutput(NullOutput):
def skip(self, path):
- print(CHAR_SKIP, path, sep="", flush=True)
+ stdout = sys.stdout
+ stdout.write(CHAR_SKIP + path + "\n")
+ stdout.flush()
def success(self, path, tries):
- print(path, flush=True)
+ stdout = sys.stdout
+ stdout.write(path + "\n")
+ stdout.flush()
class TerminalOutput(NullOutput):
@@ -284,34 +288,38 @@ class TerminalOutput(NullOutput):
self.shorten = util.identity
def start(self, path):
- print(self.shorten(" " + path), end="", flush=True)
+ stdout = sys.stdout
+ stdout.write(self.shorten(" " + path))
+ stdout.flush()
def skip(self, path):
- print(self.shorten(CHAR_SKIP + path))
+ sys.stdout.write(self.shorten(CHAR_SKIP + path) + "\n")
def success(self, path, tries):
- print("\r", self.shorten(CHAR_SUCCESS + path), sep="")
+ sys.stdout.write("\r" + self.shorten(CHAR_SUCCESS + path) + "\n")
def progress(self, bytes_total, bytes_downloaded, bytes_per_second):
bdl = util.format_value(bytes_downloaded)
bps = util.format_value(bytes_per_second)
if bytes_total is None:
- print("\r{:>7}B {:>7}B/s ".format(bdl, bps), end="")
+ sys.stderr.write("\r{:>7}B {:>7}B/s ".format(bdl, bps))
else:
- print("\r{:>3}% {:>7}B {:>7}B/s ".format(
- bytes_downloaded * 100 // bytes_total, bdl, bps), end="")
+ sys.stderr.write("\r{:>3}% {:>7}B {:>7}B/s ".format(
+ bytes_downloaded * 100 // bytes_total, bdl, bps))
class ColorOutput(TerminalOutput):
def start(self, path):
- print(self.shorten(path), end="", flush=True)
+ stdout = sys.stdout
+ stdout.write(self.shorten(path))
+ stdout.flush()
def skip(self, path):
- print("\033[2m", self.shorten(path), "\033[0m", sep="")
+ sys.stdout.write("\033[2m" + self.shorten(path) + "\033[0m\n")
def success(self, path, tries):
- print("\r\033[1;32m", self.shorten(path), "\033[0m", sep="")
+ sys.stdout.write("\r\033[1;32m" + self.shorten(path) + "\033[0m\n")
class EAWCache(dict):
diff --git a/gallery_dl/path.py b/gallery_dl/path.py
index 12ce8ad..9e9e983 100644
--- a/gallery_dl/path.py
+++ b/gallery_dl/path.py
@@ -177,8 +177,11 @@ class PathFormat():
self.directory = directory = self.basedirectory
if WINDOWS:
- # Enable longer-than-260-character paths on Windows
- directory = "\\\\?\\" + os.path.abspath(directory)
+ # Enable longer-than-260-character paths
+ if directory.startswith("\\\\"):
+ directory = "\\\\?\\UNC\\" + directory[2:]
+ else:
+ directory = "\\\\?\\" + os.path.abspath(directory)
# abspath() in Python 3.7+ removes trailing path separators (#402)
if directory[-1] != sep:
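How the extended-length prefix is applied, following the branch above: UNC paths swap their leading `\\` for `\\?\UNC\`, while local paths are made absolute and prefixed with plain `\\?\` (sample paths invented):

    import os

    directory = r"\\server\share\gallery"   # hypothetical UNC path
    if directory.startswith("\\\\"):
        # -> \\?\UNC\server\share\gallery
        directory = "\\\\?\\UNC\\" + directory[2:]
    else:
        # e.g. C:\Users\me\gallery -> \\?\C:\Users\me\gallery
        directory = "\\\\?\\" + os.path.abspath(directory)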
diff --git a/gallery_dl/util.py b/gallery_dl/util.py
index d25194e..bccae2d 100644
--- a/gallery_dl/util.py
+++ b/gallery_dl/util.py
@@ -428,18 +428,26 @@ def build_duration_func(duration, min=0.0):
if not duration:
return None
- try:
- lower, upper = duration
- except TypeError:
- pass
+ if isinstance(duration, str):
+ lower, _, upper = duration.partition("-")
+ lower = float(lower)
else:
+ try:
+ lower, upper = duration
+ except TypeError:
+ lower, upper = duration, None
+
+ if upper:
+ upper = float(upper)
return functools.partial(
random.uniform,
lower if lower > min else min,
upper if upper > min else min,
)
-
- return functools.partial(identity, duration if duration > min else min)
+ else:
+ if lower < min:
+ lower = min
+ return lambda: lower
def build_extractor_filter(categories, negate=True, special=None):
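Taken together with the new `--sleep` help text above, `build_duration_func()` now accepts a plain number, an "A-B" range string, or a 2-sequence. A quick usage sketch, assuming the semantics exercised by the new tests in test_util.py below:

    from gallery_dl import util

    util.build_duration_func(3)()          # always 3
    util.build_duration_func("2.0-3.5")()  # uniform random value in [2.0, 3.5]
    util.build_duration_func((2, 4))()     # uniform random value in [2, 4]
    util.build_duration_func(0)            # None -> sleeping disabled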
diff --git a/gallery_dl/version.py b/gallery_dl/version.py
index a363a97..b5114e8 100644
--- a/gallery_dl/version.py
+++ b/gallery_dl/version.py
@@ -6,4 +6,4 @@
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
-__version__ = "1.19.3"
+__version__ = "1.20.0"
diff --git a/gallery_dl/ytdl.py b/gallery_dl/ytdl.py
index 4266f48..e6953eb 100644
--- a/gallery_dl/ytdl.py
+++ b/gallery_dl/ytdl.py
@@ -14,6 +14,15 @@ import itertools
from . import text, util, exception
+def import_module(module_name):
+ if module_name is None:
+ try:
+ return __import__("yt_dlp")
+ except ImportError:
+ return __import__("youtube_dl")
+ return __import__(module_name.replace("-", "_"))
+
+
def construct_YoutubeDL(module, obj, user_opts, system_opts=None):
opts = argv = None
config = obj.config
@@ -95,6 +104,8 @@ def parse_command_line(module, argv):
opts.continue_dl = False
if opts.retries is not None:
opts.retries = parse_retries(opts.retries)
+ if getattr(opts, "file_access_retries", None) is not None:
+ opts.file_access_retries = parse_retries(opts.file_access_retries)
if opts.fragment_retries is not None:
opts.fragment_retries = parse_retries(opts.fragment_retries)
if getattr(opts, "extractor_retries", None) is not None:
@@ -111,6 +122,10 @@ def parse_command_line(module, argv):
opts.recodevideo = opts.recodevideo.replace(" ", "")
if getattr(opts, "remuxvideo", None) is not None:
opts.remuxvideo = opts.remuxvideo.replace(" ", "")
+ if getattr(opts, "wait_for_video", None) is not None:
+ min_wait, _, max_wait = opts.wait_for_video.partition("-")
+ opts.wait_for_video = (module.parse_duration(min_wait),
+ module.parse_duration(max_wait))
if opts.date is not None:
date = module.DateRange.day(opts.date)
@@ -207,10 +222,6 @@ def parse_command_line(module, argv):
opts.sponsorblock_remove = \
getattr(opts, "sponsorblock_remove", None) or set()
sponsorblock_query = opts.sponsorblock_mark | opts.sponsorblock_remove
-
- addchapters = getattr(opts, "addchapters", None)
- if (opts.addmetadata or opts.sponsorblock_mark) and addchapters is None:
- addchapters = True
opts.remove_chapters = getattr(opts, "remove_chapters", None) or ()
# PostProcessors
@@ -297,11 +308,17 @@ def parse_command_line(module, argv):
"sponsorblock_chapter_title": opts.sponsorblock_chapter_title,
"force_keyframes": opts.force_keyframes_at_cuts,
})
- if opts.addmetadata or addchapters:
+ addchapters = getattr(opts, "addchapters", None)
+ embed_infojson = getattr(opts, "embed_infojson", None)
+ if opts.addmetadata or addchapters or embed_infojson:
pp = {"key": "FFmpegMetadata"}
if ytdlp:
- pp["add_chapters"] = addchapters
+ if embed_infojson is None:
+ embed_infojson = "if_exists"
pp["add_metadata"] = opts.addmetadata
+ pp["add_chapters"] = addchapters
+ pp["add_infojson"] = embed_infojson
+
postprocessors.append(pp)
if getattr(opts, "sponskrub", False) is not False:
postprocessors.append({
@@ -311,10 +328,11 @@ def parse_command_line(module, argv):
"cut": opts.sponskrub_cut,
"force": opts.sponskrub_force,
"ignoreerror": opts.sponskrub is None,
+ "_from_cli": True,
})
if opts.embedthumbnail:
already_have_thumbnail = (opts.writethumbnail or
- opts.write_all_thumbnails)
+ getattr(opts, "write_all_thumbnails", False))
postprocessors.append({
"key": "EmbedThumbnail",
"already_have_thumbnail": already_have_thumbnail,
@@ -395,6 +413,7 @@ def parse_command_line(module, argv):
"throttledratelimit": getattr(opts, "throttledratelimit", None),
"overwrites": getattr(opts, "overwrites", None),
"retries": opts.retries,
+ "file_access_retries": getattr(opts, "file_access_retries", None),
"fragment_retries": opts.fragment_retries,
"extractor_retries": getattr(opts, "extractor_retries", None),
"skip_unavailable_fragments": opts.skip_unavailable_fragments,
@@ -421,8 +440,9 @@ def parse_command_line(module, argv):
"allow_playlist_files": opts.allow_playlist_files,
"clean_infojson": opts.clean_infojson,
"getcomments": getattr(opts, "getcomments", None),
- "writethumbnail": opts.writethumbnail,
- "write_all_thumbnails": opts.write_all_thumbnails,
+ "writethumbnail": opts.writethumbnail is True,
+ "write_all_thumbnails": getattr(opts, "write_all_thumbnails", None) or
+ opts.writethumbnail == "all",
"writelink": getattr(opts, "writelink", None),
"writeurllink": getattr(opts, "writeurllink", None),
"writewebloclink": getattr(opts, "writewebloclink", None),
@@ -454,6 +474,7 @@ def parse_command_line(module, argv):
"download_archive": download_archive_fn,
"break_on_existing": getattr(opts, "break_on_existing", None),
"break_on_reject": getattr(opts, "break_on_reject", None),
+ "break_per_url": getattr(opts, "break_per_url", None),
"skip_playlist_after_errors": getattr(
opts, "skip_playlist_after_errors", None),
"cookiefile": opts.cookiefile,
@@ -475,6 +496,8 @@ def parse_command_line(module, argv):
opts, "youtube_include_hls_manifest", None),
"encoding": opts.encoding,
"extract_flat": opts.extract_flat,
+ "live_from_start": getattr(opts, "live_from_start", None),
+ "wait_for_video": getattr(opts, "wait_for_video", None),
"mark_watched": opts.mark_watched,
"merge_output_format": opts.merge_output_format,
"postprocessors": postprocessors,
diff --git a/test/test_results.py b/test/test_results.py
index 944f14d..37dea38 100644
--- a/test/test_results.py
+++ b/test/test_results.py
@@ -353,28 +353,23 @@ def generate_tests():
# enable selective testing for direct calls
if __name__ == '__main__' and len(sys.argv) > 1:
- if sys.argv[1].lower() == "all":
- fltr = lambda c, bc: True # noqa: E731
- elif sys.argv[1].lower() == "broken":
- fltr = lambda c, bc: c in BROKEN # noqa: E731
- else:
- argv = sys.argv[1:]
- fltr = lambda c, bc: c in argv or bc in argv # noqa: E731
+ categories = sys.argv[1:]
+ negate = False
+ if categories[0].lower() == "all":
+ categories = ()
+ negate = True
+ elif categories[0].lower() == "broken":
+ categories = BROKEN
del sys.argv[1:]
else:
- skip = set(BROKEN)
- if skip:
- print("skipping:", ", ".join(skip))
- fltr = lambda c, bc: c not in skip # noqa: E731
-
- # filter available extractor classes
- extractors = [
- extr for extr in extractor.extractors()
- if fltr(extr.category, extr.basecategory)
- ]
+ categories = BROKEN
+ negate = True
+ if categories:
+ print("skipping:", ", ".join(categories))
+ fltr = util.build_extractor_filter(categories, negate=negate)
# add 'test_...' methods
- for extr in extractors:
+ for extr in filter(fltr, extractor.extractors()):
name = "test_" + extr.__name__ + "_"
for num, tcase in enumerate(extr._get_tests(), 1):
test = _generate_test(extr, tcase)
diff --git a/test/test_util.py b/test/test_util.py
index 32e9784..ce403a8 100644
--- a/test/test_util.py
+++ b/test/test_util.py
@@ -357,6 +357,31 @@ class TestOther(unittest.TestCase):
with self.assertRaises(exception.StopExtraction):
expr()
+ def test_build_duration_func(self, f=util.build_duration_func):
+ for v in (0, 0.0, "", None, (), []):
+ self.assertIsNone(f(v))
+
+ def test_single(df, v):
+ for _ in range(10):
+ self.assertEqual(df(), v)
+
+ def test_range(df, lower, upper):
+ for __ in range(10):
+ v = df()
+ self.assertGreaterEqual(v, lower)
+ self.assertLessEqual(v, upper)
+
+ test_single(f(3), 3)
+ test_single(f(3.0), 3.0)
+ test_single(f("3"), 3)
+ test_single(f("3.0-"), 3)
+ test_single(f(" 3 -"), 3)
+
+ test_range(f((2, 4)), 2, 4)
+ test_range(f([2, 4]), 2, 4)
+ test_range(f("2-4"), 2, 4)
+ test_range(f(" 2.0 - 4 "), 2, 4)
+
def test_extractor_filter(self):
# empty
func = util.build_extractor_filter("")
diff --git a/test/test_ytdl.py b/test/test_ytdl.py
new file mode 100644
index 0000000..97431e3
--- /dev/null
+++ b/test/test_ytdl.py
@@ -0,0 +1,545 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+# Copyright 2021 Mike Fährmann
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+
+import os
+import sys
+import unittest
+
+import re
+import shlex
+
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from gallery_dl import ytdl, util, config
+
+
+class Test_CommandlineArguments(unittest.TestCase):
+ module_name = "youtube_dl"
+
+ @classmethod
+ def setUpClass(cls):
+ try:
+ cls.module = __import__(cls.module_name)
+ except ImportError:
+ raise unittest.SkipTest("cannot import module '{}'".format(
+ cls.module_name))
+ cls.default = ytdl.parse_command_line(cls.module, [])
+
+ def test_ignore_errors(self):
+ self._("--ignore-errors" , "ignoreerrors", True)
+ self._("--abort-on-error", "ignoreerrors", False)
+
+ def test_default_search(self):
+ self._(["--default-search", "foo"] , "default_search", "foo")
+
+ def test_mark_watched(self):
+ self._("--mark-watched" , "mark_watched", True)
+ self._("--no-mark-watched", "mark_watched", False)
+
+ def test_proxy(self):
+ self._(["--proxy", "socks5://127.0.0.1:1080/"],
+ "proxy", "socks5://127.0.0.1:1080/")
+ self._(["--cn-verification-proxy", "https://127.0.0.1"],
+ "cn_verification_proxy", "https://127.0.0.1")
+ self._(["--geo-verification-proxy", "127.0.0.1"],
+ "geo_verification_proxy", "127.0.0.1")
+
+ def test_retries(self):
+ inf = float("inf")
+
+ self._(["--retries", "5"], "retries", 5)
+ self._(["--retries", "inf"], "retries", inf)
+ self._(["--retries", "infinite"], "retries", inf)
+ self._(["--fragment-retries", "8"], "fragment_retries", 8)
+ self._(["--fragment-retries", "inf"], "fragment_retries", inf)
+ self._(["--fragment-retries", "infinite"], "fragment_retries", inf)
+
+ def test_geo_bypass(self):
+ self._("--geo-bypass", "geo_bypass", True)
+ self._("--no-geo-bypass", "geo_bypass", False)
+ self._(["--geo-bypass-country", "EN"], "geo_bypass_country", "EN")
+ self._(["--geo-bypass-ip-block", "198.51.100.14/24"],
+ "geo_bypass_ip_block", "198.51.100.14/24")
+
+ def test_headers(self):
+ headers = self.module.std_headers
+
+ self.assertNotEqual(headers["User-Agent"], "Foo/1.0")
+ self._(["--user-agent", "Foo/1.0"])
+ self.assertEqual(headers["User-Agent"], "Foo/1.0")
+
+ self.assertNotIn("Referer", headers)
+ self._(["--referer", "http://example.org/"])
+ self.assertEqual(headers["Referer"], "http://example.org/")
+
+ self.assertNotEqual(headers["Accept"], "*/*")
+ self.assertNotIn("DNT", headers)
+ self._([
+ "--add-header", "accept:*/*",
+ "--add-header", "dnt:1",
+ ])
+ self.assertEqual(headers["accept"], "*/*")
+ self.assertEqual(headers["dnt"], "1")
+
+ def test_extract_audio(self):
+ opts = self._(["--extract-audio"])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "FFmpegExtractAudio",
+ "preferredcodec": "best",
+ "preferredquality": "5",
+ "nopostoverwrites": False,
+ })
+
+ opts = self._([
+ "--extract-audio",
+ "--audio-format", "opus",
+ "--audio-quality", "9",
+ "--no-post-overwrites",
+ ])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "FFmpegExtractAudio",
+ "preferredcodec": "opus",
+ "preferredquality": "9",
+ "nopostoverwrites": True,
+ })
+
+ def test_recode_video(self):
+ opts = self._(["--recode-video", " mkv "])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "FFmpegVideoConvertor",
+ "preferedformat": "mkv",
+ })
+
+ def test_subs(self):
+ opts = self._(["--convert-subs", "srt"])
+ conv = {"key": "FFmpegSubtitlesConvertor", "format": "srt"}
+ if self.module_name == "yt_dlp":
+ conv["when"] = "before_dl"
+ self.assertEqual(opts["postprocessors"][0], conv)
+
+ def test_embed(self):
+ subs = {"key": "FFmpegEmbedSubtitle"}
+ thumb = {"key": "EmbedThumbnail", "already_have_thumbnail": False}
+ if self.module_name == "yt_dlp":
+ subs["already_have_subtitle"] = False
+
+ opts = self._(["--embed-subs", "--embed-thumbnail"])
+ self.assertEqual(opts["postprocessors"], [subs, thumb])
+
+ thumb["already_have_thumbnail"] = True
+ if self.module_name == "yt_dlp":
+ subs["already_have_subtitle"] = True
+
+ opts = self._([
+ "--embed-thumbnail",
+ "--embed-subs",
+ "--write-sub",
+ "--write-all-thumbnails",
+ ])
+ self.assertEqual(opts["postprocessors"], [subs, thumb])
+
+ def test_metadata(self):
+ opts = self._("--add-metadata")
+ self.assertEqual(opts["postprocessors"][0], {"key": "FFmpegMetadata"})
+
+ def test_metadata_from_title(self):
+ opts = self._(["--metadata-from-title", "%(artist)s - %(title)s"])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "MetadataFromTitle",
+ "titleformat": "%(artist)s - %(title)s",
+ })
+
+ def test_xattr(self):
+ self._("--xattr-set-filesize", "xattr_set_filesize", True)
+
+ opts = self._("--xattrs")
+ self.assertEqual(opts["postprocessors"][0], {"key": "XAttrMetadata"})
+
+ def test_noop(self):
+ result = self._([
+ "--update",
+ "--dump-user-agent",
+ "--list-extractors",
+ "--extractor-descriptions",
+ "--ignore-config",
+ "--config-location",
+ "--dump-json",
+ "--dump-single-json",
+ "--list-thumbnails",
+ ])
+
+ result["daterange"] = self.default["daterange"]
+ self.assertEqual(result, self.default)
+
+ def _(self, cmdline, option=util.SENTINEL, expected=None):
+ if isinstance(cmdline, str):
+ cmdline = [cmdline]
+ result = ytdl.parse_command_line(self.module, cmdline)
+ if option is not util.SENTINEL:
+ self.assertEqual(result[option], expected, option)
+ return result
+
+
+class Test_CommandlineArguments_YtDlp(Test_CommandlineArguments):
+ module_name = "yt_dlp"
+
+ def test_retries_extractor(self):
+ inf = float("inf")
+
+ self._(["--extractor-retries", "5"], "extractor_retries", 5)
+ self._(["--extractor-retries", "inf"], "extractor_retries", inf)
+ self._(["--extractor-retries", "infinite"], "extractor_retries", inf)
+
+    def test_remux_video(self):
+ opts = self._(["--remux-video", " mkv "])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "FFmpegVideoRemuxer",
+ "preferedformat": "mkv",
+ })
+
+ def test_metadata(self):
+ opts = self._(["--embed-metadata",
+ "--no-embed-chapters",
+ "--embed-info-json"])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "FFmpegMetadata",
+ "add_chapters": False,
+ "add_metadata": True,
+ "add_infojson": True,
+ })
+
+ def test_metadata_from_title(self):
+ opts = self._(["--metadata-from-title", "%(artist)s - %(title)s"])
+ self.assertEqual(opts["postprocessors"][0], {
+ "key": "MetadataParser",
+ "when": "pre_process",
+ "actions": [self.module.MetadataFromFieldPP.to_action(
+ "title:%(artist)s - %(title)s")],
+ })
+
+
+if __name__ == "__main__":
+ unittest.main(warnings="ignore")
+
+'''
+Usage: __main__.py [OPTIONS] URL [URL...]
+
+Options:
+ General Options:
+ -h, --help Print this help text and exit
+ --version Print program version and exit
+ --force-generic-extractor Force extraction to use the generic
+ extractor
+ --flat-playlist Do not extract the videos of a
+ playlist, only list them.
+ --no-color Do not emit color codes in output
+
+ Network Options:
+ --socket-timeout SECONDS Time to wait before giving up, in
+ seconds
+ --source-address IP Client-side IP address to bind to
+ -4, --force-ipv4 Make all connections via IPv4
+ -6, --force-ipv6 Make all connections via IPv6
+
+ Video Selection:
+ --playlist-start NUMBER Playlist video to start at (default is
+ 1)
+ --playlist-end NUMBER Playlist video to end at (default is
+ last)
+ --playlist-items ITEM_SPEC Playlist video items to download.
+ Specify indices of the videos in the
+ playlist separated by commas like: "--
+ playlist-items 1,2,5,8" if you want to
+ download videos indexed 1, 2, 5, 8 in
+ the playlist. You can specify range: "
+ --playlist-items 1-3,7,10-13", it will
+ download the videos at index 1, 2, 3,
+ 7, 10, 11, 12 and 13.
+ --match-title REGEX Download only matching titles (regex or
+ caseless sub-string)
+ --reject-title REGEX Skip download for matching titles
+ (regex or caseless sub-string)
+ --max-downloads NUMBER Abort after downloading NUMBER files
+ --min-filesize SIZE Do not download any videos smaller than
+ SIZE (e.g. 50k or 44.6m)
+ --max-filesize SIZE Do not download any videos larger than
+ SIZE (e.g. 50k or 44.6m)
+ --date DATE Download only videos uploaded in this
+ date
+ --datebefore DATE Download only videos uploaded on or
+ before this date (i.e. inclusive)
+ --dateafter DATE Download only videos uploaded on or
+ after this date (i.e. inclusive)
+ --min-views COUNT Do not download any videos with less
+ than COUNT views
+ --max-views COUNT Do not download any videos with more
+ than COUNT views
+ --match-filter FILTER Generic video filter. Specify any key
+ (see the "OUTPUT TEMPLATE" for a list
+ of available keys) to match if the key
+ is present, !key to check if the key is
+ not present, key > NUMBER (like
+ "comment_count > 12", also works with
+ >=, <, <=, !=, =) to compare against a
+ number, key = 'LITERAL' (like "uploader
+ = 'Mike Smith'", also works with !=) to
+ match against a string literal and & to
+ require multiple matches. Values which
+ are not known are excluded unless you
+ put a question mark (?) after the
+ operator. For example, to only match
+ videos that have been liked more than
+ 100 times and disliked less than 50
+ times (or the dislike functionality is
+ not available at the given service),
+ but who also have a description, use
+ --match-filter "like_count > 100 &
+ dislike_count <? 50 & description" .
+ --no-playlist Download only the video, if the URL
+ refers to a video and a playlist.
+ --yes-playlist Download the playlist, if the URL
+ refers to a video and a playlist.
+ --age-limit YEARS Download only videos suitable for the
+ given age
+ --download-archive FILE Download only videos not listed in the
+ archive file. Record the IDs of all
+ downloaded videos in it.
+ --include-ads Download advertisements as well
+ (experimental)
+
+ Download Options:
+ -r, --limit-rate RATE Maximum download rate in bytes per
+ second (e.g. 50K or 4.2M)
+ --skip-unavailable-fragments Skip unavailable fragments (DASH,
+ hlsnative and ISM)
+ --abort-on-unavailable-fragment Abort downloading when some fragment is
+ not available
+ --keep-fragments Keep downloaded fragments on disk after
+ downloading is finished; fragments are
+ erased by default
+ --buffer-size SIZE Size of download buffer (e.g. 1024 or
+ 16K) (default is 1024)
+ --no-resize-buffer Do not automatically adjust the buffer
+ size. By default, the buffer size is
+ automatically resized from an initial
+ value of SIZE.
+ --http-chunk-size SIZE Size of a chunk for chunk-based HTTP
+ downloading (e.g. 10485760 or 10M)
+ (default is disabled). May be useful
+ for bypassing bandwidth throttling
+ imposed by a webserver (experimental)
+ --playlist-reverse Download playlist videos in reverse
+ order
+ --playlist-random Download playlist videos in random
+ order
+ --xattr-set-filesize Set file xattribute ytdl.filesize with
+ expected file size
+ --hls-prefer-native Use the native HLS downloader instead
+ of ffmpeg
+ --hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
+ downloader
+ --hls-use-mpegts Use the mpegts container for HLS
+ videos, allowing to play the video
+ while downloading (some players may not
+ be able to play it)
+ --external-downloader COMMAND Use the specified external downloader.
+ Currently supports aria2c,avconv,axel,c
+ url,ffmpeg,httpie,wget
+ --external-downloader-args ARGS Give these arguments to the external
+ downloader
+
+ Filesystem Options:
+ -a, --batch-file FILE File containing URLs to download ('-'
+ for stdin), one URL per line. Lines
+ starting with '#', ';' or ']' are
+ considered as comments and ignored.
+ --id Use only video ID in file name
+ -o, --output TEMPLATE Output filename template, see the
+ "OUTPUT TEMPLATE" for all the info
+ --output-na-placeholder PLACEHOLDER Placeholder value for unavailable meta
+ fields in output filename template
+ (default is "NA")
+ --autonumber-start NUMBER Specify the start value for
+ %(autonumber)s (default is 1)
+ --restrict-filenames Restrict filenames to only ASCII
+ characters, and avoid "&" and spaces in
+ filenames
+ -w, --no-overwrites Do not overwrite files
+ -c, --continue Force resume of partially downloaded
+ files. By default, youtube-dl will
+ resume downloads if possible.
+ --no-continue Do not resume partially downloaded
+ files (restart from beginning)
+ --no-part Do not use .part files - write directly
+ into output file
+ --no-mtime Do not use the Last-modified header to
+ set the file modification time
+ --write-description Write video description to a
+ .description file
+ --write-info-json Write video metadata to a .info.json
+ file
+ --write-annotations Write video annotations to a
+ .annotations.xml file
+ --load-info-json FILE JSON file containing the video
+ information (created with the "--write-
+ info-json" option)
+ --cookies FILE File to read cookies from and dump
+ cookie jar in
+ --cache-dir DIR Location in the filesystem where
+ youtube-dl can store some downloaded
+ information permanently. By default
+ $XDG_CACHE_HOME/youtube-dl or
+ ~/.cache/youtube-dl . At the moment,
+ only YouTube player files (for videos
+ with obfuscated signatures) are cached,
+ but that may change.
+ --no-cache-dir Disable filesystem caching
+ --rm-cache-dir Delete all filesystem cache files
+
+ Thumbnail Options:
+ --write-thumbnail Write thumbnail image to disk
+ --write-all-thumbnails Write all thumbnail image formats to
+ disk
+
+ Verbosity / Simulation Options:
+ -q, --quiet Activate quiet mode
+ --no-warnings Ignore warnings
+ -s, --simulate Do not download the video and do not
+ write anything to disk
+ --skip-download Do not download the video
+ -g, --get-url Simulate, quiet but print URL
+ -e, --get-title Simulate, quiet but print title
+ --get-id Simulate, quiet but print id
+ --get-thumbnail Simulate, quiet but print thumbnail URL
+ --get-description Simulate, quiet but print video
+ description
+ --get-duration Simulate, quiet but print video length
+ --get-filename Simulate, quiet but print output
+ filename
+ --get-format Simulate, quiet but print output format
+ -j, --dump-json Simulate, quiet but print JSON
+ information. See the "OUTPUT TEMPLATE"
+ for a description of available keys.
+ -J, --dump-single-json Simulate, quiet but print JSON
+ information for each command-line
+ argument. If the URL refers to a
+ playlist, dump the whole playlist
+ information in a single line.
+ --print-json Be quiet and print the video
+ information as JSON (video is still
+ being downloaded).
+ --newline Output progress bar as new lines
+ --no-progress Do not print progress bar
+ --console-title Display progress in console titlebar
+ -v, --verbose Print various debugging information
+ --dump-pages Print downloaded pages encoded using
+ base64 to debug problems (very verbose)
+ --write-pages Write downloaded intermediary pages to
+ files in the current directory to debug
+ problems
+ --print-traffic Display sent and read HTTP traffic
+ -C, --call-home Contact the youtube-dl server for
+ debugging
+ --no-call-home Do NOT contact the youtube-dl server
+ for debugging
+
+ Workarounds:
+ --encoding ENCODING Force the specified encoding
+ (experimental)
+ --no-check-certificate Suppress HTTPS certificate validation
+ --prefer-insecure Use an unencrypted connection to
+ retrieve information about the video.
+ (Currently supported only for YouTube)
+ --bidi-workaround Work around terminals that lack
+ bidirectional text support. Requires
+ bidiv or fribidi executable in PATH
+ --sleep-interval SECONDS Number of seconds to sleep before each
+ download when used alone or a lower
+ bound of a range for randomized sleep
+ before each download (minimum possible
+ number of seconds to sleep) when used
+ along with --max-sleep-interval.
+ --max-sleep-interval SECONDS Upper bound of a range for randomized
+ sleep before each download (maximum
+ possible number of seconds to sleep).
+ Must only be used along with --min-
+ sleep-interval.
+
+ Video Format Options:
+ -f, --format FORMAT Video format code, see the "FORMAT
+ SELECTION" for all the info
+ --all-formats Download all available video formats
+ --prefer-free-formats Prefer free video formats unless a
+ specific one is requested
+ -F, --list-formats List all available formats of requested
+ videos
+ --youtube-skip-dash-manifest Do not download the DASH manifests and
+ related data on YouTube videos
+ --merge-output-format FORMAT If a merge is required (e.g.
+ bestvideo+bestaudio), output to given
+ container format. One of mkv, mp4, ogg,
+ webm, flv. Ignored if no merge is
+ required
+
+ Subtitle Options:
+ --write-sub Write subtitle file
+ --write-auto-sub Write automatically generated subtitle
+ file (YouTube only)
+ --all-subs Download all the available subtitles of
+ the video
+ --list-subs List all available subtitles for the
+ video
+ --sub-format FORMAT Subtitle format, accepts formats
+ preference, for example: "srt" or
+ "ass/srt/best"
+ --sub-lang LANGS Languages of the subtitles to download
+ (optional) separated by commas, use
+ --list-subs for available language tags
+
+ Authentication Options:
+ -u, --username USERNAME Login with this account ID
+ -p, --password PASSWORD Account password. If this option is
+ left out, youtube-dl will ask
+ interactively.
+ -2, --twofactor TWOFACTOR Two-factor authentication code
+ -n, --netrc Use .netrc authentication data
+ --video-password PASSWORD Video password (vimeo, youku)
+
+ Adobe Pass Options:
+ --ap-mso MSO Adobe Pass multiple-system operator (TV
+ provider) identifier, use --ap-list-mso
+ for a list of available MSOs
+ --ap-username USERNAME Multiple-system operator account login
+ --ap-password PASSWORD Multiple-system operator account
+ password. If this option is left out,
+ youtube-dl will ask interactively.
+ --ap-list-mso List all supported multiple-system
+ operators
+
+ Post-processing Options:
+ --postprocessor-args ARGS Give these arguments to the
+ postprocessor
+ -k, --keep-video Keep the video file on disk after the
+ post-processing; the video is erased by
+ default
+ --prefer-avconv Prefer avconv over ffmpeg for running
+ the postprocessors
+ --prefer-ffmpeg Prefer ffmpeg over avconv for running
+ the postprocessors (default)
+ --ffmpeg-location PATH Location of the ffmpeg/avconv binary;
+ either the path to the binary or its
+ containing directory.
+ --exec CMD Execute a command on the file after
+ downloading and post-processing,
+ similar to find's -exec syntax.
+ Example: --exec 'adb push {}
+ /sdcard/Music/ && rm {}'
+ --convert-subs FORMAT Convert the subtitles to other format
+ (currently supported: srt|ass|vtt|lrc)
+
+'''