diff --git a/.coveragerc b/.coveragerc deleted file mode 100644 index 303d855a..00000000 --- a/.coveragerc +++ /dev/null @@ -1,5 +0,0 @@ -[run] -include = - agate/* -omit = - agate/csv_py2.py diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 2e69c6ca..c36c622d 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -22,7 +22,7 @@ Contributors should use the following roadmap to guide them through the process 1. Fork the project on [GitHub]. 2. Check out the [issue tracker] and find a task that needs to be done and is of a scope you can realistically expect to complete in a few days. Don’t worry about the priority of the issues at first, but try to choose something you’ll enjoy. You’re much more likely to finish something to the point it can be merged if it’s something you really enjoy hacking on. 3. Comment on the ticket letting everyone know you’re going to be hacking on it so that nobody duplicates your effort. It’s also good practice to provide some general idea of how you plan on resolving the issue so that other developers can make suggestions. -4. Write tests for the feature you’re building. Follow the format of the existing tests in the test directory to see how this works. You can run all the tests with the command `nosetests tests`. (Or `tox` to run across all supported versions of Python.) +4. Write tests for the feature you’re building. Follow the format of the existing tests in the test directory to see how this works. You can run all the tests with the command `pytest`. 5. Write the code. Try to stay consistent with the style and organization of the existing codebase. A good patch won’t be refused for stylistic reasons, but large parts of it may be rewritten and nobody wants that. 6. As you are coding, periodically merge in work from the master branch and verify you haven’t broken anything by running the test suite. 7. Write documentation. Seriously. @@ -35,8 +35,8 @@ Legalese To the extent that they care, contributors should keep in mind that the source of agate and therefore of any contributions are licensed under the permissive [MIT license]. By submitting a patch or pull request you are agreeing to release your code under this license. 
You will be acknowledged in the AUTHORS list, the commit history and the hearts and minds of jo - [numpy]: http://www.numpy.org/ - [pandas]: http://pandas.pydata.org/ + [numpy]: https://numpy.org/ + [pandas]: https://pandas.pydata.org/ [GitHub]: https://github.com/wireservice/agate [issue tracker]: https://github.com/wireservice/agate/issues - [MIT license]: http://www.opensource.org/licenses/mit-license.php + [MIT license]: https://opensource.org/license/mit/ diff --git a/.github/dependabot.yml b/.github/dependabot.yml new file mode 100644 index 00000000..12301490 --- /dev/null +++ b/.github/dependabot.yml @@ -0,0 +1,6 @@ +version: 2 +updates: + - package-ecosystem: "github-actions" + directory: "/" + schedule: + interval: "daily" diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 00000000..2ee35ea3 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,35 @@ +name: CI +on: [push, pull_request] +jobs: + build: + if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name != github.repository + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [macos-latest, windows-latest, ubuntu-latest] + python-version: [3.8, 3.9, '3.10', '3.11', '3.12', pypy-3.9] + steps: + - if: matrix.os == 'ubuntu-latest' + name: Install UTF-8 locales and lxml requirements + run: | + sudo apt install libxml2-dev libxslt-dev + sudo locale-gen de_DE.UTF-8 + sudo locale-gen en_US.UTF-8 + sudo locale-gen ko_KR.UTF-8 + sudo update-locale + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + cache: pip + cache-dependency-path: setup.py + - run: pip install .[test] coveralls + - env: + LANG: en_US.UTF-8 + PYTHONIOENCODING: utf-8 + PYTHONUTF8: 1 + run: pytest --cov agate + - run: python charts.py + - env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: coveralls --service=github diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml new file mode 100644 index 00000000..e4a09a27 --- /dev/null +++ b/.github/workflows/lint.yml @@ -0,0 +1,17 @@ +name: Lint +on: [push, pull_request] +jobs: + build: + if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name != github.repository + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: '3.10' + cache: pip + cache-dependency-path: setup.py + - run: pip install --upgrade check-manifest flake8 isort setuptools + - run: check-manifest + - run: flake8 . + - run: isort . 
--check-only diff --git a/.github/workflows/pypi.yml b/.github/workflows/pypi.yml new file mode 100644 index 00000000..1e9ee54d --- /dev/null +++ b/.github/workflows/pypi.yml @@ -0,0 +1,23 @@ +name: Publish to PyPI +on: push +jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: '3.10' + - run: pip install --upgrade build + - run: python -m build --sdist --wheel + - name: Publish to TestPyPI + uses: pypa/gh-action-pypi-publish@release/v1 + with: + password: ${{ secrets.TEST_PYPI_API_TOKEN }} + repository-url: https://test.pypi.org/legacy/ + skip-existing: true + - name: Publish to PyPI + if: startsWith(github.ref, 'refs/tags') + uses: pypa/gh-action-pypi-publish@release/v1 + with: + password: ${{ secrets.PYPI_API_TOKEN }} diff --git a/.gitignore b/.gitignore index 00314608..bdd410c9 100644 --- a/.gitignore +++ b/.gitignore @@ -2,7 +2,6 @@ *.pyc *.swp *.swo -.tox *.egg-info docs/_build dist diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml new file mode 100644 index 00000000..bc671142 --- /dev/null +++ b/.pre-commit-config.yaml @@ -0,0 +1,13 @@ +repos: + - repo: https://github.com/pycqa/flake8 + rev: 3.9.2 + hooks: + - id: flake8 + - repo: https://github.com/pycqa/isort + rev: 5.8.0 + hooks: + - id: isort + - repo: https://github.com/mgedmin/check-manifest + rev: "0.46" + hooks: + - id: check-manifest diff --git a/.readthedocs.yaml b/.readthedocs.yaml new file mode 100644 index 00000000..8ea94712 --- /dev/null +++ b/.readthedocs.yaml @@ -0,0 +1,11 @@ +version: 2 +build: + os: ubuntu-20.04 + tools: + python: "3.9" +python: + install: + - path: . + - requirements: docs/requirements.txt +sphinx: + fail_on_warning: true diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 74b429f3..00000000 --- a/.travis.yml +++ /dev/null @@ -1,133 +0,0 @@ -language: python -os: linux -python: - - "3.8" - - "3.7" - - "3.6" - - "3.5" - - "2.7" - - "pypy3" - - "pypy3.5-6.0" - - "pypy3.5-7.0" - - "pypy3.6-7.0.0" - - "pypy" - - "pypy2.7-6.0" - - "pypy2.7-7.0.0" -jobs: - include: - - os: osx - python: "3.7" - osx_image: xcode11.2 # Python 3.7.4 running on macOS 10.14.4 - language: shell # 'language: python' is an error on Travis CI macOS - before_install: - - brew install pkg-config - - brew install icu4c - - export PATH="$PATH:/usr/local/opt/icu4c/bin" - - export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:/usr/local/opt/icu4c/lib/pkgconfig" - - which uconv - - uconv -V - - export ICU_VERSION="$(uconv -V | sed -e 's,.*\ - if [[ "$TRAVIS_PYTHON_VERSION" == "2"* ]] || [[ "$TRAVIS_PYTHON_VERSION" == "pypy"* ]] && [[ "$TRAVIS_PYTHON_VERSION" != "pypy3"* ]]; then - pip install -r requirements-py2.txt; - else - pip3 install -r requirements-py3.txt; - fi -# command to run tests -script: - # pypy2 and pypy3 segfault on Travis CI if running all tests in the same process - - > - if [[ "$TRAVIS_PYTHON_VERSION" == "pypy" ]]; then - nosetests --collect-only -v tests 2>&1 \ - | grep -e 'ok$' \ - | while read func class etc; do - class="${class//[()]/}"; - class="${class%.*}:${class##*.}"; - nosetests -v "$class.$func"; - done || ( echo "$s" >> "script-failures.log" ); - if [ -e "script-failures.log" ]; then - exit 1; - fi; - elif [[ "$TRAVIS_PYTHON_VERSION" == "pypy3" ]]; then - find tests -type f -name "*.py" | while read s; do - ( [ ! 
-x "$s" ] && nosetests --no-byte-compile -s -v "$s" ) || ( echo "$s" >> "script-failures.log" ); - done; - if [ -e "script-failures.log" ]; then - exit 1; - fi; - else - nosetests --no-byte-compile --with-coverage tests; - fi -after_failure: - - > - if [ -e "script-failures.log" ]; then - echo $(cat "script-failures.log"); - fi -addons: - apt: - packages: - - language-pack-fr - - language-pack-de - - language-pack-ko - - pkg-config diff --git a/AUTHORS.rst b/AUTHORS.rst index 28091202..ca5e1b32 100644 --- a/AUTHORS.rst +++ b/AUTHORS.rst @@ -37,8 +37,14 @@ agate is made by a community. The following individuals have contributed code, d * `Neil MartinsenBurrell `_ * `Aliaksei Urbanski `_ * `Forest Gregg `_ +* `Robert Schütz `_ * `Wouter de Vries `_ * `Kartik Agaram `_ * `Loïc Corbasson `_ -* `Robert Schütz `_ * `Danny Sepler `_ +* `brian-from-quantrocket `_ +* `mathdesc `_ +* `Tim Gates `_ +* `castorf `_ +* `Julien Enselme `__ + diff --git a/CHANGELOG.rst b/CHANGELOG.rst index edc9aad9..705cbe39 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -1,21 +1,70 @@ -1.6.2 - Unreleased ------------------- +1.9.1 - December 21, 2023 +------------------------- + +* Add Babel 2.14 support. + +1.9.0 - October 17, 2023 +------------------------ + +* feat: Add a ``text_truncation_chars`` configuration for values that exceed ``max_column_width`` in :meth:`.Table.print_table` and :meth:`.Table.print_html`. +* feat: Add a ``number_truncation_chars`` configuration for values that exceed ``max_precision`` in :meth:`.Table.print_table` and :meth:`.Table.print_html`. + +1.8.0 - October 10, 2023 +------------------------ + +* feat: Lowercase the ``null_values`` provided to individual data types, since all comparisons to ``null_values`` are case-insensitive. (#770) +* feat: :class:`.Mean` works with :class:`.TimeDelta`. (#761) +* Switch from ``pytz`` to ``ZoneInfo``. +* Add Python 3.12 support. +* Drop Python 3.7 support (end-of-life was June 27, 2023). -* :meth:`.Date.__init__` and :meth:`.DateTime.__init__` accepts a ``locale`` keyword argument (e.g. :code:`en_US`) for parsing formatted dates. (#730) -* :meth:`.utils.max_precision` ignores infinity when calculating precision. (#726) -* :meth:`.Date.cast` catches ``OverflowError`` when type testing. (#720) -* :meth:`.Number.cast` casts ``True`` to ``1`` and ``False`` to ``0``. (#733) +1.7.1 - January 4, 2023 +----------------------- + +* Allow parsedatetime 2.6. + +1.7.0 - January 3, 2023 +----------------------- + +* Add Python 3.10 and 3.11 support. +* Drop support for Python 2.7 (EOL 2020-01-01), 3.6 (2021-12-23). + +1.6.3 - July 15, 2021 +--------------------- + +* feat: :meth:`.Table.from_csv` accepts a ``row_limit`` keyword argument. (#740) +* feat: :meth:`.Table.from_json` accepts an ``encoding`` keyword argument. (#734) +* feat: :meth:`.Table.print_html` accepts a ``max_precision`` keyword argument, like :meth:`.Table.print_table`. (#753) +* feat: :class:`.TypeTester` accepts a ``null_values`` keyword argument, like individual data types. (#745) +* feat: :class:`.Min`, :class:`.Max` and :class:`.Sum` (#735) work with :class:`.TimeDelta`. +* feat: :class:`.FieldSizeLimitError` includes the line number in the error message. (#681) +* feat: :class:`.csv.Sniffer` warns on error while sniffing CSV dialect. +* fix: :meth:`.Table.normalize` works with basic processing methods. (#691) +* fix: :meth:`.Table.homogenize` works with basic processing methods. (#756) +* fix: :meth:`.Table.homogenize` casts ``compare_values`` and ``default_row``. 
(#700) +* fix: :meth:`.Table.homogenize` accepts tuples. (#710) +* fix: :meth:`.TableSet.group_by` accepts input with no rows. (#703) +* fix: :class:`.TypeTester` warns if a column specified by the ``force`` argument is not in the table, instead of raising an error. (#747) +* fix: Aggregations return ``None`` if all values are ``None``, instead of raising an error. Note that ``Sum``, ``MaxLength`` and ``MaxPrecision`` continue to return ``0`` if all values are ``None``. (#706) +* fix: Ensure files are closed when errors occur. (#734) +* build: Make PyICU an optional dependency. +* Drop support for Python 3.4 (2019-03-18), 3.5 (2020-09-13). + +1.6.2 - March 10, 2021 +---------------------- + +* feat: :meth:`.Date.__init__` and :meth:`.DateTime.__init__` accepts a ``locale`` keyword argument (e.g. :code:`en_US`) for parsing formatted dates. (#730) +* feat: :meth:`.Number.cast` casts ``True`` to ``1`` and ``False`` to ``0``. (#733) +* fix: :meth:`.utils.max_precision` ignores infinity when calculating precision. (#726) +* fix: :meth:`.Date.cast` catches ``OverflowError`` when type testing. (#720) * Included examples in Python package. (#716) 1.6.1 - March 11, 2018 ---------------------- -* :meth:`.Date.cast` and :meth:`.DateTime.cast` will no longer parse strings that contain dates as dates. (#705) -* Added Forest Gregg to Authors. -* :meth:`.Table.to_json` can now use Decimal as keys. (#696) -* Link to tutorial now uses version through sphinx to avoid bad links on future releases. (#682) -* lxml limited to >= 3.6 and < 4 for pypy compatibility. - +* feat: :meth:`.Table.to_json` can use Decimal as keys. (#696) +* fix: :meth:`.Date.cast` and :meth:`.DateTime.cast` no longer parse non-date strings that contain date sub-strings as dates. (#705) +* docs: Link to tutorial now uses version through Sphinx to avoid bad links on future releases. (#682) 1.6.0 - February 28, 2017 ------------------------- @@ -71,7 +120,7 @@ This is a minor release fixing several small bugs that were blocking a downstrea 1.5.0 - November 16, 2016 ------------------------- -This release adds SVG charting via the `leather `_ charting library. Charts methods have been added for both :class:`.Table` and :class:`.TableSet`. (The latter create lattice plots.) See the revised tutorial and new cookbook entries for examples. Leather is still an early library. Please `report any bugs `_. +This release adds SVG charting via the `leather `_ charting library. Charts methods have been added for both :class:`.Table` and :class:`.TableSet`. (The latter create lattice plots.) See the revised tutorial and new cookbook entries for examples. Leather is still an early library. Please `report any bugs `_. Also in this release are a :class:`.Slugify` computation and a variety of small fixes and improvements. @@ -85,7 +134,7 @@ The complete list of changes is as follows: * Tables rendered by :meth:`.Table.print_table` are now GitHub Flavored Markdown (GFM) compatible. (#626) * The agate tutorial has been converted to a Jupyter Notebook. * :class:`.Table` now supports ``len`` as a proxy for ``len(table.rows)``. -* Simple SVG charting is now integrated via `leather `_. +* Simple SVG charting is now integrated via `leather `_. * Added :class:`.First` computation. (#634) * :meth:`.Table.print_table` now has a `max_precision` argument to limit Number precision. (#544) * Slug computation now accepts an array of column names to merge. (#617) @@ -334,7 +383,7 @@ This version of agate introduces three major changes. 
* :const:`.DEFAULT_NULL_VALUES` (the list of strings that mean null) is now importable from ``agate``. * :meth:`.Table.from_csv` and :meth:`.Table.to_csv` are now unicode-safe without separately importing csvkit. * ``agate`` can now be used as a drop-in replacement for Python's ``csv`` module. -* Migrated `csvkit `_'s unicode CSV reading/writing support into agate. (#354) +* Migrated `csvkit `_'s unicode CSV reading/writing support into agate. (#354) 1.0.1 - October 29, 2015 ------------------------ diff --git a/MANIFEST.in b/MANIFEST.in index 835552a8..72f9db54 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,4 +1,17 @@ -include CHANGELOG.rst COPYING -recursive-include docs * -recursive-include tests * -recursive-include examples * +include *.ipynb +include *.py +include *.rst +include COPYING +recursive-include benchmarks *.py +recursive-include docs *.py +recursive-include docs *.rst +recursive-include docs *.svg +recursive-include docs *.txt +recursive-include docs Makefile +recursive-include examples *.csv +recursive-include examples *.json +recursive-include examples testfixed +recursive-include tests *.py +exclude .pre-commit-config.yaml +exclude .readthedocs.yaml +global-exclude *.pyc diff --git a/README.rst b/README.rst index 54bc5795..321e333f 100644 --- a/README.rst +++ b/README.rst @@ -1,7 +1,15 @@ -.. image:: https://travis-ci.org/wireservice/agate.png - :target: https://travis-ci.org/wireservice/agate +.. image:: https://github.com/wireservice/agate/workflows/CI/badge.svg + :target: https://github.com/wireservice/agate/actions :alt: Build status +.. image:: https://coveralls.io/repos/wireservice/agate/badge.svg?branch=master + :target: https://coveralls.io/r/wireservice/agate + :alt: Coverage status + +.. image:: https://img.shields.io/pypi/dm/agate.svg + :target: https://pypi.python.org/pypi/agate + :alt: PyPI downloads + .. image:: https://img.shields.io/pypi/v/agate.svg :target: https://pypi.python.org/pypi/agate :alt: Version @@ -20,6 +28,6 @@ agate was previously known as journalism. 
Important links: -* Documentation: http://agate.rtfd.org +* Documentation: https://agate.rtfd.org * Repository: https://github.com/wireservice/agate * Issues: https://github.com/wireservice/agate/issues diff --git a/agate/__init__.py b/agate/__init__.py index cac0342f..77a01443 100644 --- a/agate/__init__.py +++ b/agate/__init__.py @@ -1,24 +1,16 @@ -#!/usr/bin/env python - -import six - +import agate.csv_py3 as csv from agate.aggregations import * -from agate.data_types import * -from agate.columns import Column # noqa +from agate.columns import Column from agate.computations import * -from agate.config import get_option, set_option, set_options # noqa +from agate.config import get_option, set_option, set_options +from agate.data_types import * from agate.exceptions import * -# import agate.fixed as fixed # noqa -from agate.mapped_sequence import MappedSequence # noqa -from agate.rows import Row # noqa -from agate.table import Table # noqa -from agate.tableset import TableSet # noqa -from agate.testcase import AgateTestCase # noqa -from agate.type_tester import TypeTester # noqa +# import agate.fixed as fixed +from agate.mapped_sequence import MappedSequence +from agate.rows import Row +from agate.table import Table +from agate.tableset import TableSet +from agate.testcase import AgateTestCase +from agate.type_tester import TypeTester from agate.utils import * -from agate.warns import NullCalculationWarning, DuplicateColumnWarning, warn_null_calculation, warn_duplicate_column # noqa - -if six.PY2: # pragma: no cover - import agate.csv_py2 as csv # noqa -else: - import agate.csv_py3 as csv # noqa +from agate.warns import DuplicateColumnWarning, NullCalculationWarning, warn_duplicate_column, warn_null_calculation diff --git a/agate/aggregations/__init__.py b/agate/aggregations/__init__.py index cf82a30f..6d2cc2f8 100644 --- a/agate/aggregations/__init__.py +++ b/agate/aggregations/__init__.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ Aggregations create a new value by summarizing a :class:`.Column`. For example, :class:`.Mean`, when applied to a column containing :class:`.Number` @@ -15,27 +13,26 @@ with a column for each aggregation and a row for each table in the set. 
""" -from agate.aggregations.base import Aggregation # noqa - -from agate.aggregations.all import All # noqa -from agate.aggregations.any import Any # noqa -from agate.aggregations.count import Count # noqa -from agate.aggregations.deciles import Deciles # noqa -from agate.aggregations.first import First # noqa -from agate.aggregations.has_nulls import HasNulls # noqa -from agate.aggregations.iqr import IQR # noqa -from agate.aggregations.mad import MAD # noqa -from agate.aggregations.max_length import MaxLength # noqa -from agate.aggregations.max_precision import MaxPrecision # noqa -from agate.aggregations.max import Max # noqa -from agate.aggregations.mean import Mean # noqa -from agate.aggregations.median import Median # noqa -from agate.aggregations.min import Min # noqa -from agate.aggregations.mode import Mode # noqa -from agate.aggregations.percentiles import Percentiles # noqa -from agate.aggregations.quartiles import Quartiles # noqa -from agate.aggregations.quintiles import Quintiles # noqa -from agate.aggregations.stdev import StDev, PopulationStDev # noqa -from agate.aggregations.sum import Sum # noqa -from agate.aggregations.summary import Summary # noqa -from agate.aggregations.variance import Variance, PopulationVariance # noqa +from agate.aggregations.all import All +from agate.aggregations.any import Any +from agate.aggregations.base import Aggregation +from agate.aggregations.count import Count +from agate.aggregations.deciles import Deciles +from agate.aggregations.first import First +from agate.aggregations.has_nulls import HasNulls +from agate.aggregations.iqr import IQR +from agate.aggregations.mad import MAD +from agate.aggregations.max import Max +from agate.aggregations.max_length import MaxLength +from agate.aggregations.max_precision import MaxPrecision +from agate.aggregations.mean import Mean +from agate.aggregations.median import Median +from agate.aggregations.min import Min +from agate.aggregations.mode import Mode +from agate.aggregations.percentiles import Percentiles +from agate.aggregations.quartiles import Quartiles +from agate.aggregations.quintiles import Quintiles +from agate.aggregations.stdev import PopulationStDev, StDev +from agate.aggregations.sum import Sum +from agate.aggregations.summary import Summary +from agate.aggregations.variance import PopulationVariance, Variance diff --git a/agate/aggregations/all.py b/agate/aggregations/all.py index 2a7f9290..0915e473 100644 --- a/agate/aggregations/all.py +++ b/agate/aggregations/all.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.data_types import Boolean @@ -27,7 +25,7 @@ def get_aggregate_data_type(self, table): return Boolean() def validate(self, table): - column = table.columns[self._column_name] + table.columns[self._column_name] def run(self, table): """ diff --git a/agate/aggregations/any.py b/agate/aggregations/any.py index 8cea92a7..de25db97 100644 --- a/agate/aggregations/any.py +++ b/agate/aggregations/any.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.data_types import Boolean @@ -27,7 +25,7 @@ def get_aggregate_data_type(self, table): return Boolean() def validate(self, table): - column = table.columns[self._column_name] + table.columns[self._column_name] def run(self, table): column = table.columns[self._column_name] diff --git a/agate/aggregations/base.py b/agate/aggregations/base.py index 4173ac61..307920f3 100644 --- a/agate/aggregations/base.py +++ 
b/agate/aggregations/base.py @@ -1,12 +1,7 @@ -#!/usr/bin/env python - -import six - from agate.exceptions import UnsupportedAggregationError -@six.python_2_unicode_compatible -class Aggregation(object): # pragma: no cover +class Aggregation: # pragma: no cover """ Aggregations create a new value by summarizing a :class:`.Column`. diff --git a/agate/aggregations/count.py b/agate/aggregations/count.py index e7c83917..7b6f8336 100644 --- a/agate/aggregations/count.py +++ b/agate/aggregations/count.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.data_types import Number from agate.utils import default @@ -31,7 +29,5 @@ def run(self, table): if self._column_name is not None: if self._value is not default: return table.columns[self._column_name].values().count(self._value) - else: - return len(table.columns[self._column_name].values_without_nulls()) - else: - return len(table.rows) + return len(table.columns[self._column_name].values_without_nulls()) + return len(table.rows) diff --git a/agate/aggregations/deciles.py b/agate/aggregations/deciles.py index 188ee5f7..7d63233c 100644 --- a/agate/aggregations/deciles.py +++ b/agate/aggregations/deciles.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.percentiles import Percentiles diff --git a/agate/aggregations/first.py b/agate/aggregations/first.py index 37e16950..df774b1e 100644 --- a/agate/aggregations/first.py +++ b/agate/aggregations/first.py @@ -1,7 +1,4 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation -from agate.data_types import Boolean class First(Aggregation): @@ -39,4 +36,4 @@ def run(self, table): if self._test is None: return data[0] - return next((d for d in data if self._test(d))) + return next(d for d in data if self._test(d)) diff --git a/agate/aggregations/has_nulls.py b/agate/aggregations/has_nulls.py index 6f464932..039f480a 100644 --- a/agate/aggregations/has_nulls.py +++ b/agate/aggregations/has_nulls.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.data_types import Boolean diff --git a/agate/aggregations/iqr.py b/agate/aggregations/iqr.py index 81d7c73a..12aa8c56 100644 --- a/agate/aggregations/iqr.py +++ b/agate/aggregations/iqr.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.percentiles import Percentiles @@ -36,4 +34,5 @@ def validate(self, table): def run(self, table): percentiles = self._percentiles.run(table) - return percentiles[75] - percentiles[25] + if percentiles[75] is not None and percentiles[25] is not None: + return percentiles[75] - percentiles[25] diff --git a/agate/aggregations/mad.py b/agate/aggregations/mad.py index 122db1b7..52958639 100644 --- a/agate/aggregations/mad.py +++ b/agate/aggregations/mad.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.median import Median @@ -11,7 +9,7 @@ class MAD(Aggregation): """ - Calculate the `median absolute deviation `_ + Calculate the `median absolute deviation `_ of a column. 
:param column_name: @@ -39,6 +37,6 @@ def run(self, table): column = table.columns[self._column_name] data = column.values_without_nulls_sorted() - m = self._median.run(table) - - return median(tuple(abs(n - m) for n in data)) + if data: + m = self._median.run(table) + return median(tuple(abs(n - m) for n in data)) diff --git a/agate/aggregations/max.py b/agate/aggregations/max.py index 11b7cec5..f769ee43 100644 --- a/agate/aggregations/max.py +++ b/agate/aggregations/max.py @@ -1,7 +1,5 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation -from agate.data_types import Date, DateTime, Number +from agate.data_types import Date, DateTime, Number, TimeDelta from agate.exceptions import DataTypeError @@ -21,16 +19,18 @@ def __init__(self, column_name): def get_aggregate_data_type(self, table): column = table.columns[self._column_name] - if isinstance(column.data_type, (Number, Date, DateTime)): + if isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)): return column.data_type def validate(self, table): column = table.columns[self._column_name] - if not isinstance(column.data_type, (Number, Date, DateTime)): + if not isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)): raise DataTypeError('Min can only be applied to columns containing DateTime, Date or Number data.') def run(self, table): column = table.columns[self._column_name] - return max(column.values_without_nulls()) + data = column.values_without_nulls() + if data: + return max(data) diff --git a/agate/aggregations/max_length.py b/agate/aggregations/max_length.py index ec9146b1..33595a80 100644 --- a/agate/aggregations/max_length.py +++ b/agate/aggregations/max_length.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from decimal import Decimal from agate.aggregations.base import Aggregation @@ -13,7 +11,7 @@ class MaxLength(Aggregation): Note: On Python 2.7 this function may miscalcuate the length of unicode strings that contain "wide characters". For details see this StackOverflow - answer: http://stackoverflow.com/a/35462951 + answer: https://stackoverflow.com/a/35462951 :param column_name: The name of a column containing :class:`.Text` data. 
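A quick sketch of the aggregation behaviour these hunks introduce (an all-null column now yields None instead of raising, and Max/Min/Mean accept TimeDelta columns); the table, column names and values below are invented for illustration:

    import datetime

    import agate

    rows = [
        (None, datetime.timedelta(minutes=30)),
        (None, datetime.timedelta(minutes=90)),
    ]
    table = agate.Table(rows, ['empty', 'duration'], [agate.Number(), agate.TimeDelta()])

    # An all-null column returns None rather than calling max() on an empty sequence.
    print(table.aggregate(agate.Max('empty')))      # None
    # Mean (like Max, Min and Sum) now accepts TimeDelta columns.
    print(table.aggregate(agate.Mean('duration')))  # 1:00:00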
diff --git a/agate/aggregations/max_precision.py b/agate/aggregations/max_precision.py index 2aa6e5ff..a2dc9095 100644 --- a/agate/aggregations/max_precision.py +++ b/agate/aggregations/max_precision.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.data_types import Number from agate.exceptions import DataTypeError diff --git a/agate/aggregations/mean.py b/agate/aggregations/mean.py index d2de20c9..9fe27dc9 100644 --- a/agate/aggregations/mean.py +++ b/agate/aggregations/mean.py @@ -1,9 +1,7 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.sum import Sum -from agate.data_types import Number +from agate.data_types import Number, TimeDelta from agate.exceptions import DataTypeError from agate.warns import warn_null_calculation @@ -20,13 +18,16 @@ def __init__(self, column_name): self._sum = Sum(column_name) def get_aggregate_data_type(self, table): - return Number() + column = table.columns[self._column_name] + + if isinstance(column.data_type, (Number, TimeDelta)): + return column.data_type def validate(self, table): column = table.columns[self._column_name] - if not isinstance(column.data_type, Number): - raise DataTypeError('Mean can only be applied to columns containing Number data.') + if not isinstance(column.data_type, (Number, TimeDelta)): + raise DataTypeError('Mean can only be applied to columns containing Number or TimeDelta data.') has_nulls = HasNulls(self._column_name).run(table) @@ -35,10 +36,7 @@ def validate(self, table): def run(self, table): column = table.columns[self._column_name] - num_of_values = len(column.values_without_nulls()) - # If there are no non-null columns then return null. 
- if num_of_values == 0: - return None - - sum_total = self._sum.run(table) - return sum_total / num_of_values + data = column.values_without_nulls() + if data: + sum_total = self._sum.run(table) + return sum_total / len(data) diff --git a/agate/aggregations/median.py b/agate/aggregations/median.py index 5abf4e02..aa68d28d 100644 --- a/agate/aggregations/median.py +++ b/agate/aggregations/median.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.percentiles import Percentiles diff --git a/agate/aggregations/min.py b/agate/aggregations/min.py index 974dfbda..2130739a 100644 --- a/agate/aggregations/min.py +++ b/agate/aggregations/min.py @@ -1,7 +1,5 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation -from agate.data_types import Date, DateTime, Number +from agate.data_types import Date, DateTime, Number, TimeDelta from agate.exceptions import DataTypeError @@ -21,16 +19,18 @@ def __init__(self, column_name): def get_aggregate_data_type(self, table): column = table.columns[self._column_name] - if isinstance(column.data_type, (Number, Date, DateTime)): + if isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)): return column.data_type def validate(self, table): column = table.columns[self._column_name] - if not isinstance(column.data_type, (Number, Date, DateTime)): + if not isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)): raise DataTypeError('Min can only be applied to columns containing DateTime, Date or Number data.') def run(self, table): column = table.columns[self._column_name] - return min(column.values_without_nulls()) + data = column.values_without_nulls() + if data: + return min(data) diff --git a/agate/aggregations/mode.py b/agate/aggregations/mode.py index d9aa1ac3..8ea50b5d 100644 --- a/agate/aggregations/mode.py +++ b/agate/aggregations/mode.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from collections import defaultdict from agate.aggregations.base import Aggregation @@ -37,9 +35,10 @@ def run(self, table): column = table.columns[self._column_name] data = column.values_without_nulls() - state = defaultdict(int) + if data: + state = defaultdict(int) - for n in data: - state[n] += 1 + for n in data: + state[n] += 1 - return max(state.keys(), key=lambda x: state[x]) + return max(state.keys(), key=lambda x: state[x]) diff --git a/agate/aggregations/percentiles.py b/agate/aggregations/percentiles.py index 56b6ec19..6fb48d0d 100644 --- a/agate/aggregations/percentiles.py +++ b/agate/aggregations/percentiles.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - import math from agate.aggregations.base import Aggregation @@ -51,6 +49,9 @@ def run(self, table): data = column.values_without_nulls_sorted() + if not data: + return Quantiles([None for percentile in range(101)]) + # Zeroth percentile is first datum quantiles = [data[0]] diff --git a/agate/aggregations/quartiles.py b/agate/aggregations/quartiles.py index 7025056f..103c4027 100644 --- a/agate/aggregations/quartiles.py +++ b/agate/aggregations/quartiles.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.percentiles import Percentiles diff --git a/agate/aggregations/quintiles.py b/agate/aggregations/quintiles.py index 05bed638..8ded98ee 100644 --- a/agate/aggregations/quintiles.py +++ b/agate/aggregations/quintiles.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - 
from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.percentiles import Percentiles diff --git a/agate/aggregations/stdev.py b/agate/aggregations/stdev.py index 74f18862..398d3272 100644 --- a/agate/aggregations/stdev.py +++ b/agate/aggregations/stdev.py @@ -1,8 +1,6 @@ -#!/usr/bin/env python - from agate.aggregations import Aggregation from agate.aggregations.has_nulls import HasNulls -from agate.aggregations.variance import Variance, PopulationVariance +from agate.aggregations.variance import PopulationVariance, Variance from agate.data_types import Number from agate.exceptions import DataTypeError from agate.warns import warn_null_calculation @@ -36,7 +34,9 @@ def validate(self, table): warn_null_calculation(self, column) def run(self, table): - return self._variance.run(table).sqrt() + variance = self._variance.run(table) + if variance is not None: + return variance.sqrt() class PopulationStDev(StDev): @@ -67,4 +67,6 @@ def validate(self, table): warn_null_calculation(self, column) def run(self, table): - return self._population_variance.run(table).sqrt() + variance = self._population_variance.run(table) + if variance is not None: + return variance.sqrt() diff --git a/agate/aggregations/sum.py b/agate/aggregations/sum.py index 0d45342a..7218317e 100644 --- a/agate/aggregations/sum.py +++ b/agate/aggregations/sum.py @@ -1,4 +1,3 @@ -#!/usr/bin/env python import datetime from agate.aggregations.base import Aggregation diff --git a/agate/aggregations/summary.py b/agate/aggregations/summary.py index 1ae26f24..e1ab4a43 100644 --- a/agate/aggregations/summary.py +++ b/agate/aggregations/summary.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation diff --git a/agate/aggregations/variance.py b/agate/aggregations/variance.py index 0dbc4d08..f81857eb 100644 --- a/agate/aggregations/variance.py +++ b/agate/aggregations/variance.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.base import Aggregation from agate.aggregations.has_nulls import HasNulls from agate.aggregations.mean import Mean @@ -39,9 +37,9 @@ def run(self, table): column = table.columns[self._column_name] data = column.values_without_nulls() - mean = self._mean.run(table) - - return sum((n - mean) ** 2 for n in data) / (len(data) - 1) + if data: + mean = self._mean.run(table) + return sum((n - mean) ** 2 for n in data) / (len(data) - 1) class PopulationVariance(Variance): @@ -75,6 +73,6 @@ def run(self, table): column = table.columns[self._column_name] data = column.values_without_nulls() - mean = self._mean.run(table) - - return sum((n - mean) ** 2 for n in data) / len(data) + if data: + mean = self._mean.run(table) + return sum((n - mean) ** 2 for n in data) / len(data) diff --git a/agate/columns.py b/agate/columns.py index 7e556c3e..7aa8e365 100644 --- a/agate/columns.py +++ b/agate/columns.py @@ -1,20 +1,13 @@ -#!/usr/bin/env python - """ This module contains the :class:`Column` class, which defines a "vertical" array of tabular data. Whereas :class:`.Row` instances are independent of their parent :class:`.Table`, columns depend on knowledge of both their position in the parent (column name, data type) as well as the rows that contain their data. 
""" -import six from agate.mapped_sequence import MappedSequence from agate.utils import NullOrder, memoize -if six.PY3: # pragma: no cover - # pylint: disable=W0622 - xrange = range - def null_handler(k): """ diff --git a/agate/computations/__init__.py b/agate/computations/__init__.py index 14bdf52e..f8232faf 100644 --- a/agate/computations/__init__.py +++ b/agate/computations/__init__.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ Computations create a new value for each :class:`.Row` in a :class:`.Table`. When used with :meth:`.Table.compute` these new values become a new column. @@ -13,12 +11,11 @@ class by inheriting from :class:`Computation`. """ -from agate.computations.base import Computation # noqa - -from agate.computations.formula import Formula # noqa -from agate.computations.change import Change # noqa -from agate.computations.percent import Percent # noqa -from agate.computations.percent_change import PercentChange # noqa -from agate.computations.rank import Rank # noqa -from agate.computations.percentile_rank import PercentileRank # noqa -from agate.computations.slug import Slug # noqa +from agate.computations.base import Computation +from agate.computations.change import Change +from agate.computations.formula import Formula +from agate.computations.percent import Percent +from agate.computations.percent_change import PercentChange +from agate.computations.percentile_rank import PercentileRank +from agate.computations.rank import Rank +from agate.computations.slug import Slug diff --git a/agate/computations/base.py b/agate/computations/base.py index 6ea05182..59b27dca 100644 --- a/agate/computations/base.py +++ b/agate/computations/base.py @@ -1,10 +1,4 @@ -#!/usr/bin/env python - -import six - - -@six.python_2_unicode_compatible -class Computation(object): # pragma: no cover +class Computation: # pragma: no cover """ Computations produce a new column by performing a calculation on each row. 
diff --git a/agate/computations/change.py b/agate/computations/change.py index 4925f67b..ea03fe3f 100644 --- a/agate/computations/change.py +++ b/agate/computations/change.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.has_nulls import HasNulls from agate.computations.base import Computation from agate.data_types import Date, DateTime, Number, TimeDelta @@ -29,7 +27,7 @@ def get_computed_data_type(self, table): if isinstance(before_column.data_type, (Date, DateTime, TimeDelta)): return TimeDelta() - elif isinstance(before_column.data_type, Number): + if isinstance(before_column.data_type, Number): return Number() def validate(self, table): @@ -49,7 +47,8 @@ def validate(self, table): return - raise DataTypeError('Change before and after columns must both contain data that is one of: Number, Date, DateTime or TimeDelta.') + raise DataTypeError('Change before and after columns must both contain data that is one of: ' + 'Number, Date, DateTime or TimeDelta.') def run(self, table): new_column = [] diff --git a/agate/computations/formula.py b/agate/computations/formula.py index 3a0f947b..3f5d66de 100644 --- a/agate/computations/formula.py +++ b/agate/computations/formula.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.computations.base import Computation diff --git a/agate/computations/percent.py b/agate/computations/percent.py index 422ba4bd..3fb440e0 100644 --- a/agate/computations/percent.py +++ b/agate/computations/percent.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python - - from agate.aggregations.has_nulls import HasNulls from agate.aggregations.sum import Sum from agate.computations.base import Computation diff --git a/agate/computations/percent_change.py b/agate/computations/percent_change.py index 8c287941..49f905ee 100644 --- a/agate/computations/percent_change.py +++ b/agate/computations/percent_change.py @@ -1,8 +1,5 @@ -#!/usr/bin/env python - from agate.aggregations.has_nulls import HasNulls from agate.computations.base import Computation - from agate.data_types import Number from agate.exceptions import DataTypeError from agate.warns import warn_null_calculation diff --git a/agate/computations/percentile_rank.py b/agate/computations/percentile_rank.py index e3c912ef..667c7064 100644 --- a/agate/computations/percentile_rank.py +++ b/agate/computations/percentile_rank.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate.aggregations.percentiles import Percentiles from agate.computations.rank import Rank from agate.data_types import Number diff --git a/agate/computations/rank.py b/agate/computations/rank.py index 46cec5c9..e525b94f 100644 --- a/agate/computations/rank.py +++ b/agate/computations/rank.py @@ -1,11 +1,5 @@ -#!/usr/bin/env python - from decimal import Decimal - -import six - -if six.PY3: - from functools import cmp_to_key +from functools import cmp_to_key from agate.computations.base import Computation from agate.data_types import Number @@ -44,10 +38,7 @@ def run(self, table): column = table.columns[self._column_name] if self._comparer: - if six.PY3: - data_sorted = sorted(column.values(), key=cmp_to_key(self._comparer)) - else: # pragma: no cover - data_sorted = sorted(column.values(), cmp=self._comparer) + data_sorted = sorted(column.values(), key=cmp_to_key(self._comparer)) else: data_sorted = column.values_sorted() diff --git a/agate/computations/slug.py b/agate/computations/slug.py index b6780e68..d8b3339f 100644 --- a/agate/computations/slug.py +++ b/agate/computations/slug.py @@ -1,10 +1,8 @@ -#!/usr/bin/env python - from 
agate.aggregations.has_nulls import HasNulls from agate.computations.base import Computation from agate.data_types import Text from agate.exceptions import DataTypeError -from agate.utils import slugify, issequence +from agate.utils import issequence, slugify class Slug(Computation): diff --git a/agate/config.py b/agate/config.py index f3c11dbb..f79ee875 100644 --- a/agate/config.py +++ b/agate/config.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - """ This module contains the global configuration for agate. Users should use :meth:`get_option` and :meth:`set_option` to modify the global @@ -13,47 +10,54 @@ +=========================+==========================================+=========================================+ | default_locale | Default locale for number formatting | default_locale('LC_NUMERIC') or 'en_US' | +-------------------------+------------------------------------------+-----------------------------------------+ -| horizontal_line_char | Character to render for horizontal lines | u'-' | +| horizontal_line_char | Character to render for horizontal lines | '-' | ++-------------------------+------------------------------------------+-----------------------------------------+ +| vertical_line_char | Character to render for vertical lines | '|' | +-------------------------+------------------------------------------+-----------------------------------------+ -| vertical_line_char | Character to render for vertical lines | u'|' | +| bar_char | Character to render for bar chart units | '░' | +-------------------------+------------------------------------------+-----------------------------------------+ -| bar_char | Character to render for bar chart units | u'░' | +| printable_bar_char | Printable character for bar chart units | ':' | +-------------------------+------------------------------------------+-----------------------------------------+ -| printable_bar_char | Printable character for bar chart units | u':' | +| zero_line_char | Character to render for zero line units | '▓' | +-------------------------+------------------------------------------+-----------------------------------------+ -| zero_line_char | Character to render for zero line units | u'▓' | +| printable_zero_line_char| Printable character for zero line units | '|' | +-------------------------+------------------------------------------+-----------------------------------------+ -| printable_zero_line_char| Printable character for zero line units | u'|' | +| tick_char | Character to render for axis ticks | '+' | +-------------------------+------------------------------------------+-----------------------------------------+ -| tick_char | Character to render for axis ticks | u'+' | +| ellipsis_chars | Characters to render for ellipsis | '...' | +-------------------------+------------------------------------------+-----------------------------------------+ -| ellipsis_chars | Characters to render for ellipsis | u'...' | +| text_truncation_chars | Characters for truncated text values | '...' 
| ++-------------------------+------------------------------------------+-----------------------------------------+ +| number_truncation_chars | Characters for truncated number values | '…' | +-------------------------+------------------------------------------+-----------------------------------------+ """ from babel.core import default_locale - _options = { #: Default locale for number formatting 'default_locale': default_locale('LC_NUMERIC') or 'en_US', #: Character to render for horizontal lines - 'horizontal_line_char': u'-', + 'horizontal_line_char': '-', #: Character to render for vertical lines - 'vertical_line_char': u'|', + 'vertical_line_char': '|', #: Character to render for bar chart units - 'bar_char': u'░', + 'bar_char': '░', #: Printable character to render for bar chart units - 'printable_bar_char': u':', + 'printable_bar_char': ':', #: Character to render for zero line units - 'zero_line_char': u'▓', + 'zero_line_char': '▓', #: Printable character to render for zero line units - 'printable_zero_line_char': u'|', + 'printable_zero_line_char': '|', #: Character to render for axis ticks - 'tick_char': u'+', + 'tick_char': '+', #: Characters to render for ellipsis - 'ellipsis_chars': u'...', + 'ellipsis_chars': '...', + #: Characters for truncated text values + 'text_truncation_chars': '...', + #: Characters for truncated number values + 'number_truncation_chars': '…', } diff --git a/agate/csv_py2.py b/agate/csv_py2.py deleted file mode 100644 index 034c5877..00000000 --- a/agate/csv_py2.py +++ /dev/null @@ -1,270 +0,0 @@ -#!/usr/bin/env python - -""" -This module contains the Python 2 replacement for :mod:`csv`. -""" - -import codecs -import csv - -import six - -from agate.exceptions import FieldSizeLimitError - -EIGHT_BIT_ENCODINGS = [ - 'utf-8', 'u8', 'utf', 'utf8', - 'latin-1', 'iso-8859-1', 'iso8859-1', '8859', 'cp819', 'latin', 'latin1', 'l1' -] - -POSSIBLE_DELIMITERS = [',', '\t', ';', ' ', ':', '|'] - - -class UTF8Recoder(six.Iterator): - """ - Iterator that reads an encoded stream and reencodes the input to UTF-8. - """ - def __init__(self, f, encoding): - self.reader = codecs.getreader(encoding)(f) - - def __iter__(self): - return self - - def __next__(self): - return next(self.reader).encode('utf-8') - - -class UnicodeReader(object): - """ - A CSV reader which will read rows from a file in a given encoding. - """ - def __init__(self, f, encoding='utf-8', field_size_limit=None, line_numbers=False, header=True, **kwargs): - self.line_numbers = line_numbers - self.header = header - - f = UTF8Recoder(f, encoding) - - self.reader = csv.reader(f, **kwargs) - - if field_size_limit: - csv.field_size_limit(field_size_limit) - - def next(self): - try: - row = next(self.reader) - except csv.Error as e: - # Terrible way to test for this exception, but there is no subclass - if 'field larger than field limit' in str(e): - raise FieldSizeLimitError(csv.field_size_limit()) - else: - raise e - - if self.line_numbers: - if self.header and self.line_num == 1: - row.insert(0, 'line_numbers') - else: - row.insert(0, str(self.line_num - 1 if self.header else self.line_num)) - - return [six.text_type(s, 'utf-8') for s in row] - - def __iter__(self): - return self - - @property - def dialect(self): - return self.reader.dialect - - @property - def line_num(self): - return self.reader.line_num - - -class UnicodeWriter(object): - """ - A CSV writer which will write rows to a file in the specified encoding. - - NB: Optimized so that eight-bit encodings skip re-encoding. 
See: - https://github.com/wireservice/csvkit/issues/175 - """ - def __init__(self, f, encoding='utf-8', **kwargs): - self.encoding = encoding - self._eight_bit = (self.encoding.lower().replace('_', '-') in EIGHT_BIT_ENCODINGS) - - if self._eight_bit: - self.writer = csv.writer(f, **kwargs) - else: - # Redirect output to a queue for reencoding - self.queue = six.StringIO() - self.writer = csv.writer(self.queue, **kwargs) - self.stream = f - self.encoder = codecs.getincrementalencoder(encoding)() - - def writerow(self, row): - if self._eight_bit: - self.writer.writerow([six.text_type(s if s is not None else '').encode(self.encoding) for s in row]) - else: - self.writer.writerow([six.text_type(s if s is not None else '').encode('utf-8') for s in row]) - # Fetch UTF-8 output from the queue... - data = self.queue.getvalue() - data = data.decode('utf-8') - # ...and reencode it into the target encoding - data = self.encoder.encode(data) - # write to the file - self.stream.write(data) - # empty the queue - self.queue.truncate(0) - - def writerows(self, rows): - for row in rows: - self.writerow(row) - - -class UnicodeDictReader(csv.DictReader): - """ - Defer almost all implementation to :class:`csv.DictReader`, but wraps our - unicode reader instead of :func:`csv.reader`. - """ - def __init__(self, f, fieldnames=None, restkey=None, restval=None, *args, **kwargs): - reader = UnicodeReader(f, *args, **kwargs) - - if 'encoding' in kwargs: - kwargs.pop('encoding') - - csv.DictReader.__init__(self, f, fieldnames, restkey, restval, *args, **kwargs) - - self.reader = reader - - -class UnicodeDictWriter(csv.DictWriter): - """ - Defer almost all implementation to :class:`csv.DictWriter`, but wraps our - unicode writer instead of :func:`csv.writer`. - """ - def __init__(self, f, fieldnames, restval='', extrasaction='raise', *args, **kwds): - self.fieldnames = fieldnames - self.restval = restval - - if extrasaction.lower() not in ('raise', 'ignore'): - raise ValueError('extrasaction (%s) must be "raise" or "ignore"' % extrasaction) - - self.extrasaction = extrasaction - - self.writer = UnicodeWriter(f, *args, **kwds) - - -class Reader(UnicodeReader): - """ - A unicode-aware CSV reader. - """ - pass - - -class Writer(UnicodeWriter): - """ - A unicode-aware CSV writer. - """ - def __init__(self, f, encoding='utf-8', line_numbers=False, **kwargs): - self.row_count = 0 - self.line_numbers = line_numbers - - if 'lineterminator' not in kwargs: - kwargs['lineterminator'] = '\n' - - UnicodeWriter.__init__(self, f, encoding, **kwargs) - - def _append_line_number(self, row): - if self.row_count == 0: - row.insert(0, 'line_number') - else: - row.insert(0, self.row_count) - - self.row_count += 1 - - def writerow(self, row): - if self.line_numbers: - row = list(row) - self._append_line_number(row) - - # Convert embedded Mac line endings to unix style line endings so they get quoted - row = [i.replace('\r', '\n') if isinstance(i, six.string_types) else i for i in row] - - UnicodeWriter.writerow(self, row) - - def writerows(self, rows): - for row in rows: - self.writerow(row) - - -class DictReader(UnicodeDictReader): - """ - A unicode-aware CSV DictReader. - """ - pass - - -class DictWriter(UnicodeDictWriter): - """ - A unicode-aware CSV DictWriter. 
- """ - def __init__(self, f, fieldnames, encoding='utf-8', line_numbers=False, **kwargs): - self.row_count = 0 - self.line_numbers = line_numbers - - if 'lineterminator' not in kwargs: - kwargs['lineterminator'] = '\n' - - UnicodeDictWriter.__init__(self, f, fieldnames, encoding=encoding, **kwargs) - - def _append_line_number(self, row): - if self.row_count == 0: - row['line_number'] = 0 - else: - row['line_number'] = self.row_count - - self.row_count += 1 - - def writerow(self, row): - if self.line_numbers: - row = list(row) - self._append_line_number(row) - - # Convert embedded Mac line endings to unix style line endings so they get quoted - row = dict([(k, v.replace('\r', '\n')) if isinstance(v, basestring) else (k, v) for k, v in row.items()]) - - UnicodeDictWriter.writerow(self, row) - - def writerows(self, rows): - for row in rows: - self.writerow(row) - - -class Sniffer(object): - """ - A functional wrapper of ``csv.Sniffer()``. - """ - def sniff(self, sample): - """ - A functional version of ``csv.Sniffer().sniff``, that extends the - list of possible delimiters to include some seen in the wild. - """ - try: - dialect = csv.Sniffer().sniff(sample, POSSIBLE_DELIMITERS) - except: - dialect = None - - return dialect - - -def reader(*args, **kwargs): - """ - A replacement for Python's :func:`csv.reader` that uses - :class:`.csv_py2.Reader`. - """ - return Reader(*args, **kwargs) - - -def writer(*args, **kwargs): - """ - A replacement for Python's :func:`csv.writer` that uses - :class:`.csv_py2.Writer`. - """ - return Writer(*args, **kwargs) diff --git a/agate/csv_py3.py b/agate/csv_py3.py index 66f5b925..a408924c 100644 --- a/agate/csv_py3.py +++ b/agate/csv_py3.py @@ -1,19 +1,16 @@ -#!/usr/bin/env python - """ This module contains the Python 3 replacement for :mod:`csv`. """ import csv - -import six +import warnings from agate.exceptions import FieldSizeLimitError POSSIBLE_DELIMITERS = [',', '\t', ';', ' ', ':', '|'] -class Reader(six.Iterator): +class Reader: """ A wrapper around Python 3's builtin :func:`csv.reader`. """ @@ -35,20 +32,20 @@ def __next__(self): except csv.Error as e: # Terrible way to test for this exception, but there is no subclass if 'field larger than field limit' in str(e): - raise FieldSizeLimitError(csv.field_size_limit()) + raise FieldSizeLimitError(csv.field_size_limit(), self.line_num) else: raise e if not self.line_numbers: return row - else: - if self.line_numbers: - if self.header and self.line_num == 1: - row.insert(0, 'line_numbers') - else: - row.insert(0, str(self.line_num - 1 if self.header else self.line_num)) - return row + if self.line_numbers: + if self.header and self.line_num == 1: + row.insert(0, 'line_numbers') + else: + row.insert(0, str(self.line_num - 1 if self.header else self.line_num)) + + return row @property def dialect(self): @@ -59,7 +56,7 @@ def line_num(self): return self.reader.line_num -class Writer(object): +class Writer: """ A wrapper around Python 3's builtin :func:`csv.writer`. 
""" @@ -86,7 +83,7 @@ def writerow(self, row): self._append_line_number(row) # Convert embedded Mac line endings to unix style line endings so they get quoted - row = [i.replace('\r', '\n') if isinstance(i, six.string_types) else i for i in row] + row = [i.replace('\r', '\n') if isinstance(i, str) else i for i in row] self.writer.writerow(row) @@ -128,7 +125,7 @@ def _append_line_number(self, row): def writerow(self, row): # Convert embedded Mac line endings to unix style line endings so they get quoted - row = dict([(k, v.replace('\r', '\n')) if isinstance(v, six.string_types) else (k, v) for k, v in row.items()]) + row = dict([(k, v.replace('\r', '\n')) if isinstance(v, str) else (k, v) for k, v in row.items()]) if self.line_numbers: self._append_line_number(row) @@ -140,7 +137,7 @@ def writerows(self, rows): self.writerow(row) -class Sniffer(object): +class Sniffer: """ A functional wrapper of ``csv.Sniffer()``. """ @@ -151,7 +148,8 @@ def sniff(self, sample): """ try: dialect = csv.Sniffer().sniff(sample, POSSIBLE_DELIMITERS) - except: + except csv.Error as e: + warnings.warn('Error sniffing CSV dialect: %s' % e, RuntimeWarning, stacklevel=2) dialect = None return dialect diff --git a/agate/data_types/__init__.py b/agate/data_types/__init__.py index 1a9cef43..0bb3d1d8 100644 --- a/agate/data_types/__init__.py +++ b/agate/data_types/__init__.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ Data types define how data should be imported during the creation of a :class:`.Table`. @@ -9,11 +7,11 @@ control how types are guessed. """ -from agate.data_types.base import DEFAULT_NULL_VALUES, DataType # noqa -from agate.data_types.boolean import Boolean, DEFAULT_TRUE_VALUES, DEFAULT_FALSE_VALUES # noqa -from agate.data_types.date import Date # noqa -from agate.data_types.date_time import DateTime # noqa -from agate.data_types.number import Number # noqa -from agate.data_types.text import Text # noqa -from agate.data_types.time_delta import TimeDelta # noqa -from agate.exceptions import CastError # noqa +from agate.data_types.base import DEFAULT_NULL_VALUES, DataType +from agate.data_types.boolean import DEFAULT_FALSE_VALUES, DEFAULT_TRUE_VALUES, Boolean +from agate.data_types.date import Date +from agate.data_types.date_time import DateTime +from agate.data_types.number import Number +from agate.data_types.text import Text +from agate.data_types.time_delta import TimeDelta +from agate.exceptions import CastError diff --git a/agate/data_types/base.py b/agate/data_types/base.py index de7eeb5a..951ab072 100644 --- a/agate/data_types/base.py +++ b/agate/data_types/base.py @@ -1,14 +1,10 @@ -#!/usr/bin/env python - -import six - from agate.exceptions import CastError #: Default values which will be automatically cast to :code:`None` DEFAULT_NULL_VALUES = ('', 'na', 'n/a', 'none', 'null', '.') -class DataType(object): # pragma: no cover +class DataType: # pragma: no cover """ Specifies how values should be parsed when creating a :class:`.Table`. @@ -16,7 +12,7 @@ class DataType(object): # pragma: no cover :code:`None` when encountered by this data type. 
""" def __init__(self, null_values=DEFAULT_NULL_VALUES): - self.null_values = null_values + self.null_values = [v.lower() for v in null_values] def test(self, d): """ @@ -45,7 +41,7 @@ def csvify(self, d): if d is None: return None - return six.text_type(d) + return str(d) def jsonify(self, d): """ @@ -54,4 +50,4 @@ def jsonify(self, d): if d is None: return None - return six.text_type(d) + return str(d) diff --git a/agate/data_types/boolean.py b/agate/data_types/boolean.py index ae890bd2..63137a3e 100644 --- a/agate/data_types/boolean.py +++ b/agate/data_types/boolean.py @@ -1,13 +1,6 @@ -#!/usr/bin/env python +from decimal import Decimal -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal - -import six - -from agate.data_types.base import DataType, DEFAULT_NULL_VALUES +from agate.data_types.base import DEFAULT_NULL_VALUES, DataType from agate.exceptions import CastError #: Default values which will be automatically cast to :code:`True`. @@ -29,8 +22,9 @@ class Boolean(DataType): :param false_values: A sequence of values which should be cast to :code:`False` when encountered with this type. """ - def __init__(self, true_values=DEFAULT_TRUE_VALUES, false_values=DEFAULT_FALSE_VALUES, null_values=DEFAULT_NULL_VALUES): - super(Boolean, self).__init__(null_values=null_values) + def __init__(self, true_values=DEFAULT_TRUE_VALUES, false_values=DEFAULT_FALSE_VALUES, + null_values=DEFAULT_NULL_VALUES): + super().__init__(null_values=null_values) self.true_values = true_values self.false_values = false_values @@ -44,23 +38,23 @@ def cast(self, d): """ if d is None: return d - elif type(d) is bool and type(d) is not int: + if type(d) is bool and type(d) is not int: return d - elif type(d) is int or isinstance(d, Decimal): + if type(d) is int or isinstance(d, Decimal): if d == 1: return True - elif d == 0: + if d == 0: return False - elif isinstance(d, six.string_types): + if isinstance(d, str): d = d.replace(',', '').strip() d_lower = d.lower() if d_lower in self.null_values: return None - elif d_lower in self.true_values: + if d_lower in self.true_values: return True - elif d_lower in self.false_values: + if d_lower in self.false_values: return False raise CastError('Can not convert value %s to bool.' % d) diff --git a/agate/data_types/date.py b/agate/data_types/date.py index 57e329b6..18d6e1e0 100644 --- a/agate/data_types/date.py +++ b/agate/data_types/date.py @@ -1,16 +1,11 @@ -#!/usr/bin/env python - +import locale from datetime import date, datetime, time -import isodate -import locale import parsedatetime -import six from agate.data_types.base import DataType from agate.exceptions import CastError - ZERO_DT = datetime.combine(date.min, time.min) @@ -26,12 +21,12 @@ class Date(DataType): for parsing formatted dates. """ def __init__(self, date_format=None, locale=None, **kwargs): - super(Date, self).__init__(**kwargs) + super().__init__(**kwargs) self.date_format = date_format self.locale = locale - self._constants = parsedatetime.Constants(localeID=self.locale, usePyICU=True) + self._constants = parsedatetime.Constants(localeID=self.locale) self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE) def __getstate__(self): @@ -51,7 +46,7 @@ def __setstate__(self, ndict): of the parsedatetime Calendar class. 
""" self.__dict__.update(ndict) - self._constants = parsedatetime.Constants(localeID=self.locale, usePyICU=True) + self._constants = parsedatetime.Constants(localeID=self.locale) self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE) def cast(self, d): @@ -61,12 +56,12 @@ def cast(self, d): If both `date_format` and `locale` have been specified in the `agate.Date` instance, the `cast()` function is not thread-safe. - :returns: - :class:`datetime.date` or :code:`None`. + :returns: :class:`datetime.date` or :code:`None`. """ if type(d) is date or d is None: return d - elif isinstance(d, six.string_types): + + if isinstance(d, str): d = d.strip() if d.lower() in self.null_values: @@ -82,7 +77,7 @@ def cast(self, d): try: dt = datetime.strptime(d, self.date_format) - except: + except (ValueError, TypeError): raise CastError('Value "%s" does not match date format.' % d) finally: if orig_locale: diff --git a/agate/data_types/date_time.py b/agate/data_types/date_time.py index eea6840c..71fad4fd 100644 --- a/agate/data_types/date_time.py +++ b/agate/data_types/date_time.py @@ -1,11 +1,8 @@ -#!/usr/bin/env python - import datetime +import locale import isodate -import locale import parsedatetime -import six from agate.data_types.base import DataType from agate.exceptions import CastError @@ -19,14 +16,13 @@ class DateTime(DataType): A formatting string for :meth:`datetime.datetime.strptime` to use instead of using regex-based parsing. :param timezone: - A `pytz `_ timezone to apply to each - parsed date. + A ``ZoneInfo`` timezone to apply to each parsed date. :param locale: A locale specification such as :code:`en_US` or :code:`de_DE` to use for parsing formatted datetimes. """ def __init__(self, datetime_format=None, timezone=None, locale=None, **kwargs): - super(DateTime, self).__init__(**kwargs) + super().__init__(**kwargs) self.datetime_format = datetime_format self.timezone = timezone @@ -36,7 +32,7 @@ def __init__(self, datetime_format=None, timezone=None, locale=None, **kwargs): self._source_time = datetime.datetime( now.year, now.month, now.day, 0, 0, 0, 0, None ) - self._constants = parsedatetime.Constants(localeID=self.locale, usePyICU=True) + self._constants = parsedatetime.Constants(localeID=self.locale) self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE) def __getstate__(self): @@ -56,7 +52,7 @@ def __setstate__(self, ndict): of the parsedatetime Calendar class. """ self.__dict__.update(ndict) - self._constants = parsedatetime.Constants(localeID=self.locale, usePyICU=True) + self._constants = parsedatetime.Constants(localeID=self.locale) self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE) def cast(self, d): @@ -66,14 +62,13 @@ def cast(self, d): If both `date_format` and `locale` have been specified in the `agate.DateTime` instance, the `cast()` function is not thread-safe. - :returns: - :class:`datetime.datetime` or :code:`None`. + :returns: :class:`datetime.datetime` or :code:`None`. 
""" if isinstance(d, datetime.datetime) or d is None: return d - elif isinstance(d, datetime.date): + if isinstance(d, datetime.date): return datetime.datetime.combine(d, datetime.time(0, 0, 0)) - elif isinstance(d, six.string_types): + if isinstance(d, str): d = d.strip() if d.lower() in self.null_values: @@ -89,7 +84,7 @@ def cast(self, d): try: dt = datetime.datetime.strptime(d, self.datetime_format) - except: + except (ValueError, TypeError): raise CastError('Value "%s" does not match date format.' % d) finally: if orig_locale: @@ -99,7 +94,7 @@ def cast(self, d): try: (_, _, _, _, matched_text), = self._parser.nlp(d, sourceTime=self._source_time) - except: + except Exception: matched_text = None else: value, ctx = self._parser.parseDT( @@ -110,14 +105,14 @@ def cast(self, d): if matched_text == d and ctx.hasDate and ctx.hasTime: return value - elif matched_text == d and ctx.hasDate and not ctx.hasTime: + if matched_text == d and ctx.hasDate and not ctx.hasTime: return datetime.datetime.combine(value.date(), datetime.time.min) try: dt = isodate.parse_datetime(d) return dt - except: + except Exception: pass raise CastError('Can not parse value "%s" as datetime.' % d) diff --git a/agate/data_types/number.py b/agate/data_types/number.py index a92b95ec..b6dbee99 100644 --- a/agate/data_types/number.py +++ b/agate/data_types/number.py @@ -1,21 +1,14 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -try: - from cdecimal import Decimal, InvalidOperation -except ImportError: # pragma: no cover - from decimal import Decimal, InvalidOperation - import warnings +from decimal import Decimal, InvalidOperation from babel.core import Locale -import six from agate.data_types.base import DataType from agate.exceptions import CastError -#: A list of currency symbols sourced from `Xe `_. -DEFAULT_CURRENCY_SYMBOLS = [u'؋', u'$', u'ƒ', u'៛', u'¥', u'₡', u'₱', u'£', u'€', u'¢', u'﷼', u'₪', u'₩', u'₭', u'₮', u'₦', u'฿', u'₤', u'₫'] +#: A list of currency symbols sourced from `Xe `_. +DEFAULT_CURRENCY_SYMBOLS = ['؋', '$', 'ƒ', '៛', '¥', '₡', '₱', '£', '€', '¢', '﷼', '₪', '₩', '₭', '₮', + '₦', '฿', '₤', '₫'] POSITIVE = Decimal('1') NEGATIVE = Decimal('-1') @@ -37,8 +30,9 @@ class Number(DataType): :param currency_symbols: A sequence of currency symbols to strip from numbers. """ - def __init__(self, locale='en_US', group_symbol=None, decimal_symbol=None, currency_symbols=DEFAULT_CURRENCY_SYMBOLS, **kwargs): - super(Number, self).__init__(**kwargs) + def __init__(self, locale='en_US', group_symbol=None, decimal_symbol=None, + currency_symbols=DEFAULT_CURRENCY_SYMBOLS, **kwargs): + super().__init__(**kwargs) self.locale = Locale.parse(locale) @@ -49,8 +43,11 @@ def __init__(self, locale='en_US', group_symbol=None, decimal_symbol=None, curre with warnings.catch_warnings(): warnings.simplefilter("ignore") - self.group_symbol = group_symbol or self.locale.number_symbols.get('group', ',') - self.decimal_symbol = decimal_symbol or self.locale.number_symbols.get('decimal', '.') + # Babel 2.14 support. 
+ # https://babel.pocoo.org/en/latest/changelog.html#possibly-backwards-incompatible-changes + number_symbols = self.locale.number_symbols.get('latn', self.locale.number_symbols) + self.group_symbol = group_symbol or number_symbols.get('group', ',') + self.decimal_symbol = decimal_symbol or number_symbols.get('decimal', '.') def cast(self, d): """ @@ -66,15 +63,13 @@ def cast(self, d): if t is int: return Decimal(d) - elif six.PY2 and t is long: - return Decimal(d) - elif t is float: + if t is float: return Decimal(repr(d)) - elif d is False: + if d is False: return Decimal(0) - elif d is True: + if d is True: return Decimal(1) - elif not isinstance(d, six.string_types): + if not isinstance(d, str): raise CastError('Can not parse value "%s" as Decimal.' % d) d = d.strip() diff --git a/agate/data_types/text.py b/agate/data_types/text.py index 6bd210ea..263d2157 100644 --- a/agate/data_types/text.py +++ b/agate/data_types/text.py @@ -1,7 +1,3 @@ -#!/usr/bin/env python - -import six - from agate.data_types.base import DataType @@ -14,7 +10,7 @@ class Text(DataType): converted to `None`. Disable to retain them as strings. """ def __init__(self, cast_nulls=True, **kwargs): - super(Text, self).__init__(**kwargs) + super().__init__(**kwargs) self.cast_nulls = cast_nulls @@ -29,8 +25,8 @@ def cast(self, d): """ if d is None: return d - elif isinstance(d, six.string_types): + if isinstance(d, str): if self.cast_nulls and d.strip().lower() in self.null_values: return None - return six.text_type(d) + return str(d) diff --git a/agate/data_types/time_delta.py b/agate/data_types/time_delta.py index a577a81b..0c0fc73d 100644 --- a/agate/data_types/time_delta.py +++ b/agate/data_types/time_delta.py @@ -1,9 +1,6 @@ -#!/usr/bin/env python - import datetime import pytimeparse -import six from agate.data_types.base import DataType from agate.exceptions import CastError @@ -24,7 +21,7 @@ def cast(self, d): """ if isinstance(d, datetime.timedelta) or d is None: return d - elif isinstance(d, six.string_types): + if isinstance(d, str): d = d.strip() if d.lower() in self.null_values: diff --git a/agate/exceptions.py b/agate/exceptions.py index 7fbfcc29..6bf4294f 100644 --- a/agate/exceptions.py +++ b/agate/exceptions.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ This module contains various exceptions raised by agate. """ @@ -34,7 +32,8 @@ class FieldSizeLimitError(Exception): # pragma: no cover This length may be the default or one set by the user. """ - def __init__(self, limit): - super(FieldSizeLimitError, self).__init__( - 'CSV contains fields longer than maximum length of %i characters. Try raising the maximum with the field_size_limit parameter, or try setting quoting=csv.QUOTE_NONE.' % limit + def __init__(self, limit, line_number): + super().__init__( + 'CSV contains a field longer than the maximum length of %i characters on line %i. Try raising the maximum ' + 'with the field_size_limit parameter, or try setting quoting=csv.QUOTE_NONE.' % (limit, line_number) ) diff --git a/agate/fixed.py b/agate/fixed.py index 944e0126..42fd89e7 100644 --- a/agate/fixed.py +++ b/agate/fixed.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ This module contains a generic parser for fixed-width files. It operates similar to Python's built-in CSV reader. @@ -7,13 +5,10 @@ from collections import OrderedDict, namedtuple -import six - - Field = namedtuple('Field', ['name', 'start', 'length']) -class Reader(six.Iterator): +class Reader: """ Reads a fixed-width file using a column schema in CSV format. 
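The Babel 2.14 shim added to agate/data_types/number.py above hinges on one small fallback: newer Babel keys Locale.number_symbols by numbering system (for example 'latn'), while older releases return a flat mapping, and .get('latn', ...) serves both. A minimal sketch of that pattern, using plain dicts as stand-ins for the two assumed shapes of number_symbols rather than real Babel Locale objects:

    # Stand-ins for Locale.number_symbols on either side of Babel 2.14 (assumed shapes).
    old_style = {'group': ',', 'decimal': '.'}            # flat mapping (Babel < 2.14)
    new_style = {'latn': {'group': ',', 'decimal': '.'}}  # keyed by numbering system (Babel >= 2.14)

    for number_symbols in (old_style, new_style):
        # Same lookup as the patch: prefer the 'latn' sub-mapping, fall back to the flat one.
        resolved = number_symbols.get('latn', number_symbols)
        assert resolved.get('group', ',') == ','
        assert resolved.get('decimal', '.') == '.'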
diff --git a/agate/mapped_sequence.py b/agate/mapped_sequence.py index 16fb3c92..bb1d16c5 100644 --- a/agate/mapped_sequence.py +++ b/agate/mapped_sequence.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ This module contains the :class:`MappedSequence` class that forms the foundation for agate's :class:`.Row` and :class:`.Column` as well as for named sequences of @@ -7,13 +5,7 @@ """ from collections import OrderedDict -try: - from collections.abc import Sequence -except ImportError: - from collections import Sequence - -import six -from six.moves import range # pylint: disable=W0622 +from collections.abc import Sequence from agate.utils import memoize @@ -66,22 +58,22 @@ def __unicode__(self): """ Print a unicode sample of the contents of this sequence. """ - sample = u', '.join(repr(d) for d in self.values()[:5]) + sample = ', '.join(repr(d) for d in self.values()[:5]) if len(self) > 5: - sample = u'%s, ...' % sample + sample = '%s, ...' % sample - return u'' % (type(self).__name__, sample) + return f'' def __str__(self): """ Print an ascii sample of the contents of this sequence. """ - if six.PY2: # pragma: no cover - return str(self.__unicode__().encode('utf8')) - return str(self.__unicode__()) + def __repr__(self): + return self.__str__() + def __getitem__(self, key): """ Retrieve values from this array by index, slice or key. @@ -89,13 +81,11 @@ def __getitem__(self, key): if isinstance(key, slice): indices = range(*key.indices(len(self))) values = self.values() - return tuple(values[i] for i in indices) # Note: can't use isinstance because bool is a subclass of int elif type(key) is int: return self.values()[key] - else: - return self.dict()[key] + return self.dict()[key] def __setitem__(self, key, value): """ @@ -159,8 +149,7 @@ def get(self, key, default=None): except KeyError: if default: return default - else: - return None + return None @memoize def dict(self): diff --git a/agate/rows.py b/agate/rows.py index 431118d2..5b0bf0cc 100644 --- a/agate/rows.py +++ b/agate/rows.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ This module contains agate's :class:`Row` implementation. Rows are independent of both the :class:`.Table` that contains them as well as the :class:`.Columns` diff --git a/agate/table/__init__.py b/agate/table/__init__.py index 150d9ee9..51afca46 100644 --- a/agate/table/__init__.py +++ b/agate/table/__init__.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ The :class:`.Table` object is the most important class in agate. Tables are created by supplying row data, column names and subclasses of :class:`.DataType` @@ -19,25 +17,21 @@ rows, row names are optional.) """ -from itertools import chain import sys import warnings +from io import StringIO +from itertools import chain -import six -from six.moves import range # pylint: disable=W0622 - +from agate import utils from agate.columns import Column from agate.data_types import DataType +from agate.exceptions import CastError from agate.mapped_sequence import MappedSequence from agate.rows import Row from agate.type_tester import TypeTester -from agate import utils -from agate.exceptions import CastError -from agate.warns import warn_duplicate_column, warn_unnamed_column -@six.python_2_unicode_compatible -class Table(object): +class Table: """ A dataset consisting of rows and columns. Columns refer to "vertical" slices of data that must all be of the same type. Rows refer to "horizontal" slices @@ -77,15 +71,17 @@ class Table(object): assumed to be :class:`.Row` instances, rather than raw data. 
""" def __init__(self, rows, column_names=None, column_types=None, row_names=None, _is_fork=False): - if isinstance(rows, six.string_types): - raise ValueError('When created directly, the first argument to Table must be a sequence of rows. Did you want agate.Table.from_csv?') + if isinstance(rows, str): + raise ValueError('When created directly, the first argument to Table must be a sequence of rows. ' + 'Did you want agate.Table.from_csv?') # Validate column names if column_names: self._column_names = utils.deduplicate(column_names, column_names=True) elif rows: self._column_names = tuple(utils.letter_name(i) for i in range(len(rows[0]))) - warnings.warn('Column names not specified. "%s" will be used as names.' % str(self._column_names), RuntimeWarning, stacklevel=2) + warnings.warn('Column names not specified. "%s" will be used as names.' % str(self._column_names), + RuntimeWarning, stacklevel=2) else: self._column_names = tuple() @@ -121,7 +117,9 @@ def __init__(self, rows, column_names=None, column_types=None, row_names=None, _ len_row = len(row) if len_row > len_column_names: - raise ValueError('Row %i has %i values, but Table only has %i columns.' % (i, len_row, len_column_names)) + raise ValueError( + 'Row %i has %i values, but Table only has %i columns.' % (i, len_row, len_column_names) + ) elif len(row) < len_column_names: row = chain(row, [None] * (len_column_names - len_row)) @@ -130,7 +128,7 @@ def __init__(self, rows, column_names=None, column_types=None, row_names=None, _ try: row_values.append(cast_funcs[j](d)) except CastError as e: - raise CastError(str(e) + ' Error at row %s column %s.' % (i, self._column_names[j])) + raise CastError(str(e) + f' Error at row {i} column {self._column_names[j]}.') new_rows.append(Row(row_values, self._column_names)) else: @@ -139,7 +137,7 @@ def __init__(self, rows, column_names=None, column_types=None, row_names=None, _ if row_names: computed_row_names = [] - if isinstance(row_names, six.string_types): + if isinstance(row_names, str): for row in new_rows: name = row[row_names] computed_row_names.append(name) @@ -179,7 +177,7 @@ def __str__(self): """ Print the table's structure using :meth:`.Table.print_structure`. 
""" - structure = six.StringIO() + structure = StringIO() self.print_structure(output=structure) diff --git a/agate/table/aggregate.py b/agate/table/aggregate.py index f6c04377..e5071092 100644 --- a/agate/table/aggregate.py +++ b/agate/table/aggregate.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from collections import OrderedDict from agate import utils @@ -29,7 +26,7 @@ def aggregate(self, aggregations): results[name] = agg.run(self) return results - else: - aggregations.validate(self) - return aggregations.run(self) + aggregations.validate(self) + + return aggregations.run(self) diff --git a/agate/table/bar_chart.py b/agate/table/bar_chart.py index 9e3da515..2c7ec636 100644 --- a/agate/table/bar_chart.py +++ b/agate/table/bar_chart.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def bar_chart(self, label=0, value=1, path=None, width=None, height=None): """ diff --git a/agate/table/bins.py b/agate/table/bins.py index b4f3fe2f..3514877f 100644 --- a/agate/table/bins.py +++ b/agate/table/bins.py @@ -1,15 +1,9 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal +from decimal import Decimal from babel.numbers import format_decimal -from agate.aggregations import Min, Max from agate import utils +from agate.aggregations import Max, Min def bins(self, column_name, count=10, start=None, end=None): @@ -65,9 +59,9 @@ def name_bin(i, j, first_exclusive=True, last_exclusive=False): inclusive = format_decimal(i, format=break_formatter) exclusive = format_decimal(j, format=break_formatter) - output = u'[' if first_exclusive else u'(' - output += u'%s - %s' % (inclusive, exclusive) - output += u']' if last_exclusive else u')' + output = '[' if first_exclusive else '(' + output += f'{inclusive} - {exclusive}' + output += ']' if last_exclusive else ')' return output diff --git a/agate/table/column_chart.py b/agate/table/column_chart.py index 11bca44a..c2467ee8 100644 --- a/agate/table/column_chart.py +++ b/agate/table/column_chart.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def column_chart(self, label=0, value=1, path=None, width=None, height=None): """ diff --git a/agate/table/compute.py b/agate/table/compute.py index 887fe88f..11b94c88 100644 --- a/agate/table/compute.py +++ b/agate/table/compute.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from collections import OrderedDict from copy import copy @@ -29,7 +26,9 @@ def compute(self, computations, replace=False): if new_column_name in column_names: if not replace: - raise ValueError('New column name "%s" already exists. Specify replace=True to replace with computed data.') + raise ValueError( + 'New column name "%s" already exists. Specify replace=True to replace with computed data.' 
+ ) i = column_names.index(new_column_name) column_types[i] = new_column_type diff --git a/agate/table/denormalize.py b/agate/table/denormalize.py index beec713b..8175187b 100644 --- a/agate/table/denormalize.py +++ b/agate/table/denormalize.py @@ -1,22 +1,14 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from collections import OrderedDict +from decimal import Decimal -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal - -import six - +from agate import utils from agate.data_types import Number -from agate.type_tester import TypeTester from agate.rows import Row -from agate import utils +from agate.type_tester import TypeTester -def denormalize(self, key=None, property_column='property', value_column='value', default_value=utils.default, column_types=None): +def denormalize(self, key=None, property_column='property', value_column='value', default_value=utils.default, + column_types=None): """ Create a new table with row values converted into columns. @@ -90,7 +82,7 @@ def denormalize(self, key=None, property_column='property', value_column='value' if row_key not in row_data: row_data[row_key] = OrderedDict() - f = six.text_type(row[property_column]) + f = str(row[property_column]) v = row[value_column] if f not in field_names: diff --git a/agate/table/distinct.py b/agate/table/distinct.py index a991bc2e..9f510967 100644 --- a/agate/table/distinct.py +++ b/agate/table/distinct.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from agate import utils diff --git a/agate/table/exclude.py b/agate/table/exclude.py index 1e713505..b4a7a4d7 100644 --- a/agate/table/exclude.py +++ b/agate/table/exclude.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from agate import utils diff --git a/agate/table/find.py b/agate/table/find.py index d11ab1de..d13d7cdd 100644 --- a/agate/table/find.py +++ b/agate/table/find.py @@ -1,7 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - - def find(self, test): """ Find the first row that passes a test. diff --git a/agate/table/from_csv.py b/agate/table/from_csv.py index 1e962a97..3b1ac074 100644 --- a/agate/table/from_csv.py +++ b/agate/table/from_csv.py @@ -1,11 +1,10 @@ -#!/usr/bin/env python - -import io -import six +import itertools +from io import StringIO @classmethod -def from_csv(cls, path, column_names=None, column_types=None, row_names=None, skip_lines=0, header=True, sniff_limit=0, encoding='utf-8', **kwargs): +def from_csv(cls, path, column_names=None, column_types=None, row_names=None, skip_lines=0, header=True, sniff_limit=0, + encoding='utf-8', row_limit=None, **kwargs): """ Create a new table from a CSV. @@ -25,8 +24,7 @@ def from_csv(cls, path, column_names=None, column_types=None, row_names=None, sk :param row_names: See :meth:`.Table.__init__`. :param skip_lines: - The number of lines to skip from the top of the file. Note that skip - lines will not work with + The number of lines to skip from the top of the file. :param header: If :code:`True`, the first row of the CSV is assumed to contain column names. If :code:`header` and :code:`column_names` are both specified @@ -38,6 +36,8 @@ def from_csv(cls, path, column_names=None, column_types=None, row_names=None, sk Character encoding of the CSV file. Note: if passing in a file handle it is assumed you have already opened it with the correct encoding specified. + :param row_limit: + Limit how many rows of data will be read. 
""" from agate import csv from agate.table import Table @@ -48,10 +48,7 @@ def from_csv(cls, path, column_names=None, column_types=None, row_names=None, sk if hasattr(path, 'read'): f = path else: - if six.PY2: - f = open(path, 'Urb') - else: - f = io.open(path, encoding=encoding) + f = open(path, encoding=encoding) close = True @@ -62,16 +59,13 @@ def from_csv(cls, path, column_names=None, column_types=None, row_names=None, sk else: raise ValueError('skip_lines argument must be an int') - contents = six.StringIO(f.read()) + contents = StringIO(f.read()) if sniff_limit is None: kwargs['dialect'] = csv.Sniffer().sniff(contents.getvalue()) elif sniff_limit > 0: kwargs['dialect'] = csv.Sniffer().sniff(contents.getvalue()[:sniff_limit]) - if six.PY2: - kwargs['encoding'] = encoding - reader = csv.reader(contents, header=header, **kwargs) if header: @@ -80,7 +74,10 @@ def from_csv(cls, path, column_names=None, column_types=None, row_names=None, sk else: next(reader) - rows = tuple(reader) + if row_limit is None: + rows = tuple(reader) + else: + rows = tuple(itertools.islice(reader, row_limit)) finally: if close: diff --git a/agate/table/from_fixed.py b/agate/table/from_fixed.py index f73fee85..08ad9a0e 100644 --- a/agate/table/from_fixed.py +++ b/agate/table/from_fixed.py @@ -1,13 +1,9 @@ -#!/usr/bin/env python - -import io - -from agate import fixed -from agate import utils +from agate import fixed, utils @classmethod -def from_fixed(cls, path, schema_path, column_names=utils.default, column_types=None, row_names=None, encoding='utf-8', schema_encoding='utf-8'): +def from_fixed(cls, path, schema_path, column_names=utils.default, column_types=None, row_names=None, encoding='utf-8', + schema_encoding='utf-8'): """ Create a new table from a fixed-width file and a CSV schema. @@ -42,13 +38,13 @@ def from_fixed(cls, path, schema_path, column_names=utils.default, column_types= try: if not hasattr(path, 'read'): - f = io.open(path, encoding=encoding) + f = open(path, encoding=encoding) close_f = True else: f = path if not hasattr(schema_path, 'read'): - schema_f = io.open(schema_path, encoding=schema_encoding) + schema_f = open(schema_path, encoding=schema_encoding) close_schema_f = True else: schema_f = path diff --git a/agate/table/from_json.py b/agate/table/from_json.py index 8516702a..77a4062d 100644 --- a/agate/table/from_json.py +++ b/agate/table/from_json.py @@ -1,10 +1,6 @@ -#!/usr/bin/env python - +import json from collections import OrderedDict from decimal import Decimal -import io -import json -import six @classmethod @@ -34,8 +30,8 @@ def from_json(cls, path, row_names=None, key=None, newline=False, column_types=N :param encoding: According to RFC4627, JSON text shall be encoded in Unicode; the default encoding is UTF-8. You can override this by using any encoding supported by your Python's open() function - if :code:`path` is a filepath. If passing in a file handle, it is assumed you have already opened it with the correct - encoding specified. + if :code:`path` is a filepath. If passing in a file handle, it is assumed you have already opened it with the + correct encoding specified. 
""" from agate.table import Table @@ -52,7 +48,7 @@ def from_json(cls, path, row_names=None, key=None, newline=False, column_types=N for line in path: js.append(json.loads(line, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs)) else: - f = io.open(path, encoding=encoding) + f = open(path, encoding=encoding) close = True for line in f: @@ -61,14 +57,16 @@ def from_json(cls, path, row_names=None, key=None, newline=False, column_types=N if hasattr(path, 'read'): js = json.load(path, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs) else: - f = io.open(path, encoding=encoding) + f = open(path, encoding=encoding) close = True js = json.load(f, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs) if isinstance(js, dict): if not key: - raise TypeError('When converting a JSON document with a top-level dictionary element, a key must be specified.') + raise TypeError( + 'When converting a JSON document with a top-level dictionary element, a key must be specified.' + ) js = js[key] diff --git a/agate/table/from_object.py b/agate/table/from_object.py index 5675d7a9..f114e918 100644 --- a/agate/table/from_object.py +++ b/agate/table/from_object.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate import utils @@ -42,6 +40,25 @@ def from_object(cls, obj, row_names=None, column_types=None): Not all rows are required to have the same keys. Missing elements will be filled in with null values. + Keys containing a slash (``/``) can collide with other keys. For example: + + .. code-block:: python + + { + 'a/b': 2, + 'a': { + 'b': False + } + } + + Would generate: + + .. code-block:: python + + { + 'a/b': false + } + :param obj: Filepath or file-like object from which to read JSON data. :param row_names: diff --git a/agate/table/group_by.py b/agate/table/group_by.py index 0a7d8e23..b7bc405f 100644 --- a/agate/table/group_by.py +++ b/agate/table/group_by.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from collections import OrderedDict from agate.data_types import Text @@ -55,6 +52,9 @@ def group_by(self, key, key_name=None, key_type=None): groups[group_name].append(row) + if not groups: + return TableSet([self._fork([])], [], key_name=key_name, key_type=key_type) + output = OrderedDict() for group, rows in groups.items(): diff --git a/agate/table/homogenize.py b/agate/table/homogenize.py index dd29a2d8..46c3fbe8 100644 --- a/agate/table/homogenize.py +++ b/agate/table/homogenize.py @@ -1,8 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -from agate.rows import Row from agate import utils +from agate.rows import Row def homogenize(self, key, compare_values, default_row=None): @@ -56,12 +53,14 @@ def homogenize(self, key, compare_values, default_row=None): column_values = [self._columns.get(name) for name in key] column_indexes = [self._column_names.index(name) for name in key] + compare_values = [[column_values[i].data_type.cast(v) for i, v in enumerate(values)] for values in compare_values] + column_values = zip(*column_values) differences = list(set(map(tuple, compare_values)) - set(column_values)) for difference in differences: if callable(default_row): - rows.append(Row(default_row(difference), self._column_names)) + new_row = default_row(difference) else: if default_row is not None: new_row = list(default_row) @@ -71,6 +70,8 @@ def homogenize(self, key, compare_values, default_row=None): for i, d in zip(column_indexes, difference): new_row.insert(i, d) - rows.append(Row(new_row, self._column_names)) + new_row = 
[self._columns[i].data_type.cast(v) for i, v in enumerate(new_row)] + rows.append(Row(new_row, self._column_names)) - return self._fork(rows) + # Do not copy the row_names, since this function adds rows. + return self._fork(rows, row_names=[]) diff --git a/agate/table/join.py b/agate/table/join.py index 48ee5cac..7ed2ea6c 100644 --- a/agate/table/join.py +++ b/agate/table/join.py @@ -1,17 +1,15 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -from agate.rows import Row from agate import utils +from agate.rows import Row -def join(self, right_table, left_key=None, right_key=None, inner=False, full_outer=False, require_match=False, columns=None): +def join(self, right_table, left_key=None, right_key=None, inner=False, full_outer=False, require_match=False, + columns=None): """ Create a new table by joining two table's on common values. This method implements most varieties of SQL join, in addition to some unique features. If :code:`left_key` and :code:`right_key` are both :code:`None` then this - method will peform a "sequential join", which is to say it will join on row + method will perform a "sequential join", which is to say it will join on row number. The :code:`inner` and :code:`full_outer` arguments will determine whether dangling left-hand and right-hand rows are included, respectively. diff --git a/agate/table/limit.py b/agate/table/limit.py index adc2988f..701e6693 100644 --- a/agate/table/limit.py +++ b/agate/table/limit.py @@ -1,7 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - - def limit(self, start_or_stop=None, stop=None, step=None): """ Create a new table with fewer rows. diff --git a/agate/table/line_chart.py b/agate/table/line_chart.py index 6e4c680c..3c40d3da 100644 --- a/agate/table/line_chart.py +++ b/agate/table/line_chart.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def line_chart(self, x=0, y=1, path=None, width=None, height=None): """ diff --git a/agate/table/merge.py b/agate/table/merge.py index d47fbc72..0e50dd39 100644 --- a/agate/table/merge.py +++ b/agate/table/merge.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from collections import OrderedDict from agate.exceptions import DataTypeError diff --git a/agate/table/normalize.py b/agate/table/normalize.py index 3b941f59..0c0caa31 100644 --- a/agate/table/normalize.py +++ b/agate/table/normalize.py @@ -1,9 +1,6 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -from agate.type_tester import TypeTester -from agate.rows import Row from agate import utils +from agate.rows import Row +from agate.type_tester import TypeTester def normalize(self, key, properties, property_column='property', value_column='value', column_types=None): @@ -94,4 +91,4 @@ def normalize(self, key, properties, property_column='property', value_column='v else: new_column_types = key_column_types + list(column_types) - return Table(new_rows, new_column_names, new_column_types, row_names=row_names) + return Table(new_rows, new_column_names, new_column_types) diff --git a/agate/table/order_by.py b/agate/table/order_by.py index 80f93ce3..7bfdf0bd 100644 --- a/agate/table/order_by.py +++ b/agate/table/order_by.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from agate import utils @@ -19,32 +16,32 @@ def order_by(self, key, reverse=False): """ if len(self._rows) == 0: return self._fork(self._rows) - else: - key_is_row_function = hasattr(key, '__call__') - key_is_sequence = utils.issequence(key) - def sort_key(data): - row = 
data[1] + key_is_row_function = hasattr(key, '__call__') + key_is_sequence = utils.issequence(key) + + def sort_key(data): + row = data[1] - if key_is_row_function: - k = key(row) - elif key_is_sequence: - k = tuple(utils.NullOrder() if row[n] is None else row[n] for n in key) - else: - k = row[key] + if key_is_row_function: + k = key(row) + elif key_is_sequence: + k = tuple(utils.NullOrder() if row[n] is None else row[n] for n in key) + else: + k = row[key] - if k is None: - return utils.NullOrder() + if k is None: + return utils.NullOrder() - return k + return k - results = sorted(enumerate(self._rows), key=sort_key, reverse=reverse) + results = sorted(enumerate(self._rows), key=sort_key, reverse=reverse) - indices, rows = zip(*results) + indices, rows = zip(*results) - if self._row_names is not None: - row_names = [self._row_names[i] for i in indices] - else: - row_names = None + if self._row_names is not None: + row_names = [self._row_names[i] for i in indices] + else: + row_names = None - return self._fork(rows, row_names=row_names) + return self._fork(rows, row_names=row_names) diff --git a/agate/table/pivot.py b/agate/table/pivot.py index 1d90cbc7..f74d848c 100644 --- a/agate/table/pivot.py +++ b/agate/table/pivot.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -import six - -from agate.aggregations import Count from agate import utils +from agate.aggregations import Count def pivot(self, key=None, pivot=None, aggregation=None, computation=None, default_value=utils.default, key_name=None): @@ -95,8 +90,8 @@ def pivot(self, key=None, pivot=None, aggregation=None, computation=None, defaul for k in key: groups = groups.group_by(k, key_name=key_name) - aggregation_name = six.text_type(aggregation) - computation_name = six.text_type(computation) if computation else None + aggregation_name = str(aggregation) + computation_name = str(computation) if computation else None def apply_computation(table): computed = table.compute([ @@ -124,7 +119,8 @@ def apply_computation(table): column_types = [column_type] * pivot_count - table = table.denormalize(key, pivot, computation_name or aggregation_name, default_value=default_value, column_types=column_types) + table = table.denormalize(key, pivot, computation_name or aggregation_name, default_value=default_value, + column_types=column_types) else: table = groups.aggregate([ (aggregation_name, aggregation) diff --git a/agate/table/print_bars.py b/agate/table/print_bars.py index 838d42a4..0f0cd589 100644 --- a/agate/table/print_bars.py +++ b/agate/table/print_bars.py @@ -1,28 +1,17 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- -# pylint: disable=W0212 - -from collections import OrderedDict - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal - import sys - +from collections import OrderedDict +from decimal import Decimal from babel.numbers import format_decimal -import six -from agate.aggregations import Min, Max -from agate import config +from agate import config, utils +from agate.aggregations import Max, Min from agate.data_types import Number from agate.exceptions import DataTypeError -from agate import utils -def print_bars(self, label_column_name='group', value_column_name='Count', domain=None, width=120, output=sys.stdout, printable=False): +def print_bars(self, label_column_name='group', value_column_name='Count', domain=None, width=120, output=sys.stdout, + printable=False): """ Print a text-based bar chart based on this table. 
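Stepping back to the homogenize() hunk above: compare_values are now cast with each key column's data type before the set difference, and the generated filler rows are cast as well, so string input lines up with typed columns (previously a string key would not match a Number column, so every compare value produced a filler row). A hedged usage sketch; the column names and sample values are invented for illustration:

    import agate

    table = agate.Table(
        [[1, 'a'], [2, 'b']],
        ['id', 'letter'],
        [agate.Number(), agate.Text()],
    )

    # '1', '2' and '3' are cast to Decimal via the id column's Number type before
    # comparison, so only the genuinely missing key (3) produces a filler row.
    filled = table.homogenize('id', [['1'], ['2'], ['3']])
    assert len(filled.rows) == 3
    assert filled.rows[2]['letter'] is None  # non-key filler values default to None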
@@ -77,7 +66,7 @@ def print_bars(self, label_column_name='group', value_column_name='Count', domai formatted_labels = [] for label in label_column: - formatted_labels.append(six.text_type(label)) + formatted_labels.append(str(label)) formatted_values = [] for value in value_column: @@ -90,8 +79,8 @@ def print_bars(self, label_column_name='group', value_column_name='Count', domai locale=locale )) - max_label_width = max(max([len(l) for l in formatted_labels]), len(y_label)) - max_value_width = max(max([len(v) for v in formatted_values]), len(x_label)) + max_label_width = max(max([len(label) for label in formatted_labels]), len(y_label)) + max_value_width = max(max([len(value) for value in formatted_values]), len(x_label)) plot_width = width - (max_label_width + max_value_width + 2) @@ -133,8 +122,7 @@ def print_bars(self, label_column_name='group', value_column_name='Count', domai def project(value): if value >= 0: return plot_negative_width + int((plot_positive_width * (value / x_max)).to_integral_value()) - else: - return plot_negative_width - int((plot_negative_width * (value / x_min)).to_integral_value()) + return plot_negative_width - int((plot_negative_width * (value / x_min)).to_integral_value()) # Calculate ticks ticks = OrderedDict() @@ -184,7 +172,7 @@ def write(line): output.write(line + '\n') # Chart top - top_line = u'%s %s' % (y_label.ljust(max_label_width), x_label.rjust(max_value_width)) + top_line = f'{y_label.ljust(max_label_width)} {x_label.rjust(max_value_width)}' write(top_line) # Bars @@ -203,7 +191,7 @@ def write(line): bar = bar_mark * bar_width if value is not None and value >= 0: - gap = (u' ' * plot_negative_width) + gap = (' ' * plot_negative_width) # All positive if x_min <= 0: @@ -211,7 +199,7 @@ def write(line): else: bar = bar + gap + zero_mark else: - bar = u' ' * (plot_negative_width - bar_width) + bar + bar = ' ' * (plot_negative_width - bar_width) + bar # All negative or mixed signs if value is None or x_max > value: @@ -219,11 +207,11 @@ def write(line): bar = bar.ljust(plot_width) - write('%s %s %s' % (label_text, value_text, bar)) + write(f'{label_text} {value_text} {bar}') # Axis & ticks axis = horizontal_line * plot_width - tick_text = u' ' * width + tick_text = ' ' * width for i, (tick, label) in enumerate(ticks_formatted.items()): # First tick diff --git a/agate/table/print_html.py b/agate/table/print_html.py index 411a6bd3..41c0837b 100644 --- a/agate/table/print_html.py +++ b/agate/table/print_html.py @@ -1,17 +1,13 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - +import math import sys from babel.numbers import format_decimal -import six -from agate import config +from agate import config, utils from agate.data_types import Number, Text -from agate import utils -def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_width=20, locale=None): +def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_width=20, locale=None, max_precision=3): """ Print an HTML version of this table. @@ -32,6 +28,10 @@ def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_w :param locale: Provide a locale you would like to be used to format the output. By default it will use the system's setting. + :max_precision: + Puts a limit on the maximum precision displayed for number types. + Numbers with lesser precision won't be affected. + This defaults to :code:`3`. Pass :code:`None` to disable limit. 
""" if max_rows is None: max_rows = len(self._rows) @@ -39,7 +39,12 @@ def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_w if max_columns is None: max_columns = len(self._columns) + if max_precision is None: + max_precision = float('inf') + ellipsis = config.get_option('ellipsis_chars') + truncation = config.get_option('text_truncation_chars') + len_truncation = len(truncation) locale = locale or config.get_option('default_locale') rows_truncated = max_rows < len(self._rows) @@ -60,7 +65,11 @@ def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_w if isinstance(c.data_type, Number): max_places = utils.max_precision(c[:max_rows]) - number_formatters.append(utils.make_number_formatter(max_places)) + add_ellipsis = False + if max_places > max_precision: + add_ellipsis = True + max_places = max_precision + number_formatters.append(utils.make_number_formatter(max_places, add_ellipsis)) else: number_formatters.append(None) @@ -76,17 +85,17 @@ def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_w v = ellipsis elif v is None: v = '' - elif number_formatters[j] is not None: + elif number_formatters[j] is not None and not math.isinf(v): v = format_decimal( v, format=number_formatters[j], locale=locale ) else: - v = six.text_type(v) + v = str(v) if max_column_width is not None and len(v) > max_column_width: - v = '%s...' % v[:max_column_width - 3] + v = '%s%s' % (v[:max_column_width - len_truncation], truncation) formatted_row.append(v) diff --git a/agate/table/print_structure.py b/agate/table/print_structure.py index d099aea8..1c94d1a1 100644 --- a/agate/table/print_structure.py +++ b/agate/table/print_structure.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import sys from agate.data_types import Text diff --git a/agate/table/print_table.py b/agate/table/print_table.py index 7e57d30e..53b1ee94 100644 --- a/agate/table/print_table.py +++ b/agate/table/print_table.py @@ -1,18 +1,14 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import math import sys from babel.numbers import format_decimal -import six -from agate import config +from agate import config, utils from agate.data_types import Number, Text -from agate import utils -def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_width=20, locale=None, max_precision=3): +def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_width=20, locale=None, + max_precision=3): """ Print a text-based view of the data in this table. @@ -49,6 +45,8 @@ def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_ max_precision = float('inf') ellipsis = config.get_option('ellipsis_chars') + truncation = config.get_option('text_truncation_chars') + len_truncation = len(truncation) h_line = config.get_option('horizontal_line_char') v_line = config.get_option('vertical_line_char') locale = locale or config.get_option('default_locale') @@ -58,7 +56,7 @@ def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_ column_names = [] for column_name in self.column_names[:max_columns]: if max_column_width is not None and len(column_name) > max_column_width: - column_names.append('%s...' 
% column_name[:max_column_width - 3]) + column_names.append('%s%s' % (column_name[:max_column_width - len_truncation], truncation)) else: column_names.append(column_name) @@ -104,7 +102,7 @@ def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_ locale=locale ) else: - v = six.text_type(v) + v = str(v) vs = v.splitlines() if len(vs) > 1: multi_line_rows = True @@ -126,7 +124,7 @@ def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_ formatted_row[k][j] = xv if max_column_width is not None and len(v) > max_column_width: - v = '%s...' % v[:max_column_width - 3] + v = '%s%s' % (v[:max_column_width - len_truncation], truncation) if len(v) > widths[j]: widths[j] = len(v) @@ -158,14 +156,13 @@ def write_row(formatted_row): text = v_line.join(row_output) - write('%s%s%s' % (v_line, text, v_line)) + write(f'{v_line}{text}{v_line}') # horizontal and vertical dividers. - divider = '%(v_line)s %(columns)s %(v_line)s' % { - 'h_line': h_line, - 'v_line': v_line, - 'columns': ' | '.join(h_line * w for w in widths) - } + divider = '{v_line} {columns} {v_line}'.format( + v_line=v_line, + columns=' | '.join(h_line * w for w in widths) + ) # Headers write_row(column_names) diff --git a/agate/table/rename.py b/agate/table/rename.py index e023e351..b245d503 100644 --- a/agate/table/rename.py +++ b/agate/table/rename.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from agate import utils @@ -57,5 +54,5 @@ def rename(self, column_names=None, row_names=None, slug_columns=False, slug_row row_names = self._row_names return Table(self._rows, column_names, self._column_types, row_names=row_names, _is_fork=False) - else: - return self._fork(self._rows, column_names, self._column_types, row_names=row_names) + + return self._fork(self._rows, column_names, self._column_types, row_names=row_names) diff --git a/agate/table/scatterplot.py b/agate/table/scatterplot.py index 0de29665..141c3aea 100644 --- a/agate/table/scatterplot.py +++ b/agate/table/scatterplot.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def scatterplot(self, x=0, y=1, path=None, width=None, height=None): """ diff --git a/agate/table/select.py b/agate/table/select.py index 3321738a..8c999366 100644 --- a/agate/table/select.py +++ b/agate/table/select.py @@ -1,8 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -from agate.rows import Row from agate import utils +from agate.rows import Row def select(self, key): diff --git a/agate/table/to_csv.py b/agate/table/to_csv.py index 9890fcac..4be7a96c 100644 --- a/agate/table/to_csv.py +++ b/agate/table/to_csv.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import os @@ -9,7 +6,9 @@ def to_csv(self, path, **kwargs): Write this table to a CSV. This method uses agate's builtin CSV writer, which supports unicode on both Python 2 and Python 3. - `kwargs` will be passed through to the CSV writer. + ``kwargs`` will be passed through to the CSV writer. + + The ``lineterminator`` defaults to the newline character (LF, ``\\n``). :param path: Filepath or file-like object to write to. 
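One usage note on the print_html() and print_table() hunks above: print_html() now honours max_precision the way print_table() already did, and over-long values are shortened with the configurable text_truncation_chars option rather than a hard-coded '...'. A small, hedged sketch; the sample table is invented and the output is captured in a StringIO:

    from decimal import Decimal
    from io import StringIO

    import agate

    table = agate.Table(
        [[Decimal('3.141592'), 'a-rather-long-label']],
        ['pi', 'label'],
        [agate.Number(), agate.Text()],
    )

    out = StringIO()
    # Numbers with more decimal places than max_precision are shown at max_precision
    # places plus the configured number-truncation marker; the text column is
    # shortened to max_column_width using the configured text-truncation marker.
    table.print_html(max_precision=3, max_column_width=10, output=out)
    html_snippet = out.getvalue()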
diff --git a/agate/table/to_json.py b/agate/table/to_json.py index 55346e15..51afdc87 100644 --- a/agate/table/to_json.py +++ b/agate/table/to_json.py @@ -1,12 +1,6 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - -import codecs -from collections import OrderedDict import json import os - -import six +from collections import OrderedDict def to_json(self, path, key=None, newline=False, indent=None, **kwargs): @@ -41,9 +35,6 @@ def to_json(self, path, key=None, newline=False, indent=None, **kwargs): 'indent': indent } - if six.PY2: - json_kwargs['encoding'] = 'utf-8' - # Pass remaining kwargs through to JSON encoder json_kwargs.update(kwargs) @@ -61,9 +52,6 @@ def to_json(self, path, key=None, newline=False, indent=None, **kwargs): os.makedirs(os.path.dirname(path)) f = open(path, 'w') - if six.PY2: - f = codecs.getwriter('utf-8')(f) - def dump_json(data): json.dump(data, f, **json_kwargs) @@ -78,10 +66,10 @@ def dump_json(data): if key_is_row_function: k = key(row) else: - k = str(row[key]) if six.PY3 else unicode(row[key]) + k = str(row[key]) if k in output: - raise ValueError('Value %s is not unique in the key column.' % six.text_type(k)) + raise ValueError('Value %s is not unique in the key column.' % str(k)) values = tuple(json_funcs[i](d) for i, d in enumerate(row)) output[k] = OrderedDict(zip(row.keys(), values)) diff --git a/agate/table/where.py b/agate/table/where.py index cea3e36e..90259771 100644 --- a/agate/table/where.py +++ b/agate/table/where.py @@ -1,7 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - - def where(self, test): """ Create a new :class:`.Table` with only those rows that pass a test. diff --git a/agate/tableset/__init__.py b/agate/tableset/__init__.py index cc5b0a55..4876d00b 100644 --- a/agate/tableset/__init__.py +++ b/agate/tableset/__init__.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - """ The :class:`.TableSet` class collects a set of related tables in a single data structure. The most common way of creating a :class:`.TableSet` is using the @@ -24,12 +22,11 @@ dimensions. """ -import six -from six.moves import zip_longest +from io import StringIO +from itertools import zip_longest from agate.data_types import Text from agate.mapped_sequence import MappedSequence -from agate.table import Table class TableSet(MappedSequence): @@ -87,7 +84,7 @@ def __str__(self): """ Print the tableset's structure via :meth:`TableSet.print_structure`. 
""" - structure = six.StringIO() + structure = StringIO() self.print_structure(output=structure) @@ -168,9 +165,8 @@ def _proxy(self, method_name, *args, **kwargs): from agate.tableset.line_chart import line_chart from agate.tableset.merge import merge from agate.tableset.print_structure import print_structure -from agate.tableset.proxy_methods import bins, compute, denormalize, distinct, \ - exclude, find, group_by, homogenize, join, limit, normalize, order_by, \ - pivot, select, where +from agate.tableset.proxy_methods import (bins, compute, denormalize, distinct, exclude, find, group_by, homogenize, + join, limit, normalize, order_by, pivot, select, where) from agate.tableset.scatterplot import scatterplot from agate.tableset.to_csv import to_csv from agate.tableset.to_json import to_json diff --git a/agate/tableset/aggregate.py b/agate/tableset/aggregate.py index e6aff37b..b6af1a08 100644 --- a/agate/tableset/aggregate.py +++ b/agate/tableset/aggregate.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from agate.table import Table diff --git a/agate/tableset/bar_chart.py b/agate/tableset/bar_chart.py index 4fd26b95..6cf886ce 100644 --- a/agate/tableset/bar_chart.py +++ b/agate/tableset/bar_chart.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def bar_chart(self, label=0, value=1, path=None, width=None, height=None): """ diff --git a/agate/tableset/column_chart.py b/agate/tableset/column_chart.py index 4e5caf49..68342f8a 100644 --- a/agate/tableset/column_chart.py +++ b/agate/tableset/column_chart.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def column_chart(self, label=0, value=1, path=None, width=None, height=None): """ diff --git a/agate/tableset/from_csv.py b/agate/tableset/from_csv.py index 81af9b49..7549289d 100644 --- a/agate/tableset/from_csv.py +++ b/agate/tableset/from_csv.py @@ -1,8 +1,6 @@ -#!/usr/bin/env python - +import os from collections import OrderedDict from glob import glob -import os from agate.table import Table @@ -29,7 +27,7 @@ def from_csv(cls, dir_path, column_names=None, column_types=None, row_names=None from agate.tableset import TableSet if not os.path.isdir(dir_path): - raise IOError('Specified path doesn\'t exist or isn\'t a directory.') + raise OSError('Specified path doesn\'t exist or isn\'t a directory.') tables = OrderedDict() diff --git a/agate/tableset/from_json.py b/agate/tableset/from_json.py index b2befe44..209b39b6 100644 --- a/agate/tableset/from_json.py +++ b/agate/tableset/from_json.py @@ -1,12 +1,8 @@ -#!/usr/bin/env python - +import json +import os from collections import OrderedDict from decimal import Decimal from glob import glob -import json -import os - -import six from agate.table import Table @@ -31,12 +27,12 @@ def from_json(cls, path, column_names=None, column_types=None, keys=None, **kwar """ from agate.tableset import TableSet - if isinstance(path, six.string_types) and not os.path.isdir(path) and not os.path.isfile(path): - raise IOError('Specified path doesn\'t exist.') + if isinstance(path, str) and not os.path.isdir(path) and not os.path.isfile(path): + raise OSError('Specified path doesn\'t exist.') tables = OrderedDict() - if isinstance(path, six.string_types) and os.path.isdir(path): + if isinstance(path, str) and os.path.isdir(path): filepaths = glob(os.path.join(path, '*.json')) if keys is not None and len(keys) != len(filepaths): @@ -54,7 +50,7 @@ def from_json(cls, path, 
column_names=None, column_types=None, keys=None, **kwar if hasattr(path, 'read'): js = json.load(path, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs) else: - with open(path, 'r') as f: + with open(path) as f: js = json.load(f, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs) for key, value in js.items(): diff --git a/agate/tableset/having.py b/agate/tableset/having.py index a3667351..bb0f46ac 100644 --- a/agate/tableset/having.py +++ b/agate/tableset/having.py @@ -1,7 +1,3 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - - def having(self, aggregations, test): """ Create a new :class:`.TableSet` with only those tables that pass a test. diff --git a/agate/tableset/line_chart.py b/agate/tableset/line_chart.py index 77400886..51a569d7 100644 --- a/agate/tableset/line_chart.py +++ b/agate/tableset/line_chart.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def line_chart(self, x=0, y=1, path=None, width=None, height=None): """ diff --git a/agate/tableset/merge.py b/agate/tableset/merge.py index aa1df5fc..2c83c7ad 100644 --- a/agate/tableset/merge.py +++ b/agate/tableset/merge.py @@ -1,8 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - from agate.rows import Row -from agate.tableset import Table +from agate.table import Table def merge(self, groups=None, group_name=None, group_type=None): diff --git a/agate/tableset/print_structure.py b/agate/tableset/print_structure.py index f52b2e53..e8422494 100644 --- a/agate/tableset/print_structure.py +++ b/agate/tableset/print_structure.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - import sys from agate.data_types import Text diff --git a/agate/tableset/proxy_methods.py b/agate/tableset/proxy_methods.py index e8657b62..e7ac2b75 100644 --- a/agate/tableset/proxy_methods.py +++ b/agate/tableset/proxy_methods.py @@ -1,90 +1,101 @@ -#!/usr/bin/env python - - def bins(self, *args, **kwargs): """ Calls :meth:`.Table.bins` on each table in the TableSet. """ return self._proxy('bins', *args, **kwargs) + def compute(self, *args, **kwargs): """ Calls :meth:`.Table.compute` on each table in the TableSet. """ return self._proxy('compute', *args, **kwargs) + def denormalize(self, *args, **kwargs): """ Calls :meth:`.Table.denormalize` on each table in the TableSet. """ return self._proxy('denormalize', *args, **kwargs) + def distinct(self, *args, **kwargs): """ Calls :meth:`.Table.distinct` on each table in the TableSet. """ return self._proxy('distinct', *args, **kwargs) + def exclude(self, *args, **kwargs): """ Calls :meth:`.Table.exclude` on each table in the TableSet. """ return self._proxy('exclude', *args, **kwargs) + def find(self, *args, **kwargs): """ Calls :meth:`.Table.find` on each table in the TableSet. """ return self._proxy('find', *args, **kwargs) + def group_by(self, *args, **kwargs): """ Calls :meth:`.Table.group_by` on each table in the TableSet. """ return self._proxy('group_by', *args, **kwargs) + def homogenize(self, *args, **kwargs): """ Calls :meth:`.Table.homogenize` on each table in the TableSet. """ return self._proxy('homogenize', *args, **kwargs) + def join(self, *args, **kwargs): """ Calls :meth:`.Table.join` on each table in the TableSet. """ return self._proxy('join', *args, **kwargs) + def limit(self, *args, **kwargs): """ Calls :meth:`.Table.limit` on each table in the TableSet. """ return self._proxy('limit', *args, **kwargs) + def normalize(self, *args, **kwargs): """ Calls :meth:`.Table.normalize` on each table in the TableSet. 
""" return self._proxy('normalize', *args, **kwargs) + def order_by(self, *args, **kwargs): """ Calls :meth:`.Table.order_by` on each table in the TableSet. """ return self._proxy('order_by', *args, **kwargs) + def pivot(self, *args, **kwargs): """ Calls :meth:`.Table.pivot` on each table in the TableSet. """ return self._proxy('pivot', *args, **kwargs) + def select(self, *args, **kwargs): """ Calls :meth:`.Table.select` on each table in the TableSet. """ return self._proxy('select', *args, **kwargs) + def where(self, *args, **kwargs): """ Calls :meth:`.Table.where` on each table in the TableSet. diff --git a/agate/tableset/scatterplot.py b/agate/tableset/scatterplot.py index 0d9f554a..dace13d5 100644 --- a/agate/tableset/scatterplot.py +++ b/agate/tableset/scatterplot.py @@ -1,10 +1,5 @@ -#!/usr/bin/env python -# pylint: disable=W0212 - import leather -from agate import utils - def scatterplot(self, x=0, y=1, path=None, width=None, height=None): """ diff --git a/agate/tableset/to_csv.py b/agate/tableset/to_csv.py index 7c268b5b..b3536fc3 100644 --- a/agate/tableset/to_csv.py +++ b/agate/tableset/to_csv.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - import os diff --git a/agate/tableset/to_json.py b/agate/tableset/to_json.py index 2a0ca262..67b5e647 100644 --- a/agate/tableset/to_json.py +++ b/agate/tableset/to_json.py @@ -1,10 +1,7 @@ -#!/usr/bin/env python - -from collections import OrderedDict import json import os - -import six +from collections import OrderedDict +from io import StringIO def to_json(self, path, nested=False, indent=None, **kwargs): @@ -37,7 +34,7 @@ def to_json(self, path, nested=False, indent=None, **kwargs): tableset_dict = OrderedDict() for name, table in self.items(): - output = six.StringIO() + output = StringIO() table.to_json(output, **kwargs) tableset_dict[name] = json.loads(output.getvalue(), object_pairs_hook=OrderedDict) @@ -54,9 +51,6 @@ def to_json(self, path, nested=False, indent=None, **kwargs): json_kwargs = {'ensure_ascii': False, 'indent': indent} - if six.PY2: - json_kwargs['encoding'] = 'utf-8' - json_kwargs.update(kwargs) json.dump(tableset_dict, f, **json_kwargs) diff --git a/agate/testcase.py b/agate/testcase.py index 89438200..72aa228a 100644 --- a/agate/testcase.py +++ b/agate/testcase.py @@ -1,9 +1,4 @@ -#!/usr/bin/env python - -try: - import unittest2 as unittest -except ImportError: - import unittest +import unittest import agate diff --git a/agate/type_tester.py b/agate/type_tester.py index e153fdb3..adcc3472 100644 --- a/agate/type_tester.py +++ b/agate/type_tester.py @@ -1,7 +1,7 @@ -#!/usr/bin/env python - +import warnings from copy import copy +from agate.data_types.base import DEFAULT_NULL_VALUES from agate.data_types.boolean import Boolean from agate.data_types.date import Date from agate.data_types.date_time import DateTime @@ -10,7 +10,7 @@ from agate.data_types.time_delta import TimeDelta -class TypeTester(object): +class TypeTester: """ Control how data types are inferred for columns in a given set of data. @@ -52,8 +52,11 @@ class TypeTester(object): options such as ``locale`` to :class:`.Number` or ``cast_nulls`` to :class:`.Text`. Take care in specifying the order of the list. It is the order they are tested in. :class:`.Text` should always be last. + :param null_values: + If :code:`types` is :code:`None`, a sequence of values which should be + cast to :code:`None` when encountered by the default data types. 
""" - def __init__(self, force={}, limit=None, types=None): + def __init__(self, force={}, limit=None, types=None, null_values=DEFAULT_NULL_VALUES): self._force = force self._limit = limit @@ -62,12 +65,12 @@ def __init__(self, force={}, limit=None, types=None): else: # In order of preference self._possible_types = [ - Boolean(), - Number(), - TimeDelta(), - Date(), - DateTime(), - Text() + Boolean(null_values=null_values), + Number(null_values=null_values), + TimeDelta(null_values=null_values), + Date(null_values=null_values), + DateTime(null_values=null_values), + Text(null_values=null_values) ] def run(self, rows, column_names): @@ -86,7 +89,7 @@ def run(self, rows, column_names): try: force_indices.append(column_names.index(name)) except ValueError: - raise ValueError('"%s" does not match the name of any column in this table.' % name) + warnings.warn('"%s" does not match the name of any column in this table.' % name, RuntimeWarning) if self._limit: sample_rows = rows[:self._limit] diff --git a/agate/utils.py b/agate/utils.py index ae6aab9f..69bb01eb 100644 --- a/agate/utils.py +++ b/agate/utils.py @@ -1,30 +1,19 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - """ This module contains a collection of utility classes and functions used in agate. """ -from collections import OrderedDict -try: - from collections.abc import Sequence -except ImportError: - from collections import Sequence -from functools import wraps import math import string -import warnings -from slugify import slugify as pslugify -from agate.warns import warn_duplicate_column, warn_unnamed_column - -try: - from cdecimal import Decimal, ROUND_FLOOR, ROUND_CEILING, getcontext -except ImportError: # pragma: no cover - from decimal import Decimal, ROUND_FLOOR, ROUND_CEILING, getcontext +from collections import OrderedDict +from collections.abc import Sequence +from decimal import ROUND_CEILING, ROUND_FLOOR, Decimal, getcontext +from functools import wraps -import six +from slugify import slugify as pslugify +from agate import config +from agate.warns import warn_duplicate_column, warn_unnamed_column #: Sentinal for use when `None` is an valid argument value default = object() @@ -48,7 +37,7 @@ def wrapper(self): return wrapper -class NullOrder(object): +class NullOrder: """ Dummy object used for sorting in place of None. @@ -84,6 +73,9 @@ def __len__(self): def __repr__(self): return repr(self._quantiles) + def __eq__(self, other): + return self._quantiles == other._quantiles + def locate(self, value): """ Identify which quantile a given value is part of. @@ -169,9 +161,9 @@ def make_number_formatter(decimal_places, add_ellipsis=False): :param add_ellipsis: Optionally add an ellipsis symbol at the end of a number """ - fraction = u'0' * decimal_places - ellipsis = u'…' if add_ellipsis else u'' - return u''.join([u'#,##0.', fraction, ellipsis, u';-#,##0.', fraction, ellipsis]) + fraction = '0' * decimal_places + ellipsis = config.get_option('number_truncation_chars') if add_ellipsis else '' + return ''.join(['#,##0.', fraction, ellipsis, ';-#,##0.', fraction, ellipsis]) def round_limits(minimum, maximum): @@ -244,7 +236,7 @@ def parse_object(obj, path=''): d = OrderedDict() for key, value in iterator: - key = six.text_type(key) + key = str(key) d.update(parse_object(value, path + key + '/')) return d @@ -255,7 +247,7 @@ def issequence(obj): Returns :code:`True` if the given object is an instance of :class:`.Sequence` that is not also a string. 
""" - return isinstance(obj, Sequence) and not isinstance(obj, six.string_types) + return isinstance(obj, Sequence) and not isinstance(obj, str) def deduplicate(values, column_names=False, separator='_'): @@ -278,7 +270,7 @@ def deduplicate(values, column_names=False, separator='_'): if not value: new_value = letter_name(i) warn_unnamed_column(i, new_value) - elif isinstance(value, six.string_types): + elif isinstance(value, str): new_value = value else: raise ValueError('Column names must be strings or None.') @@ -318,5 +310,5 @@ def slugify(values, ensure_unique=False, **kwargs): if ensure_unique: new_values = tuple(pslugify(value, **slug_args) for value in values) return deduplicate(new_values, separator=slug_args['separator']) - else: - return tuple(pslugify(value, **slug_args) for value in values) + + return tuple(pslugify(value, **slug_args) for value in values) diff --git a/agate/warns.py b/agate/warns.py index 1106f7f9..b5eacc74 100644 --- a/agate/warns.py +++ b/agate/warns.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - import warnings @@ -13,7 +11,7 @@ class NullCalculationWarning(RuntimeWarning): # pragma: no cover def warn_null_calculation(operation, column): - warnings.warn('Column "%s" contains nulls. These will be excluded from %s calculation.' % ( + warnings.warn('Column "{}" contains nulls. These will be excluded from {} calculation.'.format( column.name, operation.__class__.__name__ ), NullCalculationWarning, stacklevel=2) @@ -28,7 +26,7 @@ class DuplicateColumnWarning(RuntimeWarning): # pragma: no cover def warn_duplicate_column(column_name, column_rename): - warnings.warn('Column name "%s" already exists in Table. Column will be renamed to "%s".' % ( + warnings.warn('Column name "{}" already exists in Table. Column will be renamed to "{}".'.format( column_name, column_rename ), DuplicateColumnWarning, stacklevel=2) diff --git a/benchmarks/test_joins.py b/benchmarks/test_joins.py index c6f57969..c9e1432b 100644 --- a/benchmarks/test_joins.py +++ b/benchmarks/test_joins.py @@ -1,24 +1,14 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - +import unittest from random import shuffle from timeit import Timer -try: - import unittest2 as unittest -except ImportError: - import unittest - -import six -from six.moves import range - import agate class TestTableJoin(unittest.TestCase): def test_join(self): - left_rows = [(six.text_type(i), i) for i in range(100000)] - right_rows = [(six.text_type(i), i) for i in range(100000)] + left_rows = [(str(i), i) for i in range(100000)] + right_rows = [(str(i), i) for i in range(100000)] shuffle(left_rows) shuffle(right_rows) @@ -36,4 +26,4 @@ def test(): min_time = min(results) - self.assertLess(min_time, 0) + self.assertLess(min_time, 20) # CI unreliable, 15s witnessed on PyPy diff --git a/charts.py b/charts.py old mode 100644 new mode 100755 index d228c9ee..53f1abf2 --- a/charts.py +++ b/charts.py @@ -2,7 +2,6 @@ import agate - table = agate.Table.from_csv('examples/realdata/Datagov_FY10_EDU_recp_by_State.csv') table.limit(10).bar_chart('State Name', 'TOTAL', 'docs/images/bar_chart.svg') diff --git a/docs/about.rst b/docs/about.rst index e29f8ca6..bc8cbff4 100644 --- a/docs/about.rst +++ b/docs/about.rst @@ -11,7 +11,7 @@ Why agate? * Decimal precision everywhere. * Exhaustive user documentation. * Pluggable `extensions `_ that add SQL integration, Excel support, and more. -* Designed with `iPython `_, `Jupyter `_ and `atom/hydrogen `_ in mind. +* Designed with `iPython `_, `Jupyter `_ and `atom/hydrogen `_ in mind. * Pure Python. 
No C dependencies to compile. * Exhaustive test coverage. * MIT licensed and free for all purposes. @@ -21,7 +21,7 @@ Why agate? Principles ========== -agate is a intended to fill a very particular programming niche. It should not be allowed to become as complex as `numpy `_ or `pandas `_. Please bear in mind the following principles when considering a new feature: +agate is a intended to fill a very particular programming niche. It should not be allowed to become as complex as `numpy `_ or `pandas `_. Please bear in mind the following principles when considering a new feature: * Humans have less time than computers. Optimize for humans. * Most datasets are small. Don't optimize for "big data". diff --git a/docs/api/csv.rst b/docs/api/csv.rst index 4d02128c..59c79771 100644 --- a/docs/api/csv.rst +++ b/docs/api/csv.rst @@ -25,19 +25,6 @@ Python 3 agate.csv_py3.DictReader agate.csv_py3.DictWriter -Python 2 --------- - -.. autosummary:: - :nosignatures: - - agate.csv_py2.reader - agate.csv_py2.writer - agate.csv_py2.Reader - agate.csv_py2.Writer - agate.csv_py2.DictReader - agate.csv_py2.DictWriter - Python 3 details ---------------- @@ -47,13 +34,3 @@ Python 3 details .. autoclass:: agate.csv_py3.Writer .. autoclass:: agate.csv_py3.DictReader .. autoclass:: agate.csv_py3.DictWriter - -Python 2 details ----------------- - -.. autofunction:: agate.csv_py2.reader -.. autofunction:: agate.csv_py2.writer -.. autoclass:: agate.csv_py2.Reader -.. autoclass:: agate.csv_py2.Writer -.. autoclass:: agate.csv_py2.DictReader -.. autoclass:: agate.csv_py2.DictWriter diff --git a/docs/conf.py b/docs/conf.py index 826f7101..4b9a07bc 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -1,48 +1,46 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- - +# Configuration file for the Sphinx documentation builder. 
+# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html import os import sys -# Path munging sys.path.insert(0, os.path.abspath('..')) -# Extensions +# -- Project information ----------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information + +project = 'agate' +copyright = '2017, Christopher Groskopf' +version = '1.9.1' +release = version + +# -- General configuration --------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration + extensions = [ 'sphinx.ext.autosummary', 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx' ] -# autodoc_member_order = 'bysource' -autodoc_default_flags = ['members', 'show-inheritance'] - -intersphinx_mapping = { - 'python': ('http://docs.python.org/3.5', None), - 'leather': ('http://leather.readthedocs.io/en/latest/', None) -} -# Templates templates_path = ['_templates'] -master_doc = 'index' +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] -# Metadata -project = u'agate' -copyright = u'2017, Christopher Groskopf' -version = '1.6.2' -release = '1.6.2' +# -- Options for HTML output ------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output -exclude_patterns = ['_build'] -pygments_style = 'sphinx' +html_theme = 'furo' -# HTMl theming -html_theme = 'default' - -on_rtd = os.environ.get('READTHEDOCS', None) == 'True' +htmlhelp_basename = 'agatedoc' -if not on_rtd: # only import and set the theme if we're building docs locally - import sphinx_rtd_theme - html_theme = 'sphinx_rtd_theme' - html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] +autodoc_default_options = { + 'members': None, + 'member-order': 'bysource', + 'show-inheritance': True, +} -html_static_path = ['_static'] -htmlhelp_basename = 'agatedoc' +intersphinx_mapping = { + 'python': ('https://docs.python.org/3', None), + 'leather': ('https://leather.readthedocs.io/en/latest/', None) +} diff --git a/docs/contributing.rst b/docs/contributing.rst index 4352c600..495083fb 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -30,8 +30,7 @@ Hacker? We'd love to have you hack with us. Please follow this process to make y #. Fork the project on `GitHub `_. #. If you don't have a specific task in mind, check out the `issue tracker `_ and find a task that needs to be done and is of a scope you can realistically expect to complete in a few days. Don't worry about the priority of the issues at first, but try to choose something you'll enjoy. You're much more likely to finish something to the point it can be merged if it's something you really enjoy hacking on. #. If you already have a task you know you want to work on, open a ticket or comment on the existing ticket letting everyone know you're going to be working on it. It's also good practice to provide some general idea of how you plan on resolving the issue so that other developers can make suggestions. -#. Write tests for the feature you're building. Follow the format of the existing tests in the test directory to see how this works. You can run all the tests with the command ``nosetests tests``. -#. Verify your tests work on all supported versions of Python by runing ``tox``. +#. Write tests for the feature you're building. Follow the format of the existing tests in the test directory to see how this works. 
You can run all the tests with the command ``pytest``. #. Write the code. Try to stay consistent with the style and organization of the existing codebase. A good patch won't be refused for stylistic reasons, but large parts of it may be rewritten and nobody wants that. #. As you are coding, periodically merge in work from the master branch and verify you haven't broken anything by running the test suite. #. Write documentation. This means docstrings on all classes and methods, including parameter explanations. It also means, when relevant, cookbook recipes and updates to the agate user tutorial. diff --git a/docs/cookbook/charting.rst b/docs/cookbook/charting.rst index f5aa8c47..7b466092 100644 --- a/docs/cookbook/charting.rst +++ b/docs/cookbook/charting.rst @@ -2,7 +2,7 @@ Charts ====== -Agate offers two kinds of built in charting: very simple text bar charts and SVG charting via `leather `_. Both are intended for efficiently exploring data, rather than producing publication-ready charts. +Agate offers two kinds of built in charting: very simple text bar charts and SVG charting via `leather `_. Both are intended for efficiently exploring data, rather than producing publication-ready charts. Text-based bar chart ==================== @@ -119,7 +119,7 @@ SVG lattice chart Using matplotlib ================ -If you need to make more complex charts, you can always use agate with `matplotlib `_. +If you need to make more complex charts, you can always use agate with `matplotlib `_. Here is an example of how you might generate a line chart: diff --git a/docs/cookbook/compute.rst b/docs/cookbook/compute.rst index 738aecda..d9e0a2aa 100644 --- a/docs/cookbook/compute.rst +++ b/docs/cookbook/compute.rst @@ -145,8 +145,7 @@ Implementing Levenshtein requires writing a custom :class:`.Computation`. To sav import agate from Levenshtein import distance - import six - + class LevenshteinDistance(agate.Computation): """ Computes Levenshtein edit distance between the column and a given string. @@ -199,7 +198,7 @@ The resulting column will contain an integer measuring the edit distance between USA Today Diversity Index ========================= -The `USA Today Diversity Index `_ is a widely cited method for evaluating the racial diversity of a given area. Using a custom :class:`.Computation` makes it simple to calculate. +The `USA Today Diversity Index `_ is a widely cited method for evaluating the racial diversity of a given area. Using a custom :class:`.Computation` makes it simple to calculate. Assuming that your data has a column for the total population, another for the population of each race and a final column for the hispanic population, you can implement the diversity index like this: diff --git a/docs/cookbook/create.rst b/docs/cookbook/create.rst index abafde7a..534e6ce4 100644 --- a/docs/cookbook/create.rst +++ b/docs/cookbook/create.rst @@ -139,27 +139,23 @@ From newline-delimited JSON From a SQL database =================== -Use the `agate-sql `_ extension. +Use the `agate-sql `_ extension. .. code-block:: python import agatesql - agatesql.patch() - table = agate.Table.from_sql('postgresql:///database', 'input_table') From an Excel spreadsheet ========================= -Use the `agate-excel `_ extension. It supports both .xls and .xlsx files. +Use the `agate-excel `_ extension. It supports both .xls and .xlsx files. .. 
code-block:: python import agateexcel - agateexcel.patch() - table = agate.Table.from_xls('test.xls', sheet='data') table2 = agate.Table.from_xlsx('test.xlsx', sheet='data') @@ -167,28 +163,24 @@ Use the `agate-excel `_ extension. It suppo From a DBF table ================ -DBF is the file format used to hold tabular data for ArcGIS shapefiles. `agate-dbf `_ extension. +DBF is the file format used to hold tabular data for ArcGIS shapefiles. `agate-dbf `_ extension. .. code-block:: python import agatedbf - agatedbf.patch() - table = agate.Table.from_dbf('test.dbf') From a remote file ================== -Use the `agate-remote `_ extension. +Use the `agate-remote `_ extension. .. code-block:: python import agateremote - agateremote.patch() - table = agate.Table.from_url('https://raw.githubusercontent.com/wireservice/agate/master/examples/test.csv') agate-remote also let’s you create an Archive, which is a reference to a group of tables with a known path structure. diff --git a/docs/cookbook/datetime.rst b/docs/cookbook/datetime.rst index 78ffcf0c..5f9f781a 100644 --- a/docs/cookbook/datetime.rst +++ b/docs/cookbook/datetime.rst @@ -34,9 +34,13 @@ The second way is to specify a timezone as an argument to the type constructor: .. code-block:: python - import pytz + try: + from zoneinfo import ZoneInfo + except ImportError: + # Fallback for Python < 3.9 + from backports.zoneinfo import ZoneInfo - eastern = pytz.timezone('US/Eastern') + eastern = ZoneInfo('US/Eastern') datetime_type = agate.DateTime(timezone=eastern) In this case all timezones that are processed will be set to have the Eastern timezone. Note, the timezone will be **set**, not converted. You cannot use this method to convert your timezones from UTC to another timezone. To do that see :ref:`convert_timezones`. @@ -60,9 +64,13 @@ If you load data from a spreadsheet in one timezone and you need to convert it t .. code-block:: python - import pytz + try: + from zoneinfo import ZoneInfo + except ImportError: + # Fallback for Python < 3.9 + from backports.zoneinfo import ZoneInfo - us_eastern = pytz.timezone('US/Eastern') + us_eastern = ZoneInfo('US/Eastern') datetime_type = agate.DateTime(timezone=us_eastern) column_names = ['what', 'when'] @@ -70,7 +78,7 @@ If you load data from a spreadsheet in one timezone and you need to convert it t table = agate.Table.from_csv('events.csv', columns) - rome = timezone('Europe/Rome') + rome = ZoneInfo('Europe/Rome') timezone_shifter = agate.Formula(lambda r: r['when'].astimezone(rome)) table = agate.Table.compute([ diff --git a/docs/cookbook/lookup.rst b/docs/cookbook/lookup.rst index cb58e5de..4fe1c82e 100644 --- a/docs/cookbook/lookup.rst +++ b/docs/cookbook/lookup.rst @@ -33,8 +33,6 @@ We can map the ``year`` column to its annual CPI index in one lookup call. import agatelookup - agatelookup.patch() - join_year_cpi = table.lookup('year', 'cpi') The return table will have now have a new column: diff --git a/docs/cookbook/save.rst b/docs/cookbook/save.rst index 0c7fc4fd..529ab165 100644 --- a/docs/cookbook/save.rst +++ b/docs/cookbook/save.rst @@ -26,7 +26,7 @@ To newline-delimited JSON To a SQL database ================= -Use the `agate-sql `_ extension. +Use the `agate-sql `_ extension. .. code-block:: python diff --git a/docs/cookbook/sql.rst b/docs/cookbook/sql.rst index c85e4ae6..fef2a4b8 100644 --- a/docs/cookbook/sql.rst +++ b/docs/cookbook/sql.rst @@ -6,7 +6,7 @@ agate's command structure is very similar to SQL. The primary difference between .. 
note:: - All examples in this section use the `PostgreSQL `_ dialect for comparison. + All examples in this section use the `PostgreSQL `_ dialect for comparison. If you want to read and write data from SQL, see :ref:`load_a_table_from_a_sql_database`. diff --git a/docs/cookbook/statistics.rst b/docs/cookbook/statistics.rst index 51dd3e73..7208cc1d 100644 --- a/docs/cookbook/statistics.rst +++ b/docs/cookbook/statistics.rst @@ -2,7 +2,7 @@ Statistics ========== -Common descriptive and aggregate statistics are included with the core agate library. For additional statistical methods beyond the scope of agate consider using the `agate-stats `_ extension or integrating with `scipy `_. +Common descriptive and aggregate statistics are included with the core agate library. For additional statistical methods beyond the scope of agate consider using the `agate-stats `_ extension or integrating with `scipy `_. Descriptive statistics ====================== @@ -26,9 +26,9 @@ Or, get several at once: .. code-block:: python table.aggregate([ - agate.Min('salary'), - agate.Mean('salary'), - agate.Max('salary') + ('salary_min', agate.Min('salary')), + ('salary_ave', agate.Mean('salary')), + ('salary_max', agate.Max('salary')), ]) Aggregate statistics ==================== @@ -86,14 +86,12 @@ The output table will be the same format as the previous example, except the val Identify outliers ================= -The `agate-stats `_ extension adds methods for finding outliers. +The `agate-stats `_ extension adds methods for finding outliers. .. code-block:: python import agatestats - agatestats.patch() - outliers = table.stdev_outliers('salary', deviations=3, reject=False) By specifying :code:`reject=True` you can instead return a table including only those values **not** identified as outliers. diff --git a/docs/extensions.rst b/docs/extensions.rst index 7cf7d8e1..e265d943 100644 --- a/docs/extensions.rst +++ b/docs/extensions.rst @@ -7,7 +7,7 @@ The core agate library is designed rely on as few dependencies as possible. Howe Using extensions ================ -agate support's plugin-style extensions using a monkey-patching pattern. Libraries can be created that add new methods onto :class:`.Table` and :class:`.TableSet`. For example, `agate-sql `_ adds the ability to read and write tables from a SQL database: +agate supports plugin-style extensions using a monkey-patching pattern. Libraries can be created that add new methods onto :class:`.Table` and :class:`.TableSet`. For example, `agate-sql `_ adds the ability to read and write tables from a SQL database: .. code-block:: python @@ -23,12 +23,12 @@ List of extensions Here is a list of agate extensions that are known to be actively maintained: -* `agate-sql `_: Read and write tables in SQL databases -* `agate-stats `_: Additional statistical methods -* `agate-excel `_: Read excel tables (xls and xlsx) -* `agate-dbf `_: Read dbf tables (from shapefiles) -* `agate-remote `_: Read from remote files -* `agate-lookup `_: Instantly join to hosted `lookup `_ tables. +* `agate-sql `_: Read and write tables in SQL databases +* `agate-stats `_: Additional statistical methods +* `agate-excel `_: Read excel tables (xls and xlsx) +* `agate-dbf `_: Read dbf tables (from shapefiles) +* `agate-remote `_: Read from remote files +* `agate-lookup `_: Instantly join to hosted `lookup `_ tables.
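To make the monkey-patching pattern described above concrete, a minimal extension could look like the following sketch. ``row_count`` is a hypothetical method invented purely for illustration; it is not part of any published extension:

.. code-block:: python

    import agate


    def row_count(self):
        """
        Return the number of rows in this table.
        """
        return len(self.rows)


    # Attaching the function to the class makes it available on every Table.
    agate.Table.row_count = row_count

    table = agate.Table([(1,), (2,)], ['a'], [agate.Number()])
    table.row_count()  # 2

Patching at import time like this matches how the cookbook examples elsewhere in this patch now simply ``import agatesql`` (or ``agateexcel``, ``agatestats``) without calling ``.patch()``.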
Writing your own extensions =========================== diff --git a/docs/install.rst b/docs/install.rst index c9491b07..169f0ece 100644 --- a/docs/install.rst +++ b/docs/install.rst @@ -9,11 +9,7 @@ To use agate install it with pip:: pip install agate -.. note:: - - Need more speed? Upgrade to Python 3. It's 3-5x faster than Python 2. - - If you must use Python 2 you can you can :code:`pip install cdecimal` for a performance boost. +For non-English locale support, `install PyICU `__. Developers ---------- @@ -24,20 +20,15 @@ If you are a developer that also wants to hack on agate, install it from git:: cd agate mkvirtualenv agate - # If running Python 3 (strongly recommended for development) - pip install -r requirements-py3.txt - - # If running Python 2 - pip install -r requirements-py2.txt + pip install -e .[test] python setup.py develop - tox .. note:: To run the agate tests with coverage:: - nosetests --with-coverage tests + pytest --cov agate Supported platforms ------------------- @@ -46,10 +37,10 @@ agate supports the following versions of Python: * Python 2.7 * Python 3.5+ -* `PyPy `_ versions >= 4.0.0 +* `PyPy `_ versions >= 4.0.0 It is tested primarily on OSX, but due to its minimal dependencies it should work perfectly on both Linux and Windows. .. note:: - `iPython `_ or `Jupyter `_ user? Agate works great there too. + `iPython `_ or `Jupyter `_ user? Agate works great there too. diff --git a/docs/release_process.rst b/docs/release_process.rst index 180a3a2c..046b3157 100644 --- a/docs/release_process.rst +++ b/docs/release_process.rst @@ -2,22 +2,21 @@ Release process =============== -This is the release process for agate: +If substantial changes were made to the code: -1. Verify all unit tests pass with fresh environments: ``tox -r``. -2. Verify 100% test coverage: ``nosetests --with-coverage tests``. -3. Ensure any new modules have been added to setup.py's ``packages`` list. -#. Ensure any new public interfaces have been added to the documentation. -#. Ensure TableSet proxy methods have been added for new Table methods. -#. Make sure the example script still works: ``python example.py``. -#. Ensure ``python charts.py`` works and has been run recently. -#. Ensure ``CHANGELOG.rst`` is up to date. Add the release date and summary. -#. Create a release tag: ``git tag -a x.y.z -m "x.y.z release."`` -#. Push tags upstream: ``git push --tags`` -#. If this is a major release, merge ``master`` into ``stable``: ``git checkout stable; git merge master; git push`` -#. Upload to `PyPI `_: ``python setup.py sdist bdist_wheel upload``. -#. Flag the release to build on `RTFD `_. -#. Update the "default version" on `RTFD `_ to the latest. -#. Rev to latest version: ``docs/conf.py``, ``setup.py``, ``CHANGELOG.rst`` need updates. -#. Find/replace ``en/[old version]`` to ``en/[new version]`` in ``tutorial.ipynb``. -#. Commit revision: ``git commit -am "Update to version x.y.z for development."``. +#. Ensure any new modules have been added to setup.py's ``packages`` list +#. Ensure any new public interfaces have been added to the documentation +#. Ensure TableSet proxy methods have been added for new Table methods + +Then: + +#. All tests pass on continuous integration +#. The changelog is up-to-date and dated +#. The version number is correct in: + * setup.py + * docs/conf.py +#. Check for new authors: ``git log --invert-grep --author='James McKinney'`` +#. Run ``python charts.py`` to update images in the documentation +#. 
Tag the release: ``git tag -a x.y.z -m 'x.y.z release.'; git push --follow-tags`` +#. Upload to PyPI: ``rm -rf dist; python setup.py sdist bdist_wheel; twine upload dist/*`` +#. Build the documentation on ReadTheDocs manually diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 00000000..a669e4fa --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,3 @@ +furo +sphinx>2 +docutils>=0.18 diff --git a/examples/test_from_json_ambiguous.json b/examples/test_from_json_ambiguous.json new file mode 100644 index 00000000..5435946e --- /dev/null +++ b/examples/test_from_json_ambiguous.json @@ -0,0 +1,8 @@ +[ + { + "a/b": 2, + "a": { + "b": false + } + } +] diff --git a/exonerations.py b/exonerations.py index 923600d1..9a7ba05d 100755 --- a/exonerations.py +++ b/exonerations.py @@ -1,8 +1,9 @@ #!/usr/bin/env python -import agate import proof +import agate + def load_data(data): data['exonerations'] = agate.Table.from_csv('examples/realdata/exonerations-20150828.csv') @@ -88,6 +89,7 @@ def race_and_age(data): # Print out the results sorted_groups.print_table(max_rows=10) + analysis = proof.Analysis(load_data) analysis.then(confessions) analysis.then(median_age) diff --git a/requirements-py2.txt b/requirements-py2.txt deleted file mode 100644 index 9bc3a8d8..00000000 --- a/requirements-py2.txt +++ /dev/null @@ -1,19 +0,0 @@ -unittest2>=1.1.0 -nose>=1.1.2 -tox>=1.3 -Sphinx>=1.2.2 -coverage>=3.7.1 -six>=1.9.0 -sphinx_rtd_theme>=0.1.6 -wheel>=0.24.0 -pytimeparse>=1.1.5 -Babel>=2.0 -parsedatetime>=2.1 -pytz>=2015.4 -mock>=1.3.0 -isodate>=0.5.4 -python-slugify>=1.2.1 -lxml>=3.6.0,<4.0.0 -cssselect>=0.9.1 -leather>=0.3.2 -PyICU>=2.4.2 diff --git a/requirements-py3.txt b/requirements-py3.txt deleted file mode 100644 index c94c368b..00000000 --- a/requirements-py3.txt +++ /dev/null @@ -1,17 +0,0 @@ -nose>=1.1.2 -tox>=1.3 -Sphinx>=1.2.2 -coverage>=3.7.1 -six>=1.9.0 -sphinx_rtd_theme>=0.1.6 -wheel>=0.24.0 -pytimeparse>=1.1.5 -Babel>=2.0 -parsedatetime>=2.1 -pytz>=2015.4 -isodate>=0.5.4 -python-slugify>=1.2.1 -lxml>=3.6.0 -cssselect>=0.9.1 -leather>=0.3.2 -PyICU>=2.4.2 diff --git a/setup.cfg b/setup.cfg index 2a9acf13..9bf57093 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,2 +1,17 @@ +[flake8] +max-line-length = 119 +per-file-ignores = + # imported but unused, unable to detect undefined names + agate/__init__.py: F401,F403 + agate/aggregations/__init__.py: F401 + agate/computations/__init__.py: F401 + agate/data_types/__init__.py: F401 + # module level import not at top of file + agate/tableset/__init__.py: E402 + agate/table/__init__.py: E402 + +[isort] +line_length = 119 + [bdist_wheel] universal = 1 diff --git a/setup.py b/setup.py index f797dbba..49851d19 100644 --- a/setup.py +++ b/setup.py @@ -1,26 +1,20 @@ -#!/usr/bin/env python +from setuptools import find_packages, setup -from setuptools import setup - -install_requires = [ - 'six>=1.9.0', - 'pytimeparse>=1.1.5', - 'parsedatetime>=2.1', - 'Babel>=2.0', - 'isodate>=0.5.4', - 'python-slugify>=1.2.1', - 'leather>=0.3.2', - 'PyICU>=2.4.2', -] +with open('README.rst') as f: + long_description = f.read() setup( name='agate', - version='1.6.2', + version='1.9.1', description='A data analysis library that is optimized for humans instead of machines.', - long_description=open('README.rst').read(), + long_description=long_description, + long_description_content_type='text/x-rst', author='Christopher Groskopf', author_email='chrisgroskopf@gmail.com', - url='http://agate.readthedocs.org/', + url='https://agate.readthedocs.org/', + 
project_urls={ + 'Source': 'https://github.com/wireservice/agate', + }, license='MIT', classifiers=[ 'Development Status :: 5 - Production/Stable', @@ -31,25 +25,37 @@ 'Natural Language :: English', 'Operating System :: OS Independent', 'Programming Language :: Python', - 'Programming Language :: Python :: 2', - 'Programming Language :: Python :: 2.7', - 'Programming Language :: Python :: 3', - 'Programming Language :: Python :: 3.5', - 'Programming Language :: Python :: 3.6', - 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', + 'Programming Language :: Python :: 3.9', + 'Programming Language :: Python :: 3.10', + 'Programming Language :: Python :: 3.11', + 'Programming Language :: Python :: 3.12', 'Programming Language :: Python :: Implementation :: CPython', 'Programming Language :: Python :: Implementation :: PyPy', 'Topic :: Scientific/Engineering :: Information Analysis', 'Topic :: Software Development :: Libraries :: Python Modules', ], - packages=[ - 'agate', - 'agate.aggregations', - 'agate.computations', - 'agate.data_types', - 'agate.table', - 'agate.tableset' + packages=find_packages(exclude=['benchmarks', 'tests', 'tests.*']), + install_requires=[ + 'Babel>=2.0', + 'isodate>=0.5.4', + 'leather>=0.3.2', + # KeyError: 's' https://github.com/bear/parsedatetime/pull/233 https://github.com/wireservice/agate/issues/743 + 'parsedatetime>=2.1,!=2.5', + 'python-slugify>=1.2.1', + 'pytimeparse>=1.1.5', + 'tzdata>=2023.3;platform_system=="Windows"', ], - install_requires=install_requires + extras_require={ + 'test': [ + 'coverage>=3.7.1', + 'cssselect>=0.9.1', + 'lxml>=3.6.0', + # CI is not configured to install PyICU on macOS and Windows. + 'PyICU>=2.4.2;sys_platform=="linux"', + 'pytest', + 'pytest-cov', + 'backports.zoneinfo;python_version<"3.9"', + ], + } ) diff --git a/tests/__init__.py b/tests/__init__.py index e69de29b..ee8beb57 100644 --- a/tests/__init__.py +++ b/tests/__init__.py @@ -0,0 +1,4 @@ +import locale + +# The test fixtures can break if the locale is non-US. 
+locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') diff --git a/tests/test_agate.py b/tests/test_agate.py index cd6e4fcc..78a77da7 100644 --- a/tests/test_agate.py +++ b/tests/test_agate.py @@ -1,24 +1,11 @@ -#!/usr/bin/env python - -try: - import unittest2 as unittest -except ImportError: - import unittest - -import six +import unittest import agate class TestCSV(unittest.TestCase): def test_agate(self): - if six.PY2: - self.assertIs(agate.csv.reader, agate.csv_py2.reader) - self.assertIs(agate.csv.writer, agate.csv_py2.writer) - self.assertIs(agate.csv.DictReader, agate.csv_py2.DictReader) - self.assertIs(agate.csv.DictWriter, agate.csv_py2.DictWriter) - else: - self.assertIs(agate.csv.reader, agate.csv_py3.reader) - self.assertIs(agate.csv.writer, agate.csv_py3.writer) - self.assertIs(agate.csv.DictReader, agate.csv_py3.DictReader) - self.assertIs(agate.csv.DictWriter, agate.csv_py3.DictWriter) + self.assertIs(agate.csv.reader, agate.csv_py3.reader) + self.assertIs(agate.csv.writer, agate.csv_py3.writer) + self.assertIs(agate.csv.DictReader, agate.csv_py3.DictReader) + self.assertIs(agate.csv.DictWriter, agate.csv_py3.DictWriter) diff --git a/tests/test_aggregations.py b/tests/test_aggregations.py index 3722b5f3..c8dba876 100644 --- a/tests/test_aggregations.py +++ b/tests/test_aggregations.py @@ -1,23 +1,16 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - import datetime -from decimal import Decimal -import platform import sys +import unittest import warnings - -try: - import unittest2 as unittest -except ImportError: - import unittest - -import six +from decimal import Decimal from agate import Table -from agate.aggregations import * -from agate.data_types import * -from agate.exceptions import * +from agate.aggregations import (IQR, MAD, All, Any, Count, Deciles, First, HasNulls, Max, MaxLength, MaxPrecision, + Mean, Median, Min, Mode, Percentiles, PopulationStDev, PopulationVariance, Quartiles, + Quintiles, StDev, Sum, Summary, Variance) +from agate.data_types import Boolean, DateTime, Number, Text, TimeDelta +from agate.exceptions import DataTypeError +from agate.utils import Quantiles from agate.warns import NullCalculationWarning @@ -181,43 +174,78 @@ def test_all(self): class TestDateTimeAggregation(unittest.TestCase): - def test_min(self): - rows = [ + def setUp(self): + self.rows = [ [datetime.datetime(1994, 3, 3, 6, 31)], [datetime.datetime(1994, 3, 3, 6, 30, 30)], [datetime.datetime(1994, 3, 3, 6, 30)], ] - table = Table(rows, ['test'], [DateTime()]) + self.table = Table(self.rows, ['test', 'null'], [DateTime(), DateTime()]) + + self.time_delta_rows = [ + [datetime.timedelta(seconds=10), datetime.timedelta(seconds=15), None], + [datetime.timedelta(seconds=20), None, None], + ] + + self.time_delta_table = Table( + self.time_delta_rows, ['test', 'mixed', 'null'], [TimeDelta(), TimeDelta(), TimeDelta()] + ) + + def test_min(self): + self.assertIsInstance(Min('test').get_aggregate_data_type(self.table), DateTime) + Min('test').validate(self.table) + self.assertEqual(Min('test').run(self.table), datetime.datetime(1994, 3, 3, 6, 30)) - self.assertIsInstance(Min('test').get_aggregate_data_type(table), DateTime) - Min('test').validate(table) - self.assertEqual(Min('test').run(table), datetime.datetime(1994, 3, 3, 6, 30)) + def test_min_all_nulls(self): + self.assertIsNone(Min('null').run(self.table)) + + def test_min_time_delta(self): + self.assertIsInstance(Min('test').get_aggregate_data_type(self.time_delta_table), TimeDelta) + Min('test').validate(self.time_delta_table) + 
self.assertEqual(Min('test').run(self.time_delta_table), datetime.timedelta(0, 10)) def test_max(self): - rows = [ - [datetime.datetime(1994, 3, 3, 6, 31)], - [datetime.datetime(1994, 3, 3, 6, 30, 30)], - [datetime.datetime(1994, 3, 3, 6, 30)], - ] + self.assertIsInstance(Max('test').get_aggregate_data_type(self.table), DateTime) + Max('test').validate(self.table) + self.assertEqual(Max('test').run(self.table), datetime.datetime(1994, 3, 3, 6, 31)) - table = Table(rows, ['test'], [DateTime()]) + def test_max_all_nulls(self): + self.assertIsNone(Max('null').run(self.table)) - self.assertIsInstance(Max('test').get_aggregate_data_type(table), DateTime) - Max('test').validate(table) - self.assertEqual(Max('test').run(table), datetime.datetime(1994, 3, 3, 6, 31)) + def test_max_time_delta(self): + self.assertIsInstance(Max('test').get_aggregate_data_type(self.time_delta_table), TimeDelta) + Max('test').validate(self.time_delta_table) + self.assertEqual(Max('test').run(self.time_delta_table), datetime.timedelta(0, 20)) - def test_sum(self): - rows = [ - [datetime.timedelta(seconds=10)], - [datetime.timedelta(seconds=20)], - ] + def test_mean(self): + with self.assertWarns(NullCalculationWarning): + Mean('mixed').validate(self.time_delta_table) + + Mean('test').validate(self.time_delta_table) + + self.assertEqual(Mean('test').run(self.time_delta_table), datetime.timedelta(seconds=15)) + + def test_mean_all_nulls(self): + self.assertIsNone(Mean('null').run(self.time_delta_table)) + + def test_mean_with_nulls(self): + warnings.simplefilter('ignore') + + try: + Mean('mixed').validate(self.time_delta_table) + finally: + warnings.resetwarnings() + + self.assertAlmostEqual(Mean('mixed').run(self.time_delta_table), datetime.timedelta(seconds=15)) - table = Table(rows, ['test'], [TimeDelta()]) + def test_sum(self): + self.assertIsInstance(Sum('test').get_aggregate_data_type(self.time_delta_table), TimeDelta) + Sum('test').validate(self.time_delta_table) + self.assertEqual(Sum('test').run(self.time_delta_table), datetime.timedelta(seconds=30)) - self.assertIsInstance(Sum('test').get_aggregate_data_type(table), TimeDelta) - Sum('test').validate(table) - self.assertEqual(Sum('test').run(table), datetime.timedelta(seconds=30)) + def test_sum_all_nulls(self): + self.assertEqual(Sum('null').run(self.time_delta_table), datetime.timedelta(0)) class TestNumberAggregation(unittest.TestCase): @@ -246,6 +274,9 @@ def test_max_precision(self): self.assertEqual(MaxPrecision('one').run(self.table), 1) self.assertEqual(MaxPrecision('two').run(self.table), 2) + def test_max_precision_all_nulls(self): + self.assertEqual(MaxPrecision('four').run(self.table), 0) + def test_sum(self): with self.assertRaises(DataTypeError): Sum('three').validate(self.table) @@ -255,6 +286,9 @@ def test_sum(self): self.assertEqual(Sum('one').run(self.table), Decimal('6.5')) self.assertEqual(Sum('two').run(self.table), Decimal('13.13')) + def test_sum_all_nulls(self): + self.assertEqual(Sum('four').run(self.table), Decimal('0')) + def test_min(self): with self.assertRaises(DataTypeError): Min('three').validate(self.table) @@ -264,6 +298,9 @@ def test_min(self): self.assertEqual(Min('one').run(self.table), Decimal('1.1')) self.assertEqual(Min('two').run(self.table), Decimal('2.19')) + def test_min_all_nulls(self): + self.assertIsNone(Min('four').run(self.table)) + def test_max(self): with self.assertRaises(DataTypeError): Max('three').validate(self.table) @@ -273,6 +310,9 @@ def test_max(self): self.assertEqual(Max('one').run(self.table), 
Decimal('2.7')) self.assertEqual(Max('two').run(self.table), Decimal('4.1')) + def test_max_all_nulls(self): + self.assertIsNone(Max('four').run(self.table)) + def test_mean(self): with self.assertWarns(NullCalculationWarning): Mean('one').validate(self.table) @@ -285,14 +325,6 @@ def test_mean(self): self.assertEqual(Mean('two').run(self.table), Decimal('3.2825')) def test_mean_all_nulls(self): - """ - Test to confirm mean of only nulls doesn't cause a critical error. - - The assumption here is that if you attempt to perform a mean - calculation, on a column which contains only null values, then a null - value should be returned to the caller. - :return: - """ self.assertIsNone(Mean('four').run(self.table)) def test_mean_with_nulls(self): @@ -322,6 +354,9 @@ def test_median(self): self.assertIsInstance(Median('two').get_aggregate_data_type(self.table), Number) self.assertEqual(Median('two').run(self.table), Decimal('3.42')) + def test_median_all_nulls(self): + self.assertIsNone(Median('four').run(self.table)) + def test_mode(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -342,6 +377,9 @@ def test_mode(self): self.assertIsInstance(Mode('two').get_aggregate_data_type(self.table), Number) self.assertEqual(Mode('two').run(self.table), Decimal('3.42')) + def test_mode_all_nulls(self): + self.assertIsNone(Mode('four').run(self.table)) + def test_iqr(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -362,6 +400,9 @@ def test_iqr(self): self.assertIsInstance(IQR('two').get_aggregate_data_type(self.table), Number) self.assertEqual(IQR('two').run(self.table), Decimal('0.955')) + def test_irq_all_nulls(self): + self.assertIsNone(IQR('four').run(self.table)) + def test_variance(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -385,6 +426,9 @@ def test_variance(self): Decimal('0.6332') ) + def test_variance_all_nulls(self): + self.assertIsNone(Variance('four').run(self.table)) + def test_population_variance(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -408,6 +452,9 @@ def test_population_variance(self): Decimal('0.4749') ) + def test_population_variance_all_nulls(self): + self.assertIsNone(PopulationVariance('four').run(self.table)) + def test_stdev(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -431,6 +478,9 @@ def test_stdev(self): Decimal('0.7958') ) + def test_stdev_all_nulls(self): + self.assertIsNone(StDev('four').run(self.table)) + def test_population_stdev(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -454,6 +504,9 @@ def test_population_stdev(self): Decimal('0.6891') ) + def test_population_stdev_all_nulls(self): + self.assertIsNone(PopulationStDev('four').run(self.table)) + def test_mad(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -474,6 +527,9 @@ def test_mad(self): self.assertIsInstance(MAD('two').get_aggregate_data_type(self.table), Number) self.assertAlmostEqual(MAD('two').run(self.table), Decimal('0')) + def test_mad_all_nulls(self): + self.assertIsNone(MAD('four').run(self.table)) + def test_percentiles(self): with warnings.catch_warnings(): warnings.simplefilter('error') @@ -504,6 +560,9 @@ def test_percentiles(self): self.assertEqual(percentiles[99], Decimal('990.5')) self.assertEqual(percentiles[100], Decimal('1000')) + def test_percentiles_all_nulls(self): + self.assertEqual(Percentiles('four').run(self.table), Quantiles([None] * 101)) + def test_percentiles_locate(self): rows = [(n,) for n in range(1, 
1001)] @@ -622,6 +681,9 @@ def test_quartiles(self): for i, v in enumerate(['1', '2', '4', '6', '7']): self.assertEqual(quartiles[i], Decimal(v)) + def test_quartiles_all_nulls(self): + self.assertEqual(Quartiles('four').run(self.table), Quantiles([None] * 5)) + def test_quartiles_locate(self): """ CDF quartile tests from: @@ -662,11 +724,16 @@ def test_quintiles(self): finally: warnings.resetwarnings() - rows = [(n,) for n in range(1, 1001)] + rows = [(n,) for n in range(1, 1000)] table = Table(rows, ['ints'], [self.number_type]) - quintiles = Quintiles('ints').run(table) # noqa + quintiles = Quintiles('ints').run(table) + for i, v in enumerate(['1', '200', '400', '600', '800', '999']): + self.assertEqual(quintiles[i], Decimal(v)) + + def test_quintiles_all_nulls(self): + self.assertEqual(Quintiles('four').run(self.table), Quantiles([None] * 6)) def test_deciles(self): with warnings.catch_warnings(): @@ -685,25 +752,35 @@ def test_deciles(self): finally: warnings.resetwarnings() - rows = [(n,) for n in range(1, 1001)] + rows = [(n,) for n in range(1, 1000)] table = Table(rows, ['ints'], [self.number_type]) - deciles = Deciles('ints').run(table) # noqa + deciles = Deciles('ints').run(table) + for i, v in enumerate(['1', '100', '200', '300', '400', '500', '600', '700', '800', '900', '999']): + self.assertEqual(deciles[i], Decimal(v)) + + def test_deciles_all_nulls(self): + self.assertEqual(Deciles('four').run(self.table), Quantiles([None] * 11)) class TestTextAggregation(unittest.TestCase): - def test_max_length(self): - rows = [ - ['a'], - ['gobble'], - ['w'] + def setUp(self): + self.rows = [ + ['a', None], + ['gobble', None], + ['w', None] ] - table = Table(rows, ['test'], [Text()]) - MaxLength('test').validate(table) - self.assertEqual(MaxLength('test').run(table), 6) - self.assertIsInstance(MaxLength('test').run(table), Decimal) + self.table = Table(self.rows, ['test', 'null'], [Text(), Text()]) + + def test_max_length(self): + MaxLength('test').validate(self.table) + self.assertEqual(MaxLength('test').run(self.table), 6) + self.assertIsInstance(MaxLength('test').run(self.table), Decimal) + + def test_max_length_all_nulls(self): + self.assertEqual(MaxLength('null').run(self.table), 0) def test_max_length_unicode(self): """ @@ -716,7 +793,7 @@ def test_max_length_unicode(self): """ rows = [ ['a'], - [u'👍'], + ['👍'], ['w'] ] diff --git a/tests/test_columns.py b/tests/test_columns.py index 22ec9d79..3ff3f4b0 100644 --- a/tests/test_columns.py +++ b/tests/test_columns.py @@ -1,20 +1,9 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - import pickle - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal - -try: - import unittest2 as unittest -except ImportError: - import unittest +import unittest +from decimal import Decimal from agate import Table -from agate.data_types import * +from agate.data_types import Number, Text class TestColumn(unittest.TestCase): diff --git a/tests/test_computations.py b/tests/test_computations.py index b8c118ae..8821f359 100644 --- a/tests/test_computations.py +++ b/tests/test_computations.py @@ -1,18 +1,12 @@ -#!/usr/bin/env Python - import datetime -from decimal import Decimal +import unittest import warnings - -try: - import unittest2 as unittest -except ImportError: - import unittest +from decimal import Decimal from agate import Table -from agate.data_types import * -from agate.computations import * -from agate.exceptions import * +from agate.computations import Change, Formula, Percent, PercentChange, 
PercentileRank, Rank, Slug +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta +from agate.exceptions import CastError, DataTypeError from agate.warns import NullCalculationWarning @@ -54,7 +48,7 @@ def test_formula(self): def test_formula_invalid(self): with self.assertRaises(CastError): - new_table = self.table.compute([ # noqa + self.table.compute([ ('test', Formula(self.number_type, lambda r: r['one'])) ]) @@ -179,11 +173,11 @@ def to_one_place(d): self.assertEqual(to_one_place(new_table.columns['test'][3]), Decimal('60.0')) with self.assertRaises(DataTypeError): - new_table = self.table.compute([ + self.table.compute([ ('test', Percent('two', 0)) ]) with self.assertRaises(DataTypeError): - new_table = self.table.compute([ + self.table.compute([ ('test', Percent('two', -1)) ]) with self.assertRaises(DataTypeError): @@ -250,12 +244,12 @@ def to_one_place(d): def test_percent_change_invalid_columns(self): with self.assertRaises(DataTypeError): - new_table = self.table.compute([ + self.table.compute([ ('test', PercentChange('one', 'three')) ]) with self.assertRaises(DataTypeError): - new_table = self.table.compute([ # noqa + self.table.compute([ ('test', PercentChange('three', 'one')) ]) diff --git a/tests/test_data_types.py b/tests/test_data_types.py index 4476631d..4f2bac53 100644 --- a/tests/test_data_types.py +++ b/tests/test_data_types.py @@ -1,20 +1,17 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - import datetime -from decimal import Decimal import pickle +import unittest +from decimal import Decimal + import parsedatetime try: - import unittest2 as unittest + from zoneinfo import ZoneInfo except ImportError: - import unittest - -import pytz + # Fallback for Python < 3.9 + from backports.zoneinfo import ZoneInfo -from agate.columns import * -from agate.data_types import * +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta from agate.exceptions import CastError @@ -40,7 +37,7 @@ def test_test(self): self.assertEqual(self.type.test(datetime.timedelta(hours=4, minutes=10)), True) self.assertEqual(self.type.test('a'), True) self.assertEqual(self.type.test('A\nB'), True) - self.assertEqual(self.type.test(u'👍'), True) + self.assertEqual(self.type.test('👍'), True) self.assertEqual(self.type.test('05_leslie3d_base'), True) self.assertEqual(self.type.test('2016-12-29'), True) self.assertEqual(self.type.test('2016-12-29T11:43:30Z'), True) @@ -48,9 +45,9 @@ def test_test(self): self.assertEqual(self.type.test('2016-12-29T11:43:30-06:00'), True) def test_cast(self): - values = ('a', 1, None, Decimal('2.7'), 'n/a', u'👍', ' foo', 'foo ') + values = ('a', 1, None, Decimal('2.7'), 'n/a', '👍', ' foo', 'foo ') casted = tuple(self.type.cast(v) for v in values) - self.assertSequenceEqual(casted, ('a', '1', None, '2.7', None, u'👍', ' foo', 'foo ')) + self.assertSequenceEqual(casted, ('a', '1', None, '2.7', None, '👍', ' foo', 'foo ')) def test_no_cast_nulls(self): values = ('', 'N/A', None) @@ -63,6 +60,10 @@ def test_no_cast_nulls(self): casted = tuple(t.cast(v) for v in values) self.assertSequenceEqual(casted, ('', 'N/A', None)) + def test_null_values(self): + t = Text(null_values=['Bad Value']) + self.assertEqual(t.cast('Bad Value'), None) + class TestBoolean(unittest.TestCase): def setUp(self): @@ -90,7 +91,7 @@ def test_test(self): self.assertEqual(self.type.test(datetime.timedelta(hours=4, minutes=10)), False) self.assertEqual(self.type.test('a'), False) self.assertEqual(self.type.test('A\nB'), False) - 
self.assertEqual(self.type.test(u'👍'), False) + self.assertEqual(self.type.test('👍'), False) self.assertEqual(self.type.test('05_leslie3d_base'), False) self.assertEqual(self.type.test('2016-12-29'), False) self.assertEqual(self.type.test('2016-12-29T11:43:30Z'), False) @@ -139,7 +140,7 @@ def test_test(self): self.assertEqual(self.type.test(datetime.timedelta(hours=4, minutes=10)), False) self.assertEqual(self.type.test('a'), False) self.assertEqual(self.type.test('A\nB'), False) - self.assertEqual(self.type.test(u'👍'), False) + self.assertEqual(self.type.test('👍'), False) self.assertEqual(self.type.test('05_leslie3d_base'), False) self.assertEqual(self.type.test('2016-12-29'), False) self.assertEqual(self.type.test('2016-12-29T11:43:30Z'), False) @@ -149,12 +150,10 @@ def test_test(self): def test_cast(self): values = (2, 1, None, Decimal('2.7'), 'n/a', '2.7', '200,000,000') casted = tuple(self.type.cast(v) for v in values) - self.assertSequenceEqual(casted, (Decimal('2'), Decimal('1'), None, Decimal('2.7'), None, Decimal('2.7'), Decimal('200000000'))) - - @unittest.skipIf(six.PY3, 'Not supported in Python 3.') - def test_cast_long(self): - self.assertEqual(self.type.test(long('141414')), True) - self.assertEqual(self.type.cast(long('141414')), Decimal('141414')) + self.assertSequenceEqual( + casted, + (Decimal('2'), Decimal('1'), None, Decimal('2.7'), None, Decimal('2.7'), Decimal('200000000')) + ) def test_boolean_cast(self): values = (True, False) @@ -162,14 +161,20 @@ def test_boolean_cast(self): self.assertSequenceEqual(casted, (Decimal('1'), Decimal('0'))) def test_currency_cast(self): - values = ('$2.70', '-$0.70', u'€14', u'50¢', u'-75¢', u'-$1,287') + values = ('$2.70', '-$0.70', '€14', '50¢', '-75¢', '-$1,287') casted = tuple(self.type.cast(v) for v in values) - self.assertSequenceEqual(casted, (Decimal('2.7'), Decimal('-0.7'), Decimal('14'), Decimal('50'), Decimal('-75'), Decimal('-1287'))) + self.assertSequenceEqual( + casted, + (Decimal('2.7'), Decimal('-0.7'), Decimal('14'), Decimal('50'), Decimal('-75'), Decimal('-1287')) + ) def test_cast_locale(self): values = (2, 1, None, Decimal('2.7'), 'n/a', '2,7', '200.000.000') - casted = tuple(Number(locale='de_DE').cast(v) for v in values) - self.assertSequenceEqual(casted, (Decimal('2'), Decimal('1'), None, Decimal('2.7'), None, Decimal('2.7'), Decimal('200000000'))) + casted = tuple(Number(locale='de_DE.UTF-8').cast(v) for v in values) + self.assertSequenceEqual( + casted, + (Decimal('2'), Decimal('1'), None, Decimal('2.7'), None, Decimal('2.7'), Decimal('200000000')) + ) def test_cast_text(self): with self.assertRaises(CastError): @@ -206,7 +211,7 @@ def test_test(self): self.assertEqual(self.type.test(datetime.timedelta(hours=4, minutes=10)), False) self.assertEqual(self.type.test('a'), False) self.assertEqual(self.type.test('A\nB'), False) - self.assertEqual(self.type.test(u'👍'), False) + self.assertEqual(self.type.test('👍'), False) self.assertEqual(self.type.test('05_leslie3d_base'), False) self.assertEqual(self.type.test('2016-12-29'), True) self.assertEqual(self.type.test('2016-12-29T11:43:30Z'), False) @@ -255,7 +260,7 @@ def test_cast_format(self): )) def test_cast_format_locale(self): - date_type = Date(date_format='%d-%b-%Y', locale='de_DE') + date_type = Date(date_format='%d-%b-%Y', locale='de_DE.UTF-8') # March can be abbreviated to Mrz or Mär depending on the locale version, # so we use December in the first value to ensure the test passes everywhere @@ -272,7 +277,7 @@ def test_cast_format_locale(self): def 
test_cast_locale(self): date_type = Date(locale='fr_FR') - values = ('01 mars 1994', u'jeudi 17 février 2011', None, '5 janvier 1984', 'n/a') + values = ('01 mars 1994', 'jeudi 17 février 2011', None, '5 janvier 1984', 'n/a') casted = tuple(date_type.cast(v) for v in values) self.assertSequenceEqual(casted, ( datetime.date(1994, 3, 1), @@ -316,7 +321,7 @@ def test_test(self): self.assertEqual(self.type.test(datetime.timedelta(hours=4, minutes=10)), False) self.assertEqual(self.type.test('a'), False) self.assertEqual(self.type.test('A\nB'), False) - self.assertEqual(self.type.test(u'👍'), False) + self.assertEqual(self.type.test('👍'), False) self.assertEqual(self.type.test('05_leslie3d_base'), False) self.assertEqual(self.type.test('2016-12-29'), True) self.assertEqual(self.type.test('2016-12-29T11:43:30Z'), True) @@ -352,16 +357,16 @@ def test_cast_parser(self): )) def test_cast_parser_timezone(self): - tzinfo = pytz.timezone('US/Pacific') + tzinfo = ZoneInfo('US/Pacific') datetime_type = DateTime(timezone=tzinfo) values = ('3/1/1994 12:30 PM', '2/17/2011 06:30', None, 'January 5th, 1984 22:37', 'n/a') casted = tuple(datetime_type.cast(v) for v in values) self.assertSequenceEqual(casted, ( - tzinfo.localize(datetime.datetime(1994, 3, 1, 12, 30, 0, 0)), - tzinfo.localize(datetime.datetime(2011, 2, 17, 6, 30, 0, 0)), + datetime.datetime(1994, 3, 1, 12, 30, 0, 0, tzinfo=tzinfo), + datetime.datetime(2011, 2, 17, 6, 30, 0, 0, tzinfo=tzinfo), None, - tzinfo.localize(datetime.datetime(1984, 1, 5, 22, 37, 0, 0)), + datetime.datetime(1984, 1, 5, 22, 37, 0, 0, tzinfo=tzinfo), None )) @@ -379,7 +384,7 @@ def test_cast_format(self): )) def test_cast_format_locale(self): - date_type = DateTime(datetime_format='%Y-%m-%d %I:%M %p', locale='ko_KR') + date_type = DateTime(datetime_format='%Y-%m-%d %I:%M %p', locale='ko_KR.UTF-8') # Date formats depend on the platform's strftime/strptime implementation; # some platforms like macOS always return AM/PM for day periods (%p), @@ -456,7 +461,7 @@ def test_test(self): self.assertEqual(self.type.test(datetime.timedelta(hours=4, minutes=10)), True) self.assertEqual(self.type.test('a'), False) self.assertEqual(self.type.test('A\nB'), False) - self.assertEqual(self.type.test(u'👍'), False) + self.assertEqual(self.type.test('👍'), False) self.assertEqual(self.type.test('05_leslie3d_base'), False) self.assertEqual(self.type.test('2016-12-29'), False) self.assertEqual(self.type.test('2016-12-29T11:43:30Z'), False) diff --git a/tests/test_fixed.py b/tests/test_fixed.py index a8305610..22a95da5 100644 --- a/tests/test_fixed.py +++ b/tests/test_fixed.py @@ -1,12 +1,6 @@ -#!/usr/bin/env python +import unittest -try: - import unittest2 as unittest -except ImportError: - import unittest - -from agate import csv -from agate import fixed +from agate import csv, fixed class TestFixed(unittest.TestCase): diff --git a/tests/test_from_json.py b/tests/test_from_json.py index 3674711d..63ba0602 100644 --- a/tests/test_from_json.py +++ b/tests/test_from_json.py @@ -1,18 +1,15 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta +from agate.rows import Row from agate.testcase import AgateTestCase -from agate.data_types import * from agate.type_tester import TypeTester -import six class TestJSON(AgateTestCase): def setUp(self): self.rows = ( (1, 'a', True, '11/4/2015', '11/4/2015 12:22 PM', '4:15'), - (2, u'👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), + (2, '👍', False, 
'11/5/2015', '11/4/2015 12:45 PM', '6:18'), (None, 'b', None, None, None, None) ) @@ -35,12 +32,8 @@ def test_from_json(self): def test_from_json_file_like_object(self): table1 = Table(self.rows, self.column_names, self.column_types) - if six.PY2: - with open('examples/test.json') as f: - table2 = Table.from_json(f) - else: - with open('examples/test.json', encoding='utf-8') as f: - table2 = Table.from_json(f) + with open('examples/test.json', encoding='utf-8') as f: + table2 = Table.from_json(f) self.assertColumnNames(table2, self.column_names) self.assertColumnTypes(table2, [Number, Text, Boolean, Date, DateTime, TimeDelta]) @@ -62,7 +55,7 @@ def test_from_json_mixed_keys(self): self.assertRows(table, [ [1, 4, 'a', None, None], [2, 3, 'b', 'd', None], - [None, 2, u'👍', None, 5] + [None, 2, '👍', None, 5] ]) def test_from_json_nested(self): @@ -92,4 +85,11 @@ def test_from_json_no_type_tester(self): def test_from_json_error_newline_key(self): with self.assertRaises(ValueError): - table = Table.from_json('examples/test.json', newline=True, key='test') # noqa + Table.from_json('examples/test.json', newline=True, key='test') + + def test_from_json_ambiguous(self): + table = Table.from_json('examples/test_from_json_ambiguous.json') + + self.assertColumnNames(table, ('a/b',)) + self.assertColumnTypes(table, [Boolean]) + self.assertRows(table, [Row([False])]) diff --git a/tests/test_mapped_sequence.py b/tests/test_mapped_sequence.py index 6258d31a..c483c309 100644 --- a/tests/test_mapped_sequence.py +++ b/tests/test_mapped_sequence.py @@ -1,20 +1,12 @@ -#!/usr/bin/env python +import unittest -try: - import unittest2 as unittest -except ImportError: - import unittest - -import six - -from agate.data_types import * from agate.mapped_sequence import MappedSequence class TestMappedSequence(unittest.TestCase): def setUp(self): self.column_names = ('one', 'two', 'three') - self.data = (u'a', u'b', u'c') + self.data = ('a', 'b', 'c') self.row = MappedSequence(self.data, self.column_names) def test_is_immutable(self): @@ -25,20 +17,14 @@ def test_is_immutable(self): self.row['one'] = 100 def test_stringify(self): - if six.PY2: - self.assertEqual(str(self.row), "") - else: - self.assertEqual(str(self.row), "") + self.assertEqual(str(self.row), "") def test_stringify_long(self): column_names = ('one', 'two', 'three', 'four', 'five', 'six') - data = (u'a', u'b', u'c', u'd', u'e', u'f') + data = ('a', 'b', 'c', 'd', 'e', 'f') row = MappedSequence(data, column_names) - if six.PY2: - self.assertEqual(str(row), "") - else: - self.assertEqual(str(row), "") + self.assertEqual(str(row), "") def test_length(self): self.assertEqual(len(self.row), 3) @@ -46,19 +32,19 @@ def test_length(self): def test_eq(self): row2 = MappedSequence(self.data, self.column_names) - self.assertTrue(self.row == (u'a', u'b', u'c')) - self.assertTrue(self.row == [u'a', u'b', u'c']) + self.assertTrue(self.row == ('a', 'b', 'c')) + self.assertTrue(self.row == ['a', 'b', 'c']) self.assertTrue(self.row == row2) - self.assertFalse(self.row == (u'a', u'b', u'c', u'd')) + self.assertFalse(self.row == ('a', 'b', 'c', 'd')) self.assertFalse(self.row == 1) def test_ne(self): row2 = MappedSequence(self.data, self.column_names) - self.assertFalse(self.row != (u'a', u'b', u'c')) - self.assertFalse(self.row != [u'a', u'b', u'c']) + self.assertFalse(self.row != ('a', 'b', 'c')) + self.assertFalse(self.row != ['a', 'b', 'c']) self.assertFalse(self.row != row2) - self.assertTrue(self.row != (u'a', u'b', u'c', u'd')) + self.assertTrue(self.row != ('a', 
'b', 'c', 'd')) self.assertTrue(self.row != 1) def test_contains(self): @@ -67,10 +53,10 @@ def test_contains(self): def test_set_item(self): with self.assertRaises(TypeError): - self.row['one'] = u't' + self.row['one'] = 't' with self.assertRaises(TypeError): - self.row['five'] = u'g' + self.row['five'] = 'g' def test_get_item(self): self.assertEqual(self.row['one'], 'a') diff --git a/tests/test_py2.py b/tests/test_py2.py deleted file mode 100644 index d1e384e8..00000000 --- a/tests/test_py2.py +++ /dev/null @@ -1,359 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- - -import csv -import os - -try: - import unittest2 as unittest -except ImportError: - import unittest - -import six - -from agate import csv_py2 -from agate.exceptions import FieldSizeLimitError - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestUnicodeReader(unittest.TestCase): - def setUp(self): - self.rows = [ - ['number', 'text', 'boolean', 'date', 'datetime', 'timedelta'], - ['1', 'a', 'True', '2015-11-04', '2015-11-04T12:22:00', '0:04:15'], - ['2', u'👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], - ['', 'b', '', '', '', ''] - ] - - def test_utf8(self): - with open('examples/test.csv') as f: - rows = list(csv_py2.UnicodeReader(f, encoding='utf-8')) - - for a, b in zip(self.rows, rows): - self.assertEqual(a, b) - - def test_latin1(self): - with open('examples/test_latin1.csv') as f: - reader = csv_py2.UnicodeReader(f, encoding='latin1') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'©']) - - def test_utf16_big(self): - with open('examples/test_utf16_big.csv') as f: - reader = csv_py2.UnicodeReader(f, encoding='utf-16') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) - - def test_utf16_little(self): - with open('examples/test_utf16_little.csv') as f: - reader = csv_py2.UnicodeReader(f, encoding='utf-16') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestUnicodeWriter(unittest.TestCase): - def test_utf8(self): - output = six.StringIO() - writer = csv_py2.UnicodeWriter(output, encoding='utf-8') - self.assertEqual(writer._eight_bit, True) - writer.writerow(['a', 'b', 'c']) - writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) - - written = six.StringIO(output.getvalue()) - - reader = csv_py2.UnicodeReader(written, encoding='utf-8') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) - - def test_latin1(self): - output = six.StringIO() - writer = csv_py2.UnicodeWriter(output, encoding='latin1') - self.assertEqual(writer._eight_bit, True) - writer.writerow(['a', 'b', 'c']) - writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'©']) - - written = six.StringIO(output.getvalue()) - - reader = csv_py2.UnicodeReader(written, encoding='latin1') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'©']) - - def test_utf16_big(self): - output = six.StringIO() - writer = csv_py2.UnicodeWriter(output, encoding='utf-16-be') - self.assertEqual(writer._eight_bit, False) - writer.writerow(['a', 'b', 'c']) - 
writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) - - written = six.StringIO(output.getvalue()) - - reader = csv_py2.UnicodeReader(written, encoding='utf-16-be') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'\u02A4']) - - def test_utf16_little(self): - output = six.StringIO() - writer = csv_py2.UnicodeWriter(output, encoding='utf-16-le') - self.assertEqual(writer._eight_bit, False) - writer.writerow(['a', 'b', 'c']) - writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) - - written = six.StringIO(output.getvalue()) - - reader = csv_py2.UnicodeReader(written, encoding='utf-16-le') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'\u02A4']) - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestUnicodeDictReader(unittest.TestCase): - def setUp(self): - self.rows = [ - ['number', 'text', 'boolean', 'date', 'datetime', 'timedelta'], - ['1', 'a', 'True', '2015-11-04', '2015-11-04T12:22:00', '0:04:15'], - ['2', u'👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], - ['', 'b', '', '', '', ''] - ] - - self.f = open('examples/test.csv') - - def tearDown(self): - self.f.close() - - def test_reader(self): - reader = csv_py2.UnicodeDictReader(self.f, encoding='utf-8') - - self.assertEqual(next(reader), dict(zip(self.rows[0], self.rows[1]))) - - def test_latin1(self): - with open('examples/test_latin1.csv') as f: - reader = csv_py2.UnicodeDictReader(f, encoding='latin1') - self.assertEqual(next(reader), { - u'a': u'1', - u'b': u'2', - u'c': u'3' - }) - self.assertEqual(next(reader), { - u'a': u'4', - u'b': u'5', - u'c': u'©' - }) - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestUnicodeDictWriter(unittest.TestCase): - def setUp(self): - self.output = six.StringIO() - - def tearDown(self): - self.output.close() - - def test_writer(self): - writer = csv_py2.UnicodeDictWriter(self.output, ['a', 'b', 'c'], lineterminator='\n') - writer.writeheader() - writer.writerow({ - u'a': u'1', - u'b': u'2', - u'c': u'☃' - }) - - result = self.output.getvalue() - - self.assertEqual(result, 'a,b,c\n1,2,☃\n') - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestFieldSizeLimit(unittest.TestCase): - def setUp(self): - self.lim = csv.field_size_limit() - - with open('.test.csv', 'w') as f: - f.write('a' * 10) - - def tearDown(self): - # Resetting limit to avoid failure in other tests. - csv.field_size_limit(self.lim) - os.remove('.test.csv') - - def test_field_size_limit(self): - # Testing field_size_limit for failure. Creating data using str * int. - with open('.test.csv', 'r') as f: - c = csv_py2.UnicodeReader(f, field_size_limit=9) - try: - c.next() - except FieldSizeLimitError: - pass - else: - raise AssertionError('Expected FieldSizeLimitError') - - # Now testing higher field_size_limit. 
- with open('.test.csv', 'r') as f: - c = csv_py2.UnicodeReader(f, field_size_limit=11) - self.assertEqual(['a' * 10], c.next()) - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestReader(unittest.TestCase): - def setUp(self): - self.rows = [ - ['number', 'text', 'boolean', 'date', 'datetime', 'timedelta'], - ['1', 'a', 'True', '2015-11-04', '2015-11-04T12:22:00', '0:04:15'], - ['2', u'👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], - ['', 'b', '', '', '', ''] - ] - - def test_utf8(self): - with open('examples/test.csv') as f: - rows = list(csv_py2.Reader(f, encoding='utf-8')) - - for a, b in zip(self.rows, rows): - self.assertEqual(a, b) - - def test_reader_alias(self): - with open('examples/test.csv') as f: - rows = list(csv_py2.Reader(f, encoding='utf-8')) - - for a, b in zip(self.rows, rows): - self.assertEqual(a, b) - - def test_line_numbers(self): - with open('examples/test.csv') as f: - rows = list(csv_py2.Reader(f, encoding='utf-8', line_numbers=True)) - - sample_rows = [ - ['line_numbers', 'number', 'text', 'boolean', 'date', 'datetime', 'timedelta'], - ['1', '1', 'a', 'True', '2015-11-04', '2015-11-04T12:22:00', '0:04:15'], - ['2', '2', u'👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], - ['3', '', 'b', '', '', '', ''] - ] - - for a, b in zip(sample_rows, rows): - self.assertEqual(a, b) - - def test_properties(self): - with open('examples/test.csv') as f: - reader = csv_py2.Reader(f, encoding='utf-8') - - self.assertEqual(reader.dialect.delimiter, ',') - self.assertEqual(reader.line_num, 0) - - next(reader) - - self.assertEqual(reader.line_num, 1) - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestWriter(unittest.TestCase): - def test_utf8(self): - output = six.StringIO() - writer = csv_py2.Writer(output, encoding='utf-8') - self.assertEqual(writer._eight_bit, True) - writer.writerow(['a', 'b', 'c']) - writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) - - written = six.StringIO(output.getvalue()) - - reader = csv_py2.Reader(written, encoding='utf-8') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) - - def test_writer_alias(self): - output = six.StringIO() - writer = csv_py2.writer(output, encoding='utf-8') - self.assertEqual(writer._eight_bit, True) - writer.writerow(['a', 'b', 'c']) - writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) - - written = six.StringIO(output.getvalue()) - - reader = csv_py2.reader(written, encoding='utf-8') - self.assertEqual(next(reader), ['a', 'b', 'c']) - self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestDictReader(unittest.TestCase): - def setUp(self): - self.rows = [ - ['number', 'text', 'boolean', 'date', 'datetime', 'timedelta'], - ['1', 'a', 'True', '2015-11-04', '2015-11-04T12:22:00', '0:04:15'], - ['2', u'👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], - ['', 'b', '', '', '', ''] - ] - - self.f = open('examples/test.csv') - - def tearDown(self): - self.f.close() - - def test_reader(self): - reader = csv_py2.DictReader(self.f, encoding='utf-8') - - self.assertEqual(next(reader), dict(zip(self.rows[0], self.rows[1]))) - - def test_reader_alias(self): - reader = csv_py2.DictReader(self.f, encoding='utf-8') - - self.assertEqual(next(reader), dict(zip(self.rows[0], self.rows[1]))) - - -@unittest.skipIf(six.PY3, 
"Not supported in Python 3.") -class TestDictWriter(unittest.TestCase): - def setUp(self): - self.output = six.StringIO() - - def tearDown(self): - self.output.close() - - def test_writer(self): - writer = csv_py2.DictWriter(self.output, ['a', 'b', 'c']) - writer.writeheader() - writer.writerow({ - u'a': u'1', - u'b': u'2', - u'c': u'☃' - }) - - result = self.output.getvalue() - - self.assertEqual(result, 'a,b,c\n1,2,☃\n') - - def test_writer_alias(self): - writer = csv_py2.DictWriter(self.output, ['a', 'b', 'c']) - writer.writeheader() - writer.writerow({ - u'a': u'1', - u'b': u'2', - u'c': u'☃' - }) - - result = self.output.getvalue() - - self.assertEqual(result, 'a,b,c\n1,2,☃\n') - - -@unittest.skipIf(six.PY3, "Not supported in Python 3.") -class TestSniffer(unittest.TestCase): - def setUp(self): - pass - - def test_sniffer(self): - with open('examples/test.csv') as f: - contents = f.read() - self.assertEqual(csv_py2.Sniffer().sniff(contents).__dict__, csv.Sniffer().sniff(contents).__dict__) diff --git a/tests/test_py3.py b/tests/test_py3.py index 1a85d9c0..e087cbc8 100644 --- a/tests/test_py3.py +++ b/tests/test_py3.py @@ -1,20 +1,14 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- - import csv -import six import os - -try: - import unittest2 as unittest -except ImportError: - import unittest +import platform +import sys +import unittest +from io import StringIO from agate import csv_py3 from agate.exceptions import FieldSizeLimitError -@unittest.skipIf(six.PY2, "Not supported in Python 2.") class TestReader(unittest.TestCase): def setUp(self): self.rows = [ @@ -56,7 +50,7 @@ def test_line_numbers(self): sample_rows = [ ['line_numbers', 'number', 'text', 'boolean', 'date', 'datetime', 'timedelta'], ['1', '1', 'a', 'True', '2015-11-04', '2015-11-04T12:22:00', '0:04:15'], - ['2', '2', u'👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], + ['2', '2', '👍', 'False', '2015-11-05', '2015-11-04T12:45:00', '0:06:18'], ['3', '', 'b', '', '', '', ''] ] @@ -64,7 +58,6 @@ def test_line_numbers(self): self.assertEqual(a, b) -@unittest.skipIf(six.PY2, "Not supported in Python 2.") class TestFieldSizeLimit(unittest.TestCase): def setUp(self): self.lim = csv.field_size_limit() @@ -79,7 +72,7 @@ def tearDown(self): def test_field_size_limit(self): # Testing field_size_limit for failure. Creating data using str * int. - with open('.test.csv', 'r', encoding='utf-8') as f: + with open('.test.csv', encoding='utf-8') as f: c = csv_py3.Reader(f, field_size_limit=9) try: c.__next__() @@ -89,73 +82,71 @@ def test_field_size_limit(self): raise AssertionError('Expected FieldSizeLimitError') # Now testing higher field_size_limit. 
- with open('.test.csv', 'r', encoding='utf-8') as f: + with open('.test.csv', encoding='utf-8') as f: c = csv_py3.Reader(f, field_size_limit=11) self.assertEqual(['a' * 10], c.__next__()) -@unittest.skipIf(six.PY2, "Not supported in Python 2.") class TestWriter(unittest.TestCase): def test_utf8(self): - output = six.StringIO() + output = StringIO() writer = csv_py3.Writer(output) writer.writerow(['a', 'b', 'c']) writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) + writer.writerow(['4', '5', 'ʤ']) - written = six.StringIO(output.getvalue()) + written = StringIO(output.getvalue()) reader = csv_py3.Reader(written) self.assertEqual(next(reader), ['a', 'b', 'c']) self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) + self.assertEqual(next(reader), ['4', '5', 'ʤ']) def test_writer_alias(self): - output = six.StringIO() + output = StringIO() writer = csv_py3.writer(output) writer.writerow(['a', 'b', 'c']) writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) + writer.writerow(['4', '5', 'ʤ']) - written = six.StringIO(output.getvalue()) + written = StringIO(output.getvalue()) reader = csv_py3.reader(written) self.assertEqual(next(reader), ['a', 'b', 'c']) self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) + self.assertEqual(next(reader), ['4', '5', 'ʤ']) def test_line_numbers(self): - output = six.StringIO() + output = StringIO() writer = csv_py3.Writer(output, line_numbers=True) writer.writerow(['a', 'b', 'c']) writer.writerow(['1', '2', '3']) - writer.writerow(['4', '5', u'ʤ']) + writer.writerow(['4', '5', 'ʤ']) - written = six.StringIO(output.getvalue()) + written = StringIO(output.getvalue()) reader = csv_py3.Reader(written) self.assertEqual(next(reader), ['line_number', 'a', 'b', 'c']) self.assertEqual(next(reader), ['1', '1', '2', '3']) - self.assertEqual(next(reader), ['2', '4', '5', u'ʤ']) + self.assertEqual(next(reader), ['2', '4', '5', 'ʤ']) def test_writerows(self): - output = six.StringIO() + output = StringIO() writer = csv_py3.Writer(output) writer.writerows([ ['a', 'b', 'c'], ['1', '2', '3'], - ['4', '5', u'ʤ'] + ['4', '5', 'ʤ'] ]) - written = six.StringIO(output.getvalue()) + written = StringIO(output.getvalue()) reader = csv_py3.Reader(written) self.assertEqual(next(reader), ['a', 'b', 'c']) self.assertEqual(next(reader), ['1', '2', '3']) - self.assertEqual(next(reader), ['4', '5', u'ʤ']) + self.assertEqual(next(reader), ['4', '5', 'ʤ']) -@unittest.skipIf(six.PY2, "Not supported in Python 2.") class TestDictReader(unittest.TestCase): def setUp(self): self.rows = [ @@ -181,10 +172,9 @@ def test_reader_alias(self): self.assertEqual(next(reader), dict(zip(self.rows[0], self.rows[1]))) -@unittest.skipIf(six.PY2, "Not supported in Python 2.") class TestDictWriter(unittest.TestCase): def setUp(self): - self.output = six.StringIO() + self.output = StringIO() def tearDown(self): self.output.close() @@ -193,9 +183,9 @@ def test_writer(self): writer = csv_py3.DictWriter(self.output, ['a', 'b', 'c']) writer.writeheader() writer.writerow({ - u'a': u'1', - u'b': u'2', - u'c': u'☃' + 'a': '1', + 'b': '2', + 'c': '☃' }) result = self.output.getvalue() @@ -206,9 +196,9 @@ def test_writer_alias(self): writer = csv_py3.DictWriter(self.output, ['a', 'b', 'c']) writer.writeheader() writer.writerow({ - u'a': u'1', - u'b': u'2', - u'c': u'☃' + 'a': '1', + 'b': '2', + 'c': '☃' }) result = self.output.getvalue() @@ -219,9 +209,9 @@ def test_line_numbers(self): writer = 
csv_py3.DictWriter(self.output, ['a', 'b', 'c'], line_numbers=True) writer.writeheader() writer.writerow({ - u'a': u'1', - u'b': u'2', - u'c': u'☃' + 'a': '1', + 'b': '2', + 'c': '☃' }) result = self.output.getvalue() @@ -232,9 +222,9 @@ def test_writerows(self): writer = csv_py3.DictWriter(self.output, ['a', 'b', 'c'], line_numbers=True) writer.writeheader() writer.writerows([{ - u'a': u'1', - u'b': u'2', - u'c': u'☃' + 'a': '1', + 'b': '2', + 'c': '☃' }]) result = self.output.getvalue() @@ -242,12 +232,17 @@ def test_writerows(self): self.assertEqual(result, 'line_number,a,b,c\n1,1,2,☃\n') -@unittest.skipIf(six.PY2, "Not supported in Python 2.") class TestSniffer(unittest.TestCase): - def setUp(self): - pass - + @unittest.skipIf( + platform.system() == 'Darwin' and sys.version_info[:2] == (3, 10), + reason='The (macos-latest, 3.10) job fails on GitHub Actions' + ) def test_sniffer(self): with open('examples/test.csv', encoding='utf-8') as f: contents = f.read() - self.assertEqual(csv_py3.Sniffer().sniff(contents).__dict__, csv.Sniffer().sniff(contents).__dict__) + direct = csv.Sniffer().sniff(contents, csv_py3.POSSIBLE_DELIMITERS).__dict__ + actual = csv_py3.Sniffer().sniff(contents).__dict__ + expected = csv.Sniffer().sniff(contents).__dict__ + + self.assertEqual(direct, expected, f'{direct!r} != {expected!r}') + self.assertEqual(actual, expected, f'{actual!r} != {expected!r}') diff --git a/tests/test_table/__init__.py b/tests/test_table/__init__.py index f16aee42..ac9dacad 100644 --- a/tests/test_table/__init__.py +++ b/tests/test_table/__init__.py @@ -1,19 +1,10 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal - -import platform import warnings - -import six +from decimal import Decimal from agate import Table -from agate.data_types import * from agate.computations import Formula +from agate.data_types import Number, Text +from agate.exceptions import CastError from agate.testcase import AgateTestCase from agate.warns import DuplicateColumnWarning @@ -23,7 +14,7 @@ def setUp(self): self.rows = ( (1, 4, 'a'), (2, 3, 'b'), - (None, 2, u'👍') + (None, 2, '👍') ) self.number_type = Number() @@ -46,7 +37,7 @@ def test_create_table(self): def test_create_filename(self): with self.assertRaises(ValueError): - table = Table('foo.csv') # noqa + Table('foo.csv') def test_create_empty_table(self): table = Table([]) @@ -75,7 +66,7 @@ def test_create_table_column_types(self): self.assertRows(table, [ (1, '4', 'a'), (2, '3', 'b'), - (None, '2', u'👍') + (None, '2', '👍') ]) def test_create_table_column_types_dict(self): @@ -118,7 +109,7 @@ def test_create_table_cast_error(self): column_types = [self.number_type, self.number_type, self.number_type] with self.assertRaises(CastError) as e: - table = Table(self.rows, self.column_names, column_types) # noqa + Table(self.rows, self.column_names, column_types) self.assertIn('Error at row 0 column three.', str(e.exception)) @@ -126,31 +117,31 @@ def test_create_table_null_column_names(self): column_names = ['one', None, 'three'] with self.assertWarns(RuntimeWarning): - table2 = Table(self.rows, column_names, self.column_types) # noqa + Table(self.rows, column_names, self.column_types) warnings.simplefilter('ignore') try: - table3 = Table(self.rows, column_names, self.column_types) + table = Table(self.rows, column_names, self.column_types) finally: warnings.resetwarnings() - self.assertColumnNames(table3, ['one', 'b', 'three']) + 
self.assertColumnNames(table, ['one', 'b', 'three']) def test_create_table_empty_column_names(self): column_names = ['one', '', 'three'] with self.assertWarns(RuntimeWarning): - table2 = Table(self.rows, column_names, self.column_types) # noqa + Table(self.rows, column_names, self.column_types) warnings.simplefilter('ignore') try: - table3 = Table(self.rows, column_names, self.column_types) + table = Table(self.rows, column_names, self.column_types) finally: warnings.resetwarnings() - self.assertColumnNames(table3, ['one', 'b', 'three']) + self.assertColumnNames(table, ['one', 'b', 'three']) def test_create_table_non_datatype_columns(self): column_types = [self.number_type, self.number_type, 'foo'] @@ -232,8 +223,8 @@ def test_create_table_no_column_names(self): with self.assertRaises(KeyError): table.columns['one'] - self.assertSequenceEqual(table.columns[2], ('a', 'b', u'👍')) - self.assertSequenceEqual(table.columns['c'], ('a', 'b', u'👍')) + self.assertSequenceEqual(table.columns[2], ('a', 'b', '👍')) + self.assertSequenceEqual(table.columns['c'], ('a', 'b', '👍')) with self.assertRaises(KeyError): table.columns[''] @@ -253,12 +244,12 @@ def test_row_too_long(self): ) with self.assertRaises(ValueError): - table = Table(rows, self.column_names, self.column_types) # noqa + Table(rows, self.column_names, self.column_types) def test_row_names(self): table = Table(self.rows, self.column_names, self.column_types, row_names='three') - self.assertRowNames(table, ['a', 'b', u'👍']) + self.assertRowNames(table, ['a', 'b', '👍']) def test_row_names_non_string(self): table = Table(self.rows, self.column_names, self.column_types, row_names=[Decimal('2'), True, None]) @@ -270,11 +261,11 @@ def test_row_names_non_string(self): ]) self.assertSequenceEqual(table.rows[Decimal('2')], (1, 4, 'a')) self.assertSequenceEqual(table.rows[True], (2, 3, 'b')) - self.assertSequenceEqual(table.rows[None], (None, 2, u'👍')) + self.assertSequenceEqual(table.rows[None], (None, 2, '👍')) def test_row_names_int(self): with self.assertRaises(ValueError): - table = Table(self.rows, self.column_names, self.column_types, row_names=['a', 'b', 3]) # noqa + Table(self.rows, self.column_names, self.column_types, row_names=['a', 'b', 3]) def test_row_names_func(self): table = Table(self.rows, self.column_names, self.column_types, row_names=lambda r: (r['one'], r['three'])) @@ -282,16 +273,16 @@ def test_row_names_func(self): self.assertSequenceEqual(table.row_names, [ (Decimal('1'), 'a'), (Decimal('2'), 'b'), - (None, u'👍') + (None, '👍') ]) self.assertSequenceEqual(table.rows[(Decimal('1'), 'a')], (1, 4, 'a')) self.assertSequenceEqual(table.rows[(Decimal('2'), 'b')], (2, 3, 'b')) - self.assertSequenceEqual(table.rows[(None, u'👍')], (None, 2, u'👍')) + self.assertSequenceEqual(table.rows[(None, '👍')], (None, 2, '👍')) def test_row_names_invalid(self): with self.assertRaises(ValueError): - table = Table( # noqa + Table( self.rows, self.column_names, self.column_types, @@ -299,28 +290,15 @@ def test_row_names_invalid(self): ) def test_stringify(self): - column_names = ['foo', 'bar', u'👍'] + column_names = ['foo', 'bar', '👍'] table = Table(self.rows, column_names) - if six.PY2: - u = unicode(table) - - self.assertIn('foo', u) - self.assertIn('bar', u) - self.assertIn(u'👍', u) - - s = str(table) - - self.assertIn('foo', s) - self.assertIn('bar', s) - self.assertIn(u'👍'.encode('utf-8'), s) - else: - u = str(table) + u = str(table) - self.assertIn('foo', u) - self.assertIn('bar', u) - self.assertIn(u'👍', u) + self.assertIn('foo', u) + 
self.assertIn('bar', u) + self.assertIn('👍', u) def test_str(self): warnings.simplefilter('ignore') @@ -379,7 +357,7 @@ def test_select(self): self.assertRows(new_table, [ [4, 'a'], [3, 'b'], - [2, u'👍'] + [2, '👍'] ]) def test_select_single(self): @@ -391,14 +369,14 @@ def test_select_single(self): self.assertRows(new_table, [ ['a'], ['b'], - [u'👍'] + ['👍'] ]) def test_select_with_row_names(self): table = Table(self.rows, self.column_names, self.column_types, row_names='three') new_table = table.select(('three',)) - self.assertRowNames(new_table, ['a', 'b', u'👍']) + self.assertRowNames(new_table, ['a', 'b', '👍']) def test_select_does_not_exist(self): table = Table(self.rows, self.column_names, self.column_types) @@ -418,7 +396,7 @@ def test_exclude(self): self.assertRows(new_table, [ ['a'], ['b'], - [u'👍'] + ['👍'] ]) def test_exclude_single(self): @@ -433,14 +411,14 @@ def test_exclude_single(self): self.assertRows(new_table, [ [4, 'a'], [3, 'b'], - [2, u'👍'] + [2, '👍'] ]) def test_exclude_with_row_names(self): table = Table(self.rows, self.column_names, self.column_types, row_names='three') new_table = table.exclude(('one', 'two')) - self.assertRowNames(new_table, ['a', 'b', u'👍']) + self.assertRowNames(new_table, ['a', 'b', '👍']) def test_where(self): table = Table(self.rows, self.column_names, self.column_types) @@ -460,7 +438,7 @@ def test_where_with_row_names(self): table = Table(self.rows, self.column_names, self.column_types, row_names='three') new_table = table.where(lambda r: r['one'] in (2, None)) - self.assertRowNames(new_table, ['b', u'👍']) + self.assertRowNames(new_table, ['b', '👍']) def test_find(self): table = Table(self.rows, self.column_names, self.column_types) diff --git a/tests/test_table/test_aggregate.py b/tests/test_table/test_aggregate.py index a7ad6d0b..a8f7a180 100644 --- a/tests/test_table/test_aggregate.py +++ b/tests/test_table/test_aggregate.py @@ -1,6 +1,3 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table from agate.aggregations import Count, Sum from agate.data_types import Number, Text @@ -12,7 +9,7 @@ def setUp(self): self.rows = ( (1, 4, 'a'), (2, 3, 'b'), - (None, 2, u'👍') + (None, 2, '👍') ) self.number_type = Number() diff --git a/tests/test_table/test_bins.py b/tests/test_table/test_bins.py index 4245d3ee..3f2f68eb 100644 --- a/tests/test_table/test_bins.py +++ b/tests/test_table/test_bins.py @@ -1,14 +1,9 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- +from decimal import Decimal from babel.numbers import get_decimal_symbol -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal from agate import Table -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -111,9 +106,18 @@ def test_bins_decimals(self): self.assertColumnNames(new_table, ['number', 'Count']) self.assertColumnTypes(new_table, [Text, Number]) - self.assertSequenceEqual(new_table.rows[0], [u'[0' + get_decimal_symbol() + u'0 - 0' + get_decimal_symbol() + u'1)', 10]) - self.assertSequenceEqual(new_table.rows[3], [u'[0' + get_decimal_symbol() + u'3 - 0' + get_decimal_symbol() + u'4)', 10]) - self.assertSequenceEqual(new_table.rows[9], [u'[0' + get_decimal_symbol() + u'9 - 1' + get_decimal_symbol() + u'0]', 10]) + self.assertSequenceEqual( + new_table.rows[0], + ['[0' + get_decimal_symbol() + '0 - 0' + get_decimal_symbol() + '1)', 10] + ) + self.assertSequenceEqual( + new_table.rows[3], + ['[0' + get_decimal_symbol() + '3 - 0' + get_decimal_symbol() + 
'4)', 10] + ) + self.assertSequenceEqual( + new_table.rows[9], + ['[0' + get_decimal_symbol() + '9 - 1' + get_decimal_symbol() + '0]', 10] + ) def test_bins_nulls(self): rows = [] @@ -128,7 +132,16 @@ def test_bins_nulls(self): self.assertColumnNames(new_table, ['number', 'Count']) self.assertColumnTypes(new_table, [Text, Number]) - self.assertSequenceEqual(new_table.rows[0], [u'[0' + get_decimal_symbol() + u'0 - 0' + get_decimal_symbol() + u'1)', 10]) - self.assertSequenceEqual(new_table.rows[3], [u'[0' + get_decimal_symbol() + u'3 - 0' + get_decimal_symbol() + u'4)', 10]) - self.assertSequenceEqual(new_table.rows[9], [u'[0' + get_decimal_symbol() + u'9 - 1' + get_decimal_symbol() + u'0]', 10]) + self.assertSequenceEqual( + new_table.rows[0], + ['[0' + get_decimal_symbol() + '0 - 0' + get_decimal_symbol() + '1)', 10] + ) + self.assertSequenceEqual( + new_table.rows[3], + ['[0' + get_decimal_symbol() + '3 - 0' + get_decimal_symbol() + '4)', 10] + ) + self.assertSequenceEqual( + new_table.rows[9], + ['[0' + get_decimal_symbol() + '9 - 1' + get_decimal_symbol() + '0]', 10] + ) self.assertSequenceEqual(new_table.rows[10], [None, 1]) diff --git a/tests/test_table/test_charting.py b/tests/test_table/test_charting.py index 9238dba1..9656f80c 100644 --- a/tests/test_table/test_charting.py +++ b/tests/test_table/test_charting.py @@ -1,11 +1,3 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -try: - import unittest2 as unittest -except ImportError: - import unittest - import leather from agate import Table @@ -17,7 +9,7 @@ def setUp(self): self.rows = ( (1, 4, 'a'), (2, 3, 'b'), - (None, 2, u'👍') + (None, 2, '👍') ) self.number_type = Number() diff --git a/tests/test_table/test_compute.py b/tests/test_table/test_compute.py index 62589fe1..976ee2dd 100644 --- a/tests/test_table/test_compute.py +++ b/tests/test_table/test_compute.py @@ -1,11 +1,6 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -import six - from agate import Table -from agate.data_types import * from agate.computations import Formula +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -45,7 +40,7 @@ def test_compute(self): def test_compute_multiple(self): new_table = self.table.compute([ ('number', Formula(self.number_type, lambda r: r['two'] + r['three'])), - ('text', Formula(self.text_type, lambda r: (r['one'] or '-') + six.text_type(r['three']))) + ('text', Formula(self.text_type, lambda r: (r['one'] or '-') + str(r['three']))) ]) self.assertIsNot(new_table, self.table) @@ -61,7 +56,7 @@ def test_compute_with_row_names(self): new_table = table.compute([ ('number', Formula(self.number_type, lambda r: r['two'] + r['three'])), - ('text', Formula(self.text_type, lambda r: (r['one'] or '-') + six.text_type(r['three']))) + ('text', Formula(self.text_type, lambda r: (r['one'] or '-') + str(r['three']))) ]) self.assertRowNames(new_table, [3, 5, 4, 6]) diff --git a/tests/test_table/test_denormalize.py b/tests/test_table/test_denormalize.py index cedf8e15..05d80564 100644 --- a/tests/test_table/test_denormalize.py +++ b/tests/test_table/test_denormalize.py @@ -1,10 +1,7 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table -from agate.data_types import * -from agate.type_tester import TypeTester +from agate.data_types import Number, Text from agate.testcase import AgateTestCase +from agate.type_tester import TypeTester class TestDenormalize(AgateTestCase): @@ -98,7 +95,8 @@ def test_denormalize_column_types(self): def test_denormalize_column_type_tester(self): table = 
Table(self.rows, self.column_names, self.column_types) - normalized_table = table.denormalize(None, 'property', 'value', column_types=TypeTester(force={'gender': Text()})) + type_tester = TypeTester(force={'gender': Text()}) + normalized_table = table.denormalize(None, 'property', 'value', column_types=type_tester) # NB: value has been overwritten normal_rows = ( diff --git a/tests/test_table/test_from_csv.py b/tests/test_table/test_from_csv.py index 560813d8..d4cbc693 100644 --- a/tests/test_table/test_from_csv.py +++ b/tests/test_table/test_from_csv.py @@ -1,14 +1,8 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -import io import warnings -import six - from agate import Table +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta from agate.testcase import AgateTestCase -from agate.data_types import * from agate.type_tester import TypeTester @@ -16,7 +10,7 @@ class TestFromCSV(AgateTestCase): def setUp(self): self.rows = ( (1, 'a', True, '11/4/2015', '11/4/2015 12:22 PM', '4:15'), - (2, u'👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), + (2, '👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), (None, 'b', None, None, None, None) ) @@ -58,10 +52,7 @@ def test_from_csv_cr(self): def test_from_csv_file_like_object(self): table1 = Table(self.rows, self.column_names, self.column_types) - if six.PY2: - f = open('examples/test.csv', 'rb') - else: - f = io.open('examples/test.csv', encoding='utf-8') + f = open('examples/test.csv', encoding='utf-8') table2 = Table.from_csv(f) f.close() @@ -105,7 +96,6 @@ def test_from_csv_no_header_columns(self): self.assertColumnTypes(table, [Number, Text, Boolean, Date, DateTime, TimeDelta]) def test_from_csv_sniff_limit_0(self): - table1 = Table(self.rows, self.column_names, self.column_types) table2 = Table.from_csv('examples/test_csv_sniff.csv', sniff_limit=0) self.assertColumnNames(table2, ['number|text|boolean|date|datetime|timedelta']) @@ -170,3 +160,30 @@ def test_from_csv_skip_lines_cr(self): self.assertColumnTypes(table2, [Number, Text, Boolean, Date, DateTime, TimeDelta]) self.assertRows(table2, table1.rows) + + def test_from_csv_row_limit(self): + table1 = Table(self.rows[:2], self.column_names, self.column_types) + table2 = Table.from_csv('examples/test.csv', row_limit=2) + + self.assertColumnNames(table2, table1.column_names) + self.assertColumnTypes(table2, [Number, Text, Boolean, Date, DateTime, TimeDelta]) + + self.assertRows(table2, table1.rows) + + def test_from_csv_row_limit_no_header_columns(self): + table1 = Table(self.rows[:2], self.column_names, self.column_types) + table2 = Table.from_csv('examples/test_no_header.csv', self.column_names, header=False, row_limit=2) + + self.assertColumnNames(table2, table1.column_names) + self.assertColumnTypes(table2, [Number, Text, Boolean, Date, DateTime, TimeDelta]) + + self.assertRows(table2, table1.rows) + + def test_from_csv_row_limit_too_high(self): + table1 = Table(self.rows, self.column_names, self.column_types) + table2 = Table.from_csv('examples/test.csv', row_limit=200) + + self.assertColumnNames(table2, table1.column_names) + self.assertColumnTypes(table2, [Number, Text, Boolean, Date, DateTime, TimeDelta]) + + self.assertRows(table2, table1.rows) diff --git a/tests/test_table/test_from_fixed.py b/tests/test_table/test_from_fixed.py index 914fb72c..a546125a 100644 --- a/tests/test_table/test_from_fixed.py +++ b/tests/test_table/test_from_fixed.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from agate import Table from agate.testcase import AgateTestCase 
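The edits across these test modules follow a single pattern: Python 2 compatibility shims are swapped for their Python 3 standard-library equivalents (six.StringIO becomes io.StringIO, six.text_type becomes str, redundant u'' string prefixes are dropped, pytz.timezone(...).localize(...) gives way to zoneinfo, and conditional six.PY2 file handling collapses to a plain open(..., encoding='utf-8')). The sketch below summarizes those equivalences; it is not code from the patch itself, it assumes Python 3.9 or later so that zoneinfo is available in the standard library along with a system time-zone database, and its variable names are illustrative only.

    import datetime
    from io import StringIO        # replaces six.StringIO
    from zoneinfo import ZoneInfo  # replaces pytz.timezone(...).localize(...)

    # u'...' prefixes are redundant in Python 3; both literals are the same str.
    assert u'👍' == '👍'

    # six.text_type is simply str on Python 3.
    label = (None or '-') + str(4)  # '-4'

    # In-memory text I/O without six.
    output = StringIO()
    output.write('a,b,c\n')

    # Build an aware datetime by passing tzinfo directly rather than calling
    # pytz's localize() on a naive value.
    aware = datetime.datetime(1994, 3, 1, 12, 30, tzinfo=ZoneInfo('US/Pacific'))

The two timezone constructions can differ for times that fall inside a DST fold (pytz resolves the ambiguity with its is_dst argument, zoneinfo with the fold attribute), but the fixtures in these tests sit well away from any transition, so the rewritten assertions should describe the same instants.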
diff --git a/tests/test_table/test_group_by.py b/tests/test_table/test_group_by.py index d3c8f95e..cb51ffd4 100644 --- a/tests/test_table/test_group_by.py +++ b/tests/test_table/test_group_by.py @@ -1,13 +1,7 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal +from decimal import Decimal from agate import Table, TableSet -from agate.data_types import * +from agate.data_types import Boolean, Number, Text from agate.testcase import AgateTestCase diff --git a/tests/test_table/test_homogenize.py b/tests/test_table/test_homogenize.py index 9c9ad7b3..fb2ab751 100644 --- a/tests/test_table/test_homogenize.py +++ b/tests/test_table/test_homogenize.py @@ -1,9 +1,5 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -from six.moves import range from agate import Table -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -32,6 +28,8 @@ def test_homogenize_column_name(self): (2, 3, 'd') ) + homogenized.print_table() + self.assertColumnNames(homogenized, self.column_names) self.assertColumnTypes(homogenized, [Number, Number, Text]) self.assertRows(homogenized, rows) @@ -47,6 +45,8 @@ def test_homogenize_default_row(self): (2, None, None) ) + homogenized.print_table() + self.assertColumnNames(homogenized, self.column_names) self.assertColumnTypes(homogenized, [Number, Number, Text]) self.assertRows(homogenized, rows) @@ -65,6 +65,8 @@ def column_two(count): (2, 5, 'c') ) + homogenized.print_table() + self.assertColumnNames(homogenized, self.column_names) self.assertColumnTypes(homogenized, [Number, Number, Text]) self.assertRows(homogenized, rows) @@ -86,6 +88,8 @@ def column_two(count): (2, 4, 'c') ) + homogenized.print_table() + self.assertColumnNames(homogenized, self.column_names) self.assertColumnTypes(homogenized, [Number, Number, Text]) self.assertRows(homogenized, rows) diff --git a/tests/test_table/test_join.py b/tests/test_table/test_join.py index 1201e06b..2d6d7878 100644 --- a/tests/test_table/test_join.py +++ b/tests/test_table/test_join.py @@ -1,8 +1,5 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -213,12 +210,12 @@ def test_join_with_row_names(self): def test_join_require_match(self): with self.assertRaises(ValueError): - new_table = self.left.join(self.right, 'one', 'five', require_match=True) # noqa + self.left.join(self.right, 'one', 'five', require_match=True) with self.assertRaises(ValueError): - new_table = self.left.join(self.right, 'one', 'five', require_match=True) # noqa + self.left.join(self.right, 'one', 'five', require_match=True) - new_table = self.left.join(self.right, 'one', 'four', require_match=True) # noqa + self.left.join(self.right, 'one', 'four', require_match=True) def test_join_columns_kwarg(self): new_table = self.left.join(self.right, 'one', 'four', columns=['six']) diff --git a/tests/test_table/test_merge.py b/tests/test_table/test_merge.py index 2e2fff23..f23f9f50 100644 --- a/tests/test_table/test_merge.py +++ b/tests/test_table/test_merge.py @@ -1,10 +1,7 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table -from agate.data_types import * -from agate.testcase import AgateTestCase +from agate.data_types import Number, Text from agate.exceptions import DataTypeError +from agate.testcase import AgateTestCase class 
TestMerge(AgateTestCase): @@ -76,7 +73,7 @@ def test_merge_different_types(self): table_b = Table(self.rows, self.column_names, column_types) with self.assertRaises(DataTypeError): - table_c = Table.merge([table_a, table_b]) # noqa + Table.merge([table_a, table_b]) def test_merge_with_row_names(self): table_a = Table(self.rows, self.column_names, self.column_types, row_names='three') diff --git a/tests/test_table/test_normalize.py b/tests/test_table/test_normalize.py index 68f2f742..a7bab9c2 100644 --- a/tests/test_table/test_normalize.py +++ b/tests/test_table/test_normalize.py @@ -1,10 +1,7 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table -from agate.data_types import * -from agate.type_tester import TypeTester +from agate.data_types import Number, Text from agate.testcase import AgateTestCase +from agate.type_tester import TypeTester class TestNormalize(AgateTestCase): diff --git a/tests/test_table/test_order_py.py b/tests/test_table/test_order_py.py index 4f0d1c8d..c6bbaa7d 100644 --- a/tests/test_table/test_order_py.py +++ b/tests/test_table/test_order_py.py @@ -1,9 +1,6 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from agate import Table +from agate.data_types import Number, Text from agate.testcase import AgateTestCase -from agate.data_types import * class TestOrderBy(AgateTestCase): @@ -11,7 +8,7 @@ def setUp(self): self.rows = ( (1, 4, 'a'), (2, 3, 'b'), - (None, 2, u'👍') + (None, 2, '👍') ) self.number_type = Number() @@ -142,9 +139,8 @@ def test_order_by_with_row_names(self): table = Table(self.rows, self.column_names, self.column_types, row_names='three') new_table = table.order_by('two') - self.assertRowNames(new_table, [u'👍', 'b', 'a']) + self.assertRowNames(new_table, ['👍', 'b', 'a']) def test_order_by_empty_table(self): table = Table([], self.column_names) - - new_table = table.order_by('three') # noqa + table.order_by('three') diff --git a/tests/test_table/test_pivot.py b/tests/test_table/test_pivot.py index 68c51521..c401fa93 100644 --- a/tests/test_table/test_pivot.py +++ b/tests/test_table/test_pivot.py @@ -1,17 +1,10 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - import sys - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal +from decimal import Decimal from agate import Table from agate.aggregations import Sum from agate.computations import Percent -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -83,7 +76,7 @@ def test_pivot_by_lambda_group_name_sequence_invalid(self): table = Table(self.rows, self.column_names, self.column_types) with self.assertRaises(ValueError): - pivot_table = table.pivot(['race', 'gender'], key_name='foo') # noqa + table.pivot(['race', 'gender'], key_name='foo') def test_pivot_no_key(self): table = Table(self.rows, self.column_names, self.column_types) diff --git a/tests/test_table/test_print_bars.py b/tests/test_table/test_print_bars.py index 5026280f..50ab0dfa 100644 --- a/tests/test_table/test_print_bars.py +++ b/tests/test_table/test_print_bars.py @@ -1,13 +1,11 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- +from io import StringIO from babel.numbers import format_decimal -import six from agate import Table -from agate.data_types import * -from agate.testcase import AgateTestCase +from agate.data_types import Number, Text from agate.exceptions import DataTypeError +from agate.testcase import AgateTestCase class TestPrintBars(AgateTestCase): @@ -19,7 +17,7 @@ def setUp(self): 
) self.number_type = Number() - self.international_number_type = Number(locale='de_DE') + self.international_number_type = Number(locale='de_DE.UTF-8') self.text_type = Text() self.column_names = ['one', 'two', 'three'] @@ -32,27 +30,27 @@ def setUp(self): def test_print_bars(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_bars('three', 'one', output=output) - lines = output.getvalue().split('\n') # noqa + output.getvalue().split('\n') def test_print_bars_width(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_bars('three', 'one', width=40, output=output) lines = output.getvalue().split('\n') - self.assertEqual(max([len(l) for l in lines]), 40) + self.assertEqual(max([len(line) for line in lines]), 40) def test_print_bars_width_overlap(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_bars('three', 'one', width=20, output=output) lines = output.getvalue().split('\n') - self.assertEqual(max([len(l) for l in lines]), 20) + self.assertEqual(max([len(line) for line in lines]), 20) def test_print_bars_domain(self): table = Table(self.rows, self.column_names, self.column_types) @@ -94,13 +92,13 @@ def test_print_bars_invalid_values(self): def test_print_bars_with_nulls(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_bars('three', 'two', width=20, printable=True, output=output) self.assertEqual(output.getvalue(), "three two\n" - "a " + format_decimal(2000, format=u'#,##0') + " |:::::::\n" + "a " + format_decimal(2000, format='#,##0') + " |:::::::\n" "None - | \n" "c 1 | \n" " +------+\n" - " 0 " + format_decimal(2000, format=u'#,##0') + "\n") + " 0 " + format_decimal(2000, format='#,##0') + "\n") diff --git a/tests/test_table/test_print_html.py b/tests/test_table/test_print_html.py index 7884b7cd..10833d08 100644 --- a/tests/test_table/test_print_html.py +++ b/tests/test_table/test_print_html.py @@ -1,25 +1,22 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - import warnings - -import six -from six.moves import html_parser +from html.parser import HTMLParser +from io import StringIO from agate import Table from agate.data_types import Number, Text from agate.testcase import AgateTestCase -class TableHTMLParser(html_parser.HTMLParser): +class TableHTMLParser(HTMLParser): """ Parser for use in testing HTML rendering of tables. 
""" + def __init__(self): warnings.simplefilter('ignore') try: - html_parser.HTMLParser.__init__(self) + HTMLParser.__init__(self) finally: warnings.resetwarnings() @@ -99,7 +96,7 @@ def setUp(self): self.rows = ( (1, 4, 'a'), (2, 3, 'b'), - (None, 2, u'👍') + (None, 2, '👍') ) self.number_type = Number() @@ -110,7 +107,7 @@ def setUp(self): def test_print_html(self): table = Table(self.rows, self.column_names, self.column_types) - table_html = six.StringIO() + table_html = StringIO() table.print_html(output=table_html) table_html = table_html.getvalue() @@ -138,7 +135,7 @@ def test_print_html(self): def test_print_html_tags(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_html(output=output) html = output.getvalue() @@ -149,7 +146,7 @@ def test_print_html_tags(self): def test_print_html_max_rows(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_html(max_rows=2, output=output) html = output.getvalue() @@ -160,7 +157,7 @@ def test_print_html_max_rows(self): def test_print_html_max_columns(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_html(max_columns=2, output=output) html = output.getvalue() diff --git a/tests/test_table/test_print_structure.py b/tests/test_table/test_print_structure.py index 75f4ba90..4966ddcb 100644 --- a/tests/test_table/test_print_structure.py +++ b/tests/test_table/test_print_structure.py @@ -1,10 +1,9 @@ # !/usr/bin/env python -# -*- coding: utf8 -*- -import six +from io import StringIO from agate import Table -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -17,7 +16,7 @@ def setUp(self): ) self.number_type = Number() - self.international_number_type = Number(locale='de_DE') + self.international_number_type = Number(locale='de_DE.UTF-8') self.text_type = Text() self.column_names = ['one', 'two', 'three'] @@ -30,7 +29,7 @@ def setUp(self): def test_print_structure(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_structure(output=output) lines = output.getvalue().strip().split('\n') diff --git a/tests/test_table/test_print_table.py b/tests/test_table/test_print_table.py index 7efc3079..e0cb9b51 100644 --- a/tests/test_table/test_print_table.py +++ b/tests/test_table/test_print_table.py @@ -1,11 +1,9 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- +from io import StringIO from babel.numbers import get_decimal_symbol -import six from agate import Table -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -19,7 +17,7 @@ def setUp(self): self.number_type = Number() self.american_number_type = Number(locale='en_US') - self.german_number_type = Number(locale='de_DE') + self.german_number_type = Number(locale='de_DE.UTF-8') self.text_type = Text() self.column_names = ['one', 'two', 'three', 'four'] @@ -33,7 +31,7 @@ def setUp(self): def test_print_table(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_table(output=output) lines = output.getvalue().split('\n') @@ -43,7 +41,7 @@ def test_print_table(self): def test_print_table_max_rows(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = 
StringIO() table.print_table(max_rows=2, output=output) lines = output.getvalue().split('\n') @@ -53,7 +51,7 @@ def test_print_table_max_rows(self): def test_print_table_max_columns(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_table(max_columns=2, output=output) lines = output.getvalue().split('\n') @@ -75,22 +73,22 @@ def test_print_table_max_precision(self): ] table = Table(rows, column_names, column_types) - output = six.StringIO() + output = StringIO() table.print_table(output=output, max_precision=2) lines = output.getvalue().split('\n') # Text shouldn't be affected - self.assertIn(u' 1.745 ', lines[2]) - self.assertIn(u' 11.123456 ', lines[3]) - self.assertIn(u' 0 ', lines[4]) + self.assertIn(' 1.745 ', lines[2]) + self.assertIn(' 11.123456 ', lines[3]) + self.assertIn(' 0 ', lines[4]) # Test real precision above max - self.assertIn(u' 1' + get_decimal_symbol() + u'74… ', lines[2]) - self.assertIn(u' 11' + get_decimal_symbol() + u'12… ', lines[3]) - self.assertIn(u' 0' + get_decimal_symbol() + u'00… ', lines[4]) + self.assertIn(' 1' + get_decimal_symbol() + '74… ', lines[2]) + self.assertIn(' 11' + get_decimal_symbol() + '12… ', lines[3]) + self.assertIn(' 0' + get_decimal_symbol() + '00… ', lines[4]) # Test real precision below max - self.assertIn(u' 1' + get_decimal_symbol() + u'72 ', lines[2]) - self.assertIn(u' 5' + get_decimal_symbol() + u'10 ', lines[3]) - self.assertIn(u' 0' + get_decimal_symbol() + u'10 ', lines[4]) + self.assertIn(' 1' + get_decimal_symbol() + '72 ', lines[2]) + self.assertIn(' 5' + get_decimal_symbol() + '10 ', lines[3]) + self.assertIn(' 0' + get_decimal_symbol() + '10 ', lines[4]) def test_print_table_max_column_width(self): rows = ( @@ -102,7 +100,7 @@ def test_print_table_max_column_width(self): column_names = ['one', 'two', 'three', 'also, this is long'] table = Table(rows, column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_table(output=output, max_column_width=7) lines = output.getvalue().split('\n') @@ -117,7 +115,7 @@ def test_print_table_locale_american(self): """ table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.print_table(max_columns=2, output=output, locale='en_US') # If it's working, 2000 should appear as the english '2,000' self.assertTrue("2,000" in output.getvalue()) @@ -129,7 +127,7 @@ def test_print_table_locale_german(self): """ table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() - table.print_table(max_columns=2, output=output, locale='de_DE') + output = StringIO() + table.print_table(max_columns=2, output=output, locale='de_DE.UTF-8') # If it's working, the english '2,000' should appear as '2.000' self.assertTrue("2.000" in output.getvalue()) diff --git a/tests/test_table/test_rename.py b/tests/test_table/test_rename.py index 859d91ea..16f71b2c 100644 --- a/tests/test_table/test_rename.py +++ b/tests/test_table/test_rename.py @@ -1,11 +1,8 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - import warnings from agate import Table +from agate.data_types import Number, Text from agate.testcase import AgateTestCase -from agate.data_types import * class TestRename(AgateTestCase): @@ -93,7 +90,7 @@ def test_rename_slugify_rows(self): self.assertRowNames(table3, ['test.koz', 'test.2', 'test.2.2']) def test_rename_slugify_columns_in_place(self): - column_names = [u'Test kož', 'test 2', 'test 2'] + column_names = 
['Test kož', 'test 2', 'test 2'] warnings.simplefilter('ignore') @@ -105,7 +102,7 @@ def test_rename_slugify_columns_in_place(self): table2 = table.rename(slug_columns=True) table3 = table.rename(slug_columns=True, separator='.') - self.assertColumnNames(table, [u'Test kož', 'test 2', 'test 2_2']) + self.assertColumnNames(table, ['Test kož', 'test 2', 'test 2_2']) self.assertColumnNames(table2, ['test_koz', 'test_2', 'test_2_2']) self.assertColumnNames(table3, ['test.koz', 'test.2', 'test.2.2']) diff --git a/tests/test_table/test_to_csv.py b/tests/test_table/test_to_csv.py index 9939bf49..766d82c9 100644 --- a/tests/test_table/test_to_csv.py +++ b/tests/test_table/test_to_csv.py @@ -1,21 +1,17 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - -import six - import os import sys +from io import StringIO from agate import Table +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta from agate.testcase import AgateTestCase -from agate.data_types import * class TestToCSV(AgateTestCase): def setUp(self): self.rows = ( (1, 'a', True, '11/4/2015', '11/4/2015 12:22 PM', '4:15'), - (2, u'👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), + (2, '👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), (None, 'b', None, None, None, None) ) @@ -64,7 +60,7 @@ def test_to_csv_file_like_object(self): def test_to_csv_to_stdout(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.to_csv(output) contents1 = output.getvalue() @@ -94,7 +90,7 @@ def test_print_csv(self): table = Table(self.rows, self.column_names, self.column_types) old = sys.stdout - sys.stdout = six.StringIO() + sys.stdout = StringIO() try: table.print_csv() diff --git a/tests/test_table/test_to_json.py b/tests/test_table/test_to_json.py index 376afc09..ad023a44 100644 --- a/tests/test_table/test_to_json.py +++ b/tests/test_table/test_to_json.py @@ -1,21 +1,18 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - +import json import os import sys -import six -import json +from io import StringIO from agate import Table +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta from agate.testcase import AgateTestCase -from agate.data_types import * class TestJSON(AgateTestCase): def setUp(self): self.rows = ( (1, 'a', True, '11/4/2015', '11/4/2015 12:22 PM', '4:15'), - (2, u'👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), + (2, '👍', False, '11/5/2015', '11/4/2015 12:45 PM', '6:18'), (None, 'b', None, None, None, None) ) @@ -30,7 +27,7 @@ def setUp(self): def test_to_json(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.to_json(output, indent=4) js1 = json.loads(output.getvalue()) @@ -43,7 +40,7 @@ def test_to_json(self): def test_to_json_key(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.to_json(output, key='text', indent=4) js1 = json.loads(output.getvalue()) @@ -56,7 +53,7 @@ def test_to_json_key(self): def test_to_json_non_string_key(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.to_json(output, key='number', indent=4) js1 = json.loads(output.getvalue()) @@ -69,7 +66,7 @@ def test_to_json_non_string_key(self): def test_to_json_key_func(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.to_json(output, key=lambda r: 
r['text'], indent=4) js1 = json.loads(output.getvalue()) @@ -82,7 +79,7 @@ def test_to_json_key_func(self): def test_to_json_newline_delimited(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() table.to_json(output, newline=True) js1 = json.loads(output.getvalue().split('\n')[0]) @@ -95,7 +92,7 @@ def test_to_json_newline_delimited(self): def test_to_json_error_newline_indent(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() with self.assertRaises(ValueError): table.to_json(output, newline=True, indent=4) @@ -103,7 +100,7 @@ def test_to_json_error_newline_indent(self): def test_to_json_error_newline_key(self): table = Table(self.rows, self.column_names, self.column_types) - output = six.StringIO() + output = StringIO() with self.assertRaises(ValueError): table.to_json(output, key='three', newline=True) @@ -143,7 +140,7 @@ def test_print_json(self): table = Table(self.rows, self.column_names, self.column_types) old = sys.stdout - sys.stdout = six.StringIO() + sys.stdout = StringIO() try: table.print_json() diff --git a/tests/test_tableset/__init__.py b/tests/test_tableset/__init__.py index 8fc38b0f..8815bf60 100644 --- a/tests/test_tableset/__init__.py +++ b/tests/test_tableset/__init__.py @@ -1,19 +1,11 @@ -#!/usr/bin/env python - -from collections import OrderedDict - -try: - from StringIO import StringIO -except ImportError: - from io import StringIO - -import shutil import json +import shutil +from collections import OrderedDict +from io import StringIO from agate import Table, TableSet -from agate.aggregations import * -from agate.data_types import * from agate.computations import Formula +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -62,7 +54,7 @@ def test_create_tableset_mismatched_column_names(self): ]) with self.assertRaises(ValueError): - tableset = TableSet(tables.values(), tables.keys()) # noqa + TableSet(tables.values(), tables.keys()) def test_create_tableset_mismatched_column_types(self): tables = OrderedDict([ @@ -72,7 +64,7 @@ def test_create_tableset_mismatched_column_types(self): ]) with self.assertRaises(ValueError): - tableset = TableSet(tables.values(), tables.keys()) # noqa + TableSet(tables.values(), tables.keys()) def test_iter(self): tableset = TableSet(self.tables.values(), self.tables.keys()) @@ -148,7 +140,11 @@ def test_from_json_file(self): tableset3 = TableSet.from_json(filelike) self.assertSequenceEqual(tableset1.column_names, tableset2.column_names, tableset3.column_names) - self.assertSequenceEqual([type(t) for t in tableset1.column_types], [type(t) for t in tableset2.column_types], [type(t) for t in tableset3.column_types]) + self.assertSequenceEqual( + [type(t) for t in tableset1.column_types], + [type(t) for t in tableset2.column_types], + [type(t) for t in tableset3.column_types] + ) self.assertEqual(len(tableset1), len(tableset2), len(tableset3)) @@ -162,7 +158,7 @@ def test_from_json_file(self): def test_from_json_false_path(self): with self.assertRaises(IOError): - tableset1 = TableSet.from_json('notapath') # noqa + TableSet.from_json('notapath') def test_to_json(self): tableset = TableSet(self.tables.values(), self.tables.keys()) diff --git a/tests/test_tableset/test_aggregate.py b/tests/test_tableset/test_aggregate.py index 19c252b8..c19aa0b4 100644 --- a/tests/test_tableset/test_aggregate.py +++ b/tests/test_tableset/test_aggregate.py @@ -1,15 +1,9 @@ -#!/usr/bin/env python 
- from collections import OrderedDict - -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal +from decimal import Decimal from agate import Table, TableSet -from agate.aggregations import * -from agate.data_types import * +from agate.aggregations import Count, MaxLength, Mean, Min, Sum +from agate.data_types import Number, Text from agate.exceptions import DataTypeError from agate.testcase import AgateTestCase diff --git a/tests/test_tableset/test_charting.py b/tests/test_tableset/test_charting.py index ad03c8bd..7eb459f3 100644 --- a/tests/test_tableset/test_charting.py +++ b/tests/test_tableset/test_charting.py @@ -1,13 +1,5 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- - from collections import OrderedDict -try: - import unittest2 as unittest -except ImportError: - import unittest - import leather from agate import Table, TableSet diff --git a/tests/test_tableset/test_having.py b/tests/test_tableset/test_having.py index 83d5ad9b..19d5bcd8 100644 --- a/tests/test_tableset/test_having.py +++ b/tests/test_tableset/test_having.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python - from collections import OrderedDict from agate import Table, TableSet diff --git a/tests/test_tableset/test_merge.py b/tests/test_tableset/test_merge.py index 9b3d21c7..196ae00e 100644 --- a/tests/test_tableset/test_merge.py +++ b/tests/test_tableset/test_merge.py @@ -1,10 +1,7 @@ -#!/usr/bin/env python - from collections import OrderedDict from agate import Table, TableSet -from agate.aggregations import * -from agate.data_types import * +from agate.data_types import Number, Text from agate.testcase import AgateTestCase @@ -76,10 +73,10 @@ def test_merge_groups_invalid_length(self): tableset = TableSet(self.tables.values(), self.tables.keys()) with self.assertRaises(ValueError): - table = tableset.merge(groups=['red', 'blue'], group_name='color_code') # noqa + tableset.merge(groups=['red', 'blue'], group_name='color_code') def test_merge_groups_invalid_type(self): tableset = TableSet(self.tables.values(), self.tables.keys()) with self.assertRaises(ValueError): - table = tableset.merge(groups='invalid', group_name='color_code') # noqa + tableset.merge(groups='invalid', group_name='color_code') diff --git a/tests/test_type_tester.py b/tests/test_type_tester.py index 6288f9d7..43af7476 100644 --- a/tests/test_type_tester.py +++ b/tests/test_type_tester.py @@ -1,12 +1,6 @@ -#!/usr/bin/env python -# -*- coding: utf8 -*- +import unittest -try: - import unittest2 as unittest -except ImportError: - import unittest - -from agate.data_types import * +from agate.data_types import Boolean, Date, DateTime, Number, Text, TimeDelta from agate.type_tester import TypeTester @@ -14,6 +8,18 @@ class TestTypeTester(unittest.TestCase): def setUp(self): self.tester = TypeTester() + def test_empty(self): + rows = [ + (None,), + (None,), + (None,), + ] + + inferred = self.tester.run(rows, ['one']) + + # This behavior is not necessarily desirable. 
See https://github.com/wireservice/agate/issues/371 + self.assertIsInstance(inferred[0], Boolean) + def test_text_type(self): rows = [ ('a',), @@ -60,8 +66,8 @@ def test_number_currency(self): def test_number_currency_locale(self): rows = [ - (u'£1.7',), - (u'£200000000',), + ('£1.7',), + ('£200000000',), ('',) ] @@ -202,7 +208,7 @@ def test_types_number_locale(self): ('',) ] - tester = TypeTester(types=[Number(locale='de_DE'), Text()]) + tester = TypeTester(types=[Number(locale='de_DE.UTF-8'), Text()]) inferred = tester.run(rows, ['one']) self.assertIsInstance(inferred[0], Number) diff --git a/tests/test_utils.py b/tests/test_utils.py index 3705cc4d..f73b69f6 100644 --- a/tests/test_utils.py +++ b/tests/test_utils.py @@ -1,22 +1,7 @@ -#!/usr/bin/env python +import unittest +from decimal import Decimal -try: - from cdecimal import Decimal -except ImportError: # pragma: no cover - from decimal import Decimal - -try: - import unittest2 as unittest -except ImportError: - import unittest - -import sys -import warnings - -from agate.data_types import Text -from agate.mapped_sequence import MappedSequence -from agate.table import Table -from agate.utils import Quantiles, round_limits, letter_name +from agate.utils import Quantiles, letter_name, round_limits class TestQuantiles(unittest.TestCase): diff --git a/tox.ini b/tox.ini deleted file mode 100644 index 1fc719e9..00000000 --- a/tox.ini +++ /dev/null @@ -1,33 +0,0 @@ -[tox] -envlist = py27,py35,py36,py37,py38,pypy2,pypy3 - -[testenv] -commands=nosetests tests - -[testenv:py27] -deps = -rrequirements-py2.txt - -[testenv:py35] -deps = -rrequirements-py3.txt - -[testenv:py36] -deps = {[testenv:py35]deps} - -[testenv:py37] -deps = {[testenv:py35]deps} - -[testenv:py38] -deps = {[testenv:py35]deps} - -[testenv:pypy2] -deps = {[testenv:py27]deps} - -[testenv:pypy3] -deps = {[testenv:py35]deps} - -[flake8] -ignore=E128,E402,E501,F403 -# E128 continuation line under-indented for visual indent -# E402 module level import not at top of file -# E501 line too long (X > 79 characters) -# F403 'from xyz import *' used; unable to detect undefined names diff --git a/tutorial.ipynb b/tutorial.ipynb index aa0b20ec..cd1b4cf6 100644 --- a/tutorial.ipynb +++ b/tutorial.ipynb @@ -8,7 +8,7 @@ "\n", "The best way to learn to use any tool is to actually use it. In this tutorial we will use agate to answer some basic questions about a dataset.\n", "\n", - "The data we will be using is a copy of the [National Registry of Exonerations]( http://www.law.umich.edu/special/exoneration/Pages/detaillist.aspx) made on August 28th, 2015. This dataset lists individuals who are known to have been exonerated after having been wrongly convicted in United States courts. At the time this data was copied there were 1,651 entries in the registry." + "The data we will be using is a copy of the [National Registry of Exonerations]( https://www.law.umich.edu/special/exoneration/Pages/detaillist.aspx) made on August 28th, 2015. This dataset lists individuals who are known to have been exonerated after having been wrongly convicted in United States courts. At the time this data was copied there were 1,651 entries in the registry." ] }, { @@ -23,7 +23,7 @@ "\n", "Note: You should be installing agate inside a [virtualenv](https://virtualenv.readthedocs.io/en/stable/>). 
If for some crazy reason you aren't using virtualenv you will need to add a ``sudo`` to the previous command.*\n", "\n", - "For more detailed installation instructions, see the [Installation](http://agate.readthedocs.io/en/1.6.2/install.html) section of the documentation." + "For more detailed installation instructions, see the [Installation](https://agate.readthedocs.io/en/latest/install.html) section of the documentation." ] }, { @@ -67,7 +67,7 @@ "source": [ "## Loading data from a CSV\n", "\n", - "The [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#module-agate.table) is the basic class in agate. To create a table from a CSV we use the [`Table.from_csv`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.from_csv) class method:" + "The [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#module-agate.table) is the basic class in agate. To create a table from a CSV we use the [`Table.from_csv`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.from_csv) class method:" ] }, { @@ -85,7 +85,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "With no other arguments specified, agate will automatically create an instance of [`TypeTester`](http://agate.readthedocs.io/en/1.6.2/api/type_tester.html#agate.TypeTester) and use it to figure out the type of each column. TypeTester is a \"best guess\" approach to determining the kinds of data in your table. It can guess wrong. In that case you can create a TypeTester manually and use the ``force`` argument to override its guess for a specific column:" + "With no other arguments specified, agate will automatically create an instance of [`TypeTester`](https://agate.readthedocs.io/en/latest/api/type_tester.html#agate.TypeTester) and use it to figure out the type of each column. TypeTester is a \"best guess\" approach to determining the kinds of data in your table. It can guess wrong. In that case you can create a TypeTester manually and use the ``force`` argument to override its guess for a specific column:" ] }, { @@ -107,9 +107,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If you already know the types of your data you may wish to skip the TypeTester entirely. You may pass sequences of column names and column types to [`Table.from_csv`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.from_csv) as the ``column_names`` and ``column_types`` arguments, respectively.\n", + "If you already know the types of your data you may wish to skip the TypeTester entirely. You may pass sequences of column names and column types to [`Table.from_csv`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.from_csv) as the ``column_names`` and ``column_types`` arguments, respectively.\n", "\n", - "For larger datasets the [`TypeTester`](http://agate.readthedocs.io/en/1.6.2/api/type_tester.html#agate.TypeTester) can be slow to evaluate the data. In that case you can specify a `limit` argument to restrict the amount of data it will use to infer types:" + "For larger datasets the [`TypeTester`](https://agate.readthedocs.io/en/latest/api/type_tester.html#agate.TypeTester) can be slow to evaluate the data. In that case you can specify a `limit` argument to restrict the amount of data it will use to infer types:" ] }, { @@ -133,7 +133,7 @@ "\n", "**Note:** agate's CSV reader and writer support unicode and other encodings for both Python 2 and Python 3. 
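As a rough sketch of the loading pattern the notebook describes here — `Table.from_csv` plus a `TypeTester` with a `force` override and a `limit` — the code below shows the shape of the call. It is illustrative only and not part of this patch; the CSV filename and the forced column name are assumptions.

```python
import agate

# Assumed filename and column name, for illustration only.
tester = agate.TypeTester(force={
    'false_confession': agate.Boolean()  # override the guessed type for one column
}, limit=100)                            # infer types from the first 100 rows only

exonerations = agate.Table.from_csv('exonerations.csv', column_types=tester)
```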
Try using them as a drop-in replacement for Python's builtin module: `from agate import csv`.\n", "\n", - "**Note:** agate also has [`Table.from_json`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.from_json) for creating tables from JSON data." + "**Note:** agate also has [`Table.from_json`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.from_json) for creating tables from JSON data." ] }, { @@ -143,7 +143,7 @@ "Describing the table\n", "====================\n", "\n", - "If you're working with new data, or you just need a refresher, you may want to review what columns are in the table. You can do this with the [`.Table.print_structure`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.print_structure) method or by just calling `print` on the table:" + "If you're working with new data, or you just need a refresher, you may want to review what columns are in the table. You can do this with the [`.Table.print_structure`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.print_structure) method or by just calling `print` on the table:" ] }, { @@ -192,12 +192,12 @@ "Navigating table data\n", "=====================\n", "\n", - "agate goes to great pains to make accessing the data in your tables work seamlessly for a wide variety of use-cases. Access by both [`Column`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.Column) and [`Row`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.Row) is supported, via the [`Table.columns`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.columns) and [`Table.rows`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.rows) attributes respectively.\n", + "agate goes to great pains to make accessing the data in your tables work seamlessly for a wide variety of use-cases. Access by both [`Column`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.Column) and [`Row`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.Row) is supported, via the [`Table.columns`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.columns) and [`Table.rows`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.rows) attributes respectively.\n", "\n", - "All four of these objects are examples of [`.MappedSequence`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.MappedSequence), the foundational type that underlies much of agate's functionality. A MappedSequence functions very similar to a standard Python [`dict`](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), with a few important exceptions:\n", + "All four of these objects are examples of [`.MappedSequence`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.MappedSequence), the foundational type that underlies much of agate's functionality. A MappedSequence functions very similar to a standard Python [`dict`](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), with a few important exceptions:\n", "\n", "* Data may be accessed either by numeric index (e.g. column number) or by a non-integer key (e.g. 
column name).\n", - "* Items are ordered, just like an instance of [`collections.OrderedDict`](https://docs.python.org/3.5/library/collections.html#collections.OrderedDict).\n", + "* Items are ordered, just like an instance of [`collections.OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict).\n", "* Iterating over the sequence returns its *values*, rather than its *keys*.\n", "\n", "To demonstrate the first point, these two lines are both valid ways of getting the first column in the `exonerations` table:" @@ -304,7 +304,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this case we create our row names using a [`lambda`](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) function that takes a row and returns an unique identifer. If your data has a unique column, you can also just pass the column name. (For example, a column of USPS abbrevations or FIPS codes.) Note, however, that your row names can never be `int`, because that is reserved for indexing by numeric order. (A [`decimal.Decimal`](https://docs.python.org/3.5/library/decimal.html#decimal.Decimal) or stringified integer is just fine.)\n", + "In this case we create our row names using a [`lambda`](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) function that takes a row and returns an unique identifer. If your data has a unique column, you can also just pass the column name. (For example, a column of USPS abbrevations or FIPS codes.) Note, however, that your row names can never be `int`, because that is reserved for indexing by numeric order. (A [`decimal.Decimal`](https://docs.python.org/3/library/decimal.html#decimal.Decimal) or stringified integer is just fine.)\n", "\n", "Once you've got a specific row, you can then access its individual values (cells, in spreadsheet-speak) either by numeric index or column name:" ] @@ -412,7 +412,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For any instance of [`.MappedSequence`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.MappedSequence), iteration returns values, *in order*. Here we print only the first ten:" + "For any instance of [`.MappedSequence`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.MappedSequence), iteration returns values, *in order*. Here we print only the first ten:" ] }, { @@ -450,7 +450,7 @@ "collapsed": true }, "source": [ - "To summarize, the four most common data structures in agate ([`Column`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.Column), [`Row`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.Row), [`Table.columns`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.columns) and [`Table.rows`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.rows)) are all instances of [`MappedSequence`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.MappedSequence) and therefore all behave in a uniform way. This is also true of [`TableSet`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html), which will discuss later on." 
+ "To summarize, the four most common data structures in agate ([`Column`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.Column), [`Row`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.Row), [`Table.columns`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.columns) and [`Table.rows`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.rows)) are all instances of [`MappedSequence`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.MappedSequence) and therefore all behave in a uniform way. This is also true of [`TableSet`](https://agate.readthedocs.io/en/latest/api/tableset.html), which will discuss later on." ] }, { @@ -464,11 +464,11 @@ "\n", "**Question:** How many exonerations involved a false confession?\n", "\n", - "Answering this question involves counting the number of ``True`` values in the ``false_confession`` column. When we created the table we specified that the data in this column contained [`Boolean`](http://agate.readthedocs.io/en/1.6.2/api/data_types.html#agate.Boolean) data. Because of this, agate has taken care of coercing the original text data from the CSV into Python's ``True`` and ``False`` values.\n", + "Answering this question involves counting the number of ``True`` values in the ``false_confession`` column. When we created the table we specified that the data in this column contained [`Boolean`](https://agate.readthedocs.io/en/latest/api/data_types.html#agate.Boolean) data. Because of this, agate has taken care of coercing the original text data from the CSV into Python's ``True`` and ``False`` values.\n", "\n", - "We'll answer the question using an instance of [`Count`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Count) which is a type of [`Aggregation`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Aggregation). Aggregations are used to perform \"column-wise\" calculations. That is, they derive a new single value from the contents of a column. The [`Count`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Count) aggregation can count either all values in a column, or how many times a particular value appears.\n", + "We'll answer the question using an instance of [`Count`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Count) which is a type of [`Aggregation`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Aggregation). Aggregations are used to perform \"column-wise\" calculations. That is, they derive a new single value from the contents of a column. The [`Count`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Count) aggregation can count either all values in a column, or how many times a particular value appears.\n", "\n", - "An Aggregation is applied to a table using [`Table.aggregate`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.aggregate).\n", + "An Aggregation is applied to a table using [`Table.aggregate`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.aggregate).\n", "\n", "It sounds complicated, but it's really simple. Putting it all together looks like this:" ] @@ -537,7 +537,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The answer to our question is \"26 years old\", however, as the warnings indicate, not every exonerated individual in the data has a value for the ``age`` column. 
The [`Median`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Median) statistical operation has no standard way of accounting for null values, so it leaves them out of the calculation.\n", + "The answer to our question is \"26 years old\", however, as the warnings indicate, not every exonerated individual in the data has a value for the ``age`` column. The [`Median`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Median) statistical operation has no standard way of accounting for null values, so it leaves them out of the calculation.\n", "\n", "**Question:** How many individuals do not have an age specified in the data?\n", "\n", @@ -574,7 +574,7 @@ "source": [ "Only nine rows in this dataset don't have age, so it's certainly still useful to compute a median. However, we might still want to filter those rows out so we could have a consistent sample for all of our calculations. In the next section you'll learn how to do just that.\n", "\n", - "Different [`aggregations`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html) can be applied depending on the type of data in each column. If none of the provided aggregations suit your needs you can use [`Summary`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Summary) to apply an arbitrary function to a column. If that still doesn't suit your needs you can always create your own aggregation from scratch by subclassing [`Aggregation`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Aggregation)." + "Different [`aggregations`](https://agate.readthedocs.io/en/latest/api/aggregations.html) can be applied depending on the type of data in each column. If none of the provided aggregations suit your needs you can use [`Summary`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Summary) to apply an arbitrary function to a column. If that still doesn't suit your needs you can always create your own aggregation from scratch by subclassing [`Aggregation`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Aggregation)." ] }, { @@ -584,9 +584,9 @@ "Selecting and filtering data\n", "============================\n", "\n", - "So what if those rows with no age were going to flummox our analysis? Agate's [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table) class provides a full suite of SQL-like operations including [`Table.select`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.select) for grabbing specific columns, [`Table.where`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.where) for selecting particular rows and [`Table.group_by`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.group_by) for grouping rows by common values.\n", + "So what if those rows with no age were going to flummox our analysis? 
Agate's [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table) class provides a full suite of SQL-like operations including [`Table.select`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.select) for grabbing specific columns, [`Table.where`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.where) for selecting particular rows and [`Table.group_by`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.group_by) for grouping rows by common values.\n", "\n", - "Let's use [`Table.where`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.where) to filter our exonerations table to only those individuals that have an age specified." + "Let's use [`Table.where`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.where) to filter our exonerations table to only those individuals that have an age specified." ] }, { @@ -604,9 +604,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You'll notice we provide a [`lambda`](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) function to the [`Table.where`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.where). This function is applied to each row and if it returns ``True``, then the row is included in the output table.\n", + "You'll notice we provide a [`lambda`](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) function to the [`Table.where`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.where). This function is applied to each row and if it returns ``True``, then the row is included in the output table.\n", "\n", - "A crucial thing to understand about these table methods is that they return **new tables**. In our example above ``exonerations`` was a [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table) instance and we applied [`Table.where`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.where), so ``with_age`` is a new, different [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table). The tables themselves can't be changed. You can create new tables with these methods, but you can't modify them in-place. (If this seems weird, just trust me. There are lots of good computer science-y reasons to do it this way.)\n", + "A crucial thing to understand about these table methods is that they return **new tables**. In our example above ``exonerations`` was a [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table) instance and we applied [`Table.where`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.where), so ``with_age`` is a new, different [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table). The tables themselves can't be changed. You can create new tables with these methods, but you can't modify them in-place. (If this seems weird, just trust me. There are lots of good computer science-y reasons to do it this way.)\n", "\n", "We can verify this did what we expected by counting the rows in the original table and rows in the new table:" ] @@ -671,13 +671,13 @@ "Computing new columns\n", "=====================\n", "\n", - "In addition to \"column-wise\" [`aggregations`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#module-agate.aggregations) there are also \"row-wise\" [`computations`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#module-agate.computations). 
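The filtering step just described can be sketched like this, again assuming the `exonerations` table from the earlier sketch:

```python
# Keep only the rows that have a value in the 'age' column.
with_age = exonerations.where(lambda row: row['age'] is not None)

# where() returns a new table; the original is untouched.
print(len(exonerations.rows), len(with_age.rows))
```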
Computations go through a [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table) row-by-row and derive a new column using the existing data. To perform row computations in agate we use subclasses of [`Computation`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Computation).\n", + "In addition to \"column-wise\" [`aggregations`](https://agate.readthedocs.io/en/latest/api/aggregations.html#module-agate.aggregations) there are also \"row-wise\" [`computations`](https://agate.readthedocs.io/en/latest/api/computations.html#module-agate.computations). Computations go through a [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table) row-by-row and derive a new column using the existing data. To perform row computations in agate we use subclasses of [`Computation`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Computation).\n", "\n", - "When one or more instances of [`Computation`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Computation) are applied with the [`Table.compute`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.compute) method, a new table is created with additional columns.\n", + "When one or more instances of [`Computation`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Computation) are applied with the [`Table.compute`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.compute) method, a new table is created with additional columns.\n", "\n", "**Question:** How long did individuals remain in prison before being exonerated?\n", "\n", - "To answer this question we will apply the [`Change`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Change) computation to the ``convicted`` and ``exonerated`` columns. Each of these columns contains the individual's age at the time of that event. All that [`Change`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Change) does is compute the difference between two numbers. (In this case each of these columns contain a [`Number`](http://agate.readthedocs.io/en/1.6.2/api/data_types.html#agate.Number), but this will also work with [`Date`](http://agate.readthedocs.io/en/1.6.2/api/data_types.html#agate.Date) or [`DateTime`](http://agate.readthedocs.io/en/1.6.2/api/data_types.html#agate.DateTime).)" + "To answer this question we will apply the [`Change`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Change) computation to the ``convicted`` and ``exonerated`` columns. Each of these columns contains the individual's age at the time of that event. All that [`Change`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Change) does is compute the difference between two numbers. (In this case each of these columns contain a [`Number`](https://agate.readthedocs.io/en/latest/api/data_types.html#agate.Number), but this will also work with [`Date`](https://agate.readthedocs.io/en/latest/api/data_types.html#agate.Date) or [`DateTime`](https://agate.readthedocs.io/en/latest/api/data_types.html#agate.DateTime).)" ] }, { @@ -712,7 +712,7 @@ "source": [ "The median number of years an exonerated individual spent in prison was 8 years.\n", "\n", - "Sometimes, the built-in computations, such as [`Change`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Change) won't suffice. 
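A sketch of the row-wise computation being described, assuming the filtered `with_age` table from the previous sketch; the new column name is illustrative:

```python
import agate

# Derive a new column from the difference between two existing numeric columns.
with_years = with_age.compute([
    ('years_in_prison', agate.Change('convicted', 'exonerated'))
])

median_years = with_years.aggregate(agate.Median('years_in_prison'))
```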
I mentioned before that you could perform arbitrary column-wise aggregations using [`Summary`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Summary). You can do the same thing for row-wise computations using [`Formula`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Formula). This is somewhat analogous to Excel's cell formulas.\n", + "Sometimes, the built-in computations, such as [`Change`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Change), won't suffice. I mentioned before that you could perform arbitrary column-wise aggregations using [`Summary`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Summary). You can do the same thing for row-wise computations using [`Formula`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Formula). This is somewhat analogous to Excel's cell formulas.\n", "\n", "For example, this code will create a ``full_name`` column from the ``first_name`` and ``last_name`` columns in the data:" ] }, { @@ -780,7 +780,7 @@ "source": [ "We add the ``replace`` argument to our ``compute`` method to replace the state column in place.\n", "\n", - "If [`Formula`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Formula) is not flexible enough (for instance, if you needed to compute a new value based on the distribution of data in a column) you can always implement your own subclass of [`Computation`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#agate.Computation). See the API documentation for [`computations`](http://agate.readthedocs.io/en/1.6.2/api/computations.html#module-agate.computations to see all of the supported ways to compute new data." + "If [`Formula`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Formula) is not flexible enough (for instance, if you needed to compute a new value based on the distribution of data in a column) you can always implement your own subclass of [`Computation`](https://agate.readthedocs.io/en/latest/api/computations.html#agate.Computation). See the API documentation for [`computations`](https://agate.readthedocs.io/en/latest/api/computations.html#module-agate.computations) to see all of the supported ways to compute new data." ] }, { @@ -792,7 +792,7 @@ "\n", "**Question:** Who are the ten exonerated individuals who were youngest at the time they were arrested?\n", "\n", - "Remembering that methods of tables return tables, we will use [`Table.order_by`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.order_by) to sort our table:" + "Remembering that methods of tables return tables, we will use [`Table.order_by`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.order_by) to sort our table:" ] }, { @@ -810,7 +810,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can then use [`Table.limit`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.limit) get only the first ten rows of the data." + "We can then use [`Table.limit`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.limit) to get only the first ten rows of the data."
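Chaining the sorting and limiting steps described above might look like this sketch, again assuming the `with_age` table from the earlier examples:

```python
# Sort by age, take the first ten rows, then print a quick preview.
sorted_by_age = with_age.order_by('age')
youngest_ten = sorted_by_age.limit(10)

youngest_ten.print_table(max_columns=7)
```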
] }, { @@ -828,7 +828,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's use [`Table.print_table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.print_table) to help us pretty the results in a way we can easily review:" + "Now let's use [`Table.print_table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.print_table) to help us pretty-print the results in a way we can easily review:" ] }, { @@ -865,11 +865,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If you find it impossible to believe that an eleven year-old was convicted of murder, I encourage you to read the Registry's [description of the case](http://www.law.umich.edu/special/exoneration/Pages/casedetail.aspx?caseid=3499>).\n", + "If you find it impossible to believe that an eleven-year-old was convicted of murder, I encourage you to read the Registry's [description of the case](https://www.law.umich.edu/special/exoneration/Pages/casedetail.aspx?caseid=3499).\n", "\n", - "**Note:** In the previous example we could have omitted the [`Table.limit`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.limit) and passed a ``max_rows=10`` to [`Table.print_table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.print_table) instead. In this case they accomplish exactly the same goal.\n", + "**Note:** In the previous example we could have omitted the [`Table.limit`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.limit) and passed a ``max_rows=10`` to [`Table.print_table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.print_table) instead. In this case they accomplish exactly the same goal.\n", "\n", - "What if we were more curious about the *distribution* of ages, rather than the highest or lowest? agate includes the [`Table.pivot`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.pivot) and [`Table.bins`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.bins) methods for counting values individually or by ranges. Let's try binning the ages. Then, instead of using [`Table.print_table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.print_table), we'll use [`Table.print_bars`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.print_bars) to generate a simple, text bar chart." + "What if we were more curious about the *distribution* of ages, rather than the highest or lowest? agate includes the [`Table.pivot`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.pivot) and [`Table.bins`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.bins) methods for counting values individually or by ranges. Let's try binning the ages. Then, instead of using [`Table.print_table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.print_table), we'll use [`Table.print_bars`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.print_bars) to generate a simple, text bar chart." ] }, { @@ -925,7 +925,7 @@ "\n", "This question can't be answered by operating on a single column. What we need is the equivalent of SQL's ``GROUP BY``. agate supports a full set of SQL-like operations on tables. Unlike SQL, agate breaks grouping and aggregation into two discrete steps.\n", "\n", - "First, we use [`Table.group_by`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.group_by) to group the data by state."
+ "First, we use [`Table.group_by`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.group_by) to group the data by state." ] }, { @@ -943,9 +943,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This takes our original [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table) and groups it into a [`TableSet`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet), which contains one table per state. As mentioned much earlier in this tutorial, TableSets are instances of [`MappedSequence`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.MappedSequence). That means they work very much like [`Column`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.Column) and [`Row`](http://agate.readthedocs.io/en/1.6.2/api/columns_and_rows.html#agate.Row).\n", + "This takes our original [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table) and groups it into a [`TableSet`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet), which contains one table per state. As mentioned much earlier in this tutorial, TableSets are instances of [`MappedSequence`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.MappedSequence). That means they work very much like [`Column`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.Column) and [`Row`](https://agate.readthedocs.io/en/latest/api/columns_and_rows.html#agate.Row).\n", "\n", - "Now we need to aggregate the total for each state. This works in a very similar way to how it did when we were aggregating columns of a single table, except that we'll use the [`Count`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Count) aggregation to count the total number of rows in each group." + "Now we need to aggregate the total for each state. This works in a very similar way to how it did when we were aggregating columns of a single table, except that we'll use the [`Count`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Count) aggregation to count the total number of rows in each group." ] }, { @@ -984,11 +984,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You'll notice we pass a sequence of tuples to [`TableSet.aggregate`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet.aggregate). Each one includes two elements. The first is the new column name being created. The second is an instance of some [`Aggregation`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Aggregation). Unsurpringly, in this case the results appear to be roughly proportional to population.\n", + "You'll notice we pass a sequence of tuples to [`TableSet.aggregate`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet.aggregate). Each one includes two elements. The first is the new column name being created. The second is an instance of some [`Aggregation`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Aggregation). Unsurpringly, in this case the results appear to be roughly proportional to population.\n", "\n", "**Question:** What state has the longest median time in prison prior to exoneration?\n", "\n", - "This is a much more complicated question that's going to pull together a lot of the features we've been using. 
We'll repeat the computations we applied before, but this time we're going to roll those computations up in state-by-state groups and then take the [`Median`](http://agate.readthedocs.io/en/1.6.2/api/aggregations.html#agate.Median of each group. Then we'll sort the data and see where people have been stuck in prison the longest." + "This is a much more complicated question that's going to pull together a lot of the features we've been using. We'll repeat the computations we applied before, but this time we're going to roll those computations up in state-by-state groups and then take the [`Median`](https://agate.readthedocs.io/en/latest/api/aggregations.html#agate.Median) of each group. Then we'll sort the data and see where people have been stuck in prison the longest." ] }, { @@ -1038,7 +1038,7 @@ "source": [ "DC? Nebraska? What accounts for these states having the longest times in prison before exoneration? I have no idea! Given that the group sizes are small, it would probably be wise to look for outliers.\n", "\n", - "As with [`Table.aggregate`]()http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.aggregate and [`Table.compute`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.compute), the [`TableSet.aggregate`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet.aggregate) method takes a list of aggregations to perform. You can aggregate as many columns as you like in a single step and they will all appear in the output table." + "As with [`Table.aggregate`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.aggregate) and [`Table.compute`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.compute), the [`TableSet.aggregate`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet.aggregate) method takes a list of aggregations to perform. You can aggregate as many columns as you like in a single step and they will all appear in the output table." ] }, { @@ -1048,9 +1048,9 @@ "Multi-dimensional aggregation\n", "=============================\n", "\n", - "I've already shown you that you can use [`TableSet`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet) to group instances of [`Table`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table). However, you can also use a [`TableSet`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet) to group *other TableSets*. To put that another way, instances of [`TableSet`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet) can be *nested*.\n", + "I've already shown you that you can use [`TableSet`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet) to group instances of [`Table`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table). However, you can also use a [`TableSet`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet) to group *other TableSets*. To put that another way, instances of [`TableSet`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet) can be *nested*.\n", "\n", - "The key to nesting data in this way is to use [`TableSet.group_by`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet.group_by). This is one of many methods that can be called on a TableSet, which will then be applied to all the tables it contains. In the last section we used [`Table.group_by`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.group_by) to split data up into a group of tables.
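Passing several aggregations at once, as just described, might look like the following sketch; `with_years` is the computed table from the earlier example and the output column names are illustrative:

```python
import agate

# One aggregate() call can produce several output columns per group.
state_medians = with_years.group_by('state').aggregate([
    ('count', agate.Count()),
    ('median_years_in_prison', agate.Median('years_in_prison'))
])

state_medians.order_by('median_years_in_prison', reverse=True).print_table(max_rows=5)
```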
By calling [`TableSet.group_by`](http://agate.readthedocs.io/en/1.6.2/api/tableset.html#agate.TableSet.group_by), which essentially called ``group_by`` on each table and collect the results. This can be pretty hard to wrap your head around, so let's look at a concrete example.\n", + "The key to nesting data in this way is to use [`TableSet.group_by`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet.group_by). This is one of many methods that can be called on a TableSet, which will then be applied to all the tables it contains. In the last section we used [`Table.group_by`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.group_by) to split data up into a group of tables. Calling [`TableSet.group_by`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet.group_by) essentially calls ``group_by`` on each table and collects the results. This can be pretty hard to wrap your head around, so let's look at a concrete example.\n", "\n", "**Question:** Is there a collective relationship between race, age and time spent in prison prior to exoneration?\n", "\n", @@ -1118,9 +1118,9 @@ "source": [ "## Exploratory charting\n", "\n", - "Beginning with version 1.5.0, agate includes the pure-Python SVG charting library [leather](http://leather.readthedocs.io/en/latest/). Leather allows you to generate \"good enough\" charts with as little as one line of code. It's especially useful if you're working in a Jupyter Notebook, as the results will render inline.\n", + "Beginning with version 1.5.0, agate includes the pure-Python SVG charting library [leather](https://leather.readthedocs.io/en/latest/). Leather allows you to generate \"good enough\" charts with as little as one line of code. It's especially useful if you're working in a Jupyter Notebook, as the results will render inline.\n", "\n", - "There are currently four chart types support: [`Table.bar_chart`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.bar_chart), [`Table.column_chart`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.column_chart), [`Table.line_chart`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.line_chart), and [`Table.scatterplot`](http://agate.readthedocs.io/en/1.6.2/api/table.html#agate.Table.scatterplot).\n", + "There are currently four chart types supported: [`Table.bar_chart`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.bar_chart), [`Table.column_chart`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.column_chart), [`Table.line_chart`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.line_chart), and [`Table.scatterplot`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.scatterplot).\n", "\n", "Let's create charts from a few slices of data we've made in this tutorial." ] }, { @@ -1170,7 +1170,7 @@ "source": [ "### Exonerations by age bracket\n", "\n", - "When creating a chart you may omit the column name arguments. If you do so the first and second columns in the table will be used.
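A one-line exploratory chart in the spirit of this section might look like the sketch below; `state_totals` comes from the earlier group-and-aggregate sketch and the SVG path is an assumption:

```python
# Write a simple SVG bar chart of the ten largest state totals.
state_totals.order_by('count', reverse=True).limit(10).bar_chart('state', 'count', 'state_counts.svg')
```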
This is especially useful for charting the output of [`TableSet.aggregate`](https://agate.readthedocs.io/en/latest/api/tableset.html#agate.TableSet.aggregate) or [`Table.bins`](https://agate.readthedocs.io/en/latest/api/table.html#agate.Table.bins)." ] }, { @@ -1293,7 +1293,7 @@ "source": [ "### Styling charts\n", "\n", - "As mentioned above, leather is designed for making \"good enough\" charts. You are never going to create a polished chart. However, sometimes you may want more control than agate offers through it's own methods. You can take more control over how your charts are presented by using [leather](http://leather.readthedocs.io/) directly." + "As mentioned above, leather is designed for making \"good enough\" charts. You are never going to create a polished chart. However, sometimes you may want more control than agate offers through its own methods. You can take more control over how your charts are presented by using [leather](https://leather.readthedocs.io/) directly." ] }, { @@ -1336,9 +1336,9 @@ "Where to go next\n", "================\n", "\n", - "This tutorial only scratches the surface of agate's features. For many more ideas on how to apply agate, check out the [`Cookbook`](http://agate.readthedocs.io/en/1.6.2/cookbook.html), which includes dozens of examples of specific features of agate as well as recipes for substituting agate for Excel, SQL, R and more. Also check out the agate's [`Extensions`](http://agate.readthedocs.io/en/1.6.2/extensions.html) which add support for reading/writing SQL tables, performing statistical analysis and more.\n", + "This tutorial only scratches the surface of agate's features. For many more ideas on how to apply agate, check out the [`Cookbook`](https://agate.readthedocs.io/en/latest/cookbook.html), which includes dozens of examples of specific features of agate as well as recipes for substituting agate for Excel, SQL, R and more. Also check out agate's [`Extensions`](https://agate.readthedocs.io/en/latest/extensions.html), which add support for reading/writing SQL tables, performing statistical analysis and more.\n", + "Also, if you're going to be doing data processing in Python you really ought to check out [`proof`](https://proof.readthedocs.io/en/0.3.0/), a library for building data processing pipelines that are repeatable and self-documenting. It will make your code cleaner and save you tons of time.\n", "\n", "Good luck in your reporting!" ]