multimeric · multimeric · Nov 17, 2019 · Dec 24, 2019 · Jan 20, 2020 · Jan 23, 2020
diff --git a/.gitignore b/.gitignore
diff --git a/.travis.yml b/.travis.yml
diff --git a/LICENSE b/LICENSE
diff --git a/README.rst b/README.rst
diff --git a/TODO.md b/TODO.md
@@ -0,0 +1,14 @@
+* [ ] Add validations that apply to every column in the DF equally (for the moment, users can just duplicate their validations)
+* [x] Add validations that use the entire DF, like uniqueness
+* [x] Fix CombinedValidations
+* [x] Add replacement for allow_empty Columns 
+* [ ] New column() tests
+* [x] New CombinedValidation tests
+* [x] Implement the negate flag in the indexer
+* [x] Add facility for allow_empty
+* [x] Fix messages
+* [x] Re-implement the or/and using operators
+* [ ] Allow and/or operators between Series-level and row-level validations
+* [ ] Separate ValidationClasses for each scope
+* [ ] Add row-level validations
+* [x] Fix message for DateAndOr test
diff --git a/UPDATE.md b/UPDATE.md
@@ -0,0 +1,47 @@
+# ValidationWarnings
+## Options for the ValidationWarning data
+* We keep it as is, with one single ValidationWarning class that stores a `message` and a reference to the validation
+that spawned it
+* PREFERRED: As above, but we add a dictionary of miscellaneous kwargs to the ValidationWarning for storing stuff like the row index that failed
+* We have a dataclass for each Validation type that stores things in a more structured way
+    * Why bother doing this if the Validation stores its own structure for the column index etc?
+
+## Options for the ValidationWarning message
+* It's generated from the Validation as a fixed string, as it is now
+* It's generated dynamically by the VW
+    * This means that custom messages require overriding the VW class
+* PREFERRED: It's generated dynamically in the VW by calling the parent Validation with a reference to itself, e.g. 
+  ```python
+  class ValidationWarning:
+      def __str__(self):
+          return self.validation.generate_message(self)
+
+  class Validation:
+      def generate_message(self, warning: ValidationWarning) -> str:
+          pass
+  ```
+    * This lets the message function use all the validation properties, and the dictionary of kwargs that it specified
+    * `generate_message()` will call `default_message(**kwargs)`, the dynamic class method, or `self.custom_message`, the
+    non-dynamic string specified by the user
+    * Each category of Validation will define a `create_prefix()` method that creates the {row: 1, column: 2} prefix
+    that goes before each message. `generate_message()` will then concatenate that prefix with the actual message
+* 
+
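A minimal sketch of the preferred message option above, for illustration only. The `props` attribute name, the placeholder message text, and making `default_message()` an instance method are assumptions, not part of this PR:

```python
from typing import Any, Dict, Optional


class Validation:
    """Hypothetical base class; only the message-related parts are sketched."""

    def __init__(self, custom_message: Optional[str] = None):
        # Non-dynamic string supplied by the user, if any
        self.custom_message = custom_message

    def create_prefix(self, warning: "ValidationWarning") -> str:
        # Build the {row: 1, column: 2} style prefix from the warning's kwargs
        parts = ", ".join(f"{key}: {value}" for key, value in warning.props.items())
        return "{" + parts + "}"

    def default_message(self, **kwargs: Any) -> str:
        # Dynamic message; subclasses would use kwargs to describe the failure
        return "failed validation"

    def generate_message(self, warning: "ValidationWarning") -> str:
        # Prefer the user's custom message, otherwise fall back to the dynamic one
        message = self.custom_message or self.default_message(**warning.props)
        return f"{self.create_prefix(warning)} {message}"


class ValidationWarning:
    def __init__(self, validation: Validation, **props: Any):
        self.validation = validation
        # Miscellaneous kwargs, e.g. the row index that failed
        self.props: Dict[str, Any] = props

    def __str__(self) -> str:
        # Delegate to the parent Validation, passing a reference to ourselves
        return self.validation.generate_message(self)


print(ValidationWarning(Validation(), row=1, column=2))
# {row: 1, column: 2} failed validation
```

With this shape, a custom message only needs `Validation(custom_message="...")`, and the warning class never has to be subclassed.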
+## Options for placing CombinedValidation in the inheritance hierarchy
+* For CombinedValidation and BooleanSeriesValidation to share a class, so they can be chained together,
+either we had to make a mixin that creates a "side path" that doesn't call `validate` (in this case, `validate_with_series`),
+or we 
+
+# Rework of Validation Indexing
+## All Indexed
+* All Validations now have an index and an axis
+* However, this index can be None, column only, row only, or both
+* When combined with each other, the resulting boolean series will be broadcast using numpy broadcasting rules
+* e.g. 
+    * A per-series validation might have index 0 (column 0) and return a scalar (the whole series is okay)
+    * A per-cell validation might have index 0 (column 0) and return a series (True, True, False) indicating that cells 0 and 1 of column 0 are okay
+    * A per-frame validation would have index None, and might return True if the whole frame meets the validation, or a series indicating which columns or rows match the validation
+
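The broadcasting described above can be illustrated with plain NumPy. This sketch assumes boolean arrays stand in for the validation results; the variable names are hypothetical and it does not use the pandas_schema API:

```python
import numpy as np

# Per-series validation on column 0: one scalar for the whole series
series_ok = np.bool_(True)

# Per-cell validation on column 0: one result per cell
cells_ok = np.array([True, True, False])

# Combining with & broadcasts the scalar across every cell
combined = series_ok & cells_ok
print(combined.tolist())  # [True, True, False]

# Per-frame example: a (3 rows x 2 columns) grid of cell results combined
# with a per-column result of shape (2,), which is broadcast over the rows
frame_cells = np.array([[True, True],
                        [True, False],
                        [True, True]])
columns_ok = np.array([True, False])
frame_combined = frame_cells & columns_ok
print(frame_combined.tolist())  # [[True, False], [True, False], [True, False]]
```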
+# Rework of CombinedValidations
+## Bitwise
+* Could assign each validation a bit in a large bitwise enum, and `or` that bit into a per-index number each time that index fails a validation. This lets us track the origin of each warning, allowing us to slice them out by bit and generate an appropriate list of warnings
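A rough sketch of the bitwise idea, using the standard library's `enum.IntFlag`. The `Failed` enum, the two example validations, and the sample values are all made up for illustration:

```python
from collections import Counter
from enum import IntFlag


class Failed(IntFlag):
    """One bit per (hypothetical) validation."""
    NONE = 0
    IN_RANGE = 1
    IS_UNIQUE = 2


values = [5, -1, 5, 12, 12]
counts = Counter(values)

# One flag value per row, accumulating the bit of every validation it failed
failures = []
for value in values:
    flags = Failed.NONE
    if not 0 <= value <= 10:
        flags |= Failed.IN_RANGE
    if counts[value] > 1:
        flags |= Failed.IS_UNIQUE
    failures.append(flags)


def rows_failing(flag: Failed) -> list:
    # Slice out the rows whose accumulated flags include the given bit
    return [row for row, flags in enumerate(failures) if flags & flag]


print(rows_failing(Failed.IN_RANGE))   # [1, 3, 4]
print(rows_failing(Failed.IS_UNIQUE))  # [0, 2, 3, 4]
```

Because each row keeps every failed bit, a row that fails both validations (rows 3 and 4 here) can produce one warning per originating validation.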
diff --git a/doc/common/introduction.rst b/doc/common/introduction.rst
diff --git a/doc/readme/README.rst b/doc/readme/README.rst
diff --git a/doc/readme/conf.py b/doc/readme/conf.py
diff --git a/doc/site/Makefile b/doc/site/Makefile
diff --git a/doc/site/conf.py b/doc/site/conf.py
diff --git a/doc/site/index.rst b/doc/site/index.rst
diff --git a/example/boolean.py b/example/boolean.py
diff --git a/example/boolean.txt b/example/boolean.txt
diff --git a/example/example.py b/example/example.py
diff --git a/example/example.txt b/example/example.txt
diff --git a/pandas_schema/__init__.py b/pandas_schema/__init__.py
@@ -1,4 +1,2 @@
-from .column import Column
 from .validation_warning import ValidationWarning
-from .schema import Schema
 from .version import __version__
diff --git a/pandas_schema/column.py b/pandas_schema/column.py
@@ -1,27 +1,117 @@
-import typing
-import pandas as pd
-
-from . import validation
-from .validation_warning import ValidationWarning
-
-class Column:
-    def __init__(self, name: str, validations: typing.Iterable['validation._BaseValidation'] = [], allow_empty=False):
-        """
-        Creates a new Column object
-
-        :param name: The column header that defines this column. This must be identical to the header used in the CSV/Data Frame you are validating.
-        :param validations: An iterable of objects implementing _BaseValidation that will generate ValidationErrors
-        :param allow_empty: True if an empty column is considered valid. False if we leave that logic up to the Validation
-        """
-        self.name = name
-        self.validations = list(validations)
-        self.allow_empty = allow_empty
-
-    def validate(self, series: pd.Series) -> typing.List[ValidationWarning]:
-        """
-        Creates a list of validation errors using the Validation objects contained in the Column
-
-        :param series: A pandas Series to validate
-        :return: An iterable of ValidationError instances generated by the validation
-        """
-        return [error for validation in self.validations for error in validation.get_errors(series, self)]
+from typing import Union, Iterable
+
+from pandas_schema.core import IndexValidation, BaseValidation
+from pandas_schema.index import AxisIndexer, IndexValue
+
+
+def column(
+        validations: Union[Iterable['IndexValidation'], 'IndexValidation'],
+        index = None,
+        override: bool = False,
+        recurse: bool = True,
+        allow_empty: bool = False
+) -> Union[Iterable['IndexValidation'], 'IndexValidation']:
+    """A utility method for setting the index data on a set of Validations
+
+    Args:
+      validations: A list of validations to modify
+      index: The index of the series that these validations will now consider
+      override: If true, override existing index values. Otherwise keep the existing ones
+      recurse: If true, recurse into child validations
+      allow_empty: Allow empty rows (NaN) to pass the validation (Default value = False).
+        See :py:class:`pandas_schema.validation.IndexSeriesValidation`
+    Returns:
+    """
+    # TODO: Abolish this, and instead propagate the individual validator indexes when we And/Or them together
+    def update_validation(validation: BaseValidation):
+        if isinstance(validation, IndexValidation):
+            if override or validation.index is None:
+                validation.index = index
+
+        if allow_empty:
+            return validation.optional()
+        else:
+            return validation
+
+    if isinstance(validations, Iterable):
+        ret = []
+        for valid in validations:
+            if recurse:
+                ret.append(valid.map(update_validation))
+            else:
+                ret.append(update_validation(valid))
+        return ret
+    else:
+        if recurse:
+            return validations.map(update_validation)
+        else:
+            return update_validation(validations)
+
+    return validations
+
+
+def column_sequence(
+        validations: Iterable['IndexValidation'],
+        override: bool = False
+) -> Iterable['IndexValidation']:
+    """A utility method for setting the index data on a set of Validations. Applies a sequential, position-based index, so
+    that the first validation gets index 0, the second gets index 1, etc. Note: this will not modify any validation that
+    already has some kind of index unless you set override=True
+
+    Args:
+      validations: A list of validations to modify
+      override: If true, override existing index values. Otherwise keep the existing ones
+
+    Returns:
+      The same validations that were passed in, each with a positional index set where
+      it previously had none (or on all of them if override=True)
+
+    """
+    for i, valid in enumerate(validations):
+        if override or valid.index is None:
+            valid.index = AxisIndexer(i, typ='positional')
+    return validations
+
+
+def each_column(validations: Iterable[IndexValidation], columns: IndexValue):
+    """Duplicates a validation and applies it to each column specified
+
+    Args:
+      validations: A list of validations to apply to each column
+      columns: An index that, when applied to the column index, should return all columns you want
+        these validations to apply to
+
+    Returns:
+
+    """
+
+#
+# def label_column(
+#         validations: typing.Iterable['pandas_schema.core.IndexSeriesValidation'],
+#         index: typing.Union[int, str],
+# ):
+#     """
+#     A utility method for setting the label-based column for each validation
+#     :param validations: A list of validations to modify
+#     :param index: The label of the series that these validations will now consider
+#     """
+#     return _column(
+#         validations,
+#         index,
+#         position=False
+#     )
+#
+# def positional_column(
+#         validations: typing.Iterable['pandas_schema.core.IndexSeriesValidation'],
+#         index: int,
+# ):
+#     """
+#     A utility method for setting the position-based column for each validation
+#     :param validations: A list of validations to modify
+#     :param index: The index of the series that these validations will now consider
+#     """
+#     return _column(
+#         validations,
+#         index,
+#         position=True