All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning.
- #323 - Extend `DateTimeExp` to include `format`
- #327 - Add a --param option to handle override parameters with a comma in the value
- #328 - SftpActivity was broken in 3.0 - hard-coded to 'download'
- #324 - Workflow should be evaluated at the last minute possible
- #318 - SendFlowdockMessageActivity should use the corresponding HType in apply
- #320 - A few shell-command-based activities are missing input / output
- #315 - Fixed a bug where input and output references in CopyActivity are not included
- #313 - Added a `startThisHourAt` option to schedule
- #310 - Fix a bug where preconditions were missing the referenced objects
- #213 - Start using the `name` field instead of forcing `id` and `name` to be the same
- #304 - Add the missing options to preconditions
- #300 - Add a value option to the encrypted and unencrypted methods for creating new parameters through the Parameter object
- #299 - Fixes ConstantExpression implicits to avoid unnecessary import
- #298 - Make sequence of native type to sequence of HType implicitly available
- #295 - Refactor parameter with ad hoc polymorphism using a type class instead of reflection TypeTags
- #248 - Refactor parameter to have EncryptedParameter and UnencryptedParameter
- #281 - Support for not failing on un-defined pipeline parameters
- #291 - Clean up the implicits
- #285 - SnsAlarm requires a topic ARN; added default subject and message
- #286 - Fix a bug in 3.0 where the main class in jar activities is incorrect
- #282 - Add support for getting the hyperion AWS client by pipeline name
- #280 - Upgrade to Scala 2.10.6
- #243 - Revisit and refactor expression and parameter
- The actionOnTaskFailure and actionOnResourceFailure fields are removed from EMR activities, as they do not belong there.
- Database objects are changed to be consistent with other objects; this means one needs to initialize a database object instead of extending a trait
- Removed hadoopQueue from `HiveCopyActivity` and `PigActivity` as it is not documented by AWS
- `SparkJobActivity` is renamed to `SparkTaskActivity` to be consistent with the `preActivityTaskConfig` field for similar activity naming from AWS
- #271 - Separate CLI from DataPipelineDef
- #214 - Extend CLI to be able to read parameters to be passed from pipeline
- #291 - Upgrade AWS SDK to 1.10.43
- #277 - InsertTableQuery actually needs the values placeholders
- #275 - Schedule is not honouring settings in non-application.conf config
- #273 - Add `ACCEPTINVCHARS` and the rest of the Data Conversion Parameters to redshift copy options
- #269 - Sftp download auth cancel when using username and password
- #267 - Passing 0 to stopAfter should reset end to None
- #264 - CLI schedule override changes only the explicitly specified part
- #262 - Add slf4j-simple to examples
- #240 - Support EmrConfiguration and Property
- #241 - Support HttpProxy
- #255 - Provide explanations for CLI options
- #256 - Use a logging framework instead of println
- #209 - Override start activation time on command line
- #249 - Implement a simpleName value on MainClass to get just the class name itself
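  A minimal sketch of the idea (plain Scala, not the actual hyperion MainClass implementation; names are illustrative):

  ```scala
  // Derive the simple class name from a fully qualified name, dropping the
  // trailing '$' that Scala object class names carry (compare #78 below).
  def simpleName(fullyQualified: String): String =
    fullyQualified.stripSuffix("$").split('.').last

  simpleName("com.example.jobs.DailyReport$")  // == "DailyReport"
  ```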
- #252 - Add option to Graph to exclude data nodes (or make it the default)
- #251 - Graph still emits resources (just not resource dependencies) when not using --include-resources
- #239 - Capability to generate graph of workflow
- #237 - Allow Spark*Activity to override driver-memory
- #234 - SplitMergeFiles should allow ignoring cases where there are no input files
- #224 - Spark*Activity should allow setting parameters for spark jobs
- #229 - Convert S3DistCpActivity to a HadoopActivity instead of EmrActivity
- #228 - Allow specifying options to S3DistCpActivity
- #226 - Improves SetS3AclActivity with canned acl enum and more flexible apply
- #223 - Contrib activity that sets S3 ACL
- #220 - Make SparkActivity download jar to different directory to avoid race condition of jobs running in parallel.
- #217 - DateTimeExpression methods return the wrong expression.
- #211 - RedshiftUnloadActivity fails when containing expressions with `'`
- #207 - Make workflow expression DSL available to pipeline def by default.
- #204 - HadoopActivity and SparkJobActivity should support input and output data nodes
- #202 - WorkflowGraph fails with assertion if not using named
- #200 - SendEmailActivity must allow setting of debug and starttls
- #191 - Create a SparkActivity-type step that runs a single step using HadoopActivity instead of MapReduceActivity
- #160 - Better SNS alarm format support
- #197 - Update the default EMR AMI version to 3.7 and Spark version to 1.4.0
- #195 - RepartitionFile emitting empty files
- #192 - StringParameter should have implicit conversion to String
- #186 - Change collection constructors to use `.empty`
- #188 - SftpDownloadActivity should obey skip-empty as well and it needs to properly handle empty compressed files
- #189 - SftpUploadActivity, SftpDownloadActivity and SplitMergeFilesActivity should be able to write a _SUCCESS file
- #184 - Properties for new notification activities are not properly exposed in the Activity definition
- #181 - Remove `spark.yarn.user.classpath.first` conf for running Spark
- #172 - Create activity to send generic SNS message
- #173 - Create activity to send generic SQS message
- #174 - Create activity to send Flowdock notifications
- #179 - Single quotes in the SFTP Activity's date format break DataPipeline
- #177 - The SFTP activity should support a --since option to download files since a date
- #175 - Need to be able to pass options to java in addition to arguments to the main class
- #166 - If the input is empty, split-merge should not create an empty file with headers
- #167 - SftpActivity needs an option to not upload empty files
- #157 - Use a separate workflow/dependency graph to manage dependency building
- #162 - Need way to specify no activity, to allow omitting steps in a workflow expression
- #155 - Workflow breaks when having ArrowDependency on the right hand side.
- #153 - The create --force action doesn't detect existing pipelines if there are more than 25 active pipelines
- #150 - The whenMet method returns DataNode instead of S3DataNode
- #149 - Preconditions are not returned in objects for DataNodes
- #146 - RepartitionFile doesn't properly add a header when creating a single merged file
- #144 - SplitMergeFileActivity isn't properly compressing final merged output
- #142 - Arguments to SFTP activity are incorrect
- #140 - SendEmailActivity runner isn't being published
- #138 - Make parameter keys work when starting with a lower-case letter
- #136 - Fix a bug where the database object is not included
- #133 - SftpActivity needs to support S3 URLs for identity file and download as appropriate
- #131 - SplitMergeFiles should take strings for bufferSize and bytesPerFile
- #2 - Implement SftpUploadActivity
- #3 - Implement SftpDownloadActivity
- #98 - Add an activity to use SES to send emails rather than mailx
- #103 - Provide an activity to split files
- #107 - Support Worker Groups
- #108 - Add attemptTimeout
- #109 - Add lateAfterTimeout
- #110 - Add maximumRetries
- #111 - Add retryDelay
- #112 - Add failureAndRerunMode
- #115 - Add ShellScriptConfig
- #116 - Add HadoopActivity
- #125 - Support collections on WorkflowExpression
- #127 - Better type safety for MainClass
- #106 - Upgrade to Scala 2.11.7
- #113 - Reorder parameters for consistency
- #114 - Move non-core activities to a contrib project
- #117 - Better type safety for PipelineObjectId
- #118 - Better type safety for DpPeriod
- #119 - Better type safety for S3 URIs
- #120 - Better type safety for scripts/scriptUris
- #121 - RedshiftUnloadActivity's Access Key Id/Secret should be encrypted StringParameters
- #122 - AdpS3DataNode should be a 1:1 match to AWS objects
- #123 - Rename S3DataNode.fromPath to apply
- #128 - Schedule to be constructed via cron/timeSeries/onceAtActivation
- #129 - Merge ExpressionDSL into Expression classes and expand functions available
- #130 - Rename DateTimeRef to RuntimeSlot to denote real uses
- #99 - Hyperion CLI driver should exit with appropriate error codes
- #91 - Workflow DSL broken when the right-hand side of andThen has dependencies. Note that `act1 + act2` is no longer the same as `Seq(act1, act2)`.
- #101 - Allow workflow DSL to have duplicated activities.
- #25 - Added a run-python runner script and PythonActivity
- #89 - Added an activity to email input staging folders
- #90 - Added an activity to merge input staging folders and upload to output staging folders
- #80 - Change jar-based activities/steps to require a jar
- #83 - Remove dependency assertion in WorkflowDSL
- #84 - Drop dependsOn and require WorkflowDSL
- #81 - Regression: --region parameter is now effectively required on non-EC2 instances due to a call to `getCurrentRegion`.
- #78 - Strip trailing $ from MainClass
- #65 - Ability to use roles via STS assume-role
- #68 - No longer specify AWS keys in configuration for RedshiftUnloadActivity - now must specify as arguments to activity
- #74 - DataNode should return path using toString
- #64 - Supports non-default region
- #69 - Role and ResourceRole were not getting properly defaulted on resources
- #4 - Added S3DistCpActivity
- #63 - ActionOn* and SchedulerType case objects properly inherit from trait
- #62 - Add role and resourceRole to EmrCluster types as well as additional missing properties
- #59 - workflow DSL
- #54 - `with*` methods that take a sequence are now additive, and `withColumns(Seq[String])` is replaced with `withColumns(String...)`
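  A hedged sketch of the additive builder pattern described here (class and field names are illustrative, not the exact hyperion API):

  ```scala
  case class CsvDataFormat(columns: Seq[String] = Seq.empty) {
    // Varargs replace the old withColumns(Seq[String]); repeated calls now
    // accumulate columns instead of overwriting them.
    def withColumns(cols: String*): CsvDataFormat = copy(columns = columns ++ cols)
  }

  val format = CsvDataFormat().withColumns("id", "name").withColumns("email")
  // format.columns == Seq("id", "name", "email")
  ```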
- #56 - reorganize objects into packages by type
- #50 - In ShellCommandActivity, make command and scriptUri Either
- #51 - When taskInstanceCount == 0, make sure other taskInstance parameters are set to None
- #48 - Pipeline blows up if sns.topic is not set
- #46 - Support remaining properties on resources
- #45 - Support VPC by adding subnetId
- Use Option to construct options instead of Some
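  One common reason for this idiom (plain Scala, independent of hyperion): `Option(x)` guards against nulls, while `Some(x)` will happily wrap one.

  ```scala
  // Option(...) yields None when the value is null; Some(...) does not check.
  val fromEnv: Option[String] = Option(System.getenv("UNSET_VAR"))  // None if the variable is unset
  val unsafe: Option[String]  = Some(System.getenv("UNSET_VAR"))    // Some(null) if the variable is unset
  ```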
- #40 - Hyperion CLI continues retrying to delete the pipeline when --force is used
- #41 - Refactor Option to Option[Seq] functions
- #33 - Added support for tags
- #6 - Support remaining schedule aspects
- #14 - Make datapipelineDef be able to have a CLI and remove the Hyperion executable
- #5 - Support parameters
- #26 - ShellCommandActivity input and output should actually be a sequence of DataNodes.
- #18 - Renamed runCopyActivity on EC2Resource to runCopy
- #13 - Support SQL related databases and the relevant data nodes
- #20 - Support Actions
- #9 - Additional activity types (PigActivity, HiveActivity, HiveCopyActivity, CopyActivity)
- #15 - downgrade json4s to 3.2.10
- #11 - Spark and MapReduce should dependOn PipelineActivity
- First public release