Add feature to ignore Iceberg tables #185

Merged
merged 67 commits into from
Nov 28, 2024
Changes from 59 commits
8a00255
Updated to BK-core
Nov 20, 2024
6baeee4
Updated to path-cleanup
Nov 20, 2024
7090dc4
Update PagingCleanupServiceTest.java
Nov 20, 2024
0f99aa6
cleanup
javsanbel2 Nov 20, 2024
9bbd52d
Merge branch 'feature/prevent-actions-on-iceberg-tables' of github.co…
javsanbel2 Nov 20, 2024
1e339c9
cleanup 2
javsanbel2 Nov 20, 2024
62c68d5
main business logic
javsanbel2 Nov 20, 2024
b6c718f
adding exception
javsanbel2 Nov 20, 2024
c5ad343
Add DB & table name to exception message
Nov 20, 2024
fd454a8
Update IcebergValidator.java
Nov 20, 2024
e41ac6c
Create IcebergValidatorTest.java
Nov 20, 2024
a2939d5
Update HiveMetadataCleanerTest.java
Nov 20, 2024
223e086
Updating and adding S3PathCleaner tests
Nov 20, 2024
1f6e360
Adding IcebergValidator to constructors
Nov 20, 2024
1905b0a
Updating Junit imports
Nov 20, 2024
efca2c9
Update SchedulerApiaryTest.java
Nov 20, 2024
d16bc0a
Update CommonBeans
Nov 20, 2024
9bf7248
clean-up add comment
Nov 21, 2024
4c45de2
Remove extra deletion
Nov 21, 2024
61c2f88
adding beans
javsanbel2 Nov 21, 2024
4e0b82b
fix tests
Nov 21, 2024
e548873
fixing it tests for metadata cleanup
javsanbel2 Nov 21, 2024
8b1ca85
fix path cleanup
javsanbel2 Nov 21, 2024
631502b
fix main problem with tests
javsanbel2 Nov 21, 2024
c1a7c96
Fix BeekeeperDryRunPathCleanupIntegrationTest
Nov 21, 2024
90c2871
revert changes to fix BeekeeperExpiredMetadataSchedulerApiaryIntegrat…
Nov 21, 2024
45fcc26
Added missing properties to fix BeekeeperUnreferencedPathSchedulerApi…
Nov 21, 2024
06914d1
Add integration test for metadatacleanup
Nov 22, 2024
a09e9f1
Update metadataHandler to catch beekeeperException
Nov 24, 2024
8c1ce38
cleanup
Nov 24, 2024
33a22c1
Update path-cleanup housekeeping status
Nov 25, 2024
a4b896a
cleanup
Nov 25, 2024
1be81b8
cleanup
Nov 25, 2024
66ad261
cleanup
Nov 25, 2024
0948aea
Update beekeeper to runtime exception
Nov 25, 2024
ed8745f
bump versions for testing
Nov 25, 2024
1047c57
Add Hadoop dependencies
Nov 25, 2024
eb13799
Update pom.xml
Nov 25, 2024
812565e
Revert changes to beekeeper-path
Nov 26, 2024
26b404c
revert more path-cleanup
Nov 26, 2024
101ab88
Revert path-cleanup
Nov 26, 2024
c646650
cleanup
Nov 26, 2024
e71a5ae
Added logging for table params
Nov 26, 2024
eea8403
add logging
Nov 26, 2024
95e6c64
remove logs to check filters
Nov 26, 2024
804be2f
cleaning up
javsanbel2 Nov 27, 2024
ae26519
fix validator tests
javsanbel2 Nov 27, 2024
c2e0b3f
clean up it tests
javsanbel2 Nov 27, 2024
07174b2
change expired metadata handler
javsanbel2 Nov 27, 2024
58c6e65
fix leninet
javsanbel2 Nov 27, 2024
a32e9d0
Add IcebergTableListenerEventFilter
Nov 27, 2024
9a93bd7
add event
javsanbel2 Nov 27, 2024
db1352a
Add integration test for scheduler
Nov 27, 2024
f300e60
Revert versions used for testing & changelog
Nov 27, 2024
bacd477
Revert testing version
Nov 27, 2024
04bb806
Update beekeeper-scheduler-apiary/src/main/java/com/expediagroup/beek…
HamzaJugon Nov 27, 2024
f94ba5d
Updating asserts and remove unused logging
Nov 27, 2024
1206eb8
Merge branch 'feature/prevent-actions-on-iceberg-tables' of https://g…
Nov 27, 2024
5517c7f
Implement IsIcebergTablePredicate
Nov 27, 2024
e66982e
revert changes to schedulerApiary
Nov 27, 2024
5e67a64
Update SchedulerApiary.java
Nov 27, 2024
a65f066
Updating logging so we only see stack trace on debug level
Nov 27, 2024
fd6bd88
Update logging in ExpiredMetadataHandler
Nov 27, 2024
026e769
Updating for minor comments
Nov 27, 2024
070b34d
Update logging
Nov 28, 2024
1418f1b
Update CHANGELOG.md
HamzaJugon Nov 28, 2024
b80e71d
Update CHANGELOG.md
HamzaJugon Nov 28, 2024
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.6.0] - 2024-10-27
HamzaJugon marked this conversation as resolved.
### Added
- Added filter for Iceberg tables in `beekeeper-scheduler-apiary` to prevent scheduling paths and metadata for deletion.
- Added `IcebergValidator` to ensure Iceberg tables are identified and excluded from cleanup operations.

## [3.5.7] - 2024-10-25
### Changed
- Added error handling for bad requests with incorrect sort parameters.
@@ -18,28 +18,35 @@
import com.expediagroup.beekeeper.cleanup.metadata.CleanerClient;
import com.expediagroup.beekeeper.cleanup.metadata.MetadataCleaner;
import com.expediagroup.beekeeper.cleanup.monitoring.DeletedMetadataReporter;
import com.expediagroup.beekeeper.cleanup.validation.IcebergValidator;
import com.expediagroup.beekeeper.core.config.MetadataType;
import com.expediagroup.beekeeper.core.model.HousekeepingMetadata;
import com.expediagroup.beekeeper.core.monitoring.TimedTaggable;

public class HiveMetadataCleaner implements MetadataCleaner {

private DeletedMetadataReporter deletedMetadataReporter;
private IcebergValidator icebergValidator;

- public HiveMetadataCleaner(DeletedMetadataReporter deletedMetadataReporter) {
+ public HiveMetadataCleaner(DeletedMetadataReporter deletedMetadataReporter, IcebergValidator icebergValidator) {
this.deletedMetadataReporter = deletedMetadataReporter;
this.icebergValidator = icebergValidator;
}

@Override
@TimedTaggable("hive-table-deleted")
public void dropTable(HousekeepingMetadata housekeepingMetadata, CleanerClient client) {
icebergValidator.throwExceptionIfIceberg(housekeepingMetadata.getDatabaseName(),
housekeepingMetadata.getTableName());
client.dropTable(housekeepingMetadata.getDatabaseName(), housekeepingMetadata.getTableName());
deletedMetadataReporter.reportTaggable(housekeepingMetadata, MetadataType.HIVE_TABLE);
}

@Override
@TimedTaggable("hive-partition-deleted")
public boolean dropPartition(HousekeepingMetadata housekeepingMetadata, CleanerClient client) {
icebergValidator.throwExceptionIfIceberg(housekeepingMetadata.getDatabaseName(),
housekeepingMetadata.getTableName());
boolean partitionDeleted = client
.dropPartition(housekeepingMetadata.getDatabaseName(), housekeepingMetadata.getTableName(),
housekeepingMetadata.getPartitionName());
@@ -0,0 +1,63 @@
/**
* Copyright (C) 2019-2024 Expedia, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.expediagroup.beekeeper.cleanup.validation;

import static java.lang.String.format;

import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.expediagroup.beekeeper.cleanup.metadata.CleanerClient;
import com.expediagroup.beekeeper.cleanup.metadata.CleanerClientFactory;
import com.expediagroup.beekeeper.core.error.BeekeeperIcebergException;
import com.expediagroup.beekeeper.core.predicate.IsIcebergTablePredicate;

public class IcebergValidator {

private static final Logger log = LoggerFactory.getLogger(IcebergValidator.class);

private final CleanerClientFactory cleanerClientFactory;
private final IsIcebergTablePredicate isIcebergTablePredicate;

public IcebergValidator(CleanerClientFactory cleanerClientFactory) {
this.cleanerClientFactory = cleanerClientFactory;
this.isIcebergTablePredicate = new IsIcebergTablePredicate();
}

/**
* Beekeeper currently does not support the Iceberg format. Iceberg tables in the Hive Metastore do not store partition information,
* causing Beekeeper to attempt to clean up the entire table due to the missing information. This method checks if
* the table is an Iceberg table and throws a BeekeeperIcebergException to stop the process.
*
* @param databaseName
* @param tableName
*/
public void throwExceptionIfIceberg(String databaseName, String tableName) {
try (CleanerClient client = cleanerClientFactory.newInstance()) {
Map<String, String> tableParameters = client.getTableProperties(databaseName, tableName);

if (isIcebergTablePredicate.test(tableParameters)) {
throw new BeekeeperIcebergException(

Review comment: nit: Maybe using IllegalStateException would have been clearer.

Reply from the PR author: We are purposely using a custom exception so that it can be handled in the main class later on.

format("Iceberg table %s.%s is not currently supported in Beekeeper.", databaseName, tableName));
}
} catch (Exception e) {
throw new BeekeeperIcebergException(
format("Unexpected exception when identifying if table %s.%s is Iceberg.", databaseName, tableName), e);
}
}
}
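
One consequence of the `try`/`catch` shape above is worth spelling out: the `catch (Exception e)` clause also matches the `BeekeeperIcebergException` thrown inside the `try` block, so the "not currently supported" exception appears to reach callers wrapped inside the "Unexpected exception" one. The following is a minimal stand-alone sketch of one way to let the domain exception pass through unchanged while still wrapping genuine lookup failures. `SketchIcebergException`, `TablePropertiesSource` and `SketchIcebergValidator` are hypothetical stand-ins invented for this illustration; they are not Beekeeper classes, and this is not what the PR merged.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for BeekeeperIcebergException (assumption, not the real class).
class SketchIcebergException extends RuntimeException {
  SketchIcebergException(String message) {
    super(message);
  }

  SketchIcebergException(String message, Throwable cause) {
    super(message, cause);
  }
}

// Hypothetical stand-in for the table-property lookup done through CleanerClient.
interface TablePropertiesSource {
  Map<String, String> getTableProperties(String databaseName, String tableName) throws Exception;
}

class SketchIcebergValidator {

  private final TablePropertiesSource source;
  private final Predicate<Map<String, String>> isIceberg;

  SketchIcebergValidator(TablePropertiesSource source, Predicate<Map<String, String>> isIceberg) {
    this.source = source;
    this.isIceberg = isIceberg;
  }

  void throwExceptionIfIceberg(String databaseName, String tableName) {
    try {
      Map<String, String> tableParameters = source.getTableProperties(databaseName, tableName);
      if (isIceberg.test(tableParameters)) {
        throw new SketchIcebergException(String.format(
            "Iceberg table %s.%s is not currently supported in Beekeeper.", databaseName, tableName));
      }
    } catch (SketchIcebergException e) {
      // Rethrow the domain exception as-is so its message is not buried in a wrapper.
      throw e;
    } catch (Exception e) {
      // Anything else (lookup failure, predicate failure) is reported as unexpected.
      throw new SketchIcebergException(String.format(
          "Unexpected exception when identifying if table %s.%s is Iceberg.", databaseName, tableName), e);
    }
  }
}
```

With this ordering the more specific `catch` clause has to come first; callers still only ever see one exception type, which keeps the handling upstream unchanged.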
@@ -1,5 +1,5 @@
/**
- * Copyright (C) 2019-2023 Expedia, Inc.
+ * Copyright (C) 2019-2024 Expedia, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -23,13 +23,13 @@
import java.time.LocalDateTime;

import org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider;
import org.junit.Rule;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@@ -58,20 +58,18 @@ class S3DryRunPathCleanerTest {
private HousekeepingPath housekeepingPath;
private AmazonS3 amazonS3;
private @Mock BytesDeletedReporter bytesDeletedReporter;

private boolean dryRunEnabled = true;

private S3PathCleaner s3DryRunPathCleaner;

@Rule
@Container
public static LocalStackContainer awsContainer = new LocalStackContainer(
DockerImageName.parse("localstack/localstack:0.14.2")).withServices(S3);
static {
awsContainer.start();
}
public static String S3_ENDPOINT = awsContainer.getEndpointConfiguration(S3).getServiceEndpoint();

@BeforeEach
void setUp() {
String S3_ENDPOINT = awsContainer.getEndpointConfiguration(S3).getServiceEndpoint();
amazonS3 = AmazonS3ClientBuilder
.standard()
.withCredentials(new BasicAWSCredentialsProvider("accesskey", "secretkey"))
@@ -1,5 +1,5 @@
/**
- * Copyright (C) 2019-2021 Expedia, Inc.
+ * Copyright (C) 2019-2024 Expedia, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -15,6 +15,8 @@
*/
package com.expediagroup.beekeeper.cleanup.hive;

import static org.junit.Assert.assertThrows;
import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@@ -26,7 +28,9 @@
import org.mockito.junit.jupiter.MockitoExtension;

import com.expediagroup.beekeeper.cleanup.monitoring.DeletedMetadataReporter;
import com.expediagroup.beekeeper.cleanup.validation.IcebergValidator;
import com.expediagroup.beekeeper.core.config.MetadataType;
import com.expediagroup.beekeeper.core.error.BeekeeperIcebergException;
import com.expediagroup.beekeeper.core.model.HousekeepingMetadata;

@ExtendWith(MockitoExtension.class)
@@ -35,6 +39,7 @@ public class HiveMetadataCleanerTest {
private @Mock HousekeepingMetadata housekeepingMetadata;
private @Mock DeletedMetadataReporter deletedMetadataReporter;
private @Mock HiveClient hiveClient;
private @Mock IcebergValidator icebergValidator;

private HiveMetadataCleaner cleaner;
private static final String DATABASE = "database";
@@ -43,14 +48,18 @@ public class HiveMetadataCleanerTest {

@BeforeEach
public void init() {
- cleaner = new HiveMetadataCleaner(deletedMetadataReporter);
+ cleaner = new HiveMetadataCleaner(deletedMetadataReporter, icebergValidator);
}

@Test
public void typicalDropTable() {
when(housekeepingMetadata.getDatabaseName()).thenReturn(DATABASE);
when(housekeepingMetadata.getTableName()).thenReturn(TABLE_NAME);

cleaner.dropTable(housekeepingMetadata, hiveClient);

verify(icebergValidator).throwExceptionIfIceberg(DATABASE, TABLE_NAME);
verify(hiveClient).dropTable(DATABASE, TABLE_NAME);
verify(deletedMetadataReporter).reportTaggable(housekeepingMetadata, MetadataType.HIVE_TABLE);
}

@@ -62,6 +71,9 @@ public void typicalDropPartition() {
when(hiveClient.dropPartition(DATABASE, TABLE_NAME, PARTITION_NAME)).thenReturn(true);

cleaner.dropPartition(housekeepingMetadata, hiveClient);

verify(icebergValidator).throwExceptionIfIceberg(DATABASE, TABLE_NAME);
verify(hiveClient).dropPartition(DATABASE, TABLE_NAME, PARTITION_NAME);
verify(deletedMetadataReporter).reportTaggable(housekeepingMetadata, MetadataType.HIVE_PARTITION);
}

@@ -81,4 +93,36 @@ public void tableExists() {
cleaner.tableExists(hiveClient, DATABASE, TABLE_NAME);
verify(hiveClient).tableExists(DATABASE, TABLE_NAME);
}

@Test
public void doesNotDropTableWhenIcebergTable() {
when(housekeepingMetadata.getDatabaseName()).thenReturn(DATABASE);
when(housekeepingMetadata.getTableName()).thenReturn(TABLE_NAME);
doThrow(new BeekeeperIcebergException("Iceberg table"))
.when(icebergValidator).throwExceptionIfIceberg(DATABASE, TABLE_NAME);

assertThrows(
BeekeeperIcebergException.class,
() -> cleaner.dropTable(housekeepingMetadata, hiveClient)
);

verify(hiveClient, never()).dropTable(DATABASE, TABLE_NAME);
verify(deletedMetadataReporter, never()).reportTaggable(housekeepingMetadata, MetadataType.HIVE_TABLE);
}

@Test
public void doesNotDropPartitionWhenIcebergTable() {
when(housekeepingMetadata.getDatabaseName()).thenReturn(DATABASE);
when(housekeepingMetadata.getTableName()).thenReturn(TABLE_NAME);
doThrow(new BeekeeperIcebergException("Iceberg table"))
.when(icebergValidator).throwExceptionIfIceberg(DATABASE, TABLE_NAME);

assertThrows(
BeekeeperIcebergException.class,
() -> cleaner.dropPartition(housekeepingMetadata, hiveClient)
);

verify(hiveClient, never()).dropPartition(DATABASE, TABLE_NAME, PARTITION_NAME);
verify(deletedMetadataReporter, never()).reportTaggable(housekeepingMetadata, MetadataType.HIVE_PARTITION);
}
}
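
The two `doesNot...WhenIcebergTable` tests above depend on the validator being invoked before any destructive call, so that an exception short-circuits both the drop and the metrics report. A minimal stand-alone sketch of that ordering follows. It uses hypothetical stand-in types (`Guard`, `RecordingClient`, `SketchCleaner`) and hand-rolled recording instead of Beekeeper's classes and Mockito, purely to show the control flow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IcebergValidator: throws to veto the operation.
interface Guard {
  void check(String databaseName, String tableName);
}

// Hypothetical stand-in for a metastore client; records calls instead of dropping anything.
class RecordingClient {
  final List<String> calls = new ArrayList<>();

  void dropTable(String databaseName, String tableName) {
    calls.add("dropTable " + databaseName + "." + tableName);
  }
}

class SketchCleaner {

  private final Guard guard;
  // Stands in for the metrics reporter; records what would have been reported.
  final List<String> reported = new ArrayList<>();

  SketchCleaner(Guard guard) {
    this.guard = guard;
  }

  void dropTable(String databaseName, String tableName, RecordingClient client) {
    // The guard runs first: if it throws, neither of the next two lines executes.
    guard.check(databaseName, tableName);
    client.dropTable(databaseName, tableName);
    reported.add(databaseName + "." + tableName);
  }
}
```

If the guard call were moved below `client.dropTable(...)`, the exception would still be thrown but the table would already be gone, which is the case the `never()` verifications in the tests are there to rule out.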
@@ -0,0 +1,92 @@
/**
* Copyright (C) 2019-2024 Expedia, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.expediagroup.beekeeper.cleanup.validation;

import static org.assertj.core.api.AssertionsForClassTypes.assertThatThrownBy;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.util.HashMap;
import java.util.Map;

import org.junit.Before;
import org.junit.Test;

import com.expediagroup.beekeeper.cleanup.metadata.CleanerClient;
import com.expediagroup.beekeeper.cleanup.metadata.CleanerClientFactory;
import com.expediagroup.beekeeper.core.error.BeekeeperIcebergException;

public class IcebergValidatorTest {

private CleanerClientFactory cleanerClientFactory;
private CleanerClient cleanerClient;
private IcebergValidator icebergValidator;

@Before
public void setUp() throws Exception {
cleanerClientFactory = mock(CleanerClientFactory.class);
cleanerClient = mock(CleanerClient.class);
when(cleanerClientFactory.newInstance()).thenReturn(cleanerClient);
icebergValidator = new IcebergValidator(cleanerClientFactory);
}

@Test(expected = BeekeeperIcebergException.class)
public void shouldThrowExceptionWhenTableTypeIsIceberg() throws Exception {
Map<String, String> properties = new HashMap<>();
properties.put("table_type", "ICEBERG");

when(cleanerClient.getTableProperties("db", "table")).thenReturn(properties);

icebergValidator.throwExceptionIfIceberg("db", "table");
verify(cleanerClientFactory).newInstance();
verify(cleanerClient).close();
}

@Test(expected = BeekeeperIcebergException.class)
public void shouldThrowExceptionWhenMetadataIsIceberg() throws Exception {
Map<String, String> properties = new HashMap<>();
properties.put("metadata_location", "s3://db/table/metadata/0000.json");

when(cleanerClient.getTableProperties("db", "table")).thenReturn(properties);

icebergValidator.throwExceptionIfIceberg("db", "table");
}

@Test
public void shouldNotThrowExceptionForNonIcebergTable() throws Exception {
Map<String, String> properties = new HashMap<>();
properties.put("table_type", "HIVE_TABLE");

when(cleanerClient.getTableProperties("db", "table")).thenReturn(properties);

icebergValidator.throwExceptionIfIceberg("db", "table");
verify(cleanerClientFactory).newInstance();
verify(cleanerClient).close();
}

@Test
public void shouldThrowExceptionWhenOutputFormatIsNull() throws Exception {
Map<String, String> properties = new HashMap<>();
properties.put("table_type", null);
properties.put("metadata_location", null);

when(cleanerClient.getTableProperties("db", "table")).thenReturn(properties);

assertThatThrownBy(() -> icebergValidator.throwExceptionIfIceberg("db", "table")).isInstanceOf(
BeekeeperIcebergException.class);
}
}
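
The tests above pin down which table parameters should count as Iceberg: a `table_type` of `ICEBERG`, or the presence of a `metadata_location` entry. `IsIcebergTablePredicate` itself is not shown in this part of the diff, so the following is only a guess at the general shape of such a check. The class name `SketchIsIcebergPredicate`, the case-insensitive comparison and the null handling are all assumptions. In particular, this sketch treats null values as "not Iceberg", which differs from the last test above, where null values end in a `BeekeeperIcebergException`.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; NOT the IsIcebergTablePredicate from the PR.
class SketchIsIcebergPredicate implements Predicate<Map<String, String>> {

  private static final String TABLE_TYPE_KEY = "table_type";
  private static final String METADATA_LOCATION_KEY = "metadata_location";
  private static final String ICEBERG = "ICEBERG";

  @Override
  public boolean test(Map<String, String> tableParameters) {
    if (tableParameters == null) {
      return false;
    }
    String tableType = tableParameters.get(TABLE_TYPE_KEY);
    String metadataLocation = tableParameters.get(METADATA_LOCATION_KEY);

    // Assumption: the table type is compared case-insensitively.
    boolean icebergType = tableType != null && tableType.trim().equalsIgnoreCase(ICEBERG);
    // Assumption: any non-blank metadata location marks the table as Iceberg.
    boolean hasMetadataLocation = metadataLocation != null && !metadataLocation.trim().isEmpty();

    return icebergType || hasMetadataLocation;
  }
}
```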
@@ -0,0 +1,33 @@
/**
* Copyright (C) 2019-2024 Expedia, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.expediagroup.beekeeper.core.error;

public class BeekeeperIcebergException extends BeekeeperException {

private static final long serialVersionUID = 1L;

public BeekeeperIcebergException(String message, Exception e) {
super(message, e);
}

public BeekeeperIcebergException(String message, Throwable e) {
super(message, e);
}

public BeekeeperIcebergException(String message) {
super(message);
}
}
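
The review thread on `IcebergValidator` says the custom exception exists so that it can be handled separately further up. How that handling looks is not part of the lines shown here. The sketch below is one hypothetical shape: a handler catches the table-specific exception, marks the item as skipped and carries on, while any other runtime failure is marked as failed. Every name in it (`UnsupportedTableException`, `SketchStatus`, `SketchHandler`) is invented for illustration and does not come from Beekeeper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a dedicated "unsupported table" exception.
class UnsupportedTableException extends RuntimeException {
  UnsupportedTableException(String message) {
    super(message);
  }
}

// Hypothetical outcome of one cleanup attempt.
enum SketchStatus { DELETED, SKIPPED, FAILED }

class SketchHandler {

  // Stand-in for the cleanup action; it may throw for a given table.
  private final Consumer<String> cleaner;

  SketchHandler(Consumer<String> cleaner) {
    this.cleaner = cleaner;
  }

  Map<String, SketchStatus> cleanUp(List<String> tables) {
    Map<String, SketchStatus> result = new LinkedHashMap<>();
    for (String table : tables) {
      try {
        cleaner.accept(table);
        result.put(table, SketchStatus.DELETED);
      } catch (UnsupportedTableException e) {
        // A dedicated type lets the caller tell "deliberately not handled" apart from a real failure.
        result.put(table, SketchStatus.SKIPPED);
      } catch (RuntimeException e) {
        result.put(table, SketchStatus.FAILED);
      }
    }
    return result;
  }
}
```

A general-purpose type such as `IllegalStateException` could be thrown by unrelated code on the same path, so catching it would make the two cases harder to separate; that is the trade-off the reply in the review thread points at.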