Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some args (upload/download interval and parallelism on upload) #5

Closed
wants to merge 14 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,4 @@ out/

### VS Code ###
.vscode/
mirror/
29 changes: 17 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,22 +31,27 @@ Each argument is prepended with "`--`".

* You need at least Java 11 to run the tool.
* Obviously: read access on the source repository, write access on the target repository.
* The source maven repository has to provide an "index-like" html view of your repository (successfully tested with Nexus2).
* The source maven repository has to provide an "index-like" html view of your repository (successfully tested with Nexus2 and Nexus3).
* The destination repository must accept HTTP PUT requests for storing maven artifacts and accept HTTP Basic authentication (successfully tested with Nexus3).

### Configuration
The command-line arguments are as follows (arguments are prepended with "`--`"):

Argument | Mandatory | Description
----------------|-----------|---------------
source.root-url | Yes | The root URL from which the source repository should be scraped.
source.user | No | The username used on the source repository (default: no authentication).
source.password | No | The password used on the source repository (default: no authentication).
mirror-path | No | The path on the local file system to be used for mirroring repository content (default: `./mirror/`)
target.root-url | Yes | The root URL where the content of the mirror repository should be uploaded.
target.user | No | The username used on the target repository (default: no authentication).
target.password | No | The password used on the target repository (default: no authentication).
actions | No | Comma-separated list of actions to be performed.<br/>Possible values:<ul><li>**`mirror`:** copy content from the source repository to the `mirror-path` on the local filesystem</li><li>**`publish`:** upload content from the `mirror-path` on the local filesystem to the target repository</li></ul>(default: `mirror,publish`)
----------------|----------|---------------
source.root-url | Yes | The root URL from which the source repository should be scraped.
source.user | No | The username used on the source repository (default: no authentication).
source.password | No | The password used on the source repository (default: no authentication).
source.download-interval | No | The interval between downloads, value in milliseconds, used to reduce load on source-server (default: 1000)
mirror-path | No | The path on the local file system to be used for mirroring repository content (default: `./mirror/`)
target.root-url | Yes | The root URL where the content of the mirror repository should be uploaded.
target.user | No | The username used on the target repository (default: no authentication).
target.password | No | The password used on the target repository (default: no authentication).
target.upload-interval | No | The interval between uploads, value in milliseconds, used to reduce load on target-server (default: 1000)
target.concurrent-uploads| No | Number of parallel uploads, parallelism is only supported on same folder, like foo/bar/artifact/1.0.0/ (default: 2)
netmikey marked this conversation as resolved.
Show resolved Hide resolved
target.skip-existing | No | Boolean value (`true` or `false`).<p>If set to `true`, skip upload of artifacts that already exist on the target. Setting this to `true` also adds an additionnal http HEAD request before each upload to check whether the file already exists on the target.<p>Setting this to `false` will skip the http HEAD request and will upload every mirrored file regardles of whether it already exists on target. (default: `false`)
target.abort-on-error | No | Boolean value (`true` or `false`).<p>If set to `true`, abort processing immediately when an upload fails. If set to `false`, continue uploading until all mirrored files have been processed, regardless of errors. (default: `true`)
actions | No | Comma-separated list of actions to be performed.<br/>Possible values:<ul><li>**`check`:** just check source and target connections by trying to read the repositories' indexes. This is a read-only action and doesn't write anything, neither locally nor rempotely.</li><li>**`mirror`:** copy content from the source repository to the `mirror-path` on the local filesystem</li><li>**`publish`:** upload content from the `mirror-path` on the local filesystem to the target repository</li></ul>(default: `mirror,publish`)

### Example

Expand All @@ -63,8 +68,8 @@ This would:
## Noteworthy

### Resumability
* The mirroring can be resumed later: artifacts already present in the mirror-path won't be downloaded again (no consistency checks are done though!)
* The upload will not check for already-uploaded artifacts and will always start from the beginning and upload everything.
* The mirroring can be resumed later: artifacts already present in the mirror-path won't be downloaded again (no consistency checks are done though!).
* If `target.skip-existing` is set to `true`, the publish action will still run through all mirrored files to check whether they're present on target, but won't re-transfer them if they're already present (no consistency checks are done though!).

### Publishing path consistency
You can:
Expand Down
97 changes: 97 additions & 0 deletions src/main/java/io/github/netmikey/mvncloner/mvncloner/Checker.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
package io.github.netmikey.mvncloner.mvncloner;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;

/**
* Checks the connections that will be used by the Scraper and the Publisher.
*/
@Component
public class Checker {

private static final Logger LOG = LoggerFactory.getLogger(Checker.class);

@Value("${source.root-url}")
private String sourceRootUrl;

@Value("${source.user:#{null}}")
private String sourceUsername;

@Value("${source.password:#{null}}")
private String sourcePassword;

@Value("${target.root-url}")
private String targetRootUrl;

@Value("${target.user:#{null}}")
private String targetUsername;

@Value("${target.password:#{null}}")
private String targetPassword;

public void check() {
checkSource();
checkTarget();
}

private void checkSource() {
try {
Utils.withNewWebClient(sourceRootUrl, sourceUsername, sourcePassword, webClient -> {
try {
var page = webClient.getPage(sourceRootUrl);
LOG.info("Source connection check succeeded! ({}, responded with {})", sourceRootUrl,
page.getWebResponse().getStatusCode());
} catch (MalformedURLException e) {
throw new IllegalStateException("Connection check failed: malformed source url " + sourceRootUrl
+ " caused: " + e.getMessage(), e);
} catch (IOException e) {
throw new IllegalStateException("Connection check failed: source at " + sourceRootUrl
+ " caused IOException: " + e.getMessage(), e);
} catch (FailingHttpStatusCodeException e) {
throw new IllegalStateException("Connection check failed: source at " + sourceRootUrl
+ " responded with http status code " + e.getStatusCode() + ": " + e.getMessage(), e);
}
});
} catch (URISyntaxException e) {
throw new IllegalStateException("Connection check failed: malformed source url " + sourceRootUrl
+ " caused: " + e.getMessage(), e);
}
}

private void checkTarget() throws IllegalStateException {
HttpClient httpClient = HttpClient.newBuilder().build();
try {
var baseReq = Utils.setCredentials(HttpRequest.newBuilder(), targetUsername, targetPassword)
.uri(new URI(targetRootUrl));
var response = httpClient.send(baseReq.method("HEAD", BodyPublishers.noBody()).build(), BodyHandlers.discarding());

if (!List.of(200, 404).contains(response.statusCode())) {
throw new IllegalStateException("Connection check failed: target at " + targetRootUrl
+ " responded with http status code " + response.statusCode());
}
} catch (IOException e) {
throw new IllegalStateException("Connection check failed: target at " + targetRootUrl
+ " caused IOException: " + e.getMessage(), e);
} catch (InterruptedException e) {
throw new IllegalStateException("Connection check failed: interrupted while checking target at " + targetRootUrl
+ ": " + e.getMessage(), e);
} catch (URISyntaxException e) {
throw new IllegalStateException("Connection check failed: malformed target url " + targetRootUrl
+ " caused: " + e.getMessage(), e);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,9 @@
public class MvnCloner implements CommandLineRunner {

private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(MvnCloner.class);

@Autowired
private Checker checker;

@Autowired
private Scraper scraper;
Expand All @@ -26,7 +29,9 @@ public class MvnCloner implements CommandLineRunner {

@Override
public void run(String... args) throws Exception {

if (actions.contains("check")) {
checker.check();
}
if (actions.contains("mirror")) {
scraper.mirror();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@
@SpringBootApplication
public class MvnclonerApplication {

public static void main(String[] args) {
SpringApplication.run(MvnclonerApplication.class, args);
}
public static void main(String[] args) {
SpringApplication.run(MvnclonerApplication.class, args);
}

}
98 changes: 69 additions & 29 deletions src/main/java/io/github/netmikey/mvncloner/mvncloner/Publisher.java
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,8 @@
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.*;
import java.util.concurrent.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Expand All @@ -22,7 +21,7 @@

/**
* Publishes from the mirror directory into a remote target maven repository.
*
*
* @author mike
*/
@Component
Expand All @@ -39,68 +38,109 @@ public class Publisher {
@Value("${target.password:#{null}}")
private String password;

@Value("${target.skip-existing:false}")
private Boolean skipExisting;

@Value("${target.abort-on-error:true}")
private Boolean abortOnError;

@Value("${target.upload-interval:1000}")
private Integer uploadInterval;

@Value("${target.concurrent-uploads:2}")
private Integer concurrentUploads;

@Value("${mirror-path:./mirror/}")
private String rootMirrorPath;

private ExecutorService runner;

private PublishingSummary summary;

public void publish() throws Exception {
summary = new PublishingSummary();
runner = Executors.newFixedThreadPool(concurrentUploads);
LOG.info("Publishing to " + rootUrl + " ...");
HttpClient httpClient = HttpClient.newBuilder().build();
publishDirectory(httpClient, rootUrl, Paths.get(rootMirrorPath).normalize());
LOG.info("Publishing complete.");
runner.shutdown();
LOG.info("Publishing complete. Summary of processed artifacts: " + summary);
}

public void publishDirectory(HttpClient httpClient, String repositoryUrl, Path mirrorPath)
throws IOException, InterruptedException {
throws IOException, ExecutionException, InterruptedException {

LOG.debug("Switching to mirror directory: " + mirrorPath.toAbsolutePath());

List<Path> recursePaths = new ArrayList<>();

List<Path> recursePaths = new LinkedList<>();
Collection<Future<?>> futures = new HashSet<>();
try (DirectoryStream<Path> stream = Files.newDirectoryStream(mirrorPath)) {
for (Path path : stream) {
if (Files.isDirectory(path)) {
recursePaths.add(path);
} else {
handleFile(httpClient, repositoryUrl, path);
futures.add(runner.submit(() -> handleFile(httpClient, repositoryUrl, path)));
}
}
}

for (var future : futures) {
future.get();
}

// Tail recursion
for (Path recursePath : recursePaths) {
String subpath = mirrorPath.relativize(recursePath).toString();
publishDirectory(httpClient, appendUrlPathSegment(repositoryUrl, subpath), recursePath);
}
}

private void handleFile(HttpClient httpClient, String repositoryUrl, Path path)
throws IOException, InterruptedException {
private void handleFile(HttpClient httpClient, String repositoryUrl, Path path) {

String filename = path.getFileName().toString();
String targetUrl = repositoryUrl + filename;
LOG.info("Uploading " + targetUrl);

Utils.sleep(1000);
HttpRequest request = Utils.setCredentials(HttpRequest.newBuilder(), username, password)
.uri(URI.create(targetUrl))
.PUT(BodyPublishers.ofInputStream(() -> {
try {
return Files.newInputStream(path, StandardOpenOption.READ);
} catch (IOException e) {
throw new RuntimeException(e);
try {
var baseReq = Utils.setCredentials(HttpRequest.newBuilder(), username, password)
.uri(URI.create(targetUrl));

if (skipExisting) {
var getResponse = httpClient.send(baseReq.method("HEAD", BodyPublishers.noBody()).build(),
BodyHandlers.discarding());
if (getResponse.statusCode() == 200) {
LOG.info("Artifact {} already exists", targetUrl);
LOG.debug(" Response headers: " + getResponse.headers());
summary.incrementSkippedAlreadyPresent();
return;
}
}))
.build();
HttpResponse<String> response = httpClient.send(request, BodyHandlers.ofString());
if (response.statusCode() < 200 || response.statusCode() > 299) {
LOG.error("Error uploading " + targetUrl + " : Response code was " + response.statusCode());
LOG.debug(" Response headers: " + response.headers());
LOG.debug(" Response body: " + response.body());
}

LOG.info("Uploading " + targetUrl);
Utils.sleep(uploadInterval);

var putRequest = baseReq.PUT(BodyPublishers.ofFile(path)).build();

HttpResponse<String> putResponse = httpClient.send(putRequest, BodyHandlers.ofString());
if (putResponse.statusCode() < 200 || putResponse.statusCode() > 299) {
LOG.error("Error uploading " + targetUrl + " : Response code was " + putResponse.statusCode());
LOG.debug(" Response headers: " + putResponse.headers());
LOG.debug(" Response body: " + putResponse.body());
summary.incrementFailed();
}
summary.incrementPublished();
} catch (Error | InterruptedException e) {
summary.incrementFailed();
throw new RuntimeException(e);
} catch (Exception e) {
summary.incrementFailed();
LOG.error("Failed to send " + filename + " to " + targetUrl, e);
if (abortOnError) {
throw new RuntimeException(e);
}
}
}

private String appendUrlPathSegment(String baseUrl, String segment) {
StringBuffer result = new StringBuffer(baseUrl);
StringBuilder result = new StringBuilder(baseUrl);

if (!baseUrl.endsWith("/")) {
result.append('/');
Expand Down
Loading