feat: [log retention improvements pt. 3] impl and use per hour file layout #2570

Open
wants to merge 82 commits into base: main
ee510cd
add time measurements in cli, engine, log file parsing
tedim52 Jul 31, 2024
f881126
add more granular measurements
tedim52 Jul 31, 2024
487e5d1
use buffered log channel
tedim52 Aug 1, 2024
f316abe
batch send log lines
tedim52 Aug 1, 2024
0614976
refactor to use log line sender
tedim52 Aug 9, 2024
75fd409
encapsulate buff channel inside log line sender
tedim52 Aug 9, 2024
42f7a30
refactor again and get tests to pass
tedim52 Aug 9, 2024
5fe30ff
flush logs and close channel when empty
tedim52 Aug 9, 2024
929f4b2
clean up
tedim52 Aug 9, 2024
fdd8bf3
undo build script change
tedim52 Aug 9, 2024
0883f39
Merge branch 'main' into tedi/logspeedup
tedim52 Aug 9, 2024
dc9d1d1
name mutex
tedim52 Aug 9, 2024
d2b9f86
lint
tedim52 Aug 9, 2024
813c98b
increase seconds to wait for logs
tedim52 Aug 9, 2024
6349c61
rename send logl ine
tedim52 Aug 10, 2024
9510e22
move log line before function
tedim52 Aug 10, 2024
3b73af1
flush before follow
tedim52 Aug 10, 2024
470c61f
clear buffers after flushing
tedim52 Aug 10, 2024
2c489e2
revert times
tedim52 Aug 10, 2024
2c1f0bf
lint
tedim52 Aug 10, 2024
815eded
turn off cypress tests
tedim52 Aug 10, 2024
ccd49c3
remove k cloud ref
tedim52 Aug 10, 2024
684cbd9
use latest docs checker
tedim52 Aug 10, 2024
fff5ff0
use latest docs checker again
tedim52 Aug 10, 2024
d00bb2b
create file layout interface
tedim52 Aug 13, 2024
8734578
reimplement per week using file layout
tedim52 Aug 13, 2024
3febce2
progress on file layout
tedim52 Aug 13, 2024
7d3dd38
add get log filepath, migrate tests to use get log filepath, add some…
tedim52 Aug 14, 2024
11449c1
Merge branch 'main' into tedi/granularetention
tedim52 Aug 14, 2024
aacffb6
remove screenshots
tedim52 Aug 14, 2024
2b73907
remove per hour for now
tedim52 Aug 15, 2024
8daf34d
get test to pass
tedim52 Aug 15, 2024
0b75005
impl per week get log files beyond retention period
tedim52 Aug 15, 2024
b191925
refactor log file manager to use file layout
tedim52 Aug 15, 2024
c6bd4f7
use file layout for removing logs beyond retention period
tedim52 Aug 15, 2024
bf45ed2
remove getFilepathStr function
tedim52 Aug 15, 2024
f589e52
move log file manager inside logs db client
tedim52 Aug 15, 2024
7890081
lint
tedim52 Aug 15, 2024
62f08c3
lint
tedim52 Aug 15, 2024
ae5019b
Merge branch 'main' into tedi/granularetention
tedim52 Aug 19, 2024
99a2c58
add log retention flag
tedim52 Aug 19, 2024
581c3fd
add validation of parsed duration
tedim52 Aug 19, 2024
a3418c3
add docs
tedim52 Aug 19, 2024
ddd274a
log retention period
tedim52 Aug 19, 2024
a8761a2
grammar
tedim52 Aug 19, 2024
a07b1ac
fill in default
tedim52 Aug 19, 2024
02e0bb3
start per hour file layout
tedim52 Aug 23, 2024
703e13c
rename weekly and hourly filepath methods
tedim52 Oct 14, 2024
a35a8ac
use new mock logs clock constructor for per week
tedim52 Oct 14, 2024
fcf9a22
test mock logs clock
tedim52 Oct 14, 2024
0fe7f9e
fix tests
tedim52 Oct 14, 2024
80dfc94
add todo
tedim52 Oct 14, 2024
878b65e
refactor
tedim52 Oct 14, 2024
ea3206f
add across years test
tedim52 Oct 15, 2024
c0d6509
remove with file layout
tedim52 Oct 16, 2024
ee54acd
add walk to filesystem interface
tedim52 Oct 16, 2024
e4667a4
impl get all filepaths in log file mngr instead
tedim52 Oct 17, 2024
d0b9481
fix get all log file paths and tests
tedim52 Oct 17, 2024
f8806c2
use duration instead of int
tedim52 Oct 17, 2024
518ccd4
Merge branch 'main' into tedi/perhour
tedim52 Oct 17, 2024
b1d5ef2
revert go work to go1.2
tedim52 Oct 17, 2024
646fa15
revert go mods
tedim52 Oct 17, 2024
6234fc2
convert to duration
tedim52 Oct 17, 2024
936b4ad
fix some tests
tedim52 Oct 17, 2024
6fb7218
rename per week stream logs strategy
tedim52 Oct 17, 2024
6edc128
update go sums
tedim52 Oct 17, 2024
84a31f2
lint engine
tedim52 Oct 17, 2024
d0061b0
update vector storage path
tedim52 Oct 17, 2024
a5f7c46
add todo
tedim52 Oct 17, 2024
b4124e2
change to lowercase u
tedim52 Oct 18, 2024
3cb70fa
parametrize base logs storage path
tedim52 Oct 18, 2024
0c4d4ea
unformat day
tedim52 Oct 18, 2024
e724d90
convert sunday from golang time to vector time
tedim52 Oct 18, 2024
1d6ff1e
add check to ensure formats are equivalent
tedim52 Oct 18, 2024
9102416
fix filepath formatting and check
tedim52 Oct 18, 2024
b8ce98e
adjust log stmt
tedim52 Oct 21, 2024
304ade0
pin besu img to pass tests
tedim52 Oct 21, 2024
642e7f3
refactor and test convert weeks to duration
tedim52 Oct 21, 2024
24ecb4a
fix newline in go work
tedim52 Oct 21, 2024
5c32a08
update persistent volume db tests
tedim52 Oct 22, 2024
8f83284
add across week day hour test
tedim52 Oct 22, 2024
85012df
simplify get all log filepaths and add cmt
tedim52 Oct 22, 2024
@@ -21,12 +21,11 @@ const (
fileSinkIdSuffix = "file"
fileTypeId = "\"file\""

// We instruct vector to store log files per-year, per-week (00-53), per-enclave, per-service
// To construct the filepath, we utilize vectors template syntax that allows us to reference fields in log events
// https://vector.dev/docs/reference/configuration/template-syntax/
- baseLogsFilepath = "\"" + logsStorageDirpath + "%%Y/%%V/"
+ baseLogsFilepath = "\"" + logsStorageDirpath + "%%Y/%%V/%%u/%%H/"
Collaborator

Might be worth a comment explaining what this does, equivalent to the one removed


- uuidLogsFilepath = baseLogsFilepath + "{{ enclave_uuid }}/{{ service_uuid }}.json\""
+ VectorLogsFilepathFormat = baseLogsFilepath + "{{ enclave_uuid }}/{{ service_uuid }}.json\""

sourceConfigFileTemplateName = "srcVectorConfigFileTemplate"
sinkConfigFileTemplateName = "sinkVectorConfigFileTemplate"
@@ -38,7 +38,7 @@ func newDefaultVectorConfig(listeningPortNumber uint16) *VectorConfig {
Id: "uuid_" + fileSinkIdSuffix,
Type: fileTypeId,
Inputs: []string{fluentBitSourceId},
- Filepath: uuidLogsFilepath,
+ Filepath: VectorLogsFilepathFormat,
},
},
}
@@ -9,8 +9,8 @@ type LogFileLayout interface {
// GetLogFileLayoutFormat returns a string representation of the "format" that files are laid out in
// Formats are composed of:
// - "/" - representing a nested directory
- // - "<enclaveUuid>" - representing where an enclave uuid is inserted
- // - "<serviceUuid>" - representing where a service uuid is inserted
+ // - "{{ enclaveUuid }}" - representing where an enclave uuid is inserted
+ // - "{{ serviceUuid }}" - representing where a service uuid is inserted
// - time formats specified by strftime https://cplusplus.com/reference/ctime/strftime/
// - any other ascii text
GetLogFileLayoutFormat() string
@@ -21,6 +21,6 @@ type LogFileLayout interface {
// GetLogFilePaths retrieves a list of filepaths [filesystem] for [serviceUuid] in [enclaveUuid]
// If [retentionPeriodIntervals] is set to -1, retrieves all filepaths from the currentTime till [retentionPeriod] in order
// If [retentionPeriodIntervals] is positive, retrieves all filepaths within the range [currentTime - retentionPeriod] and [currentTime - (retentionPeriodIntervals) * retentionPeriod]
- // Returned filepaths sorted from most recent to least recent
+ // Returned filepaths sorted from oldest to most recent
GetLogFilePaths(filesystem volume_filesystem.VolumeFilesystem, retentionPeriod time.Duration, retentionPeriodIntervals int, enclaveUuid, serviceUuid string) ([]string, error)
}
@@ -0,0 +1,139 @@
package file_layout

import (
"fmt"
"github.com/kurtosis-tech/kurtosis/engine/server/engine/centralized_logs/client_implementations/persistent_volume/logs_clock"
"github.com/kurtosis-tech/kurtosis/engine/server/engine/centralized_logs/client_implementations/persistent_volume/volume_consts"
"github.com/kurtosis-tech/kurtosis/engine/server/engine/centralized_logs/client_implementations/persistent_volume/volume_filesystem"
"golang.org/x/exp/slices"
"math"
"os"
"strconv"
"time"
)

const (
// basepath year/week/day/hour/
perHourDirPathFmtStr = "%s%s/%s/%s/%s/"

// ... enclave-uuid/service-uuid<filetype>
perHourFilePathFmtSt = perHourDirPathFmtStr + "%s/%s%s"
)

type PerHourFileLayout struct {
Collaborator

What does this struct actually represent?

time logs_clock.LogsClock
Collaborator

Likewise, what is this guy? I initially thought that a "PerHourFileLayout" had one instance per each hour, or something similar. But now I'm seeing that's not the case, so curious about this prop

baseLogsFilePath string
Collaborator

Is this a filepath, or a dirpath? I'd expect it to be a directory, and all logs go under a single dir.. right?

}

func NewPerHourFileLayout(time logs_clock.LogsClock, baseLogsFilePath string) *PerHourFileLayout {
Contributor Author

This implementation of FileLayout stores log files per hour so that logs can be removed per hour.

Collaborator

Can we put this in a code comment so it doesn't get lost when this PR is closed?

return &PerHourFileLayout{
time: time,
baseLogsFilePath: baseLogsFilePath,
}
}

func (phf *PerHourFileLayout) GetLogFileLayoutFormat() string {
Collaborator

Given that this seems to be very vector-specific, should this be called GetVectorLayoutFormat or something like that?

// Right now this format is specific to the Vector logs aggregator
// It will be used by the Vector logs aggregator to determine the path to output to
return fmt.Sprintf("\"%s%%%%Y/%%%%V/%%%%u/%%%%H/{{ enclave_uuid }}/{{ service_uuid }}.json\"", volume_consts.LogsStorageDirpath)
Collaborator

in this case, rather than wrestling with all the crazy escaping, might be cleaner to read with a simple " + volume_consts.LogsStorageDirpath + ...the rest...
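
A minimal sketch of the suggestion above, assuming a hypothetical `logsStorageDirpath` constant in place of `volume_consts.LogsStorageDirpath` (its real value is not shown in this diff). It checks that plain concatenation yields the same string as the `fmt.Sprintf` call, where every `%%%%` collapses to `%%`:

```go
package main

import "fmt"

// logsStorageDirpath is a hypothetical stand-in for volume_consts.LogsStorageDirpath.
const logsStorageDirpath = "/var/log/kurtosis/"

// layoutFormatWithSprintf mirrors the escaping used in the diff:
// each "%%%%" in the format string becomes "%%" in the output.
func layoutFormatWithSprintf() string {
	return fmt.Sprintf("\"%s%%%%Y/%%%%V/%%%%u/%%%%H/{{ enclave_uuid }}/{{ service_uuid }}.json\"", logsStorageDirpath)
}

// layoutFormatWithConcat builds the same string by plain concatenation,
// so the "%%" sequences are written exactly as they will appear.
func layoutFormatWithConcat() string {
	return "\"" + logsStorageDirpath + "%%Y/%%V/%%u/%%H/{{ enclave_uuid }}/{{ service_uuid }}.json\""
}

func main() {
	fmt.Println(layoutFormatWithSprintf() == layoutFormatWithConcat()) // true
}
```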

}

func (phf *PerHourFileLayout) GetLogFilePath(time time.Time, enclaveUuid, serviceUuid string) string {
year, week, day, hour := TimeToWeekDayHour(time)
Collaborator

Tiny little sixth sense wondering why this is a public function, and if it should be private first (but will wait to see what happens in the rest of the PR)

return phf.getHourlyLogFilePath(year, week, day, hour, enclaveUuid, serviceUuid)
}

func (phf *PerHourFileLayout) GetLogFilePaths(
Collaborator

Curious about the purpose of this function. At first blush, I'd expect that PerHourFileLayout is an abstraction that hides all the internals away.

Also curious why I need to pass in a filesystem, retentionPeriod, etc. for this call, but not for GetLogFilePath?

filesystem volume_filesystem.VolumeFilesystem,
retentionPeriod time.Duration,
Collaborator

My gut is saying that it's odd I need to tell the PerHourFileLayout what retention it has in a getter method. Wouldn't that stuff be declared in the constructor of the PHFL?

retentionPeriodIntervals int,
enclaveUuid, serviceUuid string,
) ([]string, error) {
var paths []string
retentionPeriodInHours := DurationToHours(retentionPeriod)

if retentionPeriodIntervals < 0 {
Contributor Author

maybe could find a way to merge this getLogFilePathsFromNowTillRetentionPeriod and getLogFilePathsBeyondRetentionPeriod?

Collaborator

That seems sensible to me!

return phf.getLogFilePathsFromNowTillRetentionPeriod(filesystem, retentionPeriodInHours, enclaveUuid, serviceUuid)
Collaborator

Is there a missing errcheck & stacktrace.Propagate here? seems like we wouldn't be capturing the params that we pass to getLogFilePathsFromNowTillRetentionPeriod
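
A rough sketch of the error check being asked for. `findPaths` is a hypothetical stand-in for `getLogFilePathsFromNowTillRetentionPeriod`, and `fmt.Errorf` with `%w` is used here in place of `stacktrace.Propagate` only so the example runs with the standard library alone:

```go
package main

import (
	"errors"
	"fmt"
)

// findPaths is a hypothetical stand-in for getLogFilePathsFromNowTillRetentionPeriod.
func findPaths(retentionPeriodInHours int, enclaveUuid, serviceUuid string) ([]string, error) {
	if retentionPeriodInHours <= 0 {
		return nil, errors.New("retention period must be positive")
	}
	return []string{"/logs/" + enclaveUuid + "/" + serviceUuid + ".json"}, nil
}

// getLogFilePaths checks the error and wraps it with the parameters of the call,
// so the failure message records which service, enclave and retention were involved.
func getLogFilePaths(retentionPeriodInHours int, enclaveUuid, serviceUuid string) ([]string, error) {
	paths, err := findPaths(retentionPeriodInHours, enclaveUuid, serviceUuid)
	if err != nil {
		return nil, fmt.Errorf(
			"getting log file paths for service '%s' in enclave '%s' with a retention of %d hours: %w",
			serviceUuid, enclaveUuid, retentionPeriodInHours, err,
		)
	}
	return paths, nil
}

func main() {
	_, err := getLogFilePaths(0, "enclave-1", "service-1")
	fmt.Println(err)
}
```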

} else {
paths = phf.getLogFilePathsBeyondRetentionPeriod(filesystem, retentionPeriodInHours, retentionPeriodIntervals, enclaveUuid, serviceUuid)
}

return paths, nil
}

func (phf *PerHourFileLayout) getLogFilePathsFromNowTillRetentionPeriod(fs volume_filesystem.VolumeFilesystem, retentionPeriodInHours int, enclaveUuid, serviceUuid string) ([]string, error) {
var paths []string
currentTime := phf.time.Now()

// scan for first existing log file
firstHourWithLogs := 0
for i := 0; i < retentionPeriodInHours; i++ {
year, week, day, hour := TimeToWeekDayHour(currentTime.Add(time.Duration(-i) * time.Hour))
filePathStr := phf.getHourlyLogFilePath(year, week, day, hour, enclaveUuid, serviceUuid)
if _, err := fs.Stat(filePathStr); err == nil {
paths = append(paths, filePathStr)
firstHourWithLogs = i
break
} else {
// return if error is not due to nonexistent file path
if !os.IsNotExist(err) {
return paths, err
}
}
}

Collaborator

Bug question: if we don't find any log files in the loop above, won't it mean that firstHourWithLogs will stay at 0 and then in the loop below will add a bunch of nonexistent paths to the array?

Contributor Author

mmm yes, will add a test for this and fix : )

// scan for remaining files as far back as they exist before the retention period
for i := firstHourWithLogs + 1; i < retentionPeriodInHours; i++ {
year, week, day, hour := TimeToWeekDayHour(currentTime.Add(time.Duration(-i) * time.Hour))
filePathStr := phf.getHourlyLogFilePath(year, week, day, hour, enclaveUuid, serviceUuid)
if _, err := fs.Stat(filePathStr); err != nil {
break
Collaborator

Won't this mean that a single missing file blocks the return of all log files after that (even if they're successful)? Maybe an error would be better here

}
paths = append(paths, filePathStr)
}

// reverse for oldest to most recent
slices.Reverse(paths)

return paths, nil
}

func (phf *PerHourFileLayout) getLogFilePathsBeyondRetentionPeriod(fs volume_filesystem.VolumeFilesystem, retentionPeriodInHours int, retentionPeriodIntervals int, enclaveUuid, serviceUuid string) []string {
Collaborator

Reading this signature, I'm confused about the relationship between retentionPeriodInHours and retentionPeriodIntervals

var paths []string
currentTime := phf.time.Now()

// scan for log files just beyond the retention period
for i := 0; i < retentionPeriodIntervals; i++ {
Collaborator

I'm sort of confused about this loop - why are we adding retentionPeriodIntervals (as i) to retentionPeriodInHours?
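
To make the loop concrete, here is a small sketch that lists the hour offsets it would probe. As written in the diff, each interval appears to be treated as one extra hour past the retention period; `hoursBeyondRetention` is a hypothetical helper, not part of the PR.

```go
package main

import "fmt"

// hoursBeyondRetention lists the offsets (in hours back from now) that the
// loop in getLogFilePathsBeyondRetentionPeriod checks: the first hour past the
// retention period, then one further hour per additional interval.
func hoursBeyondRetention(retentionPeriodInHours, retentionPeriodIntervals int) []int {
	var offsets []int
	for i := 0; i < retentionPeriodIntervals; i++ {
		offsets = append(offsets, retentionPeriodInHours+i)
	}
	return offsets
}

func main() {
	// A 24-hour retention with 3 intervals probes the files 24, 25 and 26 hours back.
	fmt.Println(hoursBeyondRetention(24, 3)) // [24 25 26]
}
```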

numHoursToGoBack := retentionPeriodInHours + i
year, week, day, hour := TimeToWeekDayHour(currentTime.Add(time.Duration(-numHoursToGoBack) * time.Hour))
filePathStr := phf.getHourlyLogFilePath(year, week, day, hour, enclaveUuid, serviceUuid)
if _, err := fs.Stat(filePathStr); err != nil {
continue
}
paths = append(paths, filePathStr)
}

return paths
}

func (phf *PerHourFileLayout) getHourlyLogFilePath(year, week, day, hour int, enclaveUuid, serviceUuid string) string {
// match the format in which Vector outputs week, hours, days
formattedWeekNum := fmt.Sprintf("%02d", week)
formattedHourNum := fmt.Sprintf("%02d", hour)
return fmt.Sprintf(perHourFilePathFmtSt, phf.baseLogsFilePath, strconv.Itoa(year), formattedWeekNum, strconv.Itoa(day), formattedHourNum, enclaveUuid, serviceUuid, volume_consts.Filetype)
Collaborator

Small suggestion: might make it easier to read if you newlined each of these

}

func TimeToWeekDayHour(time time.Time) (int, int, int, int) {
year, week := time.ISOWeek()
hour := time.Hour()
day := int(time.Weekday())
// convert Sunday from Go's time.Weekday value (0) to the strftime %u value used by the Vector log aggregator (7)
if day == 0 {
day = 7
}
return year, week, day, hour
}

func DurationToHours(duration time.Duration) int {
return int(math.Ceil(duration.Hours()))
}
Collaborator

Okay finishing this class, I think I'm still a bit confused what the purpose of this struct is. I think the confusion is coming from the idea of retention being sort-of baked into this struct, but not fully handled by it.

I think my gut would expect either:

  1. this class fully handles retention (maybe some background goroutine that cleans up stuff or something) and it's transparent to the caller of the struct or
  2. this struct doesn't know anything about retention - literally just the per-hour file layouting - and whoever calls this struct is responsible for the retention-tracking
