Handle summary failures and inputs with too few tokens #103

Merged
merged 8 commits into from
Sep 23, 2024
Changes from 3 commits
1 change: 1 addition & 0 deletions package.json
@@ -51,6 +51,7 @@
"cross-env": "^7.0.3",
"express": "^4.19.2",
"iconv-lite": "^0.6.3",
"js-tiktoken": "^1.0.14",
"lodash": "^4.17.21",
"mongoose": "^8.4.1",
"openai": "^4.52.2",
12 changes: 11 additions & 1 deletion pnpm-lock.yaml

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion src/app.module.ts
@@ -13,12 +13,12 @@ import { DiscordModule } from './infrastructure/discord/discord.module';
import { AuthModule } from './modules/auth/auth.module';
import { ClassificationModule } from './modules/classification/classification.module';
import { FoldersModule } from './modules/folders/folders.module';
import { LaunchingEventsModule } from './modules/launching-events/launching-events.module';
import { LinksModule } from './modules/links/links.module';
import { MetricsModule } from './modules/metrics/metrics.module';
import { OnboardModule } from './modules/onboard/onboard.module';
import { PostsModule } from './modules/posts/posts.module';
import { UsersModule } from './modules/users/users.module';
import { LaunchingEventsModule } from './modules/launching-events/launching-events.module';

@Module({
imports: [
2 changes: 1 addition & 1 deletion src/bootstrap.ts
@@ -1,9 +1,9 @@
import { NestFactory } from '@nestjs/core';
import { ExpressAdapter } from '@nestjs/platform-express';
import express from 'express';
import { nestAppConfig, nestResponseConfig } from './app.config';
import { AppModule } from './app.module';
import { nestSwaggerConfig } from './app.swagger';
import { nestAppConfig, nestResponseConfig } from './app.config';

export async function bootstrap() {
const expressInstance: express.Express = express();
1 change: 1 addition & 0 deletions src/common/constant/index.ts
@@ -1 +1,2 @@
export const IS_LOCAL = process.env.NODE_ENV === 'local';
export const DEFAULT_FOLDER_NAME = '나중에 읽을 링크';
Contributor

Nice~

Collaborator Author

Good~~

6 changes: 6 additions & 0 deletions src/common/utils/parser.util.ts
@@ -11,6 +11,7 @@ export async function parseLinkTitleAndContent(url: string): Promise<{
title: string;
content: string;
thumbnail: string;
thumbnailDescription: string;
}> {
const response = await fetch(url);
const arrayBuffer = await response.arrayBuffer();
@@ -49,6 +50,10 @@ export async function parseLinkTitleAndContent(url: string): Promise<{
const title = $('title').text();
// Page Thumbnail Parsing
const thumbnail = $('meta[property="og:image"]').attr('content');
// Page Thumbnail Description Parsing
const thumbnailDescription = $('meta[property="og:description"]').attr(
'content',
);
// Remove script tags and CSS style tags inside the HTML body
$('body script, body style').remove();
// Text content of the HTML body element
@@ -61,6 +66,7 @@ export async function parseLinkTitleAndContent(url: string): Promise<{
title: title ?? '',
content,
thumbnail: sanitizeThumbnail(thumbnail),
thumbnailDescription,
};
}

41 changes: 41 additions & 0 deletions src/common/utils/tokenizer.ts
@@ -0,0 +1,41 @@
// Do not import 'tiktoken'

import { gptVersion } from '@src/infrastructure/ai/ai.constant';
import { summarizeURLContentFunctionFactory } from '@src/infrastructure/ai/functions';
import { encodingForModel } from 'js-tiktoken';

const encoder = encodingForModel(gptVersion);

// Reference: https://platform.openai.com/docs/advanced-usage/managing-tokens
// Note: the count may differ slightly from OpenAI's actual token count
export function promptTokenCalculator(content: string, folderList: string[]) {
let tokenCount = 0;

// Prompt Calculation
const messages = [
{
role: 'system',
content: '한글로 답변 부탁해',
},
{
role: 'user',
content: `주어진 글에 대해 요약하고 키워드 추출, 분류 부탁해

${content}
`,
},
];
for (const message of messages) {
// Message struct Overhead
tokenCount += 4;
tokenCount += encoder.encode(message.role).length;
tokenCount += encoder.encode(message.content).length;
}
tokenCount += 2;
Comment on lines +29 to +33
Member

@J-Hoplin
Why are 4 and 2 added in this block?

Also, is there a reason for counting the length of the role that goes into the prompt?
Judging only by the length of the content parameter (the parsed URL content) would be more intuitive, and the util function wouldn't be needed.

If it is needed, using reduce to return tokenCount would also be nice.

Contributor

I'm curious too~

Collaborator Author

Already explained separately: when prompting, OpenAI sends the functions and roles as stringified JSON, so the calculation mirrors that to count tokens the same way.


// Function Calculation
tokenCount += encoder.encode(
JSON.stringify(summarizeURLContentFunctionFactory(folderList)),
).length;
return tokenCount;
}
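
The review thread above asks why 4 and 2 are added and suggests using reduce. A minimal sketch of a reduce-based variant follows. It assumes the 4 is a per-message framing overhead and the 2 accounts for priming the assistant reply, as in OpenAI's older token-counting guidance; the thread itself does not confirm this. The names countPromptTokens, ChatMessage, Encode, and wordEncode are hypothetical, and the tokenizer is injected so the sketch runs without js-tiktoken.

```typescript
// Hypothetical types for illustration; they are not taken from the PR.
type ChatMessage = { role: string; content: string };
type Encode = (text: string) => number[];

// Constants mirror the values in the diff. Their meaning is an assumption:
// 4 tokens of framing per message, 2 tokens for priming the reply.
const PER_MESSAGE_OVERHEAD = 4;
const REPLY_PRIMING_OVERHEAD = 2;

function countPromptTokens(
  messages: ChatMessage[],
  functionSchema: unknown,
  encode: Encode,
): number {
  // Sum the overhead plus role and content tokens for every message.
  const messageTokens = messages.reduce(
    (sum, message) =>
      sum +
      PER_MESSAGE_OVERHEAD +
      encode(message.role).length +
      encode(message.content).length,
    0,
  );
  // The function definition is counted as its stringified JSON, as in the diff.
  const functionTokens = encode(JSON.stringify(functionSchema)).length;
  return messageTokens + REPLY_PRIMING_OVERHEAD + functionTokens;
}

// Stand-in tokenizer for the sketch: one token per whitespace-separated word.
// A real caller would pass the js-tiktoken encoder instead.
const wordEncode: Encode = (text) =>
  text
    .split(/\s+/)
    .filter(Boolean)
    .map((_, index) => index);
```

In the service, encode would presumably wrap the existing encoder, for example (text) => encoder.encode(text), so the result stays comparable to promptTokenCalculator.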
24 changes: 23 additions & 1 deletion src/infrastructure/ai/ai.service.ts
@@ -1,5 +1,6 @@
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { promptTokenCalculator } from '@src/common/utils/tokenizer';
import OpenAI, { OpenAIError, RateLimitError } from 'openai';
import { DiscordAIWebhookProvider } from '../discord/discord-ai-webhook.provider';
import { gptVersion } from './ai.constant';
@@ -14,6 +15,7 @@ import { SummarizeURLContent } from './types/types';
@Injectable()
export class AiService {
private openai: OpenAI;
private leastTokenThreshold = 300;
Contributor

I'm curious why this is declared as a private member of the class when it is only used in one method.

Collaborator Author

J-Hoplin, Sep 23, 2024

Good point, haha. It doesn't need to be held as an instance property, so I changed it!


constructor(
private readonly config: ConfigService,
@@ -26,13 +28,23 @@ export class AiService {

async summarizeLinkContent(
content: string,
baseThumbnailContent: string,
userFolderList: string[],
url: string,
temperature = 0.5,
): Promise<SummarizeURLContentDto> {
try {
// User folders plus the folder list the server appends
const folderLists = [...userFolderList];
// Calculate post content
const tokenCount = promptTokenCalculator(content, folderLists);
if (tokenCount <= this.leastTokenThreshold) {
return new SummarizeURLContentDto({
success: false,
message: 'Too low input token count',
thumbnailContent: baseThumbnailContent,
});
}
// Invoke the AI summary
const promptResult = await this.invokeAISummary(
content,
@@ -63,7 +75,18 @@ export class AiService {
: err instanceof OpenAIError
? err.message
: '요약에 실패하였습니다.',
thumbnailContent: baseThumbnailContent,
});

// return new SummarizeURLContentDto({
// success: false,
// message:
// err instanceof RateLimitError
// ? '크레딧을 모두 소진하였습니다.'
// : err instanceof OpenAIError
// ? err.message
// : '요약에 실패하였습니다.',
// });
}
}

@@ -104,7 +127,6 @@ ${content}
const functionResult: AiClassificationFunctionResult = JSON.parse(
promptResult.choices[0].message.tool_calls[0].function.arguments,
);

return functionResult;
}
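
The follow-up change promised in the leastTokenThreshold review thread for this file is not visible in this three-commit view. One possible shape, sketched under that assumption, is a module-level constant with a small predicate. LEAST_TOKEN_THRESHOLD and isBelowTokenThreshold are hypothetical names; only the value 300 and the <= comparison come from the diff.

```typescript
// Module-level constant instead of an instance property.
// The value 300 is taken from the diff; the name is hypothetical.
const LEAST_TOKEN_THRESHOLD = 300;

// Returns true when the prompt is too small to be worth sending for a summary.
// Uses <= to match the comparison in summarizeLinkContent.
function isBelowTokenThreshold(
  tokenCount: number,
  threshold: number = LEAST_TOKEN_THRESHOLD,
): boolean {
  return tokenCount <= threshold;
}
```

Keeping the threshold as a parameter with a default makes the cutoff easy to exercise in tests without touching the service class.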

3 changes: 3 additions & 0 deletions src/infrastructure/ai/dto/summarizeURL.response.ts
@@ -9,6 +9,7 @@ type SummarizeSuccessType = {
type SummarizeFailType = {
success: false;
message: string;
thumbnailContent: string;
};

type SummarizeResultType = SummarizeSuccessType | SummarizeFailType;
@@ -18,6 +19,7 @@ export class SummarizeURLContentDto {
isUserCategory?: boolean;
response?: SummarizeURLContent;
message?: string;
thumbnailContent?: string;

constructor(data: SummarizeResultType) {
// The discriminated union is not narrowed unless the check against true is explicit
@@ -27,6 +29,7 @@ export class SummarizeURLContentDto {
this.response = data.response;
} else {
this.message = data.message;
this.thumbnailContent = data.thumbnailContent;
}
}
}
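
With thumbnailContent on the failure variant, a caller can fall back to the page's og:description when no summary was produced. The sketch below illustrates that narrowing with simplified stand-in types. SummarizeSuccess, SummarizeFail, and pickPostDescription are hypothetical names, and the success shape is reduced to the single summary field used here; the real DTO class exposes optional fields instead.

```typescript
// Simplified stand-ins for the union in the DTO file (assumed shapes).
type SummarizeSuccess = { success: true; response: { summary: string } };
type SummarizeFail = {
  success: false;
  message: string;
  thumbnailContent: string;
};
type SummarizeResult = SummarizeSuccess | SummarizeFail;

// Picks the text to store on the post: the AI summary on success,
// otherwise the description parsed from the page metadata.
function pickPostDescription(result: SummarizeResult): string {
  // Checking the discriminant explicitly narrows the union in both branches.
  if (result.success === true) {
    return result.response.summary;
  }
  return result.thumbnailContent;
}
```

This mirrors the ternary added in ai-classification.service.ts, where the stored text is the summary on success and thumbnailContent otherwise.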
1 change: 1 addition & 0 deletions src/infrastructure/aws-lambda/type.ts
@@ -1,5 +1,6 @@
export type AiClassificationPayload = {
postContent: string;
postThumbnailContent: string;
folderList: { id: string; name: string }[];
userId: string;
postId: string;
9 changes: 6 additions & 3 deletions src/modules/ai-classification/ai-classification.service.ts
@@ -34,6 +34,7 @@ export class AiClassificationService {
const start = process.hrtime();
const summarizeUrlContent = await this.aiService.summarizeLinkContent(
payload.postContent,
payload.postThumbnailContent,
Object.keys(folderMapper),
payload.url,
);
@@ -100,15 +101,17 @@ export class AiClassificationService {
await this.metricsRepository.createMetrics(
summarizeUrlContent.success,
timeSecond,
- post.url,
- post._id.toString(),
+ payload.url,
+ payload.postId,
);

await this.postRepository.updatePostClassificationForAIClassification(
postAiStatus,
postId,
classificationId,
- summarizeUrlContent.response.summary,
+ summarizeUrlContent.success === true
+ ? summarizeUrlContent.response.summary
+ : summarizeUrlContent.thumbnailContent,
);
return summarizeUrlContent;
} catch (error: unknown) {
7 changes: 3 additions & 4 deletions src/modules/posts/posts.service.ts
@@ -79,10 +79,8 @@ export class PostsService {
});

// NOTE: Fetch the information parsed from the URL
- const { title, content, thumbnail } = await parseLinkTitleAndContent(
- createPostDto.url,
- );
-
+ const { title, content, thumbnail, thumbnailDescription } =
+ await parseLinkTitleAndContent(createPostDto.url);
const userFolderList = await this.folderRepository.findByUserId(userId);
const folderList = userFolderList.map((folder) => {
return {
@@ -101,6 +99,7 @@ export class PostsService {
const payload = {
url: createPostDto.url,
postContent: content,
postThumbnailContent: thumbnailDescription,
folderList: folderList,
postId: post._id.toString(),
userId,
3 changes: 2 additions & 1 deletion src/modules/users/users.service.ts
@@ -1,4 +1,5 @@
import { Injectable } from '@nestjs/common';
import { DEFAULT_FOLDER_NAME } from '@src/common/constant';
import { FolderType } from '@src/infrastructure/database/types/folder-type.enum';
import { JwtPayload } from 'src/common/types/type';
import { AuthService } from '../auth/auth.service';
@@ -23,7 +24,7 @@ export class UsersService {
user = await this.userRepository.findOrCreate(dto.deviceToken);
await this.folderRepository.create(
user._id.toString(),
- '나중에 읽을 링크',
+ DEFAULT_FOLDER_NAME,
FolderType.DEFAULT,
);
}
1 change: 1 addition & 0 deletions tsconfig.json
@@ -6,6 +6,7 @@
"emitDecoratorMetadata": true,
"experimentalDecorators": true,
"allowSyntheticDefaultImports": true,
"resolveJsonModule": true,
Contributor

Was there a need to read JSON?

Collaborator Author

Ah, I originally planned to use tiktoken rather than js-tiktoken, and that required importing a JSON file directly! Since we're on a different package now, I think this can be removed.

"target": "ES2021",
"sourceMap": true,
"outDir": "./dist",