[Test score : 0.9222] #23

taemin6697 · 2023-04-13T14:31:34Z

Model: monologg/koelectra-base-v3-discriminator, 0.91x, lr=0.00002860270719188072, weight_decay=0.5, batch_size=16 (success)
Model: lighthouse/mdeberta-v3-base-kor-further, 0.91x, lr=0.00002340865224868444, weight_decay=0.5, batch_size=8 (success)
Preprocessing
from soynlp.normalizer import repeat_normalize

# Note: Regextokenizer is not defined in this snippet; it is assumed to be
# a soynlp tokenizer instance created elsewhere in the project.
def preprocess_text(self, text):
    # normalize repeated characters using soynlp library
    text = repeat_normalize(text, num_repeats=2)
    # remove stopwords (disabled)
    # text = ' '.join([token for token in text.split() if token not in stopwords])
    # remove special characters and numbers (disabled)
    # text = re.sub('[^가-힣 ]', '', text)
    # text = re.sub('[^a-zA-Zㄱ-ㅎ가-힣]', '', text)
    # tokenize text using soynlp tokenizer
    tokens = Regextokenizer.tokenize(text)
    # lowercase all tokens
    tokens = [token.lower() for token in tokens]
    # join tokens back into sentence
    text = ' '.join(tokens)
    # spacing correction (disabled)
    # kospacing_sent = spacing(text)
    return text
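
To see what the active steps do without installing soynlp, here is a rough standard-library sketch. It is an approximation, not the code used for the reported score: `collapse_repeats` and `preprocess_sketch` are hypothetical names, the regex only caps runs of a single repeated character (soynlp's `repeat_normalize` handles more cases), and whitespace splitting stands in for the soynlp tokenizer.

```python
import re


def collapse_repeats(text: str, num_repeats: int = 2) -> str:
    """Cap any run of one repeated character at num_repeats copies.

    Simplified stand-in for soynlp's repeat_normalize (assumption).
    """
    # (.) captures one character; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % num_repeats)
    return pattern.sub(lambda m: m.group(1) * num_repeats, text)


def preprocess_sketch(text: str) -> str:
    """Collapse repeats, split on whitespace, lowercase, and re-join."""
    text = collapse_repeats(text, num_repeats=2)
    # Whitespace split is a stand-in for the soynlp tokenizer (assumption).
    tokens = [token.lower() for token in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess_sketch("Goooood   JOB!!! ㅋㅋㅋㅋㅋ"))
```

Lowercasing only affects Latin characters, so Korean text passes through this step unchanged.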
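1. Model training: train mdeberta for about 8 epochs and use the epoch-7 checkpoint.
1-2. Apply the same preprocessing to the Dev dataset, reduce lr to 1/10, and train for 2 more epochs.
2. Model training: train koelectra for about 10 epochs and use the epoch-6 checkpoint.
2-1. Apply the same preprocessing to the Dev dataset, reduce lr to 1/10, and train for 2 more epochs.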
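
The two-stage schedule above can be written down as plain configuration. This is a sketch only: `StageConfig` and `dev_stage` are hypothetical names, and keeping weight decay and batch size unchanged in the Dev stage is an assumption, since the issue only states the lr reduction and the 2 epochs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StageConfig:
    model_name: str
    learning_rate: float
    weight_decay: float
    batch_size: int
    epochs: int


def dev_stage(base: StageConfig, lr_factor: float = 0.1, epochs: int = 2) -> StageConfig:
    """Derive the Dev-set fine-tuning stage: lr scaled to 1/10, 2 epochs.

    Weight decay and batch size are carried over unchanged (assumption).
    """
    return replace(base, learning_rate=base.learning_rate * lr_factor, epochs=epochs)


# Stage-one values as reported in this issue.
mdeberta = StageConfig(
    "lighthouse/mdeberta-v3-base-kor-further", 0.00002340865224868444, 0.5, 8, 8
)
koelectra = StageConfig(
    "monologg/koelectra-base-v3-discriminator", 0.00002860270719188072, 0.5, 16, 10
)

if __name__ == "__main__":
    for cfg in (mdeberta, koelectra):
        print(dev_stage(cfg))
```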

Ensembled using the ESNB code.
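
The ESNB code itself is not shown in this issue, so the following is only one plausible way to combine two classifiers: soft voting, where per-class probabilities are averaged and the highest-scoring class wins. `soft_vote` is a hypothetical helper, and the probabilities in the example are made up for illustration.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Average class probabilities across models and return argmax labels.

    model_probs[m][i][c] is model m's probability of class c for example i.
    """
    n_models = len(model_probs)
    if weights is None:
        # Equal weighting by default (assumption; the real weights are unknown).
        weights = [1.0 / n_models] * n_models
    predictions = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        averaged = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


if __name__ == "__main__":
    # Illustrative probabilities only, not outputs of the models in this issue.
    koelectra_probs = [[0.9, 0.1], [0.3, 0.7]]
    mdeberta_probs = [[0.4, 0.6], [0.4, 0.6]]
    print(soft_vote([koelectra_probs, mdeberta_probs]))
```

Passing `weights` lets one model count more than the other, for example if its Dev score is higher.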


taemin6697 added the Augmentation label Apr 13, 2023

taemin6697 self-assigned this Apr 13, 2023

Kim-Ju-won added a commit that referenced this issue Apr 13, 2023

[Test 0.92222] Baseline in preprocessing code, monologg/koelectra-base-v3-discriminator #23

07f771f
