Model: monologg/koelectra-base-v3-discriminator, 0.91x, lr=0.00002860270719188072, weight_decay=0.5, batch_size=16 (success)
Model: lighthouse/mdeberta-v3-base-kor-further, 0.91x, lr=0.00002340865224868444, weight_decay=0.5, batch_size=8 (success)
Preprocessing technique
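from soynlp.normalizer import repeat_normalize
from soynlp.tokenizer import RegexTokenizer

# Assumption: `Regextokenizer` in the original is an instance of soynlp's RegexTokenizer.
Regextokenizer = RegexTokenizer()

# Method of the author's class (the class itself is not shown in the issue).
def preprocess_text(self, text):
    # normalize repeated characters using soynlp library
    text = repeat_normalize(text, num_repeats=2)
    # remove stopwords (disabled)
    # text = ' '.join([token for token in text.split() if token not in stopwords])
    # remove special characters and numbers (disabled)
    # text = re.sub('[^가-힣 ]', '', text)
    # text = re.sub('[^a-zA-Zㄱ-ㅎ가-힣]', '', text)
    # tokenize text using soynlp tokenizer
    tokens = Regextokenizer.tokenize(text)
    # lowercase all tokens
    tokens = [token.lower() for token in tokens]
    # join tokens back into sentence
    text = ' '.join(tokens)
    # spacing correction (disabled)
    # kospacing_sent = spacing(text)
    return text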
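To make the effect of the active steps concrete, here is a minimal standard-library sketch. It is only an approximation: `approx_repeat_normalize` and `approx_preprocess` are hypothetical names, the regex collapses runs of a single character only (soynlp's `repeat_normalize` handles more cases), and whitespace splitting stands in for the soynlp tokenizer.

```python
import re


def approx_repeat_normalize(text: str, num_repeats: int = 2) -> str:
    """Collapse any run of one repeated character to at most num_repeats copies.

    Hypothetical stand-in for soynlp's repeat_normalize, not its implementation.
    """
    pattern = re.compile(r"(.)\1{%d,}" % num_repeats)
    return pattern.sub(lambda m: m.group(1) * num_repeats, text)


def approx_preprocess(text: str) -> str:
    """Repeat-normalize, split on whitespace (assumed tokenizer), lowercase, rejoin."""
    text = approx_repeat_normalize(text, num_repeats=2)
    tokens = text.split()
    return " ".join(token.lower() for token in tokens)


if __name__ == "__main__":
    print(approx_preprocess("Hello   WORLD!!!!"))
    print(approx_repeat_normalize("ㅋㅋㅋㅋㅋ"))
```

Korean text is unaffected by the lowercasing step; it only matters for Latin-script tokens mixed into the input.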
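1. Model training: mdeberta trained for about 8 epochs; the epoch-7 checkpoint is used
1-1. Apply the same preprocessing to the Dev dataset, then train on it for 2 more epochs with lr reduced to 1/10
2. Model training: koelectra trained for about 10 epochs; the epoch-6 checkpoint is used
2-1. Apply the same preprocessing to the Dev dataset, then train on it for 2 more epochs with lr reduced to 1/10
Ensemble with the ESNB code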
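The second-stage learning rate and the final ensembling step can be sketched as follows. This is not the ESNB code referenced above, which is not shown in the issue. It assumes a classification task in which each model outputs per-class probabilities, and it uses plain soft voting (averaging probabilities). `stage2_lr` and `soft_vote` are hypothetical helper names, and the probabilities in the example are made up.

```python
def stage2_lr(stage1_lr: float, factor: float = 0.1) -> float:
    """Learning rate for the 2 extra epochs on the Dev set (1/10 of the original)."""
    return stage1_lr * factor


def soft_vote(prob_list, weights=None):
    """Average per-class probabilities across models and return argmax labels.

    prob_list: one entry per model, each a list of per-sample probability lists.
    weights:   optional per-model weights; defaults to a uniform average.
    """
    n_models = len(prob_list)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_samples = len(prob_list[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(prob_list[0][i])
        averaged = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_list))
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


if __name__ == "__main__":
    # Learning rates reported in the issue for each model.
    print(stage2_lr(0.00002340865224868444))  # mdeberta, stage 2
    print(stage2_lr(0.00002860270719188072))  # koelectra, stage 2

    # Made-up probabilities for three samples and two classes.
    mdeberta_probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    koelectra_probs = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5]]
    print(soft_vote([mdeberta_probs, koelectra_probs]))
```

Uniform weights are a reasonable default when the two models score about the same, as the 0.91x figures suggest; the `weights` argument allows favoring one model if validation results differ.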