[Test score : 0.9222] #23

Open
taemin6697 opened this issue Apr 13, 2023 · 0 comments
Labels
Augmentation Data Augmentation

taemin6697 commented Apr 13, 2023

  1. Model: monologg/koelectra-base-v3-discriminator, 0.91x, lr=0.00002860270719188072, weight_decay=0.5, batch_size=16 (success)
  2. Model: lighthouse/mdeberta-v3-base-kor-further, 0.91x, lr=0.00002340865224868444, weight_decay=0.5, batch_size=8 (success)
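    Preprocessing technique:
        from soynlp.normalizer import repeat_normalize
        from soynlp.tokenizer import RegexTokenizer

        # Assumption: Regextokenizer is a soynlp RegexTokenizer instance;
        # the original snippet does not show where it is created.
        Regextokenizer = RegexTokenizer()

        def preprocess_text(self, text):
            # normalize repeated characters using the soynlp library
            text = repeat_normalize(text, num_repeats=2)
            # remove stopwords (disabled)
            # text = ' '.join([token for token in text.split() if token not in stopwords])
            # remove special characters and numbers (disabled)
            # text = re.sub('[^가-힣 ]', '', text)
            # text = re.sub('[^a-zA-Zㄱ-ㅎ가-힣]', '', text)
            # tokenize text using the soynlp tokenizer
            tokens = Regextokenizer.tokenize(text)
            # lowercase all tokens
            tokens = [token.lower() for token in tokens]
            # join tokens back into a sentence
            text = ' '.join(tokens)
            # spacing correction (disabled)
            # kospacing_sent = spacing(text)
            return text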
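  3. Model training: mdeberta trained for about 8 epochs; the epoch-7 checkpoint was used.
    1-2. Applied the same preprocessing to the Dev dataset, reduced lr to 1/10, and trained for 2 epochs.
    2. Model training: koelectra trained for about 10 epochs; the epoch-6 checkpoint was used.
    2-1. Applied the same preprocessing to the Dev dataset, reduced lr to 1/10, and trained for 2 epochs.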
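
The two-stage schedule above (train on the training set, then continue on the Dev set at one tenth of the learning rate for 2 epochs) can be sketched as a small config helper. This is only an illustration: `STAGE1`, `dev_finetune_config`, and `STAGE2` are hypothetical names, and keeping `weight_decay` and `batch_size` unchanged in the second stage is an assumption, since the issue only mentions the learning rate and the epoch count.

```python
# Stage-1 hyperparameters as reported in this issue.
STAGE1 = {
    "monologg/koelectra-base-v3-discriminator": {
        "lr": 0.00002860270719188072,
        "weight_decay": 0.5,
        "batch_size": 16,
    },
    "lighthouse/mdeberta-v3-base-kor-further": {
        "lr": 0.00002340865224868444,
        "weight_decay": 0.5,
        "batch_size": 8,
    },
}


def dev_finetune_config(stage1, lr_scale=0.1, epochs=2):
    """Derive the Dev-set fine-tuning config from a stage-1 config.

    Assumption: everything except the learning rate and the epoch count
    is carried over unchanged from stage 1.
    """
    cfg = dict(stage1)                      # copy so stage 1 stays intact
    cfg["lr"] = stage1["lr"] * lr_scale     # lr reduced to 1/10
    cfg["epochs"] = epochs                  # 2 epochs on the Dev set
    return cfg


STAGE2 = {name: dev_finetune_config(cfg) for name, cfg in STAGE1.items()}

for name, cfg in STAGE2.items():
    print(name, cfg)
```

Deriving the second-stage values from the first keeps the 1/10 ratio explicit instead of hard-coding a second set of learning rates.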

Ensembled with the ESNB code.
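
The ESNB code itself is not shown in this issue, so the following is only a guess at the general idea: one common way to ensemble two fine-tuned models is to average their per-example predictions, optionally with weights. `ensemble_mean` and the sample numbers are hypothetical and are not taken from the actual ensemble script.

```python
def ensemble_mean(predictions, weights=None):
    """Return the (weighted) mean of per-example predictions.

    predictions: list of equal-length lists, one list per model.
    weights: optional list with one non-negative weight per model.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of examples")
    if weights is None:
        weights = [1.0] * len(predictions)  # plain average by default
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]


# Made-up predictions from two models on two examples.
koelectra_preds = [1.0, 2.0]
mdeberta_preds = [3.0, 4.0]
print(ensemble_mean([koelectra_preds, mdeberta_preds]))
```

A weighted variant, for example `ensemble_mean([a, b], weights=[3, 1])`, lets the stronger model count for more; whether the actual run used equal weights is not stated here.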

@taemin6697 taemin6697 added the Augmentation Data Augmentation label Apr 13, 2023
@taemin6697 taemin6697 self-assigned this Apr 13, 2023
Kim-Ju-won added a commit that referenced this issue Apr 13, 2023
…e-v3-discriminator #23

[Test 0.92222 ]Baseline in preprocessing code, monologg/koelectra-base-v3-discriminator #23