justHungryMan/README.md

Hi there 👋

I'm interested in Large-scale Engineering, Data Engineering, Representation Learning, Multi-modal Understanding, Training Optimization, and Data Curation.

Experience

🛠️ LLM Data Engineer (now) - 42dot

🔍 Research Intern - Kakaobrain

🌿 Research Intern @kakaobrain

🇺🇸 UI Developer Intern - Wavity

Education

🇰🇷 Bachelor's degree in Computer Science Engineering at Sogang University (2012 - 2019)

🇰🇷 Master's degree in Computer Science Engineering at Sogang University (2020 - 2022)

Competitions

🥈 2020 Korea Health Datathon, 2nd Prize (Binary Classification on Breast Cancer Pathology Images)

🥇 2020 Naver AI Rush Challenge, 1st Prize in 3 Areas (Auto Tagging on Naver Shopping Images, Mood Classification on Music, Genre Classification on Japanese Music)

Projects and Publications

📚 coyo-700M Dataset: A large-scale dataset aimed at enhancing data curation and multi-modal understanding, publicly released for the research community. Check it out here: coyo-700M.

✍️ ViT Alignment Blog Post on Hugging Face: Based on the coyo-700M dataset, this blog post discusses the reproduction of Vision Transformer (ViT) models. Read the blog post: vit-align.

Pinned

  1. huggingface/datatrove (Public)

    Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

    Python · 2.1k stars · 158 forks

  2. kakaobrain/coyo-dataset (Public)

    COYO-700M: Large-scale Image-Text Pair Dataset

    Python · 1.2k stars · 38 forks

  3. vision-transformer-tf (Public)

    Reproduction of Vision Transformer in TensorFlow 2. Supports training from scratch and fine-tuning.

    Python · 47 stars · 8 forks

  4. kakaobrain/coyo-vit (Public)

    ViT trained on COYO-Labeled-300M dataset

    Python · 30 stars · 1 fork