support for trust_remote_code / 8k context #410

BlairSadewitz · 2023-07-19T14:38:52Z

Hello,

There are a number of models I'd like to try which require this. I know that I asked you about this in the past, and IIRC you mentioned that you removed it because you wanted to implement it properly.
In the interim, would you kindly instruct me on what I have to change in order to pass this flag to the appropriate call(s) (you don't have to do it for every conceivable situation/type of model, just for hf or hf_torch or whichever is necessary (16-bit, don't worry about loading in 8 or 4 bit) to load e.g. llama-based models, maybe falcon, etc. I'd just as happily patch transformers itself; whatever gets it to work. I'm mostly trying to load the models with increased context size.

Thanks.

BlairSadewitz · 2023-07-19T14:43:52Z

Being able to use a monkey patch would be cool, too, but I assume that's even more work.

BlairSadewitz · 2023-07-19T15:12:47Z

What I am most interested in is being able to use models which use this:

https://github.com/bhenrym14/qlora-airoboros-longcontext/blob/main/scaledllama/llama_rope_scaled_monkey_patch-16k.py

Most of them are 8k.

https://huggingface.co/TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-fp16/tree/main

henk717 · 2023-07-19T16:07:07Z

This is planned as a seperate addon but currently unfinished.

BlairSadewitz · 2023-07-20T16:00:06Z

Oh, OK, fair enough. Whenever you have a spare moment, would you kindly tell me where in the code the call is which loads a 16-bit llama-based model (you know, that I'd download from HF) is so I could just rig it myself to work? Whenever I have the time, I will figure out how to use python to just tell me the line number. If that happens before you get around to replying to this, I'll close out the PR. It could be either the code in KoboldAI or the code in transformers itself, I don't care which.

henk717 · 2023-07-20T18:49:23Z

The easiest way to do it is with our Basic HF backend since there it will be in the from_pretrained lines, in the main backend its quite complicated. The hold-up is that the Basic HF backend is unfinished and unstable, so your milage may strongly vary.

BlairSadewitz · 2023-07-22T01:24:09Z

Hmm, yeah, I'm having some issues with it. :(

Check this out, though:
RoPE scaling got merged to transformers. Models don't have to be pretrained to use it, though apparently you lose accuracy if they aren't. Maybe you'd want to add support for this at some point? It works for gptneox, too, according to the chatter online.

huggingface/transformers@34d9409#diff-9ba75cc28be7924a2fc43de1d2c8c7779ad597129d33d1af39153951463cd0bc

Also, there's this:

https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

The patch is three lines. That code ameliorates the decrease in perplexity. Here's a colab:

https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=b80b3f37

BlairSadewitz · 2023-07-23T21:55:27Z

I just noticed everything you merged. Thanks! I'd been hopping between forks, and this makes my life a lot easier.

BlairSadewitz · 2023-08-01T04:20:16Z

In case you aren't aware, transformers now has support for rope scaling.

https://huggingface.co/docs/transformers/main/model_doc/llama#transformers.LlamaConfig

henk717 · 2023-08-01T11:02:33Z

We automatically use rope scaling if its present in a models config. Manual control for it is planned.

BlairSadewitz · 2023-08-04T18:19:01Z

Ooh, nice. That makes my life a lot easier.

Incidentally, I stumbled upon this:

https://github.com/jquesnelle/scaled-rope

Basically, it builds a wheel with the necessary code to support all these different scaling methods along with patch functions, e.g.

def patch_llama_for_linear_scaled_rotary_embeddings(model, scale):
from .LlamaLinearScaledRotaryEmbedding import LlamaLinearScaledRotaryEmbedding
for each in model.model.layers:
each.self_attn.rotary_emb = LlamaLinearScaledRotaryEmbedding(
each.self_attn.head_dim, scale=scale, device=each.self_attn.rotary_emb.inv_freq.device)

I found it because I had problems loading some different models because of the layers, which it takes care of.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support for trust_remote_code / 8k context #410

support for trust_remote_code / 8k context #410

BlairSadewitz commented Jul 19, 2023

BlairSadewitz commented Jul 19, 2023

BlairSadewitz commented Jul 19, 2023

henk717 commented Jul 19, 2023

BlairSadewitz commented Jul 20, 2023 •

edited

Loading

henk717 commented Jul 20, 2023

BlairSadewitz commented Jul 22, 2023

BlairSadewitz commented Jul 23, 2023

BlairSadewitz commented Aug 1, 2023

henk717 commented Aug 1, 2023

BlairSadewitz commented Aug 4, 2023

support for trust_remote_code / 8k context #410

support for trust_remote_code / 8k context #410

Comments

BlairSadewitz commented Jul 19, 2023

BlairSadewitz commented Jul 19, 2023

BlairSadewitz commented Jul 19, 2023

henk717 commented Jul 19, 2023

BlairSadewitz commented Jul 20, 2023 • edited Loading

henk717 commented Jul 20, 2023

BlairSadewitz commented Jul 22, 2023

BlairSadewitz commented Jul 23, 2023

BlairSadewitz commented Aug 1, 2023

henk717 commented Aug 1, 2023

BlairSadewitz commented Aug 4, 2023

BlairSadewitz commented Jul 20, 2023 •

edited

Loading