Added DynamicEmbedding RFC #446
base: master

Conversation
DynamicEmbedding(
    input_dim=5,
    output_dim=2,
    input_length=5,
Can we support inputs with dynamic shapes?
The input to the layer can be dynamic, but if you are asking about `input_dim`, which is the same as the vocabulary size, that is not dynamic.
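To make the distinction concrete, here is a minimal sketch using the stock Keras `Embedding` layer as a stand-in (since `DynamicEmbedding` is still a proposal): `input_dim` is fixed at construction time, while the input tensor shape is free to vary.

```python
import tensorflow as tf

# input_dim (the vocabulary size) is fixed when the layer is created...
embedding = tf.keras.layers.Embedding(input_dim=5, output_dim=2)

# ...but the inputs fed to the layer can have any dynamic shape,
# as long as every id falls in [0, input_dim).
short_batch = tf.constant([[0, 1, 2]])            # shape (1, 3)
long_batch = tf.constant([[4, 3, 2, 1, 0],
                          [0, 0, 1, 1, 2]])       # shape (2, 5)
print(embedding(short_batch).shape)               # (1, 3, 2)
print(embedding(long_batch).shape)                # (2, 5, 2)
```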
Excellent, thank you for your answer! I would like to know what `input_dim` means. From my understanding, `input_dim` should be less than or equal to the vocabulary size, which is fixed while training is going on. Is that right?
`input_dim` should be the vocabulary size.
Thank you for the clarification! If `input_dim` and the vocabulary size are not dynamic, some critical scenarios may not be supported. Some industrial dynamic-embedding scenarios require algorithm engineers to use `uint64_t` for the encoded features, which has a possible range of `[0, std::numeric_limits<uint64_t>::max]`. That means `input_dim` and the vocabulary size should not have to be set, because the key space is almost unlimited.
@rhdong, I would like to clarify that for the layer initialization, `input_dim` is the vocabulary size (we tried to keep it consistent with the [embedding layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)). The input to the layer can be of any dynamic shape.
Hi @divyashreepathihalli, thank you for your clarification. I understand now. "The input to the layer can be of any dynamic shape" totally makes sense. But I'm afraid that the `input_dim` setting would limit the feature encoding space. In the dynamic embedding context (compared with the original static embedding in current TensorFlow), `input_dim` should be `std::numeric_limits<uint64_t>::max`. I will try to explain this in a Google doc. Before that, you could refer to the TFRA API design, where only `embedding_size` needs to be configured (similar to `output_dim`): https://github.com/tensorflow/recommenders-addons/blob/master/tensorflow_recommenders_addons/dynamic_embedding/python/keras/layers/embedding.py#L117
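For reference, a rough sketch of the TFRA call shape described above (hedged: the exact signature may differ across TFRA versions; see the linked `embedding.py`). Only the embedding dimension is configured, with no vocabulary size at all:

```python
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

# No input_dim / vocabulary size here: keys are arbitrary int64 values and
# the backing hashtable grows as previously unseen keys arrive.
de_layer = tfra.dynamic_embedding.keras.layers.Embedding(
    embedding_size=8,  # plays the role of output_dim
    key_dtype=tf.int64,
    initializer=tf.random_normal_initializer(0.0, 0.05),
)

# Any 64-bit key is a valid input, no matter how large.
vectors = de_layer(tf.constant([[2**62], [12345]], dtype=tf.int64))
```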
> @rhdong, I would like to clarify that for the layer initialization, `input_dim` is the vocabulary size (we tried to keep it consistent with the [embedding layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)). The input to the layer can be of any dynamic shape.
Possibly, I think @divyashreepathihalli may have slightly misunderstood the meaning of dynamic-shape embedding. For example, consider a training feature that is both large-scale and sparse, such as USER_ID. If we apply the vocabulary method to USER_ID, it will only map USER_ID into the dimension of the vocabulary size, which is a compression of the information dimension. Since the vocabulary size is fixed, this is still a static embedding. Dynamic embedding means that all inputs can be handled without conflicts through a hashmap. The size of a dynamic embedding is not fixed and is unpredictable, because USER_ID grows with the growth of the business.
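A toy illustration (plain Python, not the proposed API) of the difference described here: a static table folds unseen ids into a fixed range, while a hashmap-backed table grows a fresh, conflict-free slot per id.

```python
import numpy as np

DIM = 4

# Static embedding: the table size is fixed, so ids beyond the vocabulary
# collide with existing rows (here via modulo, i.e. information compression).
static_table = np.random.randn(1000, DIM)
def static_lookup(user_id: int) -> np.ndarray:
    return static_table[user_id % 1000]  # new users collide with old ones

# Dynamic embedding: a hashmap gives every distinct id its own row, so the
# table grows without conflicts as USER_ID grows with the business.
dynamic_table: dict[int, np.ndarray] = {}
def dynamic_lookup(user_id: int) -> np.ndarray:
    if user_id not in dynamic_table:
        dynamic_table[user_id] = np.random.randn(DIM)  # insert on first sight
    return dynamic_table[user_id]
```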
Besides the USER_ID example from @MoFHeka: in our recommender system, we use user-item crossed features to enhance the accuracy and relevance of our recommendations. By combining multiple features into a unique identifier, we can create a more comprehensive representation of the relationship between users and items, resulting in better recommendations. When using `tf.sparse.cross_hashed` or `xxhash`, a sparse key in the range of `[0, std::numeric_limits<uint64_t>::max]` is generated. For such a large-scale and sparse feature, a dynamic size is mandatory.
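A small sketch of that crossing: with `num_buckets=0`, `tf.sparse.cross_hashed` does not bucketize, so the crossed key can land anywhere in the full 64-bit hash range and no fixed `input_dim` can cover it.

```python
import tensorflow as tf

user = tf.sparse.from_dense(tf.constant([["user_42"]]))
item = tf.sparse.from_dense(tf.constant([["item_7"]]))

# num_buckets=0 means "no bucketizing": the combined user/item key is a raw
# 64-bit hash, i.e. a sparse key in an effectively unbounded range.
crossed = tf.sparse.cross_hashed([user, item], num_buckets=0)
print(tf.sparse.to_dense(crossed))  # one very large, essentially random key
```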
@rhdong @MoFHeka thank you for the clarification. I tried to read up further. If I understand correctly, you are looking for a dynamic vocabulary size and a dynamic embedding matrix as well, correct? One that would keep growing?
As of now, our scope of work will be limited to maintaining a fixed-size vocabulary and a fixed embedding size, updating the vocabulary based on the inputs received by the layer and on eviction policies. The embedding values will be remapped whenever the vocabulary is updated based on input patterns (most frequently seen inputs, TTL, etc.). If an input key is not in the vocab, it will be mapped to a default value; however, we keep track of these keys and add them to the vocab when the updates are done in the callback (new keys are added to the vocab by evicting old keys according to the specified eviction policies).
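A rough sketch of that update cycle, assuming a frequency-based eviction policy (all names here are illustrative, not the RFC's API):

```python
from collections import Counter

VOCAB_SIZE = 4     # fixed-size vocabulary; index 0 is reserved for OOV keys
DEFAULT_INDEX = 0  # unseen keys map here between vocabulary updates

vocab = {"a": 1, "b": 2, "c": 3}  # key -> index
seen = Counter()                  # frequencies of all keys the layer observes

def lookup(key):
    seen[key] += 1                        # track every key, in-vocab or not
    return vocab.get(key, DEFAULT_INDEX)  # OOV keys get the default index

def update_vocab():
    # Callback step: keep the most frequent keys, evicting the rest; the
    # embedding rows would be remapped to follow the new indices.
    top = [k for k, _ in seen.most_common(VOCAB_SIZE - 1)]
    vocab.clear()
    vocab.update({key: i + 1 for i, key in enumerate(top)})
```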
@divyashreepathihalli It's my pleasure. Considering the practical dynamic-embedding scenarios we have encountered, a hashtable-based dynamic vocabulary size would be a fundamental requirement. I guess one of the pros of your current design is that there is no need to modify the `tf.optimizer`; that makes sense. But in addition to the considerations we discussed above, I'm also a little worried that it will introduce data consistency issues caused by decoupling the embedding indexing from the embedding lookup, especially when eviction is involved. Applying atomic or locking mechanisms to the ID and the embedding is challenging when they are operated on in two separate ops.
The image below illustrates the workflow when the parameter server
strategy is used. PSS supports asynchronous training. Each worker
will have a copy of the vocabulary, which will be consistent across
Hi @divyashreepathihalli, may I have your confirmation here? Does this mean each worker will hold a full copy of the vocabulary that maps each vocab entry to an index, while the real embedding vectors are stored on some PSs in a dense format (for example, a `tf.Variable`)? Am I correct? Thank you so much!
That is correct. Each worker should have a copy of the vocabulary (the vocab -> index mapping). The embedding variable will be split across the distributed servers.
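A hedged sketch of that split using stock TF APIs (assuming `cluster_resolver` is already configured elsewhere; the partitioner values are illustrative):

```python
import tensorflow as tf

# Shard large variables across the parameter servers; small ones stay whole.
partitioner = tf.distribute.experimental.partitioners.MinSizePartitioner(
    min_shard_bytes=256 << 10, max_shards=4)

strategy = tf.distribute.experimental.ParameterServerStrategy(
    cluster_resolver,  # assumed to be set up elsewhere
    variable_partitioner=partitioner)

with strategy.scope():
    # The embedding matrix is created as a sharded variable on the PSs,
    # while each worker keeps its own copy of the vocab -> index mapping.
    embedding = tf.keras.layers.Embedding(input_dim=100_000, output_dim=64)
```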
Hi @divyashreepathihalli, thank you for your comment! If we have a full copy of the `key-index` mapping on each worker, there must be some upper limit on the vocabulary size. To the best of my knowledge, the vocabulary size in some industrial scenarios can be tens or hundreds of billions, which makes the memory consumption on GPU/TPU significantly large and unbearable. One practical solution is to store the `key-value` pairs in an abstract hashtable in a distributed way, as TFRA does. Hope it's helpful. Thanks!
I agree with you. The proposed design would be the initial implementation, and a distributed KV server would definitely be the way to go going forward.
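For comparison, a sketch of the distributed-hashtable approach as TFRA exposes it today (hedged: signatures may vary by version; the device placement is illustrative):

```python
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

# Key-value storage is partitioned across parameter servers, so no single
# worker has to hold a tens-of-billions-entry vocabulary in memory.
weights = tfra.dynamic_embedding.get_variable(
    name="user_embeddings",
    key_dtype=tf.int64,
    value_dtype=tf.float32,
    dim=64,
    devices=["/job:ps/task:0", "/job:ps/task:1"],  # illustrative placement
    initializer=tf.random_normal_initializer(0.0, 0.05),
)

ids = tf.constant([42, 2**62], dtype=tf.int64)  # arbitrary 64-bit keys
vectors = tfra.dynamic_embedding.embedding_lookup(weights, ids)
```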
Dynamic embedding is a very important feature for us. When training the ranking models that support scenarios such as search, recommendation, and advertising, we encountered the problems described in the comments below, which is the main reason we need this feature.
In this design approach, the DynamicEmbedding layer is composed of two
layers: the DynamicLookup layer and the Embedding layer. The
DynamicLookup layer is responsible for the following tasks:
* Maintaining a vocabulary table using an eviction policy that is
We are currently using parameters of about 1e13 bytes in production. Will it be very expensive to maintain the vocabulary and indexes for such large parameters?
updated based on input pattern.
* Performing vocabulary lookup for the given input and returning
integer indexes.
* The index is then passed to the Embedding layer, which looks
In many cases, we don't know exactly how many keys a feature has, since the properties of videos, images, commodities, video games, etc. are always changing. Presetting a vocab/index range may lead to wasted storage or feature conflicts.
updates corresponding to evolving input patterns and vocabulary changes.
### Goal | ||
* Works across accelerators (GPU / TPU) | ||
* Works with Parameter server strategy (asynchroous distributed training)
Typo: asynchronous
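To summarize the design being reviewed, a minimal sketch of the two-layer composition the RFC text describes, with the stock `StringLookup` layer standing in for the proposed DynamicLookup layer (which does not exist yet and would additionally update/evict its vocabulary based on input patterns):

```python
import tensorflow as tf

# Stand-in for DynamicLookup: maps raw keys to integer indexes, with an
# implicit OOV index (0) for keys outside the current vocabulary.
lookup = tf.keras.layers.StringLookup(vocabulary=["cat", "dog", "bird"])

# Plain Embedding layer: integer indexes -> dense vectors.
embedding = tf.keras.layers.Embedding(
    input_dim=lookup.vocabulary_size(), output_dim=8)

# DynamicEmbedding == lookup followed by embedding.
keys = tf.constant([["cat", "fish"]])  # "fish" is OOV -> default index 0
vectors = embedding(lookup(keys))      # shape (1, 2, 8)
print(vectors.shape)
```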