ML Features In Wyvern
Chip has a fantastic article on real-time ML that explains in detail what features are in machine learning.
Batch Features
Batch features, or offline features, are features extracted from historical data, often with batch processing.
Real-time Features
Real-time features, or online features, are computed at request time. They are a good fit for features that are easier to compute, or more accurate, online than offline. Examples include:
- Request metadata (user device, user location, etc.): you have 100% accuracy online
- Embedding similarity (user x product): storage complexity is significantly lower if you compute this online
- “Facts” about entities (e.g. product inventory count): eliminates the risk of serving stale data from the most recent batch upload
- Real-time embeddings: capture the long tail of queries, since embedding the query in real time gives 100% coverage
- Session embeddings: capture user interactions on the website ahead of this request and personalize content based on them
- External API calls: some real-time features come from another API request
Define Features
Batch Feature Definition
[This part is not required for running the template ranking pipeline, as that example doesn’t depend on any batch features.] Currently, Wyvern’s feature store solution is built on top of the open source feature store project Feast. Please check out Feast’s documentation for building a feature store. We plan to build integrations with more feature stores in the future.
See how Wyvern integrates with Feast to serve features.
Real-time Feature Definition
Wyvern provides the RealtimeFeatureComponent to represent a group of features.
For an ML feature, there is usually an entity, or a composite of entities, that the feature is associated with.
Wyvern’s RealtimeFeatureComponent supports both a single entity and composite entities (a primary entity plus a secondary entity). Wyvern also supports a request scope, which lets you restrict a RealtimeFeatureComponent to a particular type of request. For example:
Consider a component class named RealtimeProductFeature, which defines a group of features. It inherits from RealtimeFeatureComponent with three type hints that indicate the entity and the request scope. The first type hint is Product, so this group of features belongs to the primary entity Product. The second type hint is Any, so there is no secondary entity. In other words, it is not a composite entity. The third type hint is also Any, so these features can be applied under any request condition.
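The declaration described above can be sketched as follows. This is a minimal, self-contained illustration: the RealtimeFeatureComponent below is a stand-in generic class written for this example, not Wyvern's actual class, and Product and type_hints_of are hypothetical names.

```python
from typing import Any, Generic, TypeVar, get_args

# Stand-in type parameters: primary entity, secondary entity, request scope.
PRIMARY = TypeVar("PRIMARY")
SECONDARY = TypeVar("SECONDARY")
REQUEST = TypeVar("REQUEST")


class RealtimeFeatureComponent(Generic[PRIMARY, SECONDARY, REQUEST]):
    """Stand-in for illustration only; not Wyvern's actual class."""


class Product:
    """Hypothetical entity type."""


# Primary entity: Product. No secondary entity (Any). Any request scope (Any).
class RealtimeProductFeature(RealtimeFeatureComponent[Product, Any, Any]):
    pass


def type_hints_of(component_cls):
    """Return the three type hints a component class was declared with."""
    return get_args(component_cls.__orig_bases__[0])
```

Reading the three hints back with type_hints_of(RealtimeProductFeature) gives Product, Any, Any, which matches the order described above: primary entity, secondary entity, request scope.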
Now let’s come back to the ranking example. Let’s define five features:
- search_score
- query_category_similarity
- query_brand_similarity
- query_title_similarity
- query_description_similarity
For each feature, we need to define the entity it belongs to. Features that share the same entity can also share the same realtime feature component.
The entity for the search score of the candidates is the “product”. Here’s the definition for the first feature, search_score:
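A minimal sketch of such a single-entity component follows. It uses stand-in classes rather than Wyvern's base class, so Product, FeatureData, output_feature_names, and compute_features as written here are assumptions made for illustration, not Wyvern's confirmed API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Product:
    """Hypothetical candidate entity carrying the score from the search engine."""
    identifier: str
    search_score: float


@dataclass
class FeatureData:
    """Stand-in container: feature values keyed by name for one entity."""
    identifier: str
    features: Dict[str, float]


class RealtimeProductFeature:
    """Stand-in for a realtime feature component whose entity is Product."""

    # Names of the features this component produces (assumed attribute).
    output_feature_names = {"search_score"}

    def compute_features(self, entity: Product) -> FeatureData:
        # The feature value is read directly off the product candidate.
        return FeatureData(
            identifier=entity.identifier,
            features={"search_score": entity.search_score},
        )
```

The component only needs the Product to produce its output, which is why a single entity is enough for this feature.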
The remaining features all measure the similarity between the query and a product attribute. This is what we call a “CompositeEntity”: the feature belongs to a composite of two different entities, in this case Product and QueryEntity. Here’s the definition of this composite feature group:
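A minimal sketch of a composite feature group follows. The text does not say how similarity is computed, so this example assumes cosine similarity between precomputed embeddings. All class, field, and method names here (Query, Product, field_embeddings, compute_composite_features) are hypothetical stand-ins, not Wyvern's API.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Query:
    """Hypothetical query entity with a precomputed embedding."""
    identifier: str
    embedding: List[float]


@dataclass
class Product:
    """Hypothetical product entity with one embedding per text field."""
    identifier: str
    field_embeddings: Dict[str, List[float]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class RealtimeQueryProductFeature:
    """Stand-in for a component whose entity is the (Product, Query) pair."""

    FIELDS = ("category", "brand", "title", "description")

    def compute_composite_features(self, product: Product, query: Query) -> Dict[str, float]:
        # One feature per product field, each depending on both entities.
        return {
            f"query_{field}_similarity": cosine_similarity(
                query.embedding, product.field_embeddings[field]
            )
            for field in self.FIELDS
        }
```

Every value depends on both the query and the product, so neither entity alone can identify the feature. That is the reason these four features share one composite component instead of living with search_score.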
Register Realtime Features
Now we go back to pipelines/main.py and register the realtime features with the Wyvern service:
To register realtime features, pass realtime_feature_components, the list of your realtime feature components, to WyvernService.generate_app. This tells the Wyvern service which realtime features are available.
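Conceptually, registration gives the service a list of components it can look up later. The sketch below illustrates only that idea: SketchService is a hypothetical stand-in and not WyvernService. The text confirms only that generate_app accepts realtime_feature_components, so the return value, the use of classes rather than instances, and the duplicate check are all assumptions.

```python
from typing import Dict, List, Type


class RealtimeProductFeature:
    """Placeholder for the single-entity component."""


class RealtimeQueryProductFeature:
    """Placeholder for the composite-entity component."""


class SketchService:
    """Hypothetical stand-in; it does not reproduce WyvernService behavior."""

    @staticmethod
    def generate_app(realtime_feature_components: List[Type]) -> Dict[str, Type]:
        # Build a name -> component registry so features can be resolved later.
        registry: Dict[str, Type] = {}
        for component in realtime_feature_components:
            name = component.__name__
            if name in registry:
                raise ValueError(f"duplicate realtime feature component: {name}")
            registry[name] = component
        return registry


app = SketchService.generate_app(
    realtime_feature_components=[
        RealtimeProductFeature,
        RealtimeQueryProductFeature,
    ],
)
```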
Feature Retrieval
Wyvern automates the retrieval of all the features required by your pipeline from the feature store and stores them in memory, per request, during model inference. In every Wyvern component class, you can call self.get_feature(identifier, "feature_name") to get the feature value. See get_feature.
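The per-request, in-memory lookup described above can be pictured with a small sketch. RequestFeatureMap and its put method are hypothetical. Only the get_feature(identifier, "feature_name") call shape comes from the text, and returning None for a missing feature is an assumption.

```python
from typing import Dict, Optional, Union

FeatureValue = Union[float, str, list]


class RequestFeatureMap:
    """Hypothetical per-request store: identifier -> feature name -> value."""

    def __init__(self) -> None:
        self._features: Dict[str, Dict[str, FeatureValue]] = {}

    def put(self, identifier: str, features: Dict[str, FeatureValue]) -> None:
        # Merge newly fetched or computed features for one entity.
        self._features.setdefault(identifier, {}).update(features)

    def get_feature(self, identifier: str, feature_name: str) -> Optional[FeatureValue]:
        # Return None when the feature is absent (assumed behavior).
        return self._features.get(identifier, {}).get(feature_name)


feature_map = RequestFeatureMap()
feature_map.put("product_1", {"search_score": 0.75})
score = feature_map.get_feature("product_1", "search_score")
```

Because the map is created per request, values never leak between requests, and a component that runs later in the pipeline reads what was fetched or computed earlier without another round trip to the feature store.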
Now that all the batch/offline features and realtime features are ready, let’s look at how to define the models.