What is a real-time feature?
Real-time features are computed at request time. They suit values that are easier to compute, or more accurate, online than offline. Examples include:
- Request metadata (user device, user location, etc.): you have 100% accuracy online
- Embedding similarity (user x product): storage complexity is significantly lower if you compute this online
- “Facts” about entities (e.g. product inventory count): eliminates the risk of serving stale values for recently updated data
- Real-time embeddings: capture the long tail of queries, as embedding the query in real time gives 100% coverage
- Session embeddings: capture the user's interactions on the website before this request and personalize content based on them
- External API calls: some real-time features come from another API request
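To make the embedding-similarity case concrete: precomputing a score for every user x product pair means storing one value per pair, while computing it at request time only needs one embedding per user and one per product. The sketch below is a plain-Python illustration of that online computation. The function names and the dictionary layout are assumptions made for this example and are not part of Wyvern.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_product_similarities(
    user_embedding: Sequence[float],
    candidate_embeddings: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score only the candidate products that appear in the current request."""
    return {
        product_id: cosine_similarity(user_embedding, embedding)
        for product_id, embedding in candidate_embeddings.items()
    }
```

Only the candidates in the current request are scored, so the cost grows with the size of the candidate list rather than with the size of the catalog.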
Define Features - RealtimeFeatureComponent
RealtimeFeatureComponent represents a group of features that share the same entity (or entities) and request scope. An ML feature is usually associated with a single entity or with a composite of entities. Wyvern’s RealtimeFeatureComponent supports both a single entity and composite entities (a primary entity plus a secondary entity). Wyvern also supports a request scope, which lets you restrict the requests your RealtimeFeatureComponent applies to.
For example, consider RealtimeProductFeature. It inherits from RealtimeFeatureComponent with three type hints that indicate the entity and the request scope. The features in this group belong to the primary entity, Product, which is the first type hint. There is no secondary entity, as the second type hint is Any; in other words, it is not a composite entity. The features can be applied under any request condition, as the third type hint is Any.
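The three-type-hint pattern described above can be illustrated with Python's standard typing.Generic. This is a minimal sketch using a hypothetical stand-in base class, FeatureComponentSketch. It is not Wyvern's RealtimeFeatureComponent, and the compute method, the Product fields, and the title_length feature are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, TypeVar

PrimaryEntity = TypeVar("PrimaryEntity")
SecondaryEntity = TypeVar("SecondaryEntity")
Request = TypeVar("Request")


class FeatureComponentSketch(Generic[PrimaryEntity, SecondaryEntity, Request]):
    """Hypothetical stand-in base class (NOT Wyvern's RealtimeFeatureComponent)."""

    def compute(self, entity: PrimaryEntity, request: Request) -> Dict[str, float]:
        raise NotImplementedError


@dataclass
class Product:
    # Hypothetical entity fields, chosen only for this illustration.
    id: str
    title: str


class ProductFeatureSketch(FeatureComponentSketch[Product, Any, Any]):
    """Primary entity is Product; no secondary entity (Any); any request (Any)."""

    def compute(self, entity: Product, request: Any) -> Dict[str, float]:
        # Placeholder feature: the number of characters in the product title.
        return {"title_length": float(len(entity.title))}
```

Replacing the second Any with another entity type would describe a composite-entity feature group, and replacing the third with a specific request type would narrow the request scope.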
Now let’s come back to the ranking example. Let’s define five features:
- search_score
- query_category_similarity
- query_brand_similarity
- query_title_similarity
- query_description_similarity
search_score:
pipelines/product_ranking/realtime_features.py
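As a rough, self-contained illustration of what these five features could look like for a single product, the sketch below computes them in plain Python. It does not use Wyvern's classes. The Product fields are hypothetical, search_score is assumed to arrive with the candidate product, and Jaccard token overlap stands in for whatever similarity measure the real pipeline uses.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Product:
    # Hypothetical fields; search_score is assumed to come from the search backend.
    search_score: float
    category: str
    brand: str
    title: str
    description: str


def _tokens(text: str) -> Set[str]:
    """Lowercased, whitespace-separated tokens of a string."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a placeholder similarity measure."""
    tokens_a, tokens_b = _tokens(a), _tokens(b)
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def ranking_features(query: str, product: Product) -> Dict[str, float]:
    """Compute the five ranking features for one (query, product) pair."""
    return {
        "search_score": product.search_score,
        "query_category_similarity": jaccard(query, product.category),
        "query_brand_similarity": jaccard(query, product.brand),
        "query_title_similarity": jaccard(query, product.title),
        "query_description_similarity": jaccard(query, product.description),
    }
```

An embedding-based similarity could replace jaccard without changing the shape of the returned dictionary.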
Register Realtime Features
Now we go back to pipelines/main.py and register the realtime features with the Wyvern service:
pipelines/main.py
The registration goes through WyvernService.generate_app. This tells the Wyvern service which realtime features are available.
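Conceptually, registration gives the service a lookup from feature names to the code that computes them. The sketch below shows that idea with a hypothetical FeatureRegistrySketch class. It is not Wyvern's implementation and does not reflect the signature of WyvernService.generate_app; all names in it are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

# A feature function maps a request context to a numeric value.
FeatureFn = Callable[[dict], float]


class FeatureRegistrySketch:
    """Hypothetical service-side registry (NOT Wyvern's implementation)."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        """Make a feature available; reject duplicate names."""
        if name in self._features:
            raise ValueError(f"feature already registered: {name}")
        self._features[name] = fn

    def available(self) -> List[str]:
        """Names of all registered features, sorted alphabetically."""
        return sorted(self._features)

    def compute(self, names: Iterable[str], context: dict) -> Dict[str, float]:
        """Compute the requested features, failing loudly on unknown names."""
        names = list(names)
        missing = [n for n in names if n not in self._features]
        if missing:
            raise KeyError(f"unregistered features: {missing}")
        return {n: self._features[n](context) for n in names}
```

Failing on unregistered names at lookup time surfaces a missing registration as an explicit error instead of a silently absent feature.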
Now that all the batch/offline features and realtime features are ready, let’s look at how to define the models.