- Retrieval (Mostly for Search & Discovery use cases like recommendations and ranking)
- Offline feature retrieval
- Realtime feature generation
- Model inference
- Business logic
- Event logging
Define ML Pipeline
First, let’s think about the product ranking problem in a marketplace. Your marketplace has thousands or even millions of products, and you’re using Elasticsearch or Algolia for basic search. Now you want to build a ranking pipeline to add personalization. Assuming your search returns thousands of products, you would use your ranking model to factor in more than just text relevance: image relevance, newness of products, popularity of products, user personalization, and session personalization. We’re going to build a linear ranking model that considers these factors:
- the search score of the product
- similarity between the query and the category of the product
- similarity between the query and the brand of the product
- similarity between the query and the title of the product
- similarity between the query and the description of the product
- relevance between the query and the image
- newness of the product
- popularity of the product
- relevance between user’s purchase history and the product
- relevance between session’s search history and the product
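As a rough illustration of what a linear ranking model over these factors looks like, here is a minimal sketch in plain Python. It does not use the Wyvern API. The feature names and weight values are hypothetical placeholders chosen for this example, and in practice the weights would be tuned or learned.

```python
from typing import Dict, List

# Hypothetical weights, one per factor listed above (assumed values, not tuned).
WEIGHTS: Dict[str, float] = {
    "search_score": 1.0,
    "query_category_similarity": 0.5,
    "query_brand_similarity": 0.5,
    "query_title_similarity": 0.5,
    "query_description_similarity": 0.25,
    "query_image_relevance": 0.25,
    "newness": 0.25,
    "popularity": 0.5,
    "user_purchase_history_relevance": 1.0,
    "session_search_history_relevance": 1.0,
}


def linear_score(features: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of feature values; a missing feature contributes 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> List[str]:
    """Return product ids ordered from highest to lowest linear score."""
    return sorted(
        candidates,
        key=lambda product_id: linear_score(candidates[product_id], weights),
        reverse=True,
    )
```

Each product gets one number, the weighted sum of its feature values, and the ranked list is the candidates sorted by that number in descending order.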
Requests
To start building a Wyvern pipeline, we first define the request schema. In the ranking case, that means defining the product and ranking request schemas in pipelines/product_ranking/schemas.py. The ranking request has the following fields:
user_page_size
: Integer. The maximum number of candidates that the ranking API returns. If the number of candidates returned is smaller than user_page_size, it’s the last page.
user_page
: Integer. The zero-based index of the ranked “page” of the candidates that are passed to our ranking API. Defaults to 0.
candidate_page_size
: Integer. The number of candidates/products your retrieval/search (Elasticsearch, for example) returns. It’s the same as the size parameter sent to Elasticsearch. We’re adding a constraint.
candidate_page
: Integer. The zero-indexed page number for our candidate set.
request_id
: String. Unique identifier for the request, used to join the request with the user interaction.
include_events
: Optional boolean. Whether the event logs are included in the response. Defaults to false.
query
: The Query object. The search term entered by the user.
candidates
: The list of products. The product object schema is defined by users. In this example, only product_id is required and the other fields are optional.
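To make the fields above concrete, here is a minimal sketch of the two schemas using standard-library dataclasses. This is not the Wyvern schema API: the dataclasses stand in for whatever base classes Wyvern provides. The optional product fields and the text field on Query are hypothetical names chosen for this example. The two helper functions are also assumptions, showing one way user_page and user_page_size could slice a ranked list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Query:
    text: str  # hypothetical field name for the user's search term


@dataclass
class Product:
    product_id: str  # the only required field
    title: Optional[str] = None  # hypothetical optional fields
    brand: Optional[str] = None
    category: Optional[str] = None
    description: Optional[str] = None


@dataclass
class ProductRankingRequest:
    request_id: str
    query: Query
    candidates: List[Product]
    user_page_size: int
    candidate_page_size: int
    user_page: int = 0
    candidate_page: int = 0
    include_events: Optional[bool] = False


def select_user_page(ranked: List[Product], request: ProductRankingRequest) -> List[Product]:
    """Slice the ranked candidates down to the requested zero-based page."""
    start = request.user_page * request.user_page_size
    return ranked[start:start + request.user_page_size]


def is_last_page(page: List[Product], request: ProductRankingRequest) -> bool:
    """A page shorter than user_page_size is the last one."""
    return len(page) < request.user_page_size
```

For example, with five ranked candidates and user_page_size set to 2, page 0 holds the first two products and page 2 holds only the fifth, which marks it as the last page.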