Pagination is essential for handling large datasets in API responses. Lumos SDK provides a standardised approach to implementing pagination across different APIs, ensuring efficient data retrieval and consistent behaviour.
Key principles
- Handle Large Datasets: Efficiently retrieve and process large amounts of data from APIs.
- Optimize Performance: Reduce load on both the client and server by fetching data in smaller chunks.
- Maintain Consistency: Provide a uniform interface for pagination across different API implementations.
- Pagination Behaviour: Ensure that the page token returned is `None` when Lumos should stop calling the endpoint.
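The stop condition in the last principle can be illustrated with a small, self-contained sketch. It does not use the Lumos SDK: `FAKE_PAGES`, `fetch_page`, and `fetch_all` are hypothetical names that stand in for an endpoint and its caller, and the only point is that the caller keeps requesting pages until the returned token is `None`.

```python
from typing import Optional

# Hypothetical in-memory "API": each token maps to (items, next_token).
FAKE_PAGES = {
    None: (["a", "b"], "p2"),
    "p2": (["c", "d"], "p3"),
    "p3": (["e"], None),  # last page: the returned token is None
}


def fetch_page(token: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Return the items for the given token together with the next token."""
    return FAKE_PAGES[token]


def fetch_all() -> list[str]:
    """Call the endpoint repeatedly and stop once the returned token is None."""
    items: list[str] = []
    token: Optional[str] = None  # no token means "start at the first page"
    while True:
        page_items, token = fetch_page(token)
        items.extend(page_items)
        if token is None:
            break
    return items


all_items = fetch_all()
```

If a capability returned a non-`None` token on its last page, a loop like this would either never terminate or issue one extra, empty request, which is why the final token matters.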
Implementing the pagination class
Pagination in Lumos SDK typically involves two main components:
- A `Pagination` class that represents the current page state.
- A `NextPageToken` class that handles serialization and deserialization of the pagination information (the page state above).
    # Example: abc/pagination.py
    # Assumed import path for the SDK's Page model; adjust if your SDK version differs.
    from connector.generated import Page
    from connector.utils.pagination import (
        NextPageTokenInterface,
        PaginationBase,
        create_next_page_token,
    )

    DEFAULT_PAGE_SIZE = 100


    class AbcPagination(PaginationBase):
        resource_type: str | None = None
        page: int

        @classmethod
        def default(cls, resource_type: str) -> "AbcPagination":
            # Initial state: first page of the given resource type.
            return cls(
                resource_type=resource_type,
                page=1,
            )

        def next_page(self) -> "AbcPagination":
            # Same resource type, next page number.
            return AbcPagination(
                resource_type=self.resource_type,
                page=self.page + 1,
            )


    AbcNextPageToken = create_next_page_token(AbcPagination, "AbcNextPageToken")


    def make_page(next_paginations: list[AbcPagination], size: int) -> Page | None:
        # An empty list produces no token, so no Page is returned and paging stops.
        next_page_token = AbcNextPageToken.from_paginations(next_paginations)
        return Page(token=next_page_token.token, size=size) if next_page_token.token else None

Recommendations and requirements to consider:
- `Pagination` class: Represents the current page state (e.g., `AbcPagination`).
  - Includes any relevant fields like `resource_type` and `page` which you might need to paginate by.
    - The resulting string token is compressed and can be populated with various parameters you might need to filter or paginate by on a per-capability basis. Note that all HTTP request parameters have size constraints.
  - Implements a `default` method to create an initial pagination state.
  - Implements a `next_page` method to generate the next page state.
- `NextPageToken` class: Handles serialization and deserialization of pagination information.
  - Created using the `create_next_page_token` utility function.
  - Provides methods to convert between token strings and `Pagination` objects.
- Helper functions, while not mandatory, are often added to:
  - Simplify the creation of `Page` objects (e.g., the `make_page` function).
  - Handle specific pagination logic for the API.
  - Provide utility methods for pagination-related operations.
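To make the idea of a compressed string token more concrete, here is a minimal sketch of one way a list of page states could be turned into a string and back. This is an assumption for illustration only: the SDK's actual encoding is not described here, and `encode_token` and `decode_token` are hypothetical helpers built on the standard library (`json`, `zlib`, `base64`), not SDK functions.

```python
import base64
import json
import zlib
from typing import Optional


def encode_token(paginations: list[dict]) -> Optional[str]:
    """Serialize page states into a compact, URL-safe string.

    Returns None when there is nothing left to paginate.
    """
    if not paginations:
        return None
    raw = json.dumps(paginations, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_token(token: Optional[str]) -> list[dict]:
    """Reverse encode_token; a missing token yields an empty list."""
    if not token:
        return []
    raw = zlib.decompress(base64.urlsafe_b64decode(token.encode("ascii")))
    return json.loads(raw)


state = [{"resource_type": "accounts", "page": 2}]
token = encode_token(state)
restored = decode_token(token)
```

Whatever the real encoding is, the same constraint applies: every field added to the page state makes the token longer, and the token still has to fit within HTTP request parameter limits.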
Using Pagination in Capabilities
The following is a code example of how to paginate:
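    # Example: abc/capabilities_read.py
    from connector.oai.capability import get_page

    # AbcClient, the request/response models, and the names from abc/pagination.py
    # (DEFAULT_PAGE_SIZE, AbcPagination, AbcNextPageToken, make_page) are imported elsewhere.


    async def list_accounts(args: ListAccountsRequest) -> ListAccountsResponse:
        page = get_page(args)
        page_size = page.size or DEFAULT_PAGE_SIZE

        # Resume from the incoming token, or start at the first page.
        # resource_type is a placeholder for the resource this capability lists.
        try:
            current_pagination = AbcNextPageToken(page.token).paginations()[0]
        except IndexError:
            current_pagination = AbcPagination.default(resource_type)

        async with AbcClient(args) as client:
            params = {
                "page": current_pagination.page,
                "perPage": page_size,
            }
            response = await client.getUsers(params=params)
            data = response.json()
            # to_accounts() stands for the connector's own response-to-account mapping.
            accounts = data.to_accounts()

        # more_data_available is API-specific, e.g. derived from the response's paging metadata.
        if more_data_available:
            next_pagination = [current_pagination.next_page()]
        else:
            next_pagination = []

        # make_page returns None for an empty list, which tells Lumos to stop.
        next_page = make_page(next_pagination, page_size)
        return ListAccountsResponse(
            response=accounts,
            page=next_page,
        )

Key considerations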
- The `get_page(args)` helper function is used to retrieve pagination information from the request arguments.
- `page_size` is determined using the page size from the request or a default value.
- The same `page_size` is used across all pages to maintain consistency.
- Instead of using template strings for endpoints, the endpoint is typically stored in the pagination object or determined based on the resource type or custom argument.
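The last two considerations can be sketched without the SDK. In the example below, `ENDPOINTS`, `PageState`, `resolve_page_size`, and `build_request` are hypothetical names chosen for illustration. The sketch assumes the endpoint is looked up from the resource type held in the page state, and that the page size falls back to a default when the request does not carry one.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_PAGE_SIZE = 100

# Hypothetical mapping from resource type to endpoint path.
ENDPOINTS = {
    "accounts": "/users",
    "groups": "/groups",
}


@dataclass(frozen=True)
class PageState:
    """Stand-in for a pagination object; not the SDK's PaginationBase."""

    resource_type: str
    page: int = 1

    @property
    def endpoint(self) -> str:
        # The endpoint is derived from the state, not from a template string.
        return ENDPOINTS[self.resource_type]


def resolve_page_size(requested: Optional[int]) -> int:
    """Use the requested size when given, otherwise the default."""
    return requested or DEFAULT_PAGE_SIZE


def build_request(state: PageState, requested_size: Optional[int]) -> tuple[str, dict]:
    """Return the endpoint and query parameters for the given page state."""
    params = {"page": state.page, "perPage": resolve_page_size(requested_size)}
    return state.endpoint, params


request = build_request(PageState("groups", page=3), None)
```

Keeping the lookup next to the page state means a resumed request can be rebuilt from the token alone, with no extra context from the caller.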
Handling Multiple Endpoint Pagination
For capabilities that internally call several API endpoints, you can take advantage of the fact that the pagination token holds a list of page states:
    # Example: abc/capabilities_read.py:list_accounts
    # ...
    page = get_page(args)
    page_size = page.size or DEFAULT_PAGE_SIZE

    # Without the [0] index no IndexError is raised, so check for an empty list instead.
    current_paginations = AbcNextPageToken(page.token).paginations()
    if not current_paginations:
        current_paginations = [AbcPagination.default(endpoint="/first-endpoint")]

    async with AbcClient(args) as client:
        # Each entry in the token tracks the page state of one endpoint.
        for pagination in current_paginations:
            params = {
                "page": pagination.page,
                "perPage": page_size,
            }
            response = await client.get(pagination.endpoint, params=params)
    # ...

Storing Additional Information in the Pagination Tokens
In some cases, it's necessary to store additional data across pages to ensure data consistency or uniqueness.
    class AbcPagination(PaginationBase):
        endpoint: str
        page: int
        unique_ids: list[str]

        @classmethod
        def default(cls, endpoint: str) -> "AbcPagination":
            # Start on the first page with no IDs recorded yet.
            return cls(endpoint=endpoint, page=1, unique_ids=[])

In this example, we prepare a list of unique IDs. This is useful, for example, when iterating over an API that returns duplicate entries and we want to limit the number of extra calls made to fetch each entry's details.
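A rough sketch of how such a list might be used across pages follows. It is self-contained and does not use the SDK: `PageState` is a plain dataclass standing in for the pagination class, and `process_page` is a hypothetical helper. Entries whose ID was already seen on an earlier page are skipped, so no detail call would be made for them.

```python
from dataclasses import dataclass, field


@dataclass
class PageState:
    """Stand-in for a pagination class that carries IDs across pages."""

    page: int = 1
    unique_ids: list[str] = field(default_factory=list)


def process_page(state: PageState, entries: list[dict]) -> tuple[list[dict], PageState]:
    """Return the entries not seen before, plus the state for the next page."""
    seen = set(state.unique_ids)
    fresh: list[dict] = []
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            fresh.append(entry)
    next_state = PageState(
        page=state.page + 1,
        unique_ids=state.unique_ids + [entry["id"] for entry in fresh],
    )
    return fresh, next_state


state = PageState()
page_1, state = process_page(state, [{"id": "u1"}, {"id": "u2"}])
page_2, state = process_page(state, [{"id": "u2"}, {"id": "u3"}])
```

Here the second page repeats `u2`, so only `u3` is treated as new, and the state carried into the third page lists all three IDs.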
Best Practices
- Store Minimal Data: Only store essential information in the pagination token. In the example, only unique identifiers are stored, not entire objects.
- Use Efficient Data Structures: Choose data structures that balance functionality and size. For example, using a list of strings for unique IDs instead of more complex objects.
- Limit Data Volume: If the amount of data grows too large, consider implementing a cutoff or using a more efficient storage method.
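One possible way to apply a cutoff is sketched below. The limits and helper names (`MAX_TOKEN_BYTES`, `MAX_TRACKED_IDS`, `trim_ids`, `fits_budget`) are hypothetical and chosen only for illustration. The sketch keeps the most recent IDs and checks the serialized size of the state before it would be placed in a token.

```python
import json

MAX_TOKEN_BYTES = 2048  # hypothetical budget; choose one that suits the API's limits
MAX_TRACKED_IDS = 3  # deliberately tiny so the example is easy to follow


def trim_ids(unique_ids: list[str], limit: int = MAX_TRACKED_IDS) -> list[str]:
    """Keep only the most recent `limit` IDs."""
    # Guard against limit == 0, because unique_ids[-0:] would return the whole list.
    return unique_ids[-limit:] if limit > 0 else []


def fits_budget(state: dict, max_bytes: int = MAX_TOKEN_BYTES) -> bool:
    """Check the size of the serialized state before compression."""
    encoded = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= max_bytes


state = {"page": 4, "unique_ids": trim_ids(["u1", "u2", "u3", "u4", "u5"])}
```

Trimming trades accuracy for size: an ID that has been dropped from the list can no longer be recognised as a duplicate, so the cutoff should be large enough to cover the distance at which duplicates realistically recur.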