Pagination is an important aspect of handling large datasets in API responses. Lumos SDK provides a standardised approach to implement pagination across different APIs, ensuring efficient data retrieval and consistent behaviour.
Key principles
- Handle Large Datasets: Efficiently retrieve and process large amounts of data from APIs.
- Optimize Performance: Reduce load on both the client and server by fetching data in smaller chunks.
- Maintain Consistency: Provide a uniform interface for pagination across different API implementations.
- Pagination Behaviour: Ensure that the page token returned is None when Lumos should stop calling the endpoint.
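The stop condition in the last principle can be pictured from the caller's side. The following is a minimal sketch, not Lumos code: `FakePage`, `fetch_page` and `collect_all` are hypothetical stand-ins, and the in-memory `DATA` list replaces a real API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical in-memory dataset standing in for a remote API.
DATA = [f"user-{i}" for i in range(7)]


@dataclass
class FakePage:
    """Stand-in for a page object: an opaque token plus a page size."""

    token: str
    size: int


def fetch_page(token: str | None, size: int) -> tuple[list[str], FakePage | None]:
    """Return one chunk of DATA and the next page, or None when exhausted."""
    start = int(token) if token else 0
    items = DATA[start:start + size]
    end = start + len(items)
    # Returning None instead of a page is what tells the caller to stop.
    next_page = FakePage(token=str(end), size=size) if end < len(DATA) else None
    return items, next_page


def collect_all(size: int) -> list[str]:
    """Keep calling fetch_page until it no longer returns a next page."""
    results: list[str] = []
    token: str | None = None
    while True:
        items, next_page = fetch_page(token, size)
        results.extend(items)
        if next_page is None:
            break
        token = next_page.token
    return results
```

If the endpoint kept returning a page after the data ran out, a loop like `collect_all` would never terminate, which is why the final response must carry no page token.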
Implementing the pagination class
Pagination in Lumos SDK typically involves two main components:
- A Pagination class that represents the current page state.
- A NextPageToken class that handles serialization and deserialization of pagination information or the above page state.
# Example: abc/pagination.py
from connector.utils.pagination import (
    NextPageTokenInterface,
    PaginationBase,
    create_next_page_token,
)

# NOTE: Page must also be imported from the SDK; its import is not shown in this example.

DEFAULT_PAGE_SIZE = 100


class AbcPagination(PaginationBase):
    resource_type: str | None = None
    page: int

    @classmethod
    def default(cls, resource_type: str) -> "AbcPagination":
        # Initial state: first page of the given resource type
        return cls(
            resource_type=resource_type,
            page=1,
        )

    def next_page(self) -> "AbcPagination":
        # Same resource type, following page
        return AbcPagination(
            resource_type=self.resource_type,
            page=self.page + 1,
        )


AbcNextPageToken = create_next_page_token(AbcPagination, "AbcNextPageToken")


def make_page(next: list[AbcPagination], size: int) -> Page | None:
    # Return None when no token is produced, which tells Lumos to stop paginating
    next_page_token = AbcNextPageToken.from_paginations(next)
    return Page(token=next_page_token.token, size=size) if next_page_token.token else None
Recommendations and requirements to consider:
- Pagination Class: Represents the current page state (e.g., AbcPagination).
  - Includes any relevant fields like resource_type and page which you might need to paginate by.
    - The resulting string token is compressed and can be populated with various parameters you might need to filter or paginate by on a per-capability basis. Note that all HTTP request parameters have size constraints.
  - Implements a default method to create an initial pagination state.
  - Implements a next_page method to generate the next page state.
- NextPageToken Class: Handles serialization and deserialization of pagination information.
  - Created using the create_next_page_token utility function.
  - Provides methods to convert between token strings and Pagination objects.
- Helper functions, while not mandatory, are often added to:
  - Simplify the creation of Page objects (e.g., the make_page function).
  - Handle specific pagination logic for the API.
  - Provide utility methods for pagination-related operations.
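To make the idea of a compressed string token concrete, here is a minimal sketch using only the Python standard library. It is an illustration of the general technique (JSON, then zlib, then URL-safe Base64), not the encoding Lumos SDK actually uses; `encode_token` and `decode_token` are hypothetical names.

```python
from __future__ import annotations

import base64
import json
import zlib


def encode_token(paginations: list[dict]) -> str | None:
    """Serialize a list of page states into a compact, URL-safe string."""
    if not paginations:
        return None  # nothing left to fetch, so there is no token
    raw = json.dumps(paginations, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_token(token: str | None) -> list[dict]:
    """Turn a token string back into the list of page states."""
    if not token:
        return []  # first request: no state yet
    raw = zlib.decompress(base64.urlsafe_b64decode(token.encode("ascii")))
    return json.loads(raw)
```

Because the token travels as a request parameter, every field added to the page state makes the encoded string longer, so it is worth checking `len(encode_token(...))` when a state starts carrying lists.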
Using Pagination in Capabilities
The following is a code example of how to paginate:
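# Example: abc/capabilities_read.py
from connector.oai.capability import get_page

# AbcPagination, AbcNextPageToken, make_page and DEFAULT_PAGE_SIZE are defined in abc/pagination.py


async def list_accounts(args: ListAccountsRequest) -> ListAccountsResponse:
    page = get_page(args)
    page_size = page.size or DEFAULT_PAGE_SIZE

    try:
        # Resume from the state encoded in the incoming token
        current_pagination = AbcNextPageToken(page.token).paginations()[0]
    except IndexError:
        # No state in the token: start from the first page
        # (resource_type is a placeholder for the resource being listed)
        current_pagination = AbcPagination.default(resource_type)

    async with AbcClient(args) as client:
        params = {
            "page": current_pagination.page,
            "perPage": page_size,
        }
        response = await client.getUsers(params=params)
        data = response.json()
        accounts = data.to_accounts()

    # more_data_available is a placeholder: derive it from the API response
    if more_data_available:
        next_pagination = [current_pagination.next_page()]
    else:
        next_pagination = []

    # With an empty list, make_page is expected to return None, which stops pagination
    next_page = make_page(next_pagination, page_size)
    return ListAccountsResponse(
        response=accounts,
        page=next_page,
    )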
Key considerations
- The get_page(args) helper function is used to retrieve pagination information from the request arguments.
- page_size is determined using the page size from the request or a default value.
- The same page_size is used across all pages to maintain consistency.
- Instead of using template strings for endpoints, the endpoint is typically stored in the pagination object or determined based on the resource type or custom argument.
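The capability example leaves `more_data_available` undefined, since how to compute it depends on the API. A minimal sketch of two common ways to derive it follows; `has_more_data` is a hypothetical helper, and whether the API reports a total count is an assumption about the upstream service.

```python
from __future__ import annotations


def has_more_data(
    items_returned: int,
    page_size: int,
    page: int,
    total: int | None = None,
) -> bool:
    """Decide whether another page should be requested.

    `page` is 1-based, matching the default page state in the example.
    """
    if total is not None:
        # The API reports a total count: compare it with what has been covered so far.
        return page * page_size < total
    # No total available: a full page suggests there may be more to fetch.
    return items_returned == page_size
```

The fallback heuristic costs one extra, empty request when the dataset size is an exact multiple of the page size, which is usually an acceptable trade-off when no total is available.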
Handling Multiple Endpoint Pagination
For capabilities that internally call several API endpoints, you can make use of the fact that the pagination token carries a list of Pagination objects:
# Example: abc/capabilities_read.py:list_accounts
# ...
page = get_page(args)
page_size = page.size or DEFAULT_PAGE_SIZE

# Here the pagination class is assumed to carry an endpoint field
current_paginations = AbcNextPageToken(page.token).paginations()
if not current_paginations:
    # No state in the token (taking the whole list does not raise IndexError):
    # start from the first endpoint
    current_paginations = [AbcPagination.default(endpoint="/first-endpoint")]

async with AbcClient(args) as client:
    for pagination in current_paginations:
        params = {
            "page": pagination.page,
            "perPage": page_size,
        }
        response = await client.get(pagination.endpoint, params=params)
        # ...
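The loop above stops at the request itself. One way to build the list for the next token is to carry forward only the endpoints that still have data, as in this minimal sketch. `EndpointPagination`, `FAKE_API`, `fetch` and `run_once` are hypothetical stand-ins, and the full-page check is an assumed heuristic rather than something the SDK prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical data served by two endpoints.
FAKE_API = {
    "/first-endpoint": ["a", "b", "c"],
    "/second-endpoint": ["x"],
}


@dataclass(frozen=True)
class EndpointPagination:
    """Stand-in for a pagination state that stores its endpoint."""

    endpoint: str
    page: int = 1

    def next_page(self) -> "EndpointPagination":
        return EndpointPagination(self.endpoint, self.page + 1)


def fetch(endpoint: str, page: int, page_size: int) -> list[str]:
    """Return one 1-based page of the fake data for an endpoint."""
    start = (page - 1) * page_size
    return FAKE_API[endpoint][start:start + page_size]


def run_once(
    current: list[EndpointPagination], page_size: int
) -> tuple[list[str], list[EndpointPagination]]:
    """Query every pending endpoint once and collect the states still to visit."""
    items: list[str] = []
    upcoming: list[EndpointPagination] = []
    for pagination in current:
        chunk = fetch(pagination.endpoint, pagination.page, page_size)
        items.extend(chunk)
        if len(chunk) == page_size:  # a full page suggests more data
            upcoming.append(pagination.next_page())
    return items, upcoming
```

When `upcoming` comes back empty, every endpoint is exhausted, and passing that empty list to a helper like `make_page` yields no page token.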
Storing Additional Information in the Pagination Tokens
In some cases, it's necessary to store additional data across pages to ensure data consistency or uniqueness.
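class AbcPagination(PaginationBase):
    endpoint: str  # required because default() passes it to the constructor
    page: int
    unique_ids: list[str]

    @classmethod
    def default(cls, endpoint: str) -> "AbcPagination":
        # Start on the first page with no IDs seen yet
        return cls(endpoint=endpoint, page=1, unique_ids=[])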
In this example, the pagination state carries a list of unique IDs. This is useful, for instance, when an API returns duplicate entries across pages and you want to avoid making an extra detail call for an entry that has already been processed.
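How such a list might be used on each page can be sketched as follows. `filter_new` is a hypothetical helper, and the `"id"` key is an assumption about the shape of the API's entries.

```python
from __future__ import annotations


def filter_new(entries: list[dict], seen_ids: list[str]) -> tuple[list[dict], list[str]]:
    """Drop entries whose ID was already seen and return the updated ID list."""
    seen = set(seen_ids)  # set for fast membership checks
    fresh: list[dict] = []
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            fresh.append(entry)
    # Keep a list (not a set) so the state stays ordered and JSON-serializable.
    updated_ids = seen_ids + [entry["id"] for entry in fresh]
    return fresh, updated_ids
```

Only the entries in `fresh` need a detail call, and `updated_ids` is what would be stored in the `unique_ids` field of the next page state.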
Best Practices
- Store Minimal Data: Only store essential information in the pagination token. In the example, only unique identifiers are stored, not entire objects.
- Use Efficient Data Structures: Choose data structures that balance functionality and size. For example, using a list of strings for unique IDs instead of more complex objects.
- Limit Data Volume: If the amount of data grows too large, consider implementing a cutoff or using a more efficient storage method.
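A cutoff, as mentioned in the last point, could look like the following minimal sketch. `cap_ids` and `MAX_TRACKED_IDS` are hypothetical, and the limit of 500 is an arbitrary illustration rather than a recommended value.

```python
from __future__ import annotations

MAX_TRACKED_IDS = 500  # hypothetical budget; tune it to the token size you can afford


def cap_ids(unique_ids: list[str], limit: int = MAX_TRACKED_IDS) -> list[str]:
    """Keep only the most recently seen IDs once the list exceeds the limit."""
    if limit <= 0:
        return []
    if len(unique_ids) <= limit:
        return unique_ids
    return unique_ids[-limit:]
```

The trade-off is that an ID dropped by the cutoff may be processed again if it reappears on a later page, so the limit should be large enough to cover the distance at which duplicates typically occur.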